In this post, I will explain several different tuning systems and their advantages and disadvantages. First, let’s look at the basic goals of a tuning system.

  1. There should be 12 notes per octave, in half-step intervals.
  2. An octave (12 half steps) should be at a ratio of 2:1.
  3. A fifth (7 half steps) should be at a ratio of 3:2.

We’ll look at some other goals later on, but let’s start with these. Since 7 and 12 are relatively prime, these goals are sufficient to generate an entire tuning system. However, this would require that 7 octaves be the same as 12 fifths, or 2^7=(3/2)^{12}. Unfortunately, this is not the case; in fact, it’s not even particularly close. We call the discrepancy, 3^{12}/2^{19}, a Pythagorean comma.
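
To get a sense of the size of the discrepancy: \frac{3^{12}}{2^{19}}=\frac{531441}{524288}\approx 1.01364. Measured in cents (1200 to an octave, so 100 per equal-tempered half step), that is about 23.5 cents, roughly a quarter of a half step.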

The first two principles really cannot be compromised. (Well, in some Indian, Arabic, and Turkish tuning systems, the first principle is modified. A common Turkish tuning system consists of 53 notes to an octave, for reasons we will see later on.) The first principle is essential because pianos and other keyboard instruments have 12 notes per octave, and Western music is always written with the 12 notes per octave model in mind. The second principle is essential since we want to treat two notes that differ by an octave as being the same note in many cases; therefore any fudging on goal 2 leads to unacceptable results.

The third principle, however, can be modified, if necessary (and, as we have just seen, it is necessary). The most naïve way of fudging is to make 11 out of 12 fifths be in a 3:2 ratio, and fit the last one to whatever is needed to make the cycle work out. This is known as Pythagorean tuning.

One immediate drawback of Pythagorean tuning is that the leftover, so-called “wolf” fifth is unusable; it just sounds awful. But there are also more subtle problems with Pythagorean tuning. To see them, let’s add one more goal.

  4. A major third (4 half steps) should be at a ratio of 5:4.

For simplicity, let’s ignore the problem of the wolf fifth for now; there are other, more pressing problems to worry about already. Since 4=7\times 4-12\times 2, a major third should be four fifths minus two octaves. In terms of ratios, this forces the major third to be (3/2)^4/2^2=81/64 rather than the desired 5/4. In other words, we’re off by a factor of 81/80. While this ratio may seem rather small, it turns out to be a very serious problem with the Pythagorean tuning system.

Mathematically, the problem is that we’re trying to factor rational numbers in terms of only powers of two and powers of three. Therefore, as soon as we need to use any prime greater than 3 (in this case, 5), we end up in trouble. However, we don’t really have to factor all rational numbers in terms of powers of two and three; we just have to approximate these factorizations reasonably well. Still, our goals so far are quite mutually incompatible, so we have to sacrifice something, somewhere.

As we pointed out, 5:4 is a better major third than 81:64, so let’s try to construct a tuning system that takes that into account. For simplicity, we’ll assume that our scales always start with C, and we’ll construct ratios for intervals above C. A good possibility is the following:

C 1
C-sharp 16/15
D 9/8
E-flat 6/5
E 5/4
F 4/3
F-sharp 45/32
G 3/2
A-flat 8/5
A 5/3
B-flat 9/5
B 15/8
C 2

This list is generated by using the rules described so far and using the ratio with smallest denominator when there are multiple choices. This tuning system is known as (5-limit) Just intonation, the 5 meaning that we have allowed ourselves to use rational numbers whose factorizations contain primes up to 5.

Naturally, the Just intonation chart should be compared to the chart for Pythagorean tuning, which is as follows (assuming we place the wolf between the G-sharp and the E-flat, which is as good a place as any):

C 1
C-sharp 2187/2048
D 9/8
E-flat 32/27
E 81/64
F 4/3
F-sharp 729/512
G 3/2
G-sharp 6561/4096
A 27/16
B-flat 16/9
B 243/128
C 2

Note the much larger denominators needed in Pythagorean tuning.

I pulled these goals out of thin air at the beginning, so I really ought to explain why they are natural. For that, it is necessary to discuss the overtone series.

On a string instrument, when a note is sounded, all integer multiples of its frequency also sound simultaneously, with the higher multiples generally quieter than the lower ones. Therefore, it is beneficial to have a tuning system that emphasizes at least the first few overtones, while possibly discarding higher ones. The first overtone, at a ratio of 2, sounds one octave higher than the original note (called the fundamental). The second overtone, at a ratio of 3, sounds an octave and a fifth higher than the fundamental. Subtracting the octave (or, equivalently, dividing by 2) tells us that fifths should be in a ratio of 3:2 above the fundamental. Continuing on, the third overtone is at a ratio of 4, so it sounds two octaves above the fundamental. The next overtone, at a ratio of 5, sounds two octaves and a major third above the fundamental.

We really want all these notes to be part of a scale, since we tend to play several notes together, and common overtones yield pleasant sounds. So that’s where these goals I mentioned above come from.

So it would appear that Just intonation is a good tuning system. However, again there are serious drawbacks. For example, from D to A, which is a perfect fifth, we have a ratio of 40:27 rather than the desired 3:2. Many other intervals are similarly problematic, always differing from the desired ratio by a factor of 81/80 (the syntonic comma). The result is that many such intervals must be avoided, so Just intonation is only usable in music that specifically avoids them. It’s really not a very nice solution.

One really obvious solution to the entire problem of tuning is just to make all the half steps equal, in a ratio of 2^{1/12}. In fact, this temperament, known as equal temperament, is what is generally used on pianos today. What is gained by using equal temperament is that there are no wolf tones, and all intervals are usable. However, 2^{1/12} is irrational, so we do not get any perfect overtones or intervals with perfect ratios, except the octaves. So, in using equal temperament, one sacrifices all the overtones (except octaves) in order to get a uniformly playable system.
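
To quantify these trade-offs, it helps to measure intervals in cents: 1200 cents to an octave, so an equal-tempered half step is exactly 100 cents. Here is a small Python sketch comparing the fifths and major thirds of the systems discussed so far:

```python
import math

def cents(ratio):
    # 1200 cents to an octave, so a frequency ratio r is 1200*log2(r) cents.
    return 1200 * math.log2(ratio)

print(f"pure fifth (3:2):     {cents(3/2):7.2f}")        # 701.96
print(f"equal-tempered fifth: {cents(2**(7/12)):7.2f}")  # 700.00
print(f"pure third (5:4):     {cents(5/4):7.2f}")        # 386.31
print(f"Pythagorean third:    {cents(81/64):7.2f}")      # 407.82
print(f"equal-tempered third: {cents(2**(4/12)):7.2f}")  # 400.00
# The gap between the Pythagorean and pure thirds is the 81/80 from earlier:
print(f"81/80:                {cents(81/80):7.2f}")      # 21.51
```

The equal-tempered fifth is less than 2 cents flat, which is why equal temperament is tolerable at all; its major third is about 14 cents sharp of 5:4, which is a large part of what the purists are objecting to.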

To many people, the sacrifice of overtones is unacceptable. Therefore, we’ll ignore equal temperament from now on.

One attempt to resolve our problems is to use the idea of Pythagorean tuning, but instead of putting the wolf tone all in one place, we spread it out over the octave by putting a quarter of a Pythagorean comma in four different places. To do this, we make the fifths C-G, G-D, D-A, and B-F-sharp a quarter of a comma smaller than the usual 3/2 ratio. Here is that system, known as Werckmeister I:

C 1
C-sharp 256/243
D 64/81\times \sqrt{2}
E-flat 32/27
E 256/243\times 2^{1/4}
F 4/3
F-sharp 1024/729
G 8/9\times 8^{1/4}
G-sharp 128/81
A 1024/729\times 2^{1/4}
B-flat 16/9
B 128/81\times 2^{1/4}
C 2

(Werckmeister also devised several other temperaments, but this one is the most popular.) Of course, we’ve lost many of the perfect fifths here. However, they’re not that bad, and the thirds are also pretty good. (The ratio between the C-E here, 256/243\times 2^{1/4}, and the desired C-E, 5:4, is about 1.002; that’s definitely acceptable.) Of course, some thirds aren’t very good, but those are likely to be played less frequently in most “reasonable” keys. So Werckmeister I seems like a pretty good temperament.
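
As a sanity check on those claims, here is a short Python sketch verifying the quoted C-E ratio and confirming that the fifths C-G, G-D, D-A, and B-F-sharp each come out exactly a quarter comma flat:

```python
from math import isclose

comma = 3**12 / 2**19                     # the Pythagorean comma
notes = {
    "C": 1, "D": 64/81 * 2**0.5, "E": 256/243 * 2**0.25,
    "F#": 1024/729, "G": 8/9 * 8**0.25, "A": 1024/729 * 2**0.25,
    "B": 128/81 * 2**0.25,
}

print(notes["E"] / (5/4))                 # 1.00226...: the C-E third is good

tempered = (3/2) / comma**0.25            # a fifth narrowed by a quarter comma
for upper, lower in [("G", "C"), ("D", "G"), ("A", "D"), ("F#", "B")]:
    ratio = notes[upper] / notes[lower]
    if ratio < 1:                         # the interval wrapped past the octave
        ratio *= 2
    assert isclose(ratio, tempered)
print("all four tempered fifths are a quarter comma flat")
```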

I’ll post about some other temperaments (such as quarter comma and sixth comma meantone) another day, but this is a math blog, so I’d like to say something that has some semi-serious mathematical content. In particular, I mentioned earlier that 53 notes to an octave can lead to a very nice tuning system, so I’ll explain where I got that number. Ideally, we’d like some number of fifths to be equal to some other number of octaves, which would allow us to satisfy goals 2 and 3 perfectly. This means that we would like to solve (3/2)^x=2^y as a Diophantine equation (that is, to look for integer solutions). Unfortunately, the only solution is (0,0), which is completely useless. But all is not lost: we can still approximate solutions, i.e. ask that (3/2)^x\approx 2^y. Taking logarithms, this asks for integers x and y with x\log_2(3/2)\approx y; in other words, I want good rational approximations y/x of \log_2(3/2).

We can easily find the best possible rational approximations by using continued fractions. The continued fraction coefficients begin with 0, 1, 1, 2, 2, 3, 1, 5, 2, 23, meaning that

\log_2(3/2)=\cfrac{1}{1+\cfrac{1}{1+\cfrac{1}{2+\cfrac{1}{2+\cfrac{1}{3+\cfrac{1}{1+\cfrac{1}{5+\cfrac{1}{2+\cfrac{1}{23+\cdots}}}}}}}}}.

Cutting this off at various points gives the following sequence of rational approximations (the convergents) of \log_2(3/2):

1, 1/2, 3/5, 7/12, 24/41, 31/53, 179/306, 389/665, and I don’t really want to work out the next one. In these fractions, the denominator is the number of equal units in an octave, and the numerator is the number of those units in a fifth. So the usual system corresponds to the approximation 7/12. A convergent is an especially good approximation when the next continued fraction coefficient is large (equivalently, when the next convergent has a much larger denominator). By this measure, 7/12 is quite good, as is 31/53. Thus a 53-note scale is a very natural alternative to the standard 12-note scale.
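
Here is a Python sketch of this computation, using the standard recurrences for continued fraction coefficients and convergents (double precision is accurate enough for the first ten coefficients):

```python
from fractions import Fraction
from math import log2

def continued_fraction(x, n_terms):
    # Repeatedly split off the integer part and invert the remainder.
    coeffs = []
    for _ in range(n_terms):
        a = int(x)
        coeffs.append(a)
        x = 1 / (x - a)
    return coeffs

def convergents(coeffs):
    # h_k = a_k*h_{k-1} + h_{k-2}, and likewise for the denominators.
    h, h_prev, k, k_prev = 1, 0, 0, 1
    out = []
    for a in coeffs:
        h, h_prev = a * h + h_prev, h
        k, k_prev = a * k + k_prev, k
        out.append(Fraction(h, k))
    return out

coeffs = continued_fraction(log2(3/2), 10)
print(coeffs)                            # [0, 1, 1, 2, 2, 3, 1, 5, 2, 23]
print([str(c) for c in convergents(coeffs)])
# ['0', '1', '1/2', '3/5', '7/12', '24/41', '31/53', '179/306',
#  '389/665', '9126/15601']
```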

We begin with a few definitions.

Definition 1: An integral domain R is called a Dedekind domain if it is noetherian, every nonzero prime ideal of R is maximal, and R is integrally closed in its field of fractions.

Definition 2: A ring R is called a discrete valuation ring (DVR) if it is a principal ideal domain (PID) with exactly one nonzero prime ideal. (In other language, a DVR is a local PID of Krull dimension 1.)

One very important property of Dedekind domains is that ideals factor uniquely as products of prime ideals. I used this property in my last post, in the case of rings of integers: if L/K is an extension of number fields with rings of integers A\subset B, and \mathfrak{p}\subset A is a prime ideal, then we can write \mathfrak{p}B=\prod_{i=1}^g \mathfrak{P}_i^{e_i}. But this result holds in more generality, for any Dedekind domain.

Also, it is very easy to check that a DVR is a Dedekind domain. One very common way DVRs arise is as localizations of Dedekind domains: if A is a Dedekind domain and \mathfrak{p} is a nonzero prime ideal of A, then A_{\mathfrak{p}} is a DVR.

One way to interpret a DVR is through the following filtration of ideals. Let R be a DVR, and let \mathfrak{p} be the unique nonzero prime ideal of R. Then every nonzero ideal of R is of the form \mathfrak{p}^n for some n\ge 0 (where by \mathfrak{p}^0 I mean R). Now, for any x\in R\setminus\{0\}, there is an integer n so that x\in\mathfrak{p}^n\setminus\mathfrak{p}^{n+1}. We can now define a function v:R\setminus\{0\}\to\mathbb{N} (where \mathbb{N} includes 0 in this case) by v(r)=n as above. We can extend our definition of v to all of R by setting v(0)=+\infty.

It is also possible to extend v to the quotient field K of R by setting v(x/y)=v(x)-v(y); it is easy to check that this is well-defined. Now, v satisfies the following properties:

1) v:K\setminus\{0\}\to\mathbb{Z} is a surjective homomorphism.

2) v(x+y)\ge\min(v(x),v(y)).

We call such a function v a valuation of the field K.

Knowing v is enough to reconstruct R, since R=\{x\in K:v(x)\ge 0\}. Furthermore, \mathfrak{p}=\{x\in K:v(x)\ge 1\}. We call R the valuation ring of K.

Let’s look at a few places where DVRs arise naturally.

1) As we mentioned earlier, the localization of a Dedekind domain at a prime ideal is a DVR. So, for example, \mathbb{Z}_{(p)}=\{x/y\in\mathbb{Q}:p\nmid y\} is a DVR if p is a prime. The unique prime ideal is p\mathbb{Z}_{(p)}.

2) The ring \mathbb{Z}_p of p-adic integers is a DVR with unique prime ideal p\mathbb{Z}_p. Also, finite extensions of the field \mathbb{Q}_p of p-adic numbers inherit valuations from \mathbb{Q}_p, and so they contain DVRs as described above. In particular, if K/\mathbb{Q}_p is a finite field extension, then the integral closure of \mathbb{Z}_p in K is a DVR.

Now, if R is a DVR and \mathfrak{p} is its prime ideal, then R/\mathfrak{p} is a field. In the cases described above, this will always be a finite field; in what follows, we always assume that this field is finite. We call R/\mathfrak{p} the residue field.

We can also put a topology on a valued field K by taking the following sets as a basis: for x\in K and an integer n\ge 0, the set \{y\in K:v(x-y)\ge n\} is open. In what follows, we will assume that K is complete with respect to this topology (equivalently, complete with respect to the metric d(x,y)=2^{-v(x-y)}). Finite extensions of \mathbb{Q}_p are complete in this sense, so they will be our motivating examples. The residue fields will also be finite.
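
Example 1 above is easy to play with computationally. Here is a minimal Python sketch of the valuation attached to R=\mathbb{Z}_{(p)}, i.e. the p-adic valuation on \mathbb{Q}, checking the two defining properties:

```python
from fractions import Fraction

def vp(x, p):
    # p-adic valuation on Q: if x = p^n * a/b with p dividing neither a
    # nor b, then vp(x) = n; by convention vp(0) is infinite.
    if x == 0:
        return float("inf")
    x = Fraction(x)
    num, den, n = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p; n += 1
    while den % p == 0:
        den //= p; n -= 1
    return n

p = 5
x, y = Fraction(50, 3), Fraction(2, 25)
assert vp(x * y, p) == vp(x, p) + vp(y, p)      # property 1: homomorphism
assert vp(x + y, p) >= min(vp(x, p), vp(y, p))  # property 2
# R = {x : v(x) >= 0} recovers Z_(p):
print(vp(Fraction(7, 3), p) >= 0)    # True: 7/3 lies in Z_(5)
print(vp(Fraction(1, 5), p) >= 0)    # False: 1/5 does not
```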

Last post, I pointed out that if L/K is a Galois extension of number fields, then efg=[L:K]. This holds more generally, however. If L/K is a finite Galois field extension, and A\subset K is a Dedekind domain so that K is the quotient field of A, and B is the integral closure of A in L, then we still have efg=[L:K].

We now interpret this in the case of K a field complete with respect to a discrete valuation v, and A the valuation ring of K. Let L/K be a finite Galois extension, and let B be the integral closure of A in L, or, equivalently, the valuation ring of L. Then L is also complete with respect to a discrete valuation w that is very closely related to v, as we will see soon.

Let \mathfrak{p} be the prime of A, and let \mathfrak{P} be the prime of B. Since there is only one prime, g=1. Hence \mathfrak{p}B=\mathfrak{P}^e. Now, if x\in K, then w(x)=ev(x), and if x\in L, then fw(x)=v(N_{L/K}x). (But we won’t need these results in what follows, at least today.) The implication is that the decomposition group of the extension L/K is the entire Galois group.

We can put a filtration on the Galois group as follows: For i\ge -1, let G_i=\{\sigma\in Gal(L/K):w(\sigma(x)-x)\ge i+1 \hbox{ for all } x\in B\}. We call G_i the i^{\hbox{th}} ramification group of L/K. G_{-1}=G is the entire Galois group (which here is also the decomposition group), and G_0 is the inertia group. Also, each G_i is normal in G.

Now, I won’t prove it here, but it can be shown that if the residue field is finite of characteristic p>0 and K is complete, then for each i\ge 1, G_i/G_{i+1} is a direct product of copies of \mathbb{Z}/p\mathbb{Z}, and G_0/G_1 embeds into the group of roots of unity of B/\mathfrak{P} (and hence is finite and cyclic of order prime to p). Hence, by basic group theory or otherwise, G_0 is a semidirect product of a normal Sylow p-subgroup and a cyclic group of order prime to p. In particular, G_0 is solvable. Moreover, as shown in the last post, G/G_0\cong Gal((B/\mathfrak{P})/(A/\mathfrak{p})) is cyclic, since it is the Galois group of an extension of finite fields. Hence:

Theorem: G is solvable.

The following is a well-known result:

Theorem (Dirichlet): If a and m are integers with (a,m)=1, then there are infinitely many primes congruent to a\pmod m.

It turns out that Dirichlet’s Theorem is actually a special case of Artin’s Reciprocity Law. So, we’ll discuss how this works.

Let L/K be an extension of number fields. (That is, L and K are finite extensions of \mathbb{Q}.) Let A and B be the rings of integers of K and L, respectively. (This means that A and B are the integral closures of \mathbb{Z} in K and L, respectively.) Now, let \mathfrak{p} be a nonzero prime ideal in A. Then \mathfrak{p}B=\prod_{i=1}^g \mathfrak{P}_i^{e_i} for some distinct primes \mathfrak{P}_i of B and some positive integers e_i. If e_i>1, we say that \mathfrak{P}_i is ramified over \mathfrak{p}. We call e_i the ramification index. The primes \mathfrak{P}_i are said to lie above \mathfrak{p}.

Since A and B are Dedekind domains, \mathfrak{p} and \mathfrak{P}_i are maximal ideals. Hence k=A/\mathfrak{p} and \ell=B/\mathfrak{P}_i are finite fields, and \ell/k is a field extension. Let f_i=[\ell:k]. We call f_i the residue degree.

It is not too difficult to show that if n=[L:K], then n=\sum_{i=1}^g e_if_i. If L/K is a Galois extension, then e_i and f_i are independent of i (since the Galois group of L over K acts transitively on the \mathfrak{P}_i), so we can write n=efg.

Now, let’s define a few subgroups of Gal(L/K). We’ll assume from now on that L/K is a Galois extension. Furthermore, we fix a prime \mathfrak{p} of A, and some prime \mathfrak{P} of B lying above \mathfrak{p}. Now, define D_{\mathfrak{P}}=\{\sigma\in Gal(L/K):\sigma(\mathfrak{P})=\mathfrak{P}\}. We call D_{\mathfrak{P}} the decomposition group. We now have a homomorphism \phi : D_{\mathfrak{P}}\to Gal(\ell/k): an element of D_{\mathfrak{P}} maps B to B and \mathfrak{P} to \mathfrak{P}, so it induces an automorphism of \ell=B/\mathfrak{P} fixing k, and this is the desired homomorphism. Furthermore, this homomorphism is surjective. The kernel T_{\mathfrak{P}} of \phi is called the inertia group. Hence D_{\mathfrak{P}}/T_{\mathfrak{P}}\cong Gal(\ell/k).

It is not hard to determine the sizes of D_{\mathfrak{P}} and T_{\mathfrak{P}} in terms of quantities we already understand: |D_{\mathfrak{P}}|=ef and |T_{\mathfrak{P}}|=e. In particular, if \mathfrak{p} is an unramified prime, then D_{\mathfrak{P}}\cong Gal(\ell/k).

That’s particularly nice, because Galois groups of extensions of finite fields are always cyclic, generated by the Frobenius automorphism. Thus in the unramified case, D_{\mathfrak{P}} is cyclic, generated by an automorphism \sigma satisfying the congruence \sigma(x)\equiv x^{|k|} \pmod{\mathfrak{P}} for all x\in B. (Since L is the fraction field of B, \sigma is determined by what it does on B.) Furthermore, this element \sigma is unique. The common notation for \sigma is (\mathfrak{P},L/K).

If \mathfrak{P} and \mathfrak{Q} are two primes lying above \mathfrak{p}, then there is some element \sigma\in Gal(L/K) so that \sigma(\mathfrak{P})=\mathfrak{Q}. It is easy to verify that (\mathfrak{Q},L/K)=\sigma(\mathfrak{P},L/K)\sigma^{-1}. Therefore, if Gal(L/K) is abelian, then (\mathfrak{P},L/K) depends only on \mathfrak{p}. In this case, we may write (\mathfrak{p},L/K) for this element.

Finally, we’re ready to state (part of) the Artin reciprocity theorem.

Let L/K be an abelian Galois extension of number fields, and let \sigma\in Gal(L/K) be fixed. Then there are infinitely many primes \mathfrak{p} of A that are unramified and so that \sigma=(\mathfrak{p},L/K).

(In fact, only finitely many primes ramify, since primes ramify if and only if they divide the discriminant, which can be easily verified. The other part of the statement is more interesting.)

Let’s look at one example. Take K=\mathbb{Q} and L=\mathbb{Q}(\zeta_n), where \zeta_n=e^{2\pi i/n}. Then Gal(L/K)\cong(\mathbb{Z}/n\mathbb{Z})^\times, where the isomorphism is as follows: if (m,n)=1, then there is an automorphism \sigma_m\in Gal(L/K) defined by \sigma_m(\zeta_n)=\zeta_n^m. Now, if (p,n)=1, then ((p),L/K)=\sigma_p. In particular, this case of the theorem is equivalent to Dirichlet’s Theorem on primes in arithmetic progressions.
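
To spell out the smallest interesting case, take n=4, so L=\mathbb{Q}(i) and Gal(L/K)\cong(\mathbb{Z}/4\mathbb{Z})^\times=\{\pm 1\}. For an odd prime p, the Frobenius ((p),L/K)=\sigma_p is trivial exactly when p\equiv 1\pmod 4, and it is trivial exactly when (p) splits in \mathbb{Z}[i] (equivalently, when p is a sum of two squares). The theorem then says that each Frobenius class is hit infinitely often: there are infinitely many primes p\equiv 1\pmod 4 and infinitely many p\equiv 3\pmod 4.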

It turns out that the theorem isn’t too much more general than this, since any abelian extension is contained in a cyclotomic extension (this is the Kronecker-Weber Theorem), and it’s not hard to see what happens to Frobenius elements when we pass to sub-extensions.

All this material can be found (with many more details included) in Serre’s Local Fields.

Let’s fix a ring R. A module (assumed to be a left module, I suppose, but it doesn’t really matter as long as we’re consistent) M is said to be projective if the functor Hom(M,-) is exact. (That is, if 0\to A\to B\to C\to 0 is a short exact sequence of R-modules, then 0\to Hom(M,A)\to Hom(M,B)\to Hom(M,C)\to 0 is also exact.) Dually, M is said to be injective if the functor Hom(-,M) is exact.

There are various equivalent conditions for projectives and injectives. One particularly useful result is that projective modules are exactly the direct summands of free modules. Another one is that injective modules satisfy a certain extension property: A module J is injective if and only if for any map \phi:A\to J and any injective map f:A\to B, there exists a map (not necessarily unique) \theta:B\to J so that \phi=\theta\circ f.

Actually, we didn’t need to start with a ring R at all; it would make just as much sense to allow M to be an object in an arbitrary abelian category \mathcal{C}. We say that \mathcal{C} has enough projectives if for every object A of \mathcal{C}, there is an epimorphism (or, a surjective map, in the case of many interesting categories) P\to A, where P is projective. Dually, \mathcal{C} has enough injectives if for every object A of \mathcal{C}, there is a monomorphism (or, an injective map, in the case of many interesting categories) A\to J, where J is injective.

It is easy to see that the category of modules over a ring R has enough projectives: if A is an R-module, just take the free module F on the underlying set of A; the natural map F\to A is surjective. Hence A is a quotient of a free (and hence projective) module.

It is also true that the category of modules over a ring R has enough injectives, but this is a bit trickier to prove. To begin, we note that an arbitrary product of injective objects is injective. This follows from the isomorphism Hom(A,\prod_{i\in I} B_i)\cong\prod_{i\in I} Hom(A,B_i). We also note (although I’m not going to prove it here) that in the category of abelian groups (or \mathbb{Z}-modules), injective modules are the same as divisible modules (i.e. modules M so that the maps m\mapsto nm for n\neq 0 are all surjective).

Let’s first show that the category of abelian groups has enough injectives. The abelian group that plays the most important role here is \mathbb{Q}/\mathbb{Z}. Let A be any abelian group, and let I(A) be the product of copies of \mathbb{Q}/\mathbb{Z}, indexed by the set Hom(A,\mathbb{Q}/\mathbb{Z}). Then I(A) is injective, and there is a canonical map e_A:A\to I(A). We now check that e_A is actually injective. Pick 0\neq a\in A. It is easy to find a map \langle a\rangle\to\mathbb{Q}/\mathbb{Z} that is nontrivial on a, and by the extension property for injective modules, this map extends to all of A. So some coordinate of e_A(a) is nonzero, which shows that e_A is injective.

Now let’s return to the category of modules over an arbitrary ring R. It can be shown that if J is an injective abelian group, then Hom(R,J) has the structure of an injective R-module. Now, let M be an arbitrary R-module. Then let I(M) be the product of copies of I_0=Hom(R,\mathbb{Q}/\mathbb{Z}), indexed by the set Hom_R(M,I_0). Then, just as before, there is a canonical injective map M\to I(M). This completes the proof that the category of R-modules has enough injectives.

A friend told me about an interesting problem that can be solved easily using martingales. I’ll start with the problem and solution, and then I’ll talk a bit about martingales in general. (That’s all I can do; I don’t know much about them.)

Problem: A monkey is sitting at a typewriter, typing a letter (A-Z) independently and with uniform distribution each minute. What is the expected amount of time that passes before ABRACADABRA is spelled?

Solution: Suppose that, before every keystroke is made, a new monkey enters and wagers $1 on the next keystroke being an A, at fair odds (so that if the keystroke is indeed an A, the payoff is $26). If the keystroke is an A, the monkey stays and wagers everything (in this case $26) on the next letter being B, and so on. If the monkey ever loses a wager, then it leaves. Now let’s analyze what happens when ABRACADABRA is finally spelled out. The monkey who kept making correct wagers all the way through won $26^{11}. But another monkey who got in on the second ABRA won $26^4, and a third monkey who got in on the final A won $26. Hence the total payoff is $26^{11}+26^4+26. But all the wagers are fair, so on average the house breaks even (this is the optional stopping theorem), and since the house collects exactly $1 per minute from the entering monkeys, the expected time before ABRACADABRA is spelled is 26^{11}+26^4+26 minutes.
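
The same argument shows that, in general, the expected time is the sum of k^{\ell} over every prefix of length \ell that is also a suffix of the target word, where k is the alphabet size; ABRACADABRA has exactly three such self-overlaps (ABRACADABRA, ABRA, A). Simulating the full problem is hopeless (the answer is about 3.67\times 10^{15} minutes), but here is a Python sketch that checks the formula against a simulation on a smaller example:

```python
import random

def expected_time(word, k):
    # Sum k**len over prefixes of `word` that are also suffixes: these are
    # the monkeys still standing (and being paid) at the moment of success.
    return sum(k ** n for n in range(1, len(word) + 1)
               if word[:n] == word[-n:])

def simulate(word, alphabet, trials=50_000):
    total = 0
    for _ in range(trials):
        recent, t = "", 0
        while recent != word:
            recent = (recent + random.choice(alphabet))[-len(word):]
            t += 1
        total += t
    return total / trials

print(expected_time("ABRACADABRA", 26))  # 26**11 + 26**4 + 26
# "ABAB" over a 2-letter alphabet overlaps itself as ABAB and AB,
# so the expected time should be 2**4 + 2**2 = 20:
print(expected_time("ABAB", 2))          # 20
print(simulate("ABAB", "AB"))            # about 20
```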

I’m trying to figure out intuitively why the expected time should be longer than 26^{11}. Here’s my best understanding of it: if a lot of random letters are typed, then on average ABRACADABRA will be about as common as any other 11-letter string. However, its occurrences will be clumpier than those of some other strings: we’ll see some instances of ABRACADABRACADABRA, which contains ABRACADABRA twice. We can never clump, say, AAAAAAAAAAB that densely. So although the numbers of occurrences are roughly equal in the long run, one string clumps more easily than the other, and it therefore takes longer on average to reach the first occurrence of ABRACADABRA than the first occurrence of AAAAAAAAAAB.

In fact, we can make this clumping relationship even clearer with the following question: Suppose we have only two letters, A and B, and a monkey randomly types them. What is the probability that BA occurs before AA? The answer is 3/4, for a rather simple reason: unless the sequence begins with AA (which happens 1/4 of the time), there is a B immediately before the first occurrence of AA, and that B, together with the A that follows it, completes a BA earlier.
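
This is easy to confirm empirically; a quick Monte Carlo sketch:

```python
import random

def ba_before_aa(trials=100_000):
    wins = 0
    for _ in range(trials):
        prev = random.choice("AB")
        while True:
            cur = random.choice("AB")
            if prev + cur == "BA":   # BA completed first
                wins += 1
                break
            if prev + cur == "AA":   # AA completed first
                break
            prev = cur
    return wins / trials

print(ba_before_aa())   # about 0.75
```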

Now on to martingales.  A martingale is a sequence of random variables X_1,X_2,X_3,\ldots so that the expectation E(X_{n+1}\mid X_1,\ldots,X_n)=X_n (and E(|X_n|)<\infty). In this case, the random variables X_n for each monkey are the payoffs after n wagers by that particular monkey. If the (n+1)^\text{th} wager is going to be made, then X_n=26^n, and X_{n+1} is going to be 26^{n+1} with probability 1/26 and 0 with probability 25/26. Thus E(X_{n+1}\mid X_1,\ldots,X_n)=26^{n+1}/26=26^n=X_n. Hence this chain is a martingale.

For many years, my mother has baked a challah nearly every Friday for Shabbat. Occasionally, however, she asks me to do some portion of the challah making, possibly including the braiding. For reasons we’ll see later, I don’t like her braiding algorithm. This post includes several algorithms, written in a way that someone who knows what a braid group is can understand. (I never had much success following those series of diagrams I sometimes see; I always wished someone would write out the braiding process in terms of generators of the braid group, so that’s what I’m going to do here after I give the preliminary definitions.)

Wikipedia’s page on braid groups has lots of interesting things, so I’ll only write a few essential points here, leaving the reader to explore Wikipedia at eir leisure. I’m finding it a bit tricky to give a good informal definition of braids, so I’ll just assume that my reader knows roughly what a braid is and skip to the formal definition.

The braid group on n strands is the group B_n=\langle a_1,\ldots,a_{n-1}\mid a_ia_{i+1}a_i=a_{i+1}a_ia_{i+1} \text{ for } 1\le i\le n-2, \ a_ia_j=a_ja_i \text{ for } |i-j|>1\rangle. In terms of actually playing with braid strands, a_i means interchanging strand i with strand i+1 by putting strand i over strand i+1. It is pretty simple to see that these generators do indeed induce all possible braids (although I haven’t yet said what a braid is), and that the relations ought to hold. Now, of course, a braid is an element of the braid group.

The braid groups become rather complicated quite quickly. While B_0 and B_1 are trivial and B_2\cong\mathbb{Z}, already B_3 is nonabelian, and it’s isomorphic to the fundamental group of the complement of a trefoil knot in \mathbb{R}^3.

Note also that there is a natural homomorphism \pi:B_n\to S_n that tells us where the strand that started in the i^\text{th} position ends up.

Okay, now it’s time for some challah braiding algorithms. My mother’s usual challah has four strands on the bottom and three on the top. The algorithm for the top braid is pretty natural: (a_1a_2^{-1})^n, where n is decided by the length of the dough ropes.

I’m more concerned about the element of B_4 used for the bottom braid. She uses (a_1a_2^{-1}a_3^{-1}a_2)^n.  If n=1, we have \pi(a_1a_2^{-1}a_3^{-1}a_2)=(142)(3) (in cycle notation). This is already bad news to me: one step of the algorithm produces a single fixed point! I think one step of the algorithm ought to give an n-cycle (here a 4-cycle) or else a pure braid (i.e. a braid in the kernel of \pi). But it gets worse: the strand that starts in position 3 has no undercrossings. So when we’re done, it sits on top of every other strand.

It turns out not to be so bad because the three-strand braid sits on top of the four-strand braid, so the central portion of the four-strand braid is not visible in the finished bread. But aesthetically (and mathematically), this feels like a serious flaw to me.

Fortunately, I found an alternate algorithm for four-strand braiding that lacks these flaws: (a_2a_1a_3^{-1})^n. If n=1, \pi(a_2a_1a_3^{-1})=(1243), which is nice. Furthermore, every strand has both overcrossings and undercrossings. So this is my new preferred braid.
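
Computing \pi is purely mechanical (read the word left to right and swap the strands in the two affected positions; the sign of the generator doesn’t matter for \pi), so it is a natural thing to script. A small Python sketch checking the two claims above:

```python
def braid_permutation(word, n):
    # `word` lists nonzero ints: +i for a_i, -i for a_i^(-1).
    # Returns the position where each strand ends up (1-indexed).
    pos = list(range(1, n + 1))   # pos[j] = strand currently in position j+1
    for g in word:
        i = abs(g) - 1
        pos[i], pos[i + 1] = pos[i + 1], pos[i]   # pi ignores the sign of g
    return {strand: j + 1 for j, strand in enumerate(pos)}

# My mother's braid, a_1 a_2^(-1) a_3^(-1) a_2, gives (1 4 2)(3):
print(braid_permutation([1, -2, -3, 2], 4))   # {2: 1, 4: 2, 3: 3, 1: 4}
# The alternative, a_2 a_1 a_3^(-1), gives the 4-cycle (1 2 4 3):
print(braid_permutation([2, 1, -3], 4))       # {3: 1, 1: 2, 4: 3, 2: 4}
```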

Sometimes, however, it is preferable to braid with six strands. There was an article in the newspaper that explained how to do it, but I was unable to follow it. Fortunately, I found a YouTube video that shows someone doing it (possibly the same way; I can’t tell). I was able to transcribe this method in terms of generators of the braid group. However, I’m not quite sure where it is supposed to end, so my braid may be slightly different from the one shown in the video. Writing a,\ldots,e for the generators a_1,\ldots,a_5 of B_6, the braid in the video is e^{-1}d^{-1}c^{-1}b^{-1}a^{-1}(bcdeabd^{-1}c^{-1}b^{-1}a^{-1}e^{-1}d^{-1})^n, except that it might stop somewhere in the middle of the (\cdot)^n. I don’t really want to calculate \pi of this braid by hand (computations like this have never been that easy for me), but I would guess that it is a 6-cycle if it stops at an appropriate moment. (Also, it’s not as complicated as the formula would make it seem: much of it is things like moving the rightmost strand all the way over to the left, which takes many generators to express but is not complicated when you’re actually braiding dough.)

Recall from the last post that if R is a commutative ring, we define K_0(R) to be the Grothendieck group of the isomorphism classes of finitely generated projective R-modules. It is natural to ask what happens if we replace finitely generated projective modules with countably generated projective modules. Let us write \mathfrak{K}_0(R) for this group. It turns out that understanding \mathfrak{K}_0(R) is extremely easy.

Theorem: For any commutative ring R, \mathfrak{K}_0(R)=0.

Proof: We have to show that if A, B, C, and D are countably generated projective R-modules, there is some countably generated projective R-module E so that A\oplus D\oplus E\cong B\oplus C\oplus E. Define E=\bigoplus_{i=1}^\infty (A\oplus B\oplus C\oplus D). Then E absorbs an extra copy of each of A, B, C, and D, so A\oplus D\oplus E\cong E\cong B\oplus C\oplus E (this is the Eilenberg swindle). Hence \mathfrak{K}_0(R)=0.

A similar construction shows up in the theory of group rings. Here’s an exercise from T.Y. Lam’s Exercises in Classical Ring Theory:

Exercise 8.16: Let G and H be any two groups. Show that there is some ring R so that R[G]\cong R[H]. (Here R[G] is the ring of finite R-linear combinations of elements of G, and multiplication is defined by the group multiplication of G.)

Solution: Let K=(G\times H)\times(G\times H)\times\cdots, and set R=\mathbb{Z}[K]. Then K\times G\cong K\cong K\times H, so R[G]\cong\mathbb{Z}[K\times G]\cong\mathbb{Z}[K\times H]\cong R[H].

Lam makes the comment that, although the group rings \mathbb{Z}[G] and F[G] (for F a field) are very useful for determining properties of G (for instance, the modules over these rings are the objects of study in group cohomology and representation theory, respectively), the group ring R[G] for an arbitrary ring R might not give us much information about G.

There’s an interesting article I found on more general Eilenberg swindles, but the authors don’t define progenerators, so I’ll include that here.

Let R be a ring and M a right R-module. Define M^\ast=Hom_R(M,R) and S=Hom_R(M,M). Then M and M^\ast are S-R and R-S bimodules, respectively. Furthermore, we can define multiplications M^\ast M\subseteq R by m^\ast m=m^\ast(m) and MM^\ast\subseteq S by (mm^\ast)(m')=m(m^\ast m'). We say that M is a progenerator if MM^\ast=S and M^\ast M=R.

In many places in mathematics, we see some variant of the following simple construction. The first time we see it is in constructing the integers from the natural numbers:

Consider pairs (a,b) of elements of \mathbb{N} (which includes zero for our purposes, but it doesn’t really matter this time); think of (a,b) as the formal difference a-b. We form equivalence classes out of these pairs by saying that (a,b)\sim(c,d) if a+d=b+c. We can create a group structure on these pairs by setting (a,b)+(c,d)=(a+c,b+d). The resulting group is isomorphic to \mathbb{Z}. So that’s how to construct the integers from the natural numbers.
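
Here is that construction as a small Python sketch: pairs of naturals under the stated equivalence, stored via a canonical representative so that equivalent pairs compare equal.

```python
class Diff:
    """The equivalence class of a pair (a, b) of naturals, representing a - b."""

    def __init__(self, a, b):
        m = min(a, b)                 # canonical representative: one entry is 0
        self.a, self.b = a - m, b - m

    def __add__(self, other):
        # (a,b) + (c,d) = (a+c, b+d)
        return Diff(self.a + other.a, self.b + other.b)

    def __eq__(self, other):
        # (a,b) ~ (c,d)  iff  a + d = b + c
        return self.a + other.b == self.b + other.a

    def __repr__(self):
        return f"{self.a - self.b:+d}"

three, minus_five = Diff(3, 0), Diff(0, 5)
print(three + minus_five)             # -2
print(Diff(2, 7) == Diff(10, 15))     # True, since 2 + 15 == 7 + 10
```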

We see a similar construction when we discuss localizations of rings. Let R be a commutative ring and S a multiplicative subset containing 1. (If R is noncommutative, you can still localize provided that S is an Ore set, but I don’t feel like going there now.) We now consider pairs (r,s)\in R\times S under the equivalence relation (r_1,s_1)\sim(r_2,s_2) if there is some t\in S so that tr_1s_2=tr_2s_1. The set of equivalence classes has the structure of a ring, called the localization of R at S, and denoted by R_S. This construction is generally seen with S=R\setminus\mathfrak{p}, where \mathfrak{p} is a prime ideal of R. The resulting ring is then local (meaning that it has a unique maximal ideal, namely \mathfrak{p}R_S). (We generally write R_{\mathfrak{p}} rather than R_S in this situation.) Anyway, this construction is really useful because localizations at prime ideals are frequently principal ideal domains, and we know all sorts of interesting theorems about finitely generated modules over principal ideal domains. And then we can use some Hasse principle-type result to transfer our results back to the original ring.

Notice that I allowed the multiplicative set of localization to contain zero. However, in this case, the localization becomes the trivial ring (or not a ring, if you require that 0\neq 1 in your definition of a ring, as many people do). More generally, allowing zero divisors in the multiplicative set causes various elements in R to become zero in the localization.

A similar construction shows up in K-theory. Suppose A is any commutative semigroup. We consider pairs (a,b)\in A\times A under the equivalence relation (a,b)\sim(c,d) if there is some e\in A so that a+d+e=b+c+e. (This is necessary since we do not assume that A satisfies the cancellation property.) The resulting equivalence classes form a group called the Grothendieck group of A and denoted by K_0(A).

The Grothendieck group satisfies the following universal property. Let \phi:A\to K_0(A) be the map sending a\in A to the class of (a+b,b) for any b\in A. (This is easily seen to be well-defined, independent of the choice of b.) Now let G be any abelian group and \psi:A\to G any semigroup homomorphism. Then there is a unique group homomorphism \theta:K_0(A)\to G so that \theta\phi=\psi.

Grothendieck groups can be very helpful for studying rings. Let R be a commutative ring, and let A denote the semigroup of isomorphism classes of finitely generated projective R-modules (under the operation of direct sum). Then K_0(A) (or K_0(R), as people often write) is an important object of study. If R is a field, for instance, then K_0(R)\cong\mathbb{Z}. However, if R is the ring of integers of a number field K, then K_0(R)\cong\mathbb{Z}\oplus C(K), where C(K) is the ideal class group.

Perhaps more interesting is Swan’s Theorem, which relates vector bundles over a compact topological space to finitely generated projective modules over its ring of continuous functions; in particular, the two categories have isomorphic Grothendieck groups. But that’s probably the subject of another post, especially if I can manage to understand my notes from Max Karoubi’s lecture series in Edmonton.

This is based on a talk by Zinovy Reichstein from the PIMS Algebra Summer School in Edmonton.

The motivation comes from looking at ways to simplify polynomials. For example, if we start with a quadratic polynomial x^2+ax+b, we can remove the linear term by setting y=x+\frac{a}{2}; our polynomial then becomes y^2+b', where b'=b-\frac{a^2}{4}.

We can do something similar in any degree. Consider the polynomial x^n+a_1x^{n-1}+a_2x^{n-2}+\cdots+a_n. We may make the substitution y=x+\frac{a_1}{n} to remove the degree-(n-1) term.

We can also make the coefficients of the linear and constant terms equal with the scaling substitution y=\frac{b_n}{b_{n-1}}z (where the b’s are the coefficients of the polynomial expressed in terms of y), as the following computation shows.
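
To see where that factor comes from: substituting y=cz into y^n+b_1y^{n-1}+\cdots+b_{n-1}y+b_n and dividing through by c^n to stay monic gives

z^n+\frac{b_1}{c}z^{n-1}+\cdots+\frac{b_{n-1}}{c^{n-1}}z+\frac{b_n}{c^n},

and the last two coefficients agree exactly when c=\frac{b_n}{b_{n-1}}.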

Enough for motivation. Suppose f(x)=x^n+a_1x^{n-1}+\cdots+a_n is a polynomial, and let K=\mathbb{C}(a_1,\ldots,a_n). (So, in particular, a_1,\ldots,a_n form a transcendence basis for K over \mathbb{C}.) Let L=K[x]/(f(x)). A Tschirnhaus transformation is an element y\in L so that L=K(y).

Applying a Tschirnhaus transformation to a polynomial of degree n gives us another polynomial of degree n with different coefficients. They also allow us to simplify polynomial expressions in various senses. We will use the following two criteria of simplification:

1) A simplification involves making as many coefficients as possible 0.

2) A simplification involves making the transcendence degree of \mathbb{C}(b_1,\ldots,b_n) over \mathbb{C} as small as possible. (For a polynomial of degree n, we will write d(n) for this number.)

Suppose n=5. Hermite showed that it is possible to make b_1=b_3=0 and b_4=b_5. Therefore d(5)\le 2. Klein showed that d(5) is in fact equal to 2.

Now suppose n=6. Joubert showed that again we can make b_1=b_3=0 and b_5=b_6. Therefore d(6)\le 3.

It is unknown whether we can make b_1=b_3=0 when n=7. However, it is known that we cannot do so if n is of the form n=3^r+3^s for r>s\ge 0 or n=3^r for r\ge 0. It is also known (Buhler and Reichstein) that d(n)\ge\left\lfloor\frac{n}{2}\right\rfloor.

I have been reading Joseph Silverman’s new book on arithmetic dynamics lately. There’s a lot of really fascinating stuff in there, including a large number of potential research problems that are currently way beyond me, but I’ll continue thinking about them! Most interesting so far is the Uniform Boundedness Conjecture:

Let d\ge 2, N\ge 1, and D\ge 1 be integers. Then there exists a constant C=C(d,N,D) such that for any number field K with [K:\mathbb{Q}]\le D and any morphism \phi:\mathbb{P}^N(K)\to\mathbb{P}^N(K) of degree d, the number of preperiodic points of \phi in \mathbb{P}^N(K) is at most C.

Not much is known about this conjecture; even the case d=2, N=1, and D=1 is open. It’s even open if we restrict to morphisms of the form \phi_c(z)=z^2+c. Bjorn Poonen has shown, however, that these maps have no rational periodic points of exact period 4 or 5; it is conjectured that they have no rational periodic points of exact period greater than 3.

However, there is a positive result of the above type that does not depend on all of the above quantities:

Let K be a number field and \phi:\mathbb{P}^1\to\mathbb{P}^1 be a rational map over K. Let \mathfrak{p} and \mathfrak{q} be prime ideals of \mathfrak{o}_K so that \phi has good reduction at \mathfrak{p} and \mathfrak{q} (meaning that when we reduce \phi modulo \mathfrak{p} and \mathfrak{q}, we end up with a map \tilde\phi of the same degree as \phi) and whose residue characteristics are distinct. Then the period n of any periodic point of \phi in \mathbb{P}^1(K) satisfies n\le (N\mathfrak{p}^2-1)(N\mathfrak{q}^2-1), where N denotes the (absolute) norm.

(See, for instance, my algebraic number theory notes for definitions of some of these terms.)
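
The reduction side of this is easy to experiment with. Here is a Python sketch computing the cycle lengths of \phi_c(z)=z^2+c on \mathbb{F}_p (the point at infinity is fixed, so I ignore it); the reduction of a rational periodic point always lands in one of these cycles, which is where constraints like the bound above come from:

```python
def periods_mod_p(c, p):
    # Cycle lengths of z -> z^2 + c acting on F_p.
    f = lambda z: (z * z + c) % p
    seen, lengths = set(), set()
    for z0 in range(p):
        if z0 in seen:
            continue
        trail, z = {}, z0
        while z not in trail and z not in seen:
            trail[z] = len(trail)     # record when we first saw z
            z = f(z)
        if z in trail:                # the walk closed up: a new cycle
            lengths.add(len(trail) - trail[z])
        seen.update(trail)
    return sorted(lengths)

# phi(z) = z^2 - 1 has the rational 2-cycle {0, -1}, so a 2-cycle
# must appear modulo every prime of good reduction:
for p in (3, 5, 7, 11):
    print(p, periods_mod_p(-1, p))
```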

Anyway, that wasn’t really the point of this post, as you may have guessed from the title. I meant to talk about theorems that pretend not to be related to dynamical systems but actually are. First we need to discuss height functions a bit; there’s a lot more about them in Silverman’s book and in my elliptic curve notes.

We let K be a number field and M_K the set of standard absolute values on K. (These are the absolute values on K whose restriction to \mathbb{Q} is either the standard absolute value or one of the p-adic absolute values.) We write n_v=[K_v:\mathbb{Q}_v] (where F_v denotes the completion of F with respect to the absolute value v). Suppose P\in\mathbb{P}^N(K); we can then write P=[x_0,\ldots,x_N] for some x_0,\ldots,x_N\in K. We then define the height of P with respect to K to be H_K(P)=\prod_{v\in M_K} \max\{|x_0|_v,\ldots,|x_N|_v\}^{n_v}. One can check that this is well-defined, and that if L/K is a finite extension of number fields and P\in\mathbb{P}^N(K), then H_L(P)=H_K(P)^{[L:K]}. Hence it is possible to define the absolute height of P by H(P)=H_K(P)^{1/[K:\mathbb{Q}]}.
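
Over \mathbb{Q} this is very concrete: writing a point of \mathbb{P}^1(\mathbb{Q}) as [a,b] with a and b coprime integers, all the non-archimedean factors are 1, so H([a,b])=\max(|a|,|b|). A small Python sketch, which also checks the multiplicativity H(\beta^n)=H(\beta)^n used in the proof below:

```python
from fractions import Fraction

def height(x):
    # H(a/b) = max(|a|, |b|) once gcd(a, b) = 1; Fraction reduces for us.
    x = Fraction(x)
    return max(abs(x.numerator), x.denominator)

x = Fraction(22, 7)
for n in range(1, 5):
    assert height(x ** n) == height(x) ** n
print(height(x), height(x ** 3))   # 22 10648
```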

One of the key facts about heights is the following: If B and D are constants, then \{P\in\mathbb{P}^N(\overline{\mathbb{Q}}):H(P)\le B \text{ and } [\mathbb{Q}(P):\mathbb{Q}]\le D\} is finite. A corollary is the following well-known result of Kronecker:

Let \alpha\in\overline{\mathbb{Q}} be nonzero. Then H(\alpha)=1 if and only if \alpha is a root of unity.

Proof: If \alpha is a root of unity, then H(\alpha)=1 is clear. Now suppose that H(\alpha)=1. For any \beta and n, we have H(\beta^n)=H(\beta)^n, so H(\alpha^n)=H(\alpha)^n=1. Thus \{\alpha,\alpha^2,\alpha^3,\ldots\} is a set of bounded height and bounded degree (every power of \alpha lies in \mathbb{Q}(\alpha)), so it is finite. Therefore there are integers i>j>0 such that \alpha^i=\alpha^j, whence \alpha^{i-j}=1 and \alpha is a root of unity.

And now for linear algebra. Sheldon Axler has a well-known book on linear algebra without determinants. He therefore uses dynamical systems to show the following familiar result:

Theorem: Every operator on a finite-dimensional nonzero complex vector space has an eigenvalue.

Proof: Let V be such a vector space of dimension n. Let T be an operator on V, and let v\in V be nonzero. Then \{v,Tv,T^2v,\ldots,T^nv\} consists of n+1 vectors, so it cannot be a linearly independent set. Hence there exist a_0,\ldots,a_n\in\mathbb{C}, not all zero, so that a_0v+a_1Tv+\cdots+a_nT^nv=0. Let m be maximal with a_m\neq 0, and note that m\ge 1: if m=0, then a_0v=0 with a_0\neq 0 and v\neq 0, which is impossible. Then a_0v+a_1Tv+\cdots+a_mT^mv=0. Since we’re working over \mathbb{C}, the polynomial a_0+a_1z+\cdots+a_mz^m factors as a_m(z-\lambda_1)\cdots(z-\lambda_m). We then have 0=a_0v+a_1Tv+\cdots+a_mT^mv=a_m(T-\lambda_1I)\cdots(T-\lambda_mI)v. Since v\neq 0, some T-\lambda_jI is not injective. This \lambda_j is an eigenvalue for T.
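
The proof is constructive enough to run. Here is a numpy sketch following it literally, with an SVD standing in for “find a linear dependency” and numpy.roots for “factor the polynomial”; the tolerances are ad hoc, so treat it as an illustration rather than a serious eigensolver:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v = rng.standard_normal(n) + 0j

# The n+1 vectors v, Tv, ..., T^n v cannot be independent in C^n.
cols = [v]
for _ in range(n):
    cols.append(T @ cols[-1])
M = np.column_stack(cols)

# A null vector a of M encodes a_0 v + a_1 Tv + ... + a_n T^n v = 0.
_, _, Vh = np.linalg.svd(M)
a = Vh[-1].conj()
m = max(i for i in range(n + 1) if abs(a[i]) > 1e-10)

# p(T)v = 0 with v != 0, so some root of p makes T - lambda*I singular.
for lam in np.roots(a[m::-1]):        # np.roots wants highest degree first
    smallest_sv = np.linalg.svd(T - lam * np.eye(n), compute_uv=False)[-1]
    if smallest_sv < 1e-8:
        print("eigenvalue:", lam)

print("check:", np.linalg.eigvals(T))
```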