A friend told me about an interesting problem that can be solved easily using martingales. I’ll start with the problem and solution, and then I’ll talk a bit about martingales in general. (That’s all I can do; I don’t know much about them.)

Problem: A monkey is sitting at a typewriter, typing a letter (A-Z) independently and with uniform distribution each minute. What is the expected amount of time that passes before ABRACADABRA is spelled?

Solution: Suppose that, before every keystroke is made, a new monkey enters and wagers $1 on the next keystroke being an A fairly (so that if the keystroke is indeed an A, then the payoff is $26). If the keystroke is an A, the monkey stays and wagers everything (in this case $26) on the next letter being B, and so on. If the monkey ever loses a wager, then it leaves. Now let’s analyze what happens when ABRACADABRA is finally spelled out. The monkey who kept making correct wagers all the way through won $ 26^{11}. But another monkey who got in on the second ABRA won $ 26^4, and a third monkey who got in on the final A got $26. Hence the total payoff is $ 26^{11}+26^4+26. But all the wagers are fair, and the house gets $1 on every turn from the new monkey, so the expected time before ABRACADABRA is spelled is 26^{11}+26^4+26.

I’m trying to figure out intuitively why the expected time should be longer than 26^{11}. Here’s my best understanding of it: if a lot of random letters are typed, then on average ABRACADABRA will be about as common as any other 11-letter string. However, they’ll be clumpier than some other strings, since we’ll see some instances of ABRACADABRACADABRA, which contain ABRACADABRA twice. But we can never clump, say, AAAAAAAAAAB more densely. Therefore, although the number of occurrences should be roughly equal in the limiting case, one can clump more easily than the other, so it should take more time to the first occurrence of ABRACADABRA than to the first occurrence of AAAAAAAAAAB.

In fact, we can make this clumping relationship even clearer with the following question: Suppose we have only two letters, A and B, and a monkey randomly types them. What is the probability that BA occurs before AA? The answer is 3/4 for a rather simple reason: if the sequence does not begin with AA, then there will be a B before the first occurrence of A, so BA will occur before AA unless the sequence starts with AA, which happens only 1/4 of the time.

Now on to martingales.  A martingale is a sequence of random variables X_1,X_2,X_3,\ldots so that the expectation E(X_{n+1}\mid X_1,\ldots,X_n)=X_n (and E(|X_n|)<\infty). In this case, the random variables X_n for each monkey are the payoffs after n wagers by that particular monkey. If the (n+1)^\text{th} wager is going to be made, then X_n=26^n, and X_{n+1} is going to be 26^{n+1} with probability 1/26 and 0 with probability 25/26. Thus E(X_{n+1}\mid X_1,\ldots,X_n)=26^{n+1}/26=26^n=X_n. Hence this chain is a martingale.