Taylor's Law of Programming Probability

16th March 2001

Introduction: the History of Laziness in Programming

The Perl manual says that ``the three principal virtues of a programmer are Laziness, Impatience, and Hubris.'' What does this mean? Larry Wall, the author of Perl, explains that ``lazy programmers do not like to write the same code more than once. Thus, a lazy programmer is much more likely to write code to be reusable and as applicable in as many situations as possible.'' And noted Perl guru Randal L. Schwartz expands on this theme by defining Laziness as ``the quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it. Hence, the first great virtue of a programmer.''

In a similar way, Jon Bentley extols the virtues of ``creative laziness'' in programming, in his utterly wonderful book Programming Pearls [amazon.co.uk], [amazon.com]. He defines it as the tendency to stop and think before ploughing in and doing a lot of work that more analysis would have shown to be unnecessary.

The Law

In the spirit of these classic applications of laziness to the noble art of programming, I offer Taylor's law of Programming Probability:

The theoretical possibility of a catastrophic occurrence in your program can be ignored if it's less likely than the entire installation being wiped out by meteor strike.

An Example: MD5 Collisions

We're working on an application which involves people buying things over the internet using credit cards. We want to make sure that people can't make multiple accounts in this application, then use those two accounts to buy things using the same credit card. (This would allow people an easy way to get around restrictions on the maximum amount of stuff you can buy.)

The problem is that, for all sorts of reasons, we don't want to store people's credit card information on the system. Doing this would mean that it would be catastrophic if our system were cracked; and maybe more importantly, storing such details just makes people uncomfortable.

One of our engineers proposed a solution to this: instead of storing the credit card number itself, we store the MD5 checksum of the card number. This is a popular one-way encryption algorithm - even if someone were able to break into our system and get hold of our list of MD5-encrypted credit card numbers, they could never decrypt them, so no-one's card would be compromised. But when someone tries to buy from our site, we can check the MD5 checksum of the card they want to use against our database of the MD5 checksums of cards that have been used from other accounts.

The ``problem'' with this solution is that it's theoretically possible for two credit card numbers to encode to the same MD5 checksum. If this happened, then someone with a good card could be wrongly refused permission to use it, because the system would think that card had been used from a different account.

So: what's the chance of this happening?

So the chance of any given credit card ``colliding'' with any other is one in 2128/1016 = one in 34028236692093846346337

Now suppose that every single individual in the world signed up to our system. Then we'd have five billion accounts, and so five billion credit card numbers' MD5 checksums on file. Then the chance of a new subscriber's credit card colliding with any one of those would be one in 2128/1016 / 5000000000 = one in 6805647338419 (Call this number x.)

How bad is that?

Well, the geological and fossil record suggests that the Earth is hit by a massive bolide (meterorite or comet) roughly every hundred million years or so, causing mass extinctions, especially of higher life forms. (The most recent such impact, about sixty-five million years ago, is the most likely cause of the extinction of the dinosaurs.)

A hundred million years is 100000000x365x24x60x60 = 3153600000000000 seconds, or about pi petaseconds. (Call this number c.) So if one new person registered with our system every second, you could expect about c/x new subscribers to be wrongly told that their credit cards were already registered before the next time the earth is wiped out by a meteor. That's about 463, or 0.00000926% of the whole subscriber base.

To put it another way, if one person registers with our web site every second, it will be about 215805 years before we have a collision.

Folks, this is not something we should lose sleep over!

Other Applications

The important number here seems to be 100000000x365x24x60x60 = 3153600000000000, which we called c - the average number of seconds per mass extinction of life on earth. The reciprocal of this number, 1/c is the probability that all higher life on Earth will be wiped out by meteor strike in any given second. (The name c stands for the catastrophe number.)

There is just no point in worrying about anything that happens with a probability of less than one in c, because you're more likely to be killed (along with the rest of humanity and most of the mammals, birds, etc.) during this exact second than you are to be bitten by your putative problem.

My final thought is that the number c is surprisingly low. For example, it suggests that 128-bit encryption is unnecessary. 52-bit encryption would be sufficient to make cracking (at one attempt per second) less likely than meteor-induced extinction, since 252 is slightly larger than c.

I welcome other applications of what I am grandiosely calling ``Taylor's law''. Please email them to me at <mike@miketaylor.org.uk> - I will include interesting applications on this page.

I've had quite a bit of interesting feedback on this page, most notably John Deters's carefully argued rebuttal to the law's application to credit-card checksums.

Thanks to Jani Nurminen for pointing out that MD5 checksums are 128 bits wide, not the 256 bits that I originally claimed. (Now fixed.)

Thanks to Dan Milstein for pointing out my failure to consider all the possible collision combinations between the five billion possible credit-card holders. He says:

You should be taking into account every possible way to form a pair of CC#'s in your DB, which is 5 Billion choose 2 = (5 billion * (5 Billion - 1)) / 2, so your final odds are about 2.5 Billion off (as in, it's 2.5 billion times more likely than your estimating).

Oops. Better fix the sums some time.

Feedback to <mike@miketaylor.org.uk> is welcome!