Taylor's Law: John Deters's Rebuttal

22nd August 2002

Here's what John had to say in response to Taylor's Law of Programming Probability:

Date: Wed, 21 Aug 2002 14:21:45 -0500
From: "John.Deters" <John.Deters@dont.spam.me>
To: mike@miketaylor.org.uk
Subject: Problem with your example

I was checking out your Taylor's Law of Programming Probability (with which I wholeheartedly agree) but saw an error in your assumptions for your usage of MD5 to protect your client's account numbers.
I know, because I made the same error myself.
First, credit card account numbers are not nearly as randomly distributed as 1 in 10^16. Visas, for example, follow a format like 4BBBBBBaaaaaaaaK, where 4 is always a 4 (Visa numbers begin with 4), BBBBBB is a Visa member bank number, aaaaaaaa is the unique account with that bank, and K is a check digit. K is easily calculable, and is present only to help prevent miskeying of the number.
There are a finite number of banks and therefore a finite set of bank numbers. The bank numbers are not kept secret. If I make the assumption that one of your customers comes from bank 012345, then I simply have to generate the MD5 check sums for the following set of numbers: 401234500000000K-401234599999999K. That's only 100,000,000 MD5s to calculate, which is not a huge problem on today's PCs. I then have a catalog of MD5s to compare to anything I find on your web site, and I know when I've got a match.
Granted, I've accomplished nothing more than unscrambling the account number, but in the right context that information may be very damaging to my customers. I'm just trying to point out that the system as described on your web page isn't quite as secure as it may appear to be at first glance.
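
The enumeration John describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the email does not name the check-digit scheme, so the sketch assumes the standard Luhn algorithm; the bank number 012345 is John's made-up example; and the function names `luhn_check_digit` and `build_catalogue` are invented here. The `limit` parameter keeps the demonstration small, whereas the full attack would run over all 10^8 account values.

```python
import hashlib


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a digit string that lacks its final digit."""
    total = 0
    # Walk right to left. The rightmost digit of the partial number is doubled,
    # because it will sit immediately left of the check digit once appended.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def build_catalogue(prefix: str, account_digits: int, limit: int) -> dict:
    """Map MD5 hex digest -> card number for the first `limit` accounts under `prefix`."""
    catalogue = {}
    for n in range(limit):
        partial = prefix + str(n).zfill(account_digits)
        card = partial + luhn_check_digit(partial)
        catalogue[hashlib.md5(card.encode("ascii")).hexdigest()] = card
    return catalogue


# "4" + bank number "012345" (John's hypothetical bank), then 8 account digits.
catalogue = build_catalogue("4012345", 8, limit=1000)

# A hash found in a stolen table is unscrambled with a single dictionary lookup.
partial = "401234500000042"
stolen = hashlib.md5((partial + luhn_check_digit(partial)).encode("ascii")).hexdigest()
print(catalogue.get(stolen))
```

The point is that the work is bounded by the number of plausible inputs, not by the size of the MD5 output space.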

I have nothing to add in response to that analysis: it seems to blow Taylor's Law right out of the window so far as its application to credit card security is concerned.

Of course, the law still applies in the more general sense: events which are less likely to occur than some threshold (say 1 in c, which we defined to be about 3x10¹⁵) can be ignored - and that applies to credit card MD5 collisions as much as to anything else. But we do need to be aware of the possibility of Bad Guys routing around the probability.

John goes on to make some observations about how we might defeat someone trying to crack our credit-card database:

For our application, we went over many different solutions. Most of them were along the lines of "Let's add some salt to the account number before hashing."
We also discussed possible attacks. The salt has to be present on the machine doing the hashing as well as on the machine doing the verifying. This means an attacker could obtain it. Salting may be better than nothing at all, and it may be good enough if it stays obscure, but once you've been rooted you really have no way of knowing what your attackers may have learned. You can assume that they took a copy of your table OUR_CUSTOMERS_SECRET_HASHED_VISA_CARD_NUMBERS. You hope that they didn't take a copy of the GenerateSecretVisaHash program.
Don't get me wrong: salting the account numbers would be better than simply hashing them straight up. And hashing them is far wiser than saving the numbers in the clear. Any custom secret obscuration is some defense against script-kiddies. But if your server is compromised it becomes impossible to tell if it was just a kid who got lucky with a NIMDA or lion worm, or a funded attacker looking for some specific dirt.
Ultimately, you are legally responsible for the safety of that data. I think that if you are going to keep it, hashing it is a good idea. Perhaps a judge might agree with you. I bet Visa would disagree. Perhaps the best solution involves a different approach, rather than storing the account number. That I don't know and can't answer for you.
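
John's point about salt can be illustrated with a small sketch. Everything specific here is an assumption made for the example: a single site-wide salt prepended to the card number, the placeholder value of `SALT`, the helper names `salted_hash` and `recover`, and the toy candidate list, which uses a fixed final digit instead of a real check digit. The sketch only shows that the salt protects the hashes for as long as it stays secret.

```python
import hashlib

# Assumption: one site-wide secret salt, stored on the hashing machine.
SALT = b"hypothetical-site-secret"


def salted_hash(card: str, salt: bytes) -> str:
    """MD5 of the salt followed by the card number, as a hex digest."""
    return hashlib.md5(salt + card.encode("ascii")).hexdigest()


def recover(stored: str, candidates, salt: bytes):
    """Return the candidate whose salted hash equals `stored`, or None if none match."""
    for card in candidates:
        if salted_hash(card, salt) == stored:
            return card
    return None


# Toy candidate space: 1000 numbers under one bank prefix (final digit fixed at 0).
candidates = ["4012345" + str(n).zfill(8) + "0" for n in range(1000)]
stored = salted_hash(candidates[123], SALT)

# Without the salt, enumerating the candidates finds nothing.
print(recover(stored, candidates, b""))

# With a stolen copy of the salt, the same enumeration succeeds.
print(recover(stored, candidates, SALT))
```

Once the attacker holds the salt, the search costs the same as it did against the unsalted hashes.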

And I particularly like John's conclusion:

What it all boils down to is: is the probability of your server getting hacked higher than the probability of a meteor striking it? :-)

Feedback to <mike@miketaylor.org.uk> is welcome!