Hash Collision

In a previous article on hashes and how they are used to store passwords I mentioned hash collisions.  A hash collision is where two different inputs can create an identical hash.  This was first done by researchers in China (Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD) and one thing that you can be sure of in cryptography is that as soon as one person or group find a weakness, more weaknesses follow.

What do hash collisions mean?

The main issue with hash collisions is that they could, in theory, compromise the "one-way" nature of hashes like MD5 (when I say "one way" I means that it is not possible to recover the input that generated the hash in anything resembling a reasonable amount of time).  If I take the string "password" and hash that using MD5 I get the hash 5f4dcc3b5aa765d61d8327deb882cf99.  The last thing I want is for another string to have the same hash.  Also, while there's no evidence that hash collisions make it possible to recover passwords from their hashes (or in more technical terms that preimage resistance or second preimage resistance are at risk) this kind of research attracts more research.  Calm down, nothing is insecure and no one is going to be breaking digital certificates any time soon.

The paranoid

However, there's always going to be the paranoid that worry that two inputs could have the same hash output and what to do something about this (if only to protect their systems from being used as experimental test beds for testing theoretical attacks.

There is a simple solution to this problem - store two hashes for the input.  Use two algorithms, such as SHA-1 and MD5, and store the hashes for both.  Then, when a password challenge is issued, hash it using both and test.  Two inputs that produce a collision in MD5 are statistically highly unlikely to produce a collision in a SHA-1 hash too.

Password Entered
Password Hashes Stored

Since both algorithms (for MD5 and SHA-1) are relatively fast, this doesn't add much overhead to authentication systems.


Is there any reason to go to these lengths?  Maybe, maybe not.  If you are developing a new system then you've not much to lose by doing this.  Your password database will more than double in size (from approximately 32 bytes per 88-character password to 72 bytes) by doing this which may or may not be a concern.  

In an existing system then taking this approach could be a problem.  How do you get the new hash to complement the existing one?  One approach might be to add the new hash after authenticating a user, but this adds overhead and possible problems.


The bottom line to all this is that these one way hash algorithms are getting long in the tooth and it's time to look for newer, modern algorithms to replace them before they become a liability.

Update: See MD5 Collisions - Implications.

Adrian Kingsley-Hughes
Last updated: Feb 20th 2005
Print This Page   |   Email me when this page changes    |  Search This Site System Scanner does the work for you!"); ?>

Contact Us