Hashing is a long-used technique for storing and validating information. A hash function applies a deterministic transformation to input data (a string, for example) and produces a fixed-size, repeatable fingerprint of that data, often referred to as a checksum or digest.
| String | Hash |
| --- | --- |
| Hello World | b10a8db164e0754105b7a99be72e3fe5 |
| Hello Worlds | 9f0d0cd67d27414d0ae8f5eca41a1a36 |
As you can see, even a minor change to the string results in a completely different hash. This is why hashes are often used to verify that data has not been modified. Another great feature of hashes is that they are just a fingerprint of the data. There is no way to reconstitute the original data from a hash*.
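The first value in the table matches the MD5 digest of the UTF-8 bytes of `Hello World`, so the sketch below assumes MD5 and UTF-8 to show the effect. The class and method names (`FingerprintDemo`, `Md5Hex`) are made up for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FingerprintDemo
{
    // Returns the MD5 digest of the UTF-8 bytes of `text` as lowercase hex.
    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            // BitConverter yields "B1-0A-..."; strip the dashes and lowercase it.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Md5Hex("Hello World"));  // b10a8db164e0754105b7a99be72e3fe5
        Console.WriteLine(Md5Hex("Hello Worlds")); // a different 32-character digest
    }
}
```

Hashing the same input again always gives the same output, which is what makes the fingerprint usable for comparison.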
Hashing for Verifying the Authenticity of Data
The author generates a SHA-512 hash of the document and places the document on the Internet. The client then downloads the document and runs it through the same algorithm. If the hash the client generates matches the author's, then the document has not been modified. If the hashes don't match, then the document may have been tampered with and should not be trusted.
Storing the hash right next to the document really defeats the purpose. If an attacker gained access to the document, that attacker could then modify the document, generate a new hash for the modified document, and overwrite the original hash. From the client's point of view there is a document and a hash that match, so verification passes. This is why it is really important to store the hash separately from the data you are trying to authenticate. Optimally, the hash should be stored with a third party or be given out upon request directly by the author.
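The client-side check described in this section could look roughly like the sketch below. It assumes the author's hash is published as a hex string and obtained from somewhere other than the download location. `DocumentVerifier` and its methods are hypothetical names, not part of any library.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class DocumentVerifier
{
    // Computes the SHA-512 digest of a file and returns it as lowercase hex.
    public static string Sha512HexOfFile(string path)
    {
        using (SHA512 sha = SHA512.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] digest = sha.ComputeHash(stream);
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    // True when the file's digest equals the hash published by the author.
    public static bool MatchesPublishedHash(string path, string publishedHex)
    {
        string actual = Sha512HexOfFile(path);
        return string.Equals(actual, publishedHex.Trim(), StringComparison.OrdinalIgnoreCase);
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: verify <file> <published-hash>");
            return;
        }
        Console.WriteLine(MatchesPublishedHash(args[0], args[1])
            ? "Hash matches"
            : "Hash mismatch: do not trust this file");
    }
}
```

Hashing from a stream avoids loading the whole document into memory, which matters for large downloads.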
Hashing for Passwords
The most common way operating systems store passwords is in a hashed format. I actually don't know of a modern operating system that does not do this. The premise is quite simple. When your account is created, your password is hashed and the hash is stored on disk. When you log in, you type in your password, which then gets hashed. The operating system compares the hash it just generated (based on what you entered at the login screen) with the hash it has on disk. If the two hashes match, then you entered the same password your account was created with, and so you are successfully logged in. If properly implemented, this is a great way to authenticate users without ever having to store their passwords (I think properly implementing this is a post in itself).
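Here is a bare-bones sketch of that flow, for illustration only. It is unsalted on purpose so that it mirrors the description above and nothing more, and it is not a proper password system. `LoginSketch`, `HashPassword`, and `Verify` are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class LoginSketch
{
    // Hashes a password with SHA-512. Assumption: UTF-8 encoding, no salt.
    public static byte[] HashPassword(string password)
    {
        using (SHA512 sha = SHA512.Create())
        {
            return sha.ComputeHash(Encoding.UTF8.GetBytes(password));
        }
    }

    // Hashes the login attempt and compares it with the stored hash.
    public static bool Verify(string attempt, byte[] storedHash)
    {
        byte[] attemptHash = HashPassword(attempt);
        if (attemptHash.Length != storedHash.Length) return false;
        int diff = 0;
        for (int i = 0; i < attemptHash.Length; i++)
        {
            // Accumulate differences instead of returning at the first mismatch.
            diff |= attemptHash[i] ^ storedHash[i];
        }
        return diff == 0;
    }

    public static void Main()
    {
        byte[] stored = HashPassword("correct horse");      // done once, at account creation
        Console.WriteLine(Verify("correct horse", stored)); // True
        Console.WriteLine(Verify("wrong guess", stored));   // False
    }
}
```

Note that only `stored` is ever kept; the password itself is discarded as soon as it has been hashed.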
Algorithms
In the .NET world there are five hashing functions available to us:
| Acronym | Name | Output Size (bits) | Recommended |
| --- | --- | --- | --- |
| MD5 | Message-Digest algorithm 5 | 128 | No (collision can be created in less than one minute on a standard computer) |
| SHA1 | Secure Hash Algorithm | 160 | No (collisions are possible to create but much harder than MD5) |
| SHA256 | Secure Hash Algorithm | 256 | Yes |
| SHA384 | Secure Hash Algorithm | 384 | Yes |
| SHA512 | Secure Hash Algorithm | 512 | Yes |
SHA256, SHA384, and SHA512 are often referred to collectively as SHA2 and are currently the preferred choice. The main reason is that collisions have been created for the MD5, SHA0, and SHA1 algorithms, while the SHA2 algorithms, although relatively new, have not been shown to have collisions (yet).
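To double-check the output sizes in the table, you can hash an empty input with each algorithm and measure the digest. This sketch uses the static `Create()` factory methods; `DigestSizes` and `BitsOf` are names invented for the example.

```csharp
using System;
using System.Security.Cryptography;

public static class DigestSizes
{
    // Hashes an empty input and returns the digest length in bits.
    public static int BitsOf(HashAlgorithm algorithm)
    {
        using (algorithm)
        {
            return algorithm.ComputeHash(new byte[0]).Length * 8;
        }
    }

    public static void Main()
    {
        Console.WriteLine("MD5:    " + BitsOf(MD5.Create()));    // 128
        Console.WriteLine("SHA1:   " + BitsOf(SHA1.Create()));   // 160
        Console.WriteLine("SHA256: " + BitsOf(SHA256.Create())); // 256
        Console.WriteLine("SHA384: " + BitsOf(SHA384.Create())); // 384
        Console.WriteLine("SHA512: " + BitsOf(SHA512.Create())); // 512
    }
}
```

The digest length is fixed per algorithm, no matter how large or small the input is.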
Collisions
A hash collision occurs when two different values produce the same hash. In theory this means that two totally different documents could yield the same hash. Worse yet, two separate passwords could yield the same hash. This would reduce the time taken to brute force a system, because any password that collides with a hashed password in the system would be accepted, reducing the number of guesses required to force our way in. The use of salting is a great way to help protect against this (more on this later), but I prefer to use a stronger algorithm (and salting) when designing a system.
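Collisions are easiest to see with a deliberately tiny hash. The toy below keeps only the first 16 bits of a SHA-256 digest, so there are just 65,536 possible values and two different inputs are bound to share one quickly. `TinyHash` and `FindCollision` are made-up helpers, and this says nothing about the strength of full-length SHA-256.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class CollisionDemo
{
    // A deliberately weak 16-bit "hash": the first two bytes of SHA-256.
    public static int TinyHash(string text)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(text));
            return (digest[0] << 8) | digest[1];
        }
    }

    // Tries "doc-0", "doc-1", ... until two inputs share a TinyHash value.
    // With only 65,536 possible values this must stop within 65,537 tries.
    public static Tuple<string, string> FindCollision()
    {
        var seen = new Dictionary<int, string>();
        for (int i = 0; ; i++)
        {
            string candidate = "doc-" + i;
            int h = TinyHash(candidate);
            string earlier;
            if (seen.TryGetValue(h, out earlier))
            {
                return Tuple.Create(earlier, candidate);
            }
            seen[h] = candidate;
        }
    }

    public static void Main()
    {
        Tuple<string, string> pair = FindCollision();
        Console.WriteLine(pair.Item1 + " and " + pair.Item2 + " share hash " + TinyHash(pair.Item1));
    }
}
```

Every extra bit of output doubles the number of possible values, which is why longer digests make this kind of search impractical.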
Code
The code is actually quite simple. All you have to do is convert your input data to a byte array, run it through one of the classes that derive from HashAlgorithm (for example MD5CryptoServiceProvider, SHA1Managed, SHA256Managed, SHA384Managed, or SHA512Managed), and call the ComputeHash method on it. Here is a quick sample using the SHA512Managed class:
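using System;
using System.Security.Cryptography;
using System.Text;

public string CreateHash(string input)
{
    // Encode the text as UTF-8 bytes. Convert.FromBase64String would throw on input that is not valid Base64.
    byte[] inputData = Encoding.UTF8.GetBytes(input);
    using (HashAlgorithm sha = new SHA512Managed())
    {
        byte[] hashedBytes = sha.ComputeHash(inputData);
        return Convert.ToBase64String(hashedBytes); // Base64 text form of the 64-byte digest
    }
}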
Those are the basics of hashing. Next up I will write some posts on properly implementing hashing and salting for password systems.
*Well, besides brute forcing the hash. This is done by throwing data at the hash function until it generates the same hash as the one you are trying to match. If the hashes match, then you probably (see collisions) have the original data.
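To make the footnote concrete, here is a sketch of a brute-force search over a very small input space: four-digit PINs. The PIN, the choice of SHA-256, and the names `BruteForceDemo`, `Sha256Base64`, and `CrackPin` are all assumptions made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class BruteForceDemo
{
    // Returns the SHA-256 digest of the UTF-8 bytes of `text` as Base64.
    public static string Sha256Base64(string text)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }

    // Tries every PIN from "0000" to "9999" until one hashes to the target.
    // Returns null when no candidate matches.
    public static string CrackPin(string targetHash)
    {
        for (int i = 0; i <= 9999; i++)
        {
            string guess = i.ToString("D4"); // zero-padded to four digits
            if (Sha256Base64(guess) == targetHash)
            {
                return guess;
            }
        }
        return null;
    }

    public static void Main()
    {
        string target = Sha256Base64("4821"); // the attacker only ever sees this hash
        Console.WriteLine(CrackPin(target));  // 4821
    }
}
```

With only 10,000 candidates the search finishes almost instantly, which is why the size of the input space matters as much as the hash algorithm.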