IP/Network | ShareTechnote

Hashing

I don't know exactly where the term 'Hash/Hashing' came from. I looked up several definitions on Google, but I couldn't find any connection between the meaning of the word in everyday language and its meaning in the context of data security.

Hashing in Data Security is a special technique to make a number (Hash Output) from another number (Hash Input). Usually the Hash output has fewer digits than the Hash input, but this is not always the case.

Now the question is: why do we need this kind of method? What is the main usage of a Hashing algorithm?

The main usage of a Hashing algorithm is to make a special number (the output of the Hashing algorithm) that has 'some correlation' with each and every bit of the original data (the input of the Hashing algorithm). This 'some correlation' should be designed in a very special way, so that if even a single bit in the original value (Hash input) changes, the hash value (output of the algorithm) becomes completely different.

Characteristics of Hashing Algorithm

Some important aspects of Hashing can be illustrated as follows.

As mentioned above, a Hashing algorithm takes an input value (or string) and produces a random-looking output. Even though the input and output are tightly related, it is extremely difficult (or extremely time-consuming, even with a high-performance computer) to figure out the input value from the output value.

The following illustration shows how a very small difference in the input values produces a completely different output. In this example, there is only a one-letter difference between the two inputs, but the outputs look completely different.
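The same effect can be reproduced with a few lines of Python. This is only a minimal sketch: SHA-256 from the standard hashlib module is picked as the example algorithm, and the two sample strings and the helper names (sha256_hex, differing_bits) are made up for this illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# Two sample inputs (illustrative only) that differ in a single letter.
digest_1 = sha256_hex("hello world")
digest_2 = sha256_hex("hello worle")

print(digest_1)
print(digest_2)
print("bits that differ:", differing_bits(digest_1, digest_2), "of 256")
```

For a well-designed hash, roughly half of the output bits are expected to flip when one input letter changes, which is why the two printed values have no visible resemblance.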

The following illustration shows that the length of the output remains the same regardless of the length of the input.
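A quick way to check this yourself is sketched below. It assumes SHA-256 (via Python's hashlib) as the example algorithm, and the sample inputs are arbitrary.

```python
import hashlib

# Inputs of very different lengths: empty, 1 byte, 7 bytes, 1,000,000 bytes.
inputs = [b"", b"a", b"hashing", b"x" * 1_000_000]

digest_lengths = []
for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    digest_lengths.append(len(digest))
    print(f"input length {len(data):>7} bytes -> digest length {len(digest)} hex characters")
```

Every line reports 64 hex characters, because a SHA-256 digest is always 256 bits long. Other algorithms have a different, but equally fixed, output length.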

Application of Hashing

Example 1 > Data Integrity Check

OK.. I see what it is. Now let's assume I have a special algorithm to do this.. and so what ? What can I use it for ?

One of the most common applications is to use this algorithm for a Data Integrity check.

Let's suppose you generate a Hash value (the output of a Hashing algorithm) from a chunk of data (hash input) and send both the Hash value and the original data (hash input) to another person.

Now the person who received the data is wondering whether the data was corrupted or modified (compromised), either by accident or by some bad person. How can the receiver verify such a thing? (This verification process is called an 'Integrity Check'.)

It is simple. The receiver just calculates the Hash value from the received data and compares the result with the Hash value that was sent along with it. If they are different, you can say the data was corrupted or compromised during the delivery process. Note that against a deliberate attacker this works only if the Hash value itself reaches the receiver in a way the attacker cannot modify, since anyone can recompute the Hash of altered data.
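The check described here can be sketched in Python as follows. The function names (make_hash, verify) and the sample messages are hypothetical and only for illustration, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib

def make_hash(data: bytes) -> str:
    """Sender side: compute the hash value that travels with the data."""
    return hashlib.sha256(data).hexdigest()

def verify(received_data: bytes, received_hash: str) -> bool:
    """Receiver side: recompute the hash and compare it with the received one."""
    return make_hash(received_data) == received_hash

# Sender prepares the data and its hash value.
original = b"Transfer 100 dollars to Alice"
sent_hash = make_hash(original)

# Case 1: the data arrives unchanged.
print(verify(original, sent_hash))   # True

# Case 2: a single character was changed on the way.
tampered = b"Transfer 900 dollars to Alice"
print(verify(tampered, sent_hash))   # False
```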

Typical Hashing Algorithms

There are many different Hashing algorithms, but some of the most widely used ones are

MD5, MD4, MD2 ...
SHA-1, SHA-2 (SHA-256, SHA-384, SHA-512) ...

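People say the MD algorithms are broken now (practical collision attacks are known for MD4 and MD5), but some applications still use them. Because of this, the SHA algorithms are now more widely used. Note that SHA-1 has also been shown to be vulnerable to collision attacks, so the SHA-2 family is generally preferred today.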
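To get a feeling for how these algorithms differ, the sketch below hashes the same input with several of them using Python's hashlib and prints the output size of each. The input string is arbitrary, and it is assumed that your Python build provides all of the listed algorithms (some restricted builds disable MD5).

```python
import hashlib

sample = b"ShareTechnote"  # arbitrary sample input

digest_bits = {}
for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, sample)
    digest_bits[name] = h.digest_size * 8  # digest_size is in bytes
    print(f"{name:<7} {digest_bits[name]:>3} bits  {h.hexdigest()}")
```

The output sizes are 128 bits for MD5, 160 bits for SHA-1, and 256, 384 and 512 bits for the three SHA-2 variants.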

Hashing vs CRC

You may think the Hashing concept is very similar to the Checksum concept, and ask what the difference between a Checksum and Hashing is. I think the fundamental principle is almost the same, but the main difference is in the application. Because of this difference in application, there are some differences in complexity and in the detailed algorithm.

A Checksum (e.g., CRC, IP Checksum) is mainly used to detect unintentional errors generated during data transmission, and it usually uses a simpler algorithm and a shorter value than Hashing. So with a checksum, it is easy to find many different sets of data that produce the same checksum value. Hashing is mainly used to detect modifications or changes made by someone in the middle, usually for a bad purpose. Because the Hash output has a fixed length, different data with the same Hash value must exist in theory, but it should be practically impossible to find them. If someone finds a practical way to produce multiple data that generate the same Hash value, you can say "the Hash algorithm is broken or vulnerable".
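The difference can be made concrete with a small experiment. The sketch below, which is an illustration and not a formal attack, searches random 8-byte messages for two that share the same CRC-32 value, using zlib.crc32 from the Python standard library. The function name find_crc32_collision, the seed and the message size are assumptions made for this example. Because a CRC-32 value is only 32 bits long, a match typically turns up after several tens of thousands of tries, while the SHA-256 values of the same two messages still differ.

```python
import hashlib
import random
import zlib

def find_crc32_collision(seed: int = 0, max_tries: int = 2_000_000):
    """Try random 8-byte messages until two different ones share a CRC-32."""
    rng = random.Random(seed)
    seen = {}  # CRC-32 value -> first message that produced it
    for tries in range(1, max_tries + 1):
        message = rng.getrandbits(64).to_bytes(8, "big")
        crc = zlib.crc32(message)
        previous = seen.get(crc)
        if previous is not None and previous != message:
            return previous, message, tries
        seen[crc] = message
    return None  # no collision found within max_tries

result = find_crc32_collision()
if result is None:
    print("no CRC-32 collision found within the try limit")
else:
    first, second, tries = result
    print("tries needed :", tries)
    print("message 1    :", first.hex(), "CRC-32 =", hex(zlib.crc32(first)))
    print("message 2    :", second.hex(), "CRC-32 =", hex(zlib.crc32(second)))
    print("SHA-256 of 1 :", hashlib.sha256(first).hexdigest())
    print("SHA-256 of 2 :", hashlib.sha256(second).hexdigest())
```

Random messages are used on purpose: CRC-32 is designed to catch short burst errors, so messages that differ only within a few neighboring bytes (a simple counter, for example) never collide.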
