We are all working remotely, sharing multiple files and logging in to the various tools we use from many different sources. Such activities are often unavoidable and are necessary for businesses to function smoothly. Let us first talk about credentials. In many organizations, usernames and passwords are stored in a table in a database. When someone attempts to log in, the system looks up the username and compares the password entered by the user with the one present in the table to check for a match. The most basic password storage format is cleartext, where “readable data is stored in the clear”. Security at this level is practically non-existent: it is like writing down the credentials on a piece of digital paper, and anyone who gains access to the table can read every password. So, you can imagine how vulnerable the system is! The solution?
Enter the hashing algorithm, a fundamental part of cryptography, which refers to “chopping data into smaller, mixed up pieces which makes it difficult for the end user to go back to the original text/state”. A hash function is an algorithm that generates a fixed-length result, or hash value, from input data of any size. It is different from encryption, which converts plain text to encrypted text and, with the help of decryption, converts the encrypted text back to the original plain text. A hashing algorithm converts plain text into hashed text through a cryptographic hash function and provides no way to go back to the original text, which makes it difficult for hackers to make sense of it. (A hash length of 160 to 512 bits is good.)
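To make the fixed-length property concrete, here is a minimal sketch using Python's standard hashlib module. The choice of SHA-256 and the sample inputs are assumptions made for illustration; they do not come from the article.

```python
import hashlib

# Inputs of very different sizes (illustrative values only).
inputs = [b"", b"a", b"hashing " * 1000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # SHA-256 always yields 256 bits, i.e. 64 hexadecimal characters.
    print(len(data), "bytes in ->", len(digest), "hex characters out")

digest_lengths = [len(hashlib.sha256(d).hexdigest()) for d in inputs]
```

Whether the input is empty or thousands of bytes long, the digest has the same length.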
So, to keep passwords secure, they are hashed and stored in pairs along with usernames in the database table. When one logs in, the password typed is hashed and compared with the hashed entry from the database table. If there is a match, voila! The user is allowed to continue.
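The login flow described above can be sketched roughly as follows. The `users` table, the sample username and password, and the helper names are all hypothetical, and plain SHA-256 is used only to keep the illustration short; it is not a recommendation for production password storage.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Hash a password with SHA-256 (illustration only, no salt)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical credentials table: username -> stored password hash.
users = {"alice": hash_password("correct horse battery staple")}

def login(username: str, password: str) -> bool:
    """Hash the typed password and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(stored, hash_password(password))

print(login("alice", "correct horse battery staple"))  # matching password
print(login("alice", "wrong password"))                # non-matching password
```

Note that the table never holds the password itself, only its hash.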
Hashes can be used for password storage, integrity checks, digital signatures and message authentication codes. They also come in handy for fingerprinting, file transfers, checksums etc.
What is an ideal cryptographic hash function?
There are some key aspects that make a hash function ideal for usage.
Hash functions behave as one-way functions
It should be computationally infeasible to go back to the original text once it has been subjected to a hashing algorithm. So, if you are given a specific result, an ideal hash function ensures that you cannot work out the initial input which led to that result. For example, 6 divided by 2 gives you a result of 3. But so does 9 divided by 3. There is no way to determine the initial two numbers from just the result ‘3’.
Hash functions make use of the avalanche effect very well
A particular input produces a particular output, but even a very minor change in the input, such as a single character, leads to a drastically different output.
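A rough way to see the avalanche effect is to count how many output bits change when the input changes by one character. The sketch below assumes SHA-256 and two made-up sample strings.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# The two inputs differ only in their last character.
changed = bit_difference(b"hello world", b"hello worle")
print(f"{changed} of 256 output bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to flip.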
Hash functions should be fast to compute
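For any given input data, computing the hash should take very little time, even for large inputs, if the hash function is well designed. (Password storage is the exception: there, deliberately slow hashing schemes are preferred because they slow down brute force attacks.)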
Hash function outputs shouldn’t have any collision
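Two different inputs should never produce the same output. Since the output has a fixed length, collisions cannot be ruled out entirely, but with a long enough output they should be practically impossible to find (look at the length of a hash function output and you will get what we are saying).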
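The link between output length and collisions can be illustrated by deliberately shortening a hash. The sketch below truncates SHA-256 to 16 bits, which is an artificial setup chosen only for this demonstration; the helper names are hypothetical.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Keep only the first n_bytes of the SHA-256 digest (weak on purpose)."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash "0", "1", "2", ... until two inputs share a truncated hash."""
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()
        h = truncated_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
        i += 1

first, second, tries = find_collision()
print(f"{first!r} and {second!r} collide after {tries} inputs")
```

With a 16-bit output there are only 65,536 possible values, so a repeat is guaranteed within 65,537 inputs and usually shows up after a few hundred. At the full 256 bits, the same search is far beyond practical reach.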
Hash functions are deterministic
The output for a given input has to be the same no matter when or how many times it is computed. This especially comes in handy when multiple people need to be verified at different points in time.
Hashing algorithm in action – How does it work?
We spoke about passwords and credentials at the start of the article. Now let us talk about file transfer. Suppose one person (let us call him X) wants to transfer a file to another (let us call her Y). Without a hashing algorithm in place, the only way X can confirm the contents or the recipient would be to check in person with Y. But that would be cumbersome and, in a way, pointless in a busy, fast-moving yet highly insecure world. And if the message is long, the files are heavy or the mail carries attachments of multiple types, formats and numbers, this process could go on for days on end.
But with a hashing algorithm, X can generate a checksum (a small-sized block of digital data derived from another block to detect errors during transmissions) for the specific file and send it along. Once Y receives the file and the checksum, she can run the same hashing algorithm on the received file and compare the two checksums. If they match, Y knows that the file she received is the same as the one X sent.
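A minimal sketch of this checksum exchange is shown below. SHA-256, the file name and the file contents are assumptions made for illustration; the article does not prescribe a particular algorithm here.

```python
import hashlib
import os
import tempfile

def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()

def verify_file(path: str, expected_checksum: str) -> bool:
    """Recompute the checksum on the receiving side and compare."""
    return file_checksum(path) == expected_checksum

# X creates a file and computes its checksum; Y recomputes and compares.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as f:
        f.write(b"quarterly figures")
    sent_checksum = file_checksum(path)
    intact = verify_file(path, sent_checksum)

    with open(path, "ab") as f:
        f.write(b"!")  # simulate a change in transit
    still_matches = verify_file(path, sent_checksum)

print(intact, still_matches)
```

Reading in chunks keeps memory use flat even for heavy files. A plain checksum only shows that the contents are unchanged; it does not by itself prove who sent the file.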
Types of Hashing Algorithms
MD5 (MD stands for Message Digest)
One of the most commonly used, yet among the most insecure, algorithms. When a common password is hashed using this method, it is often possible to simply Google the hash value to get the original value. So, this is best avoided and, in fact, considered unsuitable for further use.
Input: An example of MD5
Output (Checksum containing 32 digits hexadecimal number like the following): 6c30eeb06ce8eb66b7a65191272b9743
SHA (Secure Hash Algorithm) family of algorithms
SHA-0, introduced in 1993, has been compromised myriad times. SHA-1, a slightly improved version which has been used for Secure Sockets Layer (SSL) security, has also been subjected to many attacks. SHA-2, which offers longer outputs of 224 to 512 bits, is now recommended. SHA-3, the newest member of the family, can be used by companies who are very serious about security.
Input: An example of SHA-1
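As a quick reference, the output sizes of several SHA variants can be read directly from Python's hashlib module. This is a small sketch that assumes a Python build where these algorithm names are available, which is the case for standard installations.

```python
import hashlib

# Digest sizes in bits for several members of the SHA family.
sha_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256")
}

for name, bits in sha_bits.items():
    print(f"{name}: {bits}-bit digest")
```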
Whirlpool
It is a 512-bit hash function whose design is derived from the Advanced Encryption Standard (AES).
Input: An example of Whirlpool
RIPEMD family of algorithms:
It stands for RACE Integrity Primitives Evaluation Message Digest and was developed in the mid-1990s. There are multiple versions like RIPEMD-160, RIPEMD-256 and RIPEMD-320. Note that the longer outputs of RIPEMD-256 and RIPEMD-320 mainly reduce the chance of accidental collisions; they are extensions of RIPEMD-128 and RIPEMD-160 respectively and are not designed to offer a higher security level than those versions.
Input: An example of RIPEMD-160
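CRC32
CRC stands for cyclic redundancy check (also known as cyclic redundancy code) and is commonly known for its spreading properties. It is an error-detecting code rather than a cryptographic hash, so it catches accidental changes but not deliberate tampering. It is also a lot quicker to compute, leading to smooth file transfers and validations.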
Input: An example of CRC32
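CRC32 is available in Python's standard zlib module. The sketch below uses the conventional test string "123456789", whose CRC-32 value is the well-known check value cbf43926; the helper name is hypothetical.

```python
import zlib

def crc32_hex(data: bytes) -> str:
    """Return the CRC-32 of data as 8 hexadecimal digits."""
    # zlib.crc32 returns an unsigned 32-bit integer.
    return format(zlib.crc32(data), "08x")

check = crc32_hex(b"123456789")
print(check)  # cbf43926

# A one-character change in the input is detected as a different checksum.
altered = crc32_hex(b"123456780")
print(check != altered)
```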
Hashing Algorithm’s Security Limitations
Hashing algorithms are secure but are not immune to attackers. An attacker does not need to recover the original password as such; any input that produces the stored hash value can be used for authentication. Multiple login attempts through brute force attacks can also be tried out until a match is found.
Since one exact input produces one exact output every single time, a commonly used password like ‘123456’ is easy for a hacker to hash in advance and use to gain unauthorised entry. Also, if multiple users share the same password, their stored hashes are identical, so cracking one cracks them all and the hacker will be smiling all the way.
Another common method is the rainbow table attack, where a hacker uses a large database of precomputed hash chains to crack passwords. Let us take the most commonly used password in the world, 123456, and the MD5 hashing function. A rainbow table attack works as follows:
- Pass the password (123456) through an MD5 hashing function to get: e10adc3949ba59abbe56e057f20f883e
- Pass only the first few characters of the hashed value above (e10adc) to further get another re-hashed value: 96bf38d01b84aa16cf2bb9f55c61ac85
- Repeat the above procedure until enough hashes are obtained in the form of a chain, starting from the initial plain text to the final hashed text
- Store all of them in a table
- Keep going through the list one at a time until a match is found
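The steps above can be sketched in a few lines. This is a deliberately simplified illustration that follows the article's description; real rainbow tables use dedicated reduction functions and store only the ends of each chain. The function names, chain length and prefix length are assumptions.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def build_chain(start: str, length: int, prefix_len: int = 6):
    """Return [(plaintext, hash), ...] by repeatedly re-hashing a hash prefix."""
    chain = []
    text = start
    for _ in range(length):
        digest = md5_hex(text)
        chain.append((text, digest))
        text = digest[:prefix_len]  # reuse the first few characters as the next input
    return chain

chain = build_chain("123456", length=4)

# Store every (hash -> plaintext) pair in a lookup table.
table = {digest: text for text, digest in chain}

def lookup(target_hash: str):
    """Return the plaintext for a known hash, or None if it is not in the table."""
    return table.get(target_hash)

print(lookup("e10adc3949ba59abbe56e057f20f883e"))  # 123456
```

Because the table is computed once and reused, every later lookup against an unsalted hash is nearly free for the attacker.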
To counter such attempts, a technique called salting is used to make it more difficult to crack the password. Here, random data (the salt) is added to the input of the hash function, so that the same password produces a different output for every user. The salt is stored alongside the hash so that the check can be repeated at login. Rainbow tables mainly work on unsalted hash values, so this adds a further layer of security.
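Here is a minimal sketch of salting, assuming PBKDF2 with SHA-256 from Python's standard library as the hashing scheme. The iteration count, salt size and function names are illustrative assumptions, not values taken from the article.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the typed password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users choose the same weak password but end up with different hashes.
salt_a, hash_a = hash_password("123456")
salt_b, hash_b = hash_password("123456")
print(hash_a != hash_b)
print(verify_password("123456", salt_a, hash_a))
```

Since each user has a different salt, a precomputed table built for unsalted hashes no longer matches any stored value.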
Runtime Application Self-Protection (RASP), which detects attacks on an application in real time, is another good practice to consider. With limited human intervention and a smart analysis of the contextual behaviour of applications, it offers better security. When any suspicious activity is detected, RASP can terminate the session or send the relevant alerts to the users for further action. It also has an advantage over firewalls, which just look at the perimeter of an application and don’t have much of an idea about what is going on inside it.
Newer versions of hashing algorithms keep being introduced with added layers of security. SHA-2 does seem to be a good option out there, but it is always better to stay up to date with the latest in hashing algorithm technology.
But when it comes to business continuity, where credential verification and file/message transfers are an ongoing activity across the world, a hashing algorithm definitely does the job well.
Frequently Asked Questions
1. Is AES a hashing algorithm?
AES is not a hashing algorithm. AES is an encryption standard used to protect electronic data. It is a block cipher that encrypts data in blocks of 128 bits each. It is a symmetric algorithm that uses the same key for encryption and decryption.
2. Is RSA a hashing algorithm?
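RSA is not a hashing algorithm. It is an asymmetric encryption algorithm that uses two different keys, a public key and a private key, to encrypt and decrypt data. RSA keys are commonly 2048 bits or longer; 1024-bit keys are no longer considered secure.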
3. What is MD5 and SHA256?
MD5 (Message-Digest algorithm 5) is a hashing algorithm that generates a 128-bit digest. It can be used to check that a file remains unaltered by producing and comparing checksums for both copies of the data. However, it is no longer considered a secure hashing method. SHA256 stands for Secure Hash Algorithm 256-bit, an irreversible hash function from the SHA-2 family that generates a 256-bit digest. It is far more resistant to attacks than MD5; for password hashing it is best combined with a salt and a deliberately slow scheme, since a single fast hash is easier to brute force.
4. What are the 3 types of the hash collision algorithms?
CRC-32, MD5, and SHA-1 are three hash algorithms with varying levels of collision risk. CRC-32, with its 32-bit output, poses the highest risk of collision, whereas SHA-1, with its 160-bit output, presents the lowest risk of the three.
5. Which hash algorithm is fastest?
Among the commonly compared algorithms, SHA-1 is reported as the fastest, at ~587.9 ms per 1M operations for short strings and 881.7 ms per 1M for longer strings. Actual figures vary with hardware and implementation.
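Since such figures depend on the machine, it can be worth measuring on your own system. The sketch below is one way to do it with Python's timeit module; the algorithm list, sample string and repetition count are assumptions, and the results will differ from the numbers quoted above.

```python
import hashlib
import timeit

def time_hash(name: str, data: bytes, number: int = 20_000) -> float:
    """Seconds taken to hash data `number` times with the named algorithm."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(), number=number)

timings = {
    name: time_hash(name, b"short string")
    for name in ("md5", "sha1", "sha256", "sha512")
}

# Print the algorithms from fastest to slowest on this machine.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s for 20,000 hashes")
```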
6. How many types of hashing are there?
MD5, SHA-2 and CRC32 are the three important types of hashing used for file integrity checks. MD5 encodes information into 128-bit fingerprints and is used as a checksum. SHA-2 has hash functions with values of 224, 256, 384 or 512 bits. CRC is used to identify accidental changes to data. Common areas of application for CRC32 include ZIP files and FTP servers.