Understanding Hash Functions in Cryptography

In my previous article on blockchain technology, it was nearly impossible to avoid mentioning hash functions. In this article, I’ll dive deeper and dedicate the entire discussion to the critical role hash functions play in cryptography and security.

Overview of Hash Functions

Hash functions are fundamental to modern cryptography, providing a means to generate a compact, practically unique identifier, or "digest," for any input data. Despite their simplicity, hash functions play a critical role in ensuring data integrity, authenticity, and security in various applications, from file verification to secure communications.

What is a Hash Function?

A hash function is a cryptographic tool that accepts an input of any size (a file, a message, or even a single character) and produces a fixed-length output, usually displayed as a string of hexadecimal characters. This output is known as a hash or digest. A hash function is deterministic: it always produces the same output for the same input, which makes it a reliable means of verifying data integrity.
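The two behaviours described above, a fixed-length output and the same digest for the same input, can be checked with Python's standard hashlib module. This is a minimal sketch; the helper name sha256_hex is only an illustrative choice.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


one_byte = sha256_hex(b"a")
one_megabyte = sha256_hex(b"a" * 1_000_000)

# The digest length does not depend on the input size:
# SHA-256 always yields 256 bits, shown as 64 hex characters.
print(len(one_byte), len(one_megabyte))

# Hashing the same input again gives the same digest.
print(sha256_hex(b"a") == one_byte)

# A different input gives a different digest.
print(sha256_hex(b"b") == one_byte)
```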

Consider a scenario where you download a file from the internet. Alongside the download link, you might see a long string of characters labeled as a "SHA-256 checksum" or something similar. This checksum is the hash of the original file. By hashing the file you downloaded and comparing it to the provided checksum, you can verify that the file has not been tampered with during transmission.

Core Properties of Hash Functions

Cryptographic hash functions are designed to meet three essential security properties, which are crucial for their effectiveness in real-world applications:

Pre-image Resistance: This property ensures that it is practically impossible to reverse a hash to find the original input. In other words, given only a hash, one should not be able to find an input that produces it. This "one-way" characteristic is vital for protecting sensitive information.
Second Pre-image Resistance: Given a specific input and its corresponding hash, it should be infeasible to find a different input that produces the same hash. This property is crucial for maintaining the integrity of data. Because inputs can be arbitrarily long and the output is fixed-length, other inputs with the same hash do exist; the guarantee is that nobody can find one in practice.
Collision Resistance: This property ensures that it is extremely difficult to find any two different inputs that result in the same hash. It protects against attacks where an adversary prepares two different inputs with the same hash, for example a harmless document and a malicious one, and later substitutes one for the other.

With collision resistance, the attacker is not tied to a given input and is free to choose both inputs. It is therefore generally considered a stronger requirement, because it covers a broader range of potential attacks. If a hash function is collision-resistant, it is also second pre-image resistant, but the reverse is not necessarily true.
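To make the difference between the two properties tangible, here is a toy experiment. It assumes a deliberately weak 16-bit hash (the first two bytes of SHA-256), which is not something anyone should use in practice. For an n-bit hash, a generic collision search needs on the order of 2^(n/2) attempts (the birthday bound), while a generic second pre-image search needs on the order of 2^n. The function names are illustrative.

```python
import hashlib


def tiny_hash(data: bytes, nbytes: int = 2) -> bytes:
    """Toy hash: the first nbytes of SHA-256. Deliberately weak."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision():
    """Search for ANY two inputs with the same toy hash."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1


def find_second_preimage(target: bytes):
    """Search for a different input matching the hash of ONE given input."""
    want = tiny_hash(target)
    counter = 0
    while True:
        msg = b"x" + str(counter).encode()
        if msg != target and tiny_hash(msg) == want:
            return msg, counter + 1
        counter += 1


a, b, collision_tries = find_collision()
other, preimage_tries = find_second_preimage(b"hello")

# The collision search usually finishes after far fewer attempts
# than the second pre-image search.
print("collision:", a, b, "after", collision_tries, "tries")
print("second pre-image:", other, "after", preimage_tries, "tries")
```

With a real 256-bit output, both searches are far out of reach, but the gap between them remains, which is why collision resistance is the harder property to provide.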

Hash Functions in Practice

Hash functions are employed in various real-world scenarios to ensure the security and integrity of data:

File Verification: When you download software or other files from the internet, the provider often includes a hash (such as a SHA-256 checksum) to allow you to verify the file's integrity. By comparing the hash of the downloaded file to the provided hash, you can confirm that the file has not been altered.
Subresource Integrity (SRI): Web developers use SRI to ensure that externally loaded resources, like JavaScript files from content delivery networks (CDNs), have not been tampered with. By including a hash of the resource in the HTML code, the browser can verify that the file is authentic before executing it.
Peer-to-Peer Networks: In systems like BitTorrent, files are split into chunks, each of which is hashed. These hashes are used to verify the integrity of each chunk as it is downloaded from different peers, ensuring that the final reassembled file is accurate and untampered.
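As a sketch of the Subresource Integrity case, the integrity value placed in the HTML is the name of the hash algorithm, a dash, and the base64-encoded digest of the file. The script content and the CDN URL below are made-up placeholders, and sri_value is an illustrative helper name.

```python
import base64
import hashlib


def sri_value(content: bytes) -> str:
    """Build an SRI integrity value using SHA-384."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


# Placeholder script content; in practice this is the exact file served by the CDN.
script = b"console.log('hello');\n"
integrity = sri_value(script)

# Hypothetical URL, for illustration only.
print(
    '<script src="https://cdn.example.com/app.js" '
    f'integrity="{integrity}" crossorigin="anonymous"></script>'
)
```

If even one byte of the served file differs from the hashed content, the digest changes and the browser refuses to run the script.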

Standardized Hash Functions: SHA-2 and SHA-3

Two families of hash functions have become the gold standard in cryptography: SHA-2 and SHA-3.

SHA-2: Developed by the National Security Agency (NSA) and standardized by NIST, SHA-2 includes variants like SHA-224, SHA-256, SHA-384, and SHA-512, which produce outputs of 224, 256, 384, and 512 bits, respectively.

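SHA-256 is the most widely used variant, offering a balance of security and performance. Despite their widespread adoption, SHA-256 and SHA-512 are susceptible to length-extension attacks: given the hash of a message, an attacker can compute the hash of that message with extra data appended, without knowing the original content. This makes a naive construction, such as hashing a secret followed by a message, unsuitable for authentication.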
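One common way to use SHA-256 safely when a secret is involved is HMAC, a keyed construction that is not affected by length extension and is available in Python's standard hmac module. The sketch below is only an illustration; the key and messages are placeholder values.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA-256 authentication tag as hex."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"placeholder-secret-key"  # illustrative value only
message = b"amount=100&to=alice"

tag = make_tag(key, message)
print(verify_tag(key, message, tag))                   # matching message
print(verify_tag(key, message + b"&to=mallory", tag))  # extended message
```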

SHA-3: In response to vulnerabilities discovered in older hash functions like MD5 and SHA-1, NIST held a competition to develop a new standard. The result was SHA-3, based on the Keccak algorithm. Unlike SHA-2, SHA-3 uses a sponge construction, which is immune to length-extension attacks and is suitable for hashing secrets.

SHA-3 variants include SHA3-224, SHA3-256, SHA3-384, and SHA3-512, offering the same output lengths as their SHA-2 counterparts but built on a different internal design that does not have the length-extension weakness.
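Both families are available in Python's hashlib, so the matching output lengths are easy to confirm. This is a small sketch; the message and the helper name digest_bits are arbitrary choices.

```python
import hashlib

message = b"I Love Cryptography"

# Each SHA-2 variant paired with the SHA-3 variant of the same output length.
PAIRS = [
    ("sha224", "sha3_224"),
    ("sha256", "sha3_256"),
    ("sha384", "sha3_384"),
    ("sha512", "sha3_512"),
]


def digest_bits(name: str, data: bytes) -> int:
    """Return the output length in bits of the named hash function."""
    return hashlib.new(name, data).digest_size * 8


for sha2_name, sha3_name in PAIRS:
    sha2_hex = hashlib.new(sha2_name, message).hexdigest()
    sha3_hex = hashlib.new(sha3_name, message).hexdigest()
    # Same length, but unrelated digests: the algorithms are different.
    print(sha2_name, digest_bits(sha2_name, message), sha2_hex[:16] + "...")
    print(sha3_name, digest_bits(sha3_name, message), sha3_hex[:16] + "...")
```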

Extendable Output Functions (XOFs): SHAKE and cSHAKE

While traditional hash functions produce fixed-length outputs, certain applications require variable-length digests. This need gave rise to extendable output functions (XOFs), which allow users to specify the desired output length.

SHAKE: Part of the SHA-3 family, SHAKE provides an arbitrary-length output, making it versatile for generating digests, random numbers, and cryptographic keys. SHAKE is especially useful in situations where the flexibility of output length is required.

SHAKE128("I Love Cryptography", 256)
af0331e5bf7450ecdfb38dc10c097bb881b4fca0a044f7238f5ba09a5920395c

SHAKE128("I Love Cryptography", 512)
af0331e5bf7450ecdfb38dc10c097bb881b4fca0a044f7238f5ba09a5920395ca755ac266e7eed368d23d9e820a84f8dbcafda32ef81313df330853833a42616
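In Python, SHAKE128 is exposed as hashlib.shake_128. One detail to keep in mind is that hashlib takes the output length in bytes, while the notation above uses bits. The sketch below wraps that conversion in a hypothetical helper, shake128_hex, and shows that a shorter output is a prefix of a longer one for the same input.

```python
import hashlib


def shake128_hex(data: bytes, bits: int) -> str:
    """Return a SHAKE128 output of the requested length in bits, as hex."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    return hashlib.shake_128(data).hexdigest(bits // 8)


message = b"I Love Cryptography"
out_256 = shake128_hex(message, 256)
out_512 = shake128_hex(message, 512)

print(out_256)
print(out_512)

# Asking for more output extends the stream; it does not change the start.
print(out_512.startswith(out_256))
```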

cSHAKE: An extension of SHAKE, cSHAKE introduces a customization string that allows users to create unique instances of the XOF. This feature, known as domain separation, is useful in cryptographic protocols where different hash functions are needed to maintain security across various contexts.

cSHAKE128("I Love Cryptography", 256, "sameh abouelsaad")
dca100b503f99232e099bb24f17daa4b4a30f702b32b6aa72fa14ba1738a6269

cSHAKE128("I Love Cryptography", 256, "someone else")
f2c08a8e2c093b19058f03399a279d6909bc74e9a47865b15bfb6b833e01de2b

Hashing Passwords

Storing passwords securely requires more than hashing them with a standard hash function. These algorithms are designed to be computed quickly, so if the hashed values are compromised, an attacker can test guessed passwords at very high rates.

Specialized password-hashing algorithms like Argon2, bcrypt, and scrypt are designed to be slow and resistant to brute-force attacks, making them ideal for securely storing passwords.

These algorithms also use salts (random, non-secret values combined with each password before hashing) to prevent attackers from using precomputed tables (rainbow tables) to crack passwords.

Let's look at this in more detail:

When you create an account on a website, instead of storing your password in plain text, the website stores a hashed version of it. For example, the hash of "password123" might be something like "ef92b778bafe771e89245b89ecbc9b9e".

When you log in, the website hashes the password you enter and compares it to the stored hash. If they match, you’re granted access.

Now if a list of these hashed passwords is stolen, attackers could use a dictionary attack, where they hash common passwords and compare them to the stolen hashes to find matches.
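The dictionary attack described above takes only a few lines when the passwords were stored with a fast, unsalted hash. In this sketch the user names, the stolen hashes, and the word list are all invented for illustration.

```python
import hashlib


def fast_hash(password: str) -> str:
    """Unsalted SHA-256 of a password. Shown here as the WEAK approach."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Invented "stolen" database: user name -> unsalted password hash.
stolen = {
    "alice": fast_hash("password123"),
    "bob": fast_hash("a-long-and-unusual-passphrase-7Q"),
}

# Invented list of common passwords the attacker tries.
common_passwords = ["123456", "qwerty", "password123", "letmein"]

# Hash each guess once, then look every stolen hash up in the table.
lookup = {fast_hash(guess): guess for guess in common_passwords}
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}

print(cracked)
```

Because there is no salt, the same table of hashed guesses works against every user at once.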

To prevent this, websites add a "salt" to each password before hashing. A salt is random data generated separately for each user and stored alongside the hash, so instead of hashing "password123", the website might hash "password123+random_salt". This makes dictionary attacks much harder: precomputed tables become useless, the attacker has to repeat the guessing work for every user, and even if two users have the same password, their hashes will look completely different due to the unique salts.
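The salting flow can be sketched with scrypt, one of the algorithms mentioned earlier, through Python's hashlib.scrypt (available when Python is built against an OpenSSL version that supports it). The cost parameters below are illustrative assumptions rather than a recommendation, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

# Illustrative cost settings; real deployments should tune these.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32, maxmem=64 * 1024 * 1024)


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users pick the same password but end up with different stored hashes.
salt_a, hash_a = hash_password("password123")
salt_b, hash_b = hash_password("password123")
print(hash_a != hash_b)

print(verify_password("password123", salt_a, hash_a))
print(verify_password("wrong-guess", salt_a, hash_a))
```

The website stores the salt and the digest together; the salt does not need to be secret, only unique per user.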

Practical Example

Let's revisit the file verification use case we talked about earlier.

Imagine you have copied a file from a source that isn’t official or trustworthy, such as someone's USB flash drive. In such cases, you might hesitate to run it due to concerns about potential malware. By comparing the file's checksum with the one published by the official source, you can confirm its authenticity and ensure it hasn’t been tampered with.

Verifying a File’s Integrity Using SHA-256 on Linux

Suppose you've copied a file called ubuntu-24.04.1-desktop-amd64.iso from someone's flash drive.

Obtain the SHA-256 Hash from the Source: The Ubuntu website publishes a SHA-256 hash for the Ubuntu Desktop 24.04.1 ISO file. Here is how it looks:

c2e6f4dc37ac944e2ed507f87c6188dd4d3179bf4a3f9e110d3c88d1f3294bdc *ubuntu-24.04.1-desktop-amd64.iso

Generate the Hash Locally: Open your terminal and navigate to the directory where the file is located. Then, run the following command (note: the -b option asks sha256sum to read the file in binary mode; on GNU/Linux systems binary and text mode behave the same, so the option is harmless but not required):

sha256sum -b ubuntu-24.04.1-desktop-amd64.iso

Compare the Hashes: The terminal will output a hash string. Compare this with the hash provided by the website. If they match, the file is intact and has not been altered.
Wait, there is an even better way: You can use the "sha256sum -c" option as a more convenient way to compare a known hash with the hash of a given file and report whether they match (the * before the file name below indicates that the file should be read in binary mode):

sha256sum -c <<< "c2e6f4dc37ac944e2ed507f87c6188dd4d3179bf4a3f9e110d3c88d1f3294bdc *ubuntu-24.04.1-desktop-amd64.iso"

If the hashes match, the command output will look like this:

ubuntu-24.04.1-desktop-amd64.iso: OK

Congrats! This simple command-line tool demonstrates how hash functions can ensure data integrity in your daily operations.
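Where sha256sum is not available, the same check can be sketched in Python with hashlib. The function names are illustrative, and the demo uses a small temporary file instead of the ISO so that it runs anywhere; for the real file you would pass its path and the checksum published by Ubuntu.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a published checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo with a temporary stand-in file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for a downloaded file")
    demo_path = tmp.name

published = hashlib.sha256(b"stand-in for a downloaded file").hexdigest()
print(matches_checksum(demo_path, published))   # intact file
print(matches_checksum(demo_path, "0" * 64))    # wrong checksum
os.remove(demo_path)
```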

Conclusion

As technology evolves, so too must our approach to security. By staying informed about the latest cryptographic standards, we can ensure our systems remain robust against emerging threats.

Let’s continue to prioritize security in all our digital endeavors. If you found this overview helpful, or if you have any questions, feel free to connect and discuss further.

Sameh Farouk
