satoshi.ke

peer to peer systems


Hashing

Hashing is the process of taking in a block of data of any size and producing a corresponding fixed size representation known as its hash.

hashing example: input to output [dot file]

Different types of hashing algorithms exist, but each algorithm always produces hashes of the same size for any inputs it receives. For example, the SHA256 algorithm will always produce a hash of 256 bits (32 bytes) long for any input passed.

Other common hashing algorithms are MD5 which produces a 128 bit hash, and the SHA512 that produces a 512 bit hash.

Properties of hash functions

Some desirable properties of a hashing function:

  • They are deterministic. For a given hashing algorithm, any input to the hash function will always produce the same hash.

  • They are preimage resistant. Given the output hash, it is not possible to derive the input that was used to generate it.

  • They are collision resistant, meaning that while it is theoretically possible to find two inputs that produce the same hash, in practice this should be very hard. Some algorithms are more collision resistant than others. MD5 for example is not recommended for any critical uses because it is easy to find collisions.

  • Any small change in the input should result in a drastic change in the output hash, what is called the ‘avalanche effect’. For example, changing even a single character in a long block of text passed into the hash function should result in a drastically different output hash from the original one. This makes it much harder to predict the input data that was used to generate the hash.

Uses

Password authentication relies on hashing user passwords. When a user first provides their password, only the password hash is stored in the database. Any time the user is being authenticated, a hash of the password they provide is generated on the fly and compared to the stored one, and the check passes if they match. In case the database is ever leaked, only the hashes are compromised but not the original user passwords.1

Checking the integrity of downloaded files can also utilize hashing. The provider typically publishes the hash of each file to be downloaded on their website. After the file has been downloaded, a hash is generated locally and matched to the published one.

A blockchain is essentially blocks of data chained together using their hashes. The hash of the previous block is used to generate the hash of the next block, which makes it possible to detect any tampering of blockchain data, because any subsequent blocks will have invalid hashes. This is how blockchains enforce immutability i.e. any data accepted onto the blockchain cannot be changed.

Examples in code

Python

In python, we use the hashlib module that comes with the standard library.

We can generate an SHA256 hash as follows

import hashlib


def hash_function(input_text):
    """Returns a fixed length hash of the input"""

    input_bytes = bytes(input_text, encoding="utf-8")
    return hashlib.sha256(input_bytes).hexdigest()


variable_length_input = "The quick brown fox jumps over the lazy dog"
fixed_length_hash = hash_function(variable_length_input)

print(fixed_length_hash)
# outputs: d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592

print(len(fixed_length_hash))
# outputs: 64

[ source file]

In the snippet above, we provide a string to the hashing function hashlib.sha256 and generate its hash. The function expects an input in bytes, so we convert the string to bytes first. The generated hash can be represented in different forms. We want an easily readable one so we produce a hexadecimal string representation by calling hexdigest on the hash.

As mentioned previously, SHA256 hashes are 256 bits long. Each hexadecimal character represents 4 bits. This results in a 64 (256/4) character long hash.

JavaScript

NodeJS comes with the crypto module that we can use to generate hashes.

const { createHash } = await import('node:crypto');

const hashFunction = (input_text) => {
    const hash = createHash('sha256');
    hash.update(input_text);
    return hash.digest('hex');

let variableLengthInput = "The quick brown fox jumps over the lazy dog";
let fixedLengthHash = hashFunction(variableLengthInput);

console.log(fixedLengthHash);
// outputs: d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592

console.log(fixedLengthHash.length);
// outputs: 64
};

[ source file]

Notice that for the same input as we had in Python, we received the same hash.

createHash returns the object that will generate the hash. We add the text to be hashed using update, and get the hash output using the digest. This gives us the 64 character hexadecimal hash.

In this article, we looked at the basics of hashing. In a future article, we will try implementing a simple hashing algorithm.


  1. Note that this is a simplistic scenario that can be defeated by the rainbow table attack. A more secure setup is hashing a salt along with the password ↩︎