Hashing in Blockchain

When we talk about blockchains, one of the first things that comes to mind is data security. It is a major reason for the technology's rising popularity and acceptance.

Blockchains are immutable. What do I mean when I say that? The important word here is “immutable”.

It means something that cannot be changed or altered over time.

Are private databases immutable? No, they aren’t. In a private database, most users have read-only access to the data, but the system administrator has the power to change it. We trust administrators to do their job, but a dishonest administrator can tamper with the data and modify it.

That is where blockchains come in. When we use this term with blockchains, it means that once data is written to a blockchain, it cannot be changed. No one can change it, not even a system administrator. It stays there without any modification.

For transactions this provides an assurance that the data in your system has not been altered.

Blockchain Immutability – What is Hashing?

Now you might ask: “What makes a blockchain immutable?” The one-word answer is hashing.

So what is hashing? In simple terms, a hash function is a mathematical function that takes an input and produces an output.

What makes hashing interesting is that you can provide an input of any length, yet the output will always be of a fixed length.

For example, if you hash the letter “A” with a 256-bit hash function, you get a 256-bit output. If you hash the entire “Mahabharata”, “Ramayana” or “The Bible”, you get a different output, of course, but of the same length as the one you got for the letter “A”: 256 bits.

A hash function takes an input of any length and creates an output of fixed length.
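The fixed-length property is easy to check with a short sketch. It assumes Python's standard `hashlib` module and SHA-256 as the hash function; the helper name `sha256_hex` and the sample inputs are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"A")             # a single byte of input
long_digest = sha256_hex(b"A" * 1_000_000)  # one million bytes of input

# Both digests have the same length (64 hex characters = 256 bits),
# even though the inputs differ enormously in size.
print(len(short_digest), len(long_digest))
print(short_digest != long_digest)
```

Whatever the size of the input, the digest length stays the same; only its value changes.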

The hash output is a string of letters and numbers that is meaningless to a human reader. It is referred to as a message digest or digital fingerprint. For practical purposes, the hash output for a particular input is unique: the same input always produces the same output, and it is computationally infeasible to find two different inputs that produce the same output.

Many people wonder at this very fact: how can two different inputs never produce the same hash output? Strictly speaking, they can, because there are more possible inputs than there are 256-bit outputs. But finding such a match for SHA-256 is expected to take on the order of 2^128 attempts, which is far more work than is feasible with any computing power available today.

Secure Hash Algorithm and SHA-256

In the blockchain context, SHA-256 is a well-known hash function. SHA stands for Secure Hash Algorithm. The 256 refers to the output length: hashing with SHA-256 produces a 256-bit output, usually written as 64 hexadecimal characters.

SHA-256 Output

Note how the SHA-256 outputs of ‘A’ and ‘a’ are completely different, even though the inputs differ only in case.
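To reproduce the comparison yourself, here is a minimal sketch using Python's standard `hashlib` module. The variable names are illustrative, and the inputs are assumed to be UTF-8 encoded.

```python
import hashlib

# Hash the uppercase and lowercase letter separately.
digest_upper = hashlib.sha256("A".encode("utf-8")).hexdigest()
digest_lower = hashlib.sha256("a".encode("utf-8")).hexdigest()

print("A ->", digest_upper)
print("a ->", digest_lower)
print("identical:", digest_upper == digest_lower)
```

Each digest is 64 hexadecimal characters long, and the two look entirely unrelated.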

Two of the most important properties of a hash function are:

  1. It is computationally infeasible to calculate the original data back from the hash.
  2. A slight change in the input data results in a large, unpredictable change in the output.
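The second property can be made visible by counting how many of the 256 output bits change when the input changes by one character. This is only a sketch: the function name `differing_bits` and the sample messages are invented for illustration.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 bit wherever the two digests disagree.
    return bin(digest_a ^ digest_b).count("1")


# The two messages differ in a single character ("10" vs "11").
changed = differing_bits(b"pay Alice 10 coins", b"pay Alice 11 coins")
print(f"{changed} of 256 bits differ")
```

For a good hash function you would expect roughly half of the bits to flip, with no visible relationship between the size of the input change and the size of the output change.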

Detecting Data Tampering

How can you detect a change in your data using hash functions? Compare the message digests (hash outputs) of the two versions. A difference in the message digest means the data has been modified.
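As a small illustration of this comparison, the sketch below records a digest when the data is first written and checks later versions against it. The names `digest` and `is_unmodified` and the sample transaction text are assumptions made for the example.

```python
import hashlib


def digest(text: str) -> str:
    """Return the SHA-256 message digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_unmodified(current: str, recorded: str) -> bool:
    """True if the current data still produces the recorded digest."""
    return digest(current) == recorded


# Digest recorded when the data was first written.
recorded_digest = digest("Alice pays Bob 10 coins")

print(is_unmodified("Alice pays Bob 10 coins", recorded_digest))    # True
print(is_unmodified("Alice pays Bob 1000 coins", recorded_digest))  # False
```

Only the short digest needs to be stored to check the data later; the original data itself does not.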

Now, a particular block in the blockchain contains transactional data and also the hash of the contents of the previous block. So each block’s hash is derived from the content present in the block, which includes the previous block’s hash.

If the hash stored in a block ever fails to match the recomputed hash of the block before it, it clearly means that data has been modified.
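The linking described above can be sketched in a few lines. This is a simplified model, not how any particular blockchain serializes its blocks: the `Block` class, its field names, the JSON encoding and the all-zero hash for the first block are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    transactions: list
    prev_hash: str  # hash of the previous block's contents

    def compute_hash(self) -> str:
        """Hash everything in the block, including prev_hash."""
        payload = json.dumps(
            {
                "index": self.index,
                "transactions": self.transactions,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Create one block per batch of transactions, each linked to the last."""
    chain = []
    prev_hash = "0" * 64  # placeholder for the first block (assumption)
    for index, transactions in enumerate(batches, start=1):
        block = Block(index, list(transactions), prev_hash)
        chain.append(block)
        prev_hash = block.compute_hash()
    return chain


def first_broken_link(chain):
    """Return the index of the first block whose prev_hash does not match
    the recomputed hash of the block before it, or None if every link holds."""
    for previous, current in zip(chain, chain[1:]):
        if current.prev_hash != previous.compute_hash():
            return current.index
    return None


chain = build_chain([["A->B 5"], ["B->C 2"], ["C->A 1"]])
print(first_broken_link(chain))  # None: every link matches
```

Because `prev_hash` is part of what gets hashed, each block's hash depends indirectly on every block that came before it.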

Let us consider a scenario: we have a blockchain with close to 300,000 blocks. Suppose the blocks are numbered 1, 2, 3 … up to 300,000. In reality this is not how blocks are identified; I am numbering them to make the example easier to follow.

You have a copy of the ledger on your computer and decide to tamper with it. Say you tamper with block number 200,000. This means that there are 100,000 more blocks after this block.

In that block, suppose you delete a transaction to make it look as if it never happened.

What will immediately happen?

The block’s hash check would fail. Why? Because you have changed the content of block number 200,000, and the hash of this block immediately changes.

Now the next block, 200,001, already contains the hash of block number 200,000. That hash was computed from the data that was there before you modified it, when block number 200,000 was validated and its transactions were approved long ago.

So it is immediately evident that someone has tampered with the data, and that someone is you.
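The same scenario can be replayed at a much smaller scale. The sketch below uses six blocks instead of 300,000 and deletes a transaction from block 4; the block layout (a dictionary with `number`, `transactions` and `prev_hash`) and the helper names are assumptions made for illustration.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over the block's full contents, serialized as sorted JSON."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def first_mismatch(chain):
    """Return the number of the first block whose stored prev_hash no longer
    matches the block before it, or None if the chain is consistent."""
    for earlier, later in zip(chain, chain[1:]):
        if later["prev_hash"] != block_hash(earlier):
            return later["number"]
    return None


# Build a toy chain of six blocks, each storing the hash of the previous one.
chain = []
prev = "0" * 64  # placeholder for the first block (assumption)
for number in range(1, 7):
    block = {
        "number": number,
        "transactions": [f"tx-{number}-a", f"tx-{number}-b"],
        "prev_hash": prev,
    }
    chain.append(block)
    prev = block_hash(block)

before = first_mismatch(chain)

# Tamper with block 4: delete a transaction as if it never happened.
chain[3]["transactions"].remove("tx-4-a")

after = first_mismatch(chain)
print(before, after)  # None 5
```

The mismatch shows up at block 5, the block that stored the old hash of block 4, just as block 200,001 exposes tampering with block 200,000 in the scenario above.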

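Now if you want to fool the system, you would have to recompute the previous-block hash stored in every subsequent block, because changing one block's stored hash changes that block's own hash in turn. In a proof-of-work blockchain such as Bitcoin, each recomputed block hash must also satisfy the network's difficulty requirement, so redoing 100,000 blocks would require massive computing power and is practically infeasible.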
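The cascade can be counted in a toy model. The sketch below uses ten blocks, tampers with block 4 and rewrites the stored hashes that follow; the helper names and block layout are assumptions. It deliberately leaves out the costly part: in this model rewriting a hash is cheap, whereas on a proof-of-work chain each rewritten block would also have to be mined again.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over the block's full contents, serialized as sorted JSON."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(length: int) -> list:
    """Build a toy chain where each block stores the previous block's hash."""
    chain, prev = [], "0" * 64
    for number in range(1, length + 1):
        block = {
            "number": number,
            "data": f"transactions of block {number}",
            "prev_hash": prev,
        }
        chain.append(block)
        prev = block_hash(block)
    return chain


def relink_after(chain: list, tampered_number: int) -> int:
    """Rewrite prev_hash in every block after the tampered one.
    Returns how many blocks had to be rewritten."""
    rewritten = 0
    # Block number n sits at list position n - 1, so the block right after
    # the tampered one is at position tampered_number.
    for position in range(tampered_number, len(chain)):
        expected = block_hash(chain[position - 1])
        if chain[position]["prev_hash"] != expected:
            chain[position]["prev_hash"] = expected
            rewritten += 1
    return rewritten


chain = build_chain(10)
chain[3]["data"] = "transactions of block 4, one deleted"  # tamper with block 4
rewritten = relink_after(chain, 4)
print(rewritten)  # 6: blocks 5 through 10 all had to change
```

Every block after the tampered one has to be rewritten, which is why tampering with block 200,000 in a 300,000-block chain touches 100,000 blocks.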

If tampering like this were accepted, the entire blockchain would fail. Of course, there are checks and balances to ensure that this does not happen. Having changed the block on your own copy, you would now need to get it accepted back into the network.

Now remember that you are not the only one who has a copy of the ledger. There will be hundreds of other users who have a copy too.

The system is built so that the block you have modified would be rejected, because the majority of the ledger keepers hold a block that does not match yours.
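A heavily simplified way to picture this rejection is to compare the block's hash across all copies and see which copy disagrees with the majority. Real networks rely on consensus protocols rather than a simple tally, so treat this only as a sketch; the node names and block contents are hypothetical.

```python
import hashlib
from collections import Counter


def digest(block_data: str) -> str:
    """Return the SHA-256 digest of a block's contents as a hex string."""
    return hashlib.sha256(block_data.encode("utf-8")).hexdigest()


honest_block = "block 200000: tx1, tx2, tx3"
tampered_block = "block 200000: tx1, tx3"  # one transaction deleted

# Hypothetical ledger keepers and the version of the block each one holds.
copies = {
    "node_a": honest_block,
    "node_b": honest_block,
    "node_c": honest_block,
    "node_d": honest_block,
    "you": tampered_block,
}

# Tally the digests and find the one held by most ledger keepers.
votes = Counter(digest(data) for data in copies.values())
majority_digest, majority_count = votes.most_common(1)[0]

# Any copy whose digest differs from the majority is rejected.
rejected = [name for name, data in copies.items() if digest(data) != majority_digest]
print(rejected)  # ['you']
```

Only the short digests need to be compared, so a disagreement is cheap to spot even when the blocks themselves are large.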

Blockchains make use of hash functions everywhere. Each block contains data that is hashed. If someone ever tries to change the data in a block, the hash value changes, and this change is noticeable to all. When people say that blockchain data cannot be changed, they do not mean that the data can never be altered; they mean that altering it undetected is practically impossible. It would require collusion and massive computing power, whereas detecting a change is very easy.

In the next section we will study Merkle trees, which are built on hashing.