Demystify Blockchain

Bitcoin explained by analogy with classical systems

Below, "Bitcoin" refers to the system that circulates "bitcoin", the cryptographic digital asset.

When you ask "what is blockchain?", you are likely to get many inconsistent answers. You keep hearing that blockchain is trustworthy, secure, and reliable, yet you find it hard to reason with your consultants when they use words like decentralization, cryptography, and Byzantine fault tolerance. Although you are well trained to be suspicious of anything that claims to have no trade-offs, you are assured that the performance issues of public blockchains are addressed by enterprise solutions. Now you have a brand new question, how does enterprise blockchain work, before your original question has been clearly answered.

You decide to backtrack and find out what Bitcoin, the first blockchain system, is from an engineering perspective. You are in the right place. In this post, I will explain how Bitcoin works as a decentralized ledger, for developers like you.

The term "ledger" may seem peculiar if you are not familiar with accounting. In a nutshell, a ledger is a key-value store where the key identifies an account and the value records the balance of that account. Note that the account balance is just a quantity. It can represent anything from a number of banana cakes to an amount of fiat currency, cryptocurrency, or other digital assets. Dictionary (as in C#, Python) and Map (as in Go, Java) are common key-value data structures.
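As a minimal sketch of that idea, a ledger can be modeled as a plain Python dictionary. The account names, the balances, and the helper name balance_of are all made up for illustration.

```python
# A ledger as a key-value store: account identifier -> balance.
# Account names and amounts are hypothetical.
ledger = {"tom": 50, "jerry": 20}

def balance_of(ledger, account):
    """Return the balance of an account; unknown accounts hold nothing."""
    return ledger.get(account, 0)

def deposit(ledger, account, amount):
    """Increase an account's balance, creating the account if needed."""
    ledger[account] = balance_of(ledger, account) + amount

deposit(ledger, "mickey", 5)
```

The unit of the balance is left open on purpose: the same structure works for cakes, loyalty points, or currency.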

Changes to a ledger reflect account activities such as deposits and transfers. If Tom pays 10 dollars to Jerry, their bank account balances should be modified accordingly. In a classical system, the custodian of a ledger is responsible for carrying out such changes (usually known as transactions in the realm of data-intensive applications). The custodian can be a bank or an airline company, and the ledger records the amount of fiat currency or loyalty points accordingly. However, if Tom transfers bitcoins to Jerry, no such single ledger custodian exists. Instead, Tom, Jerry, and all other participants in the Bitcoin system each own a copy of the ledger, and all of them apply changes to keep the record of ownership up to date.
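The classical, single-custodian case can be sketched as follows. The class name Custodian and its transfer method are hypothetical; the point is only that one trusted party validates and applies every change.

```python
class Custodian:
    """A single trusted party that owns the ledger and applies transactions."""

    def __init__(self, balances):
        self.ledger = dict(balances)

    def transfer(self, sender, receiver, amount):
        """Move funds between accounts, rejecting invalid requests."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.ledger.get(sender, 0) < amount:
            raise ValueError("insufficient funds")
        self.ledger[sender] -= amount
        self.ledger[receiver] = self.ledger.get(receiver, 0) + amount

# Tom pays Jerry 10 dollars through the bank.
bank = Custodian({"tom": 10, "jerry": 0})
bank.transfer("tom", "jerry", 10)
```

Because every transfer goes through the same object, the custodian decides the order of transactions and can refuse the ones that would overdraw an account.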

As crazy as it sounds, one immediate question you may ask is:

  • What if Jerry fabricates an inauthentic transaction to transfer Tom's bitcoin?

This is where modern asymmetric cryptography comes into Bitcoin: an account in the ledger is not only a unique identifier, but also a public key. To issue any transaction that moves bitcoin from account A to B, the owner of account A must sign the transaction with their private key. As a reminder, asymmetric cryptography can be applied to 1) endorse a message on behalf of a certain party A or 2) encrypt a message so that only a certain party A can decrypt it. In the first use case, party A signs a message with its private key so that other parties can use party A's public key to verify the signature and confirm the endorsement. In the second use case, a sending party encrypts the message with party A's public key and only party A can decrypt the message with its private key. An unauthorized party who eavesdrops on the message learns nothing from it. In Bitcoin, the signature on a transaction message ensures that the transaction is endorsed by the owner of the bitcoin involved in the transaction. This is the first use case of asymmetric cryptography.
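The sign-then-verify flow can be illustrated with a toy example. Note the substitutions: this sketch uses textbook RSA with tiny primes so that it runs on the standard library alone, whereas Bitcoin itself uses ECDSA over the secp256k1 curve with far larger keys. The key values and function names are assumptions for illustration and are not secure.

```python
import hashlib

# Toy RSA key pair built from two small primes (insecure, illustration only).
P, Q = 61, 53
N = P * Q                              # public modulus
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes, private_exponent: int) -> int:
    """Only the holder of the private exponent can produce this value."""
    return pow(digest(message), private_exponent, N)

def verify(message: bytes, signature: int, public_exponent: int) -> bool:
    """Anyone holding the public exponent can check the endorsement."""
    return pow(signature, public_exponent, N) == digest(message)

tx = b"tom pays jerry 10"
signature = sign(tx, D)
is_authentic = verify(tx, signature, E)
```

Jerry can verify Tom's signature with the public key, but he cannot produce a valid signature for a transaction Tom never signed, because that requires the private exponent. With such a tiny modulus, different messages can occasionally share a digest; real key sizes make that negligible.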

Another question that could come to you is:

  • How do Tom and Jerry keep in sync on the changes when each has their own copy of the ledger?

The answer to this question appears straightforward, as consistency and consensus are well-studied topics in the traditional distributed database domain: multiple replication techniques (statement-based, write-ahead or row-based logs, triggers, etc.) can keep data consistent, and various algorithms (Paxos, PBFT, etc.) can guarantee consensus on the state of the system (which node is unreachable, which node should be the next leader, etc.) despite system failures (network partitions, process crashes, etc.).

However, it is not that simple when it comes to Bitcoin. The fundamental difference of a decentralized ledger such as Bitcoin is its permissionless nature. This has two dramatic consequences. First, the system runs in an extremely hostile environment, as no single participant is inherently trustworthy. Second, it is almost impossible to determine a majority or elect a leader, since anyone can join or leave the system at any time.

To elaborate on these consequences, let's consider the following problem:

  • What if Tom only has 10 bitcoins but he sends two transactions, one transfers 10 bitcoins to Jerry and the other transfers 10 bitcoins to Mickey?

On the face of it, both transactions are authentic, as Tom signed them with his private key. Taken alone, each one is also legitimate. The problem is that Tom only has 10 bitcoins, yet he spends them twice. In a traditional system, a ledger custodian is responsible for ordering the two transactions, executing the first and reporting an error for the second. However, when Tom, Jerry, and Mickey each own a copy of the ledger, if Tom is cunning enough and knows that there is a long network latency between Jerry and Mickey, he could send one transaction to Jerry and the other to Mickey. By the time Jerry's change reaches Mickey over the network, Mickey will already have accepted the other transaction and modified his own ledger copy, and vice versa when Mickey's change reaches Jerry. At this point, it is hard for Jerry and Mickey to agree on whose change should be voided, as no total order of the two transactions exists. This is a simplified narrative of the double-spend problem.
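The divergence can be reproduced in a few lines. This is a simplified simulation, assuming each replica applies transfers in the order it receives them; the function name apply_transfer and the tuple layout are made up for illustration.

```python
def apply_transfer(ledger, tx):
    """Apply a (sender, receiver, amount) transfer if the sender can afford it."""
    sender, receiver, amount = tx
    if ledger.get(sender, 0) < amount:
        return False  # rejected: insufficient funds on this replica
    ledger[sender] -= amount
    ledger[receiver] = ledger.get(receiver, 0) + amount
    return True

tx_to_jerry = ("tom", "jerry", 10)
tx_to_mickey = ("tom", "mickey", 10)

# Both replicas start from the same state.
jerry_copy = {"tom": 10}
mickey_copy = {"tom": 10}

# Each replica sees "its" transaction first; the other one arrives late.
jerry_results = [apply_transfer(jerry_copy, tx) for tx in (tx_to_jerry, tx_to_mickey)]
mickey_results = [apply_transfer(mickey_copy, tx) for tx in (tx_to_mickey, tx_to_jerry)]
```

Each replica accepts one transaction and rejects the other, but they disagree on which one, so jerry_copy and mickey_copy end up in different states with no shared rule to pick a winner.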

As part of the attempt to solve this problem, Bitcoin introduces a new data structure: the blockchain (described as a "timestamp server" in section 3 of the Bitcoin white paper). In the simplest terms, a blockchain is a kind of linked list that uses hashes instead of memory pointers to maintain a total order. Any given block can conceptually be taken as a series of bytes, starting with the hash of its parent block, followed by an ordered set of transactions, and sealed with a nonce (explained two paragraphs later). It is worth noting that this data structure is very robust against malicious tampering, as a small change to any block breaks the linkage (the chain): a change in a block's transaction set changes that block's hash, so the parent hash recorded in every successor would have to be recomputed, and a change in a block's recorded parent hash detaches the block from its predecessors.
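A hash-linked list of this kind can be sketched as below. This is a simplification under stated assumptions: blocks are dictionaries hashed through their JSON encoding with a single SHA-256, and the field and function names are hypothetical. Real Bitcoin blocks use a binary header format.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's canonical JSON encoding with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def make_block(parent_hash, transactions, nonce=0):
    """A block: parent hash first, then ordered transactions, then a nonce."""
    return {"parent": parent_hash, "transactions": transactions, "nonce": nonce}

def is_linked(chain):
    """Check that every block records the hash of the block before it."""
    return all(
        chain[i]["parent"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

genesis = make_block("0" * 64, [])
b1 = make_block(block_hash(genesis), [["tom", "jerry", 10]])
b2 = make_block(block_hash(b1), [["jerry", "mickey", 4]])
chain = [genesis, b1, b2]
intact = is_linked(chain)
```

Editing any transaction in b1 changes its hash, so the parent hash stored in b2 no longer matches and is_linked(chain) turns false until b2 and everything after it is rebuilt.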

Bitcoin requires every participant to add new transactions to a pending block rather than modify a key-value store in place. This enables every copy of the ledger in the system to maintain a total order of all the transactions its owner is aware of. The current state of the ledger can still be inferred from the blockchain by replaying all the historical transactions it contains. In practice, various caching techniques and specialized data structures (Merkle trees, etc.) are adopted to avoid replaying every transaction each time the current state of the ledger needs to be verified.
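Replaying history to recover the current state might look like the sketch below. It keeps the account-balance model used in this post (Bitcoin itself tracks unspent transaction outputs rather than balances), and it assumes a sender of None marks newly issued coins. Both the block layout and the function name replay are hypothetical.

```python
def replay(chain):
    """Derive current balances by applying every transaction in chain order."""
    balances = {}
    for block in chain:
        for sender, receiver, amount in block["transactions"]:
            if sender is not None:  # None marks newly issued coins (assumption)
                if balances.get(sender, 0) < amount:
                    raise ValueError("transaction overspends the sender's balance")
                balances[sender] -= amount
            balances[receiver] = balances.get(receiver, 0) + amount
    return balances

chain = [
    {"transactions": [(None, "tom", 10)]},
    {"transactions": [("tom", "jerry", 10)]},
    {"transactions": [("jerry", "mickey", 4)]},
]
state = replay(chain)
```

Because the order of blocks and of transactions within a block is fixed, every participant who replays the same chain arrives at the same balances.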

Although the blockchain data structure enables each participant to establish their own view of a total order of the transactions they know about, Jerry and Mickey could still dispute whose view should be the "official" next one. To resolve such disputes, Bitcoin devises the Proof-of-Work mechanism (section 4 of the Bitcoin white paper) to determine whose turn it is to persist the set of transactions in their pending block. The work in Proof-of-Work is to solve a mathematical problem that is hard to compute but whose answer is easy to verify. Say Jerry finds the answer first: he then gains the right to attach his pending block to the latest chain he knows of and to broadcast his view of the system.

In the Bitcoin system, the problem is to find a nonce for the pending block such that the hash of the whole block starts with the required number of zero bits.
Bitcoin white paper
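The nonce search can be sketched as a brute-force loop. This is an illustration under simplifying assumptions: the block is a plain string of made-up fields, hashing is a single SHA-256, and the difficulty is a small zero_bits value chosen so the loop finishes quickly. The function names are hypothetical.

```python
import hashlib

def block_digest(parent_hash: str, transactions: str, nonce: int) -> int:
    """Hash the block contents plus a nonce and read the result as an integer."""
    payload = f"{parent_hash}|{transactions}|{nonce}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")

def meets_target(parent_hash, transactions, nonce, zero_bits):
    """Verification is one hash: are the top zero_bits bits all zero?"""
    return block_digest(parent_hash, transactions, nonce) < (1 << (256 - zero_bits))

def mine(parent_hash, transactions, zero_bits):
    """Search is brute force: try nonces until one meets the target."""
    nonce = 0
    while not meets_target(parent_hash, transactions, nonce, zero_bits):
        nonce += 1
    return nonce

nonce = mine("0" * 64, "tom->jerry:10", zero_bits=12)
```

Finding the nonce takes about 2^zero_bits attempts on average, while checking a claimed nonce takes a single call to meets_target. That asymmetry is what makes the answer hard to compute but easy to verify.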

A malicious participant in the Bitcoin system can exhibit various adversarial behaviors, such as transaction censorship. In some cases, you may wonder:

  • What if a participant fabricates a blockchain and broadcasts it when its turn comes?

Satoshi assumes that the majority of the participants in Bitcoin are honest. With this assumption, as long as each participant has a fair chance to broadcast its version of the blockchain, the overall result is satisfactory. But didn't we just agree that in a permissionless system such as Bitcoin, it is impossible to determine a majority? Well, at any particular point in time, that is true. However, the majority that Satoshi talks about is the majority aggregated over the lifetime of the system. Therefore, it is possible to cleverly give the blockchain data structure a second job: identifying the history created by the majority, by preferring the longest chain. The Bitcoin software is programmed so that whenever a participant learns of a longer chain, it discards its current view of a shorter history, works on the proof-of-work problem on top of the longer chain, and tries to publish its current pending block as an extension of the longer chain.
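The chain-selection rule can be sketched as a single comparison. This is a simplified version that counts blocks, as described above; actual Bitcoin software compares accumulated proof-of-work rather than raw block count. The function name choose_chain and the is_valid callback are hypothetical.

```python
def choose_chain(current, received, is_valid=lambda chain: True):
    """Switch to the received chain only if it is valid and strictly longer."""
    if is_valid(received) and len(received) > len(current):
        return received
    return current

# Blocks are represented by labels here; a real node would hold full blocks.
my_view = ["genesis", "a1"]
longer_view = ["genesis", "b1", "b2"]

adopted = choose_chain(my_view, longer_view)
next_parent = adopted[-1]  # the pending block is now mined on top of this block
```

On a tie the participant keeps its current view, so it only abandons its own work when someone demonstrably did more.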

Satoshi Nakamoto: person or persons who developed bitcoin, authored the bitcoin white paper, and created and deployed bitcoin's original reference implementation.
Wikipedia

Since the right to broadcast a new block is earned by solving mathematical problems over the lifetime of the system, Bitcoin materializes fairness as ownership of computational power, known as one-CPU-one-vote. Satoshi's assumption is that no single participant in Bitcoin can control a majority (more than 50%) of the computational power devoted to the system over time. I will discuss the arguments for and against this assumption in future posts.

Now that the fundamentals of Bitcoin have been explained through analogies with traditional systems, you can better form your own view of the value of bitcoin, although I will discuss this further in later posts. As of this writing, Bitcoin seems successful in the hostile permissionless environment, with moderate adoption. A few hard forks (you are encouraged, and hopefully now prepared, to research what a hard fork is) reveal Bitcoin's experimental nature. Next, we will build our knowledge of Ethereum, another vibrant public decentralized network, on this understanding of Bitcoin.

Stay tuned.

More articles by Zhen Li, PhD
