Build a blockchain from scratch | Insight
Blockchains are the Doritos Locos Tacos of IT. As complicated as software can be, software developers have a limited number of tools available to them. They combine arrays, databases, objects, pointers and other logical constructs in unique ways to create effective solutions to problems, but most of these combinations have been known for decades. (This is the classic book that describes some of these common combinations.) Software developers are like chefs at a Mexican restaurant who mix a hard shell, salsa and beans to make a taco, then later use a soft tortilla, salsa and beans to make a burrito.
Because there are only so many ingredients available, innovative combinations are rare. In fast food, the Doritos Locos Taco was one of these innovative combinations: it is just like a regular taco, but the shell is a giant, folded Dorito. It was unveiled at Taco Bell in 2012 and quickly became the restaurant’s most popular menu item. It was a brilliant combination of existing ingredients, it surprised everyone and it was delicious.
Like the Doritos Locos Taco, blockchains combine well-known software ingredients to create something completely new, brilliant and surprising. Hashing and peer-to-peer file sharing have been standard software designs since the early days of computing, but no one thought of putting them together to build a blockchain until recently. And blockchains have had a major impact on software, the economy and Internet culture in the form of cryptocurrencies, non-fungible tokens (NFTs) and public ownership registers.
And like a taco, the elements of a blockchain are not complicated. This post goes through building a blockchain from scratch to explain how everything comes together to create something unique in computer science.
The ingredients
Although more is involved as blockchains become more complex, the basic ingredients of a blockchain are hashing and peer-to-peer file sharing. IP / Decode has discussed hashing in two posts:
Effectively, hashing takes some input and converts it into a “hash” string of numbers and letters. For example,
“All you need is love” becomes “ed590c566dc35fefb1a1424c6541ba11”
Hashing is a one-way street. Although there may be “nothing you can do that can’t be done,” no one can take “ed590c566dc35fefb1a1424c6541ba11” and transform it back into the line from the Beatles song. Also, it is almost impossible for two input strings to result in the same output hash value. Even when the input value is very similar (“All you need is allowed”), the output is dramatically different (“b9965afad09081b3b7e6d14f037ca56e”).
Peer-to-peer file sharing refers to making copies of a document and distributing them to different computers. File sharing achieves two goals: the document is available to multiple computers, and it is protected against changes. The availability is obvious: the more copies of the Declaration of Independence you make, the more people in the colonies can read it. (The first printing of the Declaration was 200 copies.) The protection against change is a by-product of the wide distribution. If a Boston loyalist printed a version of the Declaration that was a gushing love song to King George III, it would be difficult to pass it off as the authentic document with hundreds of genuine copies circulating in Massachusetts and the other twelve colonies.
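The two properties described above can be tried out in a few lines of Python. This is only a sketch: the post does not say which hash function produced the values quoted above, so MD5 is an assumption here (its digests happen to be 32 hexadecimal characters, the same length as the examples), and the helper name md5_hex is made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


first = md5_hex("All you need is love")
second = md5_hex("All you need is love!")  # one extra character

print(first)
print(second)

# Count how many of the 32 positions differ between the two digests.
differing = sum(a != b for a, b in zip(first, second))
print(f"{differing} of {len(first)} hex characters differ")
```

The same input always yields the same digest, but nothing in the digest lets you work backwards to the input, and a one-character change produces an unrelated-looking result.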
Build a blockchain
These two basic ingredients come together in a blockchain to store information that is widely distributed and difficult to change. Blockchains are made up of a number of “blocks”, each of which contains information. Each block is “linked” to the one before it by storing the previous block’s hash value.
I wrote SimpleBlockchain to show how these elements work together. The source code is available on my GitHub, and you can explore the actual blockchain here and the XML file that stores the data here.
The source code for each block is below. Each block has an ID, a timestamp (recording when the site was loaded), its own hash value and the previous block’s hash value:
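For readers following along without the original listing, here is a minimal Python sketch of a block with those four fields. It is not the SimpleBlockchain source: the class name Block and the field names block_id and block_hash are hypothetical, only last_hash is a name taken from the post, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int    # the block's ID (its position in the chain)
    timestamp: str   # when the site was loaded
    block_hash: str  # this block's own hash value
    last_hash: str   # the previous block's hash value


# Made-up placeholder values, just to show how two blocks link together.
first = Block(1, "2022-01-01T10:00:00", "aaa111", "")
second = Block(2, "2022-01-01T10:05:00", "bbb222", first.block_hash)

print(second.last_hash == first.block_hash)  # True
```

The link between blocks is nothing more than the second block carrying a copy of the first block's hash value.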
Below is a visual representation of part of the blockchain:
Real blockchains would carry more data. For example, a block may record the transfer of a crypto asset from one party to another or the movement of an item through a distribution channel.
Each time the site loads, a new block containing the timestamp is created, and then the new block is added to the top of the chain. The “gen_hash()” function below generates the hash of a block, which is the result of combining all of the block’s contents, including the hash value of the previous block (“last_hash”):
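The idea can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SimpleBlockchain code: gen_hash and last_hash are names from the post, but the function's parameters, the add_block helper, the dictionary layout, the "|" separator and the choice of SHA-256 are all assumptions.

```python
import hashlib
from datetime import datetime, timezone


def gen_hash(block_id: int, timestamp: str, last_hash: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = f"{block_id}|{timestamp}|{last_hash}"
    # SHA-256 is an assumption; the post does not name the hash function.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain: list) -> dict:
    """Create a block stamped with the current time and append it to the chain."""
    last_hash = chain[-1]["hash"] if chain else ""  # the first block has no predecessor
    block_id = len(chain) + 1
    timestamp = datetime.now(timezone.utc).isoformat()
    block = {
        "id": block_id,
        "timestamp": timestamp,
        "last_hash": last_hash,
        "hash": gen_hash(block_id, timestamp, last_hash),
    }
    chain.append(block)
    return block


chain = []
for _ in range(3):  # stand-in for three page loads
    add_block(chain)

for block in chain:
    print(block["id"], block["last_hash"][:8] or "(none)", "->", block["hash"][:8])
```

Because last_hash is part of the payload, each block's hash depends on every block that came before it.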
You can see these hash links between blocks in the commented blocks below:
Using the hash value of the previous block to generate the hash value of the current block is incredibly powerful, because it effectively locks the data inside every earlier block. Remember that a hash value is the result of its input, and even small adjustments to that input result in dramatically different hash values. If someone were to adjust the timestamp or hash value in block 1, then the hash values for block 2 and block 3 would no longer correspond to what is written to the blockchain. This mismatch occurs because the hash values of blocks 2 and 3 depend on the hash value of block 1.
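A short Python sketch makes the mismatch concrete. Everything here is an assumption made for illustration (the helper names build_chain and first_invalid_block, the dictionary layout, the sample timestamps and the use of SHA-256); only gen_hash and last_hash are names from the post.

```python
import hashlib


def gen_hash(block_id, timestamp, last_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = f"{block_id}|{timestamp}|{last_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(timestamps):
    """Build a chain with one block per timestamp, numbered from 1."""
    chain, last_hash = [], ""
    for block_id, timestamp in enumerate(timestamps, start=1):
        block_hash = gen_hash(block_id, timestamp, last_hash)
        chain.append({"id": block_id, "timestamp": timestamp,
                      "last_hash": last_hash, "hash": block_hash})
        last_hash = block_hash
    return chain


def first_invalid_block(chain):
    """Return the ID of the first block whose hashes do not check out, or None."""
    expected_last = ""
    for block in chain:
        recomputed = gen_hash(block["id"], block["timestamp"], block["last_hash"])
        if block["last_hash"] != expected_last or block["hash"] != recomputed:
            return block["id"]
        expected_last = block["hash"]
    return None


chain = build_chain(["2022-01-01T10:00:00", "2022-01-01T10:05:00", "2022-01-01T10:09:00"])
print(first_invalid_block(chain))  # None: the untouched chain is consistent

# Tamper with block 1's timestamp: its stored hash no longer matches its contents.
chain[0]["timestamp"] = "1999-12-31T23:59:59"
print(first_invalid_block(chain))  # 1

# Recompute block 1's hash to cover the change: block 2 still points at the old hash.
chain[0]["hash"] = gen_hash(1, chain[0]["timestamp"], "")
print(first_invalid_block(chain))  # 2
```

Patching one block only pushes the inconsistency to the next one, which is why a tamperer would have to rewrite every later block as well.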
Changing the contents of a block is similar to editing human evolution by adding an African elephant to our natural history 10 million years ago. Because our physiology and DNA today depend on all of our ancestors who came before us, we could easily detect the error of including the African elephant in our evolutionary chain. (Humans and African elephants evolved from a common ancestor, at least as far back as this flopping fish, but humans did not evolve directly from African elephants.)
If there were only one copy of SimpleBlockchain, it would be easy to change a block in the chain: just modify the block and reconstruct all the blocks that follow it so that the hashes match. The modified blockchain would be the only copy of SimpleBlockchain, and no one would be the wiser.
And that’s why peer-to-peer file sharing makes the blockchain more secure. If millions of copies of the blockchain are widely distributed, the one copy with the modified blocks is the odd one out, and the network will reject it as fake. To continue the example above, if the modified blockchain is the false Declaration of Independence that pays homage to King George III, no one will believe it with so many more copies of Thomas Jefferson’s draft in circulation.
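The odd-one-out check can be sketched in Python. This is a deliberately simplified illustration: real blockchain networks use more elaborate consensus rules than a plain majority count, and the peer names, helper functions, sample timestamps and use of SHA-256 are all assumptions.

```python
import hashlib
from collections import Counter


def gen_hash(block_id, timestamp, last_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = f"{block_id}|{timestamp}|{last_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(timestamps):
    """Build a fully consistent chain with one block per timestamp."""
    chain, last_hash = [], ""
    for block_id, timestamp in enumerate(timestamps, start=1):
        block_hash = gen_hash(block_id, timestamp, last_hash)
        chain.append({"id": block_id, "timestamp": timestamp,
                      "last_hash": last_hash, "hash": block_hash})
        last_hash = block_hash
    return chain


def tip_hash(chain):
    """The newest block's hash summarizes the whole chain before it."""
    return chain[-1]["hash"]


honest = ["2022-01-01T10:00:00", "2022-01-01T10:05:00", "2022-01-01T10:09:00"]

# Five peers hold identical copies of the genuine chain.
copies = {f"peer-{n}": build_chain(honest) for n in range(1, 6)}

# A sixth peer changes block 1 and rebuilds every later block so its copy
# is internally consistent.
forged = ["1999-12-31T23:59:59"] + honest[1:]
copies["peer-6"] = build_chain(forged)

# Compare tip hashes across peers and flag any copy that disagrees with the majority.
tally = Counter(tip_hash(chain) for chain in copies.values())
majority_tip, _ = tally.most_common(1)[0]
rejected = [name for name, chain in copies.items() if tip_hash(chain) != majority_tip]
print(rejected)  # ['peer-6']
```

The forged copy passes every internal hash check, yet its final hash disagrees with everyone else's, so comparing a single value per peer is enough to spot it.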
Use of blockchains
The use of blockchains has exploded in the years since they became popular tools. Today, blockchains power, among other things, cryptocurrencies, NFTs and supply chains. Blockchains should be considered whenever an organization wants to create an unchangeable record of transactions.
The law often builds chains of ownership, and wherever this occurs there is an obvious use for blockchains. For example, title to real estate is recorded with a registrar, typically at the county level. Our current (and historical) approach to recording title stands in stark contrast to blockchain recording. Title registration is centralized: the county recorder of deeds is the only source for determining ownership. Blockchains are decentralized: there can be millions of identical ownership records spread across many computers. Title registration connects previous deeds with current deeds: to trace title, you must work backwards from conveyance to conveyance to determine whether the current title is clean. Blockchains perform this chain-of-title authentication automatically, using hashes to link each clean title to the next conveyance.