This is an article about some aspect of blockchain technology, but it is intended for a non-technical audience. This article is building up some basic technical understanding as a foundation for later technical articles. You’ll probably come to the end of this article and have a lot of questions, like, “But what does this have to do with validating transactions?”, and “What’s proof of stake?” Please be patient! We’ll get to those questions in future articles.
Strings
In computer programming languages, a piece of text is represented as a “string”. Most programming languages represent a string by having you place the string between quotation marks. My favored programming language, Python, allows you to use either single quotes or double quotes. Here are some examples of strings:
'Hello, world'
"Hello, world"
'Goodnight, moon'
'Goodbye, cruel world'
'Attack at dawn.'
'The quick brown fox jumped over the lazy dog.'Note that the first two strings are exactly the same. The quotation marks around the strings aren’t part of the string, they just tell the programming language that what is contained between them is a string.
Functions
A function, in a computer programming language, is a small part of a program which takes some pieces of data and transforms them in some way, and then gives you back some data.
Don’t get intimidated intimidated by this: it’s simpler than it sounds. For example, the “len” function takes a string, and returns a number which tells you how many characters (letters) are in the string. Here is an example:
>>> len('Hello, world')
12
>>> len('abcdefghijklmnopqrstuvwxyz')
26If you count the characters in the string, “Hello, world”, you’ll see there are 12 characters. If you count the characters in the alphabet, you’ll see there are 26. All the “len” function is doing is just counting the characters in the string you give it, and returning the count1. Note that the computer thinks of the comma and the space as characters, so these are included in the length.
Programmers can write new functions that do all sorts of things. For example, I’ve written a function, “first_five” which takes a string, gets the first five letters, and returns a string containing the first five characters. This function doesn’t come with the programming language, I created it:
>>> first_five('Hello, world')
'Hello'
>>> first_five('Goodnight, moon')
'Goodn'
>>> first_five('Goodbye, cruel world')
'Goodb'Functions can also take numbers. For example, I’ve created a function that doubles numbers:
>>> double(1)
2
>>> double(5)
10Hash functions
A hash function is a function that takes a piece of data, and returns a seemingly random piece of data. There are many different hash functions. For example, md5, sha1, and blake2s are three different hash functions2:
>>> md5('Hello, world')
'bc6e6f16b8a077ef5fbc8d59d0b931b9'
>>> sha1('Hello, world')
'e02aa1b106d5c7c6a98def2b13005d5b84fd8dc8'
>>> blake2s('Hello, world')
'ffda16218e874a2f24e36300bffca2064e4f75a338c20fa540d26fb66b912de5'Any change to the input of a hash function, no matter how small, should create a massive difference in the output of the function:
>>> sha1('Hello, world')
'e02aa1b106d5c7c6a98def2b13005d5b84fd8dc8'
>>> sha1('Mello, world')
'4488ae4fbb17c9e7772f25c7c11764b00a31c736'
>>> sha1('Hello, world.')
'2ae01472317d1935a84797ec1983ae243fc6aa28'Collision-resistance
A collision is when two inputs to a hash function produce the same output string.
Some functions are collision-resistant, meaning that it should be very difficult to find two inputs that produce the same output.
Creating a function that behaves in this way is easier said than done, and some hash functions do it badly. The md5 function we showed above is intended to be a collision-resistant hash, but with modern computers we’ve been able to find some collisions. For example3:
>>> md5(0x4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2)
'008ee33a9d58b51cfeb425b0959121c9'
>>> md5(0x4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2)
'008ee33a9d58b51cfeb425b0959121c9'Looking at the two inputs to the md5 function, finding the differences between them is like a game of spot-the-difference: it’s quite hard to verify visually that these inputs are actually different. So if you were relying on md5 to tell you that they are different, that would be a big problem.
Thankfully, we have better collision-resistant hash functions now. The most commonly used collision-resistant hash functions are sha2 and ripemd.
If it’s worth it, let me work it
It is hard to find two inputs to a collision-resistant hash function which produce the same output. A side-effect of this is that it’s hard to find an input to a collision-resistant-hash which produces an output which starts with a specific string.
Let’s say we’re trying to find a string which we can pass into sha2, and get an output that starts with “1”. There are 16 possibilities for the first letter in the output (“0123456789abcdef”), so for every input to the collision-resistant hash, there is a 1 in 16 chance that the first letter will be “1”. Let’s try it:
>>> sha2('0')
'5feceb66ffc86f38d952786c6d696c79c2dbc239dd4e91b46729d73a27fb57e9'
>>> sha2('1')
'6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b'
>>> sha2('2')
'd4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35'
>>> sha2('3')
'4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce'
>>> sha2('4')
'4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a'
>>> sha2('5')
'ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d'
>>> sha2('6')
'e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683'
>>> sha2('7')
'7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451'
>>> sha2('8')
'2c624232cdd221771294dfbb310aca000a0df6ac8b66b696d90ef06fdefb64a3'
>>> sha2('9')
'19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7'As you can see, it took us 10 tries to find an input (“9”) to the hash which returns a value that starts with “1”. However, if anyone wants us to prove that we have found an input to sha2 which starts with “1”, we can give them the input (“9”) and they only have to run the hash function with the input we give them once to verify that the output starts with “1”:
>>> sha2('9')
'19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7'This is known as an asymmetric function. Note that it took the prover running the hash 9 times to find a hash that started with “1”, but it only took the verifier running the hash once to verify that the hash has been found.
The proof of work proves a few things:
It proves that time elapsed between the time the work was requested, and the time that the proof was found.
It proves that we used some electricity and computer hardware to run the hash functions.
But, this only proves that we’ve done this work this one time. If someone else asks for the same proof of work again later, we could just provide them with the same proof of work, without actually doing the work.
If someone wants me to prove that I’ve done different work, they simply need to give me a different starting place to look inputs with the required properties. This input should be unique, but otherwise it doesn’t matter what it is. This input is called a nonce.
Let’s see what happens if a verifier gives us a nonce “abc”, and we want to prove that we did new work after they gave us this nonce:
>>> sha2('abc0')
'56abfbd7d2ea606e667945422de5a368b8b0272b8f29081cb058b594dd7e3249'
>>> sha2('abc1')
'dbfcfd0d87220f629339bd3adcf452d083fde3246625fb3a93e314f833e20d37'
>>> sha2('abc2')
'4bdd0bbfe3f4c52cc2c8ff02f1fef29663dd9938f230304915805af1fa71e968'
>>> sha2('abc3')
'851ca7a5e2d4bce908ced2c566ce1ef6f1cc1921fcb4c270353cbc81f2e3b59c'
>>> sha2('abc4')
'7bb3daccaf354e7e2f3471fc688c8b3d12d970f365816324c36c981bc3a99f25'
>>> sha2('abc5')
'a724e9ff61764a77745f4f6ef4e50e1f758f436f3b4a1cf28cdaac97f1ffdfb9'
>>> sha2('abc6')
'7cae2668f61b05445b86b3319ab84cd5d3630f76d956f5096ef680468073c3d8'
>>> sha2('abc7')
'53dd02b72c4e7463b448e5374abedc168dcd200ad7e1221fe92d440c545859c6'
>>> sha2('abc8')
'dff0b0e2a9a1f5eb91fee57eb6de7f7fef2ae871eb82cb34f7cd39b77c3da107'
>>> sha2('abc9')
'a2bd21098ba5f97d152a97e89887e463f1d15d47fa0819151600dac4395b29b0'
>>> sha2('abc10')
'a81c43c82465f2becf61da46f2108b893802cb3d2fb83ef70216ed41b5d873d7'
>>> sha2('abc11')
'c0f852a67830b3fb052513573b07057974eba4efd30675780affbc5ac374ae3c'
>>> sha2('abc12')
'8d51feb34e3e69f6fa6dffc577e2c60490cf9a7fcd835f9f6af1505b71d74773'
>>> sha2('abc13')
'a2a3fcd0c8932e398e381f5e53ede4ee0170b83b69c518dbdc324632408ac496'
>>> sha2('abc14')
'cef70ae2b8d721edfd9e846e68b66d80539c79582390ba31e28c99bd24d002a5'
>>> sha2('abc15')
'355e8a50eb5c482532b93f33f1a9e59807ca34df271d56d5d9152537944c1704'
>>> sha2('abc16')
'49504130bc739e83eff510b3bd0d4af3c91710647bbc8778895e90e06c98a2f8'
>>> sha2('abc17')
'ca200bc965ca98bda52de30d969df93cd687c1dfd21011604a9dcf4b4c33500c'
>>> sha2('abc18')
'ea5df6b44bfca51d2b1d972ac1eab51388855c5324d7ba8be7b306dceac9bdff'
>>> sha2('abc19')
'1392ef83ff04f6e7d066c22f3799942d26dfd5ddfbe7b8319d28e1dce017a967'This time it took us 20 tries! Remember, the 1 in 16 chance we have of finding a result that starts with 1 means that this function will take an average of 16 tries to find the result we want. It may take fewer tries, or it may take many more tries.
But, now we, the provers, can give the verifier the proof, “19” to show that we did an average of 16 hashes of work on the nonce “abc”.
A few things to note about this:
I’m just choosing our potential proofs by counting up starting at 0. It doesn’t have to be done this way: you can choose any way of creating these potential proofs, and as long as each potential proof is unique, you’ll always have a 1 in 16 chance of finding a proof.
I combined the nonce (“abc”) and the potential proofs (“0” through “19”) by just putting the proofs after the nonce, but there are lots of other ways you could combine these two values (for example, you could put the potential proof before the nonce). It doesn’t matter how you combine the nonce with the proofs, as long as both the prover and the verifier agree on how to combine the nonce and the potential proofs.
Any collision-resistant hash will work for this.
Difficulty
In the examples above, I wanted to show you how the work is done, so I wanted proofs of work that would only take a few tries: an average of 16 tries, to be exact. However, proof of work is usually not done by people manually running hash functions, it’s done by computers that can run quadrillions of hashes per second. Finding a proof that produces a hash output which has a 1 in 16 chance of being found happens pretty instantaneously for a mining computer. It proves that they did work, but it doesn’t prove that they did much work.
If you want to prove more work, you can simply create more difficult-to-fulfill requirements for the hash output. A simple example of this is increasing the number of 1s at the beginning of the hash output. There’s a 1 in 16 chance of finding a hash with one 1, a 1 in 256 chance of finding a hash with two 1s, a 1 in 4096 chance of finding a hash with three 1s, etc. Simply increasing the number of 1s required in this way increases the average number of hashes the prover will have to run exponentially. Requiring just four 1s means it will take an average of 65536 tries to find a proof for a given nonce. Running 65536 hashes takes about a second on my laptop, so we’re already taking amounts of work which are noticeable to a human, despite the speed of computers.
But all this does is make the asymmetric function more asymmetrical. While it takes an average of 65536 hashes to create a proof of work at four 1s of difficulty, it still only ever takes running one hash to verify that the work was done. Even if the proof of work was created with hundreds of digits of difficulty using the most powerful pool of mining rigs in the world, I can verify the proof quickly with a weak processor on an old Nokia phone, provided the Nokia phone is running the right software. That’s the power of proof of work.
This isn’t actually how it works in most code: most code stores the length of the string at the beginning of the memory. But this understanding is true in some programming languages and will suffice for the purposes of this article.
If you know Python, you’ll recognize that even though I’m using the Python REPL I’m not using the normal hashlib library directly—I wrapped the md5, sha1, and blake2s functions in my own helpers. That’s because this article is about proof of work, it’s not intended to teach people Python.
Credit for finding this collision goes to Marc Stevens. If you want to reproduce this result with your own code, you’ll need to convert the given integers into blocks of 64 bytes with a big-endian byte order.