On forks, orphans and reorgs

TLDR;

Security of a forked chain is near identical to an unforked chain.
No transactions were lost or double spent.
Reorgs are part of the Bitcoin design and incentivize more secure 0-conf.
The design turns the unreliability of distributed systems from a problem into a feature.

Background

On 18th April 2019 we saw an interesting event on the Bitcoin SV blockchain. Two sets of miners began mining competing chains, which resulted in two chain re-orgs. The headline number on Twitter is that there was a re-org of 3 blocks then a second re-org of 6 blocks. What hasn’t been pointed out is that the 6-block re-org encapsulated the 3-block re-org, so from an outsider’s point of view it was as if part of the chain switched to a longer branch, then switched back to the original branch once that became longer. Imagine a race where someone takes the lead then falls back into second place again. The headline number I haven’t seen reported yet is that during the event neither chain was ever more than 2 blocks ahead of the other. Perhaps more importantly, we have seen very little discussion of how this affects users.

BitMEX has published some useful data that helps us to unpick this event here:

https://twitter.com/BitMEXResearch/status/1119197415804932096?s=09
- BitMEXResearch

They also make the comment:

Based on the current most work Bitcoin Cash SV chain, all the TXIDs (except coinbase transactions) from the fork made it the main chain. Therefore no double spends appear to have occurred

What is a re-org?

Put simply, it is a normal part of Nakamoto consensus. It is first important to clarify the terminology. A re-org is an event, and an orphan block (or chain) is the consequence of that event. In a 2014 report CEX.io noted that on the BTC blockchain orphans occurred 1-3 times a day: https://blog.cex.io/bitcoin-dictionary/what-is-an-orphan-block-9632 These are almost all 1-block re-orgs. Blockchain.com appears to have been collecting this data between 2014 and 2017: https://www.blockchain.com/charts/n-orphaned-blocks?timespan=all&daysAverageString=1&scale=0

When two miners find a block at nearly the same time, the network divides into two sets of miners mining on top of different blocks. This usually resolves when the next block is found. One of the competing blocks is retained as it is now part of a longer chain and the other is orphaned. Occasionally both second blocks will be found at nearly the same time and the race continues, this time with 2 blocks potentially orphaned once it’s resolved by the mining of a third block. This can be exacerbated by slow block propagation. We noted earlier ‘miners find a block at nearly the same time’. What really matters is not the time a miner finds the block but the time the other miners see the block. They’ll start mining on top of the one they see and validate first. Since miners are highly incentivised not to have their blocks orphaned (they’ll lose their block reward if that happens), they are also highly incentivised to propagate their blocks quickly.
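
The resolution rule described above can be sketched in a few lines. This is a simplified illustration rather than how any node software implements it: the function name is made up, and it assumes every block carries equal work so that branch length can stand in for "most accumulated proof of work".

```python
def resolve_fork(current_branch, competing_branch):
    """Return (preferred, other) for two branches that share a fork point.

    Each branch is the list of block ids mined after the fork point.
    Assumption: every block carries equal work, so length stands in for
    "most accumulated proof of work".
    """
    if len(competing_branch) > len(current_branch):
        # The competitor is longer: switching to it orphans our branch.
        return competing_branch, current_branch
    # On a tie a miner keeps working on the branch it saw first.
    return current_branch, competing_branch


# Two blocks found at nearly the same time: a tie, so the miner stays put.
preferred, other = resolve_fork(["A1"], ["B1"])
# The next block lands on B, so A1 is orphaned in a 1-block re-org.
preferred, other = resolve_fork(["A1"], ["B1", "B2"])
print(preferred, other)
```

Note that the tie case is what allows a race to continue for several blocks: neither side has a reason to switch until one branch actually pulls ahead.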

Why should users care?

In short, they probably shouldn’t; it’s only a problem for the miners. There is a widespread myth that something bad happens to the transactions in an orphaned block: that perhaps they go missing or get double spent. We can demonstrate that the first is false and the second is so unlikely as to be not worth worrying about. We can also demonstrate that the security of a forked chain, from the point of view of the user, is almost identical to that of one that is un-forked.

One reason they might care, however, is that in understanding how this works there is a valuable lesson to be learned about how bitcoin security really works, one which has been missed for the last 10 years.

What happens in a node during a re-org?

Firstly, let’s state that all nodes have a slightly different view of the bitcoin ledger state. That view is commonly called the UTXO set, and we will see that the UTXO set is not actually a static thing that only changes once per block; the view of it that matters is in constant flux. A node’s job can be boiled down to answering one simple question: “Is this transaction valid or invalid?” The answer to this question is a function of its view of bitcoin history. That history is a composite of two things: the transactions that are included in the longest chain of blocks the node has seen, and the unconfirmed transactions it has seen (a.k.a. the mempool). The composite view (remember this is the transient UTXO set) is obtained by overlaying the latter on top of the former. This allows the node to answer the question “are any of the inputs in this new transaction double spends?” This view is constantly changing, usually in response to a new transaction being seen and occasionally in response to a re-org. But in the latter case it doesn’t change by much, in fact maybe not at all.

When a node sees a new longest chain it goes through the re-org process: first a rollback, then a roll forward. What it does under the hood is this:

1. For each block on the orphaned chain (starting with the highest and working backwards) it takes every transaction in the block and puts it back into the mempool. Those transactions are temporarily considered ‘unconfirmed’. Note that there cannot be any double spends in these blocks since the node has already validated them, and all transactions already in its mempool have been tested against the history that includes this block to ensure they aren’t double spends either.
2. Once it has reached the fork block it starts working its way forwards on the new longest chain, validating it the same way it normally would for a new block. For every transaction in the new blocks it checks whether it has that transaction in the mempool. If it does, the transaction is evicted from the mempool. It also checks whether its mempool contains a double spend of that transaction and evicts that too.
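
The rollback and roll forward described above, together with the composite view of chain plus mempool, can be sketched as a toy model. This is an illustration under stated assumptions, not the Bitcoin SV implementation: the `Tx` and `Node` classes and their methods are hypothetical, transactions are reduced to an id and the set of outpoints they spend, coinbase transactions are left out, and full validation of the new blocks is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: frozenset  # outpoints (previous outputs) this transaction spends


class Node:
    def __init__(self):
        self.chain = []    # confirmed blocks; each block is a list of Tx
        self.mempool = {}  # txid -> Tx, unconfirmed transactions

    def spent_outpoints(self):
        """Composite view: spends on the chain overlaid with mempool spends."""
        spent = set()
        for block in self.chain:
            for tx in block:
                spent |= tx.inputs
        for tx in self.mempool.values():
            spent |= tx.inputs
        return spent

    def accept(self, tx):
        """First seen rule: refuse a transaction whose inputs are already spent."""
        if tx.inputs & self.spent_outpoints():
            return False
        self.mempool[tx.txid] = tx
        return True

    def reorg(self, fork_height, new_blocks):
        # Step 1: roll back, highest block first, returning every
        # transaction to the mempool as 'unconfirmed'.
        while len(self.chain) > fork_height:
            for tx in self.chain.pop():
                self.mempool[tx.txid] = tx
        # Step 2: roll forward along the new longest chain.
        for block in new_blocks:
            for tx in block:
                # Now confirmed, so it no longer belongs in the mempool.
                self.mempool.pop(tx.txid, None)
                # Evict any mempool transaction that double spends it.
                conflicts = [t.txid for t in self.mempool.values()
                             if t.inputs & tx.inputs]
                for txid in conflicts:
                    del self.mempool[txid]
            self.chain.append(block)


a = Tx("a", frozenset({"o1"}))
b = Tx("b", frozenset({"o2"}))
c = Tx("c", frozenset({"o3"}))

node = Node()
node.chain = [[a], [b]]              # b was confirmed on the losing branch
node.reorg(fork_height=1, new_blocks=[[c], []])
print(sorted(node.mempool))          # b is back in the mempool, not lost
print(node.accept(Tx("b2", frozenset({"o2"}))))  # double spend of b is refused
```

In this model a transaction from an orphaned block only disappears if the new chain confirms a conflicting spend; otherwise it simply waits in the mempool for the next block.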

Step 1 is probably the most important to note. What the node is doing is effectively extracting transactions from the orphaned blocks to ensure they do not get lost if they are not in the longer chain. BitMEX’s research confirmed this was the case in this event:

Based on the current most work Bitcoin Cash SV chain, all the TXIDs (except coinbase transactions) from the fork made it the main chain. Therefore no double spends appear to have occurred

The only way in which they could get lost is if the mempool reaches its limit. However, all the mining nodes** I know of are configured with a mempool of many GB, so it would take a massive re-org for this to happen. In the future we plan to write these transactions to disk so there is a fallback in case the mempool does get full in this rare case.

So the only likely change to a node’s composite view of transaction history occurs if the new longest chain contains transactions it hadn’t seen yet. Under usual conditions that should be quite rare, but even if it happens it doesn’t hurt anyone.

This understanding of how a node actually handles a re-org, coupled with the magic of the “first seen rule”, leads us to an interesting point.

Just ask the miners

The first seen rule is something I’ve talked about a lot in recent months. If a miner accepts a transaction they won’t accept a double spend of that transaction into their mempool, even if it’s got a higher fee. The only exception to this is if they accept a re-org to a chain that contains a double spend (which they may not in the future under certain conditions). So if you want to be sure a transaction won’t be double spent even in the face of a re-org, you can get a pretty certain answer by asking all of the miners. If you are a business that understands risk/reward you may not do this for all transactions, perhaps just a sample of higher value transactions. But if all the miners respond and tell you that they have accepted the transaction then, even if there is a re-org from one of those miners, your transaction will be on both sides of the fork. Yes, this technique does require a way to query the miners, one of which is the Merchant-to-Miner API that we currently have in development and that will be available sometime in Q3, but it demonstrates how simple this problem is to solve for those rare users that need to care about confirmations.
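
As a rough sketch of the check described above, assuming some way to ask each miner whether it has accepted a transaction: the `has_accepted` callback and the miner names here are hypothetical stand-ins, and nothing in this example reflects the actual interface of the Merchant-to-Miner API.

```python
def accepted_by_all(txid, miners, has_accepted):
    """Return True only if every known miner reports having accepted txid.

    has_accepted(miner, txid) -> bool is a hypothetical callback; in
    practice it would wrap whatever query interface the miners expose.
    """
    if not miners:
        return False  # nobody to ask, so no assurance
    return all(has_accepted(miner, txid) for miner in miners)


# Stub data standing in for real miner responses.
responses = {"miner_a": {"tx1", "tx2"}, "miner_b": {"tx1"}}
query = lambda miner, txid: txid in responses[miner]

print(accepted_by_all("tx1", list(responses), query))
print(accepted_by_all("tx2", list(responses), query))
```

A business sampling only higher value transactions would simply call this for the transactions it selects.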

What about double spends?

The above-mentioned technique is effective so long as there isn’t a dishonest miner lurking somewhere on the network. However, it doesn’t alter the risk of the double spend occurring; it just changes when you’re likely to know about it. In almost every case this would be the next block. An additional risk from longer re-orgs only becomes apparent if a malicious miner is secretly mining a long chain they plan to release later. They wouldn’t logically do this unless they had more than 51% of hash power. This is the attack Satoshi described in the whitepaper, and the assumption that 51% of nodes will always be honest is fundamental to the Bitcoin security proposition. So it is safe to conclude that the risk of a double spend occurring due to a re-org is no different to it occurring under normal circumstances.

In a later post I will explain why I think even this case is very unlikely to occur on the Bitcoin SV network once some changes are made that are coming in the very near future.

What can we learn from this?

About security

This would be a non-problem if the tools existed to monitor both chains at once (for exchanges etc). @nikitazh commented on Twitter that:

The network was basically stuck for 1.5 hours, and this shows that even 6 confirmations are not enough

https://twitter.com/nikzh/status/1118899374027878400

I will give Nikita the benefit of the doubt and assume that this was a misunderstanding rather than an attempt to mislead. For someone who saw both chains (the headers for both are broadcast so anyone can request the blocks), there was at most a two-block height difference. If you can see that your transaction is in both forks of the chain then it wouldn’t matter if the re-org was 100 blocks long. In the absence of another hidden chain you know your transaction is confirmed regardless of who wins.

It only matters to miners, who are the ones that get penalised for being on the wrong fork. And in fact this is a feature of the system. By motivating miners to upgrade their capacity to avoid this penalty, it gives them a consequent gain in capacity to accept and propagate transactions faster, which is fundamental to ensuring strong 0-conf security. This is the brilliance of Satoshi’s design. It takes the fundamental problems of distributed system design and, instead of trying to solve them all, combines them with incentives to turn them into a feature.

The key takeaway from this? “Forks don’t matter, just get your tooling right”. The current state of bitcoin is that many businesses rely on the node software too much for things it shouldn’t be doing. If you are genuinely concerned about confirmations (and from what I’ve seen exchanges are the only ones that actually are) then set up your system so that you can monitor both forks, then take the lowest number of confirmations out of the two as the number you base your decisions on. If you want a robust system to count confirmations in the face of forks then this is what is required. If you’re running an exchange handling tens of millions of dollars per day in volume this shouldn’t be a big ask of your developer teams. The alternative is to require more confirmations, which simply hands the responsibility back to the node except in the case where there is a fork longer than your required count.

Let’s not forget that a 24-block re-org happened back in 2013. In that case the cause was a software bug. But it could also happen due to a major network outage or any number of other reasons, so you should expect this to happen from time to time.
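
The advice above about monitoring both forks and taking the lowest confirmation count can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes each branch is given as the blocks mined since the fork point, oldest first, with each block reduced to a set of transaction ids. A transaction confirmed before the fork point would need the shared depth added on top.

```python
def confirmations(branch, txid):
    """Confirmations of txid on one branch, or 0 if it is not in that branch.

    branch is a list of blocks, oldest first, each block a set of txids.
    """
    for index, block in enumerate(branch):
        if txid in block:
            return len(branch) - index
    return 0


def safe_confirmations(branches, txid):
    """Lowest confirmation count across every branch being monitored."""
    return min((confirmations(branch, txid) for branch in branches), default=0)


branch_a = [{"t1"}, {"t2"}, set()]   # t1 is 3 deep on this branch
branch_b = [{"t2"}, {"t1"}]          # but only 1 deep on this one
print(safe_confirmations([branch_a, branch_b], "t1"))
```

Taking the minimum means a transaction that is missing from either fork counts as unconfirmed, which is the conservative answer an exchange would want while the race is still running.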

About service continuity

The one unfortunate thing that we did observe during this event was a degradation of some services that operate on top of the Bitcoin SV blockchain. The mining nodes** were all fine, albeit a bit stressed and slow to respond, which I’ll address shortly, but otherwise worked as expected. Some of the block explorers either stopped updating or became slow to respond, and some of unwriter’s services had problems processing the re-org.

Stress tests are useful in that they highlight these kinds of capacity constraints, not just in node software but in services that make use of the blockchain. We have a place where stress tests happen continuously: the Scaling Test Network (a.k.a. the STN). I would take this opportunity to invite any service that was affected to run a test instance of their service on the STN. This will highlight these capacity-related issues before they become an issue on mainnet and give those dev teams useful data to help them uncover performance bottlenecks. Incidentally, we’ll be making a formal invitation to developers and businesses about this in a few days.

Node performance

I’ve previously mentioned that the mining nodes** behaved as expected, albeit slowly. It has been well known for years that two blocks of the same size, one with a few large transactions and one with many small transactions, will have completely different performance characteristics, so this is no surprise. However, this event does highlight some issues with node performance, and analysis of the block times and sizes tends to confirm most of the conclusions we’ve drawn from our extensive research into not only node performance but also propagation performance of both transactions and blocks. Fortunately the fixes for these issues are already works in progress, some due for release in June and most of the remainder in the following release. Unfortunately the detail of these would be as long as this post already is, so I will leave explaining them for another post in the near future.

** footnote: In this article I have referenced "mining nodes" several times to distinguish the instances of Bitcoin SV that are providing block templates to miners from those instances that are not. It should be noted that mining is part of the definition of a node. Those instances of Bitcoin SV that are not involved in mining would more appropriately be called "fat wallets" or "blockchain listeners".