Learning from the LND failure on Lightning – Bitcoin Magazine
This is an opinion piece by Shinobi, a self-taught Bitcoin educator and tech-savvy Bitcoin podcast host.
On 9 October 2022, Burak from Bitmatrix (an exchange built on the Liquid Network) created and broadcast a transaction to the Bitcoin mainnet that spent a UTXO locked with a 998-of-999 Tapscript multisig. The transaction carried 998 individual signatures in its witness and was almost 0.1 MB in size. In an amusing twist, it reused the exact same public key for every one of the 999 participants in the multisig. The transaction caused massive disruption to the Lightning Network by exposing a bug in LND and btcd (an alternative Bitcoin full node implementation).
The whole purpose of this transaction was to demonstrate the improved scalability of multisig scripts that Taproot enables. Even without Schnorr-signature-based MuSig protocols, Taproot allows much larger multisig participant sets than previous versions of Bitcoin Script. The previous size limits on multisig become a nuanced discussion if you dive into every possible way of constructing multisig with Bitcoin Script, so for the sake of simplicity I'll only discuss the limits that applied to Pay-to-Script-Hash (P2SH) and Pay-to-Witness-Script-Hash (P2WSH) multisig constructions. With a standard P2SH multisig, the maximum number of participants is only 15, and with a standard P2WSH multisig, the maximum is 20. These limits stem from how large a script is allowed to be under each of these script types, and from caps on how many operations may be executed within a single script. Violating one of these limits invalidates a transaction.
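The 15 and 20 figures fall out of some simple byte counting. The sketch below is a rough illustration, assuming compressed 33-byte public keys and the usual `<m> <pubkeys...> <n> OP_CHECKMULTISIG` script. It takes as given the 520-byte stack element limit (which caps a P2SH redeem script, since that script is pushed as a single element), the 20-key limit of OP_CHECKMULTISIG, and Bitcoin Core's 3,600-byte standardness limit on P2WSH witness scripts. The helper names are illustrative and do not come from any library.

```python
# Byte-counting sketch for the classic multisig participant caps.
# Assumes compressed 33-byte keys and an <m> <keys...> <n> OP_CHECKMULTISIG script.

MAX_SCRIPT_ELEMENT_SIZE = 520          # consensus: max size of one stack element (P2SH redeem script is pushed as one)
MAX_PUBKEYS_PER_MULTISIG = 20          # consensus: OP_CHECKMULTISIG accepts at most 20 keys
MAX_STANDARD_P2WSH_SCRIPT_SIZE = 3600  # Bitcoin Core relay policy for P2WSH witness scripts


def small_int_push_size(v: int) -> int:
    """Bytes needed to push a small non-negative integer (this sketch covers 0..127)."""
    if 0 <= v <= 16:
        return 1  # OP_0, OP_1 .. OP_16
    if v <= 127:
        return 2  # one-byte data push: length byte + value byte
    raise ValueError("this sketch only handles values up to 127")


def multisig_script_size(m: int, n: int) -> int:
    """Size in bytes of <m> <pk_1> ... <pk_n> <n> OP_CHECKMULTISIG."""
    pubkey_push = 1 + 33  # push opcode + compressed public key
    return small_int_push_size(m) + n * pubkey_push + small_int_push_size(n) + 1


def max_participants(size_limit: int) -> int:
    """Largest n for which a 1-of-n script respects both the size and key-count limits."""
    n = 0
    while n + 1 <= MAX_PUBKEYS_PER_MULTISIG and multisig_script_size(1, n + 1) <= size_limit:
        n += 1
    return n


print(multisig_script_size(1, 15))                       # 513
print(max_participants(MAX_SCRIPT_ELEMENT_SIZE))         # 15
print(max_participants(MAX_STANDARD_P2WSH_SCRIPT_SIZE))  # 20
```

For P2SH the 520-byte limit binds first (16 keys would need 547 bytes), while a P2WSH script has plenty of room and the 20-key cap of OP_CHECKMULTISIG binds instead.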
With the activation of Taproot, these script size limits were removed for Tapscript, so the size of a Taproot script is bounded essentially by the block size limit itself. This is where the problem with LND and btcd comes in. The consensus rules implemented in btcd correctly removed the script size limits, but btcd's peer-to-peer code also enforced its own size checks as a second layer of defense for node operators. Blocks and transactions would pass through a kind of "pre-consensus" validation before ever reaching the core consensus code that performs the proper validation, the logic being that double-checking adds an extra layer of defense against invalid blocks or transactions. This code was never properly updated to remove the script size limits, and it continued to enforce the old SegWit script limits against Taproot transactions. So while btcd's actual consensus code would have correctly validated this very large Taproot transaction, the block containing it was never passed from the peer-to-peer layer to the consensus validation logic, which meant that every btcd node stalled at the block containing Burak's transaction.
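To see why an old size limit trips over this transaction, the sketch below estimates the size of a 998-of-999 tapscript and runs it through a two-layer pipeline. It assumes the common BIP 342 layout `<key> OP_CHECKSIG <key> OP_CHECKSIGADD ... <m> OP_NUMEQUAL` (Burak's actual script may have been laid out differently) and uses the pre-Taproot 10,000-byte script size limit as a stand-in for the stale check. This is a hypothetical model, not btcd's actual code, and all function names are invented for illustration.

```python
# Simplified model of the two validation layers described above.
# Hypothetical illustration only; this is not btcd's actual code or constants.

MAX_SCRIPT_SIZE = 10_000  # pre-Taproot consensus limit on script size (legacy and SegWit v0)


def script_num_push_size(v: int) -> int:
    """Bytes needed to push a non-negative integer as a minimally encoded script number."""
    if 0 <= v <= 16:
        return 1  # OP_0, OP_1 .. OP_16
    body = v.to_bytes((v.bit_length() + 7) // 8, "little")
    if body[-1] & 0x80:
        body += b"\x00"  # extra byte so the number is not read as negative
    return 1 + len(body)  # push opcode + data


def tapscript_multisig_size(m: int, n: int) -> int:
    """Size of <key> CHECKSIG <key> CHECKSIGADD ... <m> NUMEQUAL with 32-byte x-only keys."""
    per_key = 1 + 32 + 1  # push opcode + key + OP_CHECKSIG or OP_CHECKSIGADD
    return n * per_key + script_num_push_size(m) + 1  # +1 for OP_NUMEQUAL


def stale_p2p_check(script_len: int) -> bool:
    # The modelled bug: the old limit is applied whatever the script version.
    return script_len <= MAX_SCRIPT_SIZE


def consensus_check(script_len: int, is_tapscript: bool) -> bool:
    # BIP 342 drops the 10,000-byte limit for tapscript; only this rule is modelled.
    return is_tapscript or script_len <= MAX_SCRIPT_SIZE


def process_block(script_len: int, is_tapscript: bool) -> str:
    if not stale_p2p_check(script_len):
        return "rejected before consensus: node stalls"
    return "valid" if consensus_check(script_len, is_tapscript) else "invalid"


size = tapscript_multisig_size(998, 999)
print(size)                                      # 33970
print(consensus_check(size, is_tapscript=True))  # True
print(process_block(size, is_tapscript=True))    # rejected before consensus: node stalls
```

The consensus layer would have said yes, but it is never asked: the earlier, stricter layer decides the outcome on its own.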
Why did this affect LND, given that many people run Bitcoin Core underneath their LND instance? Because LND uses the same code as btcd to receive and process blocks. So even if your LND node was running on top of Bitcoin Core, which correctly validated the block in question and did not stall, your LND instance would have refused to accept that block and stalled, even though the full node underneath it kept running correctly.
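This failure mode can be pictured as a chain follower that re-parses each block its backend hands over, using parsing code shared with another node implementation. In the sketch below, the `Block` type, the `SHARED_PARSER_LIMIT` value, and the block heights and sizes are all hypothetical; the point is only that one rejected block freezes the Lightning node's view while the backend keeps going.

```python
# Hypothetical model: a Lightning node re-parses every block its backend
# provides, using parsing code shared with a node implementation.
from dataclasses import dataclass


@dataclass
class Block:
    height: int
    largest_witness_item: int  # bytes


SHARED_PARSER_LIMIT = 10_000  # hypothetical stale limit in the shared parsing code


def shared_parse(block: Block) -> bool:
    return block.largest_witness_item <= SHARED_PARSER_LIMIT


def lightning_synced_height(backend_blocks: list[Block]) -> int:
    """Return the last height the Lightning node processed; the backend has them all."""
    synced = backend_blocks[0].height - 1
    for block in backend_blocks:
        if not shared_parse(block):
            break  # stuck here: later blocks are never processed either
        synced = block.height
    return synced


chain = [Block(100, 500), Block(101, 33_970), Block(102, 600)]  # hypothetical heights and sizes
print(lightning_synced_height(chain))  # 100, while the backend has validated up to 102
```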
This bug was fixed very quickly, and as far as I know, it was not actively exploited in a way that caused any harm, but it exposed every LND node on the Lightning Network to potential theft of channel funds unless it used an external watchtower. Because the node was stuck at that block, it no longer had a real-time view of the blockchain. If a channel counterparty had broadcast an old channel state, the node would have been completely unaware of it and unable to respond with the appropriate penalty transaction to secure the user's funds. This was a very serious bug that put a huge percentage of the bitcoin on the Lightning Network at risk of theft unless users manually patched and updated their nodes, or personally monitored their channels so they could respond manually if a channel was closed with an outdated state. I would venture that the vast majority of non-technical node operators could not have done that.
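The danger can be made concrete with the relative timelock (`to_self_delay` in the BOLT specifications) on the cheating party's own output in a revoked commitment transaction: the victim's penalty transaction has to be mined before that timelock lets the cheater sweep the funds. The sketch below assumes a penalty can be mined in the block right after the node notices the breach, ignores fees and mempool congestion, and uses made-up heights and a 144-block delay purely as example values.

```python
# Sketch: can a node that notices a revoked commitment late still punish it?
# Assumes the penalty is mined in the block right after the node notices the
# breach; ignores fees, congestion and the exact CSV edge semantics.


def can_still_punish(revoked_confirm_height: int, to_self_delay: int, noticed_at_height: int) -> bool:
    # Earliest block in which the cheater can sweep its CSV-delayed output.
    cheater_unlock = revoked_confirm_height + to_self_delay
    # Earliest block in which the victim's penalty transaction can be mined.
    earliest_penalty = noticed_at_height + 1
    return earliest_penalty < cheater_unlock


TO_SELF_DELAY = 144  # example value only; real channels negotiate their own delay
print(can_still_punish(1_000, TO_SELF_DELAY, noticed_at_height=1_000))  # True: healthy node
print(can_still_punish(1_000, TO_SELF_DELAY, noticed_at_height=1_200))  # False: node stalled too long
```

A node stuck at an earlier block never reaches the height at which it would notice the breach, so once the delay runs out, the penalty window is gone.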
Fortunately, this issue was not widely exploited, but had the bug been discovered in the codebase before Burak's transaction was mined, bad actors could have exploited it deliberately and very tactically. An individual or group could easily have opened a large number of channels on the network, swapped all the money in those channels back out to themselves on-chain through submarine swaps so that all the channel funds sat on the other side, then broadcast a large Taproot transaction like Burak's and immediately closed their channels using outdated states. The victims would not even have been aware of it, and even if they were, given the relatively low technical expertise of many node operators, most of them very likely could not have responded in time with a penalty transaction.
This bug highlights two important issues. First, multiple independent implementations of Bitcoin nodes can be very dangerous. Fortunately, almost no one runs btcd as a node for anything serious, so its effect on the Bitcoin base layer was negligible, apart from a very small handful of individuals whose nodes simply stalled. Had miners been running btcd, this could very easily have caused a chain split on the Bitcoin network, leaving all btcd operators on a minority chain that would have required manual intervention to correct. Second, layers built on top of the base network must implement consensus checks very carefully. This is a tricky problem: while any Lightning node running on top of a Bitcoin full node could in theory outsource 100% of this validation to that node, not all Lightning nodes use their own trusted full node. That is unlikely to change, as many users will in all likelihood continue to operate nodes this way, so to some extent Lightning implementations must themselves support checks on some or all of the Bitcoin consensus rules.
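The chain split scenario can be illustrated with a toy fork-choice rule in which each node follows the longest branch made up only of blocks it considers valid. This sketch uses branch length as a stand-in for accumulated proof of work, assumes the miners running the stale rules are a minority of hash power, and models only the single rule at issue; the names and block layout are hypothetical.

```python
# Toy fork-choice model: each node follows the longest branch whose blocks it
# considers valid. Branch length stands in for accumulated proof of work.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Block:
    height: int
    has_large_tapscript: bool


def correct_rules(block: Block) -> bool:
    return True  # Taproot rules applied correctly (only this rule is modelled)


def stale_rules(block: Block) -> bool:
    return not block.has_large_tapscript  # hypothetical stale size check


def choose_branch(branches: dict[str, list[Block]], accepts: Callable[[Block], bool]) -> str:
    valid = {name: len(blocks) for name, blocks in branches.items() if all(accepts(b) for b in blocks)}
    return max(valid, key=valid.get)


# Blocks after the fork point; the majority of hash power mines the first branch.
branches = {
    "majority": [Block(1, True), Block(2, False), Block(3, False)],
    "minority": [Block(1, False), Block(2, False)],
}
print(choose_branch(branches, correct_rules))  # majority
print(choose_branch(branches, stale_rules))    # minority
```

Both groups see the same blocks, yet they settle on different tips, and nothing in the fork-choice rule itself will bring them back together.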
Going forward, I hope this is a wake-up call about how important it is to keep consensus validation checks in sync across all the software in this space, because without that consistency there is no single coherent Bitcoin network. Everyone should be very glad this did not result in a massive network-wide exploit, but people should understand how serious this problem could have been had things not worked out the way they did.
This is a guest post by Shinobi. Opinions expressed are entirely their own and do not necessarily reflect the opinions of BTC Inc or Bitcoin Magazine.