Those of us who have spent our share of time on the BitcoinTalk Forums know full well what an orange ignore button means. That would be an indicator that the person it appears under is one of the most ignored people on the entirety of the forums – and few proud owners of an orange ignore button are more famous than Atlas. The one service he does for the community, however, appears to be pointing out how the technically semi-literate in the crowd are likely to misunderstand what the devs are talking about and have the most panicky and fearful reaction to it possible. This brings us to the topic of today’s article: Ultraprune.
Bitcoin has a serious usability problem. A lot is being done to remedy that problem but some of the biggest issues seem almost built in to the core concepts. One of the biggest issues is the time (and space) it takes to download the block chain – brand new users wanting to use the main client have to wait hours, even days before they can actually process a transaction. This leads to heavier adoption of eWallets, “selfish” clients and other systems designed to circumvent the downloading of the block chain. Problem is, these clients do the network itself no good since any user not holding the block chain can’t fully validate transactions or send blocks to other clients. Something has to be done.
And something is being done. There’s an experimental version of the Bitcoin client referred to as the “ultraprune branch” currently being developed. It seems like it should be a great thing, but it’s got a few users, including the aforementioned Atlas, scared and screaming. The name alone seems imposing – the presence of the word “prune” seems to indicate that the intent is to hack off chunks of data to reduce the overall size of the block chain, and users should be fearful of any scheme to reduce size by deleting data, but that’s not actually what ultraprune does. There seems to be a great deal of confusion about what ultraprune is and how it works, so let’s try to figure the whole thing out, shall we?
In my In Plain English series I do my best to explain the often-difficult and highly technical concepts behind Bitcoin (and potentially other things, in the future) without using a lot of jargon or complicated concepts. I aim to break the problem down into simple easy-to-swallow concepts. Where I find the use of certain technical language unavoidable, I do my best to define those few terms well. If there is any confusion or incorrect information please let me know in the comments and I’ll do my best to remedy it.
To understand ultraprune, we have to understand a few things about how Bitcoin actually works. Bitcoin, technically, is less like a currency and more like a ledger. Bitcoin keeps track of every transaction everyone makes and stores it in a database, commonly referred to as the “block chain.” Every Bitcoin user has a copy of this database, and that’s how you verify that a payment is correct – when someone sends you money, your computer and every other computer in the network look at the transaction database and make sure the sender has enough money, properly signed the transaction and so on. If anything about the transaction isn’t quite right, everyone rejects it. To get this database to everyone, it is broken down into “blocks” – small chunks of data that can be easily distributed. Every 10 minutes or so, all of the new transactions that happened in that time are bundled up into a new block and distributed to the network. When you first install Bitcoin and wait ages for the “synchronizing with network” progress bar to crawl across the screen, this is what’s happening: you’re receiving fragments of the database from other users and piecing them together. That’s also why you can’t send or receive bitcoins until the synchronization finishes – without the transaction database you have no way of knowing whether any transaction you could perform is valid.
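For the programmers in the audience, here is a minimal sketch of the “ledger” idea in Python. It is a deliberate simplification and not how the real client works: the block contents, the names and the account-style balances are all made up for illustration, and signature checking is left out entirely. It only shows why a client holding part of the history can’t tell whether a payment is good.

```python
from collections import defaultdict

# Hypothetical toy ledger: each block is a list of (sender, receiver, amount).
# A sender of None stands in for newly created coins.
BLOCKS = [
    [(None, "alice", 50)],
    [("alice", "bob", 20)],
    [("bob", "carol", 5), ("alice", "carol", 10)],
]

def balances_from(blocks):
    """Replay every block in order to work out who holds what."""
    balances = defaultdict(int)
    for block in blocks:
        for sender, receiver, amount in block:
            if sender is not None:
                balances[sender] -= amount
            balances[receiver] += amount
    return balances

def is_valid(blocks, sender, amount):
    """Accept a payment only if the replayed history shows enough funds."""
    return balances_from(blocks)[sender] >= amount

print(is_valid(BLOCKS, "bob", 15))      # True: bob received 20 and spent 5
print(is_valid(BLOCKS, "bob", 16))      # False: that is more than bob holds
print(is_valid(BLOCKS[:1], "bob", 15))  # False: with only the first block, bob's funds are unknown
```

The last line is the “still synchronizing” situation: the payment is perfectly fine, but a client that has only seen the first block has no way to know that.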
It should be pretty obvious that the record of every transaction ever made is pretty big and takes a lot of blocks, a lot of time and a lot of hard disk space to download and store. What’s more, it’s starting to take a fair amount of heavy lifting to sift through all that data to find relevant details for any given transaction – this thing is bulky. But this is basically the same sort of thing that banks do, and they don’t seem to have these problems, so there must be a way around it, right? In fact, there are two: use better databases and do most of your work referring only to balances.
Your bank almost certainly has a long and arduous record of every transaction you’ve ever made and could print out that record for an accountant to peruse, if you wanted. But for the most part, that level of detail isn’t needed to just perform transactions. When you swipe your bank card, all you, the merchant and the bank need to know is what your balance is right now. Bitcoin needs that same information and accesses it in much the same way. Many (most) of the “blocks” making up the transaction database store only historical records of what happened some time ago, but some store transactions with “unspent outputs” – meaning that in that block someone sent money to some address and the money at that address hasn’t been spent yet – they’re holding a balance. At any given time only a tiny fraction of the block chain actually holds these kinds of transactions, so there are a couple of clever things we could do with this information:
- Build an ultra-lightweight Bitcoin client that only downloads blocks containing current balances
- Separate the relevant blocks from the others so that the working set needed for validation is small, fast and portable, without compromising the integrity of the full block chain
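To make “unspent outputs” a little more concrete, here is a rough Python sketch. The transaction layout, ids and addresses are hypothetical and far simpler than the real format; the point is only that spending an output removes it from the working set, so what remains is the list of current balances.

```python
# Hypothetical transactions: "inputs" point at earlier outputs as (txid, index),
# "outputs" are (address, amount) pairs.
TXS = [
    {"id": "t1", "inputs": [], "outputs": [("alice", 50)]},
    {"id": "t2", "inputs": [("t1", 0)], "outputs": [("bob", 20), ("alice", 30)]},
    {"id": "t3", "inputs": [("t2", 0)], "outputs": [("carol", 20)]},
]

def unspent_outputs(txs):
    """Return {(txid, index): (address, amount)} for outputs nobody has spent."""
    unspent = {}
    for tx in txs:
        for ref in tx["inputs"]:
            # Spending an output removes it from the set
            # (raises KeyError if it is missing or already spent).
            del unspent[ref]
        for index, output in enumerate(tx["outputs"]):
            unspent[(tx["id"], index)] = output
    return unspent

def balance(unspent, address):
    """Add up the unspent outputs held by one address."""
    return sum(amount for owner, amount in unspent.values() if owner == address)

utxo = unspent_outputs(TXS)
print(len(utxo))               # 2: four outputs were created, two are still unspent
print(balance(utxo, "alice"))  # 30
print(balance(utxo, "bob"))    # 0: bob passed his 20 on to carol
```

Three transactions created four outputs, but only two of them matter for deciding what anyone can spend today. The rest is history.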
The development team are working on idea #2 – basically storing balance data in a separate database from the “detailed” data, allowing clients to both start working faster and dig through less information to get to the relevant bits while retaining the ability to dig through historical information. The fear of many is that someone will implement method #1.
Why is that so scary? Well, a big part of what makes Bitcoin different from previous e-currencies is that it’s distributed: everyone has a copy of everything the network needs to run and that makes it very resilient and very hard to shut down. If only a partial set of blocks is downloaded, only that partial set can be shared with others and that weakens the integrity of the network. Of course this is already a problem because in attempts to bypass the sometimes days-long download of the block chain, many people are opting for services that host wallets on their behalf or using “selfish” clients on Android or iOS (iPhone) platforms, none of which contribute much of anything to the network. I personally say that even carrying a partial block chain would be better than carrying none at all, but definitely share the concerns of those worried about such an implementation. It should also be noted that at no point is anything actually deleted from the block chain; such clients would simply choose not to download large portions of it.
Worry not, however, because (at this time) no such thing is actually being planned. A lot of time, effort and discussion stands between us and any sort of ultraprune implementation that doesn’t also store the whole of the block chain. For now, the developers intend to create two separate databases: one small, lean, fast database that stores only the “unspent outputs” – balances, basically – and another that stores everything the block chain database has traditionally stored. This should allow us to perform transactions much faster after a brand new install and should keep the processing power needed to dig through the database much lower without actually hurting the integrity of the network. Undoubtedly someone will build a client that only downloads the relevant blocks, but my money says it’ll be geared toward mobile devices where storage space is limited – and truthfully, this would actually be an upgrade from the way mobile devices currently handle things, storing only blocks relevant to their own addresses.
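If it helps to see the two-database idea as code, here is a toy Python sketch. To be clear, this is my own illustration and not the ultraprune code: the `Node` class, its field names and the transaction layout are all invented. It shows new blocks being checked against the small “unspent” store only, while the full archive keeps every block and never has anything removed from it.

```python
class Node:
    """Toy node with two stores: the full archive and a small unspent index."""

    def __init__(self):
        self.archive = []   # every accepted block, kept forever
        self.unspent = {}   # (txid, index) -> (address, amount)

    def accept_block(self, block):
        """Check a block against the unspent index only, then record it in both stores."""
        staged = dict(self.unspent)  # work on a copy so a bad block changes nothing
        for tx in block:
            for ref in tx["inputs"]:
                if ref not in staged:
                    return False     # unknown or already-spent output: reject the block
                del staged[ref]
            for index, output in enumerate(tx["outputs"]):
                staged[(tx["id"], index)] = output
        self.unspent = staged
        self.archive.append(block)
        return True

node = Node()
node.accept_block([{"id": "t1", "inputs": [], "outputs": [("alice", 50)]}])
node.accept_block([{"id": "t2", "inputs": [("t1", 0)],
                    "outputs": [("bob", 20), ("alice", 30)]}])
node.accept_block([{"id": "t3", "inputs": [("t2", 0)], "outputs": [("carol", 20)]}])

# Someone tries to spend alice's original 50 a second time.
double_spend = [{"id": "t4", "inputs": [("t1", 0)], "outputs": [("mallory", 50)]}]
print(node.accept_block(double_spend))  # False
print(len(node.archive))                # 3: the full history is still all there
print(len(node.unspent))                # 2: the working set stays small
```

The double spend is caught without looking at the archive at all, which is where the speed-up would come from in a design like this.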
Oh, and remember how I said banks did two things differently? The devs are upgrading Bitcoin’s database software to something a little faster/beefier too.
So there you have it, ultraprune in a (hopefully) simplified nutshell. There are doubtless some technical concerns, though honestly more of those surround the swapping out of database formats than the actual ultraprune code itself and none of them are quite as bad as Atlas and friends would have you believe. Whatever difficulties we may encounter, it is this author’s belief that ultraprune, as it is conceptualized now, will actually improve the decentralization of Bitcoin since many folks are currently using eWallets or other centralized services simply because they were unwilling to wait hours or days for the block chain to download. Alleviate that concern and I think a great deal more users will switch to the main client, which is great for Bitcoin.