Ultraprune In Plain English

Merkle Tree
This entry is part 4 of 6 in the series In Plain English

Those of us who have spent our share of time on the BitcoinTalk Forums know full well what an orange ignore button means. That would be an indicator that the person it appears under is one of the most ignored people on the entirety of the forums – and few proud owners of an orange ignore button are more famous than Atlas. The one service he does for the community, however, appears to be pointing out how the technically semi-literate in the crowd are likely to misunderstand what the devs are talking about and have the most panicky and fearful reaction to it possible. This brings us to the topic of today’s article: Ultraprune.

Bitcoin has a serious usability problem. A lot is being done to remedy that problem but some of the biggest issues seem almost built in to the core concepts. One of the biggest issues is the time (and space) it takes to download the block chain – brand new users wanting to use the main client have to wait hours, even days before they can actually process a transaction. This leads to heavier adoption of eWallets, “selfish” clients and other systems designed to circumvent the downloading of the block chain. Problem is, these clients do the network itself no good since any user not holding the block chain can’t fully validate transactions or send blocks to other clients. Something has to be done.

And something is being done. There’s an experimental version of the Bitcoin client referred to as the “ultraprune branch” currently being developed. It seems like it should be a great thing, but it’s got a few users, including the aforementioned Atlas, scared and screaming. The name alone seems imposing – the presence of the word “prune” seems to indicate that the intent is to hack off chunks of data to reduce the overall size of the block chain, and users should be fearful of any scheme to reduce size by deleting data. But that’s not actually what ultraprune does. There seems to be a great deal of confusion about what ultraprune is and how it works, so let’s try to figure the whole thing out, shall we?

In my In Plain English series I do my best to explain the often-difficult and highly technical concepts behind Bitcoin (and potentially other things, in the future) without using a lot of jargon or complicated concepts. I aim to break the problem down into simple easy-to-swallow concepts. Where I find the use of certain technical language unavoidable, I do my best to define those few terms well. If there is any confusion or incorrect information please let me know in the comments and I’ll do my best to remedy it.

To understand ultraprune, we have to understand a few things about how Bitcoin actually works. Bitcoin, technically, is less like a currency and more like a ledger. Bitcoin keeps track of every transaction everyone makes and stores it in a database, commonly referred to as the “block chain.” Every Bitcoin user has a copy of this database, and that’s how you verify that a payment is correct – when someone sends you money, your computer and every computer in the network looks at the transaction database and makes sure the sender has enough money, properly signed the transaction and so on. If anything about the transaction isn’t quite right, everyone rejects it. To get this database to everyone, it is broken down into “blocks” – small chunks of data that can be easily distributed. Every 10 minutes or so, all of the new transactions that happened in that time are bundled up into a new block and distributed to the network. When you first install Bitcoin and wait ages for the “synchronizing with network” progress bar to crawl across the screen, this is what’s happening: you’re receiving fragments of the database from other users and piecing them together. That’s also why you can’t send or receive bitcoins until the synchronization finishes – without the transaction database you have no way of knowing whether any transaction you could perform is valid.
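For the programmers in the audience, the ledger idea can be sketched in a few lines of Python. This is only a toy under loud assumptions: the names `balances_from_blocks` and `is_valid_payment` are made up for illustration, each transfer is a simple (sender, receiver, amount) record, and signature checking is left out entirely. It is not how the real client stores or checks anything; it just shows why a node needs the whole history before it can say whether a payment is good.

```python
from collections import defaultdict

def balances_from_blocks(blocks):
    """Replay every transfer in every block to get each user's balance."""
    balances = defaultdict(int)
    for block in blocks:
        for sender, receiver, amount in block:
            if sender is not None:       # None marks newly issued coins (toy assumption)
                balances[sender] -= amount
            balances[receiver] += amount
    return balances

def is_valid_payment(blocks, sender, amount):
    """Accept a payment only if the sender's replayed balance covers it."""
    return amount > 0 and balances_from_blocks(blocks)[sender] >= amount

# Three tiny "blocks", each a bundle of transfers.
chain = [
    [(None, "alice", 50)],
    [("alice", "bob", 20)],
    [("bob", "carol", 5), ("alice", "carol", 10)],
]

print(is_valid_payment(chain, "alice", 20))  # True: alice has 20 left
print(is_valid_payment(chain, "alice", 21))  # False: one coin short
```

Notice that every check walks the entire chain from the first block. That is fine for three blocks and painful for years of them.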

It should be pretty obvious that the record of every transaction ever made is pretty big and takes a lot of blocks, a lot of time and a lot of hard disk space to download and store. What’s more, it’s starting to take a fair amount of heavy lifting to sift through all that data to find relevant details for any given transaction – this thing is bulky. But this is basically the same sort of thing that banks do, and they don’t seem to have these problems, so there must be a way around it, right? In fact, there are two: use better databases and do most of your work referring only to balances.

Your bank almost certainly has a long and arduous record of every transaction you’ve ever made and could print out that record for an accountant to peruse, if you wanted. But for the most part, that level of detail isn’t needed to just perform transactions. When you swipe your bank card, all you, the merchant and the bank need to know is what your balance is right now. Bitcoin needs that same information and accesses it in much the same way. Many (most) of the “blocks” making up the transaction database store only historical records of what happened some time ago, but some store transactions with “unspent outputs” – meaning that in that block someone sent money to some address and the money at that address hasn’t been spent yet – they’re holding a balance. At any given time only a tiny fraction of the block chain actually holds these kinds of transactions, so there are a couple of clever things we could do with this information:

  • Build an ultra-lightweight Bitcoin client that only downloads blocks containing current balances
  • Separate the relevant blocks from the others so that the working set needed for validation is small, fast and portable, without compromising the integrity of the full block chain
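To make “unspent outputs” a little more concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the `Tx` record, the `unspent_outputs` function and the three-transaction history are invented, and real transactions carry scripts and signatures that are ignored here. The point is only that, after replaying the history, what remains is a small set of outputs nobody has spent yet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple    # (txid, output_index) pairs this transaction spends
    outputs: tuple   # (address, amount) pairs this transaction creates

def unspent_outputs(blocks):
    """Replay the history, keeping only outputs that were never spent."""
    utxo = {}
    for block in blocks:
        for tx in block:
            for ref in tx.inputs:
                if ref not in utxo:
                    raise ValueError(f"{tx.txid} spends unknown or spent output {ref}")
                del utxo[ref]                       # spent: no longer a balance
            for index, out in enumerate(tx.outputs):
                utxo[(tx.txid, index)] = out        # new output, unspent for now
    return utxo

history = [
    [Tx("t1", (), (("alice", 50),))],                          # coins issued to alice
    [Tx("t2", (("t1", 0),), (("bob", 20), ("alice", 30)))],    # alice pays bob, keeps change
    [Tx("t3", (("t2", 0),), (("carol", 20),))],                # bob pays carol
]

utxo = unspent_outputs(history)
print(utxo)  # {('t2', 1): ('alice', 30), ('t3', 0): ('carol', 20)}
```

Four outputs were created in this toy history, but only two are still unspent, and those two are all a validator needs to know about current balances.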

The development team are working on idea #2 – basically storing balance data in a separate database from the “detailed” data, allowing clients to both start working faster and dig through less information to get to the relevant bits while retaining the ability to dig through historical information. The fear of many is that someone will implement method #1.

Why is that so scary? Well, a big part of what makes Bitcoin different from previous e-currencies is that it’s distributed: everyone has a copy of everything the network needs to run, and that makes it very resilient and very hard to shut down. If only a partial set of blocks is downloaded, only that partial set can be shared with others, and that weakens the integrity of the network. Of course this is already a problem because in attempts to bypass the sometimes days-long download of the block chain, many people are opting for services that host wallets on their behalf or using “selfish” clients on Android or iOS (iPhone) platforms, none of which contribute much of anything to the network. I personally say that even carrying a partial block chain would be better than carrying none at all, but definitely share the concerns of those worried about such an implementation. It should also be noted that at no point is anything actually deleted from the block chain; such clients would simply choose not to download large portions of it.

Worry not, however, because (at this time) no such thing is actually being planned. A lot of time, effort and discussion stands between us and any sort of ultraprune implementation that doesn’t also store the whole of the block chain. For now, the developers intend to create two separate databases: one small, lean, fast database that stores only the “unspent outputs” (balances, basically) and another that stores everything the block chain database has traditionally stored. This should allow us to perform transactions much faster after a brand new install and should keep the processing power needed to dig through the database much lower without actually hurting the integrity of the network. Undoubtedly someone will build a client that only downloads the relevant blocks, but my money says it’ll be geared toward mobile devices where storage space is limited – and truthfully, this would actually be an upgrade from the way mobile devices currently handle things, storing only blocks relevant to their own addresses.
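The two-database arrangement can be sketched roughly like this in Python. Treat it as a thought experiment rather than the developers’ actual design: the `ToyNode` class, its `accept_block` method and the tuple layout of a transaction are all hypothetical, and signatures are not checked. What it tries to show is the division of labour: new blocks are validated against the small set of unspent outputs, while every accepted block is still kept in full so it can be handed to other nodes.

```python
class ToyNode:
    def __init__(self):
        self.blocks = []   # full history, kept so it can be served to peers
        self.utxo = {}     # compact set: (txid, index) -> (address, amount)

    def accept_block(self, block):
        """Validate a block using only the compact set, then store it in full.

        A block is a list of (txid, inputs, outputs) tuples. A transaction
        with no inputs is treated as newly issued coins (toy assumption).
        """
        staged = dict(self.utxo)            # work on a copy so a bad block changes nothing
        for txid, inputs, outputs in block:
            spent = 0
            for ref in inputs:
                if ref not in staged:       # unknown or already-spent output
                    return False
                spent += staged.pop(ref)[1]
            created = sum(amount for _, amount in outputs)
            if inputs and created > spent:  # cannot create more than was spent
                return False
            for index, out in enumerate(outputs):
                staged[(txid, index)] = out
        self.utxo = staged
        self.blocks.append(block)
        return True

node = ToyNode()
print(node.accept_block([("t1", [], [("alice", 50)])]))                          # True
print(node.accept_block([("t2", [("t1", 0)], [("bob", 20), ("alice", 30)])]))    # True
print(node.accept_block([("t3", [("t1", 0)], [("mallory", 50)])]))               # False: t1's output is already spent
print(len(node.blocks), len(node.utxo))                                          # 2 2
```

The double-spend attempt is rejected without touching the stored history at all, which is the speed-up in a nutshell, and nothing has been thrown away: both accepted blocks are still sitting in `node.blocks`.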

Oh, and remember how I said banks did two things differently? The devs are upgrading Bitcoin’s database software to something a little faster/beefier too.

So there you have it, ultraprune in a (hopefully) simplified nutshell. There are doubtless some technical concerns, though honestly more of those surround the swapping out of database formats than the actual ultraprune code itself and none of them are quite as bad as Atlas and friends would have you believe. Whatever difficulties we may encounter, it is this author’s belief that ultraprune, as it is conceptualized now, will actually improve the decentralization of Bitcoin since many folks are currently using eWallets or other centralized services simply because they were unwilling to wait hours or days for the block chain to download. Alleviate that concern and I think a great deal more users will switch to the main client, which is great for Bitcoin.


Comments

  1. Can you explain to me, in plain English, why clients never seem to get more than 8 connections? And wouldn't having more connections lead to an increased blockchain download rate?

  2. cypherdoc says:

    great explanation. you've found your niche!

  3. Would it be possible to actually delete useless nodes in the "complete" database? I mean, the nodes that are important are only the nodes that hold a balance and nodes that are in the middle of nodes that hold a balance. What are the dangers of deleting some transactions that have happened but are now completely irrelevant?

    • I can't see any reason why you wouldn't be able to, with some modifications to the client of course. The consequences of you, an individual doing this, are fairly trivial – you just won't have copies of those blocks to send to anyone else requesting them. The consequences of everyone or even a large minority doing this is that those blocks become significantly less available to the entire network, increasing the chance of data loss, increasing the time taken to download the whole block chain and potentially creating gaps in the data that could be quite problematic.

      Basically, your client is free to keep or discard whatever data you like, but discarding data reduces your client's usefulness to the network and making a widespread practice of discarding data decreases the usefulness of each network node on average.

  4. I guess this relates to your more recent "stats" posting too but surely a wider adoption that would heavily tilt the balance of the user-base into the "technophobe" camp would also mean that a centralised storage option is preferable to most participants?

    In general, when it comes to protecting things of value on their computers, the majority of people are somewhat careless. Whether it's just laziness and/or just a lack of trust in their own technical ability, cloud storage makes more sense for most. It's quite possible (if not probable) that a more general Bitcoin audience would prefer to trust a third party to maintain their Wallet for them.

    Do you imagine a future compromise whereby hosted wallets can also better contribute to the integrity of the whole network?

    • I think it's a double-edged sword, not by happenstance, but inherently. Any tendency in humans to abdicate the responsibility of protecting oneself from people-without-integrity (may I call them "politicians"?) will eventually lead to a win for such exploiters. These wins pile up and lead to, among other things, the impugnment of the system through which the exploitation is done. Think banks and our perception of them, because in days of yore, dishonest goldsmiths produced fake receipts, and eventually governments said "Oh, that's ok!" Banking is still a good idea, but not with the built-in guarantee to fail that fractional reserve introduces.

      Hobbes (Thomas) suggested that we travel with weapons because we know everyone is out to get us. Of course, it's not everyone, but a few (and fewer still as our ability to communicate increases, eg with cellphones). I think that encouraging people to trust those they've deemed trustworthy is great, but encouraging them to find someone to trust simply because that makes life easier is horrible because it leads to leviathan.

  5. Pieter Wuille says:

    Thanks, nice article! A few things I find confusing though: the logo at the top of the post represents a merkle tree. A merkle tree is necessary for the transaction pruning mechanism Satoshi described in his paper, and is essential for SPV nodes, but it has nothing to do with ultraprune really. Also the #2 idea presented, while being correct, is completely independent from ultraprune. All it does is replace the block chain index by a pruned copy of it. A full node still requires processing the entire history (though doing so will happen much faster).

    • "The idea behind ultraprune is to use an ultra-pruned copy (only unspent transaction outputs in a custom compact format) of the block chain for validation (as opposed to a transaction index into the block chain). It still keeps all blocks around for serving them to other nodes, for rescanning, and for reorganisations. As such, it is still a full node. So, despite the name, it does not implement any actual pruning yet, though pruning would be trivial to implement now. This would have profound effects on the network though, so may still need some discussion first."

      Source: You. https://bitcointalk.org/index.php?topic=119525.0

      Edit: Ah wait I see what you're talking about, it's the prioritization comment. I can fix that.
      Edit: Fixed.
