The Difficulty With Bitcoin Stats


There has been a lot of news lately surrounding a certain study (PDF warning) funded by Citi. The study attempted to analyze the Bitcoin block chain for a number of purposes and came up with a few startling numbers that much of the “mainstream media” (if Bitcoin has such a thing as mainstream media) has been running wild with. The most commonly cited is that 78% of all coins are supposedly being stashed under the digital mattress. I think, however, it’s fair to say that the story behind those 78% of coins is a bit more complicated than that. The study fails to account for the fact that a huge portion of all coins ever mined were mined back when they were essentially worthless. Once upon a time 10,000 BTC bought two pizzas, and CPU mining was the norm. Back then, losing several thousand coins to a hard disk failure or a misplaced wallet was like a $5 bill going missing from your pocket – annoying, but not worth losing sleep over. So how many of those 78% of coins are actually spendable but sitting idle, and how many are lost to time? Like so many things with Bitcoin, the real problem is that we just don’t know.

People dealing with Bitcoin often forget that while it was designed with a certain kind of transparency in mind, and is far more open to study than traditional currencies, it was also built with privacy in mind, so certain kinds of information about Bitcoin transactions are very hard to get. Of that 78%, we can say with confidence that some is lost, some is savings and some is likely sitting in the cold storage accounts of large exchanges or merchant service companies – but we don’t know (and for the most part can’t know) which is which. All those stale coins are like Binion’s silver: clever financial analysis can indicate that something is missing, but the very nature of a thing being missing means we haven’t a clue where it actually is.

It goes beyond the design of Bitcoin, too. Bitcoiners themselves are often elusive beasts, difficult to track. Allow me to share a personal anecdote.

The company that currently hosts this site offers me “unlimited” bandwidth and storage on my portion of this tiny shared server. Recently I was considering upgrading to a private server, but even my current host caps bandwidth on its VPS plans, so I thought I’d poke around and see how much bandwidth this one site was actually using. Conveniently, my host offers a number of server-side log analyzers that provide exactly that sort of information. I fired up AWStats, jotted down my bandwidth numbers for later analysis, then glanced over at the page views column and gaped.

To understand my surprise, you have to understand the difference between server-side log analyzers like AWStats and client-side tracking like Google Analytics. Google Analytics is a small snippet of JavaScript embedded in each page of your site. When someone views a page, their browser runs that JavaScript and reports all kinds of useful information: what browser they’re using, how big their screen is, whether they’re on a mobile device and much more. The wealth of data gathered by Analytics can be quite valuable, but the method is flawed. Not every browser runs JavaScript the same way, and some people disable JavaScript or otherwise block such tracking, so some percentage of visitors to any given site simply won’t be counted; you can usually assume your Google Analytics numbers are a bit on the low side. Log analyzers, on the other hand, dig through your web server’s logs and count each and every time the server says it delivered a page to someone. Server-side logs usually contain far less useful information, but what they do contain tends to be more accurate. Armed with this knowledge, you’ll see why my jaw hit the floor: Google Analytics said I’d served up a bit over 60,000 pages in the month of September, while AWStats was reporting just over 250,000.
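To make the server-side half of that comparison concrete, here is a minimal Python sketch of what a log analyzer does at its core. It assumes the server writes Apache/Nginx “combined” format access logs; the regex, the `ASSET_SUFFIXES` list and the rule that only successful GET requests for non-asset paths count as page views are hypothetical simplifications. Real tools like AWStats apply many more rules (separating out known robots, for instance), so treat this as an illustration rather than a replacement.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical heuristic: requests for these file types are assets, not pages.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg", ".woff")


def count_page_views(lines):
    """Count successful GET requests for non-asset paths, grouped by path."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip malformed or differently formatted lines
        if m["method"] != "GET" or m["status"] != "200":
            continue  # only successful page fetches count in this sketch
        path = m["path"].split("?", 1)[0]  # ignore the query string
        if path.lower().endswith(ASSET_SUFFIXES):
            continue  # stylesheets, scripts and images are not page views
        views[path] += 1  # no bot filtering here, unlike real analyzers
    return views


sample = [
    '203.0.113.5 - - [01/Sep/2013:10:00:00 +0000] "GET /2013/09/post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Sep/2013:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Sep/2013:10:05:00 +0000] "GET /2013/09/post/?ref=r HTTP/1.1" 200 5120 "-" "curl/7.30"',
    '198.51.100.7 - - [01/Sep/2013:10:06:00 +0000] "GET /missing HTTP/1.1" 404 0 "-" "curl/7.30"',
]
print(sum(count_page_views(sample).values()))  # prints 2
```

Note that the second visitor in the sample never runs any JavaScript, yet the server still logs the page; that gap is exactly what client-side tracking cannot see.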

Again, it’s important to understand that there will always be some disparity – but this one was WAY too big. My wife’s food blog, for example, reported about 10,000 hits that same month through Google Analytics and about 11,000 through AWStats, and after checking a number of other sites I run, I concluded that her numbers are pretty typical. Google Analytics seemed to miss fewer than 10% of visitors on my average site, so why were fewer than a quarter of visitors to this site being counted?

At first I thought AWStats must be wrong, so I tried Webalizer too, then downloaded the raw logs and ran Analog. Within reasonable tolerances, the server-side tools all agreed. I also tried other client-side analysis tools like OWA, in case Google Analytics was flawed in some way that other JavaScript-based tracking wasn’t, and the numbers all came out the same: 76% of visitors to this site appear to have JavaScript disabled or are otherwise circumventing client-side analytics.
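The 76% figure is simply the complement of the ratio between the two counts. Writing $G$ for the client-side (Google Analytics) count and $S$ for the server-log count, and using the rounded September numbers quoted for this site and the food blog, the uncounted share $u$ works out as:

$$
u = 1 - \frac{G}{S}, \qquad
u_{\text{this site}} = 1 - \frac{60{,}000}{250{,}000} = 0.76, \qquad
u_{\text{food blog}} = 1 - \frac{10{,}000}{11{,}000} \approx 0.09
$$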

Now, my example may be a bit long-winded, and I apologize for that, but it goes a long way toward making my point: Bitcoiners, you are some of the hardest people to gather information about that I’ve ever met. If I can’t even get a solid number to tell my sponsors how many of you view my stuff in a given month, I find it unlikely that any analysis of this community or our behavior will be even close to accurate.

But this can work both ways – sometimes a lack of data is just as telling as an abundance of it. I would also posit that this could make an interesting metric by which to judge Bitcoin adoption: Analytics parity. As more people outside the highly technical, JavaScript-disabling community adopt Bitcoin, the percentage of you who remain uncounted should begin to decrease. Gather enough data points across enough Bitcoin sites, with some non-Bitcoin sites as a control, and compare the percentage of visitors who go uncounted. If my site alone is anything to judge by, we’ve got a long, long way to go.
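The proposed metric is simple enough to sketch. Below is a minimal Python version under stated assumptions: each site contributes a server-log count and a client-side count for the same period, a site’s uncounted share is 1 − client/server, and the “parity gap” (a hypothetical name) is the mean uncounted share of Bitcoin sites minus that of the control sites. The two example entries use the rounded September figures quoted in the post for this blog and the food blog; a real comparison would need many more sites on each side.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SiteStats:
    name: str
    server_views: int  # page views from server-side logs (e.g. AWStats)
    client_views: int  # page views from client-side tracking (e.g. Google Analytics)
    bitcoin: bool      # True for Bitcoin-focused sites, False for control sites

    def uncounted_share(self) -> float:
        """Fraction of server-logged views that client-side tracking missed."""
        if self.server_views <= 0:
            raise ValueError(f"{self.name}: server_views must be positive")
        # Clamped at 0 in case the client-side count exceeds the server logs.
        return max(0.0, 1.0 - self.client_views / self.server_views)


def parity_gap(sites):
    """Mean uncounted share of Bitcoin sites minus that of control sites.

    A gap near zero would suggest Bitcoin visitors are counted about as well
    as everyone else ("Analytics parity"); a large positive gap means they are not.
    """
    btc = [s.uncounted_share() for s in sites if s.bitcoin]
    ctl = [s.uncounted_share() for s in sites if not s.bitcoin]
    if not btc or not ctl:
        raise ValueError("need at least one Bitcoin site and one control site")
    return mean(btc) - mean(ctl)


sites = [
    SiteStats("this blog, September", 250_000, 60_000, bitcoin=True),
    SiteStats("food blog, September", 11_000, 10_000, bitcoin=False),
]
print(f"parity gap: {parity_gap(sites):.2f}")  # prints "parity gap: 0.67"
```

Averaging per-site shares (rather than pooling raw counts) keeps one very busy site from dominating the comparison; pooling would be an equally defensible choice.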


Tip With Bitcoin

1GAzKrNQEFHEaeS8CQeZKCtCjRBYCk1nyi

Each post gets its own unique Bitcoin address so by tipping you're not only making my continued efforts possible but telling me what you liked. Vote with your (Bitcoin) wallet!

Comments

  1. A Nony Mouse says:

    10,000 BTC bought TWO pizzas, not one. Let's not falsify history here.

  2. Drugged Thug says:

    Dude, your blog is called CODING In My Sleep. Its audience is always going to be dominated by technically-inclined people.

    • Ah, but if I go back to a time before I blogged about Bitcoin, the disparity between client- and server-side stats is within normal levels. Also, on days when I got huge spikes of traffic from Hacker News, the disparity was smaller than on days when most of my traffic came from Reddit’s /r/Bitcoin et al. Bitcoiners are a hard crowd to put numbers to.

  3. FWIW Adi Shamir will be publishing a (somewhat) corrected version of that paper in the next few days making it more clear that the results obtained are highly uncertain.

  4. Gary Rowe says:

    Hi David – another great article. BTW your QR code and link for donations is still broken…

  5. I believe most of the 2009 and 2010 crops, so almost 25% of all the bitcoins that will ever be minted, are in the hands of the Creator and his Angels.
    Is that fair, and do they deserve them? Sure, they invented the system!
    Does this pose a risk? What risk: that they would dump them on the market to keep the price down long enough to destroy the system? On the one hand, they have clearly demonstrated they are not stupid. On the other hand, would they kill their own child, in which they invested a lot of time and resources, just for fun? I doubt it.

