CloudFlare's customers: Same old power law

This just in: Little guy gets screwed!

August 11, 2012

Ten years ago, the tubes were all atwitter about web traffic and the power law. Google's PageRank, displayed on the Google toolbar as a number from 0 to 10, used a logarithmic scale whose base averaged between 4 and 6. All else being equal, this meant that you needed about five times as many links to raise your PageRank by one point. The power law is not a new idea; it has been observed in various natural phenomena. Look at income distribution in dog-eat-dog capitalism, for example. The Occupy Wall Street demonstrators are not exaggerating when they compare the one percent to the 99 percent.
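The arithmetic behind "five times the links per point" is simple exponentiation. This is a minimal sketch, assuming (as the paragraph does) that toolbar PageRank is a logarithm of an underlying link count with everything else held equal; the base of 5 is just the middle of the 4-to-6 range, and the function name is invented for illustration.

```python
def link_multiplier(pr_from: int, pr_to: int, base: float = 5.0) -> float:
    """How many times more links are needed to climb from pr_from to pr_to,
    assuming each toolbar point costs a constant factor of `base` links."""
    return base ** (pr_to - pr_from)

# One point costs roughly 4x to 6x as many links, depending on the base.
for b in (4.0, 5.0, 6.0):
    print(f"base {b:.0f}: +1 point needs {link_multiplier(3, 4, b):.0f}x the links")

# Three points at base 5 compound to 5 * 5 * 5 = 125 times the links.
print(link_multiplier(3, 6))
```

The compounding is the point: each step up the toolbar scale costs several times more than the one before it.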

Nevertheless, it was new in cyberspace. The Long Tail was hyped by the chief editor of Wired. He seemed to argue that you should be delighted that you cannot sell anything from your website due to low traffic. You are the wave of the future, so let's see a smile!

Like Google's PageRank, the traffic-tracking site Alexa used a log scale. High-traffic domains get lower rank numbers, so the number one domain on the web has the highest traffic. Years ago, webmasters reported their actual traffic, and those reports were used to plot Alexa rank against daily unique visitors (Alexa itself did not offer this information). The curve that best fit these plots took Alexa's traffic rank, raised it to the power of -0.732, and multiplied the result by 7 million: daily visitors ≈ 7,000,000 × rank^-0.732. Although this data is old, its age mainly affects the multiplier; the nasty shape of the curve remains the same. This graph is plotted on a linear scale. If you were to re-plot it with logarithmic scales on both axes (any log base will do), it would be a straight line running from upper left to lower right.
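The fitted curve is easy to reproduce. This Python sketch uses only the two constants quoted above (7 million and -0.732); the function names are invented for illustration, and the numbers it prints are estimates from an old fit, not measured traffic. It also checks the straight-line claim: on log-log axes the slope between any two points equals the exponent, regardless of the multiplier or the log base.

```python
import math

def estimated_visitors(rank: int, multiplier: float = 7_000_000,
                       exponent: float = -0.732) -> float:
    """Daily unique visitors estimated from Alexa rank with the old fitted curve."""
    return multiplier * rank ** exponent

# Each tenfold drop in rank cuts the estimate by the same factor, 10**-0.732.
for r in [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    print(f"rank {r:>9,}: ~{estimated_visitors(r):>12,.0f} visitors/day")

def log_log_slope(r1: int, r2: int, base: float = 10.0,
                  multiplier: float = 7_000_000) -> float:
    """Slope of the curve between two ranks when both axes are logarithmic."""
    y1 = estimated_visitors(r1, multiplier)
    y2 = estimated_visitors(r2, multiplier)
    return (math.log(y2, base) - math.log(y1, base)) / (math.log(r2, base) - math.log(r1, base))

print(round(log_log_slope(10, 100_000), 3))                       # -0.732
print(round(log_log_slope(10, 100_000, base=2), 3))               # -0.732
print(round(log_log_slope(10, 100_000, multiplier=1_000_000), 3)) # -0.732
```

Changing the multiplier shifts the line up or down without tilting it, which is why old data still shows the right shape.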

Here is a current graph from Alexa that compares traffic to traffic. Note that the Y axis is plotted on a logarithmic scale. You have to do this when plotting power-law phenomena to keep the graph readable, because a linear plot would squash most of the data against one edge. Keep in mind that this is not your usual graph: for the casual reader, it presents a massive distortion.

Over the past two months, this site has collected 130,000 domain names that have used CloudFlare at some point since late 2010. Of these, we found 86,846 with a direct-connect IP address. The numbers change weekly as our database expands, but this page will not be updated. Even if the multiplier changes, a large initial sample means the curve will look the same. If and when CloudFlare begins to enforce its terms of service or adopts new admission policies, it will be worth another look. CloudFlare itself claims nearly 500,000 domains, so our direct-connect sample alone covers roughly a sixth of them, which is adequate.

The idea to create this page came on August 6. I noticed from our daily haul of domains showing DNS activity on CloudFlare's servers that 140 different domain names on that single day had the same direct-connect IP address, which geolocates to Thailand. I recalled from the graph in the Prolexic Quarterly Global DDoS Attack Report - Q2 2012 that Thailand has recently shown an amazing increase in activity. I also knew from our own websites, which continue to deal with bots (about 35,000 per day) inherited from the days when Scroogle was functioning, that lately the IP addresses from Thailand were showing the largest daily numbers, much larger than China! We finally blocked the entire country.

My first instinct was to look more closely at that one IP address. Then I decided that it was not my problem. (It ought to be something that CloudFlare checks whenever customer activity exceeds certain thresholds.) The purpose of this website is to look at CloudFlare as a whole. That's when I thought to look at the distribution of direct-connect IP addresses across all 86,846 unique domains. Does the power law operate on CloudFlare's customers? I suspected it might, knowing that the hype was about free DDoS protection and that CloudFlare accepted all comers.

Our homebrew software ranked the 37,017 unique IPs attached to the 86,846 unique domain names. Here is the top of the list. The number in front is the number of domains controlled by that IP, followed by the country where the IP geolocates:
  1079    USA
   633    USA
   621    USA
   600    BAHAMAS
   357    USA
   339    USA
   248    USA
   247    USA
   228    USA
   199    THAILAND
   184    USA
   183    CANADA
   161    USA
   156    USA
   139    USA
   137    USA
   132    USA
   124    USA
You can see that the top 0.054 percent of the unique IPs control 6,485 domains, which is 7.5 percent of all the domains. Here are the stats at the long-tail end of CloudFlare:
Domains per IP in CloudFlare's long tail

1 domain per IP = 69 percent of IPs and 29 percent of domains
2 or more = 31 percent of IPs and 71 percent of domains
3 or more = 17 percent of IPs and 59 percent of domains
4 or more = 11 percent of IPs and 51 percent of domains
5 or more =  8 percent of IPs and 46 percent of domains
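Stats like these come from a simple count of domains per IP. The Python sketch below is not the site's homebrew software, only an illustration of the same kind of tally. It assumes the input is a list of (domain, IP) pairs with one direct-connect IP per domain (a domain seen on several IPs would be counted under each); the function names and the toy addresses are hypothetical. The "1 domain per IP" row is simply the complement of "2 or more".

```python
from collections import Counter

def domains_per_ip(pairs):
    """Count unique domains per direct-connect IP from (domain, ip) pairs."""
    unique = set(pairs)  # ignore repeat sightings of the same domain/IP pair
    return Counter(ip for _domain, ip in unique)

def top_share(counts, n):
    """Fraction of IPs that the n busiest IPs represent, and their share of domains."""
    total_domains = sum(counts.values())
    top_domains = sum(c for _ip, c in counts.most_common(n))
    return n / len(counts), top_domains / total_domains

def at_least(counts, k):
    """Fraction of IPs with k or more domains, and the share of domains they hold."""
    heavy = [c for c in counts.values() if c >= k]
    return len(heavy) / len(counts), sum(heavy) / sum(counts.values())

# Toy data, not from the real list: one IP with three domains, two with one each.
sample = [("a.example", "192.0.2.1"), ("b.example", "192.0.2.1"),
          ("c.example", "192.0.2.1"), ("d.example", "192.0.2.2"),
          ("e.example", "198.51.100.7")]
counts = domains_per_ip(sample)
print(counts.most_common(1))  # [('192.0.2.1', 3)]
for k in (1, 2, 3):
    ips, doms = at_least(counts, k)
    print(f"{k} or more: {ips:.0%} of IPs, {doms:.0%} of domains")
```

Even in this toy set the skew shows up: one IP in three holds 60 percent of the domains.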

Yes, this is the power law. Does it matter? It wouldn't be important if CloudFlare weren't hyped to the hilt by Matthew Prince and friends. The curve would tend to flatten out if CloudFlare regulated new admissions, especially if that meant charging a monthly rate for every domain. Instead, it's a free service. This attracts domain squatters and affiliate farmers at one end, and bloggers with cat pictures at the other. Those aren't necessarily bad, but each end also seems to sport a high incidence of riffraff and criminal activity. Good luck if you try to complain to CloudFlare. They will tell you that they are not the hosting provider.

That's not quite true, and I doubt that a judge or jury would agree with CloudFlare. They host the DNS and they cache content. If they yanked the DNS on a domain, the traffic would drop to almost zero within a few minutes. That would at least get the owner's attention.

Mr. Prince is much too tolerant of abuse, and it's the little guy who gets screwed, whether he uses CloudFlare or not.

If you want to see the domains and IP addresses, our list is available at the bottom of this page.
A zip file there can be downloaded and used with grep to research specific IPs or netblocks.
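grep handles exact IPs and octet-aligned prefixes well, but a netblock that does not end on an octet boundary (a /20, say) is awkward to express as a text pattern. Python's standard ipaddress module can do the membership test instead. This is a sketch under an assumption: the layout of the zip file is not described here, so the one-line-per-domain "domain IP" format, the function name, and the sample lines are all hypothetical.

```python
import ipaddress

def domains_in_netblock(lines, cidr):
    """Yield (domain, ip) for every line whose IP falls inside the CIDR block.

    Assumes each line is 'domain ip' separated by whitespace; that layout is a
    guess, so adjust the split to match the actual file.
    """
    net = ipaddress.ip_network(cidr, strict=False)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        domain, ip = parts[0], parts[1]
        try:
            if ipaddress.ip_address(ip) in net:  # False for IPv6 vs IPv4 mismatch
                yield domain, ip
        except ValueError:
            continue  # second field was not an IP address

sample = ["a.example 192.0.2.10", "b.example 192.0.2.200",
          "c.example 198.51.100.7", "garbage"]
print(list(domains_in_netblock(sample, "192.0.2.0/24")))
print(list(domains_in_netblock(sample, "198.51.96.0/20")))  # not octet-aligned
```

To run it on the real list, open the extracted text file and pass the file object as `lines`.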
