The Littlest Internet
Many people have wondered how big the Internet really is. I know the answer, but I’m not telling. Instead, I set out to find which country has the fewest web pages. Along the way, I learned a bit about US sovereignty and unicycles.
Introduction, and a Bit About Methodology
I work at Google, but I’m speaking only for myself here, not for Google. To be honest, anyone with decent search engine skills could figure this out on their own. I just happen to have a close working relationship with the Internet.
First, some terminology. When you see a URL such as “http://www.andrewchatham.com/blog/”, the host for the website is everything between the second and third slashes, “www.andrewchatham.com” in this case. The top-level domain, or TLD, is the last part of the host. In this case, .com is the TLD, but there are also country-specific TLDs, so that the TLD for “www.google.fr” is .fr, indicating France. There are about 250 valid TLDs, most of them country-specific. Here you can find a list of country-specific TLDs.
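If you’d rather let a computer do the slash counting, here is a minimal Python sketch of the same idea. The function name host_and_tld is my own invention for illustration, and it simply treats the last dot-separated label of the host as the TLD, which matches the definition above but ignores registry quirks such as second-level domains like .co.uk.

```python
from urllib.parse import urlsplit


def host_and_tld(url):
    """Return (host, tld) for a URL, where the TLD is the last label of the host."""
    # urlsplit pulls out the part between the second and third slashes;
    # .hostname also lowercases it and drops any port number.
    host = urlsplit(url).hostname or ""
    # The TLD is whatever follows the final dot in the host.
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return host, tld


print(host_and_tld("http://www.andrewchatham.com/blog/"))  # ('www.andrewchatham.com', 'com')
print(host_and_tld("http://www.google.fr/"))               # ('www.google.fr', 'fr')
```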
Each country controls its own registration process, and they have vastly different requirements, so that a website with a domain from a certain country doesn’t necessarily operate in that country. For example, the tiny nation of Tuvalu, which has a population of about ten thousand and which may soon be underwater, sold off the rights to .tv to another company, which then sells domains such as “sports.tv”.
At Google, one of the projects I’ve worked on is our crawler, a very large program which downloads a copy of the Internet. I therefore have access to some better statistics than you could find publicly. I used these numbers to identify the countries with the smallest web presence, but you can approximately verify the results using the search engine itself. Google’s “site:” operator returns only results matching a given site, and it can also be used on top-level domains, so that a search for [site:.com] shows around 5 billion web pages hosted on .com. These estimates can be quite inaccurate, but they’re public and they roughly correlate with the private numbers, so I’ll refer to them.
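If you want to check several TLDs by hand, a small helper can build the search links for you. This is only a sketch: the function name site_query_url is made up for illustration, it assumes the familiar google.com/search?q=... URL form, and it only builds the link. It does not fetch anything or read the result count, so you still have to open the link and look at the estimate yourself.

```python
from urllib.parse import urlencode


def site_query_url(tld):
    """Build a search URL for the query [site:.<tld>] (hypothetical helper)."""
    query = "site:." + tld.lstrip(".")
    # urlencode percent-escapes the colon, giving q=site%3A.iq and so on.
    return "https://www.google.com/search?" + urlencode({"q": query})


for tld in ["com", "iq", "mh", "um"]:
    print(site_query_url(tld))
```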
For the most part, crawlers work by following links, and so if I create a web page but no one ever links to it, it probably won’t ever be found by Google. Google might also be forbidden from downloading some pages by a robots.txt file. For these reasons, the numbers might be underestimates of the true number of web pages, but it doesn’t affect the conclusions.
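To make those two blind spots concrete, here is a toy crawler in Python. It is emphatically not how Google’s crawler works; it is a sketch that walks a made-up, in-memory link graph instead of the real web. The example.test URLs, the LINKS table, and the robots.txt rules are all invented for illustration. The robots.txt handling uses Python’s standard urllib.robotparser module.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "http://example.test/": ["http://example.test/a", "http://example.test/private/b"],
    "http://example.test/a": ["http://example.test/"],
    "http://example.test/private/b": [],
    "http://example.test/orphan": ["http://example.test/a"],  # nobody links here
}

# Hypothetical robots.txt contents for example.test.
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]


def crawl(seed, links, robots_lines):
    """Breadth-first crawl from seed, skipping pages that robots.txt forbids."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    seen, queue = set(), deque([seed])
    while queue:
        url = queue.popleft()
        if url in seen or not robots.can_fetch("*", url):
            continue
        seen.add(url)
        # Pretend to "download" the page by looking up its outgoing links.
        queue.extend(links.get(url, []))
    return seen


found = crawl("http://example.test/", LINKS, ROBOTS_TXT)
print(sorted(found))
```

The crawl finds only the home page and /a. The orphan page is never discovered because nothing links to it, and /private/b is skipped because robots.txt forbids it, so a count based on this crawl would understate the site’s four pages.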
“Large” TLDs
We’ve already seen that .com is a very popular TLD. Other well-populated TLDs include .uk, with 477 million results in Google, and .de, with 176 million.
I started looking at the size of various TLDs because I thought there couldn’t possibly be much use for the niche TLDs, .museum and .aero. I may have been wrong, as there are certainly more pages on both than I had originally thought. [site:.aero] and [site:.museum] both show about 500,000 pages, and a cursory glance says that they actually are about airplanes and museums. That may not seem like a lot of pages, but it’s way more than we’ll see on the really unpopular TLDs, and it puts them on par with .va, the TLD for Vatican City.
Small TLDs
Among countries you’ve probably heard of, the one with the fewest web pages is Iraq. A search for [site:.iq] reveals only 702 web pages, which makes the Iraqi web presence a bit more than twice the size of my website. I understand they’ve been very busy in Iraq lately, so creating a large web presence and an easy domain registration process may be low on their list of priorities. Afghanistan, by contrast, has 116,000 pages, including those for several banks and software companies.
Compared to some lesser-known places, though, Iraq dominates the Tubes.
For the other Internet dorks out there, we have .arpa, with 135 results. .arpa is a holdover from the Primordial Internet, and I’m not sure how someone managed to get a Unicycle Blog on there, but they did. Technically, .example and .invalid are TLDs that can never have any web pages, but I’ll leave them out of consideration.
The big winner is .mh, the TLD for the Marshall Islands. It has exactly one website with a single page, http://www.nic.net.mh/, which has not been updated since 1997.
gee.um
Although it does not win first prize, second-place .um did a better job of capturing my imagination. With two websites serving a total of seven pages, .um is the TLD for the United States Minor Outlying Islands. With a permanent population of zero, that gives .um the highest number of web pages per capita of any country-code TLD (or undefined, for you math pedants).
I had never heard of the US Minor Outlying Islands before, but I learned from their registry’s homepage that the islands were brought under US hegemony by the Guano Islands Act of 1856. I guessed that some politician must have had the unfortunate name of Guano, but it was not so. In fact, the US took control of these islands so that we could harvest bird poop. Most importantly, the US has permission to use military intervention to defend said bird poop from invaders. According to the Guano Act, a US citizen can claim for the United States any uninhabited island or rock containing guano deposits, subject to the president’s discretion. This is how an unpopulated quasi-nation is born and gets its Internet on.
Update: Alas! The same day I started writing this post, they decided to cancel the .um TLD!
February 13th, 2007 at 1:26 pm
hi im in high school and my name is andrew i stumbled on your site by accident when i was searching for my name on google
February 13th, 2007 at 1:31 pm
I just saw the books you read and i wanted to say that i read all the harry potter books all the philip pullman books, the lord of the rings and i saw the name of the rose last week and i just love each one of them. You should even read Eragon and the eldest
March 28th, 2007 at 7:50 pm
I am in technology and this article is definitely an interesting one for me.
July 1st, 2007 at 10:19 pm
Oh wow, so does this mean countries such as Afghanistan or Pakistan have more web pages than Iraq which is supposed to be/ used to be a more developed country? Or is it more of countries such as Afghanistan do not have any websites?
December 18th, 2007 at 8:50 am
Hi Andrew,
I checked your post while writing an article - cute information, esp. about the bird poo.
And I guess meanwhile .mh has to share the first place with .kp
Cheers, Arne