[Sigia-l] Don't submit websites to search engines?
Eric Scheid
eric.scheid at ironclad.net.au
Mon May 17 21:04:44 EDT 2004
On 18/5/04 2:32 AM, "Jon Hanna" <jon at hackcraft.net> wrote:
>>> http://www.google.com/webalerts?hl=en
>>
>> I'm not too sure that's a valid source of data to support your claim
>> that "Huge swathes of the web have *never* been crawled".
>
> You are correct that that isn't really an irrefutable source of data for
> such a claim. However, huge swathes of the web have *never* been crawled.
There was some research done some years ago, described by Albert-László
Barabási in his book "Linked: The New Science of Networks", which looked at
the overlap between search engines' indexes and, from the degree of overlap,
estimated the size of the web that hadn't been crawled. The coverage figure
back then wasn't very impressive, and the web wasn't all that big then
either. The researchers also noted that the web was growing faster than it
was being crawled.
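(For the curious: a minimal sketch of how such an overlap estimate can
work, assuming the study used the standard capture-recapture
(Lincoln-Petersen) idea - the exact method isn't spelled out here, and
the figures below are invented purely for illustration.)

    # Treat each engine's index as an independent "sample" of the web;
    # the size of the overlap lets you estimate the total population.
    def lincoln_petersen(in_a, in_b, in_both):
        """Estimate total web size from two independent samples.

        in_a:    pages indexed by engine A
        in_b:    pages indexed by engine B
        in_both: pages indexed by both engines
        """
        return in_a * in_b / in_both

    # Hypothetical: two engines each index 150M pages, sharing 60M.
    total = lincoln_petersen(150e6, 150e6, 60e6)
    print("Estimated web size: %.0fM pages" % (total / 1e6))        # ~375M
    print("Each engine's coverage: %.0f%%" % (100 * 150e6 / total)) # ~40%

The intuition: the less two engines' indexes overlap, the larger the
uncrawled remainder must be - which is how a small overlap implies huge
swathes of unindexed web.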
I've got several different Google Web Alerts running - it's amazing what old
crud they find (and lots of more recent stuff, of course). I do find that
Google indexes some sources more regularly (e.g. blogs), which we already
knew. Nonetheless, there are some sources it only crawls years after the
fact.
e.