[Sigia-l] Don't submit websites to search engines?

Christina Wodtke cwodtke at eleganthack.com
Mon May 17 23:01:15 EDT 2004

----- Original Message ----- 
From: "Eric Scheid" <eric.scheid at ironclad.net.au>
To: "sigia l" <sigia-l at asis.org>
Sent: Monday, May 17, 2004 6:04 PM
Subject: Re: [Sigia-l] Don't submit websites to search engines?

On 18/5/04 2:32 AM, "Jon Hanna" <jon at hackcraft.net> wrote:

>>>     http://www.google.com/webalerts?hl=en
>> I'm not too sure that's a valid source of data to support your claim
>> that "Huge swathes of the web have *never* been crawled".
> You are correct that that isn't really an irrefutable source of data for
> a claim. However huge swathes of the web have *never* been crawled.

There was some research done some years ago, described by Albert-László
Barabási in his book "Linked: The New Science of Networks", which looked at
the overlap between search engines and, from the degree of overlap,
estimated how much of the web wasn't being crawled. Back then the figure
wasn't very impressive, and the web wasn't all that big either. The
researchers also noted that the web was growing faster than it was being
crawled.
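The overlap trick is usually called a capture-recapture (Lincoln-Petersen) estimate. Below is a minimal sketch, assuming two engines each index an independent, uniformly random sample of pages. Real crawlers don't behave that way, so treat the result as a rough lower bound. The function names and the page counts are hypothetical illustrations, not figures from the research described above.

```python
def estimate_web_size(n_a: int, n_b: int, overlap: int) -> float:
    """Lincoln-Petersen estimate of the total number of pages.

    n_a, n_b: pages indexed by engine A and engine B (hypothetical counts).
    overlap:  pages found in both indexes.
    Assumes the two indexes are independent random samples of the web.
    """
    if overlap <= 0:
        raise ValueError("need a non-zero overlap to estimate the total")
    # Engine B "recaptures" overlap/n_b of its pages from A's sample,
    # so A's share of the whole web is estimated as overlap/n_b.
    return n_a * n_b / overlap


def uncrawled_fraction(n_a: int, n_b: int, overlap: int) -> float:
    """Estimated share of the web seen by neither engine."""
    total = estimate_web_size(n_a, n_b, overlap)
    crawled = n_a + n_b - overlap  # pages seen by at least one engine
    return 1 - crawled / total


if __name__ == "__main__":
    # Toy numbers: A indexes 400 pages, B indexes 300, and 60 are in both.
    print(estimate_web_size(400, 300, 60))   # 2000.0
    print(uncrawled_fraction(400, 300, 60))  # roughly 0.68
```

A small overlap relative to each index pushes the estimated total up, so low overlap between engines suggests a lot of the web is going unseen.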

I've got several different Google webalerts running; it's amazing what old
crud they find (along with lots of more recent stuff, of course). I do find
that Google indexes some sources (e.g. blogs) more regularly, which we
already knew. Nonetheless, there are some sources it's only crawling years
after the fact.

