[Sigia-l] Don't submit websites to search engines?

Karl Groves kgroves at user-centereddesign.com
Mon May 17 10:52:08 EDT 2004


> > So the premise of the article
> > is ridiculous on its face
> 
> Another ridiculous premise is that the search engines exhaustively
> crawl the web on a regular basis. Huge swathes of the web have *never*
> been crawled, and not simply because they are behind /robots.txt,
> dynamic pages, or firewalls. Huge swathes of eminently crawlable
> webspace.
> 
> How do I know? I have google send me an email alert anytime they add a
> new page to their index that contains the keyword "IAwiki", and I
> receive a trickle of alerts for what I know to be very old pages.
> Specifically the SIGIA-L archives, and even other blogs.
> 
>     http://www.google.com/webalerts?hl=en


I'm not sure that's a valid source of data to support your claim that
"Huge swathes of the web have *never* been crawled."

I don't doubt it's possible, considering the size of the web, but my
experience has been quite different.

In the head of every page of my site(s), I have a script that e-mails me
every time Googlebot hits a page. The e-mail tells me what time and what
page was hit.  
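(The gist of it, for anyone curious, is roughly the sketch below. This is
a rough Python approximation, not my actual script, which runs in the page
head; the addresses and the local mail server are placeholders.)

    import os
    import smtplib
    from datetime import datetime
    from email.message import EmailMessage

    ua = os.environ.get("HTTP_USER_AGENT", "")        # requesting client
    page = os.environ.get("REQUEST_URI", "unknown")   # which page was hit

    if "Googlebot" in ua:
        msg = EmailMessage()
        msg["Subject"] = "Googlebot hit: " + page
        msg["From"] = "bot-watch@example.com"   # placeholder address
        msg["To"] = "me@example.com"            # placeholder address
        msg.set_content("Googlebot requested %s at %s"
                        % (page, datetime.now().isoformat()))
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail server
            smtp.send_message(msg)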

I got so tired of waking up every morning to 100+ e-mails that I turned
off the script!

Does this mean the page was re-indexed? No. But does it mean the bot came
by? Clearly, yes.

-Karl




