[Sigia-l] Don't submit websites to search engines?

Listera listera at rcn.com
Tue May 18 00:39:45 EDT 2004


From
> http://www.salon.com/tech/feature/2004/03/09/deep_web/index_np.html

> Those of us who place our faith in the Googlebot may be surprised to learn
> that the big search engines crawl less than 1 percent of the known Web.

Is there any concrete evidence of this?

> The "deep Web" is the great lode of databases, flight schedules, library
> catalogs, classified ads, patent filings, genetic research data and another
> 90-odd terabytes of data that never find their way onto a typical search
> results page.

Nonsense. A lot of this stuff is already available on the web.

> Case in point: In 1999, the CIA issued a revised edition of "The Chemical and
> Biological Warfare Threat." It's a public document, but you won't find it on
> Google.

Cut and paste "The Chemical and Biological Warfare Threat" into Google,
elapsed time less than 0.28 seconds:

http://www.sci.sdsu.edu/classes/biology/bio610/bernstein/PDFS/Dr.Sabbadini/warfare.pdf

The reason we don't have access to some data on the web is far less a
technological issue than one of data providers not wanting to make it
accessible. Given permission from data providers, Google, or Yahoo for that
matter, could easily make what's currently unavailable accessible. No
technomagic here. Unfortunately, business models and IP legalities still
make this an unsolved problem, no matter what the press release of a
particular SE company might insinuate.
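To make the point concrete: the standard opt-out mechanism is robots.txt, which is purely a statement of the provider's policy, not a technical barrier. A minimal sketch (the host and paths are hypothetical, for illustration only), using Python's standard-library robots.txt parser:

```python
# Sketch: a data provider declaring which parts of its site crawlers may
# index. The rules below are hypothetical; robots.txt is advisory policy,
# not a technical access control.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Googlebot
Disallow: /database/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The static report is crawlable; the database front-end is not --
# by the provider's choice, not because of any technical limitation.
print(parser.can_fetch("Googlebot", "http://example.com/public/report.pdf"))
print(parser.can_fetch("Googlebot", "http://example.com/database/query.cgi"))
```

Flipping that Disallow line is all it would take for a crawler to be "permitted" in; whether the content then surfaces is a business and legal question, not an engineering one.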

Ziya
Nullius in Verba 




