[Sigia-l] Findability

Simon Wistow simon at thegestalt.org
Tue Jan 28 10:33:20 EST 2003


On Tue, Jan 28, 2003 at 12:36:58PM +0530, Madhu Menon said:
> Please do disclose to us the Google algorithm. :)
> 
> To the best of my knowledge, they've never really told us much about it. 
> PageRank is just one element.

AFAIK PageRank is the name for the whole system and also for the
'relevancy through linking' algorithm.

Basically there are a number of factors with various weightings,
traditional PageRank being just one of them.

Another one is whether or not a site is listed in the Yahoo! Directory.
There are many others. From what I know, when Google says it's tweaking
its algorithms (such as the occasion that caused lots of bloggers to
start moaning that their sites weren't number one any more) it is
reevaluating its metrics and adding or removing rules.
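
To make the 'factors with various weightings' idea concrete, here is a
minimal sketch in Python. The factor names and the weights are made up
for illustration; nobody outside Google knows the real ones, and this is
not a description of how Google actually combines its signals.

```python
# Hypothetical factor names and weights, for illustration only.
WEIGHTS = {
    "link_score": 0.6,    # stand-in for a link-based score such as PageRank
    "in_directory": 0.3,  # 1.0 if the site is listed in a directory, else 0.0
    "text_match": 0.1,    # how well the page text matches the query
}


def combined_score(signals, weights=WEIGHTS):
    """Weighted sum of per-page signals; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank(pages, weights=WEIGHTS):
    """Return URLs ordered from highest to lowest combined score."""
    return sorted(pages, key=lambda url: combined_score(pages[url], weights),
                  reverse=True)
```

In this picture, 'tweaking the algorithm' amounts to changing the numbers
in WEIGHTS or adding and removing entries, which is enough to reorder
results without any page having changed.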

Basically there are several factors that dictate whether a search engine
is 'good' or not. Coverage is one - how many sites you index. Obviously
there's also Relevancy - how good the results are - and there's
Freshness - how often you update your index.

As an aside, Google has a very fast Inverted Index - the bit that turns
a set of URLs/pages, each with a load of words in it, into a list of
words, each with every URL that contains it.
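
For anyone who hasn't met the term, a toy inverted index looks something
like the sketch below. The function names are mine and the tokenising is
deliberately naive (lowercase, split on whitespace); a real engine does
far more, so treat this as an illustration of the data structure only.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the set of URLs whose text contains it.

    `pages` is a dict of URL -> page text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The point of the structure is that a query only touches the entries for
the words it contains, rather than scanning every page.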


What Google *is* good at is finding individual phrases in pages, e.g.

http://www.google.com/search?q=tiger+tiger+burning+bright


However, it's quite bad at finding categories:

http://www.google.com/search?q=jpg

	vs

http://search.yahoo.com/bin/search?p=jpg


for example.


Since more and more web pages are becoming coherent 'sites' rather than
just collections of individual pages, I think this is going to be more
of a problem.

Google also doesn't index as often or as fast as FAST or Inktomi.

You may or may not have noticed (beneath the clutter) that Yahoo! has
changed the way it does searches. It used to look in the directory and
then, if there were no matches, it would query Google (or AltaVista or,
before that, Open Text). Now what happens is that directory results are
mixed together with Google results.
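
The difference between the two behaviours can be sketched roughly as
follows. Everything here is hypothetical: `directory` and `web_engine`
are stand-in callables that take a query and return a list of URLs, and
the simple interleaving is just one way results could be mixed, not a
description of what Yahoo! actually does.

```python
def fallback_search(query, directory, web_engine):
    """Old behaviour: use the directory, query the web engine only on a miss."""
    hits = directory(query)
    return hits if hits else web_engine(query)


def blended_search(query, directory, web_engine):
    """New behaviour (assumed mixing rule): interleave both result lists.

    Results are taken alternately from each source, and a URL already
    seen is skipped so that it appears only once.
    """
    dir_hits = directory(query)
    web_hits = web_engine(query)
    merged, seen = [], set()
    for i in range(max(len(dir_hits), len(web_hits))):
        for hits in (dir_hits, web_hits):
            if i < len(hits) and hits[i] not in seen:
                seen.add(hits[i])
                merged.append(hits[i])
    return merged
```

With the fallback version the web engine's results never appear for a
query that the directory can answer; with the blended version they
always do.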


This is not an attack on Google, just some counter observations. I
should also point out that I work for Yahoo! on the Search Team so read
into that what you will - either I know what I'm talking about or I've
drunk deeply of the Corporate Kool-Aid. Or both :)

Simon









