[Sigia-l] Findability
Karl Fast
karl.fast at pobox.com
Mon Jan 27 22:31:02 EST 2003
> > 1. The algorithms don't work as well. Once you drop into the tens or
> > hundreds of thousands of documents, the retrieval algorithms become
> > much less effective.
>
> Was that based on your experience of applying Google technology to a
> small site? Or was that just speculation?
Good question.
In terms of Google this is speculation on my part. In terms of other
retrieval tools this is based more on experience (depends on the
tool).
In a course I took last term on information retrieval systems we
spent some time on the Google algorithm. Part of our discussions
were about how these things tend to work better with more stuff.
More interconnections and information redundancy were the main
reasons.
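
For illustration only: a minimal sketch of link-based ranking, assuming
"the Google algorithm" here means PageRank-style link analysis. The
function name, the toy link graph, and the damping value of 0.85 are
illustrative assumptions, not anything taken from the course or from
Google's actual implementation. It is meant only to show why more
interconnections give the method more to work with.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: "c" is linked from every other page.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
```

In this toy graph the page with the most incoming links ends up ranked
highest, and the page nobody links to ends up lowest. With very few
links there is little signal of this kind to separate one page from
another.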
However, I don't know where the break-even point is. If you were to
plot effectiveness vs. document collection size (and let's assume
it's a hypertext collection where things are linked) I would be
*extremely* surprised if it were linear.
Note also that evaluating the effectiveness of retrieval systems is
a difficult and much-debated topic. You should see the academic
literature on this. Frightening.
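
For illustration only: the post does not say which effectiveness
measures are meant, so this sketch assumes the two classic ones,
precision and recall. The function name and the document IDs are
hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the system returned
    relevant:  document IDs a human judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Precision: what fraction of the returned documents were relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, three judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Even this simple pair depends on someone deciding which documents count
as relevant, which is one reason the evaluation question is so
contested.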
--karl