[Sigia-l] search results and thesauri

Ziya Oz ZiyaOz at earthlink.net
Sat May 25 14:21:20 EDT 2002


"Andrew McNaughton" wrote:
 
> I took it that you were pre-calculating the size of all set intersections
> amongst these. Combinatorial functions grow quickly, which was my concern.

Nope. You do it on the fly. No need to pre-calculate. As I said, set
operations are extremely fast.
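
To make "on the fly" concrete, here is a minimal Python sketch. It assumes
each term maps to an in-memory set of document IDs; the index contents, the
term names and the helper intersection_size are all hypothetical, made up
for illustration rather than taken from any particular system.

```python
# Hypothetical in-memory index: thesaurus term -> set of document IDs.
index = {
    "usability": {1, 2, 3, 5, 8},
    "navigation": {2, 3, 5, 7},
    "thesaurus": {3, 5, 9},
}

def intersection_size(index, terms):
    """Count documents matching all terms, computed at query time."""
    if not terms:
        return 0
    # Start from the smallest set so each step has the least work to do.
    sets = sorted((index.get(t, set()) for t in terms), key=len)
    result = sets[0]
    for s in sets[1:]:
        result = result & s
        if not result:
            break  # nothing left to intersect
    return len(result)

print(intersection_size(index, ["usability", "navigation"]))
print(intersection_size(index, ["usability", "navigation", "thesaurus"]))
```

Nothing is stored per combination of terms; only the per-term sets exist,
and each count is derived when a query asks for it.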
 
> Whether you think of it as a DB or not, you do have to look up an index.
> The lookup you do is essentially the same as for any other search except
> that you only retrieve (the size of) a list of document IDs, not titles,
> summaries and so forth.

Actually, there is a huge difference. Searching the small array of sets
cached in memory (say, 5K or 65K words) is effectively instantaneous, because
no records need to be loaded. If you have a three-tier setup with an app
server, you do not need to hit the main DB at all.

> To get the size of the intersection, your search engine would still be
> considering all members of the set internally, even if it didn't return this
> info.

Like I said, set operations (cached and performed in memory) are
instantaneous. And if you have an app server, you do not even hit the search
engine at all (see above).

> For smaller collections, it's retrieving the details that bites, so the
> cost of getting the intersection set sizes is minimal, depending how many
> times you do it. 

But that's the whole point. You neither need nor fetch the details until the
result set has been narrowed to the minimum number of records desired. Until
then, you are just playing with (extremely small and efficient) pointers to
records, not the records themselves.
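
A rough sketch of that workflow, under stated assumptions: RECORDS stands in
for the main DB, INDEX for the cached sets, and MAX_RESULTS, narrow and
fetch_details are hypothetical names chosen for this example. Only document
IDs are handled while narrowing; full records are loaded once, at the end.

```python
# Stand-in for the main DB (hypothetical data): doc ID -> full record.
RECORDS = {
    1: {"title": "Card sorting basics"},
    2: {"title": "Faceted navigation"},
    3: {"title": "Thesaurus-driven search"},
    4: {"title": "Site maps"},
}

# Cached in-memory index (hypothetical): term -> set of doc IDs.
INDEX = {
    "search": {2, 3, 4},
    "navigation": {2, 4},
    "facets": {2},
}

MAX_RESULTS = 1  # assumed threshold for "few enough to show"
db_hits = 0      # counts how often the record store is touched

def fetch_details(doc_ids):
    """Load full records; the only step that touches RECORDS."""
    global db_hits
    db_hits += 1
    return [RECORDS[i] for i in sorted(doc_ids)]

def narrow(terms):
    """Intersect ID sets term by term; fetch records only once small enough."""
    candidates = None
    for term in terms:
        ids = INDEX.get(term, set())
        candidates = ids if candidates is None else candidates & ids
        print(term, "->", len(candidates), "candidates")
        if len(candidates) <= MAX_RESULTS:
            return fetch_details(candidates)
    return None  # still too many; the user keeps refining

results = narrow(["search", "navigation", "facets"])
```

In this sketch the record store is touched exactly once, after three
refinement steps that worked purely on sets of IDs.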


