[Sigia-l] search results and thesauri
Ziya Oz
ZiyaOz at earthlink.net
Sat May 25 03:50:44 EDT 2002
"Andrew McNaughton" wrote:
> Any examples?
Nothing public. But I developed a smart banner ad server way back in the
mid-90s, before there were large-scale commercial apps for this purpose. It
had a client/server graphical front end that visually managed sets, among
other things. A pure HTML UI would obviously be less slick, but there's
always Flash MX :-)
>> Sets are pre-calculated and thus set operations are instantaneous, requiring
>> zero search of actual records. Once a small number is reached, a button
>> would perform the actual search across records.
>
> This works for a very small number of terms, but the server load and page
> size required to deliver this pre-calculated info to a client side tool
> increases dramatically with the number of terms involved.
Actually, that's not really a bottleneck at all (unless, of course, you were
Google :-). I designed the app above to work with up to 65,000
terms/categories, but it could be more. The actual word set in use at the
time was somewhere around 7,000 terms.
Before a 'document' was included in the DB, it was parsed. All distinct
words were extracted, and common terms like "the" and others on an in-house
list were discarded. What was left was matched against the existing category
list; if there were new words in this doc, they were added to the list. Each
word represented a set. Each set contained pointers to the records/docs that
contained that word.
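A minimal sketch of that indexing step, in Python. This is not the original
app's code: the stop-word list, the tokenizing rule, and the names
STOP_WORDS, distinct_terms and add_document are all assumptions made for
illustration.

```python
import re

# Hypothetical stand-in for the in-house list of common terms.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}

def distinct_terms(text):
    """Return the distinct lowercase words in text, minus the stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return set(words) - STOP_WORDS

def add_document(index, doc_id, text):
    """Add doc_id to the set of every term it contains.

    A term seen for the first time gets a new, empty set first.
    """
    for term in distinct_terms(text):
        index.setdefault(term, set()).add(doc_id)

# term -> set of IDs of the documents containing that term
index = {}
add_document(index, 1, "The thesaurus maps search terms to categories")
add_document(index, 2, "Search results for the banner ad server")
add_document(index, 3, "A thesaurus for banner categories")

print(sorted(index["search"]))     # IDs of docs containing "search"
print(sorted(index["thesaurus"]))  # IDs of docs containing "thesaurus"
```

Only document IDs are stored per term, so each set stays small even when the
documents themselves are large.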
So when the user entered a word, all documents that contained that word
could be returned instantaneously and *without* doing a DB search. Set
operations like union and intersect are virtually instantaneous and, again,
they require no trip to the DB.
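To illustrate, here is a small Python sketch of those set operations over
precomputed term sets. The sample data and the helper names docs_with_all
and docs_with_any are hypothetical; the point is only that the results come
from the in-memory sets and no record is read.

```python
# Hypothetical precomputed sets: term -> IDs of documents containing it.
index = {
    "banner": {2, 3, 5, 8},
    "search": {1, 2, 5},
    "thesaurus": {1, 3, 5, 9},
}

def docs_with_all(index, terms):
    """Intersection (AND): documents containing every one of the terms."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def docs_with_any(index, terms):
    """Union (OR): documents containing at least one of the terms."""
    return set().union(*(index.get(term, set()) for term in terms))

both = docs_with_all(index, ["banner", "search"])
either = docs_with_any(index, ["search", "thesaurus"])
only_banner = index["banner"] - index["search"]  # AND NOT

print(sorted(both), sorted(either), sorted(only_banner))
```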
Since each set 'knows' which documents it includes and how many, you can
return that info immediately to the user. If they perform Boolean ops on
sets, again, you can return the result immediately to the browser. (Of
course, in a client/server situation this is transparent, extremely fluid
and pretty much in real time.)
Now when the user is done with all the selections/combinations and ends up
with a small number of candidate docs, they can submit the result for the
actual search of the database records. Until then, it's mostly a matter of
passing a small amount of numbers back and forth between the browser and the
app/server.
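The whole flow might look roughly like the Python sketch below. Everything
named here is an assumption for illustration: INDEX and RECORDS stand in for
the precomputed sets and the real DB, and MAX_FETCH is a made-up cutoff for
what counts as a "small number".

```python
# Hypothetical precomputed sets: term -> IDs of documents containing it.
INDEX = {
    "banner": {2, 3, 5, 8},
    "search": {1, 2, 5},
    "flash": {5, 8},
}

# Stand-in for the real database; only fetch() ever reads it.
RECORDS = {i: "full text of document %d" % i for i in (1, 2, 3, 5, 8)}

MAX_FETCH = 2  # hypothetical cutoff for a "small number" of docs

def narrow(selection, term):
    """Intersect the current selection with a term's set (no DB access)."""
    return selection & INDEX.get(term, set())

def fetch(selection):
    """Read the actual records, allowed only once the selection is small."""
    if len(selection) > MAX_FETCH:
        raise ValueError("selection too large: %d docs" % len(selection))
    return [RECORDS[doc_id] for doc_id in sorted(selection)]

selection = set(INDEX["banner"])
counts = [len(selection)]             # only counts go back to the browser
for term in ("search", "flash"):
    selection = narrow(selection, term)
    counts.append(len(selection))

results = fetch(selection)            # the first and only trip to the DB
print(counts, results)
```

Until fetch() is called, the only data exchanged is the chosen terms and the
resulting counts.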
Anyhow, the trick here is that you never touch the DB until and unless you
need to. Sets allow you to create virtual collections of taxonomies mapped
against actual records with just a few bytes of info on each. Needless to
say there are some things to watch out for in this architecture, depending
on what exactly you might be doing.
Hope this helps.
Best,
Ziya