[Sigia-l] RE: using thesauri to improve search

Sanchez, Mario Mario.Sanchez at fishersci.com
Wed Jun 12 12:30:00 EDT 2002


> > you need to determine that a search for "drug addicted teens" 
> > should find items that are cross-sected by the facets 
> > "drug addiction" and "adolescents."
> 
> Of course, if the engine were to consider all the permutations for all
> entries, it could get very complex very quickly:
> 
> ABCD, A"BCD", "AB"CD,"ABC"D...
> Add AND/OR permutations for each
> Add stemming (potentially) for each
> Add synonyms (potentially) for each
> Etc
> 
> Cross compare them and you got a mess. Completely automating 
> this stuff is really tough.

Exactly - and you don't want to automate it completely. You want full
control over every method you use to decipher the searcher's intent. In
this example:

You don't just make every combination of words in the search string a
potential phrase. Instead you explicitly define relevant phrases for your
domain, and then test the user's search for those specific phrases. In this
example "drug addicted" might be in a phrase dictionary, but "addicted
teens" would not. Therefore it would translate "drug addicted teens" to
["drug addicted" AND "teens"].
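
A rough sketch of that translation step in Python, for illustration only. The
function name translate_query, the PHRASES set and the longest-match-first
strategy are all assumptions made for this example, not a description of any
particular search engine:

```python
def translate_query(query, phrases):
    """Split a search string into known phrases plus leftover single words."""
    words = query.lower().split()
    # Longest phrase (in words) that the dictionary could ever match.
    max_len = max((len(p.split()) for p in phrases), default=1)
    terms = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking down to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted; longer runs only if listed.
            if n == 1 or candidate in phrases:
                terms.append(candidate)
                i += n
                break
    return terms


# Hypothetical phrase dictionary: only combinations that mean something
# in the domain are listed, so "addicted teens" is never tested as a phrase.
PHRASES = {"drug addicted", "latex free", "exam gloves", "chemical retardant"}

print(translate_query("drug addicted teens", PHRASES))
# ['drug addicted', 'teens']
```

The returned terms would then be ANDed together. Because only listed phrases
are ever tested, the number of checks grows with the size of the dictionary,
not with every possible permutation of the words in the search string.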

An excellent example of this at work within my industry is a request for
"latex free exam gloves." In this case, I can say with very high certainty
that the system should query for ["latex free" AND "exam gloves"] and that a
search for ["latex" AND "free" AND "exam" AND "glove"] will return products
the searcher does not want - specifically "Latex Powder-Free Exam Gloves."
BIG difference!! I also don't have "free exam" as a relevant phrase
anywhere. Therefore my phrase dictionary will include "latex free" and "exam
gloves" - and other word combinations will be ignored.

Another excellent example is a search for "chemical retardant hazmat suit" -
I explicitly want to recognize "chemical retardant" as a single concept and
as a phrase. I do not want to return "chemical resistant, fire retardant"
suits for this search, which a simple AND search would do.
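
To make the difference concrete, here is a small Python sketch comparing a
plain word-level AND with an AND over dictionary phrases. The helper names and
the second and third product titles are invented for the example, and no
stemming is applied, so treat it as an illustration under those assumptions:

```python
import re


def tokens(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(title, phrase):
    """True if the words of `phrase` appear consecutively in `title`."""
    t, p = tokens(title), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def match_all(title, terms):
    """AND search: every term (single word or phrase) must be present."""
    return all(contains_phrase(title, term) for term in terms)


# The first title comes from the example above; the others are hypothetical.
POWDER_FREE = "Latex Powder-Free Exam Gloves"
LATEX_FREE = "Nitrile Latex-Free Exam Gloves"
FIRE_SUIT = "Chemical Resistant, Fire Retardant Hazmat Suit"

word_and = ["latex", "free", "exam", "gloves"]
phrase_and = ["latex free", "exam gloves"]

print(match_all(POWDER_FREE, word_and))    # True  (unwanted hit)
print(match_all(POWDER_FREE, phrase_and))  # False
print(match_all(LATEX_FREE, phrase_and))   # True

print(match_all(FIRE_SUIT, ["chemical", "retardant", "hazmat", "suit"]))  # True
print(match_all(FIRE_SUIT, ["chemical retardant", "hazmat", "suit"]))     # False
```

The word-level AND matches both unwanted products because every individual
word is present somewhere in the title; requiring the phrase words to be
adjacent is what filters them out.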

And if you break down how you think about a concept like "latex free exam
gloves" - this approach mimics your thought process somewhat. You actually **think**
of "latex free" as one concept (it just happens to be expressed as 2 words)
and "exam gloves" as another. Depending on your context, you may also have a
concept of "free exam" - but that concept isn't relevant to any of the
products or content my users are searching, therefore I never test for this
combination of words.

Mario Sanchez


