[Sigia-l] Re: using thesauri to improve search

Michael Fry mwf24 at drexel.edu
Tue Jun 11 16:21:20 EDT 2002


I asked:

>> ...ignoring issues of synonymy for the moment--how do we map multi-word,
>> multi-concept queries such as [drug addiction teens] to the appropriate,
>> individual indexing terms, i.e. "drug addiction" and "adolescents"?

You replied:

> Why ignore issues of synonymy even for now? Won't you need to track down
> variant terms and keep track of them in the thesaurus?

Sorry, I didn't communicate too well (and yes, you're absolutely right about 
that). We don't intend to ignore synonymy in practice. I meant that readers 
should ignore it for the purpose of my example. I didn't want anybody to 
confuse the issue and miss my intended point, which had to do with mapping 
unquoted phrases to phrases in the vocabulary. (I suppose I didn't do so 
well!)
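One common way to do this kind of mapping is greedy longest-match segmentation: scan the unquoted query left to right and, at each position, take the longest run of words that is an entry term in the vocabulary. The sketch below assumes a tiny hypothetical thesaurus (`THESAURUS`, with `teens` as an entry term pointing to `adolescents`). It is an illustration only, not the project's actual method.

```python
# Hypothetical controlled vocabulary: entry phrases (as word tuples) mapped to
# preferred indexing terms. Not the project's real thesaurus.
THESAURUS = {
    ("drug", "addiction"): "drug addiction",
    ("drug",): "drugs",
    ("drugs",): "drugs",
    ("addiction",): "addiction",
    ("teens",): "adolescents",
    ("adolescents",): "adolescents",
}
MAX_PHRASE_LEN = max(len(k) for k in THESAURUS)

def map_query(query: str) -> list[str]:
    """Greedy longest match: at each position take the longest run of words
    that is a vocabulary entry; unmatched words pass through unchanged."""
    words = query.lower().split()
    terms = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in THESAURUS:
                terms.append(THESAURUS[key])
                i += n
                break
        else:
            # No entry starts here: keep the raw word as a free-text term.
            terms.append(words[i])
            i += 1
    return terms

print(map_query("drug addiction teens"))  # ['drug addiction', 'adolescents']
```

Greedy matching is simple and usually adequate for short queries; it can mis-segment when two entry phrases overlap, which a dynamic-programming segmentation would handle.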

> Look at multi-word search entries in an AND then OR order unless the user
> specifies something else: "drug addiction teens" would return results as if
> the user had typed "drug AND addiction AND teens" [then] "drug OR addiction OR
> teens"
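The AND-then-OR suggestion can be sketched as two retrieval tiers. This sketch assumes a hypothetical three-document collection and a toy stem table (`STEMS`) standing in for a real stemmer:

```python
# Toy normalisation table standing in for a real stemmer (hypothetical).
STEMS = {"drugs": "drug", "addicted": "addict", "addiction": "addict"}

# Hypothetical mini-collection: doc id -> text.
DOCS = {
    1: "drug addiction among teens",
    2: "sex-addicted teens and recreational drugs",
    3: "teens and social media",
}

def norm_words(text: str) -> set[str]:
    """Lowercase, split on whitespace and hyphens, apply the toy stem table."""
    return {STEMS.get(w, w) for w in text.lower().replace("-", " ").split()}

def and_then_or(query: str, docs: dict[int, str]) -> list[int]:
    """Docs containing every query word first, then docs containing any of them."""
    q = norm_words(query)
    and_hits = [d for d, text in docs.items() if q <= norm_words(text)]
    or_hits = [d for d, text in docs.items()
               if d not in and_hits and q & norm_words(text)]
    return and_hits + or_hits

print(and_then_or("drug addiction teens", DOCS))  # [1, 2, 3]
```

Note that document 2 lands in the AND tier right beside document 1, since word-level AND cannot tell the two apart.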

I think I disagree, perhaps because I have the benefit of being more intimate 
with the terminology on this project. An AND search for three words isn't 
inherently the same as an AND search for a phrase and another word, even if 
the phrase contains the two other words.

Hypothetically, the first could return documents about 'sex-addicted teens and
recreational drugs' just as effectively as it returns documents about 'drug-addicted
teens.' In the project domain, the latter would be highly relevant;
the former, only minimally so (even if they made for an amusing read).
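To make the distinction concrete, here is a minimal sketch contrasting a plain word-level AND with an AND over concepts, where a multi-word concept matches only if its words appear together, in order, as a phrase. The documents and helper names are hypothetical, and synonym mapping (teens to adolescents) is left out, in keeping with the example:

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words occur contiguously, in order, in the text."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

def word_and(text: str, query: str) -> bool:
    """Plain AND: every query word appears somewhere in the text."""
    return set(tokens(query)) <= set(tokens(text))

def concept_and(text: str, concepts: list[str]) -> bool:
    """AND over concepts: each (possibly multi-word) concept must occur as a phrase."""
    return all(contains_phrase(text, c) for c in concepts)

docs = {
    "a": "Treating drug addiction in teens",
    "b": "Teens with a sex addiction and occasional drug use",
}
for name, text in docs.items():
    print(name, word_and(text, "drug addiction teens"),
          concept_and(text, ["drug addiction", "teens"]))
# a True True
# b True False
```

Adjacency is only a proxy here; in the setup described below, the stronger signal is an indexer having tagged the document with the phrase term.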

If I'm not mistaken, phrases (i.e. multi-concept terms) are particularly 
useful for indexing the docs because they're more specific. They allow 
administrators to "tell" the search engine that "this doc is primarily about 
'drug addiction', so when you see that phrase in a search string, retrieve 
this doc. You should also retrieve docs that contain the individual 
words/concepts 'drugs' and 'addiction,' but rank them lower than those tagged 
with 'drug addiction' because they are less likely to be truly about the core 
concept."
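The tiered ranking described above might look roughly like this, assuming each document carries a set of indexer-assigned thesaurus tags alongside its text (a hypothetical structure). Documents tagged with the phrase come first; documents that merely contain some of its component words follow, ordered by how many they contain.

```python
import re

def rank(concept: str, docs: dict[str, dict]) -> list[str]:
    """Rank docs for one thesaurus concept: tagged docs first, then docs
    that only share component words, by number of words shared."""
    words = set(concept.lower().split())
    results = []
    for doc_id, doc in docs.items():
        tagged = concept in doc["tags"]
        matched = len(words & set(re.findall(r"[a-z]+", doc["text"].lower())))
        if tagged or matched:
            results.append((doc_id, tagged, matched))
    # Sort on (tagged, matched), highest first; the sort is stable, so ties
    # keep their original order.
    results.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return [doc_id for doc_id, _, _ in results]

# Hypothetical index entries.
docs = {
    "d1": {"tags": {"drug addiction"}, "text": "Treatment outcomes for young users"},
    "d2": {"tags": set(), "text": "Drug policy and gambling addiction"},
    "d3": {"tags": set(), "text": "Addiction research funding"},
    "d4": {"tags": set(), "text": "School sports programs"},
}
print(rank("drug addiction", docs))  # ['d1', 'd2', 'd3']
```

Here d1 outranks d2 even though d1's text never uses the words, because the tag records what the document is primarily about.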

mf
