[Sigia-l] Re: using thesauri to improve search
Michael Fry
mwf24 at drexel.edu
Tue Jun 11 16:21:20 EDT 2002
I asked:
>> ...ignoring issues of synonymy for the moment--how do we map multi-word,
>> multi-concept queries such as [drug addiction teens] to the appropriate,
>> individual indexing terms, i.e. "drug addiction" and "adolescents"?
You replied:
> Why ignore issues of synonymy even for now? Won't you need to track down
> variant terms and keep track of them in the thesaurus?
Sorry, I didn't communicate too well (and yes, you're absolutely right about
that). We don't intend to ignore synonymy in practice. I meant that readers
should ignore it for the purpose of my example. I didn't want anybody to
confuse the issue and miss my intended point, which had to do with mapping
unquoted phrases to phrases in the vocabulary. (I suppose I didn't do so
well!)
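
To make the mapping question concrete, here is a rough sketch of one way it could be approached: a greedy longest-match of the unquoted query against the vocabulary. It is only an illustration, not what the project does. The function name map_query, the two-term vocabulary, and the small variants table (which stands in for the thesaurus and sends "teens" to "adolescents") are all assumptions made up for the example.

```python
def map_query(query, vocabulary, variants):
    """Map an unquoted query to vocabulary terms, preferring the longest match.

    vocabulary: set of indexing terms (hypothetical example data)
    variants:   dict of variant term -> preferred term (hypothetical stand-in
                for the thesaurus)
    """
    words = query.lower().split()
    max_len = max(len(term.split()) for term in vocabulary)
    terms, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            candidate = variants.get(candidate, candidate)
            if candidate in vocabulary:
                terms.append(candidate)
                i += n
                break
        else:
            i += 1  # no vocabulary term starts here; this sketch drops the word
    return terms


vocabulary = {"drug addiction", "adolescents"}
variants = {"teens": "adolescents"}
mapped = map_query("drug addiction teens", vocabulary, variants)
print(mapped)
```

Because the longest candidate is tried first, "drug addiction" is matched as one term before "drug" and "addiction" are ever considered separately.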
> Look at multi-word search entries in an AND then OR order unless the user
> specifies something else: "drug addiction teens" would return results as if
> the user had typed "drug AND addiction AND teens" [then] "drug OR addiction OR
> teens"
I think I disagree, perhaps because I have the benefit of being more intimate
with the terminology on this project. An AND search for three words isn't
inherently the same as an AND search for a phrase and another word, even if
the phrase contains the two other words.
Hypothetically, the three-word AND could return documents about 'sex-addicted
teens and recreational drugs' just as readily as it returns documents about
'drug-addicted teens.' In the project domain, the latter would be highly relevant;
the former, only minimally so (even if they made for an amusing read).
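
A small sketch of the difference, assuming a toy matcher rather than any real search engine. The helper names and the two sample document strings are invented for illustration.

```python
def tokens(text):
    """Lowercase a string and split it into words, treating hyphens as spaces."""
    return text.lower().replace("-", " ").split()


def matches_all_words(doc, words):
    """Boolean AND over individual words, in any order and position."""
    doc_tokens = set(tokens(doc))
    return all(w in doc_tokens for w in words)


def contains_phrase(doc, phrase):
    """True if the phrase's words appear adjacent and in order."""
    d, p = tokens(doc), tokens(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


# Hypothetical documents: one on topic, one only marginally so.
docs = {
    "relevant": "Treatment programs for drug addiction in teens",
    "marginal": "Sex addiction among teens and recreational drug use",
}

# drug AND addiction AND teens
word_and = {name: matches_all_words(text, ["drug", "addiction", "teens"])
            for name, text in docs.items()}

# "drug addiction" AND teens
phrase_and = {name: contains_phrase(text, "drug addiction")
              and matches_all_words(text, ["teens"])
              for name, text in docs.items()}

print(word_and)
print(phrase_and)
```

The three-word AND accepts both documents, while the phrase-plus-word query accepts only the first.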
If I'm not mistaken, phrases (i.e. multi-concept terms) are particularly
useful for indexing the docs because they're more specific. They allow
administrators to "tell" the search engine that "this doc is primarily about
'drug addiction', so when you see that phrase in a search string, retrieve
this doc. You should also retrieve docs that contain the individual
words/concepts 'drugs' and 'addiction,' but rank them lower than those tagged
with 'drug addiction' because they are less likely to be truly about the core
concept."
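
Roughly, the ranking rule described above might look like the following. This is a sketch under my own assumptions: the two-level scoring (2 for a tagged phrase, 1 for the individual words appearing in the text), the field names, and the sample records are all hypothetical, and a real engine would weight things far more finely.

```python
def tokens(text):
    """Lowercase a string and split it into words."""
    return text.lower().split()


def score(doc, phrase):
    """Score a document for a query phrase.

    2: an indexer tagged the document with the phrase itself
    1: the phrase's individual words all occur somewhere in the text
    0: neither
    """
    if phrase in doc["index_terms"]:
        return 2
    if set(tokens(phrase)) <= set(tokens(doc["text"])):
        return 1
    return 0


# Hypothetical records: id, indexer-assigned terms, and text.
docs = [
    {"id": "B", "index_terms": {"sex addiction"},
     "text": "Sex addiction among teens and recreational drug use"},
    {"id": "C", "index_terms": {"nutrition"},
     "text": "Healthy eating in schools"},
    {"id": "A", "index_terms": {"drug addiction"},
     "text": "Recovery services for adolescents"},
]

phrase = "drug addiction"
scored = [(score(d, phrase), d["id"]) for d in docs]
# Keep only matching documents, highest score first.
ranked = [doc_id for s, doc_id in sorted(scored, key=lambda t: -t[0]) if s > 0]
print(ranked)
```

Document A ranks first because it was tagged with the phrase, even though its text never uses the words; B matches only on the scattered words and comes second; C is not retrieved.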
mf