[Sigia-l] using thesauri to improve search
Michael Fry
mwf24 at drexel.edu
Mon Jun 10 12:55:05 EDT 2002
Hi,
I have questions about how to integrate a thesaurus (of more than 500 terms)
into a site for the purpose of indexing content and improving search results.
Here's my (perceived) dilemma:
So far, we've been working toward a faceted thesaurus that, where appropriate,
breaks multi-word concepts or phrases (e.g., "drug addiction" and "low income
youth") into discrete facets. (FYI, we'll probably also build a browsable
version of the terms, but the initial goal is to improve search.)
For example:
populations
  <by age>
    adolescents
    adults
    youth
  <by economic status>
    low income populations
    middle class populations
    working class populations
  <by condition>
    drug addiction
    glycemia
    malnutrition
The plan is for these terms to be assigned to relevant documents, so that a
document might be simultaneously indexed with "drug addiction," "adolescents"
and "low income populations," and thus be retrieved whenever somebody's query
includes the appropriate, matching concepts.
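To make the indexing plan concrete, here is a minimal sketch in Python of post-coordinate retrieval: each document carries a set of thesaurus terms, and a query for several concepts returns the documents indexed with all of them. The document IDs and term assignments are made up for illustration, and this says nothing about how Inktomi stores or matches terms internally.

```python
from collections import defaultdict

# Hypothetical document IDs and term assignments, for illustration only.
assignments = {
    "doc1": {"drug addiction", "adolescents", "low income populations"},
    "doc2": {"drug addiction", "adults"},
    "doc3": {"malnutrition", "adolescents"},
}


def build_index(assignments):
    """Build an inverted index: thesaurus term -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, terms in assignments.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def retrieve(index, concepts):
    """Return documents indexed with every requested concept (Boolean AND)."""
    postings = [index.get(concept, set()) for concept in concepts]
    return set.intersection(*postings) if postings else set()


index = build_index(assignments)
print(sorted(retrieve(index, ["drug addiction", "adolescents"])))  # ['doc1']
```

The concepts are combined only at search time, which is what makes the vocabulary post-coordinate: no single term for "adolescent drug addiction" has to exist in the thesaurus.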
The problem (I think) is that users aren't likely to search with the
vocabulary we build, and aren't likely to explicitly specify phrases in their
queries. So what we've got is a post-coordinate vocabulary trying to match up
with a mix of pre- and post-coordinate query types.
If that's the case--and ignoring issues of synonymy for the moment--how do we
map multi-word, multi-concept queries such as [drug addiction teens] to the
appropriate, individual indexing terms, i.e., "drug addiction" and
"adolescents"? Specifically, if "drug addiction" isn't submitted as a phrase
(i.e., wrapped in quotes), how does the search software, Inktomi, know that
users are looking for the "drug addiction" term in our vocabulary?
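For illustration, one possible approach (an assumption on my part, not something I know Inktomi to do) is to pre-process the query before it reaches the search engine: scan the unquoted words left to right and, at each position, take the longest run of words that matches an entry in the vocabulary. The sketch below is in Python; the ENTRY_TERMS table and the map_query function are hypothetical names, and the "teens" and "low income" entries stand in for whatever entry-term mappings the thesaurus would eventually hold.

```python
# Hypothetical lookup table: lowercase entry term -> preferred indexing term.
ENTRY_TERMS = {
    "drug addiction": "drug addiction",
    "adolescents": "adolescents",
    "teens": "adolescents",
    "low income populations": "low income populations",
    "low income": "low income populations",
}

# Longest entry term, in words; bounds how far ahead the matcher looks.
MAX_WORDS = max(len(entry.split()) for entry in ENTRY_TERMS)


def map_query(query):
    """Greedy longest-match of query words against the vocabulary.

    Returns (matched preferred terms, words that matched nothing).
    """
    words = query.lower().split()
    matched, unmatched = [], []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in ENTRY_TERMS:
                term = ENTRY_TERMS[candidate]
                if term not in matched:
                    matched.append(term)
                i += n
                break
        else:
            # No vocabulary entry starts at this word; leave it as free text.
            unmatched.append(words[i])
            i += 1
    return matched, unmatched


print(map_query("drug addiction teens"))
# (['drug addiction', 'adolescents'], [])
```

Words that match nothing are returned separately so they could still be passed to the full-text search rather than dropped. A greedy matcher like this can guess wrong when a phrase spans two intended concepts, so it is a starting point rather than a complete answer.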
I've been working on this thesaurus a lot lately, but suddenly I feel like
what we're developing isn't going to bridge the gap between users and
information as well as it ought to. Is there something about search engine
software that I'm underestimating? Do we have to design a more complex search
UI in order to facilitate the translation? Should we be building a vocabulary
that's pre-coordinated rather than post-coordinated?
Thanks very much.
mf