[Sigia-l] using thesauri to improve search

Michael Fry mwf24 at drexel.edu
Mon Jun 10 12:55:05 EDT 2002


Hi,

I have questions about how to integrate a thesaurus (of more than 500 terms) 
into a site for the purpose of indexing content and improving search results.

Here's my (perceived) dilemma:

So far, we've been working toward a faceted thesaurus that, where appropriate, 
breaks multi-word concepts or phrases (e.g., "drug addiction" and "low income 
youth") into discrete facets. (FYI, we'll probably also build a browsable 
version of the terms, but the initial goal is to improve search.)

For example:

populations
 <by age>
  adolescents
  adults
  youth
 <by economic status>
  low income populations
  middle class populations
  working class populations
 <by condition>
  drug addiction
  glycemia
  malnutrition

The plan is for these terms to be assigned to relevant documents, so that a 
document might be simultaneously indexed with "drug addiction," "adolescents" 
and "low income populations," and thus be retrieved whenever somebody's query 
includes the appropriate, matching concepts.
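
To make the indexing plan concrete, here is a minimal Python sketch of what I 
have in mind. The document IDs, the term assignments, and the function names 
are all made up for illustration; this is not how Inktomi stores anything, 
just the post-coordinate idea in miniature.

```python
from collections import defaultdict

# Hypothetical documents and the thesaurus terms assigned to each one.
assignments = {
    "doc1": {"drug addiction", "adolescents", "low income populations"},
    "doc2": {"drug addiction", "adults"},
    "doc3": {"malnutrition", "low income populations"},
}

def build_index(assignments):
    """Invert the assignments: thesaurus term -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, terms in assignments.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def retrieve(index, query_terms):
    """Return the documents indexed with every term in query_terms (AND)."""
    doc_sets = [index.get(term, set()) for term in query_terms]
    if not doc_sets:
        return set()
    return set.intersection(*doc_sets)

index = build_index(assignments)
print(sorted(retrieve(index, ["drug addiction", "adolescents"])))
```

Each term is assigned on its own, and the concepts are only combined at 
search time, which is what makes the vocabulary post-coordinate.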

The problem (I think) is that users aren't likely to search with the 
vocabulary we build, and aren't likely to explicitly specify phrases in their 
queries. So what we've got is a post-coordinate vocabulary trying to match up 
with a mix of pre- and post-coordinate query types.

If that's the case--and ignoring issues of synonymy for the moment--how do we 
map multi-word, multi-concept queries such as [drug addiction teens] to the 
appropriate, individual indexing terms, i.e., "drug addiction" and 
"adolescents"? Specifically, if "drug addiction" isn't submitted as a phrase 
(i.e., wrapped in quotes), how does the search software, Inktomi, know that 
users are looking for the "drug addiction" term in our vocabulary?
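
One approach I can imagine, sketched below in Python, is a lookup step that 
runs before the query reaches the search engine: scan the query left to 
right and, at each position, take the longest run of words that matches a 
known entry term. Everything here is an assumption on my part. The 
ENTRY_TERMS table and the map_query function are hypothetical, I don't know 
whether Inktomi offers anything like this, and the "teens" entry does lean 
on the synonymy I said I'd set aside.

```python
# Hypothetical lookup table: what users might type -> preferred thesaurus term.
ENTRY_TERMS = {
    "drug addiction": "drug addiction",
    "adolescents": "adolescents",
    "teens": "adolescents",
    "low income": "low income populations",
    "youth": "youth",
}

# Longest entry term, in words; bounds how far ahead we need to look.
MAX_WORDS = max(len(entry.split()) for entry in ENTRY_TERMS)

def map_query(query):
    """Split an unquoted query into matched thesaurus terms and leftover words."""
    words = query.lower().split()
    matched, leftover = [], []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in ENTRY_TERMS:
                term = ENTRY_TERMS[candidate]
                if term not in matched:
                    matched.append(term)
                i += n
                break
        else:
            # No entry term starts at this word; keep it for free-text search.
            leftover.append(words[i])
            i += 1
    return matched, leftover

print(map_query("drug addiction teens"))
```

The leftover words could still be passed to the engine as ordinary 
free-text terms, so nothing the user typed gets thrown away.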

I've been working on this thesaurus a lot lately, but suddenly I feel like 
what we're developing isn't going to bridge the gap between users and 
information as well as it ought to. Is there something about search engine 
software that I'm underestimating? Do we have to design a more complex search 
UI in order to facilitate the translation? Should we be building a vocabulary 
that's pre-coordinated rather than post-coordinated?

Thanks very much.

mf



