[Sigia-l] using thesauri to improve search
Leonard Will
L.Will at willpowerinfo.co.uk
Mon Jun 10 18:08:24 EDT 2002
In message <3D63713E at webmail.drexel.edu> on Mon, 10 Jun 2002, Michael
Fry <mwf24 at drexel.edu> wrote
>
>So far, we've been working toward a faceted thesaurus that, where
>appropriate, breaks multi-word concepts or phrases (e.g., "drug
>addiction" and "low income youth") into discrete facets.
Be careful to recognise the distinction between "concepts" and
"phrases".
Drug addiction is a single concept, which you may choose to label with a
multi-word term such as "drug addiction" or a single word term such as
"addiction". Low income youth, on the other hand, is a combination of
two concepts: "people with low incomes" and "young people", which are
distinct and each of which can have its own scope note in a thesaurus.
You may choose a single or multi-word term to label each of these
concepts; that does not affect the distinction between the concepts
themselves.
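
A minimal sketch of that distinction as a data structure, assuming a record per concept with its labels attached. The class name, field names and identifiers are hypothetical and not taken from any particular thesaurus software:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """One thesaurus concept; the terms are only labels attached to it."""
    concept_id: str                      # made-up identifier for illustration
    preferred_term: str                  # the label chosen for display
    entry_terms: List[str] = field(default_factory=list)  # alternative labels
    scope_note: str = ""


# "Drug addiction" is a single concept, whichever label is chosen for it.
drug_addiction = Concept("C1", "drug addiction", ["addiction"])

# "Low income youth" combines two separate concepts, each with its own record.
low_income = Concept("C2", "people with low incomes", ["low income populations"])
young_people = Concept("C3", "young people", ["youth"])


def index_document(concepts):
    """Index a document by the set of concept ids assigned to it."""
    return {c.concept_id for c in concepts}


# A document about low income youth is indexed with both concepts.
doc_index = index_document([low_income, young_people])
```

Changing a preferred term here alters only the label, not the set of concepts a document is indexed with.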
>(FYI, we'll probably also build a browsable version of the terms, but
>the initial goal is to improve search.)
It would be a good idea to do so, and to make this an integral part of
the search interface, as Avi Rappoport has suggested. If you are using a
controlled vocabulary I think it is best to let the users see what it
is, so that they can choose the terms that best match their needs. Your
search interface will be two-stage:
1. Map from the terms the user thinks of to the terms of the controlled
vocabulary. You can use free text search techniques for this, including
string matching, stemming, truncation and so on, to display possible
terms from the controlled vocabulary for the user to choose from, with
the option of navigating to broader or narrower terms, selecting
subtrees, related terms and so on. The system should help the user to
define and isolate the concepts in the enquiry and to combine terms to
express each of them.
2. Use the chosen controlled vocabulary terms to retrieve documents of
interest.
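
The two stages above can be sketched roughly as follows. This is an illustration only: the vocabulary, the document index and the crude suffix-stripping stemmer are invented for the example, and it says nothing about how Inktomi or any other engine works.

```python
import re

# Toy controlled vocabulary: preferred term -> entry terms (assumed data).
VOCABULARY = {
    "drug addiction": ["addiction"],
    "young people": ["youth"],
    "people with low incomes": ["low income"],
}

# Toy index: preferred term -> ids of documents indexed with it (assumed data).
INDEX = {
    "drug addiction": {1, 2, 5},
    "young people": {2, 3, 5},
    "people with low incomes": {2, 4},
}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    word = word.lower()
    if len(word) > 4:
        word = re.sub(r"(ions|ion|ing|s)$", "", word)
    return word


def stems(text):
    """Set of stems of the words in a piece of text."""
    return {stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def suggest_terms(query):
    """Stage 1: offer vocabulary terms whose labels share a stem with the query."""
    wanted = stems(query)
    hits = []
    for preferred, entries in VOCABULARY.items():
        if any(wanted & stems(label) for label in [preferred] + entries):
            hits.append(preferred)
    return sorted(hits)


def retrieve(chosen_terms):
    """Stage 2: documents indexed with every term the user chose."""
    sets = [INDEX.get(term, set()) for term in chosen_terms]
    return set.intersection(*sets) if sets else set()


candidates = suggest_terms("youth addictions")  # shown to the user to choose from
results = retrieve(candidates)                  # here the user accepts them all
```

In a real interface the user would pick from the candidates, and could move to broader or narrower terms, before stage 2 runs.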
>For example:
>
>populations
> <by age>
> adolescents
> adults
> youth
> <by economic status>
> low income populations
> middle class populations
> working class populations
> <by condition>
> drug addiction
> glycemia
> malnutrition
As an aside, this does not seem to be a very good example. Perhaps
"populations" is the most used term in the subject area of the
thesaurus, but I would prefer "people" as the broader term of
"adolescents" and "adults". These are _kinds of_ people, not kinds of
population.
"Youth" is not a kind of person or population; "young people" would be
better, unless you mean "children" (scope notes presumably clarify
this).
"Drug addiction" and "malnutrition" are not kinds of populations or
kinds of people. They are either "social problems" or "medical
conditions" or both, and should go in those facets. "Glycemia" is a
medical condition. You would either have to list these concepts under
more appropriate headings, or change the terms to something like "drug
addicts", "people with glycemia" and "malnourished people", if that is
what you mean.
>Specifically, if 'drug addiction' isn't submitted as a phrase (i.e.
>wrapped in quotes), how does the search software, Inktomi, know that
>users are looking for the 'drug addiction' term in our vocabulary?
I don't know how Inktomi works, but if it just presents the usual little
dumb box saying "type your search here", then you will have to go
through the two-stage process I have noted above to guide the user to
use the controlled vocabulary properly.
>Is there something about search engine software that I'm
>underestimating?
No, I think that most search engine interfaces are designed for simple
text searches and don't allow for the greater power and functionality
that controlled vocabularies provide.
>Do we have to design a more complex search UI in order to facilitate
>the translation?
Yes.
>Should we be building a vocabulary that's pre-coordinated rather than
>post-coordinated?
Pre-coordination may be helpful for browsing, but not for specific searches. You can
have both by building a combined thesaurus and a faceted classification
using the same terms. I would like to see more use made of the kind of
interfaces that Avi Rappoport mentions to allow intelligent searching
using a faceted thesaurus, with feedback at each stage so that the user
can refine the search according to the results.
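
As a rough sketch of post-coordinated searching with feedback, assuming a small hierarchy of narrower terms and a table of postings: all of the data and function names below are made up for illustration.

```python
# Assumed hierarchy: term -> narrower terms (illustrative only).
NARROWER = {
    "people": ["young people", "adults"],
    "young people": ["adolescents"],
    "social problems": ["drug addiction", "malnutrition"],
}

# Assumed postings: term -> ids of documents indexed with exactly that term.
POSTINGS = {
    "adolescents": {1, 2},
    "young people": {3},
    "adults": {4, 5},
    "drug addiction": {2, 4},
    "malnutrition": {3, 5},
}


def subtree(term):
    """The term itself plus all of its narrower terms, recursively."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(subtree(child))
    return terms


def docs_for(term):
    """Documents indexed with the term or anything narrower."""
    found = set()
    for t in subtree(term):
        found |= POSTINGS.get(t, set())
    return found


def refine(current_docs, term):
    """Post-coordinate: narrow the current result set by one more concept."""
    return current_docs & docs_for(term)


def feedback(current_docs, candidate_terms):
    """How many documents would remain after refining by each candidate."""
    return {t: len(refine(current_docs, t)) for t in candidate_terms}


start = docs_for("young people")                          # includes adolescents
counts = feedback(start, ["drug addiction", "malnutrition"])
narrowed = refine(start, "drug addiction")
```

Showing the counts before the user commits to a refinement is one way of giving feedback at each stage.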
Leonard Will
--
Willpower Information (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will at Willpowerinfo.co.uk Sheena.Will at Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------