[Sigia-l] search results and thesauri

Tal Herman therman at seralat.com
Thu May 23 20:13:51 EDT 2002


Avi Rappoport suggests (message quoted below) that it is best to expose the
use of synonyms to people using your search engine.  I've been working on a
system where we chose not to display the synonyms to our users.

We've been building a system that enables journalists to locate scholars
with expertise in particular subject matter areas so that they can use them
as sources in stories.  (No URL because it's still in alpha, plus you'd have
to be an approved journalist to log in.)

The system depends upon a taxonomy of some 5,000 terms defining expertise
areas.  The taxonomy is designed such that each of the 6,000 plus listed
scholars can be located through many different taxonomic terms, even if
their expertise is limited to a single subject area (e.g. one set of terms
defines expertise in terms of time, another in terms of geography, another
in terms of specific subjects).

Journalists locate scholars in one of two ways: browsing through the taxonomy
in a hierarchical manner; or entering words/phrases into a search field and
clicking 'search'.  The search is carried out against the taxonomy itself,
to which the scholars are indirectly linked by the client's subject matter
experts, and scoring is based upon a number of factors, including the number
of relevant taxonomic terms referenced by the search.
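
To make the "number of relevant taxonomic terms" factor concrete, here is a
minimal sketch in Python.  It is not the system's actual scoring, which
combines several factors; the term-to-scholar table, the scholar names, and
the function name score_scholars are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical data: taxonomy term -> scholars linked to that term
# (in the real system the links are made by subject matter experts).
TERM_TO_SCHOLARS = {
    "islam": {"Scholar A", "Scholar B"},
    "middle east": {"Scholar A", "Scholar C"},
    "20th century": {"Scholar A", "Scholar B"},
}


def score_scholars(query, term_to_scholars=TERM_TO_SCHOLARS):
    """Count, per scholar, how many taxonomy terms the query references."""
    # Pad with spaces so that only whole words or phrases match.
    text = " " + " ".join(query.lower().split()) + " "
    scores = defaultdict(int)
    for term, scholars in term_to_scholars.items():
        if " " + term + " " in text:
            for scholar in scholars:
                scores[scholar] += 1
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(score_scholars("Islam Middle East"))
```

A scholar reachable through more of the matched terms ranks higher, which
is why it helps that each scholar is linked to terms in several facets
(time, geography, subject).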

In this system, the vocabulary of the taxonomy is and has to be completely
controlled.  The same term cannot have multiple meanings because that would
degrade the quality of the results to be obtained.  Journalists, however, do
not always spell things correctly or know the word that is used in the
taxonomy for the thing they are looking for.

To account for this, we developed a facility for the client whereby they
could create single or multiple word synonyms for particular taxonomic
terms.  Search terms entered by a journalist are automatically evaluated
against the synonym list and, based upon a set of rules too complicated and
proprietary to outline here, a set of scholars whose taxonomy affiliations
match the search terms and/or their synonyms is returned.
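
The general idea of a synonym lookup can be sketched as follows.  This is
only an illustration under simple assumptions (exact matching on lowercased
words, longest phrase wins); it is not the rule set described above.  The
synonym table and the name to_preferred_terms are hypothetical.

```python
# Hypothetical synonym list: synonym (one or more words) -> taxonomy term.
SYNONYMS = {
    "moslem": "muslim",
    "koran": "qu'ran",
    "mid east": "middle east",
}


def to_preferred_terms(query, synonyms=SYNONYMS):
    """Replace synonyms with preferred terms, trying longer phrases first.

    Returns the rewritten query and a dict of the substitutions made.
    """
    words = query.lower().split()
    max_len = max((len(s.split()) for s in synonyms), default=1)
    out, used, i = [], {}, 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i, then shorter.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in synonyms:
                out.append(synonyms[phrase])
                used[phrase] = synonyms[phrase]
                i += n
                break
        else:
            # No synonym starts here; keep the word as typed.
            out.append(words[i])
            i += 1
    return " ".join(out), used


rewritten, used = to_preferred_terms("Moslem scholars Mid East Koran")
print(rewritten)
print(used)
```

Returning the substitutions separately leaves the display decision open:
the caller can show them to the user or simply discard them.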

The results set is a group of scholars, including for each their contact
information and a list of relevant items that make them expert in the area
the journalist is investigating.  There is often no explicit match for any
of the search terms or their synonyms in the information displayed, although
the list of relevant items (publications, papers, courses taught, etc.)
tends to provide an excellent basis for evaluating which scholars might be
most useful for the journalist to contact.

While we preserve for the journalist the terms that they search for, we
don't reveal the synonyms that were relied upon to find their results set.
I don't find this approach problematic in the least for the audience we are
serving and the service that we are providing.  The journalists simply don't
care if the word 'moslem' was translated to its officially approved synonym
'muslim' during the course of the search, or if their spelling of 'Koran'
was changed to 'Qu'ran' for purposes of the search.  All they care about is
if they've gotten a results set that satisfies their need for a particular
kind of expert.

In this circumstance, the information displayed with the scholar provides
the context for their listing and is sufficient to help the journalist make
the decision about who to contact.  There is no need to reveal the inner
workings of the search process.

Might some journalists find this information interesting?  Probably, but our
research shows most of them simply don't care.  All they want is an answer
as quickly as possible.  So, as with many of the questions that IAs address,
the answer to whether you want to reveal the inner workings of your
synonym/thesauri process to your users is 'it depends.'

Tal

====================================
tal herman
merrill-hall new media
therman at merrillhall.com
404.827.9883 (v); 404.875.6572 (f)
http://www.merrillhall.com/
====================================

-----Original Message-----
From: sigia-l-admin at asis.org [mailto:sigia-l-admin at asis.org]On Behalf Of
Avi Rappoport
Sent: Wednesday, May 22, 2002 12:36 PM
To: sigia-l at asis.org
Subject: Re: [Sigia-l] search results and thesauri


These are great discussions -- if anyone has good (or bad) examples
of implementations, please post.  I love to use real examples when
I'm giving talks and writing articles.

I think best practices require a number of different approaches, all
depending on how much vocabulary control and keyword metatagging
you're doing.

I do not recommend whisking someone to the theoretical applicable
page without going through search results first.  It breaks the
expectation and may be wrong.  For example, I'm analyzing a search
spellchecker (paper to come soon) and someone misspelled "New England
Journal of Medicine".  The site actually has a record for that, but
because they misspelled it, the automatic system took them directly
to the listings for -- Great Britain (because of "England").  Whoops.

My main rule is to explain any automatic conversions (and wish people
would do that for stemming).  For true synonyms (doctor =>
physician), I think it's legit to just issue the search for the
preferred term, put a note on the results page, and consider
highlighting hits in a different form for the original vs. preferred
term.  In this example, you could do italic for "doctor" and bold for
"physician".
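
As a rough illustration of the note-plus-highlighting idea, here is a small
Python sketch that produces HTML.  The function names results_note and
highlight, the wording of the note, and the choice of <i> and <b> tags are
assumptions made for the example, not a description of any product.

```python
import html
import re


def results_note(original, preferred):
    """Note for the results page saying which term was actually searched."""
    return 'Searched for "%s" (preferred term for "%s").' % (preferred, original)


def highlight(text, original, preferred):
    """Escape text, then mark the original term italic, the preferred bold."""
    escaped = html.escape(text)
    for term, tag in ((original, "i"), (preferred, "b")):
        pattern = re.compile(
            r"\b%s\b" % re.escape(html.escape(term)), re.IGNORECASE
        )
        # Wrap each hit in the tag, keeping the capitalization of the hit.
        escaped = pattern.sub(
            lambda m, t=tag: "<%s>%s</%s>" % (t, m.group(0), t), escaped
        )
    return escaped


print(results_note("doctor", "physician"))
print(highlight("Ask your doctor or a Physician", "doctor", "physician"))
```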

It does get trickier when you're talking about less solid agreement.
On a health site, it turned out that they used "Primary Care
Provider" or "PCP".  PCP is a more inclusive term, because it covers
Nurse Practitioners and such, but it's not what people really expect
when they type in a very general term like "doctor".  So I would
recommend that the site actually provide an informative response to
general questions about doctors, rather than just using a synonym
system in this case.

The only search engine I know that allows search admin control over
whether synonyms are automatically searched or whether they are
presented as an option is Inktomi Search Software: the synonym file
has flags for behavior.
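
To show what per-synonym behavior flags could look like, here is a sketch
in Python.  The file format is invented for this example and is NOT the
Inktomi synonym file format; the flag names "auto" and "suggest", the
sample rules, and the function names are likewise hypothetical.

```python
# Invented format, one rule per line:  synonym => preferred | behavior
#   auto    = search the preferred term automatically
#   suggest = offer the preferred term to the user as an option
SAMPLE_RULES = """
# made-up sample rules
doctor => physician | auto
doctor => primary care provider | suggest
"""


def parse_rules(text):
    """Parse the invented rule format into (synonym, preferred, behavior)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        left, behavior = line.rsplit("|", 1)
        synonym, preferred = left.split("=>", 1)
        behavior = behavior.strip()
        if behavior not in ("auto", "suggest"):
            raise ValueError("unknown behavior flag: %r" % behavior)
        rules.append((synonym.strip().lower(), preferred.strip(), behavior))
    return rules


def plan_search(term, rules):
    """Return (terms to search automatically, terms to offer as options)."""
    term = term.lower()
    auto = [p for s, p, b in rules if s == term and b == "auto"]
    suggest = [p for s, p, b in rules if s == term and b == "suggest"]
    # With no automatic rule, search the term exactly as entered.
    return (auto or [term]), suggest


rules = parse_rules(SAMPLE_RULES)
print(plan_search("Doctor", rules))
```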

Looking forward to learning more,

Avi


At 12:03 PM -0700 5/22/02, Chris Farnum wrote:
(snip)
>So here's a related question... does your search
>engine handle both synonyms AND preferred terms?
>Often 3rd party solutions don't include both and you
>are forced to decide how far to stretch the simple
>equivalence (synonym ring) feature they've given you.
>Your answer will depend partly on your content, partly
>on your indexing guidelines, and partly on how you've
>designed your CV.  For example, if you've got a
>collection of carefully edited and authored content in
>which term usage is very consistent you might need to
>concentrate on the alternate terms that point to your
>preferred terms (so users are less likely to get null
>results).  On the other hand if you have a more
>diverse set of content and users that are concerned
>with high recall, you will likely spend more effort on
>synonyms.  These issues may also impact how you answer
>the highlighting question.
>
>Regards,
>Chris

--
Search Server Industry Analysis from Search Tools Consulting
    (510) 845-2551  -- <mailto: analyst at searchtools.com>
Complete Guide to Search Engines for Web Sites and Intranets
    <http://www.searchtools.com>
Content Management Symposium, Chicago O'Hare Marriott, June 28 - 30.
See http://www.asis.org/CM

ASIST SIG IA: http://www.asis.org/SIG/SIGIA/index.html
_______________________________________________
Sigia-l mailing list
Sigia-l at asis.org
http://mail.asis.org/mailman/listinfo/sigia-l



