[Sigia-l] search results and thesauri

Andrew McNaughton andrew at scoop.co.nz
Sat May 25 00:37:37 EDT 2002


On Fri, 24 May 2002, Tal Herman wrote:

> On Thu, 23 May 2002, Tal Herman wrote:
>
> > > While we preserve for the journalist the terms that they search for,
> > > we don't reveal the synonyms that were relied upon to find their
> > > results set.  I don't find this approach problematic in the least
> > > for the audience we are serving and the service that we are
> > > providing.  The journalists simply don't care if the word 'moslem'
> > > was translated to its officially approved synonym 'muslim' during
> > > the course of the search, or if their spelling of 'Koran' was
> > > changed to 'Qu'ran' for purposes of the search.  All they care about
> > > is if they've gotten a results set that satisfies their need for a
> > > particular kind of expert.
>
> To which Andrew McNaughton responded on 24 May 2002,
>
> > If you are only dealing with *precise* synonyms, this is entirely
> > un-problematic.  The problem comes around if "Qu'ran" is not a preferred
> > term and you list experts in the Qu'ran under "Muslim" or "Middle Eastern
> > Religion" or some such term.  Using broader terms or related terms in a
> > search which are not precisely the same as the user's search can improve
> > results, but because these terms are not precise synonyms, the result is
> > not always appropriate to the search intent, and it can lead to unexpected
> > results.  This is the sort of issue that makes it important to give users
> > feedback on the terms that are actually used in the search, and the
> > opportunity to modify them.
>
> I think that I have to disagree with Andrew on this one.  I'll admit that as
> an IA, it offends my sensibilities not to reveal to the searcher the tricks
> that I'm using to get them the results for their search.  On the other hand,
> it's my obligation to provide the searcher with as good a results set as
> possible without distracting them with unnecessary information.  In the case
> of the particular application we're talking about here, the results set
> includes the titles of relevant books authored, papers presented, and
> courses taught by the scholar(s) whose names are returned in response to the
> query, and this additional information provides the context for the
> journalist to make the decision about whom to contact.
>
> The searcher is not directly exposed to the taxonomy at all (unless browsing
> for scholars using the taxonomic hierarchy).  It is quite possible that a
> user might enter a term for which there was a precise match in the taxonomy
> and the results returned would not contain any direct reference at all to
> that term.  For example, to pick a current hot topic, the journalist might
> search for the term 'taliban' because he or she is doing a story on
> Afghanistan.  The list of results for any particular scholar might not
> include the word 'taliban' at all, although 'taliban' may be an entry in the
> taxonomy.  On the other hand, the term 'taliban' may not be a part of the
> taxonomy, rather experts on the Taliban might be grouped under the term
> 'afghanistan' or 'islamic fundamentalism', along with experts in other
> subjects relevant to Afghanistan or other types of Islamic fundamentalist
> groups.

When someone searches for taleban, do you combine the results for
'afghanistan' and 'islamic fundamentalism' with a boolean AND, with an OR,
or do you just return one of them?

If there's only a handful of results for islamic fundamentalism then this
will present few problems, but I doubt you would find it hard to think of
a term in your system broad enough that the list is too long to browse
comfortably, so that the user needs a basis for refining their search.
To do that they need to know a bit about the result set they are looking
at.

Returning results matching 'afghanistan' OR 'islamic fundamentalism' is
probably safe in a small resource set where the list stays short enough to
work through, but depending on the volume of material filed under those
terms, this will not always be a workable approach.

Assuming you have terms which cover more than a few dozen items, as well
as more specific terms, the same approach to combining terms related to
what a user enters will not be suitable across the board.
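
To make the AND/OR question concrete, here is a minimal sketch in Python.
The thesaurus entries, the index, the scholar ids and the function names
are all made up for illustration; none of this describes the actual system
under discussion.

```python
# Hypothetical thesaurus: a user's term -> the taxonomy terms it maps to.
THESAURUS = {
    "taleban": ["afghanistan", "islamic fundamentalism"],
    "taliban": ["afghanistan", "islamic fundamentalism"],
}

# Hypothetical index: taxonomy term -> ids of scholars filed under it.
INDEX = {
    "afghanistan": {"s1", "s2", "s3"},
    "islamic fundamentalism": {"s2", "s4"},
}


def expand(term):
    """Return the taxonomy terms that a user's term is mapped to."""
    key = term.lower()
    if key in INDEX:
        return [key]  # the term is itself in the taxonomy
    return THESAURUS.get(key, [])


def search(term, mode="or"):
    """Combine the expanded terms with OR (union) or AND (intersection).

    Returns the terms actually searched along with the matching ids, so
    the caller can report them back to the user if it chooses to.
    """
    terms = expand(term)
    postings = [INDEX.get(t, set()) for t in terms]
    if not postings:
        return terms, set()
    if mode == "and":
        return terms, set.intersection(*postings)
    return terms, set.union(*postings)
```

With this toy data, OR returns every scholar under either term, while AND
returns only those filed under both.  Which is appropriate depends on how
many items sit under each term, which is the point above.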


> For the journalist, this distinction is relatively unimportant.  The
> additional information included with a scholar name lists the materials that
> caused that scholar to be returned as part of the results set and is enough
> of a basis upon which the decision to contact or not contact a particular
> scholar can be made.  Thus, the question is really one of audience.  In this
> case, the audience doesn't need exposure to synonyms, rather they need
> exposure to other information relevant to the search results so that they
> can decide on their own which result is best for their needs.

It's not a question of audience; it's a question of scale and accuracy.
At the point where it's reasonable to just scan the information in a list,
the search system has done its job.  It's when the list size and
(in)accuracy combine to make an unworkable result set that the user needs
more information in order to improve their search.
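
One way to express that rule is to show the terms actually searched only
once the list is too long to scan.  This is a rough sketch in Python; the
limit of 24 items, the function name and the response fields are
assumptions chosen for illustration, not anything from a real system.

```python
BROWSE_LIMIT = 24  # assumed cut-off for a list that can still be scanned


def present(query, terms_used, results, limit=BROWSE_LIMIT):
    """Build the response shown to the user for one search.

    Short lists are returned as they are.  Long lists also carry the
    terms that were actually searched, so the user has something to
    refine against.
    """
    response = {"query": query, "results": sorted(results)}
    if len(results) > limit:
        response["terms_used"] = list(terms_used)
        response["hint"] = "Long list: narrow or drop one of the terms used."
    return response
```

A short list comes back with no extra fields; a long one includes
"terms_used" so the searcher can see what was substituted for their term.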

Andrew McNaughton



