[Sigia-l] Comparing search engines

Surla, Stacy SSurla at aspensys.com
Thu Jun 19 13:47:37 EDT 2003


I went through a somewhat similar search engine analysis process recently.
In my case, our company had already licensed a particular search engine
(call it engine B), and I wanted to implement an instance as an upgrade to a
website that was using the default IIS search. My assumption was that the
new engine would produce better results, but initial tests did not bear this
out. So we stepped back to try to figure out (1) what constitutes a good
search system anyway, (2) whether engine B could be configured to produce the
results we need, and (3) whether we would have to change something about our
website to make the engine work (e.g., turn the site into a database).

The IT staff who selected engine B in the first place had diligently
determined that the product could be implemented within our environment.
But they had not really looked at the picture from the users' point of view,
asked what good search results would look like, or considered how websites
would need to be organized to give good results with this product. In the
end, we did not implement engine B (which makes all of this a problem for
another day).

But in the process we defined a set of search considerations for ourselves.
We also came up with questions for the engine vendor (the answers to which
led to the decision to drop the quest).  In case it's useful, here's a
little summary:

-----

To provide useful search functionality to our users, we need to deliver in
three areas:

-- Indexing the Content: There must be a way to build an index based on
what's important about any given content item, and it must take into account
the context of a particular site. "Engine B," as currently configured, does
not bring the most relevant documents to the top of a results list. Issues
include the following. Our websites contain three types of content:
structured (content already contained in databases), semi-structured (HTML,
PDF), and unstructured (full-text documents). Structured content could
theoretically be indexed according to any fields we create. However,
"Engine B" indexes unstructured and semi-structured content with only one or
two fields (title and body, or just body). This means HTML meta tags are not
captured as metadata, and PDF titles are not captured at all. The full-text
indexing algorithms do not identify what's important about these content
objects well enough to produce good search results.

-- Building the Search: We need to make the query-building process as
practical as possible for our users. We see this as relying on back-end
organization that streamlines users' interaction with advanced tools, or
minimizes the need for it. Back-end organization includes defining search
zones, applying stemming and other query-building tools, and using
controlled vocabularies. Our original enthusiasm for "Engine B" was based on
its reported capabilities in this area. [Using "Engine B's" capabilities in
this area requires coding skills that are not readily available in the IT
department.]

-- Displaying Results: Building a useful results page requires tools for
ranking, clustering, and highlighting results. In the Technical Questions
memorandum, we asked whether various results displays would be possible.
["Engine B's" capabilities in this area require coding skills that are not
readily available in the IT department.]
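To make the field problem under "Indexing the Content" concrete, here is a
minimal Python sketch of a fielded index. It is not how "Engine B" works
internally; it only illustrates keeping the HTML title and the keywords and
description meta tags as separate, weighted fields instead of folding
everything into a single body field. The class names and field weights are
hypothetical choices for illustration, and the scoring (a weighted count of
matching terms) is deliberately simplistic.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical field weights: a match in the title or meta keywords counts
# for more than a match in the body text.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "description": 1.5, "body": 1.0}


class FieldExtractor(HTMLParser):
    """Split an HTML page into title, meta-tag, and body text fields."""

    def __init__(self):
        super().__init__()
        self.fields = defaultdict(str)
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and name in ("keywords", "description"):
            # Keep meta tag content as its own field instead of discarding it.
            self.fields[name] += " " + (attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.fields["title" if self._in_title else "body"] += " " + data


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedIndex:
    """Inverted index that records which field each term occurrence came from."""

    def __init__(self):
        # term -> doc_id -> field -> number of occurrences
        self.postings = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def add(self, doc_id, html):
        parser = FieldExtractor()
        parser.feed(html)
        parser.close()
        for field, text in parser.fields.items():
            for term in tokenize(text):
                self.postings[term][doc_id][field] += 1

    def search(self, query):
        """Rank documents by the field-weighted count of matching terms."""
        scores = defaultdict(float)
        for term in tokenize(query):
            for doc_id, fields in self.postings.get(term, {}).items():
                for field, count in fields.items():
                    scores[doc_id] += FIELD_WEIGHTS.get(field, 1.0) * count
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = FieldedIndex()
index.add("a.html", "<html><head><title>Grant deadlines</title>"
          "<meta name='keywords' content='funding, grants'></head>"
          "<body>Apply by June.</body></html>")
index.add("b.html", "<html><head><title>Staff news</title></head>"
          "<body>One grant was mentioned in passing.</body></html>")
print(index.search("grant"))  # [('a.html', 3.0), ('b.html', 1.0)]
```

With this toy data, a title match on "grant" outranks a body match, while a
query for "grants" finds only the page whose meta keywords contain that exact
word, since nothing in this sketch does stemming.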
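The back-end query building described under "Building the Search" can be
sketched in the same spirit. The Python below is a hypothetical illustration,
not "Engine B's" API: the controlled vocabulary, stop-word list, zone names,
and URL prefixes are made up, the stemmer is a crude suffix-stripper standing
in for a real algorithm such as Porter's, and a search zone is modeled simply
as a URL prefix for one section of the site. For stemming to help in
practice, the index would have to be stemmed the same way.

```python
import re

# Hypothetical controlled vocabulary: entry terms users might type,
# mapped to the preferred term used when content was indexed.
CONTROLLED_VOCABULARY = {
    "financing": "funding",
    "money": "funding",
    "kids": "children",
}

# Hypothetical stop words dropped before matching.
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}

# Hypothetical search zones: each zone limits a query to one site section,
# modeled here as a URL path prefix.
SEARCH_ZONES = {
    "publications": "/pubs/",
    "news": "/news/",
}


def crude_stem(word):
    """Rough suffix stripping for illustration only (not Porter's algorithm)."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def build_query(user_input, zone=None):
    """Normalize raw user input: drop stop words, map entry terms to
    preferred vocabulary terms, stem them, and attach an optional zone."""
    terms = []
    for word in re.findall(r"[a-z0-9]+", user_input.lower()):
        if word in STOP_WORDS:
            continue
        preferred = CONTROLLED_VOCABULARY.get(word, word)
        terms.append(crude_stem(preferred))
    return {"terms": terms, "path_prefix": SEARCH_ZONES.get(zone)}


def in_zone(url, query):
    """True if the URL falls inside the query's zone (or no zone was set)."""
    prefix = query["path_prefix"]
    return prefix is None or url.startswith(prefix)


query = build_query("Financing for kids", zone="publications")
print(query)  # {'terms': ['fund', 'children'], 'path_prefix': '/pubs/'}
print(in_zone("/pubs/report.pdf", query), in_zone("/news/item.html", query))
```

Here the user never touches an advanced search form: the back end maps both
entry terms to their preferred forms, drops the stop word, stems "funding" to
"fund", and restricts results to URLs under /pubs/.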

-----

~Stacy Surla
