[Sigia-l] Full-Text vs Keyword Searching
Alice Preston
aliceflute at hotmail.com
Fri Feb 25 15:27:16 EST 2005
Hi Marcia, nice to hear from you.
This is complicated by the fact that the majority of the material is not web
material (not HTML); most of it has been scanned into PDF from archival
physical materials or is in some completely different format, with metadata in
the database. So there are a number of issues with terminology (terms used back
when vs the "correct" or "preferred" terms now, in addition to simple
changes in spelling, etc.). I see it as somewhat analogous to the problems
when instead of creating a true index for a document, somebody creates a
"concordance" from the words that occur in the document. You'll get a lot
more "false positive" hits when you search using a concordance, and many of
them won't be of any real interest. Nor will you find an article about
"color blindness" (for example) if the author only referred to "color visual
acuity" (or something equally close but "off").
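To make the concordance problem concrete, here is a minimal sketch in Python. The term mapping, function names, and sample sentence are all hypothetical and made up for illustration; a real controlled vocabulary would be far larger and maintained by hand.

```python
# Hypothetical mapping from historical or variant phrases to a preferred term.
PREFERRED = {
    "color visual acuity": "color blindness",
    "colour blindness": "color blindness",
}

def concordance_hit(query, text):
    """Concordance-style search: match only the literal words in the text."""
    return query.lower() in text.lower()

def index_terms(text, preferred=PREFERRED):
    """Index-style lookup: return preferred terms whose variants occur in the text."""
    lowered = text.lower()
    found = set()
    for variant, term in preferred.items():
        if variant in lowered or term in lowered:
            found.add(term)
    return found

# Made-up sentence standing in for a scanned archival document.
doc = "The author measured color visual acuity in schoolchildren."

print(concordance_hit("color blindness", doc))   # literal search misses it
print("color blindness" in index_terms(doc))     # mapped search finds it
```

The literal search misses the document because the preferred phrase never appears in it, while the mapped lookup finds it through the older wording.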
I also have read that people don't bother with "advanced search" and after
testing a number of search engine implementations myself, I agree with those
who leave them alone. However, I was interested to see what people are
doing, and am glad to hear from several people with good information.
Alice Preston
Ithaka Harbors
Princeton, NJ
>From: Marcia Morante <marcia at kcurve.com>
>To: 'Alice Preston' <aliceflute at hotmail.com>, sigia-l at asis.org
>CC: marcia at kcurve.com
>Subject: RE: [Sigia-l] Full-Text vs Keyword Searching
>Date: Fri, 25 Feb 2005 15:10:49 -0500
>
>Hi Alice -
>
>Lots of questions for a Friday, but congrats on your new job.
>
>Couple of issues here:
>
>1. You sound on exactly the right track by identifying attributes (I would
>call them metadata elements) of the documents that are browsable. You know
>best how your material is used and thought about, but it sounds as if Content
>type would be useful, as would Date (or range of dates), Audience type
>(parents, children) and probably others. Check out the site
>http://www.firstgov.gov for an example of how this was done on a Home
>Page.
>They even organize by Topic (most popular). These same attributes or
>Metadata elements can be used on the Search Results page to allow users to
>refine their searches or to sort them.
>
>2. There are many ways to configure a search engine. Recent research indicates
>that people don't use Advanced Search, so I wouldn't spend a lot of time on
>that.
>
>You also seem to be differentiating between the HTML Header Keywords and the
>text represented by the document itself. If content is tagged with robust
>and consistent keywords, most search engines will let you use them as the
>"full text". I'm not sure why you are so concerned about separating them,
>but if you are, it can be done. Some search engines will allow you to add
>value to terms that come from specific fields, such as Title. If you don't
>want to index Header information like Keywords, you usually can specify
>that.
>
>Hope this bit helps. Perhaps knowing your audiences and objectives would
>help you to make some decisions on how to proceed.
>
>Cheers,
>
>Marcia
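
The field weighting mentioned in the quoted message (adding value to terms that come from specific fields, or leaving a field like Keywords out of the index) can be sketched roughly as follows in Python. This is only an illustration under assumed field names and weights, not how any particular search engine does it, and it handles single-word queries only.

```python
# Assumed field weights; a field left out of the mapping is simply not searched.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}

def score(query, record, weights=FIELD_WEIGHTS):
    """Sum weighted counts of a single-word query across the weighted fields."""
    q = query.lower()
    total = 0.0
    for field, weight in weights.items():
        words = record.get(field, "").lower().split()
        total += weight * words.count(q)
    return total

# Made-up records for illustration.
a = {"title": "Flute repair", "body": "workshop notes"}
b = {"title": "Workshop notes", "body": "flute flute"}

print(score("flute", a), score("flute", b))

# Excluding Keywords from the index: pass weights without that field.
no_keywords = {"title": 3.0, "body": 1.0}
print(score("flute", {"keywords": "flute"}, no_keywords))
```

With these assumed weights, one match in the title outranks two matches in the body, and a record that matches only in Keywords scores zero once that field is excluded.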