[Sigia-l] Full-Text vs Keyword Searching

Marcia Morante marcia at kcurve.com
Fri Feb 25 15:10:49 EST 2005


Hi Alice -

Lots of questions for a Friday, but congrats on your new job.

Couple of issues here:

1.  You sound like you are on exactly the right track in identifying
attributes (I would call them metadata elements) of the documents that are
browsable.  You know best how your material is used and how people think
about it, but it sounds as if Content type would be useful, as would Date
(or range of dates), Audience type (parents, children) and probably others.
Check out the site http://www.firstgov.gov for an example of how this was
done on a Home Page.  They even organize by Topic (most popular).  These
same attributes or metadata elements can be used on the Search Results page
to allow users to refine their searches or to sort them.

2.  There are many ways to configure a search engine.  Recent research
indicates that people don't use Advanced Search, so I wouldn't spend a lot
of time on that.

You also seem to be differentiating between the HTML Header Keywords and the
text of the document itself.  If content is tagged with robust and
consistent keywords, most search engines will let you index them as part of
the "full text".  I'm not sure why you are so concerned about separating
them, but if you are, it can be done.  Some search engines will allow you to
give extra weight to terms that come from specific fields, such as Title.
If you don't want to index Header information like Keywords, you usually can
specify that.
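
To make the field-weighting idea concrete, here is a rough Python sketch.
It is only an illustration, not the API of any particular search engine:
the field names, the weights, and the two sample documents are all made up,
and real engines use more sophisticated ranking than counting terms.

```python
from collections import Counter

# Hypothetical per-field weights (assumed for illustration only).
# Leaving a field out of the mapping means it is not searched at all.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def score(doc, query, weights=FIELD_WEIGHTS):
    """Sum the weighted counts of the query terms across a document's fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in weights.items():
        counts = Counter(doc.get(field, "").lower().split())
        total += weight * sum(counts[t] for t in terms)
    return total


def search(docs, query, weights=FIELD_WEIGHTS):
    """Return the ids of matching documents, highest score first."""
    scored = [(score(d, query, weights), d["id"]) for d in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc_id for s, doc_id in ranked if s > 0]


# Made-up sample documents.
docs = [
    {"id": "a", "title": "Harvest report", "keywords": "agriculture",
     "body": "notes on the strike"},
    {"id": "b", "title": "Strike of 1912", "keywords": "labor strike",
     "body": "a strike began"},
]

print(search(docs, "strike"))
```

Passing a weights mapping without "keywords", for example
{"title": 3.0, "body": 1.0}, corresponds to the case where Header Keywords
are left out of the index.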

Hope this bit helps.  Perhaps knowing your audiences and objectives would
help you to make some decisions on how to proceed.

Cheers,

Marcia

> -----Original Message-----
> From: sigia-l-bounces at asis.org 
> [mailto:sigia-l-bounces at asis.org] On Behalf Of Alice Preston
> Sent: Thursday, February 24, 2005 2:40 PM
> To: sigia-l at asis.org
> Subject: [Sigia-l] Full-Text vs Keyword Searching
> 
> Greetings--
> 
> Having recently switched jobs, I find myself being asked to 
> provide different kinds of justifications for what I do (or 
> what users might do) than in the past. This week's question 
> has to do with full-text search vs keyword search. I know 
> something about the subject, but maybe not enough to keep out 
> of trouble.
> 
> The application will have huge collections of items (some are 
> documents, some images, some other types of materials). To 
> assist users with finding what they need, we plan to offer 
> browsing through access to a couple of hierarchies, such as 
> Topic, Origin of the material, (etc.) with sub headings 
> taking them in directions we expect to be somewhat frequently 
> used. We also plan to offer both a simple and "advanced" 
> search. It seems that the development side expected for 
> simple search to be just full-text search (with exact match) 
> and advanced search to require the user to select an 
> attribute and provide match text for the attribute.  (I'm 
> still figuring out what they thought would happen with all 
> the keywords the content providers are working to supply in 
> the metadata.)
> 
> So here's the question (OK, several questions):
> 
> Have you worked on a system with a combination of browse and 
> search where you dealt with keywords in some additional 
> manner? How did you do it? For example, what if the simple 
> search was sensitive to whenever a keyword was entered and 
> switched to matches on keywords then? Or what about an 
> application where there was an Index choice as well as (or 
> maybe instead of) Advanced Search?
> 
> Other thoughts on how people succeed or don't succeed with 
> such searches?
> 
> My gut instinct is to step farther away from full-text search 
> (especially because some of the materials are "in 
> context"--think full pages of newspapers instead of just 
> clippings--and the searching will find the text strings on 
> the same page around the piece in question, not only within 
> the piece itself). I do not yet have access to target users; 
> they will be university undergraduate and graduate students, 
> librarians, faculty, and non-university researchers in 
> several different academic areas, across the world. I fully 
> realize how many "ifs" there are, but I am in the position of 
> needing to either back development's initial plan to omit 
> keywords from their work or to supply some kind of generic 
> evidence that they should use them in some manner.
> 
> Thanks in advance for any background or opinions you can 
> share-- Alice Preston Ithaka Harbors Princeton, NJ USA
> 
> 



