[Sigia-l] on facets, with examples (mixing apples and oranges and tomatoes)

karl fast karl.fast at pobox.com
Fri Apr 12 21:26:18 EDT 2002


> Traditionally, Peter is correct. Facets are generally
> post-enumerative. When you are putting together your taxonomy, you
> break the item into mutually exclusive pieces.  The user assembles
> the pieces they are looking for (this color, this region, etc.) to
> find the item.

Yes.

> If an item fits in more than one category in your facet, you might
> not have broken the facet down enough. I suppose it could also have
> more than one color.

No. That's not how I understand it (more likely I misunderstand what
you're saying here, so apologies in advance).

Here's how I would explain it (please correct my mistakes):

Facets are a concept map. A faceted thesaurus is a hierarchical map
of *concepts* that describe the items you are indexing. Each major,
or top-level, facet is a mutually exclusive category. These major
facets are then further subdivided into sub-facets. Each of these
sub-facets is also mutually exclusive. The sub-faceting goes on
until you can't do any more. Mutual exclusivity is critical all the
way down the chain.

When you index an item, you are identifying the distinct concepts
that describe the item using terms that exist within the faceted
thesaurus (example coming up soon).

You select terms from the faceted thesaurus *regardless* of which
facet they come from. Most items will have terms from multiple
facets. This does not mean you haven't broken down your facets
enough. It simply means the item you are describing is complex and
best represented with multiple concepts.

Okay, here's the example:

EXAMPLE: You are indexing a document about mothers who boycotted
         an art gallery for displaying ceremonial weapons.

  Now let's say you're using the Art & Architecture Thesaurus (AAT)
  to select your terms. This is a faceted thesaurus. You can find it
  online here (click the "browse" link at the top to browse the
  hierarchy):

    http://www.getty.edu/research/tools/vocabulary/aat/  

  For this example you would probably select the following index
  terms:

     mothers
     boycotts   
     ceremonial weapons
     (you'd probably find a term for art gallery too, but I'm lazy)

  All three of these terms are in the AAT. 

  But they are all in DIFFERENT facets.

  "mothers" is under the AGENTS facet. The whole hierarchical path
  is as follows. Items in <> represent concepts used to sub-facet
  (i.e., hierarchically categorize) concepts, but do not map to an
  index term.

      AGENTS FACET
         People
            <people by family relationship>
                parents
                   mothers            
         
  "boycotts" is under the ACTIVITIES facet.

      ACTIVITIES FACET
         events
             boycotts

  "ceremonial weapons" is under the OBJECTS facet.

      OBJECTS FACET
         <Furnishings and Equipment>
             <Weapons and Ammunition>
                 weapons
                    ceremonial weapons


Is this making sense?

There is nothing wrong with having an item represented by terms in
multiple facets. That's perfectly normal.
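
To make that concrete, here is a minimal Python sketch of the example
above. The three hierarchy paths are copied from the AAT listings
quoted earlier; the dictionary layout and the names PATHS, facet_of,
and group_by_facet are made up for illustration and are not anything
the AAT itself provides.

```python
# Hierarchy paths as quoted above; the first element is the facet.
# Names in <angle brackets> are guide terms: they organize the hierarchy
# but are not themselves used as index terms.
PATHS = {
    "mothers": [
        "AGENTS", "People", "<people by family relationship>",
        "parents", "mothers",
    ],
    "boycotts": ["ACTIVITIES", "events", "boycotts"],
    "ceremonial weapons": [
        "OBJECTS", "<Furnishings and Equipment>",
        "<Weapons and Ammunition>", "weapons", "ceremonial weapons",
    ],
}


def facet_of(term):
    """Return the top-level facet a term lives under."""
    return PATHS[term][0]


def group_by_facet(terms):
    """Group a document's index terms by the facet each comes from."""
    grouped = {}
    for term in terms:
        grouped.setdefault(facet_of(term), []).append(term)
    return grouped


# One document, three index terms, three different facets.
document_terms = ["mothers", "boycotts", "ceremonial weapons"]
by_facet = group_by_facet(document_terms)
for facet, terms in by_facet.items():
    print(f"{facet}: {', '.join(terms)}")
```

The document is not "put" under any of these facets. It simply
carries one descriptor from each.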

People often ask "where do I put things in a faceted thesaurus?" The
answer is that YOU'RE NOT PUTTING ANYTHING ANYWHERE. You are simply
describing the item using the concepts in the thesaurus. And these
concepts have been organized into hierarchies of mutually exclusive
categories.

This leads to a critical point.

  1. Facets are used by the person *DEVELOPING* the thesaurus as a
     tool for organizing the concepts. The process of facet analysis
     is a useful process for making sense of the problem space,
     finding indexable concepts, and organizing them (much as
     usability is a useful mechanism for finding problems with your
     site). 

  2. Facets are often *NOT* used as a means of *ACCESS*. In many
     cases a faceted thesaurus is "flattened" into an alphabetical
     list of terms with the standard thesaural relationships:
     broader/narrower terms, use/use for, and related terms.

     The broader/narrower terms provide hierarchical access within
     specific concepts. But the interface doesn't use facets as an
     access mechanism. (this might be a bit confusing since the
     distinction here is rather fine).
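
     A rough Python sketch of what "flattening" might look like. The
     nested HIERARCHY and the flatten function are hypothetical, and
     the guide terms (the <...> nodes) are left out to keep it short.
     The point is only that the broader/narrower links survive while
     the facet level disappears.

```python
from collections import defaultdict

# Toy nested hierarchy: facet -> term -> narrower terms.
# Guide terms such as <Weapons and Ammunition> are omitted here.
HIERARCHY = {
    "AGENTS": {"People": {"parents": {"mothers": {}}}},
    "ACTIVITIES": {"events": {"boycotts": {}}},
    "OBJECTS": {"weapons": {"ceremonial weapons": {}}},
}


def flatten(hierarchy):
    """Return {term: {"BT": [...], "NT": [...]}} with facets dropped."""
    records = defaultdict(lambda: {"BT": [], "NT": []})

    def walk(children, parent):
        for term, grandchildren in children.items():
            record = records[term]
            if parent is not None:
                record["BT"].append(parent)          # broader term
                records[parent]["NT"].append(term)   # narrower term
            walk(grandchildren, term)

    for _facet, top_terms in hierarchy.items():
        # The facet name is deliberately discarded: top terms get no BT.
        walk(top_terms, None)
    return dict(records)


flat = flatten(HIERARCHY)
# Alphabetical listing, the way a flattened thesaurus is usually shown.
for term in sorted(flat, key=str.lower):
    print(term, flat[term])
```

     From "mothers" you can climb to "parents" and then "People", but
     nothing in the flattened records tells you that you are inside
     the AGENTS facet.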


Perhaps an example will help.     
     
EXAMPLE: ERIC is a huge educational database. It uses a faceted
         thesaurus for an indexing language.

         My university provides access to ERIC through WebSPIRS. The
         WebSPIRS interface has a big THESAURUS button. But click it
         and you have two options for locating terms: search for a
         term or browse the alphabetical listings. The nine major
         ERIC facets are not there. 

         Now if you find a term in ERIC it will probably have
         broader/narrower terms. These are hierarchical. And they fall
         out of the faceting process. But this is just a chunk of
         the thesaurus. The user can't browse the whole thesaurus
         using the faceted structure.

         You can see this by comparing access to the ERIC thesaurus
         and the AAT.
         
         ERIC is online for free (with a different interface from
         the WebSPIRS interface I mentioned above). The ERIC
         thesaurus is here:

         http://www.ericfacility.net/extra/pub/thessearchresults.cfm

         Try searching for "Chemistry." You'll see there are
         hierarchical relationships, and these hierarchical
         relationships were created through the faceting process,
         but they are not tied into the whole faceted structure. You
         can go all the way up to "Liberal Arts," but there you
         stop. And "Liberal Arts" is not one of the nine major ERIC
         facets.

         Now compare this to the browse interface of the AAT where you
         can go up and down (this interface is not the most
         intuitive; it has some interaction problems, but spend time
         mucking with it).
         
         http://www.getty.edu/research/tools/vocabulary/aat/hierarchies.html

I hope this is making sense.
         
IMPORTANT POINT:

  There is nothing preventing the whole faceted structure from being
  used in the interface. It just hasn't been done, or done well.

Why not? 

Imagine you're looking for the article I described above (the one
about mothers and ceremonial weapons).

You could SEARCH for those terms, but this is only easy if you can
map the terms you use to describe the concepts with the terms in the
thesaurus. Is it moms or mothers? Or perhaps you knew the weapon was
a dagger. How easy would it be to map that term to the concept of
"ceremonial weapons" without having lots of experience with AAT?
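
The mapping problem can be sketched in a few lines of Python. The USE
table below is hypothetical (these are not actual AAT lead-in
entries). It stands in for the use/use for references of a thesaurus,
which send a searcher's word to the preferred term.

```python
# Hypothetical "use" references: non-preferred term -> preferred term.
USE = {"moms": "mothers", "mums": "mothers"}

# Preferred terms from the running example.
PREFERRED = {"mothers", "boycotts", "ceremonial weapons"}


def resolve(query):
    """Map a searcher's word to a preferred term, or None if unknown."""
    term = query.strip().lower()
    if term in PREFERRED:
        return term
    return USE.get(term)  # None when no "use" reference exists


print(resolve("Moms"))    # mothers
print(resolve("dagger"))  # None
```

"Moms" gets through because someone wrote a reference for it.
"dagger" does not, and a synonym table alone will never connect it to
"ceremonial weapons", because that link is conceptual rather than a
matter of wording.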

The challenge is developing a BROWSEABLE interface. This interface
will, ideally, let you pick up all of these terms as you go. But the
terms are in all these different hierarchies. You don't want to (a)
force people to drill down each hierarchy looking for terms to build
up a search string (takes too long) or (b) build up a rich conceptual
map of the whole thesaurus before they can find anything (again,
takes too much time).
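
One way to picture a browsing interface of this kind is as repeated
narrowing: each term you pick shrinks the result set, and the
interface shows which terms in every facet are still available. The
Python below is only a toy sketch under that assumption. The DOCS
collection and the function names are invented, and it does not
describe how any particular system works.

```python
# Hypothetical collection: each document is indexed as facet -> terms.
DOCS = {
    "doc1": {
        "AGENTS": {"mothers"},
        "ACTIVITIES": {"boycotts"},
        "OBJECTS": {"ceremonial weapons"},
    },
    "doc2": {"AGENTS": {"mothers"}, "ACTIVITIES": {"events"}},
    "doc3": {"AGENTS": {"parents"}, "OBJECTS": {"ceremonial weapons"}},
}


def matching(docs, selected):
    """Ids of documents carrying every selected (facet, term) pair."""
    return sorted(
        doc_id
        for doc_id, index in docs.items()
        if all(term in index.get(facet, set()) for facet, term in selected)
    )


def refinements(docs, selected):
    """Count the not-yet-selected terms, per facet, among the matches."""
    counts = {}
    for doc_id in matching(docs, selected):
        for facet, terms in docs[doc_id].items():
            for term in terms:
                if (facet, term) not in selected:
                    facet_counts = counts.setdefault(facet, {})
                    facet_counts[term] = facet_counts.get(term, 0) + 1
    return counts


selected = [("AGENTS", "mothers")]
print(matching(DOCS, selected))     # documents left after one pick
print(refinements(DOCS, selected))  # what can still be picked, with counts

selected.append(("OBJECTS", "ceremonial weapons"))
print(matching(DOCS, selected))     # narrowed again
```

The searcher never types a search string and never needs the whole
thesaurus in their head. They only react to the terms on offer. This
sketch matches terms exactly, so picking "mothers" does not pull in
documents indexed under the broader "parents".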

The FLAMENCO project is the best example so far (I think) of a
browseable interface that lets you do all this.

  http://bailando.sims.berkeley.edu/flamenco.html

Some other efforts to develop better interfaces to faceted thesauri:

  View-based searching systems - a new paradigm for information
  retrieval based on faceted classification and indexing using
  mutually constraining knowledge-based views. 
  http://www.hud.ac.uk/schools/cedar/bcshci.htm

  Augmenting Thesaurus Relationships: Possibilities for Retrieval
  http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Tudhope/

  Thesaurus based access to multimedia collections: Faceted retrieval tools
  http://web.glam.ac.uk/schools/soc/research/hypermedia/facet_proj/index.php  
  

Good lord that was long. Did anyone even bother to read all that?

  
hope this helps.
  

--karl  









