[Sigia-l] on facets, with examples (mixing apples and oranges and tomatoes)
karl fast
karl.fast at pobox.com
Fri Apr 12 21:26:18 EDT 2002
> Traditionally, Peter is correct. Facets are generally
> post-enumerative. When you are putting together your taxonomy, you
> break the item into mutually exclusive pieces. The user assembles
> the pieces they are looking for (this color, this region, etc.) to
> find the item.
Yes.
> If an item fits in more than one category in your facet, you might
> not have broken the facet down enough. I suppose it could also have
> more than one color.
No. That's not how I understand it (more likely I misunderstand what
you're saying here, so apologies in advance).
Here's how I would explain it (please correct my mistakes):
Facets are a concept map. A faceted thesaurus is a hierarchical map
of *concepts* that describe the items you are indexing. Each major,
or top-level, facet is a mutually exclusive category. These major
facets are then further subdivided into sub-facets. Each of these
sub-facets is also mutually exclusive. The sub-faceting goes on
until you can't do any more. Mutual exclusivity is critical all the
way down the chain.
When you index an item, you are identifying the distinct concepts
that describe the item using terms that exist within the faceted
thesaurus (example coming up soon).
You select terms from the faceted thesaurus *regardless* of which
facet they come from. Most items will have terms from multiple
facets. This does not mean you haven't broken down your facets
enough. It simply means the item you are describing is complex and
best represented with multiple concepts.
Okay, here's the example:
EXAMPLE: You are indexing a document about mothers who boycotted
an art gallery for displaying ceremonial weapons.
Now let's say you're using the Art & Architecture Thesaurus (AAT)
to select your terms. This is a faceted thesaurus. You can find it
online here (click the "browse" link at the top to browse the
hierarchy):
http://www.getty.edu/research/tools/vocabulary/aat/
For this example you would probably select the following index
terms:
mothers
boycotts
ceremonial weapons
(you'd probably find a term for art gallery too, but I'm lazy)
All three of these terms are in the AAT.
But they are all in DIFFERENT facets.
"mothers" is under the AGENTS facet. The whole hierarchical path
is as follows. Items in <> represent concepts used to sub-facet
(i.e., hierarchically categorize) concepts, but do not map to an
index term.
AGENTS FACET
  People
    <people by family relationship>
      parents
        mothers
"boycotts" is under the ACTIVITIES facet.
ACTIVITIES FACET
  events
    boycotts
"ceremonial weapons" is under the OBJECTS facet.
OBJECTS FACET
  <Furnishings and Equipment>
    <Weapons and Ammunition>
      weapons
        ceremonial weapons
Is this making sense?
There is nothing wrong with having an item represented by terms in
multiple facets. That's perfectly normal.
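
If it helps to see this as data, here's a rough sketch in Python. It is
only a toy: the paths are the three AAT paths quoted above, and the
names (PATHS, facet_of, index_item) are ones I made up for the example,
not anything the AAT itself provides.

```python
# Toy slice of a faceted thesaurus: each index term maps to its full
# path from the facet root. Entries in <> are guide terms used only to
# organize the hierarchy; they are not index terms themselves.
PATHS = {
    "mothers": [
        "AGENTS FACET", "People", "<people by family relationship>",
        "parents", "mothers",
    ],
    "boycotts": ["ACTIVITIES FACET", "events", "boycotts"],
    "ceremonial weapons": [
        "OBJECTS FACET", "<Furnishings and Equipment>",
        "<Weapons and Ammunition>", "weapons", "ceremonial weapons",
    ],
}


def facet_of(term):
    """Return the top-level facet a term lives under."""
    return PATHS[term][0]


def index_item(terms):
    """Group the terms chosen for one item by the facet they come from."""
    by_facet = {}
    for term in terms:
        by_facet.setdefault(facet_of(term), []).append(term)
    return by_facet


# Indexing the example document: we describe it, we don't "file" it.
document_terms = ["mothers", "boycotts", "ceremonial weapons"]
indexed = index_item(document_terms)
for facet, terms in sorted(indexed.items()):
    print(facet, "->", terms)
```

One item, three terms, three facets. Nothing has been "put" anywhere;
the item just carries a set of concepts.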
People often ask "where do I put things in a faceted thesaurus?" The
answer is that YOU'RE NOT PUTTING ANYTHING ANYWHERE. You are simply
describing the item using the concepts in the thesaurus. And these
concepts have been organized into hierarchies of mutually exclusive
categories.
This leads to a critical point.
1. Facets are used by the person *DEVELOPING* the thesaurus as a
tool for organizing the concepts. The process of facet analysis
is a useful process for making sense of the problem space,
finding indexable concepts, and organizing them (much as
usability is a useful mechanism for finding problems with your
site).
2. Facets are often *NOT* used as a means of *ACCESS*. In many
cases a faceted thesaurus is "flattened" into an alphabetical
list of terms with the standard thesaural relationships:
broader/narrower terms, use/use for, and related terms.
The broader/narrower terms provide hierarchical access within
specific concepts. But the interface doesn't use facets as an
access mechanism. (This might be a bit confusing since the
distinction here is rather fine.)
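
Here's a rough sketch of what I mean by "flattening," in Python. This is
my own simplification and not how any real thesaurus software does it:
the hierarchy is a cut-down piece of the AAT example (guide terms left
out), and HIERARCHY, FACETS, and flatten are names invented for the
illustration.

```python
# parent -> children for a toy hierarchy (guide terms omitted for brevity)
HIERARCHY = {
    "OBJECTS FACET": ["weapons"],
    "weapons": ["ceremonial weapons"],
    "AGENTS FACET": ["People"],
    "People": ["parents"],
    "parents": ["mothers"],
}
FACETS = {"OBJECTS FACET", "AGENTS FACET"}


def flatten(hierarchy, facets):
    """Turn the tree into per-term BT/NT records, dropping the facet roots.

    BT = broader terms, NT = narrower terms.
    """
    entries = {}
    for parent, children in hierarchy.items():
        for child in children:
            entries.setdefault(child, {"BT": [], "NT": []})
            if parent in facets:
                # The facet is used to build the tree but is not exposed.
                continue
            entries.setdefault(parent, {"BT": [], "NT": []})
            entries[parent]["NT"].append(child)
            entries[child]["BT"].append(parent)
    return entries


flat = flatten(HIERARCHY, FACETS)
# What the user gets: an alphabetical list with local hierarchy only.
for term in sorted(flat, key=str.lower):
    print(term, flat[term])
```

Every term still knows its broader and narrower neighbours, but the
facet at the top of each tree is gone, so there is no way to browse
down from AGENTS or OBJECTS.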
Perhaps an example will help.
EXAMPLE: ERIC is a huge educational database. It uses a faceted
thesaurus for an indexing language.
My university provides access to ERIC through WebSPIRS. The
WebSPIRS interface has a big THESAURUS button. But click it
and you have two options for locating terms: search for a
term or browse the alphabetical listings. The nine major
ERIC facets are not there.
Now if you find a term in ERIC it will probably have
broader/narrower terms. These are hierarchical. And they fall
out of the faceting process. But this is just a chunk of
the thesaurus. The user can't browse the whole thesaurus
using the faceted structure.
You can see this by comparing access to the ERIC thesaurus
and the AAT.
ERIC is online for free (with a different interface from
the WebSPIRS interface I mentioned above). The ERIC
thesaurus is here:
http://www.ericfacility.net/extra/pub/thessearchresults.cfm
Try searching for "Chemistry." You'll see there are
hierarchical relationships, and these hierarchical
relationships were created through the faceting process,
but they are not tied into the whole faceted structure. You
can go all the way up to "Liberal Arts," but there you
stop. And "Liberal Arts" is not one of the nine major ERIC
facets.
Now compare this to the browse interface of the AAT where you
can go up and down (this interface is not the most
intuitive; it has some interaction problems, but spend time
mucking with it).
http://www.getty.edu/research/tools/vocabulary/aat/hierarchies.html
I hope this is making sense.
IMPORTANT POINT:
There is nothing preventing the whole faceted structure from being
used in the interface. It just hasn't been done, or done well.
Why not?
Imagine you're looking for the article I described above (the one
about mothers and ceremonial weapons).
You could SEARCH for those terms, but this is only easy if you can
map the terms you use to describe the concepts with the terms in the
thesaurus. Is it moms or mothers? Or perhaps you knew the weapon was
a dagger. How easy would it be to map that term to the concept of
"ceremonial weapons" without having lots of experience with AAT?
The challenge is developing a BROWSEABLE interface. This interface
will, ideally, let you pick up all of these terms as you go. But the
terms are in all these different hierarchies. You don't want to (a)
force people to drill down each hierarchy looking for terms to build
up a search string (takes too long) or (b) make them build up a rich
conceptual map of the whole thesaurus before they can find anything
(again, takes too much time).
The FLAMENCO project is the best example so far (I think) of a
browseable interface that lets you do all this.
http://bailando.sims.berkeley.edu/flamenco.html
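
To give a feel for the general idea of this kind of browsing (this is
NOT how FLAMENCO is implemented; I'm only sketching the principle), here
is a small Python example. The items and their terms are invented, each
item gets at most one term per facet to keep things short, and matching
is exact, so it ignores the hierarchy inside each facet.

```python
from collections import Counter

# Invented items, each described by terms drawn from different facets.
ITEMS = {
    "doc1": {"AGENTS": "mothers", "ACTIVITIES": "boycotts",
             "OBJECTS": "ceremonial weapons"},
    "doc2": {"AGENTS": "mothers", "OBJECTS": "weapons"},
    "doc3": {"AGENTS": "parents", "ACTIVITIES": "boycotts"},
}


def matching(items, selected):
    """Items whose terms agree with every facet/term picked so far."""
    return {
        name: terms
        for name, terms in items.items()
        if all(terms.get(facet) == term for facet, term in selected.items())
    }


def refinements(items, selected):
    """For each facet not yet chosen, count the terms still available."""
    counts = {}
    for terms in matching(items, selected).values():
        for facet, term in terms.items():
            if facet not in selected:
                counts.setdefault(facet, Counter())[term] += 1
    return counts


# The user has clicked "mothers" in the AGENTS facet.
selected = {"AGENTS": "mothers"}
hits = matching(ITEMS, selected)
options = refinements(ITEMS, selected)
print(sorted(hits))
print(options)
```

The point is that each choice in one facet narrows what is offered in
the others, so the user picks up terms as they go instead of having to
know the whole thesaurus up front.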
Some other efforts to develop better interfaces to faceted thesauri:
View-based searching systems - a new paradigm for information
retrieval based on faceted classification and indexing using
mutually constraining knowledge-based views.
http://www.hud.ac.uk/schools/cedar/bcshci.htm
Augmenting Thesaurus Relationships: Possibilities for Retrieval
http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Tudhope/
Thesaurus based access to multimedia collections: Faceted retrieval tools
http://web.glam.ac.uk/schools/soc/research/hypermedia/facet_proj/index.php
Good lord that was long. Did anyone even bother to read all that?
hope this helps.
--karl