[Sigia-l] Distributed thesaurus?

Lars Marius Garshol larsga at garshol.priv.no
Sun Sep 22 16:55:31 EDT 2002


* Andrew McNaughton
| 
| It's worth keeping in mind that simplicity is an essential
| ingredient of interoperability. 

That is certainly true. There is great value in simplicity. The
opposite also applies, however.

| The ANSI relationship types (BT, NT, RT, USE, UF) are limiting in
| many ways, but there's very little in it that isn't going to be
| wanted by almost everyone who wants to map out subject
| relationships, and in almost all cases it will do most of what you
| want to do.  

Actually, that's not the case. If these are the only relationships
you've got, there's not a whole lot you can say. You've effectively
restricted yourself to a very narrow scope. 

The point with topic maps is that rather than just building thesauri
you can build much more precise models of the world, which is what
makes topic maps so helpful for findability. Thesauri work up to a
point, but only a few limited domains really fit into a tree model,
while most of the real world does not.

| It's also quite complex enough for most uses in that training
| cataloguing staff to use this model in a correct and consistent
| manner is already a significant issue.

Well, I think part of the problem is that they are trained to bang a
round peg into a square hole. Nobody would have problems extending
something like Steve Pepper's Italian Opera Topic Map, because
classifying things as being one (or more) of opera, composer,
librettist, play, novel, author, city, country, character, and theatre
is not very difficult.

Similarly, working out whether the relationship between two of these
is born-in, died-in, pupil-of, premiered-at, character-in-opera,
contained-in, composed-by, killed-by (for characters), etc. is not very
hard.

Arranging the same information in a thesaurus would be much harder,
and at the same time it would lose information and yield a system with
far fewer uses that is also much harder to use.
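
To make the loss concrete, here is a minimal Python sketch. It is not
taken from any topic map engine; the names Association, ThesaurusLink,
find, related and to_thesaurus are made up for illustration, and it
assumes that none of these associations fits BT/NT, so the only ANSI
type left for them is RT.

```python
from collections import namedtuple

# Hypothetical record types, for illustration only.
Association = namedtuple("Association", "subject assoc_type target")
ThesaurusLink = namedtuple("ThesaurusLink", "term relation target")

# A few typed associations of the kind found in an opera topic map.
associations = [
    Association("Puccini", "born-in", "Lucca"),
    Association("Puccini", "pupil-of", "Ponchielli"),
    Association("Tosca", "composed-by", "Puccini"),
    Association("Tosca", "premiered-at", "Teatro Costanzi"),
    Association("Scarpia", "killed-by", "Tosca (character)"),
]


def find(assocs, subject, assoc_type):
    """Return the targets linked to subject by the given association type."""
    return sorted(a.target for a in assocs
                  if a.subject == subject and a.assoc_type == assoc_type)


def to_thesaurus(assocs):
    """Flatten typed associations into thesaurus links.

    Assumption: none of them is a broader/narrower relation, so each
    one collapses to RT (related term).
    """
    return [ThesaurusLink(a.subject, "RT", a.target) for a in assocs]


def related(links, term, relation):
    """Return the targets linked to term by the given thesaurus relation."""
    return sorted(l.target for l in links
                  if l.term == term and l.relation == relation)


links = to_thesaurus(associations)

# The topic map can answer "where was Puccini born?" directly.
print(find(associations, "Puccini", "born-in"))
# The thesaurus can only say what Puccini is related to, not how.
print(related(links, "Puccini", "RT"))
```

After flattening, the birthplace and the teacher are indistinguishable:
both are just related terms, and a query for one returns the other too.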

| Ease of generating large collections of terms is also a mixed
| blessing.  The difficulty of producing a good vocabulary lies in
| carefully deciding what should be in it, considering the level of
| granularity which should be represented in various areas, the
| particular language used by the target users, and how the collection
| and the cataloguing is likely to evolve in future.  Also policy is
| required for how terms are selected in order that the collection can
| be kept as consistent as possible, so that users can get reasonably
| predictable and complete search results.

This is absolutely true, and it applies equally to both thesauri and
topic maps that use more precise ontologies. These are general
information management issues and something I doubt we will ever avoid
having to deal with.
 
| If you want to get a single term added to the LCSH you would expect
| to invest a substantial amount of time in the process, eg a good
| deal of research into existing and possible alternative terms, and
| the approach taken by other thesauri.  You would need to cite source
| documents, and your application would be decided by committee.
| There's good reasons for this investment of effort.  Without it the
| thesaurus would become unworkable.

Certainly, but I think that is in part because of the peg/hole
problem. How controversial would it be to say that Bizet was a
composer? How much research would you need to put forward before
people would accept it, and how much checking would it require? I
would say that the added clarity would make it a lot easier.
  
| To the extent that topicmaps provides a mechanism for describing the
| model that a set of subjects and relationships uses, it's good to
| have a consistent way of doing this.  However, I suspect it's only
| really useful in combination with standard descriptions of
| mechanisms such as you propose to provide as an example.  If
| describing things like 'Broader Term' is something that needs to be
| done ad-hoc rather than on the basis of an appropriate standard
| template which is used where appropriate by most topicmap users,
| then how good a basis for standard data exchange is this?

Topic maps are no different from XML in this respect, really; topic
maps just have a model that does certain things better and certain
things worse than XML. What is similar is that in both cases you must
choose how you want to represent something, either by adopting someone
else's standard (published subjects) or by making your own.
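
A rough Python sketch of why the choice matters for exchange follows.
It assumes the usual topic map rule that topics sharing a subject
identifier are merged; the merge function and the psi.example.org and
local.example.org identifiers are invented for the example and do not
refer to any real published subject set.

```python
def merge(map_a, map_b):
    """Merge two {subject_identifier: set_of_names} maps.

    Topics are unified only when their identifiers are equal; this is
    a simplification of real topic map merging.
    """
    merged = {}
    for source in (map_a, map_b):
        for identifier, names in source.items():
            merged.setdefault(identifier, set()).update(names)
    return merged


# Both maps adopt the same (hypothetical) published identifier for Bizet.
shared_a = {"http://psi.example.org/composer/bizet": {"Bizet"}}
shared_b = {"http://psi.example.org/composer/bizet": {"Georges Bizet"}}

# Here the second map invents its own identifier instead.
private_b = {"http://local.example.org/people#bizet": {"Georges Bizet"}}

print(len(merge(shared_a, shared_b)))   # one topic carrying both names
print(len(merge(shared_a, private_b)))  # two topics that never meet
```

With a shared identifier the two maps combine into one topic without
any manual mapping; with home-made identifiers someone has to state
the equivalence by hand.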

The choice is yours, which means greater power and flexibility for the
user. Of course, it also means more rope to hang yourself with.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >



