[Sigia-l] Distributed thesaurus?
Andrew McNaughton
andrew at scoop.co.nz
Sun Sep 22 15:24:12 EDT 2002
> * peter at poorbuthappy.com
> | I agree with Lars [...] that topicmaps can be easily used for
> | distributed thesauri. The problem is that someone still has to
> | define limits or agreements for the topicmap: how do we represent
> | certain thesauri concepts in our topicmap?
>
> It maps quite nicely as follows:
>
> - terms become topics (of type 'term', probably),
>
> - RT/BT/NT become one topic each, plus some extra topics for the
> association roles,
>
> - relationships can then be expressed as associations typed with the
> RT/BT/NT associations.
>
> And that's all you need, really. (If you want, I'll write up an
> example for you and post it. Shouldn't be hard.)
>
> Of course, all you have now is a thesaurus in topic map form, but even
> so it is quite useful since you can now make relations across
> thesauri, and you can even relate your simple thesaurus to something
> that is stronger on ontological commitment allowing you to break out
> of the thesaurus straightjacket.
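
For concreteness, the mapping quoted above might be sketched roughly as
follows. This is only an illustration in plain Python, not the syntax or
API of any real topic map engine; the class names, role names, and the
example terms (Cats, Mammals, Pets) are all assumptions made up for the
sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Topic:
    ident: str
    kind: str  # "term", "association-type" or "role" (hypothetical labels)


@dataclass(frozen=True)
class Association:
    assoc_type: Topic                         # the BT, NT or RT topic
    members: Tuple[Tuple[Topic, Topic], ...]  # (role topic, term topic) pairs


# The relationship types and the association roles are themselves topics.
BT = Topic("BT", "association-type")
NT = Topic("NT", "association-type")
RT = Topic("RT", "association-type")
BROADER = Topic("broader", "role")
NARROWER = Topic("narrower", "role")
RELATED = Topic("related", "role")


def term(name: str) -> Topic:
    """Thesaurus terms become topics of type 'term'."""
    return Topic(name, "term")


def broader_term(narrow: Topic, broad: Topic) -> Association:
    """Express 'narrow BT broad' as an association typed with the BT topic."""
    return Association(BT, ((NARROWER, narrow), (BROADER, broad)))


def related_term(a: Topic, b: Topic) -> Association:
    """Express 'a RT b' as an association typed with the RT topic."""
    return Association(RT, ((RELATED, a), (RELATED, b)))


cats = term("Cats")
mammals = term("Mammals")
pets = term("Pets")

thesaurus = [broader_term(cats, mammals), related_term(cats, pets)]
```

Note how many of the names here had to be invented: two systems that each
chose their own would not line up without some further agreement.
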
It's worth keeping in mind that simplicity is an essential ingredient of
interoperability. The ANSI relationship types (BT, NT, RT, USE, UF) are
limiting in many ways, but there's very little in them that isn't going to
be wanted by almost everyone who wants to map out subject relationships,
and in almost all cases they will do most of what you want to do. The
model is also quite complex enough for most uses: training cataloguing
staff to apply it correctly and consistently is already a significant
issue.
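
As a small illustration of what "consistent" involves even with only five
relationship types, here is a hedged sketch of a reciprocity check. It
assumes relationships are stored as simple (source, relation, target)
triples, which is just a convenient representation for the example; the
function name and the sample terms are made up.

```python
# Each relationship type paired with its reciprocal; RT is symmetric.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def missing_reciprocals(relations):
    """Return the reciprocal triples that the given relations lack."""
    present = set(relations)
    return sorted(
        (target, INVERSE[rel], source)
        for source, rel, target in present
        if (target, INVERSE[rel], source) not in present
    )


relations = [
    ("Cats", "BT", "Mammals"),
    ("Mammals", "NT", "Cats"),
    ("Felines", "USE", "Cats"),  # the reciprocal "Cats UF Felines" is absent
]

print(missing_reciprocals(relations))  # prints [('Cats', 'UF', 'Felines')]
```

A check like this catches only mechanical slips. It says nothing about
whether the right term was chosen in the first place.
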
Ease of generating large collections of terms is also a mixed blessing.
The difficulty of producing a good vocabulary lies in carefully deciding
what should be in it, considering the level of granularity which should be
represented in various areas, the particular language used by the target
users, and how the collection and the cataloguing is likely to evolve in
future. A policy is also required for how terms are selected, so that
the collection can be kept as consistent as possible and users can
get reasonably predictable and complete search results.
If you want to get a single term added to the LCSH you would expect to
invest a substantial amount of time in the process, e.g. a good deal of
research into existing and possible alternative terms, and the approach
taken by other thesauri. You would need to cite source documents, and
your application would be decided by committee. There are good reasons for
this investment of effort. Without it the thesaurus would become
unworkable.
Of course all of this might be simpler if you don't need to coordinate
your efforts in a workable way with other organisations, but that's kind
of the point of having standards.
To the extent that topicmaps provide a mechanism for describing the model
that a set of subjects and relationships uses, it's good to have a
consistent way of doing this. However, I suspect it's only really useful
in combination with standard descriptions of mechanisms such as you
propose to provide as an example. If describing things like 'Broader
Term' is something that needs to be done ad-hoc rather than on the basis
of an appropriate standard template which is used where appropriate by
most topicmap users, then how good a basis for standard data exchange is
this?
Andrew McNaughton