[Sigia-l] Distributed thesaurus?

Lars Marius Garshol larsga at garshol.priv.no
Sun Sep 22 07:03:55 EDT 2002


* Travis Wilson
| 
| An interesting discussion on the XFML mailing list (xfml.org) has
| got me to the point where I want a single distributed thesaurus to
| exist on the net. You know how each institution has its own
| controlled vocabulary of terms, which is a self-contained graph of
| terms and their relationships? I want a standard syntax by which I
| can link my terms to terms in other people's thesauri. Those links
| would comply with conventional thesaurus relationships
| (RT/BT/NT/etc), so the end result would be one huge distributed
| vocabulary.

This is what topic maps do. You've just described a straightforward
topic map application. (I call it that because you are restricting the
association types and probably also the topic types.)
 
| XFML is the most promising recent development effort I've seen in
| this area, but I wonder if anyone has seen another that might be a
| potential standard.

Topic maps is a standard already. ISO/IEC 13250:2002.

The XFML specification says it's based on topic maps, and from what
I've seen of it that's true. The main problem with it, as I see it, is
that topic maps are an identity-based technology, and XFML does not
seem to have a well-defined notion of identity, nor any way in which
its identities can be connected to those of topic maps.

This may sound like an obscure technical point, but the effect is that
with XFML you can't have a reliable distributed thesaurus, since you
have no rules for when your term A is the same as my term B. (My next
article on XML.com, currently in outline, will be about precisely this
issue.) So when I say something about A it will not match up with what
you've said about B, and if you receive A from two different sources you
won't know that it's the same term.

With topic maps, on the other hand, you have full control over this.
You can know, with 100% certainty, when two things really are the
same, and you can also know within what context each name of a thing
is used, and also where each statement made about that thing comes
from. (Mary Nishikawa presented a paper that shows this.)

The result is that merging multiple classification systems or thesauri
is technically straightforward, as is making sense of the resulting
structure. You can also build a single thesaurus this way, should you
want to.
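
To make the identity point a bit more concrete, here is a minimal
Python sketch of identity-based merging. It is only an illustration
under simplifying assumptions, not the merging procedure defined by
the topic map standard: the Term class, the merge function, and the
example.org URIs are all hypothetical names invented for this example.
Terms are treated as the same only when they share an identifier,
never because their names happen to match.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    # Hypothetical structure: URIs that state what the term is about.
    identifiers: set
    # (label, source) pairs, so each name keeps its context.
    names: set = field(default_factory=set)
    # (relation, target identifier, source) triples, e.g. BT/NT/RT links.
    statements: set = field(default_factory=set)


def merge(*thesauri):
    """Merge terms that share at least one identifier (assumed rule)."""
    merged = []
    for thesaurus in thesauri:
        for term in thesaurus:
            keep, matches = [], []
            for existing in merged:
                if existing.identifiers & term.identifiers:
                    matches.append(existing)
                else:
                    keep.append(existing)
            # Start from a copy so the input thesauri are left untouched.
            combined = Term(set(term.identifiers), set(term.names),
                            set(term.statements))
            for match in matches:
                combined.identifiers |= match.identifiers
                combined.names |= match.names
                combined.statements |= match.statements
            merged = keep + [combined]
    return merged


AUTOMOBILE = "http://example.org/subject/automobile"  # made-up URI

mine = [Term({AUTOMOBILE}, {("Cars", "my-thesaurus")},
             {("BT", "http://example.org/subject/vehicle", "my-thesaurus")})]
yours = [Term({AUTOMOBILE}, {("Automobiles", "your-thesaurus")},
              {("RT", "http://example.org/subject/road", "your-thesaurus")})]

result = merge(mine, yours)
```

Here "Cars" and "Automobiles" end up as two names of one term, each
still tagged with the thesaurus it came from, and both statements
survive with their sources. A term that was merely also called "Cars"
but carried a different identifier would stay separate.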

(Uh, yes, I guess this might also qualify as a promotional email. It
is really just my thoughts on this issue, but they do lean in one
particular direction...)

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >



