[Sigia-l] Distributed thesaurus?

Lars Marius Garshol larsga at garshol.priv.no
Sun Sep 22 14:13:39 EDT 2002


* Lars Marius Garshol
|
| This may sound like an obscure technical point, but the effect is
| that with XFML you can't have a reliable distributed thesaurus,
| since you have no rules for when your term A is the same as my term
| B.[...]

* peter at poorbuthappy.com
| 
| Indeed, that's a fundamental difference. Let me just throw in that
| XFML is a lot less ambitious than Topicmaps - it's a whole different
| ballgame.

I agree, but I think the omission of this particular feature is a
great pity, since it means that you can't have interoperability across
XFML datasets, which I think greatly lowers the value of XFML.
Creating an XFML -> TM mapping is easy, but of little value as long as
everything arrives in the topic map in the form of topics of
inscrutable identity.

If I were you I would seriously consider adding at least an optional
attribute for giving each topic a URI. Then people can do it if they
like and skip it if they don't.
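
To make the point concrete, here is a minimal sketch in Python of what
an optional URI would buy you. The record layout, the `uri` key and the
example.org address are all made up for illustration and are not XFML
syntax; the only assumption is that two topics carrying the same URI
are about the same subject.

```python
# Hypothetical, simplified topic records; this is NOT actual XFML syntax.
# Assumption: topics that carry the same URI denote the same subject.

def merge_by_uri(*datasets):
    """Merge topics from several datasets using an optional 'uri' key."""
    merged = {}         # URI -> set of names used for that subject
    unidentified = []   # names of topics with no URI; cannot be matched
    for dataset in datasets:
        for topic in dataset:
            uri = topic.get("uri")
            if uri is None:
                # Equal names prove nothing, so these stay separate.
                unidentified.append(topic["name"])
            else:
                merged.setdefault(uri, set()).add(topic["name"])
    return merged, unidentified


mine = [
    {"name": "Norway", "uri": "http://example.org/psi/norway"},
    {"name": "Fjords"},
]
yours = [
    {"name": "Norge", "uri": "http://example.org/psi/norway"},
    {"name": "Fjords"},
]

merged, unidentified = merge_by_uri(mine, yours)
```

"Norway" and "Norge" end up as one topic because they share a URI,
while the two "Fjords" topics stay apart, since nothing says they are
the same subject.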
 
| However, XFML *does* allow for distributed metadata (not distributed
| "thesaurus", but distributed faceted metadata hierarchies), it just
| wants you to make connections between topics *manually*, instead of
| doing it automatically (using psi's) like topicmaps. It's a
| different philosophy, really. There are rules that indicate which
| topics are the same, it is unambiguously indicated within each
| <topic> element.

I noticed that ability, but why not remove that and allow you to point
to a third party (a subject indicator) instead, since that gives you
much greater leverage? With this approach, you are essentially stuck
within the XFML world, since you can never point out of XFML.
 
| I agree with Lars [...] that topicmaps can be easily used for
| distributed thesauri. The problem is that someone still has to
| define limits or agreements for the topicmap: how do we represent
| certain thesauri concepts in our topicmap? 

It maps quite nicely as follows:

  - terms become topics (of type 'term', probably),

  - RT/BT/NT become one topic each, plus some extra topics for the
    association roles,

  - relationships can then be expressed as associations typed with the
    RT/BT/NT topics.

And that's all you need, really. (If you want, I'll write up an
example for you and post it. Shouldn't be hard.)
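
As a rough sketch of the mapping just described, in Python rather than
in any real topic map syntax: the class names, the role names
("broader", "narrower", "related") and the "term:" id prefix are all
invented for illustration and do not come from any topic map API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical in-memory model, not a real topic map library.

@dataclass(frozen=True)
class Topic:
    id: str
    type: Optional[str] = None   # id of the typing topic, if any


@dataclass(frozen=True)
class Association:
    type: str                            # id of the RT/BT/NT topic
    roles: Tuple[Tuple[str, str], ...]   # (role topic id, player topic id)


# Which role each end of a relation plays. "A BT B" means B is the
# broader term of A, so A plays 'narrower' and B plays 'broader'.
ROLES = {
    "BT": ("narrower", "broader"),
    "NT": ("broader", "narrower"),
    "RT": ("related", "related"),
}


def thesaurus_to_topic_map(terms, relations):
    """terms: list of strings; relations: (source, 'BT'|'NT'|'RT', target)."""
    topics = {"term": Topic("term")}
    # One topic per relationship type, plus topics for the roles.
    for name in ("BT", "NT", "RT", "broader", "narrower", "related"):
        topics[name] = Topic(name)
    # Every thesaurus term becomes a topic of type 'term'.
    for term in terms:
        topics["term:" + term] = Topic("term:" + term, type="term")
    associations = []
    for source, rel, target in relations:
        source_role, target_role = ROLES[rel]
        associations.append(Association(
            type=rel,
            roles=((source_role, "term:" + source),
                   (target_role, "term:" + target)),
        ))
    return topics, associations


topics, associations = thesaurus_to_topic_map(
    ["cats", "mammals"],
    [("cats", "BT", "mammals")],
)
```

The "term:" prefix is only there so that a term that happens to be
called "BT" or "related" cannot collide with the typing topics.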

Of course, all you have now is a thesaurus in topic map form, but even
so it is quite useful since you can now make relations across
thesauri, and you can even relate your simple thesaurus to something
that is stronger on ontological commitment, allowing you to break out
of the thesaurus straitjacket.

In fact, that simple first step allows you to continue in all manner
of directions, should you want to.

| That is the format you (I think) are waiting for. So you need a
| subset of topicmaps for your purposes, a spec in which you agree how
| to represent distributed thesauri in the topicmap format.

Actually, there's no need to subset topic maps, because that would
effectively mean going back to thesauri, and we already have those, so
there's little point in reinventing them. What you need is a way to
express your thesaurus as a topic map, so that you can break out of
the inherent limitations of the thesaurus model.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >



