Re: [Sigia-l] Findability

Lars Marius Garshol larsga at garshol.priv.no
Mon Jan 27 15:58:47 EST 2003


* Lars Marius Garshol
|
| Well, I do wish that more people would realize that simple
| categorization and taxonomies are not going to take anyone very far
| on the road towards findability.

* Gunnar Langemark
| 
| I wasn't talking about simple categorization but about faceted
| metadata which - as far as I understand it - is a quite different
| ball game. I'm not the expert on this topic.

Faceted metadata essentially means replacing a single hierarchy with
multiple hierarchies, one for each facet. Now,
structurally that's actually the same as having a single hierarchy.
(Just add a single node at the top linking all the facets and you've
got it.)
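
To make the structural point concrete, here is a minimal Python sketch.
The facet names, the nested-dict representation and the "Everything"
top node are assumptions made up for illustration, not anything from a
real system:

```python
# Hypothetical facets, each one its own small hierarchy (nested dicts).
facets = {
    "Geography": {"Africa": {"Nigeria": {}, "Morocco": {}}},
    "Organisation": {"Oil services": {"Oil surveying": {}, "Oil production": {}}},
}

def merge_facets(facets, root_name="Everything"):
    """Hang every facet under one new top node, giving a single tree."""
    return {root_name: dict(facets)}

def paths(tree, prefix=()):
    """Yield the path from the top node to every node in the tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(children, path)

single = merge_facets(facets)
all_paths = list(paths(single))
```

Every node of every facet is now reachable from the one top node, and
the same tree-walking code handles both the faceted and the merged form.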

Admittedly, the division into different facets *does* lend a little
more discipline to the construction and use of the hierarchy, but the
fact is that you still have a hierarchy. (The multiple/single
hierarchies distinction is somewhat contrived, as I demonstrated
above.)

So I don't think calling faceted metadata "a quite different ball
game" is entirely accurate. If you'd said "slightly better" I would
have agreed.

| But you continue:
| 
| >or something else.
| 
| and here we disagree.
|
| If description is at all possible - "something else" MUST be able to
| describe the world well. Albeit all description is by nature
| reductionist - or else we would be talking about simulation and
| virtual reality. But now we're in the realm of nitpicking and
| hairsplitting. ;)

Not really. We're in the realm of misunderstanding. :-)

If you read what I wrote again you'll see that I was talking about
hierarchies, so the "something else" refers to other kinds of
hierarchies than "taxonomies and thesauri".
 
* Lars Marius Garshol
|
| When you use them you nearly always end up with ill-defined
| categories that cannot really be used for any form of automated
| processing, and which do not really help searching very much,
| either.
 
* Gunnar Langemark
|
| What I AM talking about is the individual's opportunity to take
| control of his own contextualization so to speak, and the ability to
| connect different sets of categories - and the ability to choose
| your peers with whom you share categorization systems. This is
| power.

I think any talk about individual power over categories is misleading.
Whether power resides with individuals or is centralised depends on
the architecture of the system and not on its data representation.

What I said, and what you agreed with, is that hierarchies provide
very poor descriptive power, and that is the only sort of power it
makes sense to discuss in this context, since centralization is
orthogonal to the issue of whether to use a hierarchical system or an
ontology.

And hierarchies fail on several counts in this regard:

 - you cannot type relationships, so there is only a single kind of
   relationship between the nodes,

 - you cannot type the nodes, so a machine can't tell countries from
   diseases from people from animal species,

 - you cannot assign properties to the nodes beyond one or two fixed
   kinds of names and perhaps some untyped URIs to content, and

 - your relationships must form a tree.

Obviously this doesn't give you much power to model the real world,
and equally obviously facets do not solve any of these problems. (You
could theoretically solve the second problem with them, but I doubt
that would work well in practice.)
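
The four limitations above can be sketched as a contrast between two
data structures. This is only an illustration in Python: the class
names (TreeNode, Topic, Relation), the "iso-code" property and the
example nodes are all hypothetical, and the second half is a generic
typed graph, not any particular ontology or topic map API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeNode:
    """All a plain hierarchy gives you: a name, one parent, children."""
    name: str
    parent: Optional["TreeNode"] = None
    children: list = field(default_factory=list)

    def add(self, name):
        child = TreeNode(name, parent=self)
        self.children.append(child)
        return child

@dataclass
class Topic:
    """A typed node that can carry arbitrary properties."""
    name: str
    type: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relation:
    """A typed relationship; relations need not form a tree."""
    kind: str
    source: Topic
    target: Topic

# In the hierarchy nothing but the names tells these two nodes apart.
root = TreeNode("Top")
nigeria = root.add("Nigeria")
malaria = root.add("Malaria")

# With typed nodes a machine can tell a country from a disease.
nigeria_t = Topic("Nigeria", type="country", properties={"iso-code": "NG"})
malaria_t = Topic("Malaria", type="disease")
occurs = Relation("occurs in", malaria_t, nigeria_t)
countries = [t.name for t in (nigeria_t, malaria_t) if t.type == "country"]
```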

| I guess that if your ideal is full fledged automation - even faceted
| metadata wouldn't do the trick, and individual power over categories
| would be counterproductive.

I think you misunderstood what I meant by automation: I was thinking
of having a machine use the categorization system. Of course, so long
as you are talking about simple categories that won't work, but with
an ontology the machine can do more than show you a tree. (See below
for a trivial example.)

| If however, your ideal is power to the user, faceted metadata, and
| distributed metadata, and syndication - might prove to be the
| concepts along which work could be done in order to empower us all
| - individually - with the tools needed to connect to the right
| content on the web.
 
I don't think simple Dublin Core-like metadata is enough, either. And
syndication is not going to help me deal with my own content.

| I simply want to be able to connect to others (and their content)
| who I personally choose to connect to, and to be able to automate
| some of the tasks.
| What I am talking about is the personalization of content search and
| aggregation.

Then we are talking about the same things.
 
* Lars Marius Garshol
|
| I've seen enough systems built with thesauri and taxonomies by now
| to know that in bigger hierarchical systems there is usually a lot
| of very useful information buried, but it *is* buried. Getting the
| real data out of the mire is usually quite a bit of work.
 
* Gunnar Langemark
|
| I agree that content tends to be buried in large systems with single
| hierarchies. 

Not just the content, but also the nodes in the hierarchies tend to
contain useful information that cannot be extracted and made use of
without manual effort.

I'll give you an example which may make it clearer what I mean. I was
looking at the thesaurus system used by a large oil company to
classify the content in one of its portals. It was a huge thing with
thousands of nodes, and users were told to connect all documents to as
many nodes as were relevant.

One branch of the thesaurus (or facet, as you'd call it) dealt with
geography, and another with the company structure. That's fine as far
as it goes, of course, but unfortunately these two are connected. So
you'll find something like this (this is from memory, and so unlikely
to be entirely accurate):

  Oil services
    Oil surveying
      Kano field location
    Oil production

  Africa
    Nigeria
      Kano (a town)
    Morocco

Now, the survey report of the Kano field location will have to be
connected to two nodes, but of course, what you'd really like to
express is that there is a connection between the two nodes. You could
in theory do it by putting "Kano field location" under "Kano", but
then it could no longer be under "Oil surveying".

What you'd really want to say is something like this:

  "Oil services" is a "business area"
  "Oil surveying" is a "department"
  "Kano field location" is an "office"
  "Oil production" is a "department"

  "Africa" is a "continent"
  "Nigeria" is a "country"
  "Kano" is a "place"
  "Morocco" is a "country"

  "Kano field location" is "located in" "Kano"

Given a model like this you could actually just connect the survey
report to "Kano field location" and be done with it. Users who start
from the place "Kano" can follow the "located in" link to the field
location, and from there find the survey report.

And, what is much more, you can also answer questions such as "what
departments have offices in Africa?" and you can easily list all
departments, all countries, all countries in Africa, and so on and so
forth. 
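
A minimal sketch of how such questions could be answered, in Python.
The node types and the "located in" statement are the ones from the
model above. The "part of" and "contained in" relationship names are
assumptions: they stand in for the parent/child links of the original
thesaurus, which the model above does not spell out. The function
names are likewise made up for this example.

```python
# Node types, taken from the statements above.
types = {
    "Oil services": "business area",
    "Oil surveying": "department",
    "Kano field location": "office",
    "Oil production": "department",
    "Africa": "continent",
    "Nigeria": "country",
    "Kano": "place",
    "Morocco": "country",
}

# Typed relationships as (subject, kind, object) triples. Only "located in"
# is from the model above; "part of" and "contained in" are assumed names.
relations = [
    ("Kano field location", "located in", "Kano"),
    ("Kano field location", "part of", "Oil surveying"),
    ("Oil surveying", "part of", "Oil services"),
    ("Oil production", "part of", "Oil services"),
    ("Kano", "contained in", "Nigeria"),
    ("Nigeria", "contained in", "Africa"),
    ("Morocco", "contained in", "Africa"),
]

def objects(subject, kind):
    """All objects related to subject by a relationship of the given kind."""
    return [o for s, k, o in relations if s == subject and k == kind]

def is_within(place, region):
    """True if place is region, or is transitively contained in it."""
    if place == region:
        return True
    return any(is_within(p, region) for p in objects(place, "contained in"))

def of_type(type_name):
    """Names of all nodes of the given type, sorted."""
    return sorted(name for name, t in types.items() if t == type_name)

def countries_in(region):
    return [c for c in of_type("country") if is_within(c, region)]

def departments_with_offices_in(region):
    found = set()
    for office in of_type("office"):
        if any(is_within(p, region) for p in objects(office, "located in")):
            found.update(d for d in objects(office, "part of")
                         if types[d] == "department")
    return sorted(found)
```

With this data, departments_with_offices_in("Africa") gives
["Oil surveying"], and countries_in("Africa") gives
["Morocco", "Nigeria"]. None of these queries is possible against the
plain thesaurus without someone reading the node names.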

You can also go much further than this, because you can connect
various kinds of information with each node, you can discover when
nodes in different systems represent the same thing, and so on and so
forth for quite a while.

| But if facets are not part of the solution - what IS your solution
| to it?

Ontologies. And for information architecture I'd use topic maps.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >



