[Sigcr-l] Exhaustivity and specifity of indexing

Andrew Grove Andrew.Grove at microsoft.com
Sat Aug 26 15:16:26 EDT 2006


Birger, et al.:
I've been following the discussion with great interest.  All of it is relevant.  I had several goals in asking what the more specific question is:
1.  Identify more clearly Kora's original information need in order to avoid misunderstanding it and spending time answering something different.  Motivated by personal time management objectives.
2.  Identify deficiencies in the literature in order to identify opportunities for contribution to it.
3.  Possibly identify related bodies of literature that might contribute to answering Kora's question(s).

That said, I will add these brief comments.

This sounds very much like the long-standing discussion in Taxonomy between "lumping" and "splitting".  As a practitioner, not a scholar, I make a pretty clear distinction between the two.  Both are useful for describing and retrieving information objects.  Classification ("lumping") provides relatively broad, general categories which serve to group similar objects (topics, concepts, provenance, purpose, etc.).  Indexing ("splitting") marks objects in a manner which distinguishes each from others which are similar but not the same.  Because of the multiplicity of objects having the same or similar characteristics, indexing also serves to group them -- but at a very specific level.  Because of the multiplicity of objects alone, classification also serves to distinguish them -- but at broad and general levels.  A highly detailed classification, which an extended DDC could become, tends to dive into the realm of indexing languages.  A broad, general index language, which many are for pragmatic reasons based on collection size and scope, tends to "bubble up" into the realm of classification schemes.  The distinction between classification and indexing ends up becoming situational and very fluid.

For what it's worth, I will suggest examination of the literature on Taxonomy, and the branches of Logic and Linguistics which deal, specifically, with the relationships of objects to each other.

Most respectfully yours,
Andrew

-----Original Message-----
From: Birger Hjørland [mailto:BH at db.dk] 
Sent: Saturday, August 26, 2006 11:34 AM
To: Andrew Grove; Leonard Will; sigcr-l at asis.org
Subject: Re: [Sigcr-l] Exhaustivity and specifity of indexing

Answer to Andrew: 
Yes, I believe the literature is comprehensive and answers most questions. My point was that Kora expected separate literatures about the specificity of indexing and of classification, whereas I propose that these are fundamentally the same. The next round was about the specificity of the indexing language versus that of the actual indexing/classification practice, where I suggested, following Cutter (1876), indexing as specifically as possible in the given system.
 
kind regards Birger
 
 
 

________________________________

From: sigcr-l-bounces at asis.org on behalf of Andrew Grove
Sent: Sat 26-08-2006 16:51
To: Leonard Will; sigcr-l at asis.org
Subject: Re: [Sigcr-l] Exhaustivity and specifity of indexing



Hello,

I am resisting the urge to leap in too quickly here.  Kora's original message mentions that there is literature on the subject but that it does not suffice.  True, there is a wealth of literature on the subject.  There is so much of it, in fact, that I wonder in what manner it does not suffice.  What is, forgive me, the more specific question the literature does not answer?

Most respectfully,
Andrew

Andrew Grove
Program Manager, Taxonomy
Knowledge Network Group
Microsoft Corporation
425 706-5557


-----Original Message-----
From: sigcr-l-bounces at asis.org [mailto:sigcr-l-bounces at asis.org] On Behalf Of Leonard Will
Sent: Saturday, August 26, 2006 6:34 AM
To: sigcr-l at asis.org
Subject: Re: [Sigcr-l] Exhaustivity and specifity of indexing

In message <FB64419FDA34834382771A20964B15CED27AF9 at amon.it.lth.se> on Thu, 24 Aug 2006, Koraljka Golub <kora at it.lth.se> wrote
>
>Does anyone know of any references or have any opinion about 
>exhaustivity and specificity of classification, meaning assignment of 
>classes from a classification scheme.

In message <73573C2DCB0154408D790B1E7EDB0C521B9E1F at ka-exch01.db.dk> on Sat, 26 Aug 2006, Birger Hjørland <BH at db.dk> wrote
>Dear Kora,
>I believe that you are making the wrong assumption that indexing and 
>classification are different in this respect. If you take a concept from 
>a controlled vocabulary (say, a thesaurus) this is in my opinion 
>similar to taking a class from a classification system (which also 
>represents a concept). So, the specificity of a term in a thesaurus 
>depends on the number of terms given and the specificity of a class in 
>a classification system depends on the number of classes given (the 
>more terms/classes, the greater the specificity of applying a given 
>term/class). It is worth considering, however, that although the overall 
>specificity can be measured by counting the number of 
>descriptors/classes, any given system will have a greater specificity 
>in some areas compared to others (DDC, for example, is much more 
>specific in Christianity compared to other religions).

I agree with what Birger says, but I think that Koraljka's question was not so much about the specificity provided in the scheme itself as about the specificity with which it is applied when classifying documents. For example, is it worthwhile to use the full specificity possible in DDC by adding all the possible common subdivisions, "divide-like" instructions and so on, or is it better to simplify by limiting class numbers to 3 (or 6 or whatever) digits?

The answer to this must be that it depends on the material being classified. The aim should be to classify specifically enough to make it easy for the user to scan through the items in a class. I usually think of this as meaning that a class should contain between 10 and 50 items. If the collection is large, or concentrated in a single subject area, more specificity will be needed than if it is a small, general collection.
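
As a rough illustration of the point above, the following Python sketch shortens class numbers only as far as class size allows: it starts from three digits and adds one digit at a time to any class that still holds more than 50 items. The function name `truncate_to_fit`, its parameters and the sample numbers are hypothetical, not part of any scheme or tool. It treats numbers as plain digit strings, ignores any rules a scheme has about where a number may be cut, and enforces only the upper bound of 50, not the lower bound of 10.

```python
def truncate_to_fit(numbers, max_items=50, min_digits=3):
    """Map each full class number to a shortened one (illustrative only).

    A class keeps `min_digits` digits unless it would hold more than
    `max_items` items, in which case it is split on the next digit.
    """
    result = {}  # digit string -> truncated digit string

    def split(group, length):
        # `group` holds digit strings that share their first `length` digits.
        longest = max(len(d) for d in group)
        if len(group) <= max_items or length >= longest:
            for d in group:
                result[d] = d[:length]
            return
        buckets = {}
        for d in group:
            buckets.setdefault(d[:length + 1], []).append(d)
        for bucket in buckets.values():
            split(bucket, length + 1)

    def fmt(d):
        # Put the decimal point back after the third digit.
        return d if len(d) <= 3 else d[:3] + "." + d[3:]

    top = {}
    for n in numbers:
        d = n.replace(".", "")
        top.setdefault(d[:min_digits], []).append(d)
    for group in top.values():
        split(group, min_digits)

    return {n: fmt(result[n.replace(".", "")]) for n in set(numbers)}


# Hypothetical small collection: 60 items under 025, 5 items under 780.
sample = ["025.431"] * 30 + ["025.47"] * 30 + ["780.92"] * 5
shortened = truncate_to_fit(sample)
```

In this sample the 780 class stays at three digits because it holds only 5 items, while the 025 class is extended until its two groups of 30 items separate. A larger or more concentrated collection would push more classes toward their full numbers, which is the dependence on the material described above.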

Other considerations are:

a. Allowing for growth of the collection. You don't want to have to go back and re-classify if more material is added in a given subject area.

b. Compatibility with what is being done elsewhere. Do you share records, obtain them from elsewhere or merge them in a combined catalogue?

c. Provision of access from concepts that are scattered by the classification. These may come later in the citation order of combining facets in a synthesised class number, and if the number is truncated they will be lost.

d. Adequacy of the alphabetical index constructed to show where topics have been classed. It is seldom adequate to rely on the index published with the schedules, but far too often that is all that is provided. It will not show many synthesised numbers, and there is little point in creating these if you do not also create the means of finding them.

Exhaustivity is more a matter of subject analysis of the documents. Do you identify and record topics that are only treated incidentally in a document, or do you restrict indexing and classification to the main topics only? There is no simple answer, so much depending on the nature of the collection, the users, and the purpose of the catalogue.

Leonard Will

--
Willpower Information       (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will at Willpowerinfo.co.uk               Sheena.Will at Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------


_______________________________________________
Sigcr-l mailing list
Sigcr-l at asis.org
http://mail.asis.org/mailman/listinfo/sigcr-l
