ART:Takahashi,Impact Factor & Small Fields

Quentin L. Burrell familyburrell at ENTERPRISE.NET
Fri Jul 16 05:54:18 EDT 1999


The frequency distributions of citation data given by Stegmann reveal
features deserving of comment. The distributions relate to a set of 289
references and the citations they accumulate (a) during a one-year period
(1996) and (b) during a longer period (1994-1999). [Note: Careful
consideration of the data reveals an error in the second set - the entry
opposite 14 citations should be 6 not 5.]
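
The correction can be checked by summing the frequency column. A minimal
Python sketch, with the table transcribed from Stegmann's message quoted
below (the variable names are illustrative only):

```python
# 1994-1999 frequency table as posted by Stegmann: {citations: papers}.
posted = {0: 82, 1: 47, 2: 22, 3: 18, 4: 17, 5: 19, 6: 13, 7: 11, 8: 9,
          9: 8, 10: 3, 11: 4, 12: 1, 13: 6, 14: 5, 15: 1, 16: 4, 17: 2,
          18: 2, 19: 4, 20: 2, 23: 3, 25: 2, 26: 1, 33: 1, 35: 1}

corrected = dict(posted)
corrected[14] = 6  # the correction proposed in the note above

n_posted = sum(posted.values())                # papers in the table as posted
n_corrected = sum(corrected.values())          # papers after the correction
cited_corrected = n_corrected - corrected[0]   # papers with at least one citation
print(n_posted, n_corrected, cited_corrected)
```

As posted, the column accounts for 288 papers. With the corrected entry it
gives 289 papers, of which 207 are cited, in line with the totals Stegmann
states.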

(i) The dependence of the shape of the distributions - from inspection of
the data or via graphical representation - on the length of the time period
is typical of a variety of informetric data such as library circulations (1,
2) and cumulating bibliographies (3,4).

(ii) Stegmann remarks that, over the one-year period, 9.6% of the papers
(=22% of cited papers) provide 50% of citations. Over the extended period,
12.8% of all papers (=18% of cited papers) provide 50% of citations. These
are variants of "80/20" or other concentration analyses. The importance of
both the length of time period and the inclusion or otherwise of the "zero
category" in such analyses has been pointed out previously (e.g. 5, 6).
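
A concentration figure of this kind can be sketched in a few lines of
Python. The function name papers_for_share is hypothetical, and counting
whole papers from the top of the ranking is an assumed convention, not
necessarily the one Stegmann used:

```python
# 1996 citation frequencies from Stegmann's table: {citations: papers}.
freq_1996 = {0: 168, 1: 62, 2: 30, 3: 9, 4: 10, 5: 4, 6: 2, 7: 1, 8: 2, 9: 1}

def papers_for_share(freq, share=0.5):
    """Smallest number of top-cited papers holding at least `share` of all citations."""
    target = share * sum(k * n for k, n in freq.items())
    papers = cites = 0
    for k in sorted(freq, reverse=True):      # most-cited papers first
        for _ in range(freq[k]):
            if cites >= target:
                return papers
            papers += 1
            cites += k
    return papers

top = papers_for_share(freq_1996)
n_all = sum(freq_1996.values())       # denominator including the zero category
n_cited = n_all - freq_1996[0]        # denominator excluding the zero category
print(f"{top} papers: {100 * top / n_all:.1f}% of all, "
      f"{100 * top / n_cited:.1f}% of cited papers")
```

Counted this way, 28 papers are needed to reach half of the 253 citations,
which is close to the percentages quoted above. Small differences would be
expected from rounding or interpolation. The two denominators show how much
the answer depends on whether the zero category is included.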

(iii) Consideration of semi-log graphs of the frequency distributions
suggests the fitting of negative binomial distributions in both cases. Just
using the simple moment estimators for the parameters allows one to generate
fitted values in good agreement with the observed frequencies (as judged by
a chi-squared goodness of fit test).
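
As a rough illustration of such a fit, the following Python sketch applies
the moment estimators to the one-year data. The parametrisation (r, p), the
use of the sample variance, and the pooling of counts of 6 or more into one
class are assumptions made for this sketch:

```python
import math

# 1996 citation frequencies from Stegmann's table: {citations: papers}.
freq = {0: 168, 1: 62, 2: 30, 3: 9, 4: 10, 5: 4, 6: 2, 7: 1, 8: 2, 9: 1}

n = sum(freq.values())
mean = sum(k * c for k, c in freq.items()) / n
var = sum(c * (k - mean) ** 2 for k, c in freq.items()) / (n - 1)

# Moment estimators (these require var > mean, i.e. overdispersion).
r = mean ** 2 / (var - mean)   # shape parameter
p = mean / var                 # "success" probability

def nb_pmf(k, r, p):
    """P(X = k) for a negative binomial with mean r(1-p)/p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

# Pool counts of 6 or more into one class (an assumed choice, made so that
# no expected frequency is very small).
cut = 6
observed = [freq.get(k, 0) for k in range(cut)]
observed.append(sum(c for k, c in freq.items() if k >= cut))
expected = [n * nb_pmf(k, r, p) for k in range(cut)]
expected.append(n - sum(expected))   # expected frequency of the pooled tail

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1 - 2    # classes minus 1, minus 2 fitted parameters
print(f"r={r:.3f} p={p:.3f} chi2={chi2:.2f} on {dof} d.f.")
```

The statistic can then be compared with the chi-squared critical value for
4 degrees of freedom (about 9.49 at the 5% level). With this pooling it
should fall well below that value.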

(iv) From the above it seems that a simple model for the acquisition of
citations by a body of published papers might well be the familiar
Gamma-Poisson model, whereby individual papers are cited according to a
Poisson process while different papers acquire citations at different rates
(the individual impact factors?) where these rates follow approximately a
gamma distribution.
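
The model can be sketched as a small simulation. In this Python sketch each
paper draws its own citation rate from a gamma distribution and then
collects citations as a Poisson process over the observation period. The
function name and the parameter values are illustrative assumptions, not
estimates from Stegmann's data:

```python
import random

def simulate_citations(n_papers, shape, scale, period, seed=0):
    """Citation counts over `period` for papers with gamma-distributed rates."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_papers):
        rate = rng.gammavariate(shape, scale)   # this paper's citation rate
        k, t = 0, 0.0
        while rate > 0:
            t += rng.expovariate(rate)          # waiting time to next citation
            if t > period:
                break
            k += 1
        counts.append(k)
    return counts

# Illustrative parameter values only; they are not estimates from the data.
short = simulate_citations(20000, shape=0.6, scale=1.5, period=1.0)
long_ = simulate_citations(20000, shape=0.6, scale=1.5, period=6.0)
for label, counts in (("1 year", short), ("6 years", long_)):
    mean = sum(counts) / len(counts)
    zeros = counts.count(0) / len(counts)
    print(f"{label}: mean={mean:.2f}, uncited fraction={zeros:.2f}")
```

Under these assumptions the counts over a period t are negative binomial
with r equal to the gamma shape and p = 1/(1 + scale*t), so the mean grows
and the zero category shrinks as the period lengthens.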

Quoting Garfield, "Citation data is subtle stuff". Stegmann's data suggest
that, as in most other areas of informetrics (7, 8, 9), successful
citation modelling will require incorporation of a time parameter.

(I apologise if others have already made similar remarks. I would be
grateful to receive references, or even better - I don't have ready access
to a large University library - copies of relevant papers.)

(1) Kent et al: "The use of library materials" Dekker, 1979
(2) QL Burrell & V Cane: JRSS(A), 145 (1982)
(3) QL Burrell: In "Informetrics 91" Ranganathan Endowment, 1992
(4) V Oluic-Vukovic: JASIS, 43 (1992)
(5) QL Burrell: J Doc, 41 (1985)
(6) QL Burrell: Inf Proc & Man, 28 (1992)
(7) QL Burrell: JASIS, 44 (1993)
(8) V Oluic-Vukovic: JASIS, 48 (1997)
(9) QL Burrell: Scientom, 30 (1994)


Dr Quentin L Burrell
119 Friary Park
Ballabeg
Isle of Man IM9 4EX
United Kingdom

Tel. +44 (0)1624 824638



----- Original Message -----
From: Johannes Stegmann <stegmann at UKBF.FU-BERLIN.DE>
To: <SIGMETRICS at listserv.utk.edu>
Sent: 12 July 1999 16:06
Subject: Re: [SIGMETRICS] ART:Takahashi,Impact Factor & Small Fields


> I would like to comment on the paper by Takahashi et al.
>
> 1. Takahashi states: "It is recognized that there is poor correlation
> between citation counts of individual papers and journal IFs" and cites
> the BMJ paper by Seglen (Brit Med J 1997; 314:498-502).
> I think the correlation between the number of citations an individual
> article received and the topic-based impact factor is not better. For
> example, when I retrieve all research-relevant (journal articles, reviews)
> "asbestos papers" (asbestos as main subject heading) from MEDLINE
> (publication years 1994 and 1995) and the citations subsequently received
> (from SCISEARCH/SOCIAL SCISEARCH), I can draw graphs shaped similarly to
> Figure 1 and Figure 2 in the Seglen paper. The data are as follows:
>
> No. of papers with asbestos as main heading, years 1994/1995:    289
> No. of citations received in 1996:                               253  IF=0.89
> No. of citations received from 1994 to 1999:                    1312  IF=4.5
>
> If I look for numbers of citations received by individual articles, then I
> find for the citing year 1996:
>
>  No. of Papers         No. of Citations
>
>       168                     0
>        62                     1
>        30                     2
>         9                     3
>        10                     4
>         4                     5
>         2                     6
>         1                     7
>         2                     8
>         1                     9
>
> 168 papers (58 percent) are not cited in 1996. Only 121 papers (42 percent)
> are cited. This would give an IF (cited papers only) of 2.1.
> Starting with the highest cited paper, 22 percent of the cited papers (9.6
> percent if all papers are considered) accumulate 50 percent of all
> citations, and 50 percent of the cited papers accumulate 76 percent of the
> citations (20.8 percent if all papers are considered).
>
> Looking for all citations received from 1994 until today, I find:
>
>
>  No. of Papers         No. of Citations
>
>        82                     0
>        47                     1
>        22                     2
>        18                     3
>        17                     4
>        19                     5
>        13                     6
>        11                     7
>         9                     8
>         8                     9
>         3                    10
>         4                    11
>         1                    12
>         6                    13
>         5                    14
>         1                    15
>         4                    16
>         2                    17
>         2                    18
>         4                    19
>         2                    20
>         3                    23
>         2                    25
>         1                    26
>         1                    33
>         1                    35
>          1                         35
>
> 82 papers (28.4 percent) are not cited at all. 207 papers (71.6 percent)
> are cited. This would give an IF (cited papers only) of 6.3.
> Starting with the highest cited paper, 18 percent of the cited papers
> (12.8 percent if all papers are considered) accumulate 50 percent of all
> citations, and 50 percent of the cited papers accumulate 84 percent of the
> citations (35.6 percent if all papers are considered).
>
> Thus, the question remains whether there is any "evaluation value" in
> citation counting as far as single articles are considered.
>
> 2. I see another problem in the definition of a topic. It is not difficult
> to build more specific topics, e.g. by separating papers on epidemiology
> (of occupational dis.) into classes defined by region/country, and a paper
> dealing with the situation in France might accumulate more citations from
> France-based research than from other countries. We could end up with topics
> constituted by single articles.
>
> 3. It is possible, of course, not only to link MEDLINE but also EMBASE,
> BIOSIS, SCISEARCH itself and other databases with the citation databases
> SCISEARCH/SOCIAL SCISEARCH.
>
>
> Johannes Stegmann
>
> -------------------------------------------------------
> Dr. Johannes Stegmann      Univ. Hospital Benjamin Franklin
> Free University Berlin     Medical Library
> stegmann at ukbf.fu-berlin.de Hindenburgdamm 30
> Tel.: +49 30 8445 2035     D-12200 Berlin
> Fax:  +49 30 8445 4454     Germany
>   Homepage:  http://www.medizin.fu-berlin.de/medbib/home.html
>


