Uncitedness and self-citations

Quentin L. Burrell familyburrell at ENTERPRISE.NET
Tue Dec 5 07:50:22 EST 2000


Recent list contributions on these two topics have been interesting, but no-one seems to have suggested a possible connection. I know that it is unwise to speculate in the absence of data, but I suspect that one of the prime reasons for self-citation is to avoid uncitedness, or at least low-citedness.

Before he complains, let me say that I would fully agree with Eric Ackerman that this just fuels the "supposition and speculation" surrounding self-citations, except that I am suggesting that self-citation can be a defensive as well as an aggressive strategy in the citation game. Mike Koenig reports apocryphally that high peer rating is highly correlated with self-citation, but this is not surprising. What is disconcerting is when self-citations are a paper's sole citations, or at least predominate among them.

It seems to me that in citation analyses one should certainly not simply omit self-citations, but that they could in some way be discounted. If a paper receives 100 citations, 80 of which are self-citations, is its "impact" not less than that of one whose 100 citations include only 20 self-citations? One simple way to build this in would be to discount self-citations according to the degree to which they dominate the citations, as follows:

If a paper receives N citations, of which a proportion p are self-citations, then its discounted citation score (DCS) is (1-p)N + (1-p)pN, so that each of the (1-p)N non-self-citations is given full weight but each of the pN self-citations is discounted by a factor (1-p).
In the examples above, 20 non-self-citations + 80 self-citations gives, since p=0.8 here, a DCS of 0.2 x 100 + 0.2 x 0.8 x 100 = 20 + 16 = 36.
Similarly in the converse mix, 80 non-self-citations + 20 self-citations gives (p=0.2) a DCS of 0.8 x 100 + 0.8 x 0.2 x 100 = 80 + 16 = 96.
A 50-50 mix would lead to a DCS of 75, all 100 non-self-citations a DCS of 100 and all 100 self-citations a DCS of 0.

Note that, so far as calculations are concerned, it is easiest to write
DCS = (1-p)N + (1-p)pN = (1-p)(1+p)N = (1-p^2)N
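
For anyone who wants to try the score on real counts, here is a minimal sketch in Python. The function name discounted_citation_score and its two arguments are hypothetical, chosen only for illustration, and the treatment of a paper with no citations at all (score 0) is an assumption, since p is undefined in that case.

```python
def discounted_citation_score(total_citations, self_citations):
    """Return DCS = (1 - p^2) * N, where p = self_citations / total_citations."""
    if total_citations < 0 or not 0 <= self_citations <= total_citations:
        raise ValueError("need 0 <= self_citations <= total_citations")
    if total_citations == 0:
        # Assumption: p is undefined for an uncited paper, so score it as 0.
        return 0.0
    p = self_citations / total_citations
    # (1-p)N + (1-p)pN simplifies to (1 - p^2)N.
    return (1 - p * p) * total_citations


if __name__ == "__main__":
    # The mixes discussed above, each with N = 100 citations.
    for self_cites in (0, 20, 50, 80, 100):
        score = discounted_citation_score(100, self_cites)
        print(self_cites, "self-citations:", round(score, 2))
```

Rounded to two decimal places, this should reproduce the scores worked out by hand above (100, 96, 75, 36 and 0).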

I would be interested to know what list members think of these scores. More to the point, has anyone investigated this kind of thing before?

As a final point, I would concur with Ronald Rousseau that in citation analyses one needs to refer to the particular database and, most importantly, to the time period. Incorporating the time parameter to develop a stochastic model of citation accumulation would be most interesting. Again, could people point me in the right direction if this has already been done (possibly by Ronald himself)?

Best wishes

Quentin Burrell





