Online Academic Abuses and the Power of Openness: Naming & Shaming

Tue Jul 31 17:44:59 EDT 2012

Dear Stevan: Some of your readers may not be able to access my 1997 letter to BMJ because you have posted a link that requires a subscription to the BMJ files. The proper link to use is
http://garfield.library.upenn.edu/papers/bmj14june1997.pdf

where I have posted the full text.

Unfortunately, my comments have in some cases been distorted to justify deplorable excesses in the use of references to the same journal, even though I emphasized that such references should be relevant and not mere window dressing (a blatant attempt to increase the impact factor of the journal in question). Best wishes. Gene Garfield

________________________________
From: ASIS&T Special Interest Group on Metrics [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Stevan Harnad
Sent: Tuesday, July 31, 2012 10:03 AM
To: SIGMETRICS at LISTSERV.UTK.EDU
Subject: Re: [SIGMETRICS] Online Academic Abuses and the Power of Openness: Naming & Shaming

Sorry for the long delay in replying to this. I missed it, and it has just been drawn to my attention:

On 10 April 2012 Gustaf Nelhans wrote:
    Dear Professor Harnad,
    I believe that it is not always easy to identify the motives behind specific instances of self references (although in the case at hand, the number of mutual citations identified seems to speak for itself...). The practice of self citation is (as you acknowledge) not in itself a bad thing, but the problem is how to distinguish its legitimate use from its abuse.
Agreed. And in fact the outcomes of tests comparing rankings and correlation patterns based on total citation counts with those based on citation counts minus self-citations tend to be very similar. Nevertheless, comparing individuals' counts with and without self-citations against the population norms can raise a red flag, which can then be examined manually.
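
As an illustration of that comparison, here is a minimal sketch that ranks a handful of authors once by total citations and once by citations minus self-citations, then computes the Spearman rank correlation between the two rankings. The author labels and counts are invented for the example, and Spearman's rho is only one of several correlation measures that could be used here.

```python
from statistics import mean

def average_ranks(values):
    """Rank values (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: author -> (total citations, self-citations).
authors = {"A": (120, 10), "B": (95, 40), "C": (60, 5), "D": (30, 2), "E": (58, 30)}
total = [t for t, s in authors.values()]
external = [t - s for t, s in authors.values()]

rho = spearman(total, external)
print(f"Spearman rho, with vs. without self-citations: {rho:.3f}")
```

A rho close to 1 means that removing self-citations barely reorders the population, even if one or two individuals (B and E in this made-up sample) shift noticeably and might merit a closer look.
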
    This is equally valid on the individual level as in editor-suggested references. I would like to draw attention to an exchange about these matters from 1997, where Eugene Garfield stated:
    "Recognising the reality of the Matthew effect, I believe that an editor is justified in reminding authors to cite equivalent references from the same journal, if only because readers of that journal presumably have ready access to it. To call this "manipulation" seems excessive unless the references chosen are irrelevant or mere window dressing." (Garfield, Eugene. 1997. Editors are justified in asking authors to cite equivalent references from same journal. BMJ 314 (7096):1765. http://www.bmj.com/content/314/7096/1765.2.short )
Gene Garfield made this suggestion in 1997, before OA became a distinct possibility. In a world where the only way to access articles was through a subscription that your institution could afford, "preferentially cite this journal" might have had an ounce of validity -- alongside the obvious pound of self-interest.

But no longer today.

An editor telling the author of an article to cite more articles in his journal because readers have "more access" to it is outrageous. Rather, he should tell authors to self-archive their articles (Green OA) if they really want to make them more accessible.
    My question is if there could exist any method of identifying "bad apples" that does not account for the specific context in the article in which the reference is placed.
Only in a population statistical sense. Individual anomalies flagged by the population metrics would still need to be examined manually.

But automated text-analytic tools may eventually also become sensitive enough to make a contribution, sorting out some of the nature of the citation from the accompanying text, not just from the author/article/journal counts.
    In my understanding of the problem, the proposed way of using statistical methods for identifying baselines for self citations in various fields could be one important step, but I wonder if it would suffice to make the identification process complete?
It is a necessary but not a sufficient condition for answering all the kinds of questions one might have about uses and misuses of citations.

In statistics there is always, and necessarily, a difference between population data and individual cases. Medical conditions are the best illustration: I have an illness. I want to be treated for my illness, and not for what, on average, works most often with people who have symptoms most like mine. (See Kahneman & Tversky on the base rate fallacy<http://en.wikipedia.org/wiki/Base_rate_fallacy>.)

For citations, "bad" citations can be identified on a statistical basis, comparing two populations of citations, and perhaps even one individual's total citations as compared to the population norms, to see whether there is something anomalous (such as excess self-citation swelling the citation count).

But such population statistics won't tell you whether an individual citation is good or bad. It is possible to develop and apply automated text-analytic algorithms to the text surrounding a citation, to try to predict whether it is positive or negative, and such algorithms can even be "trained up" with corrective feedback based on human evaluation of whether each individual citation was positive or negative.
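
To make the idea of training with corrective feedback concrete, here is a toy sketch: a bag-of-words perceptron that adjusts its word weights only when its guess about a citation context disagrees with the human label. The citation contexts, the labels and the choice of a perceptron are all assumptions made for illustration; a usable classifier would need far richer features and far more hand-labelled data.

```python
from collections import defaultdict

def features(context):
    """Lower-cased words of the citation context, punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in context.split())
    return [w for w in words if w]

def predict(weights, context):
    """Return +1 (positive citation) or -1 (negative citation)."""
    score = sum(weights[w] for w in features(context))
    return 1 if score >= 0 else -1

def train(labelled, epochs=10):
    """Perceptron rule: weights change only when a prediction is corrected."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for context, label in labelled:
            if predict(weights, context) != label:  # human feedback disagrees
                for w in features(context):
                    weights[w] += label
    return weights

# Hypothetical hand-labelled citation contexts (+1 positive, -1 negative).
labelled = [
    ("confirms and extends the findings of", 1),
    ("fails to replicate the results of", -1),
    ("builds on the elegant method of", 1),
    ("contradicts the flawed analysis of", -1),
]

weights = train(labelled)
print(predict(weights, "extends the elegant findings of"))
print(predict(weights, "fails to replicate the flawed analysis of"))
```

On these four contexts the weights settle after a couple of passes, and the two unseen contexts are scored as positive and negative respectively. That says nothing about accuracy on real text, which is exactly what the hand-validation would have to establish.
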

But it's early days for both of these, and validating statistical predictors will first take an awful lot of individual hand-validation in order to test and improve the algorithms.

But for journals or individuals it is definitely possible to check computationally whether they deviate from population norms/baselines, and then look at the cases that the population anomaly detectors single out, and check them manually to see whether they are indeed cases of bad faith, legitimate practice, or just statistical anomalies.
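
A minimal sketch of such a baseline check follows. It flags journals whose self-citation rate lies far above the population median, using a modified z-score based on the median absolute deviation (MAD). The journal labels and rates are invented, and both the MAD-based score and the 3.5 cut-off are assumptions chosen for the example, not an established standard for citation analysis. The output is only a list of candidates for manual inspection.

```python
from statistics import median

def flag_anomalies(rates, threshold=3.5):
    """Return (name, score) for entries far above the population median.

    score = 0.6745 * (rate - median) / MAD, a robust analogue of the z-score.
    """
    values = list(rates.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in the population, so nothing can be flagged
    flagged = []
    for name, rate in rates.items():
        score = 0.6745 * (rate - med) / mad
        if score > threshold:  # only unusually HIGH rates are of interest
            flagged.append((name, round(score, 2)))
    return flagged

# Hypothetical self-citation rates (self-citations / all citations received).
self_citation_rate = {
    "J1": 0.10, "J2": 0.12, "J3": 0.08,
    "J4": 0.11, "J5": 0.09, "J6": 0.45,
}

for name, score in flag_anomalies(self_citation_rate):
    print(f"{name}: modified z-score {score}, check manually")
```

The median and MAD are used instead of the mean and standard deviation because a single extreme journal would otherwise inflate the very baseline it is being compared against.
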

Citation cartels (and many other systematic abuses) are more detectable if the entire corpus is openly accessible, precisely because everybody can then look for them: no need to wait to see whether proprietary database owners with other interests get around to, or see fit to, provide the data needed to monitor and detect abuses.
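
With an open corpus, a first-pass screen for mutual citation between journals could look like the sketch below. It flags pairs of journals that each send a large share of their outgoing citations to the other. The journal labels, the counts and the 30% share threshold are all hypothetical, and a flagged pair is only a candidate: closely related journals in a small field may cite each other heavily for entirely legitimate reasons.

```python
def mutual_citation_pairs(citations, min_share=0.3):
    """Find journal pairs that cite each other disproportionately.

    citations[src][dst] = number of citations from src to dst
    (self-citations excluded). A pair is returned when each journal
    sends at least min_share of its outgoing citations to the other.
    """
    totals = {src: sum(targets.values()) for src, targets in citations.items()}

    def share(src, dst):
        total = totals.get(src, 0)
        return citations.get(src, {}).get(dst, 0) / total if total else 0.0

    journals = sorted(citations)
    pairs = []
    for i, a in enumerate(journals):
        for b in journals[i + 1:]:
            a_to_b, b_to_a = share(a, b), share(b, a)
            if a_to_b >= min_share and b_to_a >= min_share:
                pairs.append((a, b, round(a_to_b, 2), round(b_to_a, 2)))
    return pairs

# Hypothetical outgoing citation counts between four journals.
citations = {
    "J1": {"J2": 60, "J3": 20, "J4": 20},
    "J2": {"J1": 50, "J3": 25, "J4": 25},
    "J3": {"J1": 10, "J2": 10, "J4": 30},
    "J4": {"J1": 40, "J2": 35, "J3": 25},
}

for a, b, a_to_b, b_to_a in mutual_citation_pairs(citations):
    print(f"{a} <-> {b}: {a_to_b:.0%} and {b_to_a:.0%} of outgoing citations")
```

Requiring the share to be high in both directions is a deliberate design choice: J3 sends most of its citations to J4 in this sample, but J4 does not reciprocate, so that pair is not flagged.
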

Global OA not only provides the open database, but it provides the (continuous) open means of flagging anomalies in the population pattern, checking them, and naming and shaming the cases where there really has been willful misuse or abuse.

It's yet another potential application for crowd-sourcing.

Stevan Harnad

Harnad, S. (2008) Validating Research Performance Metrics Against Peer Rankings<http://www.int-res.com/abstracts/esep/v8/n1/p103-107/>. Ethics in Science and Environmental Politics 8 (11) doi:10.3354/esep00088 The Use And Misuse Of Bibliometric Indices In Evaluating Scholarly Performance http://eprints.ecs.soton.ac.uk/15619/

Harnad, S. (2009) Open Access Scientometrics and the UK Research Assessment Exercise<http://eprints.ecs.soton.ac.uk/17142/>. Scientometrics 79 (1) Also in Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.  (2007)

Harnad, S. (2009) Multiple metrics required to measure research performance<http://openaccess.eprints.org/index.php?/archives/508-guid.html>.  Nature (Correspondence) 457 (785) (12 February 2009)<http://www.nature.com/nature/journal/v457/n7231/full/457785a.html>
