Online Academic Abuses and the Power of Openness: Naming & Shaming

Stevan Harnad amsciforum at GMAIL.COM
Tue Jul 31 10:02:48 EDT 2012


Sorry for the long delay in replying to this. I missed it, and it has just
been drawn to my attention:

On 10 April 2012 Gustaf Nelhans wrote:

> Dear Professor Harnad,
>     I believe that it is not always easy to identify the motives behind
> specific instances of self references (although in the case at hand, the
> number of mutual citations identified seem to speak for themselves…). The
> practice of self citation is (as you acknowledge) not in itself a bad
> thing, but the problem is how to distinguish its legitimate use from its
> abuse.

Agreed. And in fact the outcomes of tests comparing rankings and correlation
patterns based on total citation counts versus citation counts minus
self-citations tend to be very similar. Nevertheless, looking at
individuals with and without self-citations and comparing them to the
population norms can raise a red flag that can then be examined manually.
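
To illustrate this kind of population-level flagging (a minimal sketch in
Python, with invented baseline figures and citation counts, not any existing
tool): compare each author's self-citation rate to a field baseline and flag
large deviations for manual inspection.

# Minimal sketch: flag authors whose self-citation rate deviates strongly
# from a hypothetical field baseline. Flagged cases are candidates for
# manual inspection, not verdicts.

FIELD_MEAN_RATE = 0.07  # assumed average self-citation rate in the field
FIELD_SD_RATE = 0.03    # assumed spread across the field

# (author, total citations, self-citations) -- invented counts
authors = [("A", 400, 28), ("B", 350, 25), ("C", 300, 120)]

for name, total, self_cites in authors:
    rate = self_cites / total
    z = (rate - FIELD_MEAN_RATE) / FIELD_SD_RATE
    if z > 2:  # arbitrary cut-off for illustration
        print(f"Flag {name}: self-citation rate {rate:.0%} (z = {z:.1f})")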

> This is equally valid on the individual level as in editor-suggested
> references. I would like to draw attention to an exchange about these
> matters from 1997, where Eugene Garfield stated:
> “Recognising the reality of the Matthew effect, I believe that an editor
> is justified in reminding authors to cite equivalent references from the
> same journal, if only because readers of that journal presumably have ready
> access to it. To call this “manipulation” seems excessive unless the
> references chosen are irrelevant or mere window dressing.” (Garfield, Eugene.
> 1997. Editors are justified in asking authors to cite equivalent references
> from same journal. *BMJ* 314 (7096):1765.
> http://www.bmj.com/content/314/7096/1765.2.short )

Gene Garfield made this suggestion in 1997, before OA became a distinct
possibility. In a world where the only way to access articles was through an
institution that could afford the subscription, "preferentially cite this
journal" might have had an ounce of validity -- alongside the obvious pound
of self-interest.

But that is no longer true today.

An editor telling the author of an article to cite more articles from the
editor's own journal because readers have "more access" to it is outrageous.
If the editor really wants to make articles more accessible, the advice to
authors should be to self-archive their articles (Green OA).

>     My question is whether there could exist any method of identifying “bad
> apples” that does not account for the specific context in the article in
> which the reference is placed.

Only in a population statistical sense. Individual anomalies flagged by the
population metrics would still need to be examined manually.

But automated text-analytic tools may eventually also become sensitive
enough to make a contribution, inferring something about the nature of a
citation from its accompanying text, not just from the
author/article/journal counts.

> In my understanding of the problem, the proposed way of using statistical
> methods for identifying baselines for self citations in various fields
> could be one important step, but I wonder if it would suffice to make the
> identification process complete?

It is a necessary but not a sufficient condition for answering all the
kinds of questions one might have about uses and misuses of citations.

In statistics there is always, and necessarily, a difference between
population data and individual cases. Medical conditions are the best
illustration: I have an illness. I want to be treated for my illness, and
not for what, on average, works most often for people whose symptoms are
most like mine. (See Kahneman & Tversky on the base rate fallacy
<http://en.wikipedia.org/wiki/Base_rate_fallacy>.)

For citations, "bad" citations can be identified on a statistical basis,
comparing two populations of citations, and perhaps even one individual's
total citations as compared to the population norms, to see whether there
is something anomalous (such as excess self-citation swelling the citation
count).

But it won't tell you whether an individual citation is good or bad. It is
possible to develop and apply automated text-analytic algorithms to the
text surrounding a citation, to try to predict whether it is positive or
negative, and such algorithms can even be "trained up" with corrective
feedback based on human evaluation of whether each individual citation was
positive or negative.
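
As a rough illustration of what such a text-analytic classifier might look
like (a toy sketch using scikit-learn, with a handful of invented,
hand-labelled citation contexts; a real system would need large annotated
corpora):

# Toy sketch: classify the polarity of a citation from the surrounding text.
# Human corrections of the predictions can be added to the training data
# ("corrective feedback") and the model refitted.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented, hand-labelled citation contexts (1 = supportive, 0 = critical)
contexts = [
    "builds directly on the method of [12]",
    "we confirm the findings reported in [12]",
    "contrary to the claims of [12], our data show no such effect",
    "the analysis in [12] suffers from a serious sampling bias",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(contexts, labels)

# Predict the polarity of a new citation context
print(model.predict(["our results directly contradict [12]"]))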

But it is early days for both of these approaches, and the statistical
predictors will first require a great deal of individual hand-validation in
order to test and improve the algorithms.

But for journals or individuals it is definitely possible to check
computationally whether they deviate from population norms/baselines, and
then look at the cases that the population anomaly detectors single out,
and check them manually to see whether they are indeed cases of bad faith,
legitimate practice, or just statistical anomalies.

Citation cartels (and many other systematic abuses) are more detectable if
the entire corpus is accessible, precisely because everybody can detect
them: there is no need to wait for proprietary database owners, with other
interests of their own, to see fit to provide the data needed to monitor
and detect abuses.
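
With an open citation graph, for example, anyone could screen for journal
pairs with unusually heavy reciprocal citation -- one crude signal of a
possible cartel. A minimal sketch (invented counts and an arbitrary
threshold; real screening would need field baselines and manual follow-up):

# Crude sketch: flag journal pairs with heavy reciprocal citation.
from itertools import combinations

# citations[(A, B)] = number of citations from journal A to journal B
# (invented counts)
citations = {
    ("J1", "J2"): 180, ("J2", "J1"): 160,  # conspicuously heavy mutual citation
    ("J1", "J3"): 12,  ("J3", "J1"): 9,
    ("J2", "J3"): 15,  ("J3", "J2"): 11,
}

journals = {j for pair in citations for j in pair}
for a, b in combinations(sorted(journals), 2):
    mutual = min(citations.get((a, b), 0), citations.get((b, a), 0))
    if mutual > 100:  # arbitrary threshold for illustration
        print(f"Check {a} <-> {b}: {citations[(a, b)]} / {citations[(b, a)]} reciprocal citations")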

Global OA not only provides the open database, but it provides the
(continuous) open means of flagging anomalies in the population pattern,
checking them, and naming and shaming the cases where there really has been
willful misuse or abuse.

It's yet another potential application for crowd-sourcing.

Stevan Harnad

Harnad, S. (2008) Validating Research Performance Metrics Against Peer
Rankings. Ethics in Science and Environmental Politics 8 (11)
doi:10.3354/esep00088 (The Use And Misuse Of Bibliometric Indices In
Evaluating Scholarly Performance)
http://www.int-res.com/abstracts/esep/v8/n1/p103-107/
http://eprints.ecs.soton.ac.uk/15619/

Harnad, S. (2009) Open Access Scientometrics and the UK Research Assessment
Exercise. Scientometrics 79 (1). Also in Proceedings of the 11th Annual
Meeting of the International Society for Scientometrics and Informetrics
11(1), pp. 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.
(2007) http://eprints.ecs.soton.ac.uk/17142/

Harnad, S. (2009) Multiple metrics required to measure research performance.
Nature (Correspondence) 457 (785) (12 February 2009)
http://www.nature.com/nature/journal/v457/n7231/full/457785a.html
http://openaccess.eprints.org/index.php?/archives/508-guid.html