University Cross-Check on Thomson ISI Citation Metrics

Stevan Harnad harnad at ECS.SOTON.AC.UK
Wed Dec 19 05:52:20 EST 2007

         ** Cross-Posted **

On Wed, 19 Dec 2007, Armbruster, Chris wrote:

>> Subject: [SIGMETRICS] FW: GENERAL: accuracy of Thomson data
>> Incorrect journal abbreviations and non-ISI sources Citations
>> Calls for an audit of WoS data.
>  Would you trust the situation to improve if digital repositories
>  (institutional, disciplinary and/or national) were to provide data
>  in future?

Of course -- and particularly institutional repositories (IRs), since the
universities and research institutions themselves are the primary
content-providers.

>  One would possibly expect that a decentralised solution
>  would provide more comprehensive (types of publication, languages
>  etc.) and more accurate coverage,

Not because it was "decentralized" but because the authors' institutions
(not their journals!) are the primary content-providers and have a
direct stake in the discoverability, validity and attribution of their
own research output.

>  but one might also worry that the corpus will be less well defined....

How will it be less well defined? All journal articles -- their
full texts and metadata, *including their cited references* -- will be
deposited, tagged, harvestable, harvested, indexed and analyzed by
(open and transparent) software, globally. The reference lists of each
article will provide a redundant, distributed cross-check on all the
articles they cite, many times over. Central indexes of journals and
their contents (like Thomson ISI) will provide further cross-checks
on validity, and will be able to correct their own data against the 
primary OA database.
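The redundant cross-check described above can be made concrete with a small sketch. Everything here is hypothetical -- the record structure, the identifiers, and the field names are invented for illustration -- but the idea is the one in the paragraph: the same cited work appears in many deposited reference lists, so variant (e.g. garbled) citation strings can be outvoted by the majority reading.

```python
"""Illustrative sketch (hypothetical data, no real repository API):
reference lists harvested from many OA IRs serve as a redundant,
distributed cross-check on citation metadata."""
from collections import Counter

# Hypothetical harvested records: each deposited article carries the
# reference list its author recorded, keyed by the cited work's ID.
records = [
    {"id": "inst-a/001", "cites": [("doi:10.1000/x1", "J Sci 12: 34-56")]},
    {"id": "inst-b/002", "cites": [("doi:10.1000/x1", "J Sci 12: 34-56")]},
    {"id": "inst-c/003", "cites": [("doi:10.1000/x1", "J Sci 12: 3-456")]},  # garbled
]

def cross_check(records):
    """Tally the variant citation strings seen for each cited work
    across all harvested reference lists; the majority reading wins,
    flagging the outliers for correction."""
    variants = {}
    for rec in records:
        for cited_id, cited_str in rec["cites"]:
            variants.setdefault(cited_id, Counter())[cited_str] += 1
    return {cid: cnt.most_common(1)[0][0] for cid, cnt in variants.items()}

consensus = cross_check(records)
print(consensus)  # the two matching records outvote the garbled one
```

The same tallies also yield citation counts for free, which is why a filled, harvestable OA corpus makes central indexes auditable rather than authoritative.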

But the prerequisite for all of this is that the primary content must
be provided in the author's own institution's Open Access (OA) IR.

>  Hence, what would you think
>  if repositories developed a system of author registration (unique
>  identifier, institutional affiliation) and provided data?

It is an obvious and natural solution -- once all the primary content is
being systematically self-archived in the author's own OA IR. (Not while
only 15% of it is being deposited haphazardly -- in IRs, Central
Repositories, and on arbitrary websites.)
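The author-registration scheme the question proposes reduces to a simple mapping. A minimal sketch, assuming a repository-side registry that mints its own identifiers (the `auth-NNNNNN` scheme and all names below are invented for illustration):

```python
"""Hypothetical sketch of repository-side author registration:
map (name, institutional affiliation) pairs to a stable unique
identifier, so citation data aggregates per author rather than per
ambiguous name string."""

class AuthorRegistry:
    def __init__(self):
        self._ids = {}   # (normalized name, affiliation) -> identifier
        self._next = 1

    def register(self, name, affiliation):
        """Return a stable ID for this (name, affiliation) pair,
        minting a new one on first sight."""
        key = (name.strip().lower(), affiliation.strip().lower())
        if key not in self._ids:
            self._ids[key] = f"auth-{self._next:06d}"
            self._next += 1
        return self._ids[key]

reg = AuthorRegistry()
a = reg.register("J. Smith", "Univ. of Southampton")
b = reg.register("J. Smith", "Univ. of Southampton")  # same ID again
c = reg.register("J. Smith", "MPI Berlin")            # distinct person?
print(a == b, a == c)  # True False
```

A real system would of course need disambiguation beyond the affiliation string, but the institution is well placed to vouch for its own authors, which is the point.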

The way to ensure that all of this is systematically and reliably done is
for researchers' own institutions (and funders) to mandate the
self-archiving of their own published research output:

>  What is the scope for delivering scientometrics to the digital
>  workbench of scientists?  I have anecdotal evidence that review
>  panels (for major grants, tenure etc. - often very senior scientists)
>  routinely use software and search engines to look up the citation data
>  and indices of applicants and candidates.

All that is needed is for research institutions and funders to mandate
that the all-important primary data itself be provided (by mandating
self-archiving). The rest (the harvesting and the software) will take
care of itself, many times over. It is that primary distributed
institutional OA database itself that is still missing today, and
urgently overdue.
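The harvesting layer that will "take care of itself" already has a standard: OAI-PMH, the protocol IRs expose for metadata harvesting. A minimal sketch of what a harvester does (the endpoint URL is a placeholder, and the response parsed here is a canned, truncated sample rather than a live request):

```python
"""Sketch of OAI-PMH harvesting: build a ListRecords request for
Dublin Core metadata, then extract record identifiers from the XML
response. Endpoint and sample response are illustrative only."""
import urllib.parse
import xml.etree.ElementTree as ET

def list_records_url(base_url, prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": prefix})
    return f"{base_url}?{query}"

url = list_records_url("https://eprints.example.edu/cgi/oai2")
print(url)

# Parsing a canned, truncated OAI-PMH response:
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
ids = [h.text for h in ET.fromstring(sample).findall(".//oai:identifier", ns)]
print(ids)  # ['oai:example:1']
```

Nothing repository-specific is required of the harvester; that interoperability is exactly why only the primary deposits, not the software, are the missing piece.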

>  If we were not to dismiss
>  this simply as evaluation mania, but to say that all scientists
>  (senior and junior) now need tools for metric research evaluation to
>  reduce complexity on an everyday basis (and develop strategies for
>  research, teaching, publishing and networking) - is scientometrics
>  developed enough to be a reliable tool?

What is not "developed enough" is university and research-funder
policy for exposing and managing their own research assets online --
for which the essential component is each researcher's institution's
own OA IR, reliably filled with each institution's own research article
output. Scientometrics is waiting to data-mine that OA corpus, once
universities (and funders) get around to doing the obvious (and already
overdue) thing in the online era: to mandate the deposit of their research
output in the researcher's OA IR.

>  Context: for the Max Planck Digital Library I am looking into the
>  potential of digital libraries and repositories for the generation,
>  collection and evaluation of scientometric data.

Splendid! And are the Max-Planck Institutes at long last getting around
to implementing their "Berlin Declaration" by mandating the deposit of
their own research output in their own IR (and making the IR OA)?

For some idea of how long this has been taking at the MPIs, Google: amsci ("max planck" OR mpi)

Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A.
(2007) Incentivizing the Open Access Research Web:
Publication-Archiving, Data-Archiving and Scientometrics. CTWatch
Quarterly 3(3).

Harnad, S. (2007) Open Access Scientometrics and the UK Research
Assessment Exercise. Proceedings of 11th Annual Meeting of the
International Society for Scientometrics and Informetrics 11(1) : 27-33,
Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.

Shadbolt, N., Brody, T., Carr, L. and Harnad, S. (2006) The Open
Research Web: A Preview of the Optimal and the Inevitable, in Jacobs,
N., Eds. Open Access: Key Strategic, Technical and Economic Aspects.

Stevan Harnad

If you have adopted or plan to adopt a policy of providing Open Access
to your own research article output, please describe your policy at:

     BOAI-1 ("Green"): Publish your article in a suitable toll-access journal
     BOAI-2 ("Gold"): Publish your article in an open-access journal if/when
     a suitable one exists.
     In BOTH cases, self-archive a supplementary version of your article
     in your own institutional repository.
