Validating Open Access Metrics for RAE 2008

Stevan Harnad harnad at ECS.SOTON.AC.UK
Sun Aug 19 10:52:30 EDT 2007

                             ** Cross-Posted **

On 9-Aug-07, at 7:22 AM, [identity deleted] wrote:

> I have been commissioned to write a news story for [publication
> name deleted] inspired by your post of 12 July on the American
> Scientist Forum regarding the HEFCE's RAE being out of touch.
> I would welcome your comments on this, especially on how you
> consider the RAE may be out of touch on the wider issue of OA as
> well as the CD/PDF issue.

It is not that the RAE is altogether out of touch. First let me count
the things that they are doing right:

      (1) It is a good idea to have a national research performance
      evaluation to monitor and reward research productivity and progress.
      Other countries will be following and eventually emulating the UK's
      lead. (Australia is already emulating it.)

      (2) It is also a good idea to convert the costly, time-consuming,
      and wasteful (and potentially biased) panel-based RAE of past years
      to an efficient, unbiased metric RAE, using objective measures
      that can be submitted automatically online, with the panel's
      role being only to monitor and fine-tune. This way the RAE will
      no longer take UK researchers' precious time away from actually
      doing UK research in order to resubmit and locally "re-peer-review"
      work that has already been submitted, published and peer-reviewed,
      in national and international scholarly and scientific journals.

But, as with all policies shaped collectively by disparate (and
sometimes under-informed) policy-making bodies, a few very simple
and remediable flaws in the reformed RAE system have gone undetected
and hence uncorrected. They can still be corrected, and I hope they
will be, for they are small, easily fixed flaws; but, if left
unfixed, they will have huge negative consequences, compromising
the RAE as well as the RAE reforms:

      (a) The biggest flaw concerns the metrics that will be used. Metrics
      first have to be tested and validated, discipline by discipline, to
      ensure that they are valid indicators of research performance. Since
      the UK has relied on the RAE panel evaluations for 2 decades, and
      since the last RAE (2008) before conversion to metrics is to be a
      parallel panel/metrics exercise, the natural thing to do is to test
      as many candidate metrics as possible in this exercise, and to cross-
      validate them against the rankings given by the panels, separately,
      in each discipline. (Which metrics are valid performance indicators
      will differ from discipline to discipline.)

All indications so far are that this cross-validation exercise is *not*
what RAE 2008 and HEFCE are planning to do. Instead, there is a focus
on a few pre-selected metrics, rather than the very rich spectrum of
potential metrics that could be tested. The two main pre-selected
metrics are (i) prior research funding and (ii) citation counts.

(i) Prior research funding has already been shown to be very highly
correlated with the RAE panel rankings in a few (mainly scientific)
disciplines, but this was undoubtedly because the panels, in making
their rankings, already had those metrics in hand, and hence could
themselves have been explicitly counting them in reaching their
judgments. Now, a correlation between metrics and panel rankings is
desirable initially, because that is the way to launch and validate
the choice of metrics. But in the case of this particular metric there
is a potential interaction, indeed a bias, that makes the two (the
metric and the panel ranking) non-independent, and hence invalidates
the test of this metric's validity. And there is another, even deeper
reason for not putting a lot of weight on the prior-funding metric:

The UK has a Dual System for research funding: (A) competitive
individual researcher project proposals and (B) the RAE panel
rankings (awarding top-sliced research funding to University
Departments, based on their research performance). The prior-funding
metric is determined largely by (A). If it is also given a heavy
weight in (B) then that is not improving the RAE [i.e., (B)]: It is
merely collapsing the UK's Dual System into (A) alone, and doing away
with the RAE altogether. As if this were not bad enough, the prior-
funding metric is not even a valid metric for many of the RAE
disciplines.

(ii) Citation counts are a much better potential candidate metric.
Indeed, in many of the RAE disciplines, citation counts have already been
tested and shown to be correlated with the panel rankings, although not
nearly as highly correlated as prior funding (in those few disciplines
where prior funding is indeed highly correlated). The somewhat
weaker correlation in the case of the citation metric is a good thing,
because it leaves room for other metrics to contribute to the assessment
outcome too. It is unlikely, and undesirable, to expect performance
evaluation to be based on a single metric. But citation counts are
certainly a strong candidate for serving as a particularly important
one among the array of many metrics to be validated and used in future
RAEs. Citation counts also have the virtue that they were not explicitly
available to the RAE panels when they made their rankings (indeed,
it was explicitly forbidden to submit or count citations). So their
correlation with the RAE panel rankings is a genuine empirical correlation
rather than an explicit bias.

So the prior-funding metric (i) needs to be used cautiously, to avoid
bias and self-fulfilling prophecy, and the citation-count metric (ii)
is a good candidate, but only one of many potential metrics that can
and should be tested in the parallel RAE 2008 metric/panel exercise.

(Other metrics include co-citation counts, download counts, download
and citation growth and longevity counts, hub/authority scores,
interdisciplinarity scores, and many other rich measures for which
RAE 2008 is the ideal time to do the testing and validation,
discipline by discipline -- as it is virtually certain that
disciplines will differ in which metrics are predictive for them, and
what the weighting of each metric should be.)
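The discipline-specific weighting of such a battery of metrics can likewise be illustrated. Below is a hypothetical sketch (invented numbers throughout) that grid-searches the relative weight of two normalised candidate metrics, citations and downloads, to best reproduce one discipline's panel ranking; a real exercise would use many metrics and proper multivariate regression per discipline:

```python
# Purely illustrative: fitting the weighting of two candidate metrics
# (citations, downloads) to approximate a panel ranking. Invented data.

def normalise(values):
    """Rescale a metric to [0, 1] so weights are comparable."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def rank_agreement(xs, ys):
    """Fraction of department pairs that xs and ys order the same way."""
    n = len(xs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    same = sum((xs[i] - xs[j]) * (ys[i] - ys[j]) > 0 for i, j in pairs)
    return same / len(pairs)

citations = normalise([410, 95, 230, 310, 60])   # invented data
downloads = normalise([900, 600, 850, 500, 100]) # invented data
panel     = [5.0, 2.5, 4.0, 3.5, 1.5]            # invented panel scores

# Grid-search the citation weight w (downloads get weight 1 - w).
best_w = max(
    (w / 10 for w in range(11)),
    key=lambda w: rank_agreement(
        [w * c + (1 - w) * d for c, d in zip(citations, downloads)],
        panel,
    ),
)
```

In this toy example neither metric alone reproduces the panel ordering perfectly, but a weighted combination does, which is exactly the pattern one would expect the real validation exercise to uncover, with different weights in different disciplines.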

       Harnad, S. (2007) Open Access Scientometrics and the UK Research
       Assessment Exercise. In Proceedings of 11th Annual Meeting of the
       International Society for Scientometrics and Informetrics 11(1), pp.
       27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.

       Shadbolt, N., Brody, T., Carr, L. and Harnad, S. (2006) The Open
       Research Web: A Preview of the Optimal and the Inevitable. In Jacobs,
       N., Ed., Open Access: Key Strategic, Technical and Economic Aspects.

       Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and
       Swan, A. (2007) Incentivizing the Open Access Research Web:
       Publication-Archiving, Data-Archiving and Scientometrics. CTWatch
       Quarterly 3(3).

Yet it looks as if RAE 2008 and HEFCE are not currently planning to
commission this all-important validation exercise of metrics against
panel rankings for a rich array of candidate metrics. This is a huge
flaw and oversight, though it could still be easily remedied by going
ahead and doing such a systematic cross-validation study after all.

For such a systematic metric/panel cross-validation study in RAE
2008, however, the array of candidate metrics has to be made as rich
and diverse as possible. The RAE is not currently making any effort
to collect as many potential metrics as possible in RAE 2008, and
this is partly because it is overlooking the growing importance of
online, Open Access metrics -- and indeed overlooking the growing
importance of Open Access itself, both in research productivity and
progress itself, and in evaluating it.

This brings us to the second flaw in HEFCE's RAE 2008 plans:

      (b) For no logical or defensible reason at all, RAE 2008 is insisting
      that researchers submit the publishers' PDFs for the 2008 exercise.

Now it is progress that the RAE is accepting electronic drafts rather
than requiring hard copy, as in past years. But in insisting that the
electronic drafts must be the publisher's PDF, it creates two
unnecessary problems.

One unnecessary problem, a minor one, is that the RAE imagines that
in order to have the publisher's PDF for evaluation, they need to
seek (or even pay for) permission from the publisher. This is
complete nonsense! *Researchers* (i.e., the authors) submit their
own published work to the RAE for evaluation. For the researchers,
this is Fair Dealing (Fair Use) and no publisher permission or payment
whatsoever is needed. (As it happens, I believe HEFCE has worked out a
"special arrangement" whereby publishers "grant permission" and "waive
payment." But the completely incorrect notion that permission or payment
were even at issue, in principle, has an important negative consequence,
which I will now describe.)

What HEFCE should have done -- instead of mistakenly imagining that
it needed permission to access the papers of UK researchers for
research evaluation -- was to require researchers to deposit their
peer-reviewed, revised, accepted final drafts in their own
University's Institutional Repositories (IRs) for research
assessment. The HEFCE panels could then access them directly in the
IRs for evaluation.

This would have ensured that all UK research output was deposited in
each UK researcher's university IR. There is no publisher permission
issue for the RAE: The deposits can, if desired, be made Closed
Access rather than Open Access, so only the author, the employer and
the RAE panels can access the full text of the deposit. That is Fair
Dealing and requires absolutely no permission from anyone.

But, as a bonus, requiring the deposit of all UK research output (or
even just the 4 "best papers" that are currently the arbitrary limit
for RAE submissions) into the researcher's IR for RAE evaluation
would have ensured that 62% of those papers could immediately have
been made OA (because 62% of journals already endorse immediate OA
self-archiving).

And for the remaining 38% this would have allowed each IR's "Fair
Use" button to be used by researchers webwide to request an
individual email copy semi-automatically (with these "eprint
requests" providing a further potential metric, alongside download
counts).

Instead, HEFCE needlessly insisted on the publisher's PDF (which, by
the way, could likewise have been deposited by all authors in their
IRs, as Closed Access, without needing any permission from their
publishers) being submitted to RAE directly. This effectively cut off
not only a rich potential source of RAE metrics, but a powerful
incentive for providing OA, which has been shown, in itself, to
increase downloads and citations directly in all disciplines.

In summary, two good things -- (1) national research performance
evaluation itself, and (2) the conversion to metrics -- plus two bad
things -- (3) the failure to provide explicitly for the systematic
validation of a rich candidate spectrum of metrics against the RAE
2008 panel rankings, and (4) the failure to require deposit of the
authors' papers in their own IRs, which would generate more OA
metrics, more OA, and more UK research impact.

The good news is that there is still time to fully remedy (3) and (4)
if only policy-makers take a moment to listen, think it through, and
do the little that needs to be done to fix it.

I am hoping that this will still happen -- and even your article
could help make it happen!

Stevan Harnad

PS To allay a potential misunderstanding:

It is definitely *not* the case that the RAE panel rankings are themselves
infallible or face-valid! The panelists are potentially biased in many
ways. And RAE panel review was never really "peer review," because peer
review means consulting the most qualified specialists in the world
for each specific paper, whereas the panels are just generic UK panels
evaluating all the UK papers in their discipline: it is the journals
that already conducted the peer review.

So metrics are not just needed to put an end to the waste and the cost of
the existing RAE, but also to try to put the outcome on a more reliable,
objective, valid and equitable basis. The idea is not to *duplicate*
the outcome of the panels, but to improve it.

Nevertheless -- and this is the critical point -- the metrics *do*
have to be validated, and, as an essential first step, they have to be
cross-validated against the panel rankings, discipline by discipline. For
even though those panel rankings are and always were flawed, they are
what the RAE has been relying upon, completely, for 2 decades.

So the first step is to make sure that the metrics are chosen and
weighted, to get as close an approximation to the panel rankings as
possible, discipline by discipline. Then, and only then, can the "ladder"
of the panel-rankings -- which got us where we are -- be tossed away,
allowing us to rely on the metrics alone -- which can then be calibrated
and optimised in future years, with feedback from future meta-panels that
are monitoring the rankings generated by the metrics and, if necessary,
adjusting and fine-tuning the metric weights or even adding new,
still-to-be-discovered-and-tested metrics to them.

In sum: despite its warts, the current RAE panel rankings need to be used
to bootstrap the new metrics into usability. Without that prior validation
based on what has been used until now, the metrics are just hanging from
a skyhook and no one can say whether or not they measure what the RAE
panels have been measuring until now. Without validation, there is no
continuity in the RAE and it is not really a "conversion" to metrics,
but simply an abrupt switch to another, untested assessment tool.

(Citation counts have been tested elsewhere, in other fields, but as there
has never been anything of the scope and scale of the UK RAE, across all
disciplines in an entire country's research output, the prior patchwork
testing of citation counts as research performance indicators is nowhere
near providing the evidence that would be needed to make a reliable,
valid choice of metrics for the UK RAE: only cross-validation within
the RAE parallel metric/panel exercise itself can provide that kind of evidence,
and the requisite continuity for a smooth, rational transition from
panel rankings to metrics.)
