Open Access Metrics: Use REF2014 to Validate Metrics for REF2020

Gopal T V gopal at ANNAUNIV.EDU
Fri Dec 19 19:40:18 EST 2014

Dear Dr. Stevan Harnad,

Many thanks for the mail.

The need for Metrics is well appreciated. The efforts of eminent scholars
like you in this area are fantastic.

Coming from the domain of Software Engineering, I read the terms
Verification and Validation as follows.

Verification: Are we building the metric system right?

[As of now, this thread aims to understand this aspect. Assurances will
emerge as this system matures.]

Validation: Are we building the right metric system?

[This depends on the overall purpose for which this system is being
built. My observations are from this standpoint.]

I subscribe to Open Access (OA), which primarily means unrestricted online
access to peer-reviewed scholarly research. I have been following REF2014,
and the efforts are appreciated.

If a paper in language "X" is translated into language "Y" by a translator
[with the author(s) only acknowledged], what will be the difference in the
rankings of these versions?

It is your call whether we need to take this forward off-line.

Thank you again for your mail.

Warmest Regards

Gopal T V
0 98401 21302
Dr. T V Gopal
Department of Computer Science and Engineering
College of Engineering
Anna University
Chennai - 600 025, INDIA
Ph : (Off) 22351723 Extn. 3340
      (Res) 24454753
Home Page :

On Sat, December 20, 2014 4:56 am, Stevan Harnad wrote:
> Adminstrative info for SIGMETRICS (for example unsubscribe):
> On Dec 19, 2014, at 4:11 PM, Gopal T V <gopal at> wrote:
>> I view the metrics [Parameters & Weightages are well appreciated] as a
>> means of positioning the research results globally with a "quality
>> indicator" leaving the facilitation to ensure the Identity & Vision of a
>> given Institution to its very own governance models.
>> Metrics do not automatically imply "knowing" both Science and the
>> Scientist [the generator of this Science] together, for there is a
>> disconnect between the formula and the reality. If this is not assured,
>> we once again stare at the path towards the IPR.
>> IMHO, metrics help with, or indicate correctives to, "sluggishness" more
>> than they serve as a true means of observation or discovery.
>> As you would appreciate, Science has been expressed in a multitude of
>> languages and it may have to be that way as well.
> Dear Professor Gopal,
> This is not a discussion of the merits of the principle of research
> assessment in general or the UK’s REF2014.
> Nor is it a discussion of the merits of using metrics.
> It is a discussion of the proposal to validate multiple metrics against
> the REF2014 peer rankings, as criterion, with a view to the possibility of
> substituting the metrics for the peer rankings in future, or at least
> supplementing them, if the predictive power of the metrics proves
> sufficiently high.
> (Note that the REF2014 rankings were not arrived at via metrics, but via
> peer panel rankings in each of 36 “units of assessment” (roughly,
> disciplines) for 154 UK universities.)
> You ask:
>> How are translations of original works traced in "Open Access"?
> I don’t understand your question. Whether the text is in the original
> language or in translation, if it is made OA, it is OA, if not, not.
>> Is there a way of tracing the "mutations" in the progressive expressions
>> of research results ?
> That is one of the possibilities of OA metrics — if the mutations are
> all made OA. (Same for linking texts with translations.)
> Best wishes,
> Stevan Harnad
>> On Fri, December 19, 2014 10:03 pm, Stevan Harnad wrote:
>>> On Dec 19, 2014, at 5:06 AM, Jon Crowcroft <jon.crowcroft at> wrote:
>>>> I can see you might want 30 params to fit 7500+ papers - what I am
>>>> saying is that the _noise_ will be dreadful, and also there are
>>>> systematic reasons, which I outlined, that mean for low-citation/
>>>> low-paper-count areas you will have almost no fit at all
>>> 1. Actually, it’s not 7500 papers that are being fitted but the rankings
>>> for about 154 institutions x 36 units of assessment (fields) = 5544.
>>> 2. You are right that 30 parameters is a lot for each unit of assessment
>>> analyzed separately for its 154 rank data-points. Many of the weaker
>>> metrics will probably have near-zero weights, but the strong ones, and
>>> the weight of their contributions, will be estimated.
>>> 3. For the low-paper, low-citation fields, the analysis will be comparing
>>> likes with likes (and paper counts and citation counts are not the only
>>> potential metrics).
>>> I think much of the noise will be coming from the metrics that are
>>> missing because of the missing OA.
>>>> the _obvious_ thing people will do is to move all their research to
>>>> areas which do have a good fit
>>> Another solution is to subdivide units of assessment more finely,
>>> allowing low-count subfields to compete only among themselves, and then
>>> recombine them, giving each subfield an a-priori weight in the combined
>>> total for the unit of assessment.
>>>> predictable research isn't (research)
>>> Agreed. (And I am not defending the REF per se, just trying to make the
>>> best of it, by testing and validating metrics to supplement or replace
>>> costly, time-consuming panel review.)
>>>> bad idea - sorry, I just fundamentally disagree about this approach…
>>> The REF, or the metric fitting of the peer rankings?
>>>> I don't dispute you can (over) fit the data….
>>> The objective is not just to fit the data, but to test how well the
>>> metrics can predict the peer rankings, initialize their weights, and use
>>> them to supplement or replace future peer rankings.
>>> (Of course their predictive power can also be tested by split-half
>>> comparisons within the REF2014 sample; and of course the initial weights
>>> can continue to be updated across time based on further peer rankings or
>>> other criteria.)
>>>> On Thu, Dec 18, 2014 at 11:23 AM, Stevan Harnad <harnad at> wrote:
>>>>> On Dec 18, 2014, at 3:39 AM, [name deleted because posted off-list] wrote:
>>>>> that's very high dimensionality in that equation.
>>>> I don’t think 30 metric predictors for about 6% of the planet’s annual
>>>> research output (UK) is such an under-fit.
>>>> (But we could start with the most likely metrics first, and then see how
>>>> much variance is accounted for by adding more.)
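Starting with the most likely metrics and watching how much variance each addition accounts for is, in effect, a hierarchical regression. A sketch, assuming each column holds one metric's values for the institutions in a single unit of assessment and y holds their REF2014 rankings. The function name and entry order are illustrative; instead of refitting the regression at every step, this version orthogonalizes each new metric against those already entered, which yields the same cumulative R².

```python
from typing import List, Sequence, Tuple


def incremental_r2(columns: Sequence[Tuple[str, Sequence[float]]],
                   y: Sequence[float]) -> List[Tuple[str, float]]:
    """Cumulative R^2 as metrics enter one at a time, in the order given.

    Each new metric is centred and made orthogonal (modified Gram-Schmidt) to
    the metrics already entered, so only its new information counts. The
    running total equals the R^2 of a least-squares fit (with intercept) of y
    on every metric entered so far.
    """
    n = len(y)
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    syy = sum(v * v for v in yc)
    basis: List[List[float]] = []
    total = 0.0
    out: List[Tuple[str, float]] = []
    for name, col in columns:
        m = sum(col) / n
        q = [v - m for v in col]
        original = sum(v * v for v in q)
        for b in basis:  # strip out what earlier metrics already explain
            coef = sum(u * v for u, v in zip(q, b)) / sum(v * v for v in b)
            q = [u - coef * v for u, v in zip(q, b)]
        qq = sum(v * v for v in q)
        if original > 0 and qq > 1e-12 * original:  # skip redundant metrics
            basis.append(q)
            total += sum(u * v for u, v in zip(q, yc)) ** 2 / (qq * syy)
        out.append((name, total))
    return out
```

A metric that merely repeats information already entered adds nothing to the running total, which is one way the "near zero weights" of weaker metrics would show up.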
>>>>> you don't have enough data points to have any decent confidence about
>>>>> those weights -
>>>> That cannot be stated in advance. First we need to calculate the multiple
>>>> regression on the REF2014 rankings and determine how much each metric
>>>> contributes.
>>>>> I suggest you look at the REF data… and see how many different
>>>>> journals/venues, and how many parts of the ACM Classification hierarchy,
>>>>> the 7000-odd outputs appeared in - you'll find that in any given
>>>>> venue/topic you rarely have more than a handful of items - your variance
>>>>> will be terrible
>>>> The proposal is not to assess the predictive power of any one of the 4
>>>> publications submitted.
>>>> The REF2014 peer rankings themselves are based on peers (putatively)
>>>> re-reading those 4 pubs per researcher, but the regression equation I
>>>> sketched is based on (OA) data that go far beyond that.
>>>> (In point of fact, it’s absurd and arbitrary to base the REF assessment on
>>>> just 4 papers in a 6-year stretch. That restriction is dictated by the
>>>> demands of the peers having to read all those papers, but open-access
>>>> metrics can be harvested and have no such human bottleneck constraint on
>>>> them. What you could legitimately complain about is that not all those
>>>> potential data are OA yet... Well, yes, and that’s part of the point.)
>>>> REF2020Rank =
>>>>   w1(pubcount) + w2(JIF) + w3(cites) + w4(art-age) + w5(art-growth) +
>>>>   w6(hits) + w7(cite-peak-latency) + w8(hit-peak-latency) + w9(citedecay) +
>>>>   w10(hitdecay) + w11(hub-score) + w12(authority-score) + w13(h-index) +
>>>>   w14(prior-funding) + w15(bookcites) + w16(student-counts) + w17(co-cites) +
>>>>   w18(co-hits) + w19(co-authors) + w20(endogamy) + w21(exogamy) +
>>>>   w22(co-text) + w23(tweets) + w24(tags) + w25(comments) +
>>>>   w26(acad-likes) etc. etc.
>>>>> and the result of munging all those _different_ distributions into one
>>>>> single model will be to pressure people to move their work areas to the
>>>>> best-fit topic/venue, which is not a true measure of anything desired by
>>>>> us of HEFCE or RCUK, to my knowledge.
>>>> I cannot fathom what one, two, three or N things a researcher can do in
>>>> order to maximize their score on the above equation (other than to try to
>>>> do good, important, useful work…).
>>>>> please do the detailed work…
>>>> Will try. But there are a few details you need to get straight too…
>>>> (<:3
>>>>> On Wed, Dec 17, 2014 at 3:38 PM, Stevan Harnad <harnad at> wrote:
>>>>>> On Dec 17, 2014, at 9:54 AM, Alan Burns <alan.burns at YORK.AC.UK> wrote:
>>>>>> Those that advocate metrics have never, to at least my satisfaction,
>>>>>> answered the argument that accuracy in the past does not mean
>>>>>> effectiveness in the future, once the game has changed.
>>>>> I recommend Bradley on metaphysics and Hume on induction:
>>>>> "The man who is ready to prove that metaphysical knowledge is wholly
>>>>> impossible… is a brother metaphysician with a rival theory."
>>>>> Bradley, F. H. (1893) Appearance and Reality
>>>>> One could have asked the same question about apples continuing to fall
>>>>> down in future, rather than up.
>>>>> Yes, single metrics can be abused, but not only can abuses be named and
>>>>> shamed when detected, it also becomes harder to abuse metrics when they
>>>>> are part of a multiple, inter-correlated vector, with disciplinary
>>>>> profiles on their normal interactions: someone dispatching a robot to
>>>>> download his papers would quickly be caught out when the usual
>>>>> correlation between downloads and later citations fails to appear. Add
>>>>> more variables and it gets even harder.
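The robot-download example can be made concrete. A minimal sketch, assuming a discipline profile in which later citations grow roughly in proportion to downloads; the proportional model, the 0.25 threshold and the function name are hypothetical choices, and a flagged paper is a candidate for inspection, not proof of gaming.

```python
from typing import Dict, List, Sequence, Tuple


def flag_download_anomalies(norm: Sequence[Tuple[float, float]],
                            candidates: Dict[str, Tuple[float, float]],
                            ratio_floor: float = 0.25) -> List[str]:
    """Flag papers whose later citations fall far below what their downloads
    predict under the discipline's normal download-citation relationship.

    norm:        (downloads, later_citations) pairs for typical papers in the discipline
    candidates:  paper id -> (downloads, later_citations)
    ratio_floor: flag if observed citations < ratio_floor * expected citations
    """
    # Least-squares slope through the origin: citations expected per download.
    slope = sum(d * c for d, c in norm) / sum(d * d for d, _ in norm)
    flagged = []
    for paper_id, (downloads, citations) in candidates.items():
        expected = slope * downloads
        if expected > 0 and citations < ratio_floor * expected:
            flagged.append(paper_id)
    return flagged
```

Here a paper with 5000 downloads but only 3 later citations stands out against a discipline norm of roughly one citation per ten downloads, while a paper at 150 downloads and 14 citations does not.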
>>>>>> Even if one were able to define a set of metrics that perfectly matches
>>>>>> REF2014, the announcement that these metrics would be used in REF2020
>>>>>> would immediately invalidate their use.
>>>>> In a weighted vector of multiple metrics like the sample I had listed,
>>>>> it’s no use to a researcher to be told that for REF2020 the metric
>>>>> equation will be the following, with the following weights for their
>>>>> particular discipline:
>>>>>   w1(pubcount) + w2(JIF) + w3(cites) + w4(art-age) + w5(art-growth) +
>>>>>   w6(hits) + w7(cite-peak-latency) + w8(hit-peak-latency) + w9(citedecay) +
>>>>>   w10(hitdecay) + w11(hub-score) + w12(authority-score) + w13(h-index) +
>>>>>   w14(prior-funding) + w15(bookcites) + w16(student-counts) + w17(co-cites) +
>>>>>   w18(co-hits) + w19(co-authors) + w20(endogamy) + w21(exogamy) +
>>>>>   w22(co-text) + w23(tweets) + w24(tags) + w25(comments) +
>>>>>   w26(acad-likes) etc. etc.
>>>>> The potential list could be much longer, and the weights can be positive
>>>>> or negative, and can vary by discipline.
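To illustrate why a published equation is hard to game when the weights vary by discipline and can be negative, here is a sketch of scoring one institution's metric vector. The disciplines, weights, the assumption that metrics are standardized within each discipline, and the treatment of missing metrics as 0 are all invented for illustration; they are not REF or HEFCE values.

```python
from typing import Dict

# Hypothetical discipline-specific weights, for illustration only; real ones
# would be estimated by regressing the metrics on the REF2014 panel rankings.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "computer-science": {"pubcount": 0.2, "cites": 0.5, "hits": 0.3, "endogamy": -0.4},
    "history": {"pubcount": 0.1, "bookcites": 0.7, "endogamy": -0.2},
}


def predicted_rank_score(discipline: str, metrics: Dict[str, float]) -> float:
    """Weighted sum of (assumed standardized) metrics for one discipline.

    Metrics with no available data (e.g. not yet OA) count as 0 here.
    """
    weights = WEIGHTS[discipline]
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items())
```

Raising one metric only pays if its weight in that discipline is positive; inflating a metric such as endogamy, assumed here to carry a negative weight, lowers the score instead.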
>>>>> "The man who is ready to prove that metric knowledge is wholly
>>>>> impossible… is a brother metrician with rival metrics…"
>>>>>> if you wanted to do this properly, you would have to take a lot of
>>>>>> outputs that were NOT submitted and run any metric scheme on them as
>>>>>> well as on those submitted.
>>>>>>> too late:)
>>>> You would indeed, and that’s why it all has to be made OA…
>>>>>>> On Wed, Dec 17, 2014 at 2:26 PM, Stevan Harnad <harnad at> wrote:
>>>>>>> Steven Hill of HEFCE has posted “an overview of the work HEFCE are
>>>>>>> currently commissioning which they are hoping will build a robust
>>>>>>> evidence base for research assessment” in LSE Impact Blog 12(17) 2014,
>>>>>>> entitled "Time for REFlection: HEFCE look ahead to provide rounded
>>>>>>> evaluation of the REF".
>>>>>>> Let me add a suggestion, updated for REF2014, that I have made before
>>>>>>> (unheeded):
>>>>>>> Scientometric predictors of research performance need to be validated
>>>>>>> by showing that they have a high correlation with the external
>>>>>>> criterion they are trying to predict. The UK Research Excellence
>>>>>>> Framework (REF), together with the growing movement toward making the
>>>>>>> full texts of research articles freely available on the web, offers a
>>>>>>> unique opportunity to test and validate a wealth of old and new
>>>>>>> scientometric predictors through multiple regression analysis.
>>>>>>> Publications, journal impact factors, citations, co-citations, citation
>>>>>>> chronometrics (age, growth, latency to peak, decay rate), hub/authority
>>>>>>> scores, h-index, prior funding, student counts, co-authorship scores,
>>>>>>> endogamy/exogamy, textual proximity, download/co-downloads and their
>>>>>>> chronometrics, tweets, tags, etc. can all be tested and validated
>>>>>>> jointly, discipline by discipline, against their REF panel rankings in
>>>>>>> REF2014. The weights of each predictor can be calibrated to maximize
>>>>>>> the joint correlation with the rankings. Open Access Scientometrics will
>>>>>>> provide powerful new means of navigating, evaluating, predicting and
>>>>>>> analyzing the growing Open Access database, as well as powerful
>>>>>>> incentives for making it grow faster.
>>>>>>> Harnad, S. (2009) Open Access Scientometrics and the UK Research
>>>>>>> Assessment Exercise. Scientometrics 79(1). Also in Proceedings of the
>>>>>>> 11th Annual Meeting of the International Society for Scientometrics and
>>>>>>> Informetrics 11(1), pp. 27-33, Madrid, Spain. Torres-Salinas, D. and
>>>>>>> Moed, H. F., Eds. (2007)
>>>>>>> See also:
>>>>>>> The Only Substitute for Metrics is Better Metrics (2014)
>>>>>>> and
>>>>>>> On Metrics and Metaphysics (2008)

More information about the SIGMETRICS mailing list