Open Access Metrics: Use REF2014 to Validate Metrics for REF2020

David Wojick dwojick at CRAIGELLACHIE.US
Thu Dec 18 10:57:40 EST 2014

This does sound interesting, Stevan, especially if you got an unexpected 
result. But I doubt it would validate or invalidate any scientometric 
predictors. It is basically a decision model for a single organization 
going through a more or less single, albeit complex, decision exercise. To 
begin with, it is just one organization. Then too, simple multiple 
regression seems like a very crude way to derive such a model. The large 
number of factors is also a concern, as others have noted, especially if we 
are trying to establish causality. I would think that the more factors used 
the less credible the result. But then we also need to think that we have 
all the significant factors, don't we? Perhaps not. Are there useful 
precedents for this? Finally, is all the needed data available and how much 
might this cost?
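
To make the worry about the number of factors concrete, here is a minimal sketch of the standard adjusted R-squared correction, which discounts the variance a regression explains by the number of predictors it uses. The figures (a raw R-squared of 0.9, 30 predictors, and the three sample sizes) are hypothetical and are not taken from REF data; the function name is my own.

```python
def adjusted_r_squared(r_squared: float, n_observations: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    degrees_of_freedom = n_observations - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_observations - 1) / degrees_of_freedom


# Hypothetical illustration: the same raw fit looks much weaker when
# 30 predictors are estimated from only a few dozen data points.
for n in (40, 200, 1000):
    print(n, round(adjusted_r_squared(0.9, n, 30), 3))
```

With few observations per discipline the correction is large; with many it nearly vanishes, so whether 30 predictors is too many depends on how many units are ranked.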

I guess that if I were peer reviewing this as a preliminary proposal I 
would be positive but not enthusiastic. More information is needed about 
the proposed project and its goals.


At 07:23 AM 12/18/2014, you wrote:
>>On Dec 18, 2014, at 3:39 AM,  [name deleted because posted off-list]  wrote:
>>that's very high dimensionality in that equation.
>I don’t think 30 metric predictors for about 6% of the planet’s annual 
>research output (UK) is such an under-fit.
>(But we could start with the most likely metrics first, and then see how 
>much variance is accounted for by adding more.)
>>you don't have enough data points to have any decent confidence about 
>>those weights - i
>That cannot be stated in advance. First we need to calculate the multiple 
>regression on the REF2014 rankings and determine how much each metric 
>>suggest you look at the REF data and see how many different 
>>journal/venues and all over the ACM Classification hierarchy, the 7000 
>>odd outputs appeared in - you'll find in any given venue, topic you 
>>rarely have more than a handful of items - your variance will be terrible
>The proposal is not to assess the predictive power of any one of the 4 
>publications submitted.
>The REF2014 peer rankings themselves are based on peers (putatively) 
>re-reading those 4 pubs per researcher, but the regression equation I 
>sketched is based on (OA) data that go far beyond that.
>(In point of fact, it’s absurd and arbitrary to base the REF assessment 
>on just 4 papers in a 6-year stretch. That restriction is dictated by the 
>demands of the peers having to read all those papers, but open-access 
>metrics can be harvested and have no such human bottleneck constraint on 
>them. What you could complain, legitimately, is that not all those 
>potential data are OA yet... Well, yes — and that’s part of the point.)
>REF2020Rank =
>w1(pubcount) + w2(JIF) + w3(cites) + w4(art-age) + w5(art-growth) + 
>w6(hits) + w7(cite-peak-latency) + w8(hit-peak-latency) + w9(citedecay) + 
>w10(hitdecay) + w11(hub-score) + w12(authority-score) + w13(h-index) + 
>w14(prior-funding) + w15(bookcites) + w16(student-counts) + w17(co-cites) + 
>w18(co-hits) + w19(co-authors) + w20(endogamy) + w21(exogamy) + 
>w22(co-text) + w23(tweets) + w24(tags) + w25(comments) + w26(acad-likes) 
>etc. etc.
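
As an illustration of how weights in an equation of this form could be calibrated against panel rankings, the following is a minimal sketch of ordinary least squares in plain Python. It is not the method used by anyone in this thread: the three metrics, the toy data, the "true" weights and all function names are assumptions made up for the example, and an intercept term is added that the equation above does not show.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy, so the caller's matrices are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(metric_rows: Sequence[Sequence[float]],
                scores: Sequence[float]) -> List[float]:
    """Return [intercept, w1, w2, ...] minimising squared error (normal equations)."""
    design = [[1.0] + [float(v) for v in row] for row in metric_rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: (pubcount, cites, hits) for six assessed units.
METRIC_ROWS = [(4, 10, 100), (2, 30, 80), (6, 5, 300),
               (3, 50, 150), (5, 20, 40), (1, 8, 60)]
# Made-up weights used only to generate noiseless scores for the demo.
TRUE_WEIGHTS = [1.0, 0.5, 0.2, -0.1]
SCORES = [TRUE_WEIGHTS[0] + sum(w * v for w, v in zip(TRUE_WEIGHTS[1:], row))
          for row in METRIC_ROWS]

fitted = fit_weights(METRIC_ROWS, SCORES)
print([round(w, 6) for w in fitted])
```

Because the toy scores are generated without noise, the fit recovers the generating weights; real rankings would leave residual variance, which is exactly what the validation exercise would measure.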
>>and the result of munging all those _different_ distributions into one 
>>single model will be to pressure people to move their work areas to the 
>>best fit topic/venue, which is not a true measure of anything desired by 
>>us of HEFCE or RC.UK to my knowledge.
>I cannot fathom what one, two, three or N things a researcher can do in 
>order to maximize their score on the above equation (other than to try to 
>do good, important, useful work).
>>please do the detailed work

>Will try. But there are a few details you need to get straight too.
>>On Wed, Dec 17, 2014 at 3:38 PM, Stevan Harnad <harnad at> wrote:
>>>On Dec 17, 2014, at 9:54 AM, Alan Burns <alan.burns at YORK.AC.UK> wrote:
>>>Those that advocate metrics have never, to at least my satisfaction, 
>>>answered the
>>>argument that accuracy in the past does not mean effectiveness in the 
>>>future once the game has changed.
>>I recommend Bradley on metaphysics and Hume on 
>>“The man who is ready to prove that metaphysical knowledge is wholly 
>>impossible ... is a brother metaphysician with a rival theory” Bradley, F. 
>>H. (1893) Appearance and Reality
>>One could have asked the same question about apples continuing to fall 
>>down in future, rather than up.
>>Yes, single metrics can be abused, but not only can abuses be named and 
>>shamed when detected, it also becomes harder to abuse metrics when they 
>>are part of a multiple, inter-correlated vector, with disciplinary 
>>profiles on their normal interactions: someone dispatching a robot to 
>>download his papers would quickly be caught out when the usual 
>>correlation between downloads and later citations fails to appear. Add 
>>more variables and it gets even harder.
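
The download/citation check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the discipline profile, the candidate records, the three-standard-deviation threshold and the function names are all hypothetical, and a real profile would involve many more variables than this single pair.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Return slope, intercept and residual standard deviation of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, math.sqrt(sse / (n - 2))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def flag_suspicious(profile_downloads: Sequence[float],
                    profile_cites: Sequence[float],
                    candidates: Sequence[Tuple[str, float, float]],
                    threshold: float = 3.0) -> List[str]:
    """Flag candidates whose later citations fall far below what their
    downloads would predict from the discipline's usual relationship."""
    slope, intercept, resid_sd = fit_line(profile_downloads, profile_cites)
    flagged = []
    for name, downloads, cites in candidates:
        z = (cites - (intercept + slope * downloads)) / resid_sd
        if z < -threshold:
            flagged.append(name)
    return flagged


# Hypothetical discipline profile: downloads and later citations per paper.
DOWNLOADS = [100, 200, 300, 400, 500, 600]
CITES = [9, 21, 29, 41, 52, 58]
# Hypothetical candidates: (label, downloads, later citations).
CANDIDATES = [("honest", 450, 44), ("robot", 5000, 30)]

print(round(pearson(DOWNLOADS, CITES), 3))
print(flag_suspicious(DOWNLOADS, CITES, CANDIDATES))
```

The point of the sketch is only that inflated downloads stand out because they break a relationship that normally holds; each additional correlated metric gives another relationship that an abuser would have to fake consistently.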
>>>Even if one was able to define a set of metrics that perfectly matches 
>>>The announcement that these metrics would be used in REF2020 would
>>>immediately invalidate their use.
>>In a weighted vector of multiple metrics like the sample I had listed, 
>>it’s no use to a researcher if told that for REF2020 the metric 
>>equation will be the following, with the following weights for their 
>>particular discipline:
>>w1(pubcount) + w2(JIF) + w3(cites) + w4(art-age) + 
>>w5(art-growth) + w6(hits) + w7(cite-peak-latency) + w8(hit-peak-latency) 
>>+ w9(citedecay) + w10(hitdecay) + w11(hub-score) + w12(authority-score) + 
>>w13(h-index) + w14(prior-funding) + w15(bookcites) + w16(student-counts) + 
>>w17(co-cites) + w18(co-hits) + w19(co-authors) + w20(endogamy) + 
>>w21(exogamy) + w22(co-text) + w23(tweets) + w24(tags) + w25(comments) + 
>>w26(acad-likes) etc. etc.
>>The potential list could be much longer, and the weights can be positive 
>>or negative, and varying by discipline.
>>“The man who is ready to prove that metric knowledge is wholly impossible 
>>... is a brother metrician with rival metrics”
>>>if you wanted to do this properly, you should have to take a lot of 
>>>outputs that were NOT submitted and run any metric scheme on them as 
>>>well as those submitted.
>>>>too late:)
>You would indeed — and that’s why it all has to be made OA

>>>>On Wed, Dec 17, 2014 at 2:26 PM, Stevan Harnad <harnad at> wrote:
>>>>Steven Hill of HEFCE has posted “an overview of the work HEFCE are 
>>>>currently commissioning which they are hoping will build a robust 
>>>>evidence base for research assessment” in LSE Impact Blog 12(17) 2014 
>>>>for REFlection: HEFCE look ahead to provide rounded evaluation of the REF
>>>>Let me add a suggestion, updated for REF2014, that I have made before 
>>>>Scientometric predictors of research performance need to be validated 
>>>>by showing that they have a high correlation with the external 
>>>>criterion they are trying to predict. The UK Research Excellence 
>>>>Framework (REF) -- together with the growing movement toward making the 
>>>>full-texts of research articles freely available on the web -- offers a 
>>>>unique opportunity to test and validate a wealth of old and new 
>>>>scientometric predictors, through multiple regression analysis: 
>>>>Publications, journal impact factors, citations, co-citations, citation 
>>>>chronometrics (age, growth, latency to peak, decay rate), hub/authority 
>>>>scores, h-index, prior funding, student counts, co-authorship scores, 
>>>>endogamy/exogamy, textual proximity, download/co-downloads and their 
>>>>chronometrics, tweets, tags, etc. can all be tested and validated 
>>>>jointly, discipline by discipline, against their REF panel rankings in 
>>>>REF2014. The weights of each predictor can be calibrated to maximize 
>>>>the joint correlation with the rankings. Open Access Scientometrics 
>>>>will provide powerful new means of navigating, evaluating, predicting 
>>>>and analyzing the growing Open Access database, as well as powerful 
>>>>incentives for making it grow faster.
>>>>Harnad, S. (2009) Open Access 
>>>>Scientometrics and the UK Research Assessment Exercise. Scientometrics 
>>>>79 (1) Also in Proceedings of 11th Annual Meeting of the International 
>>>>Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, 
>>>>Spain. Torres-Salinas, D. and Moed, H. F., Eds.  (2007)
>>>>See also:
>>>>Only Substitute for Metrics is Better Metrics (2014)
>>>>Metrics and Metaphysics (2008)