Open Access Metrics: Use REF2014 to Validate Metrics for REF2020

Stevan Harnad harnad at ECS.SOTON.AC.UK
Thu Dec 18 14:18:26 EST 2014


> On Dec 18, 2014, at 12:58 PM, David Wojick <dwojick at CRAIGELLACHIE.US> wrote:
> 
> Regarding the organization, I thought you were trying to match the REF rankings.

Yes, I am.

> Those were produced by a single organization, using a specific decision process, not by all the researchers and universities whose work was submitted.

The decision process consisted of a panel of peers for each discipline who had to assess and rank the 4 papers per researcher from each university for research quality. (There were other considerations too, but I think everyone agrees that the bulk of the ranking was based on the 4 outputs per researcher per discipline per university.)

> Also, the credibility I am referring to is that of the analysis, not of the metrics you choose to use. You seem to be giving this analysis more credence than it probably deserves. As I said, multiple regression analysis is a crude approach to decision modeling.

The analysis consists of measuring the correlation of a battery of metrics with the REF rankings.

The analysis has not been done. I merely proposed the method. So I am not sure what it is that is being given “more credence than it probably deserves.” (It’s certainly not my proposal to do this analysis that is getting “more credence than it probably deserves”: As I said, so far my proposal has been unheeded!)

Let’s reserve judgment on how crude an approach it will be until it is tried, and we see how well it can predict the REF rankings. After that we can talk about refining it further.

And it is not “decision modelling” that is being proposed, but the testing of the power of a set of metrics to predict the REF rankings.
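
A minimal sketch of what such a test could look like, using synthetic data only. No REF or metric data are used here: the three metrics (cites, hits, tweets), their simulated weights and the noise level are made-up assumptions, and the helper names are hypothetical. It fits an ordinary least-squares multiple regression of a simulated peer score on the metrics and reports the fitted weights and the proportion of variance accounted for.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(rows, outcome):
    """Least-squares fit via the normal equations: [intercept, w1, ..., wk]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, outcome, weights):
    """Proportion of outcome variance accounted for by the fitted equation."""
    preds = [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - p) ** 2 for y, p in zip(outcome, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumptions): 200 units, three standardized metrics.
rng = random.Random(2014)
metrics = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
# Simulated peer score: strong weight on cites, weak on hits, none on tweets.
peer_score = [2.0 * cites + 0.5 * hits + 0.0 * tweets + rng.gauss(0, 0.5)
              for cites, hits, tweets in metrics]

weights = fit_weights(metrics, peer_score)
fit = r_squared(metrics, peer_score, weights)
print("intercept and weights:", [round(w, 2) for w in weights])
print("variance accounted for:", round(fit, 3))
```

In this toy setting the metric that was given no simulated weight should come out with a fitted weight near zero, which is the sense in which a validation exercise would show which candidate metrics carry predictive power and which do not.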

But maybe the “analysis” whose credibility you are questioning is the REF peer ranking itself? But that’s not what’s at issue here! What is being proposed is to validate a metric battery so that if it proves to predict the peer rankings (such as they are, warts and all) sufficiently well, then it can replace (or at least supplement) them.

But those candidate metrics, until they are validated against some criterion, cannot have any credence at all: they are simply untested, unvalidated metrics. (This has not hitherto discouraged people from using them blindly as if they had been validated [e.g. the JIF], but we can’t do anything about that here! REF2014 provides an excellent opportunity to test and validate multiple metrics, at long last, weighing their independent predictive power.)

You don’t think the REF2014 peer rankings for all disciplines in all institutions in all of the UK are a sufficiently good criterion against which to validate the metrics? Then please propose an alternative criterion: not a hypothetical one that is less available than the REF rankings and the metric and OA data we have so far, but one that is as readily doable as what we already have in hand with REF2014.

(This is where you can help out by backing the most effective OA policy for the US federal agencies, based on the evidence, so that those policies can then generate the OA that will maximize the predictive power of the metrics that depend on OA.)

> I do not see what any of this has to do with OA policy, especially US policy, just because you want to do some computations based on the REF results. And it sounds like you cannot do them because the metrical data is not available. It is a possibly interesting experiment, but that is all as far as I can see, not a reason to make or change policies.

I stated exactly what it has to do with OA policy: Many of these potential metrics are unavailable or only partially available because the research publications are not OA. This means that the proposed analysis will underestimate the power of metrics because the underlying data is only partly available.

Effective OA policies will generate that missing OA, maximizing the predictive power of the metrics. 
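
One way partial availability could depress the measured power of a metric can be sketched with a toy simulation. Everything here is an assumption for illustration: the citation counts are synthetic, the peer score is simulated from them, and `OA_FRACTION` is a hypothetical share of citing material that is visible to a harvester. The sketch compares the correlation of the peer score with the full count against its correlation with the count seen through partial coverage.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2014)
OA_FRACTION = 0.1  # assumed share of citing material that can be harvested

# Synthetic "true" citation counts (skewed, mean around 10) for 2000 units.
true_cites = [int(rng.expovariate(1 / 10)) for _ in range(2000)]
# Simulated peer score: tracks the true count, plus independent noise.
peer_score = [c + rng.gauss(0, 5) for c in true_cites]
# Each citation is visible only with probability OA_FRACTION.
visible_cites = [sum(1 for _ in range(c) if rng.random() < OA_FRACTION)
                 for c in true_cites]

r_full = pearson(true_cites, peer_score)
r_partial = pearson(visible_cites, peer_score)
print("correlation with full counts:   ", round(r_full, 3))
print("correlation with partial counts:", round(r_partial, 3))
```

Under these assumptions the partial count is a noisier version of the full count, so its correlation with the peer score comes out lower even though the underlying relationship is unchanged.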

(By the way, the analysis we have used to test and validate the metrics that predict the effectiveness of OA policies is very similar to the analysis I have proposed to test and validate the metrics that predict the REF rankings.)

Harnad, S. (2009) Open Access Scientometrics and the UK Research Assessment Exercise <http://eprints.ecs.soton.ac.uk/17142/>. Scientometrics 79 (1). Also in: Torres-Salinas, D. and Moed, H. F., Eds. (2007) Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, Spain.

Gargouri, Y, Lariviere, V, Gingras, Y, Brody, T, Carr, L and Harnad, S (2012) Testing the Finch Hypothesis on Green OA Mandate Ineffectiveness. Open Access Week 2012 <http://eprints.soton.ac.uk/344687/>

Vincent-Lamarre, Philippe, Boivin, Jade, Gargouri, Yassine, Larivière, Vincent and Harnad, Stevan (2014) Estimating Open Access Mandate Effectiveness: I. The MELIBEA Score. (under review) <http://eprints.soton.ac.uk/370203/>

> At 12:39 PM 12/18/2014, you wrote:
>> Administrative info for SIGMETRICS (for example unsubscribe): http://web.utk.edu/~gwhitney/sigmetrics.html
>> On Dec 18, 2014, at 10:57 AM, David Wojick <dwojick at CRAIGELLACHIE.US> wrote:
>>> 
>>> This does sound interesting, Stevan, especially if you got an unexpected result.
>> 
>> The objective is actually not to get an unexpected result, David, but to generate a battery of metrics that predicts the actual REF2014 peer ranking as closely as possible, so that in REF2020 it can be the metrics rather than the peers that do the ranking.
>> 
>>> But I doubt it would validate or invalidate any scientometric predictors.
>> 
>> A high correlation would certainly validate the REF battery, for the REF.
>> 
>>> It is basically a decision model for a single organization going through a more or less single, albeit complex, decision exercise. To begin with, it is just one organization.
>> 
>> All researchers, at all UK institutions, in each discipline, is a “single organization”? 
>> 
>> (To paraphrase an erstwhile UK researcher: "some organization!" "some singularity!")
>> 
>> The UK does 6-11% of the world’s research. Not a bad sample, I’d say, for a first pass at validating those metrics.
>> 
>>> Then too, simple multiple regression seems like a very crude way to derive such a model.
>> 
>> Simple multiple regression is a natural first step. (I agree that after that more sophisticated analyses will be possible too.)
>> 
>>> The large number of factors is also a concern, as others have noted, especially if we are trying to establish causality.
>> 
>> For the REF, all you need is predictivity. But I agree that causality too is important, and with continuous assessment instead of just stratified post-hoc sampling, it will be possible to make much more powerful use of the time domain.
>> 
>> (I don’t think a starting battery of 30 metrics would be too many -- far from it. But some of them will prove to have low or no Beta weights. That’s why metric validation is an empirical exercise.)
>> 
>>> I would think that the more factors used the less credible the result.
>> 
>> The credibility of each metric will be the proportion of the total variance that it accounts for. It is an empirical question whether a few metrics will account for the lion’s share of the variance, and the rest will have negligibly small or no weights.
>> 
>>> But then we also need to think that we have all the significant factors, don't we? Perhaps not. Are there useful precedents for this?
>> 
>> I am certain that my back-of-the-matchbox list of candidate metrics was neither exhaustive nor optimal. It was just indicative. All other credible candidates are welcome!
>> 
>> REF2020Rank = 
>> 
>> w1(pubcount) + w2(JIF) + w3(cites) + w4(art-age) + w5(art-growth) + w6(hits) + w7(cite-peak-latency) + w8(hit-peak-latency) + w9(citedecay) + w10(hitdecay) + w11(hub-score) + w12(authority-score) + w13(h-index) + w14(prior-funding) + w15(bookcites) + w16(student-counts) + w17(co-cites) + w18(co-hits) + w19(co-authors) + w20(endogamy) + w21(exogamy) + w22(co-text) + w23(tweets) + w24(tags) + w25(comments) + w26(acad-likes) etc. etc.
>> 
>> 
>> 
>>> Finally, is all the needed data available and how much might this cost?
>> 
>> The REF2014 data <http://www.ref.ac.uk/> were released today and are available at once, for testing against metrics, discipline by discipline.
>> 
>> What’s still very sparse and gappy is the availability of the 26 OA metrics sketched above — and that’s because a lot of the source material is not yet OA. The proprietary databases (like WoS and SCOPUS) are not OA either. But if the papers were all OA, then the metrics could all easily be harvested and calculated from them.
>> 
>>> I guess that if I were peer reviewing this as a preliminary proposal I would be positive but not enthusiastic. More information is needed about the proposed project and its goals.
>> 
>> I wasn’t actually counting on your recommendation for peer review of the proposal to validate metrics against REF2014, David: I was rather hoping it might help inspire you to recommend the right OA policy model to OSTI <http://openaccess.eprints.org/index.php?serendipity%5Baction%5D=search&serendipity%5BsearchTerm%5D=wojick&serendipity%5BsearchButton%5D=%3E>, for which you consult. That way we would have a better hope of making the all-important OA data available when President Obama’s OSTP directive is implemented...
>> 
>>> At 07:23 AM 12/18/2014, you wrote:
>>>>> On Dec 18, 2014, at 3:39 AM,  [name deleted because posted off-list]  wrote:
>>>>> 
>>>>> that's very high dimensionality in that equation.
>>>> 
>>>> I don’t think 30 metric predictors for about 6% of the planet’s annual research output (UK) is such an under-fit.
>>>> 
>>>> (But we could start with the most likely metrics first, and then see how much variance is accounted for by adding more.)
>>>>  
>>>>> you don't have enough data points to have any decent confidence about those weights - i
>>>> 
>>>> That cannot be stated in advance. First we need to calculate the multiple regression on the REF2014 rankings and determine how much each metric contributes.
>>>> 
>>>>> suggest you look at the REF data and see how many different journal/venues and all over the ACM Classification hierarchy, the 7000 odd outputs appeared in - you'll find in any given venue, topic you rarely have more than a handful of items - your variance will be terrible
>>>> 
>>>> The proposal is not to assess the predictive power of any one of the 4 publications submitted. 
>>>> 
>>>> The REF2014 peer rankings themselves are based on peers (putatively) re-reading those 4 pubs per researcher, but the regression equation I sketched is based on (OA) data that go far beyond that. 
>>>> 
>>>> (In point of fact, it’s absurd and arbitrary to base the REF assessment on just 4 papers in a 6-year stretch. That restriction is dictated by the demands of the peers having to read all those papers, but open-access metrics can be harvested and have no such human bottleneck constraint on them. What you could legitimately complain about is that not all those potential data are OA yet... Well, yes, and that’s part of the point.)
>>>> 
>>>> REF2020Rank = 
>>>> 
>>>> w1(pubcount) + w2(JIF) + w3(cites) + w4(art-age) + w5(art-growth) + w6(hits) + w7(cite-peak-latency) + w8(hit-peak-latency) + w9(citedecay) + w10(hitdecay) + w11(hub-score) + w12(authority-score) + w13(h-index) + w14(prior-funding) + w15(bookcites) + w16(student-counts) + w17(co-cites) + w18(co-hits) + w19(co-authors) + w20(endogamy) + w21(exogamy) + w22(co-text) + w23(tweets) + w24(tags) + w25(comments) + w26(acad-likes) etc. etc.
>>>> 
>>>>> and the result of munging all those _different_ distributions into one single model will be to pressure people to move their work areas to the best fit topic/venue, which is not a true measure of anything desired by us or HEFCE or RC.UK to my knowledge.
>>>> 
>>>> I cannot fathom what one, two, three or N things a researcher can do in order to maximize their score on the above equation (other than to try to do good, important, useful work).
>>>> 
>>>>> please do the detailed work
>>>>> 
>>>> 
>>>> Will try. But there are a few details you need to get straight too (<:3
>>>> 
>>>>> 
>>>>> On Wed, Dec 17, 2014 at 3:38 PM, Stevan Harnad <harnad at ecs.soton.ac.uk> wrote:
>>>>>>
>>>>>> On Dec 17, 2014, at 9:54 AM, Alan Burns <alan.burns at YORK.AC.UK> wrote:
>>>>>> Those that advocate metrics have never, to at least my satisfaction, answered the
>>>>>> argument that accuracy in the past does not mean effectiveness in the future,
>>>>>> once the game has changed.
>>>>> I recommend Bradley on metaphysics and Hume on induction <http://plato.stanford.edu/entries/induction-problem/>:
>>>>> “The man who is ready to prove that metaphysical knowledge is wholly impossible is a brother metaphysician with a rival theory” <https://www.goodreads.com/quotes/1369088-the-man-who-is-ready-to-prove-that-metaphysical-knowledge> Bradley, F. H. (1893) Appearance and Reality
>>>>> One could have asked the same question about apples continuing to fall down in future, rather than up.
>>>>> Yes, single metrics can be abused, but not only can abuses be named and shamed when detected, but it becomes harder to abuse metrics when they are part of a multiple, inter-correlated vector, with disciplinary profiles on their normal interactions: someone dispatching a robot to download his papers would quickly be caught out when the usual correlation between downloads and later citations fails to appear. Add more variables and it gets even harder.
>>>>> 
>>>>>> Even if one was able to define a set of metrics that perfectly matches REF2014,
>>>>>> the announcement that these metrics would be used in REF2020 would
>>>>>> immediately invalidate their use.
>>>>> 
>>>>> In a weighted vector of multiple metrics like the sample I had listed, it’s no use to a researcher to be told that for REF2020 the metric equation will be the following, with the following weights for their particular discipline:
>>>>> 
>>>>> w1(pubcount) + w2(JIF) + w3(cites) + w4(art-age) + w5(art-growth) + w6(hits) + w7(cite-peak-latency) + w8(hit-peak-latency) + w9(citedecay) + w10(hitdecay) + w11(hub-score) + w12(authority-score) + w13(h-index) + w14(prior-funding) + w15(bookcites) + w16(student-counts) + w17(co-cites) + w18(co-hits) + w19(co-authors) + w20(endogamy) + w21(exogamy) + w22(co-text) + w23(tweets) + w24(tags) + w25(comments) + w26(acad-likes) etc. etc.
>>>>> 
>>>>> 
>>>>> The potential list could be much longer, and the weights can be positive or negative, and varying by discipline.
>>>>> 
>>>>> “The man who is ready to prove that metric knowledge is wholly impossible is a brother metrician with rival metrics” <https://www.goodreads.com/quotes/1369088-the-man-who-is-ready-to-prove-that-metaphysical-knowledge>
>>>>> 
>>>>> 
>>>>> if you wanted to do this properly, you should have to take a lot of outputs that were NOT submitted and run any metric scheme on them as well as those submitted.
>>>>>> 
>>>>>> too late:)
>>> 
>>> You would indeed, and that’s why it all has to be made OA.
>>> 
>>> 
>>>>>> On Wed, Dec 17, 2014 at 2:26 PM, Stevan Harnad <harnad at ecs.soton.ac.uk> wrote:
>>>>>> Steven Hill of HEFCE has posted “an overview of the work HEFCE are currently commissioning which they are hoping will build a robust evidence base for research assessment” in LSE Impact Blog 12(17) 2014 entitled Time for REFlection: HEFCE look ahead to provide rounded evaluation of the REF <http://blogs.lse.ac.uk/impactofsocialsciences/2014/12/17/time-for-reflection/>
>>>>>> Let me add a suggestion, updated for REF2014, that I have made before (unheeded):
>>>>>> Scientometric predictors of research performance need to be validated by showing that they have a high correlation with the external criterion they are trying to predict. The UK Research Excellence Framework (REF) -- together with the growing movement toward making the full texts of research articles freely available on the web -- offers a unique opportunity to test and validate a wealth of old and new scientometric predictors, through multiple regression analysis: publications, journal impact factors, citations, co-citations, citation chronometrics (age, growth, latency to peak, decay rate), hub/authority scores, h-index, prior funding, student counts, co-authorship scores, endogamy/exogamy, textual proximity, downloads/co-downloads and their chronometrics, tweets, tags, etc. can all be tested and validated jointly, discipline by discipline, against their REF panel rankings in REF2014. The weights of each predictor can be calibrated to maximize the joint correlation with the rankings. Open Access Scientometrics will provide powerful new means of navigating, evaluating, predicting and analyzing the growing Open Access database, as well as powerful incentives for making it grow faster.
>>>>>> Harnad, S. (2009) Open Access Scientometrics and the UK Research Assessment Exercise <http://eprints.ecs.soton.ac.uk/17142/>. Scientometrics 79 (1). Also in: Torres-Salinas, D. and Moed, H. F., Eds. (2007) Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, Spain.
>>>>>> See also:
>>>>>> The Only Substitute for Metrics is Better Metrics <http://openaccess.eprints.org/index.php?/archives/1136-The-Only-Substitute-for-Metrics-is-Better-Metrics.html> (2014)
>>>>>> and
>>>>>> On Metrics and Metaphysics <http://openaccess.eprints.org/index.php?/archives/479-On-Metrics-and-Metaphysics.html> (2008)
