Continuous multi-metric research assessment
Stevan Harnad
harnad at ECS.SOTON.AC.UK
Fri Nov 16 10:56:54 EST 2007
On 16-Nov-07, at 9:33 AM, Jonathan Adams wrote:
> Stevan
> I think your enthusiasm is great, and long may it continue, but I am
> less certain about the transparency in your metrics utopia.
> There will of course be multiple metrics in the algorithm but
> ultimately they condense around the funding allocated. So, at the
> point where citation metrics are combined with different kinds of
> variable such as funding, they have to condense to a single number
> to be weighted against the other factors.
Jonathan:

"Condense around the funding allocated"? I am not sure what that
means. If you have N metric predictors, one of which is prior
funding, you first initialize them by jointly regressing them
(multiple regression) on the criterion (the panel rankings), in
order to validate them. That gives you initial beta weights on each
of the N metrics. Each beta weight indicates how much that metric
contributes to predicting variation in the criterion (the panel
rankings). Some metrics will have higher beta weights, some lower,
some none.
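
To make the initialization step concrete, here is a minimal sketch
in Python on synthetic data; the metric names, the number of
departments and the data themselves are illustrative assumptions,
not the actual RAE battery or panel rankings:

    import numpy as np

    # Synthetic example: 100 departments, 3 candidate metrics (placeholder
    # names); the real battery and the RAE panel rankings would be used here.
    rng = np.random.default_rng(0)
    metrics = ["citations", "downloads", "prior_funding"]
    X = rng.normal(size=(100, 3))
    panel_rank = X @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=100)

    # Standardize predictors and criterion so the regression coefficients
    # are beta weights.
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (panel_rank - panel_rank.mean()) / panel_rank.std()

    # Ordinary least squares: each beta weight estimates how much that metric
    # contributes to predicting the panel rankings, jointly with the others.
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    print(dict(zip(metrics, betas.round(2))))
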
Then comes the adjustment of the weights. If it should turn out that
in some fields the prior-funding metric has a heavy beta weight, we
may want to reduce it, so as not to allow the RAE rank to become
just a multiplier on the prior-funding level. This would preserve
the Dual Funding system. Otherwise, an RAE rank dominated by the
prior-funding rank would simply "condense" the dual funding system
into a single funding system.
Having initialized the beta weights by validating them against the
panel rankings, some fields may want to calibrate them further: not
just down-weighting prior funding but also (to use your example)
up-weighting publications in applications-oriented journals, rather
than allowing highly cited basic-research journal articles to
dominate.
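
To make the mechanics of that calibration concrete, here is a
correspondingly minimal sketch; the weight values, the cap on prior
funding and the boost on applications-oriented publications are all
assumed for illustration, standing in for whatever a field panel
would actually choose:

    # Illustrative beta weights from the initialization step (invented values),
    # plus an applications-oriented publication metric for this field.
    weights = {"citations": 0.55, "downloads": 0.20,
               "prior_funding": 0.45, "applied_journal_pubs": 0.10}

    # Down-weight prior funding so the metric ranking does not simply become
    # a multiplier on prior funding (preserving dual funding)...
    weights["prior_funding"] = min(weights["prior_funding"], 0.15)  # assumed cap

    # ...and up-weight the applications-oriented publication metric.
    weights["applied_journal_pubs"] *= 2.0                          # assumed boost

    # Renormalize so the calibrated weights sum to 1.
    total = sum(weights.values())
    weights = {metric: w / total for metric, w in weights.items()}

    # A department's metric score is then the weighted sum of its
    # standardized metric values.
    def metric_score(values, weights):
        return sum(weights[m] * values[m] for m in weights)
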
The initial panel rankings are the launching point, but after that,
the peer panels' role in continuous assessment should be to fine-tune
the weights on each variable, according to the peers' criteria for
the field. This is not done on an ad hoc basis, to favor one
institution or author over another (as institutions and authors are
sometimes wont to do, self-servingly), but in order to generate
weightings that are rational and equitable according to the peer
panel's judgment of the needs of the field. In the old, non-metric
RAE, the peer panels did all the ranking; in the new metric RAE,
they simply fine-tune the metrically generated rankings by adjusting
the weights.
And yes, the fact that the assessment will be open, continuous and
multi-metric will not only be a source of information to all; it
will also expose and protect against abuse, and it will allow the
assessment system to be flexible and adaptive, based on objective
data patterns and dynamically tuned, rather than rigid and a priori.
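
As a toy illustration of how a joint, open battery can flag
anomalous profiles (say, many articles but disproportionately few
citations and downloads), a simple multivariate outlier check can be
run over the metric matrix; the three metrics, the synthetic data
and the 99th-percentile cutoff below are assumptions for
illustration only:

    import numpy as np

    # Illustrative battery per department: article count, citations, downloads
    # (synthetic data; the real battery would be larger and openly published).
    rng = np.random.default_rng(1)
    M = rng.multivariate_normal(
        mean=[50, 500, 2000],
        cov=[[100, 800, 3000],
             [800, 10000, 30000],
             [3000, 30000, 200000]],
        size=200)

    # Squared Mahalanobis distance of each profile from the field's joint
    # distribution: a profile with many articles but few citations/downloads
    # departs from the usual joint relation and gets a large distance.
    mu = M.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(M, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", M - mu, cov_inv, M - mu)

    # Flag the most atypical profiles for open peer scrutiny (assumed cutoff).
    flagged = np.where(d2 > np.percentile(d2, 99))[0]
    print("departments flagged for scrutiny:", flagged)
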
Best wishes,
Stevan Harnad
> Director, Evidence Ltd
> + 44 113 384 5680
>
>
> -----Original Message-----
> From: ASIS&T Special Interest Group on Metrics
> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Stevan Harnad
> Sent: 16 November 2007 14:17
> To: SIGMETRICS at LISTSERV.UTK.EDU
> Subject: [SIGMETRICS] Continuous multi-metric research assessment
>
> Administrative info for SIGMETRICS (for example unsubscribe):
> http://web.utk.edu/~gwhitney/sigmetrics.html
>
> On Fri, 16 Nov 2007, Jonathan Adams (Director, Evidence Ltd) wrote:
>
>> Stevan
>> Following on from your comments at the end of last week (below) I agree
>> that it is possible tentatively to pick out 'over-production' of poor
>> quality papers (although I am less optimistic about the comprehensive
>> analytical detection of publication abuse you foresee).
>
> Jonathan,
>
> I think you may be greatly underestimating (1) the power of
> multivariate (as opposed to univariate) analysis, validation and
> weighting as well as (2) the power of open access (i.e., online,
> public, pervasive, continuous, and dynamic) metrics.
>
> You get a completely different sense of what is possible, and how, if
> you think in terms of:
>
> (i) individual, isolated metrics, assessed at long intervals under
> closed scrutiny (like the current RAEs)
>
> or if you think instead in terms of:
>
> (ii) a large (possibly growing) battery of candidate metrics,
> assessed jointly and continuously rather than at long intervals,
> with the contribution of each metric to their joint predictive power
> initially validated against existing criteria that have been relied
> on before (such as the RAE panel rankings) and then updated
> dynamically, field by field, by adjusting the weights on each
> component metric -- and always under open scrutiny.
>
> Not only can "overproduction" of lightweight papers be detected and
> weighted by simply profiling on the joint relation between (say) the
> article count, the article citation count, the journal citation
> average ("impact factor") and the journal download count -- but so
> can other anomalous or abusive profiles be detected, exposed,
> penalized and discouraged through weighting.
>
>> By contrast to over-production, do you think that an assessment
>> system that looks at total output would run the risk of suppressing
>> outputs that might be predicted to be cited less frequently?
>
> Not unless it is decided (for some unknown a-priori reason!) that a
> profile consisting of N highly cited papers plus M less cited papers
> is to be given a lower weight than a profile consisting of N highly
> cited papers plus 0 less cited papers!
>
>> UK research assessment currently looks at four outputs per
>> researcher, usually selected by the individual as their best
>> research.
>
> That, of course, was a foolish, arbitrary constraint all along: it
> was (well-meaningly) intended to minimise both salami-slicing and
> the number of papers the panel would have to read. But of course
> continuous OA metrics solve both problems, as they can detect and
> weight the salami-slicing profile, and panel-reading (after the
> validation phase) is no longer a factor, except as a periodic
> higher-level check on the continuous, dynamic weightings and
> profiles. (So let all papers be considered, continuously, and let
> 1000 metrics bloom, under open peer scrutiny, panel monitoring and
> weight calibration!)
>
>> The proposal is that post-2008 the metrics assessment would be of
>> all output, creating a profile and then deriving a metric from that.
>
> "A" metric? Or a battery of metrics? (The "h-index" and its ilk are
> all examples of a-priori, unvalidated, fixed, 1-number metrics; what
> is needed is a rich multiple regression equation, with adjustable
> weights, validated initially against the 2001 and 2008 RAE panel
> rankings. You can add prewired metrics like the h-index to the
> battery, but don't use them *instead* of a weighted, multimetric
> battery.)
>
>> Is there a risk that researchers, realizing that outputs aimed at
>> practitioners often appear in relatively lower impact journals, would
>> then tend to reduce the number of papers they produced aimed at
>> transferring knowledge from the research base and concentrate on outputs
>> targeted at high-impact journals in the research-base core? They
>> would expect by doing so to avoid dilution of their citation average.
>
> This would be faulty reasoning on the part of researchers, if there
> were a continuous, multi-metric equation in place, with its weights
> being dynamically updated under peer scrutiny to detect and weight
> exactly this sort of practice!
>
> If applications are valued in a field, add application metrics: Are
> certain journals more applications-oriented? Crank up their weight!
> Is it better to partition citations into basic vs. applied journals,
> with differential weights for citations in the one and the other in
> certain fields? Do so. Don't just think of a univariate measure
> (citations, or h-index) and how authors might bias that measure by
> altering the kind of journals they publish in, or the number of
> articles they submit for assessment! Think multivariately,
> dynamically, and openly.
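
To illustrate the kind of differential, field-calibrated citation
weighting just described, a minimal sketch; the journal
classification and the weights below are invented purely for
illustration, and would in practice be set (and openly adjusted) by
the field's panel:

    # Illustrative only: classify each citing journal as "basic" or "applied"
    # and weight citations differentially, as a field panel might choose to.
    JOURNAL_TYPE = {"J Basic Res": "basic", "Appl Practice Rev": "applied"}  # assumed
    CITATION_WEIGHT = {"basic": 1.0, "applied": 1.5}   # assumed field calibration

    def weighted_citations(citing_journals):
        """Sum citations, weighting each by the type of the citing journal."""
        return sum(CITATION_WEIGHT[JOURNAL_TYPE.get(j, "basic")]
                   for j in citing_journals)

    # e.g. weighted_citations(["J Basic Res", "Appl Practice Rev",
    #                          "Appl Practice Rev"]) -> 1.0 + 1.5 + 1.5 = 4.0
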
>
> New applications metrics, besides journal types, might include
> downloads, or even (if possible) downloads from industrial IP
> addresses; patents are also metrics. Depending on the field, there
> will no doubt be other measurable, monitorable performance
> indicators for applications impact (and for teaching impact too!).
>
> It's not all about ways to bias one single citation metric, but
> about developing richer metrics. If the worry is about encouraging
> technology transfer and applications flow, find objective measures
> of it and plug them into the equation. Don't treat it as just a
> default bias, to be minimized by cutting down on metrics!
>
>> The net effect could be to reduce the UK's volume of less frequently
>> cited papers, but also to reduce information flow to the people who
>> turn research into practice.
>
> This is again univariate thinking. Yes, citation counts are
> important, but there are citations and citations. Basic citations,
> applied citations. Basic publications, applied publications. Not
> only do fields have to compare like with like, but their preferred
> blends can be weighted and rewarded accordingly.
>
> (This, by the way, is not "biasing", any more than mandating and
> rewarding publication itself is biasing: it is providing incentives
> for the kind of research performance we want, and that we want to
> reward. Continuous multivariate OA metrics allow preferred profiles
> to be rewarded and encouraged dynamically. Cheater detection allows
> self-citations, robotic or anomalous download inflation,
> salami-slicing, etc. to be detected, exposed and penalized. Metrics
> are not ends in themselves; they are merely objective performance
> correlates. They are easy to abuse singly, but much harder to abuse
> jointly, and in the open.)
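
As one concrete instance of the cheater detection mentioned just
above, self-citations can be filtered out (or down-weighted) by
checking author overlap between cited and citing papers; the
function and data below are an illustrative sketch, not an actual
implementation used in any assessment:

    # Illustrative self-citation filter: a citation counts as a self-citation
    # if the cited and citing papers share any author (names are placeholders).
    def non_self_citations(cited_authors, citing_papers):
        """Count citing papers that share no author with the cited paper."""
        cited = set(cited_authors)
        return sum(1 for citing in citing_papers if not cited & set(citing))

    # e.g. non_self_citations(["Smith", "Jones"],
    #                         [["Smith", "Lee"], ["Patel"], ["Wu", "Garcia"]])
    # -> 2 (the first citing paper shares an author, so it is excluded)
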
>
> The UK's RAE is unique; so is its new conversion to metrics. The UK
> is hence leading the world in research metrics. Don't think cravenly
> about how the UK will stack up on existing, unvalidated, univariate
> metrics. Think instead in terms of establishing metric standards for
> the entire world research community in the metric OA era!
>
> Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated
> online RAE CVs Linked to University Eprint Archives: Improving the
> UK Research Assessment Exercise whilst making it cheaper and
> easier. Ariadne 35.
> http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
>
> Shadbolt, N., Brody, T., Carr, L. & Harnad, S. (2006) The Open
> Research Web: A Preview of the Optimal and the Inevitable. In:
> Jacobs, N. (Ed.) Open Access: Key Strategic, Technical and Economic
> Aspects, chapter 21. Chandos.
> http://eprints.ecs.soton.ac.uk/12453/
>
> Harnad, S. (2007) Open Access Scientometrics and the UK Research
> Assessment Exercise. In: Torres-Salinas, D. & Moed, H. F. (Eds.)
> Proceedings of the 11th Annual Meeting of the International Society
> for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid,
> Spain. http://eprints.ecs.soton.ac.uk/13804/
>
> Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. & Swan, A.
> (2007) Incentivizing the Open Access Research Web:
> Publication-Archiving, Data-Archiving and Scientometrics. CTWatch
> Quarterly 3(3). http://eprints.ecs.soton.ac.uk/14418/
>
> Stevan Harnad
>
>> Jonathan Adams
>>
>> Director, Evidence Ltd
>> + 44 113 384 5680
>>
>> Comment on: "Bibliometrics could distort research assessment"
>> Guardian Education, Friday 9 November 2007
>> http://education.guardian.co.uk/RAE/story/0,,2207678,00.html
>>
>> Yes, any system (including democracy, health care, welfare,
>> taxation, market economics, justice, education and the Internet)
>> can be abused. But abuses can be detected, exposed and punished,
>> and this is especially true in the case of scholarly/scientific
>> research, where "peer review" does not stop with publication, but
>> continues for as long as research findings are read and used. And
>> it's truer still if it is all online and openly accessible.
>>
>> The researcher who thinks his research impact can be spuriously
>> enhanced by producing many small, "salami-sliced" publications
>> instead of fewer substantial ones will stand out against peers who
>> publish fewer, more substantial papers. Paper lengths and numbers
>> are metrics too, hence they too can be part of the metric equation.
>> And if most or all peers do salami-slicing, then it becomes a scale
>> factor that can be factored out (and the metric equation and its
>> payoffs can be adjusted to discourage it).
>>
>> Citations inflated by self-citations or co-author group citations
>> can also be detected and weighted accordingly. Robotically inflated
>> download metrics are also detectable, nameable and shameable.
>> Plagiarism is detectable too, when all full-text content is
>> accessible online.
>>
>> The important thing is to get all these publications as well as their
>> metrics out in the open for scrutiny by making them Open Access. Then
>> peer and public scrutiny -- plus the analytic power of the algorithms
>> and the Internet -- can collaborate to keep them honest.
>>