Enriching the Impact Regression Equation

Quentin L. Burrell quentinburrell at MANX.NET
Mon Jan 17 16:10:25 EST 2005


I congratulate Stevan and Loet for demonstrating the potential value of
SIGMETRICS as a discussion forum - something I have pleaded for in the past.

Can I chip in a few brief comments on the current theme?

I feel that trying to get people away from impact factors as an adequate
measure is similar to getting politicians away from quoting "the average
family". Variability and statistical distributions are completely ignored,
with the result that much information is lost.

Loet notes the skewness of the distributions and advocates use of variance,
but unfortunately this is based on second moments whereas skewness is based
on third moments. Hence at least the first three moments of the distribution
are relevant. Indeed, I would argue that the entire distribution is of
importance.
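
As a minimal sketch of why the variance alone cannot capture skewness, the
two samples below share the same mean and variance but differ in their third
moment. The citation counts and the helper name `moments` are hypothetical,
chosen only for illustration.

```python
import math

def moments(sample):
    """Return (mean, population variance, skewness) of a list of numbers."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n      # second central moment
    third = sum((x - mean) ** 3 for x in sample) / n    # third central moment
    skew = third / math.sqrt(var) ** 3                  # standardized third moment
    return mean, var, skew

# Hypothetical citation counts per article for two journals (illustrative only).
skewed = [4, 4, 4, 4, 14]      # one highly cited article
symmetric = [2, 10, 2, 10]     # evenly split between low and high

print(moments(skewed))     # (6.0, 16.0, 1.5)
print(moments(symmetric))  # (6.0, 16.0, 0.0)
```

Both samples would report the same mean and the same variance; only the
third moment separates them.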

Stevan seems to say that multiple regression models sort everything out.
Multiple regression was not a viable technique before the advent of
computers. Now it is all too viable, giving "fits" to models whether or not
the models are appropriate. I feel that a certain caution should be
exercised in analyses that are purely data analysis based and rather more
attention paid to the theoretical assumptions on which the models are based.
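
A small sketch of the kind of "fit" in question, assuming made-up data that
follow a quadratic law exactly: least squares still returns a straight line
with a high R-squared, and only the pattern in the residuals reveals that the
linear model is inappropriate. The function name `fit_line` and the data are
hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)   # squared correlation, equal to R-squared for a simple linear fit
    return slope, intercept, r2

# Illustrative data generated by y = x**2, not by a straight line.
xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]

slope, intercept, r2 = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(slope, intercept)  # 4.0 -2.0
print(r2)                # about 0.92
print(residuals)         # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle, a
systematic pattern that the summary statistic alone does not show.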

My own inclination is to a model building rather than a data stripping
approach. So far as impact is concerned, there is an important (in my view!)
paper by Frandsen and Rousseau (JASIST 56, 58-62) that dissects the various
components that go to determine impact. I am currently working on a
stochastic formulation of the same sort of issues that I hope will shed
some light on the mean/var/skew/"shape"/"concentration" aspects of impact
factors.

Quentin

Dr Quentin L Burrell
Isle of Man International Business School
The Nunnery
Old Castletown Road
Douglas
Isle of Man IM9 4EX
via United Kingdom

q.burrell at ibs.ac.im

www.ibs.ac.im


----- Original Message -----
From: "Loet Leydesdorff" <loet at LEYDESDORFF.NET>
To: <SIGMETRICS at listserv.utk.edu>
Sent: Monday, January 17, 2005 5:55 PM
Subject: Re: [SIGMETRICS] Enriching the Impact Regression Equation


> Yes, Stevan, I agree that one should go in the direction of multiple
> regression analysis or structural equation models if one wishes to explain
> journal impact. See as a nice example:
>
> Weiping Yue and Concepcion S. Wilson, "An Integrated Approach for the
> Analysis of Factors Affecting Journal Citation Impact in Clinical
> Neurology," Proceedings ASIST 2004, pp. 527 ff.
>
> Note that the ecological fallacy may sometimes change not only the size,
> but also the sign of the regression coefficients.
>
> With kind regards,
>
>
> Loet
>
>> -----Original Message-----
>> From: ASIS&T Special Interest Group on Metrics
>> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Stevan Harnad
>> Sent: Monday, January 17, 2005 2:37 PM
>> To: SIGMETRICS at LISTSERV.UTK.EDU
>> Subject: Re: [SIGMETRICS] Enriching the Impact Regression Equation
>>
>> On Mon, 17 Jan 2005, Loet Leydesdorff wrote:
>>
>> > Dear Stevan,
>> >
>> > An additional consideration to which you may wish to pay attention
>> > is the so-called ecological fallacy: "What is true for trees, is not
>> > necessarily true for a wood." On the one hand, the aggregation of
>> > citations to individual papers (using your methods) does not
>> > necessarily lead to a good indicator for journals and, on the other
>> > hand, journal self-citations are very different from author
>> > self-citations.
>>
>> Dear Loet, thanks for your reply.
>>
>> You are right, but the advantage of a multiple regression
>> equation is that it can be used for and validated (and
>> regression-weights adjusted) against whatever one likes. So
>> if what one wants the impact equation to predict is
>> article/author research impact, one adjusts the weights
>> accordingly. If it is merely journal usage, one can adjust
>> them otherwise.
>>
>> > Journal indicators (e.g., impact factors) are defined at the level
>> > of journals. The journals themselves as organizers may play a role in
>> > their values. But I agree that one should preferably look at the
>> > distributions (the variance) instead of the mean given the skewness
>> > of the distributions.
>>
>> More than the variance! The regression equation would add
>> many other impact indicators; and the specific recursive
>> adjustment I proposed ("CiteRank") would go well beyond mere
>> variance and distribution of the citation counts too.
>>
>> (Btw, is the Bach on http://users.fmg.uva.nl/lleydesdorff/list.htm
>> played by man or machine?)
>>
>> Best wishes,
>>
>> Stevan
>>
>> > With kind regards,
>> >
>> >
>> > Loet
>> >
>> > > -----Original Message-----
>> > > From: ASIS&T Special Interest Group on Metrics
>> > > [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Stevan Harnad
>> > > Sent: Sunday, January 16, 2005 4:02 PM
>> > > To: SIGMETRICS at LISTSERV.UTK.EDU
>> > > Subject: [SIGMETRICS] Enriching the Impact Regression Equation
>> > >
>> > > In the OACI Leiden statement (if there is to be one)
>> > > http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4082.html
>> > > the following constructive recommendations could perhaps be made:
>> > >
>> > > The 2-year average number of citations to a journal (i.e., the ISI
>> > > impact factor) is not meaningless and unpredictive, but merely a
>> > > needlessly crude measure of the impact of either an article, an
>> > > author or a journal. It can be greatly refined and improved.
>> > >
>> > > Apart from exact citation counts for articles (and authors), and
>> > > apart from avoiding the comparison of apples with oranges (by
>> > > making sure these measures are used in comparing like with like),
>> > > there are obvious ways that even journal impact factors could be
>> > > made far more accurate and representative of true research impact.
>> > >
>> > > Right now, "like tends to cite like" in more ways than one!
>> > > Not only do articles in phytology tend to cite articles in
>> > > phytology, but average research tends to cite average research!
>> > > This means that there is necessarily a quantitative citation bulge
>> > > toward the middle (mean) of the distribution that masks any far
>> > > more important qualitative impact from the smaller, higher quality
>> > > tail-end of the distribution.
>> > >
>> > > There are at least five ways that this could be remedied -- and it
>> > > makes no sense to wait for ISI, with their primary need to pay more
>> > > attention to market matters, to get around to doing all this for
>> > > us. A growing Open Access full-text corpus can count on many
>> > > talented and enterprising doctoral students like Tim Brody doing
>> > > this and more:
>> > >
>> > > (1) RECURSIVE "CiteRank": A recursive measure of citation of
>> > > citation weight could replace flat citation counting: If article A
>> > > cites article B, Article A's citation weight is not 1 but a
>> > > normalized multiple of 1 based on the number of citations the
>> > > *citing* article has itself received. This would go some way toward
>> > > replacing the pure weight of numbers by a recursive measure of the
>> > > weight of the numbers (without ever yet leaving the circle of
>> > > citation counts themselves). Average work will lose some of its
>> > > strength-of-numbers unless it manages to draw citations from
>> > > above-average articles too (still in terms of citation counts).
>> > >
>> > > [This recursive technique is analogous to Google's PageRank, hence
>> > > could perhaps be called CiteRank; it is ironic that Google got the
>> > > idea of PageRank from citation ranking, but then improved it, yet
>> > > the improvement has not yet percolated back to citation ranking,
>> > > because ISI had no particular motive to implement it -- perhaps
>> > > even a disincentive, as it might reduce the journal impact factor
>> > > of the large, average journals which are of necessity ISI's
>> > > numerical mainstay!]
>> > >
>> > > (2) USAGE COUNTS: The circularity of citation counting can also be
>> > > broken in various ways. One is by adding download counts to the
>> > > impact measure, not as a weight on the citation count, but as a
>> > > second variable in a multiple regression equation. We know now from
>> > > Tim Brody's findings that downloads correlate with and hence
>> > > predict citations. That means citation counts plus download counts
>> > > are better predictors of impact than just citation counts alone,
>> > > and are especially good at correcting for early impact, which may
>> > > not yet be felt in the citation counts.
>> > > http://www.ecs.soton.ac.uk/~harnad/Temp/timbrody.new.doc
>> > >
>> > > (3) RATING SCORES: A more radical break from the circularity of
>> > > citation counting can be made in two ways: Systematic rating polls
>> > > can easily be conducted, asking researchers (by field and subfield)
>> > > to rank the N most important articles in their field in the past
>> > > year (or two). Even with the inevitable incest this will evoke, a
>> > > good-sized systematic sample will pick out the recurrent articles
>> > > (because, by definition, local-average mediocrity effects are
>> > > merely local) and then the rankings could either be used as (3a) a
>> > > third independent variable in the impact regression equation or,
>> > > perhaps more interestingly, as (3b) another constraint on the
>> > > weighting of the CiteRank score (effectively making that weight the
>> > > result of a 2nd order regression equation based on the citer's
>> > > citation count as well as on the citer's rating score: the download
>> > > count could also be used instead as a 3rd component in this 2nd
>> > > order regression). The result will be a still better adjustment of
>> > > the citation count for an article (and hence an adjustment of the
>> > > journal's average citation count too).
>> > >
>> > > (4) CO-CITATION & HUB-AUTHORITY SCORES: Although I would need to
>> > > consult with a statistician to sort it out optimally, I am certain
>> > > that co-citation (what article/author is co-cited with what
>> > > article/author) can also be used to correct or add to the impact
>> > > regression equation. So, I expect, could a hub (fan-out) and
>> > > authority (fan-in) score, as well as a better use of citation
>> > > latency (ISI's "immediacy index") in the impact equation.
>> > >
>> > > (5) AUTHOR/JOURNAL SELF-CITATIONS: Another clean-up factor for
>> > > citation counts is of course the elimination of self-citations,
>> > > which would be interesting not only for author self-citations, but
>> > > also journal self-citations: This too might be added as another
>> > > pair of variables in the regression equation (author self-citation
>> > > score and journal self-citation score), with the weight adjusting
>> > > itself, as the variable proves its predictivity.
>> > >
>> > > The predictivity and validity of the regression equation should of
>> > > course also be actively tested and calibrated by validating it
>> > > against (a) later citation impact, (b) subjective impact ratings
>> > > (3, above), (c) other impact measures such as prizes, funding, and
>> > > time-line descendants that are further than one citation-step away
>> > > (A is cited by B, B is cited by C: this could be an uncited credit
>> > > to A...)
>> > >
>> > > And all of this is without even mentioning full-text "semantic"
>> > > analysis. So the potential world of impact analysis is a rich and
>> > > diverse one. Let us not be parochial, focussing only on the limits
>> > > of the ISI 2-year average journal citation-count that has become so
>> > > mindlessly overused by libraries and assessors. Let us talk instead
>> > > about the positive horizons OA opens up!
>> > >
>> > > Cheers, Stevan
>> > >
>> >
>>
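
The citation weighting described in item (1) of the quoted message can be
sketched roughly as follows. This is a single weighting pass, not the full
recursion, and it rests on assumptions not stated in the thread: a citing
article's raw weight is taken as 1 plus its own flat citation count, and
weights are normalized by the mean raw weight. The function name and the toy
citation graph are hypothetical.

```python
def weighted_citation_scores(cites):
    """cites maps each citing article to the list of articles it cites.

    Returns (flat, score): flat citation counts and one-pass weighted scores.
    """
    articles = set(cites) | {b for targets in cites.values() for b in targets}

    # Flat counting: every citation counts as 1.
    flat = {a: 0 for a in articles}
    for targets in cites.values():
        for b in targets:
            flat[b] += 1

    # Assumption: raw weight of a citer is 1 + its own flat count,
    # so that uncited citers still contribute something.
    raw = {a: 1 + flat[a] for a in articles}
    mean_raw = sum(raw.values()) / len(raw)
    weight = {a: raw[a] / mean_raw for a in articles}   # normalized multiple of 1

    # Weighted counting: each citation counts as the citer's weight.
    score = {a: 0.0 for a in articles}
    for a, targets in cites.items():
        for b in targets:
            score[b] += weight[a]
    return flat, score

# Toy graph: X and Y are each cited twice, but X's citers are themselves cited.
cites = {
    "H1": ["X"], "H2": ["X"],          # cited citers
    "L1": ["Y"], "L2": ["Y"],          # uncited citers
    "P1": ["H1", "H2"], "P2": ["H1", "H2"],
}
flat, score = weighted_citation_scores(cites)
print(flat["X"], flat["Y"])    # 2 2
print(score["X"], score["Y"])  # 3.0 1.0
```

Under flat counting X and Y are indistinguishable; the weighted pass
separates them. Iterating the pass until the scores settle would give a
PageRank-style recursion.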


