Enriching the Impact Regression Equation

Stevan Harnad harnad at ECS.SOTON.AC.UK
Mon Jan 17 08:36:59 EST 2005

On Mon, 17 Jan 2005, Loet Leydesdorff wrote:

> Dear Stevan,
> An additional consideration which you may wish to pay attention to is the
> so-called ecological fallacy: "What is true for trees, is not necessarily
> true for a wood." On the one hand, the aggregation of citations to
> individual papers (using your methods) does not necessarily lead to a good
> indicator for journals and, on the other hand, journal self-citations are
> very different from author self-citations.

Dear Loet, thanks for your reply.

You are right, but the advantage of a multiple regression equation is
that it can be validated against whatever criterion one likes, with the
regression weights adjusted to fit. So if what one wants the impact
equation to predict is article/author research impact, one adjusts the
weights accordingly. If it is merely journal usage, one can adjust them
otherwise.
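
To make the weight-adjusting idea concrete, here is a minimal sketch in
Python of fitting regression weights by ordinary least squares. Everything
in it is an assumption for illustration only: the function name
fit_weights, the choice of two predictors (citation and download counts),
and above all the toy numbers, which are synthetic and generated from
known weights merely to show that the fit recovers them. Validating
against a different criterion just means supplying a different list of
targets.

```python
def fit_weights(rows, targets):
    """Ordinary least squares via the normal equations (X'X) w = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# SYNTHETIC toy data (not real measurements): (citations, downloads)
rows = [(3, 40), (10, 120), (0, 15), (25, 200), (7, 90), (1, 60)]
# Hypothetical criterion, built from known weights 1, 2 and 0.5
targets = [1.0 + 2.0 * c + 0.5 * d for c, d in rows]

weights = fit_weights(rows, targets)  # [intercept, w_citations, w_downloads]
print([round(w, 6) for w in weights])
```

Swapping in a different targets list (say, later citation counts rather
than usage) would yield a different set of weights from the same
predictors, which is the whole point.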

> Journal indicators (e.g., impact factors) are defined at the level of
> journals. The journals themselves as organizers may play a role in their
> values. But I agree that one should preferably look at the distributions
> (the variance) instead of the mean given the skewness of the distributions.

More than the variance! The regression equation would add many other impact
indicators; and the specific recursive adjustment I proposed ("CiteRank")
would go well beyond mere variance and distribution of the citation counts.
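
As a rough illustration of what such a recursive adjustment might look
like, here is a minimal sketch in Python. It is a PageRank-style power
iteration, not a worked-out CiteRank definition: the function name
cite_rank, the damping value of 0.85, the division of a citer's weight by
the number of references it makes, and the toy citation graph are all
assumptions borrowed from PageRank or invented for the example.

```python
def cite_rank(cites, damping=0.85, iterations=50):
    """cites maps each article to the list of articles it cites."""
    articles = sorted(set(cites) | {b for refs in cites.values() for b in refs})
    n = len(articles)
    rank = {a: 1.0 / n for a in articles}
    for _ in range(iterations):
        new = {a: (1.0 - damping) / n for a in articles}
        for a in articles:
            refs = cites.get(a, [])
            if refs:
                # a citer passes its own weight on to what it cites
                share = damping * rank[a] / len(refs)
                for b in refs:
                    new[b] += share
            else:
                # an article with no references spreads its weight evenly
                share = damping * rank[a] / n
                for b in articles:
                    new[b] += share
        rank = new
    return rank


# Hypothetical graph: B and C each receive exactly one citation, but
# B's citer (A) is itself cited three times, whereas C's citer (Y) is uncited.
cites = {"A": ["B"], "X1": ["A"], "X2": ["A"], "X3": ["A"], "Y": ["C"]}
ranks = cite_rank(cites)
print(ranks["B"] > ranks["C"])
```

A flat count treats B and C identically (one citation each); the
recursive score separates them because of who is doing the citing.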

(Btw, is the Bach on http://users.fmg.uva.nl/lleydesdorff/list.htm
played by man or machine?)

Best wishes,


> With kind regards,
> Loet
> > -----Original Message-----
> > From: ASIS&T Special Interest Group on Metrics
> > [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Stevan Harnad
> > Sent: Sunday, January 16, 2005 4:02 PM
> > Subject: [SIGMETRICS] Enriching the Impact Regression Equation
> >
> > In the OACI Leiden statement (if there is to be one)
> > http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4082.html
> > the following constructive recommendations could perhaps be made:
> >
> > The 2-year average number of citations to a journal (i.e., the ISI
> > impact factor) is not meaningless and unpredictive, but merely a
> > needlessly crude measure of the impact of either an article, an
> > author or a journal. It can be greatly refined and improved.
> >
> > Apart from exact citation counts for articles (and authors),
> > and apart from avoiding the comparison of apples with oranges
> > (by making sure these measures are used in comparing like
> > with like), there are obvious ways that even journal impact
> > factors could be made far more accurate and representative of
> > true research impact.
> >
> > Right now, "like tends to cite like" in more ways than one!
> > Not only do articles in phytology tend to cite articles in
> > phytology, but average research tends to cite average
> > research! This means that there is necessarily a quantitative
> > citation bulge toward the middle (mean) of the distribution
> > that masks any far more important qualitative impact from the
> > smaller, higher quality tail-end of the distribution.
> >
> > There are at least five ways that this could be remedied --
> > and it makes no sense to wait for ISI, with their primary
> > need to pay more attention to market matters, to get around
> > to doing all this for us. A growing Open Access full-text
> > corpus can count on many talented and enterprising doctoral
> > students like Tim Brody doing this and more:
> >
> > (1) RECURSIVE "CiteRank": A recursive measure of citation
> > weight could replace flat citation counting: If article A cites
> > article B, the weight of A's citation is not 1 but a normalized
> > multiple of 1 based on the number of citations the *citing* article
> > (A) has itself received. This would go some way toward replacing the
> > pure weight of numbers by a recursive measure of the weight
> > of the numbers (without ever yet leaving the circle of
> > citation counts themselves). Average work will lose some of
> > its strength-of-numbers unless it manages to draw citations
> > from above-average articles too (still in terms of citation counts).
> >
> > [This recursive technique is analogous to Google's PageRank,
> > hence could perhaps be called CiteRank; it is ironic that
> > Google got the idea of PageRank from citation ranking, but
> > then improved it, yet the improvement has not yet percolated
> > back to citation ranking, because ISI had no particular
> > motive to implement it -- perhaps even a disincentive, as it
> > might reduce the journal impact factor of the large, average
> > journals which are of necessity ISI's numerical mainstay!]
> >
> > (2) USAGE COUNTS: The circularity of citation counting can
> > also be broken in various ways. One is by adding download
> > counts to the impact measure, not as a weight on the citation
> > count, but as a second variable in a multiple regression
> > equation. We know now from Tim Brody's findings that
> > downloads correlate with and hence predict citations. That
> > means citation counts plus download counts are better
> > predictors of impact than just citation counts alone, and are
> > especially good at correcting for early impact, which may not
> > yet be felt in the citation counts.
> > http://www.ecs.soton.ac.uk/~harnad/Temp/timbrody.new.doc
> >
> > (3) RATING SCORES: A more radical way to break out of the
> > circularity of citation counting is to use ratings, in one of two ways:
> > Systematic rating polls can easily be conducted, asking
> > researchers (by field and subfield) to rank the N most
> > important articles in their field in the past year (or two).
> > Even with the inevitable incest this will evoke, a good-sized
> > systematic sample will pick out the recurrent articles
> > (because, by definition, local-average mediocrity effects are
> > merely local) and then the rankings could either be used as
> > (3a) a third independent variable in the impact regression
> > equation or, perhaps more interestingly, as (3b) another
> > constraint on the weighting of the CiteRank score
> > (effectively making that weight the result of a 2nd order
> > regression equation based on the citer's citation count as
> > well as on the citer's rating score: the download count could
> > also be used instead as a 3rd component in this 2nd order
> > regression). The result will be a still better adjustment of
> > the citation count for an article (and hence an adjustment of
> > the journal's average citation count too).
> >
> > (4) CO-CITATION & HUB-AUTHORITY SCORES: Although I would need
> > to consult with a statistician to sort it out optimally, I am
> > certain that co-citation (what article/author is co-cited
> > with what article/author) can also be used to correct or add
> > to the impact regression equation. So, I expect, could a hub
> > (fan-out) and authority (fan-in) score, as well as a better
> > use of citation latency (ISI's "immediacy index") in the
> > impact equation.
> >
> > (5) AUTHOR/JOURNAL SELF-CITATIONS: Another clean-up factor
> > for citation counts is of course the elimination of
> > self-citations, which would be interesting not only for
> > author self-citations, but also journal self-citations: This
> > too might be added as another pair of variables in the
> > regression equation (author self-citation score and journal
> > self-citation score), with the weight adjusting itself as
> > each variable proves its predictivity.
> >
> > The predictivity and validity of the regression equation
> > should of course also be actively tested and calibrated by
> > validating it against (a) later citation impact, (b) subjective
> > impact ratings (3, above), (c) other impact measures such as
> > prizes, funding, and time-line descendants that are further than one
> > citation-step away (A is cited by B, B is cited by C: this
> > could be an uncited credit to A...)
> >
> > And all of this is without even mentioning full-text
> > "semantic" analysis.
> > So the potential world of impact analysis is a rich and
> > diverse one. Let us not be parochial, focussing only on the
> > limits of the ISI 2-year average journal citation-count that
> > has become so mindlessly overused by libraries and assessors.
> > Let us talk instead about the positive horizons OA opens up!
> >
> > Cheers, Stevan
> >
