Enriching the Impact Regression Equation

Loet Leydesdorff loet at LEYDESDORFF.NET
Mon Jan 17 01:45:17 EST 2005


Dear Stevan,

An additional consideration to which you may wish to pay attention is the
so-called ecological fallacy: "What is true for the trees is not necessarily
true for the wood." On the one hand, the aggregation of citations to
individual papers (using your methods) does not necessarily yield a good
indicator for journals; on the other hand, journal self-citations are
very different from author self-citations.

Journal indicators (e.g., impact factors) are defined at the level of
journals, and the journals themselves, as organizers, may play a role in
their values. But I agree that, given the skewness of citation
distributions, one should preferably look at the distributions (the
variance) rather than at the mean alone.
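
As a toy illustration (the counts below are invented) of how far that
skewness pulls the mean away from the bulk of the papers:

import statistics

citations = [0, 0, 1, 1, 2, 3, 120]  # invented; typically skewed counts
print("mean:", statistics.mean(citations))      # ~18.1, pulled up by one paper
print("median:", statistics.median(citations))  # 1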

With kind regards,


Loet

> -----Original Message-----
> From: ASIS&T Special Interest Group on Metrics
> [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Stevan Harnad
> Sent: Sunday, January 16, 2005 4:02 PM
> To: SIGMETRICS at LISTSERV.UTK.EDU
> Subject: [SIGMETRICS] Enriching the Impact Regression Equation
>
> In the OACI Leiden statement (if there is to be one)
> http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4082.html
> the following constructive recommendations could perhaps be made:
>
> The 2-year average number of citations to a journal (i.e., the ISI
> impact factor) is not meaningless or unpredictive, but merely a
> needlessly crude measure of the impact of an article, an author, or
> a journal. It can be greatly refined and improved.
>
> Apart from exact citation counts for articles (and authors),
> and apart from avoiding the comparison of apples with oranges
> (by making sure these measures are used in comparing like
> with like), there are obvious ways that even journal impact
> factors could be made far more accurate and representative of
> true research impact.
>
> Right now, "like tends to cite like" in more ways than one!
> Not only do articles in phytology tend to cite articles in
> phytology, but average research tends to cite average research! This
> means that there is necessarily a quantitative citation bulge toward
> the middle (mean) of the distribution that masks the far more
> important qualitative impact from the smaller, higher-quality
> tail-end of the distribution.
>
> There are at least five ways that this could be remedied -- and it
> makes no sense to wait for ISI, with its primary need to attend to
> market matters, to get around to doing all this for us. A growing
> Open Access full-text corpus can count on many talented and
> enterprising doctoral students like Tim Brody to do all this and more:
>
> (1) RECURSIVE "CiteRank": A recursive citation weight could replace
> flat citation counting: if article A cites article B, article A's
> citation counts not as 1 but as a normalized multiple of 1, based on
> the number of citations the *citing* article has itself received.
> This would go some way toward replacing the pure weight of numbers
> with a recursive measure of the weight behind the numbers (without
> ever leaving the circle of citation counts themselves). Average work
> will lose some of its strength-of-numbers unless it manages to draw
> citations from above-average articles too (still in terms of
> citation counts). A toy sketch of the recursion follows the note
> below.
>
> [This recursive technique is analogous to Google's PageRank,
> hence could perhaps be called CiteRank; it is ironic that
> Google got the idea of PageRank from citation ranking, but
> then improved it, yet the improvement has not yet percolated
> back to citation ranking, because ISI had no particular
> motive to implement it -- perhaps even a disincentive, as it
> might reduce the journal impact factor of the large, average
> journals which are of necessity ISI's numerical mainstay!]
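>
> Here is a rough sketch, in Python, of the recursion I have in mind
> (the toy citation graph, the damping constant, and the function name
> are all invented for illustration; a real implementation would, like
> PageRank, also handle dangling articles that cite nothing):
>
> def citerank(citations, damping=0.85, iterations=50):
>     """Recursive citation weight: a citation from a highly cited
>     article counts for more than one from an uncited article."""
>     # All articles: those that cite plus those that are cited.
>     articles = set(citations)
>     for refs in citations.values():
>         articles.update(refs)
>     n = len(articles)
>     rank = {a: 1.0 / n for a in articles}
>     for _ in range(iterations):
>         new_rank = {a: (1.0 - damping) / n for a in articles}
>         for citing, cited in citations.items():
>             if cited:
>                 # Distribute the citer's own weight over its references.
>                 share = damping * rank[citing] / len(cited)
>                 for target in cited:
>                     new_rank[target] += share
>         rank = new_rank
>     return rank
>
> # Invented toy graph: A and C cite B; B cites D.
> print(citerank({"A": ["B"], "C": ["B"], "B": ["D"], "D": []}))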
>
> (2) USAGE COUNTS: The circularity of citation counting can also be
> broken in various ways. One is by adding download counts to the
> impact measure, not as a weight on the citation count, but as a
> second variable in a multiple regression equation. We now know from
> Tim Brody's findings that downloads correlate with, and hence
> predict, citations. That means citation counts plus download counts
> are better predictors of impact than citation counts alone, and are
> especially good at detecting early impact, which may not yet be felt
> in the citation counts.
> http://www.ecs.soton.ac.uk/~harnad/Temp/timbrody.new.doc
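>
> As a concrete toy illustration of the two-variable regression (all
> numbers invented; a real calibration would use a large sample and a
> fixed citation window):
>
> import numpy as np
>
> # Invented toy data for six articles.
> early_citations = np.array([2., 5., 1., 8., 3., 0.])
> downloads       = np.array([40., 90., 25., 150., 60., 10.])
> later_citations = np.array([5., 12., 2., 20., 8., 1.])  # outcome to predict
>
> # Multiple regression: later impact ~ intercept + citations + downloads.
> X = np.column_stack([np.ones_like(early_citations),
>                      early_citations, downloads])
> coef, _, _, _ = np.linalg.lstsq(X, later_citations, rcond=None)
> print("intercept, citation weight, download weight:", coef)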
>
> (3) RATING SCORES: A more radical way to break out of the
> circularity of citation counting is to gather independent ratings:
> systematic rating polls can easily be conducted, asking researchers
> (by field and subfield) to rank the N most important articles in
> their field in the past year (or two). Even with the inevitable
> incest this will evoke, a good-sized systematic sample will pick out
> the recurrent articles (because, by definition, local-average
> mediocrity effects are merely local). The rankings could then be
> used in two ways: either as (3a) a third independent variable in the
> impact regression equation or, perhaps more interestingly, as (3b)
> another constraint on the weighting of the CiteRank score
> (effectively making that weight the result of a 2nd-order regression
> equation based on the citer's citation count as well as on the
> citer's rating score; the download count could also be used as a 3rd
> component in this 2nd-order regression). The result will be a still
> better adjustment of the citation count for an article (and hence of
> the journal's average citation count too). A toy sketch of (3b)
> follows.
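>
> In this sketch the weights w_cite and w_rate merely stand in for the
> coefficients that the 2nd-order regression would actually estimate
> (all names and numbers invented):
>
> def weighted_citation_count(citing_ids, citation_counts, ratings,
>                             w_cite=0.02, w_rate=0.5):
>     """Each incoming citation counts as 1 plus a bonus that grows with
>     the citer's own citation count and its peer-rating score."""
>     return sum(1.0 + w_cite * citation_counts[c] + w_rate * ratings[c]
>                for c in citing_ids)
>
> # Invented data: article X is cited by A (highly cited, highly rated)
> # and by B (uncited, unrated).
> counts  = {"A": 120, "B": 0}
> ratings = {"A": 0.9, "B": 0.0}
> print(weighted_citation_count(["A", "B"], counts, ratings))  # 4.85 vs. flat 2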
>
> (4) CO-CITATION & HUB-AUTHORITY SCORES: Although I would need to
> consult a statistician to sort it out optimally, I am certain that
> co-citation (which article/author is co-cited with which
> article/author) can also be used to correct or add to the impact
> regression equation. So, I expect, could hub (fan-out) and authority
> (fan-in) scores, as well as a better use of citation latency (ISI's
> "immediacy index") in the impact equation.
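>
> A rough sketch of the hub/authority recursion, in the style of
> Kleinberg's HITS algorithm (the toy graph is again invented):
>
> import math
>
> def hits(citations, iterations=50):
>     """Hub score: how well an article cites the authorities (fan-out).
>     Authority score: how well it is cited by the hubs (fan-in)."""
>     nodes = set(citations)
>     for refs in citations.values():
>         nodes.update(refs)
>     hub = {n: 1.0 for n in nodes}
>     auth = {n: 1.0 for n in nodes}
>     for _ in range(iterations):
>         auth = {n: sum(hub[c] for c, refs in citations.items() if n in refs)
>                 for n in nodes}
>         norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
>         auth = {n: v / norm for n, v in auth.items()}
>         hub = {n: sum(auth[t] for t in citations.get(n, [])) for n in nodes}
>         norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
>         hub = {n: v / norm for n, v in hub.items()}
>     return hub, auth
>
> # Invented toy graph: A cites B and C; D cites B; B cites C.
> hub, auth = hits({"A": ["B", "C"], "D": ["B"], "B": ["C"], "C": []})
> print("hubs:", hub)
> print("authorities:", auth)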
>
> (5) AUTHOR/JOURNAL SELF-CITATIONS: Another clean-up factor for
> citation counts is of course the elimination of self-citations,
> which would be interesting not only for author self-citations but
> also for journal self-citations. Once separated out, these too might
> be added as another pair of variables in the regression equation
> (author self-citation score and journal self-citation score), with
> their weights adjusting themselves as the variables prove their
> predictivity.
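>
> A sketch of that clean-up step (the record format is invented):
> split each article's incoming citations into external, author-self,
> and journal-self counts, so that the latter two can enter the
> regression as separate variables:
>
> def split_self_citations(cited, citing_records):
>     """cited: {'authors': set, 'journal': str} for the target article;
>     citing_records: one such dict per incoming citation."""
>     author_self = journal_self = external = 0
>     for rec in citing_records:
>         if cited["authors"] & rec["authors"]:
>             author_self += 1
>         elif cited["journal"] == rec["journal"]:
>             journal_self += 1
>         else:
>             external += 1
>     return external, author_self, journal_self
>
> # Invented example: one author self-citation, one journal
> # self-citation, one external citation.
> target = {"authors": {"Smith"}, "journal": "J. Phytology"}
> cites = [{"authors": {"Smith", "Lee"}, "journal": "Oikos"},
>          {"authors": {"Wu"}, "journal": "J. Phytology"},
>          {"authors": {"Wu"}, "journal": "Nature"}]
> print(split_self_citations(target, cites))  # (1, 1, 1)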
>
> The predictivity and validity of the regression equation should of
> course also be actively tested and calibrated by validating it
> against (a) later citation impact, (b) subjective impact ratings (3,
> above), and (c) other impact measures such as prizes, funding, and
> time-line descendants that are further than one citation-step away
> (A is cited by B, B is cited by C: this could be an uncited credit
> to A...).
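>
> The calibration step itself can be sketched just as simply (numbers
> invented): fit the equation on an early window, then check how well
> its predictions correlate with the citations actually accrued later:
>
> import numpy as np
>
> predicted = np.array([4.2, 11.5, 1.8, 19.0, 7.1, 0.9])  # regression output
> observed  = np.array([5., 12., 2., 20., 8., 1.])         # later citations
> print("validation correlation r = %.2f"
>       % np.corrcoef(predicted, observed)[0, 1])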
>
> And all of this is without even mentioning full-text
> "semantic" analysis.
> So the potential world of impact analysis is a rich and
> diverse one. Let us not be parochial, focussing only on the
> limits of the ISI 2-year average journal citation-count that
> has become so mindlessly overused by libraries and assessors.
> Let us talk instead about the positive horizons OA opens up!
>
> Cheers, Stevan
>


