Chronicle of Higher Education Impact Factor Article
harnad at ECS.SOTON.AC.UK
Mon Oct 10 21:40:16 EDT 2005
On Mon, 10 Oct 2005, Stephen J Bensman wrote:
>> SH: If you wish to challenge this, it would be very helpful if you
>> could reply with a specific artifact that could not be detected and
>> corrected on a full-text Open Access database. (The CHE article,
>> after all, was talking -- over and over -- about one simple,
>> obviously soluble problem: using Journal impact factors to evaluate
>> individuals' work. Surely direct citation counts are preferable --
>> and the far richer and more diverse regression equation I sketched
>> is more preferable still.)
> You are talking utopia, but I will name two possible artifacts. First, the
> probabilities governing both use and citations are so low that it is really
> not possible to use these for comparison purposes for the vast majority of
> people or articles.
No doubt there will be cases (perhaps many/most cases) where any
differences will be too close to call or below the grain of random
variation. In such cases neither the journal CIF nor the exact article
citation counts will be sensitive enough to do any differential
evaluation. But there will also be cases where they will be sensitive
enough. And even where neither CIF nor citation counts are enough, an
Open Access (OA) corpus makes possible a regression equation with a total
of at least 15 predictor variables, of which citation counts are just one:
[1-4] article/author citation counts, growth rates, peak latencies,
longevity; [5-8] download counts, growth rates, peak latencies,
longevity; [9] download/citation-correlation-based predicted
citations; [10-11] hub/authority scores; [12-13] co-citation with/by
scores; [14-15] co-text scores (semantic proximity measures).
For example, early downloads predict citations 18 months later.
Brody, T. , Harnad, S. and Carr, L. (2005) Earlier Web Usage
Statistics as Predictors of Later Citation Impact. Journal of the
American Society for Information Science and Technology (JASIST,
in press). http://eprints.ecs.soton.ac.uk/10713/
> The differences are not all that great for the vast majority,
> as most people and articles are restricted to a very small range.
It's true that many articles have next to no citations, but not only will
the richer impact multiple regression have a better chance of detecting
significant differential variance in an OA database, but OA itself
enhances citations by 25%-250%, in all citation-brackets, putting more
meat on the psychometric bones. It also triples downloads.
And CiteRank (a variant of Authorities and PageRank) could also weight
citations differentially, helping to stretch out the variance.
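The weighting idea can be illustrated with a toy PageRank over an invented
four-paper citation graph (CiteRank itself adds a time-decay factor not
modelled here; the papers and links are made up):

```python
# PageRank-style citation weighting: a citation from a highly-ranked
# paper counts for more than a citation from an obscure one.

def pagerank(cites, d=0.85, iters=50):
    """cites[p] = list of papers that p cites. Returns a rank per paper."""
    papers = list(cites)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in papers}
        for p, targets in cites.items():
            if targets:  # distribute p's rank over the papers it cites
                share = d * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # paper citing nothing: spread its rank uniformly
                for t in papers:
                    new[t] += d * rank[p] / n
        rank = new
    return rank

# Toy graph: A and B both cite C; C cites D; D cites nothing.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["D"], "D": []})
print(sorted(ranks, key=ranks.get, reverse=True))
```

C, cited by two papers, outranks A and B, which are cited by none; plain
citation counting would have given C only a count of 2 versus their 0.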
And the variance at the download level is below the sensitivity grain of
ISI's citation counts, but not below that of OA's download counts.
> Elites are identifiable but these are small by their very definition. ISI
> does cover the best, and you will be dragging just more dross.
No doubt the full OA corpus will include a lot of dross, and that will
be by definition undifferentiable and not worth differentiating. But OA
will also level the playing field, and give merit a better chance to
be accessed and used. The 25%-250% OA citation-count advantage for OA
vs. non-OA articles within the same journal was all based on ISI journals
("the best"), so there seems to be scope for increasing the available citation
variance there too...
> Second, you
> will have the problem of differing professional ages with a concomitant
> Matthew Effect causing the misallocation of scientific credit, citations, etc.
That's one of the many effects that *can* be detected and taken into
account in an OA corpus, by suitably adjusting window-sizes.
> A Nobelist can write a letter to the editor that will draw more citations
> than most people do in a lifetime. No matter how you boil it, you are
> going to have to use some type of subjective judgment for the vast majority
> of people. And these are just possible problems--not the least are your
> fuzzy definitions of what you are comparing.
I am not saying black-box bean-counting, no matter how rich and varied the
regression equation, can replace human judgment altogether. But it can
become a better and better supplement to it.
> > Stevan Harnad <harnad at ECS.SOTON.AC.UK> on 10/10/2005
> > 03:07:26 PM
> > Comment on:
> > Richard Monastersky, The Number That's Devouring Science,
> > Chronicle of Higher Education, October 1, 2005
> > http://chronicle.com/weekly/v52/i08/08a01201.htm
> > [text appended at the end of the comment]
> > Impact Analysis in the PostGutenberg Era
> > Although Richard Monastersky describes a real problem -- the abuse of
> > journal impact factors -- its solution is so obvious that one hardly
> > needs so many words on the subject:
> > A journal's citation impact factor (CIF) is the average number of
> > citations received by articles in that journal (ISI -- somewhat
> > arbitrarily -- calculates CIFs on the basis of the preceding two
> > years, although other time-windows may also be informative; see
> > http://citebase.eprints.org/analysis/correlation.php )
> > There is an undeniable relationship between the usefulness of an
> > article and how many other articles use and hence cite it. Hence CIF
> > does measure the average usefulness of the articles in a journal. But
> > there are three problems with the way CIF itself is used, each of them
> > readily correctable:
> > (1) A measure of the average usefulness of the articles in the journal
> > in which a given article appears is no substitute for the actual
> > usefulness of each article itself: In other words, the journal CIF is
> > merely a crude and indirect measure of usefulness; each article's own
> > citation count is the far more direct and accurate measure. (Using
> > the CIF instead of an article's own citation count [or the average
> > citation count for the author] for evaluation and comparison is
> > like using the average marks for the school from which a candidate
> > graduated, rather than the actual marks of the candidate.)
> > (2) Whether comparing CIFs or direct article/author citation counts,
> > one must always compare like with like. There is no point comparing
> > either CIFs between journals in different fields, or citation counts
> > for articles/authors in different fields. (No one has yet bothered
> > to develop a normalised citation count, adjusting for different
> > baseline citation levels and variability in different fields. It
> > could easily be done, but it has not been -- or if it has been done,
> > it was in an obscure scholarly article, but not applied by the actual
> > daily users of CIFs or citation counts today.)
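[One simple way such a normalised count could be implemented -- a sketch of
one possible scheme, not an established standard, with invented citation
counts -- is a per-field z-score:]

```python
# Express each article's citation count as a z-score against its own
# field's mean and standard deviation, so that scores become comparable
# across fields with very different citation baselines.
import statistics

def field_normalised(citations, field, field_counts):
    """z-score of `citations` relative to all counts in the same field."""
    peers = field_counts[field]
    mean = statistics.mean(peers)
    sd = statistics.stdev(peers)
    return (citations - mean) / sd

# Hypothetical baselines: cell biology cites heavily, mathematics sparsely.
counts = {
    "cell biology": [80, 120, 95, 150, 60, 110],
    "mathematics": [4, 9, 2, 7, 5, 6],
}

# Compare 100 citations in cell biology with 8 in mathematics:
bio = field_normalised(100, "cell biology", counts)
math_ = field_normalised(8, "mathematics", counts)
print(f"biology z = {bio:.2f}, mathematics z = {math_:.2f}")
```

On these invented baselines, 8 citations in mathematics outranks 100 in cell
biology -- exactly the cross-field comparison raw counts cannot support.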
> > (3) Both CIFs and citation counts can be distorted and abused. Authors
> > can self-cite, or cite their friends; some journal editors can and do
> > encourage self-citing their journal. These malpractices are real,
> > but most are also detectable, and then name-and-shame-able and
> > correctable. ISI could do a better job policing them, but soon the
> > playing field will widen, for as authors make their articles open
> > access online, other harvesters -- such as citebase and citeseer
> > and even google scholar -- will be able to harvest and calculate
> > citation counts, and average, compare, expose, enrich and correct
> > them in powerful ways that were inconceivable in the Gutenberg
> > era:
> > http://citebase.eprints.org/
> > http://scholar.google.com/
> > So, yes, CIFs are being misused and abused currently, but the cure is
> > already obvious -- and a wealth of powerful new resources is on the way
> > for measuring and analyzing research usage and impact online, including
> > (1) download counts, (2) co-citation counts (co-cited with, co-cited
> > by), (3) hub/authority ranks (authorities
> > are highly cited papers cited by many highly cited papers; hubs cite
> > many authorities), (4) download/citation correlations and other
> > analyses, (5) download growth-curve and peak latency scores, (6) citation
> > growth-curve and peak-latency scores, (7) download/citation longevity
> > scores,
> > (8) co-text analysis (comparing similar texts, extrapolating directional
> > trends), and much more. It will no longer be just CIFs and citation
> > counts, but a rich multiple regression equation, with many weighted predictor
> > variables based on these new measures. And they will be available both
> > for navigators and evaluators online, and based not just on the current
> > database but on all of the peer-reviewed research literature.
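[The hub/authority measure in (3) above can be sketched with Kleinberg's
HITS algorithm; the papers and citation links below are invented:]

```python
# HITS: authorities are papers cited by good hubs; hubs are papers that
# cite good authorities. Scores are computed by alternating updates.

def hits(cites, iters=30):
    """cites[p] = papers p cites. Returns (hub, authority) score dicts."""
    papers = set(cites) | {t for ts in cites.values() for t in ts}
    hub = {p: 1.0 for p in papers}
    auth = {p: 1.0 for p in papers}
    for _ in range(iters):
        # A paper's authority is the sum of the hub scores of its citers.
        auth = {p: sum(hub[q] for q in papers if p in cites.get(q, []))
                for p in papers}
        # A paper's hub score is the sum of the authority scores it cites.
        hub = {p: sum(auth[t] for t in cites.get(p, [])) for p in papers}
        # Normalise so the scores do not blow up across iterations.
        an = sum(v * v for v in auth.values()) ** 0.5
        hn = sum(v * v for v in hub.values()) ** 0.5
        auth = {p: v / an for p, v in auth.items()}
        hub = {p: v / hn for p, v in hub.items()}
    return hub, auth

# Toy graph: review paper R cites A and B; surveys S1 and S2 also cite A.
hub, auth = hits({"R": ["A", "B"], "S1": ["A"], "S2": ["A"]})
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

A emerges as the top authority (cited by every hub) and R as the top hub
(it cites both authorities).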
> > Meanwhile, use the direct citation counts, not the CIFs.
> > Some self-citations follow (and then the CHE article's text):
> > Brody, T. (2003) Citebase Search: Autonomous Citation Database for
> > Archives. sinn03 Conference on Worldwide Coherent Workforce, Satisfied
> > Users - New Services For Scientific Information, Oldenburg, Germany,
> > September 2003.
> > http://eprints.ecs.soton.ac.uk/10677/
> > Brody, T. (2004) Citation Analysis in the Open Access World.
> > Interactive Media International.
> > http://eprints.ecs.soton.ac.uk/10000/
> > Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics
> > as Predictors of Later Citation Impact. Journal of the American Society
> > for Information Science and Technology (JASIST, in press).
> > http://eprints.ecs.soton.ac.uk/10713/
> > Hajjem, C., Gingras, Y., Brody, T., Carr, L. & Harnad, S. (2005) Across
> > Disciplines, Open Access Increases Citation Impact. (manuscript in
> > preparation).
> > http://www.ecs.soton.ac.uk/~harnad/Temp/chawki1.doc
> > Hajjem, C. (2005) Analyse de la variation de pourcentages d'articles
> > en accès libre en fonction de taux de citations [Analysis of the
> > variation in percentages of open-access articles as a function of
> > citation rates].
> > http://www.crsc.uqam.ca/lab/chawki/ch.htm
> > Harnad, S. and Brody, T. (2004a) Comparing the Impact of Open Access
> > vs. Non-OA Articles in the Same Journals. D-Lib Magazine, Vol. 10,
> > No. 6.
> > http://eprints.ecs.soton.ac.uk/10207/
> > Harnad, S. and Brody, T. (2004) Prior evidence that downloads predict
> > citations. British Medical Journal online.
> > http://eprints.ecs.soton.ac.uk/10206/
> > Harnad, S. and Carr, L. (2000) Integrating, Navigating and Analyzing
> > Eprint Archives Through Open Citation Linking (the OpCit Project).
> > Current Science 79(5): pp. 629-638.
> > http://eprints.ecs.soton.ac.uk/5940/
> > Harnad, S., Brody, T., Vallieres, F., Carr, L., Hitchcock, S.,
> > Gingras, Y., Oppenheim, C., Stamerjohanns, H. and Hilf, E. (2004)
> > The Access/Impact Problem and the Green and Gold Roads to Open Access.
> > Serials Review, Vol. 30, No. 4, 310-314.
> > http://eprints.ecs.soton.ac.uk/10209/
> > Hitchcock, S., Brody, T., Gutteridge, C., Carr, L., Hall, W.,
> > Harnad, S., Bergmark, D. and Lagoze, C. (2002) Open Citation Linking:
> > The Way Forward. D-Lib Magazine 8(10).
> > http://eprints.ecs.soton.ac.uk/7717/
> > Hitchcock, S., Carr, L., Jiao, Z., Bergmark, D., Hall, W.,
> > Lagoze, C. and Harnad, S. (2000) Developing services for open eprint
> > archives: globalisation, integration and the impact of links. In
> > Proceedings of the 5th ACM Conference on Digital Libraries, San
> > Antonio, Texas, June 2000, pp. 143-151.
> > http://eprints.ecs.soton.ac.uk/2860/
> > Hitchcock, S., Woukeu, A., Brody, T., Carr, L., Hall, W. and
> > Harnad, S. (2003) Evaluating Citebase, an open access Web-based
> > citation-ranked search and impact discovery service. Technical Report
> > ECSTR-IAM03-005, School of Electronics and Computer Science,
> > University of Southampton.
> > http://eprints.ecs.soton.ac.uk/8204/
> > Stevan Harnad