Scientometric OAI Search Engines

Stevan Harnad harnad at ECS.SOTON.AC.UK
Sun Aug 25 10:28:55 EDT 2002


Something revolutionary is in the making in the form of scientometric
OAI search engines.

Citebase http://citebase.eprints.org/ is a prototype OAI service
http://www.openarchives.org/service/listproviders.html now available
(free, of course) to give research authors, users, their institutions
and their research-funders a foretaste of what is coming and what
is possible.

Citebase has just been incorporated as an experimental feature for
all users of the Physics Archive http://arxiv.org -- the largest
http://arxiv.org/show_monthly_submissions and most heavily used
http://arxiv.org/show_weekdays_graph Eprint Archive to date.

We are hoping that by demonstrating the remarkable possibilities that a
full-text citation-linked open-access corpus opens up, Citebase will
help to accelerate the rate at which the refereed research
literature is made openly accessible online through institutional
self-archiving http://www.arl.org/sparc/IR/ir.html

The mother of all hyperlinks is bibliographic
citation. Google's spectacularly successful system of
ranking digital content by the number of incoming links
http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm is simply
a generalized case of the pre-existing scholarly practice of following
the links that authors provide by citing their references. The
number of incoming reference links has long been used for ranking
scholarly/scientific content by, for example, the Institute for
Scientific Information (ISI) in the form of the citation impact factor
http://www.isinet.com/isi/hot/essays/journalcitationreports/7.html
http://cogprints.ecs.soton.ac.uk/archive/00001697/index.html
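
To make the parallel concrete, here is a minimal Python sketch. It assumes a hypothetical toy corpus in which each article ID maps to the list of IDs it cites. The IDs, journal names and function names are illustrative only and are not Citebase's or ISI's code. The sketch counts incoming citation links per article, ranks articles by that count, and averages the counts per journal as a simplified impact measure. ISI's published impact factor is narrower: it counts citations in a given year to articles from the two preceding years.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each article ID maps to the IDs it cites.
references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["B", "C"],
}

# Hypothetical journal assignment for each article.
journal_of = {"A": "J1", "B": "J1", "C": "J2", "D": "J2"}


def incoming_citations(refs):
    """Count incoming citation links per article (like in-links on the web)."""
    counts = Counter({article: 0 for article in refs})
    for citing, cited_list in refs.items():
        # set() so one citing article counts at most once per cited article
        for cited in set(cited_list):
            counts[cited] += 1
    return counts


def rank_by_citations(refs):
    """Rank articles by incoming citations, highest first (ties by ID)."""
    counts = incoming_citations(refs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def mean_citations_per_journal(refs, journals):
    """Simplified journal 'impact': average incoming citations per article.

    Assumption: no time window, unlike ISI's two-year impact factor.
    """
    counts = incoming_citations(refs)
    totals = defaultdict(int)
    n_articles = defaultdict(int)
    for article, journal in journals.items():
        totals[journal] += counts[article]
        n_articles[journal] += 1
    return {j: totals[j] / n_articles[j] for j in totals}


print(rank_by_citations(references))
print(mean_citations_per_journal(references, journal_of))
```

In this toy data, article C is cited by A, B and D, so it ranks first with 3 incoming links. Journal J2 averages 1.5 citations per article against 1.0 for J1.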

Here are the elements in the chain:

    (1) A manuscript (preprint) is submitted to a journal for evaluation
    in the form of peer review.

    (2) The manuscript is peer-reviewed, revised and, if successful,
    published under the journal's name, certifying that it has met that
    journal's quality standards.

    (3) The journal's name and track record for quality is then used by
    researchers, research-funders, and the author's own institution as
    one of the guides in evaluating whether the work should be read,
    used, cited and further funded, and whether the author should be
    rewarded through salary increases, promotion, or prizes.

    (4) In addition to the journal-name's established reputation for
    peer-review standards, its "citation impact factor" (the average
    number of citation links to its articles from other articles) is
    used as an evaluative guide by potential users and funders.

    (5) Articles and authors can also be evaluated and ranked, not just
    by the name-brand and citation impact of the journal in which they
    appear, but by the citation impact of each individual article and/or
    author.

    (6) Journal reputations and journal/article/author citation impacts
    can also be supplemented by evaluations in review articles and
    commentaries and by various forms of promotion and self-promotion
    by journals, authors, alerting services, and the public press
    (although these evaluations themselves would need to be evaluated,
    if they were not simply to be counted as further citations).

    (7) A new potential measure of on-line impact, not available in the
    on-paper era, is usage, in the form of "hits." This measure is noisy
    (it can be inflated by automated web-crawlers, short-changed by
    intermediate caches, abused by deliberate self-hits from authors,
    and unable to discriminate between nonspecific site-browsing and
    item-specific reading) yet it seems to have some signal-value too,
    partly correlated with and partly independent of citation impact:
    http://opcit.eprints.org/opcitresearch.shtml

    (8) Nor do citations and hits exhaust the potential of online
    performance indicators. They are just the beginning of a wealth of
    potential scientometric guides to users and evaluators, including
    co-citation analysis, time-series analysis, and other potentially
    predictive analyses of correlations and trends among citations, hits,
    and even articles' content-words that will no doubt be invented and
    discovered as more of this corpus comes online.
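
Co-citation analysis, mentioned in (8), can be sketched the same way. Two articles are co-cited when they appear together in one reference list, and the more often this happens, the more closely related they are taken to be. The following is a minimal Python sketch over a hypothetical toy corpus. The data and function names are illustrative assumptions and are not part of Citebase.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: citing article -> list of cited article IDs.
references = {
    "P1": ["X", "Y", "Z"],
    "P2": ["X", "Y"],
    "P3": ["Y", "Z"],
    "P4": ["X"],
}


def cocitation_counts(refs):
    """Count how often each pair of articles is cited in the same reference list."""
    pairs = Counter()
    for cited_list in refs.values():
        # sorted(set(...)) drops duplicates and gives each pair a canonical order
        for a, b in combinations(sorted(set(cited_list)), 2):
            pairs[(a, b)] += 1
    return pairs


def most_cocited_with(article, refs, top=3):
    """Return the articles most often co-cited with `article`, with counts."""
    partners = Counter()
    for (a, b), n in cocitation_counts(refs).items():
        if a == article:
            partners[b] += n
        elif b == article:
            partners[a] += n
    return partners.most_common(top)


print(cocitation_counts(references))
print(most_cocited_with("X", references))
```

With this data, X and Y are co-cited twice (in P1 and P2), but X and Z only once (in P1). Y is therefore X's closest co-citation partner.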

So try out Citebase, and don't forget to supplement your experience with
your imagination:

    (a) Citebase content right now is preponderantly in physics,
    mathematics and computer science. Imagine what it would be like if
    the full-text open-access content http://www.soros.org/openaccess/
    were up there in all the other disciplines too. (And remember
    that getting it up there depends on -- and waits on -- only
    you!) http://www.eprints.org/self-faq/#researcher/authors-do

    (b) Notice how natural and useful it feels to navigate the literature
    via citation links, guided by author or article ranking in terms of
    citation impact or hit impact. Imagine how much more useful it will
    feel when all the research literature is up there, gap-free, and
    spawns still newer and more powerful online scientometric guides.
    http://opcit.eprints.org/opcitpapers.shtml

Stevan Harnad

Harnad, S. (2001) "Research access, impact and assessment." Times Higher
Education Supplement 1487: p. 16.
http://cogprints.soton.ac.uk/documents/disk0/00/00/16/83/index.html

NOTE: A complete archive of the ongoing discussion of providing open
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00 & 01):

    http://amsci-forum.amsci.org/archives/september98-forum.html
                            or
    http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Discussion can be posted to: september98-forum at amsci-forum.amsci.org

See also the Budapest Open Access Initiative:
    http://www.soros.org/openaccess

and the Free Online Scholarship Movement:
    http://www.earlham.edu/~peters/fos/timeline.htm