Locating Cited References on the Web

Stevan Harnad harnad at ECS.SOTON.AC.UK
Thu Sep 26 19:29:59 EDT 2002


Dear Colleagues:

I have yet another remarkable (experimental) resource from Southampton
to draw to your attention.

(These things take my breath away, and I can praise them without
any blushing or self-reference because they have nothing to do
with me: These are the fruits of the dazzling convergence of
raw talent that there is at Southampton -- in this case, doctoral
candidate Mike Jewell, with the collaboration of that already
well-demonstrated wizard, doctoral candidate-to-be Chris Gutteridge.)

Paracite is a piece of magic for identifying, and where possible
finding, cited references from a variety of potential sources on the Web,
on the basis of a very raw input: It is described below in the magicians'
own words.

(The purpose, as always, is to keep increasing the incentives for the
self-archiving of research output -- pre and post peer review -- in
interoperable OAI Archives, so that benefits like these, and much, much
more, can be shared by the research community, across disciplines and
around the world.)

--------------------------------------------------------------------

"What is ParaCite?" http://paracite.eprints.org

    ParaCite is an experimental service, being designed at the University
    of Southampton, for the location of articles from raw references. When
    a reference is passed to the service, it is split into its component
    parts (e.g. author, title, year), and transferred to the search
    resource. Based on the subject area, and the data provided, a set
    of resources is presented that the system believes have the highest
    probability of providing the full text article at no charge.

"How is the reference parsed?"

    Reference parsing is currently done by a custom-built reference
    handler. This uses a chain of regular expressions, combined with a
    set of templates for common reference styles. The parser is still
    experimental, but once it is stable we plan to release the Perl
    modules as open source.
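
As a rough illustration of the approach described above (templates for
common reference styles, backed by a chain of regular expressions), here
is a minimal Python sketch. It is not the ParaCite parser, which is
written in Perl and not yet released; the two templates, the fallback
patterns and the name parse_reference are all assumptions made for this
example.

```python
import re

# Hypothetical templates for two common reference styles. The real
# ParaCite templates are not given in the text, so these are guesses.
TEMPLATES = [
    # Style A: Author (Year) Title. Journal ...
    re.compile(r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)\.?\s+(?P<title>[^.]+)\."),
    # Style B: Author. Title. Journal, Year.
    re.compile(r"^(?P<author>[^.]+)\.\s+(?P<title>[^.]+)\..*?(?P<year>(?:19|20)\d{2})"),
]

# Fallback chain: looser patterns tried one by one when no template fits.
FALLBACKS = {
    "year": re.compile(r"\b((?:19|20)\d{2})\b"),
    "title": re.compile(r'"(.+?)"'),
}


def parse_reference(raw):
    """Split a raw reference string into a dict of component fields."""
    text = " ".join(raw.split())  # collapse line breaks and extra spaces
    for template in TEMPLATES:
        match = template.search(text)
        if match:
            return {key: value.strip() for key, value in match.groupdict().items()}
    # No template matched: collect whatever the fallback patterns find.
    fields = {}
    for name, pattern in FALLBACKS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1).strip()
    return fields
```

Trying whole-reference templates first and only then falling back to
single-field patterns keeps well-formed references precise while still
recovering something from unusual ones.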

"How is the resource ranking decided?"

    At present the resources are ranked depending on the following
    factors:

    * Full Text/Abstract: If a full text version is available, a
    higher score is awarded

    * Free/Toll: If the document is free of charge, the score is increased

    * Pre-selected ranking: If the resource is one that we feel is
    reliable and of good quality, a reward point can be added.
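
The three factors just listed could be combined along the following
lines. This is only a sketch: the text does not say how many points each
factor is worth, so one point per factor is an assumption, and the
Resource class and the score and rank functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    full_text: bool            # full text rather than abstract only
    free: bool                 # no charge to read the document
    preselected: bool = False  # judged reliable and of good quality


def score(resource):
    """Add one point per satisfied factor (the weights are assumed)."""
    points = 0
    if resource.full_text:
        points += 1
    if resource.free:
        points += 1
    if resource.preselected:
        points += 1
    return points


def rank(resources):
    """Return the resources ordered from highest to lowest score."""
    return sorted(resources, key=score, reverse=True)
```

Because sorted is stable, resources with equal scores keep their
original relative order.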

    Obviously these three factors do not yet provide a realistic
    weighting, and we are working on adding the following weights:

    * Subject area: If the reference is known to be in a certain
    subject area, resources of that type can be promoted.

    * Search results: Some search engines now provide search results
    in a parsable form. We hope to use this to judge whether the results
    are useful.

    * Popularity: By logging which resources are most frequently used,
    they can be promoted within the system.
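
To make the planned popularity weight concrete, here is one way it might
work, assuming each use of a resource is logged in a counter and a bonus
proportional to its share of all logged uses is added to its base score.
The function names and the bonus formula are assumptions for this
example, not part of ParaCite.

```python
from collections import Counter


def record_use(usage, resource_name):
    """Log one use of a resource (for example, a followed link)."""
    usage[resource_name] += 1


def promote_by_popularity(base_scores, usage, bonus=1.0):
    """Add to each base score a bonus scaled by the resource's usage share."""
    total = sum(usage.values())
    if total == 0:
        return dict(base_scores)  # nothing logged yet, scores unchanged
    return {
        name: base + bonus * usage[name] / total
        for name, base in base_scores.items()
    }
```

Using a share of total uses rather than a raw count keeps the bonus
bounded, so popularity can break ties without swamping the other factors.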

"How do I add my resource to your collection?"

    At present ParaCite is not accepting new resources, but we hope to
    have a form set up for this purpose soon.

"Doesn't OpenURL do this already?" http://www.sfxit.com/openurl/

    OpenURL is able to resolve a URL that contains reference information,
    but is not (to our knowledge) able to parse references directly. Also,
    OpenURL requires information regarding the type of material (e.g.
    whether the item is a book), whereas ParaCite does not need this
    information.
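
For comparison, the sketch below shows what an OpenURL-style link looks
like once the reference has already been split into fields. It assumes
the key names of the OpenURL 0.1 draft (genre, aulast, atitle, date); the
resolver address is a made-up placeholder and build_openurl is a
hypothetical helper.

```python
from urllib.parse import urlencode


def build_openurl(base_url, genre, fields):
    """Build a resolver link; genre states the type of material."""
    params = {"genre": genre}  # e.g. "article" or "book"
    params.update(fields)      # already-parsed reference fields
    return base_url + "?" + urlencode(params)


link = build_openurl(
    "http://resolver.example.org/menu",  # placeholder resolver address
    "article",
    {"aulast": "Harnad", "atitle": "The symbol grounding problem", "date": "1990"},
)
```

The point of the contrast is that the caller must supply both the genre
and the separated fields, whereas ParaCite starts from the raw reference
string.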

"I used this reference, and it didn't work - why?"

    The ParaCite parser is still highly experimental - please email your
    reference to us at paracite at ecs.soton.ac.uk, and we will ensure the
    parser can handle it.

"Where can I get more information?"

    If you have any further questions, please email
    paracite at ecs.soton.ac.uk and we will try to answer them.

http://paracite.eprints.org



More information about the SIGMETRICS mailing list