Locating Cited References on the Web
Stevan Harnad
harnad at ECS.SOTON.AC.UK
Thu Sep 26 19:29:59 EDT 2002
Dear Colleagues:
I have yet another remarkable (experimental) resource from Southampton
to draw to your attention.
(These things take my breath away, and I can praise them without
any blushing or self-reference because they have nothing to do
with me: These are the fruits of the dazzling convergence of
raw talent that there is at Southampton -- in this case, doctoral
candidate Mike Jewell, with the collaboration of that already
well-demonstrated wizard, doctoral candidate-to-be Chris Gutteridge.)
Paracite is a piece of magic for identifying, and where possible
finding, cited references from a variety of potential sources on the Web,
on the basis of a very raw input: It is described below in the magicians'
own words.
(The purpose, as always, is to keep increasing the incentives for the
self-archiving of research output -- pre- and post-peer review -- in
interoperable OAI Archives, so that benefits like these, and much, much
more, can be shared by the research community, across disciplines and
around the world.)
--------------------------------------------------------------------
"What is ParaCite?" http://paracite.eprints.org
ParaCite is an experimental service, being designed at the University
of Southampton, for the location of articles from raw references. When
a reference is passed to the service, it is split into its component
parts (e.g. author, title, year), and transferred to the search
resource. Based on the subject area, and the data provided, a set
of resources is presented that the system believes have the highest
probability of providing the full text article at no charge.
"How is the reference parsed?"
Reference parsing is currently handled by a custom-built reference
handler. It uses a chain of regular expressions, combined with a set
of templates for common reference styles. The parser is still
experimental, but once it is stable we plan to release the Perl
modules as open source.
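
As a rough illustration of the "chain of regular expressions plus templates" idea described above, here is a minimal sketch in Python (not the Perl of the actual modules). The two templates, the field names, and the function parse_reference are assumptions made for this example only; they are not ParaCite's own patterns.

```python
import re

# Hypothetical templates for two common reference styles.
# These are illustrative only and are not taken from ParaCite.
TEMPLATES = [
    # Style A: Authors (Year) Title. Publication
    re.compile(
        r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.?\s+"
        r"(?P<title>[^.]+)\.\s*(?P<publication>.*)$"
    ),
    # Style B: Authors. Title. Publication, Year.
    re.compile(
        r"^(?P<authors>[^.]+)\.\s+(?P<title>[^.]+)\.\s+"
        r"(?P<publication>.+?),\s+(?P<year>\d{4})\.?$"
    ),
]

# Last link in the chain: pull out a plausible year if no template fits.
YEAR_ONLY = re.compile(r"\b(?:1[89]|20)\d{2}\b")


def parse_reference(raw):
    """Split a raw reference string into component parts.

    Each template is tried in order; the first one that matches wins.
    """
    text = " ".join(raw.split())  # normalise whitespace and line breaks
    for template in TEMPLATES:
        match = template.match(text)
        if match:
            return {key: value.strip() for key, value in match.groupdict().items()}
    year = YEAR_ONLY.search(text)
    return {"year": year.group(0)} if year else {}


if __name__ == "__main__":
    # A made-up reference, used only to exercise the sketch.
    print(parse_reference(
        "Smith, J. (1999) An example title. Journal of Examples 12: 34-56"
    ))
```

A real parser would need many more templates and a way to choose between competing matches; the point here is only the shape of the approach: normalise, try each template in turn, and fall back to weaker patterns.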
"How is the resource ranking decided?"
At present the resources are ranked depending on the following
factors:
* Full Text/Abstract: If a full text version is available, a
higher score is awarded
* Free/Toll: If the document is free, the score is increased
* Pre-selected ranking: If the resource is one that we feel is
reliable and of good quality, a reward point can be added.
These factors alone do not yet give a realistic weighting, and we are
working on adding the following weights:
* Subject area: If the reference is known to be in a certain
subject area, resources of that type can be promoted.
* Search results: Some search engines now provide search results
in a parsable form. We hope to use this to judge whether the results
are useful.
* Popularity: By logging which resources are used most frequently,
we can promote them within the system.
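
The three factors currently in use (full text, free access, pre-selected ranking) could be combined along the following lines. This is a sketch under stated assumptions: the text says only that a score is raised or a reward point added, so the one-point-per-factor weighting, the Resource class, and the example resource names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A searchable resource; all fields are illustrative assumptions."""
    name: str
    full_text: bool            # full text rather than abstract only
    free: bool                 # free rather than toll access
    preselected: bool = False  # judged reliable and of good quality


def score(resource):
    """Award one point per satisfied factor (the weights are assumed)."""
    points = 0
    if resource.full_text:
        points += 1
    if resource.free:
        points += 1
    if resource.preselected:
        points += 1
    return points


def rank(resources):
    """Return resources ordered from highest to lowest score."""
    return sorted(resources, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Resource("abstract-only toll index", full_text=False, free=False),
        Resource("free full-text archive", full_text=True, free=True, preselected=True),
        Resource("toll full-text publisher", full_text=True, free=False),
    ]
    for resource in rank(candidates):
        print(score(resource), resource.name)
```

The planned factors (subject area, search results, popularity) would fit the same pattern as further terms in score, presumably with weights tuned from usage logs.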
"How do I add my resource to your collection?"
At present ParaCite is not accepting new resources, but we hope to
have a form set up for this purpose soon.
"Doesn't OpenURL do this already?" http://www.sfxit.com/openurl/
OpenURL is able to resolve a URL that contains reference information,
but is not (to our knowledge) able to parse references directly. Also,
OpenURL requires information regarding the type of material (e.g.
whether the item is a book), whereas ParaCite does not need this
information.
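
To make the contrast concrete: an OpenURL carries metadata that has already been split into fields, so a parser's output could in principle feed a resolver. The sketch below builds such a URL from parsed fields. The key names (genre, aulast, title, date) follow the early OpenURL draft as we understand it, and the resolver address and the function build_openurl are hypothetical.

```python
from urllib.parse import urlencode


def build_openurl(base_url, fields, genre):
    """Build an OpenURL-style query from already-parsed reference fields.

    The genre (type of material) must be supplied by the caller, which is
    the extra information the text says OpenURL requires.
    """
    query = {"genre": genre}
    # Skip empty fields so the resolver is not sent blank values.
    query.update({key: value for key, value in fields.items() if value})
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical resolver address and made-up metadata.
    print(build_openurl(
        "http://resolver.example.org/menu",
        {"aulast": "Smith", "title": "An example title", "date": "1999"},
        genre="article",
    ))
```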
"I used this reference, and it didn't work - why?"
The ParaCite parser is still highly experimental - please email your
reference to us at paracite at ecs.soton.ac.uk, and we will ensure the
parser can handle it.
"Where can I get more information?"
If you have any further questions, please email
paracite at ecs.soton.ac.uk and we will try to answer them.
http://paracite.eprints.org