Reference Parsing ParaTools v1.00 Released (fwd)
harnad at ECS.SOTON.AC.UK
Wed Jan 29 13:26:12 EST 2003
Forwarding a tool for parsing a reference and then finding out
whether it is available somewhere on the Web. It could be used
by authors putting links into their bibliographies, by users
trying to find cited works, and perhaps even by software engines,
automatically inserting links into reference lists.
(Well done, Mike!)
---------- Forwarded message ----------
Date: Wed, 29 Jan 2003 17:53:17 +0000
From: Mike Jewell <moj at ecs.soton.ac.uk>
Reply-To: September 1998 American Scientist Forum
<SEPTEMBER98-FORUM at LISTSERVER.SIGMAXI.ORG>
To: SEPTEMBER98-FORUM at LISTSERVER.SIGMAXI.ORG
Subject: Reference Parsing ParaTools v1.00 Released
ParaTools Version 1.00 Released
January 29th 2003
ParaTools v1.00 has been released and is available from:
Created at Southampton University as an offshoot of the OpCit and
EPrints projects, ParaTools is a set of Perl modules for the handling
of references. It includes:
- Reference parsing modules and templates
- Document parsing modules (Experimental)
- OpenURL creation/processing routines
- Parsing examples
- Web Service examples
The toolkit is available under the GNU General Public License, which ensures that
ParaTools remains entirely free and open-source, and has been designed
to be easily expandable. The parsing functionality in ParaTools is
already in use in the ParaCite system (http://paracite.eprints.org),
and the project is under active development.
[ParaTools can also be integrated into EPrints 2 - see:
for more information]
We are very interested in user feedback, and the ParaTools website
provides information on submitting bugs and suggestions, as well as
contributing code to the project.
Documentation is available at:
And a mailing list has been created at:
A more detailed summary of ParaTools follows this announcement.
moj at ecs.soton.ac.uk
What is ParaTools?
ParaTools, or the ParaCite ToolKit, provides a set of Perl modules
that aim to extract references from bibliographies in documents and
then parse these references into their component parts (such as author,
year, volume, and title). Take, for example, a document that contains
the following text:

  ... and so I think my theory should work.

  Jewell, M (2002) A useful paper. Journal of Useful Papers, 5:10-20
  Bloggs, J (2001) The art of anonymity. Journal of Alias Creation, 10:5-6

The ParaTools 'DocParser' modules aim to strip out everything but the references,
and return these as plain text, i.e.:
("Jewell, M (2002) A useful paper. Journal of Useful Papers, 5:10-20",
"Bloggs, J (2001) The art of anonymity. Journal of Alias Creation, 10:5-6")
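
As a rough illustration of the extraction step described above (not the
ParaTools 'DocParser' code itself, which is written in Perl), the following
Python sketch keeps only the lines that look like references. The pattern
REFERENCE_LINE and the function extract_references are hypothetical names,
and the sketch assumes every reference sits on one line in the form
"Surname, I (YYYY) ...".

```python
import re

# Assumed shape of a reference line: "Surname, I (YYYY) rest of citation".
# Illustrative only; this pattern is not taken from ParaTools.
REFERENCE_LINE = re.compile(r"^[A-Z][\w'-]+, [A-Z]+ \(\d{4}\) .+")


def extract_references(document):
    """Return the lines of `document` that look like references."""
    references = []
    for line in document.splitlines():
        line = line.strip()
        if REFERENCE_LINE.match(line):
            references.append(line)
    return references


document = """and so I think my theory should work.
Jewell, M (2002) A useful paper. Journal of Useful Papers, 5:10-20
Bloggs, J (2001) The art of anonymity. Journal of Alias Creation, 10:5-6
"""

for reference in extract_references(document):
    print(reference)
```

A real document parser has to cope with references that wrap across lines
and with many citation styles, which a single pattern like this cannot do.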
Once the references are extracted, the ParaTools 'CiteParser' modules
can pull out the individual parts of the reference. For example, applied
to the first reference, 'Jewell' is extracted as the author's surname,
'Journal of Useful Papers' as the publication, and '2002' as the year,
as well as other useful fields including the title, issue, and page range.
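
The idea of matching a reference against a template can be sketched in
Python as follows. This is a minimal illustration under stated assumptions,
not the ParaTools 'CiteParser' implementation: CITATION_TEMPLATE and
parse_citation are hypothetical names, only the single citation style used
in the example is handled, and the number before the colon is assumed to be
the volume.

```python
import re

# One hand-written template for citations shaped like:
#   Surname, I (YYYY) Article title. Publication, V:S-E
# Assumption: V is the volume and S-E is the page range.
CITATION_TEMPLATE = re.compile(
    r"^(?P<aulast>[^,]+), (?P<auinit>[A-Z]+) "
    r"\((?P<year>\d{4})\) "
    r"(?P<title>.+?)\. "
    r"(?P<publication>.+), "
    r"(?P<volume>\d+):(?P<spage>\d+)-(?P<epage>\d+)$"
)


def parse_citation(reference):
    """Return a dict of fields, or None if the template does not match."""
    match = CITATION_TEMPLATE.match(reference.strip())
    return match.groupdict() if match else None


fields = parse_citation(
    "Jewell, M (2002) A useful paper. Journal of Useful Papers, 5:10-20"
)
print(fields["aulast"], fields["year"], fields["publication"])
```

Handling more citation styles would mean adding further templates and
choosing the best match, which is beyond this sketch.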
Finally, ParaTools provides an 'OpenURL' module, which can use the information
extracted from the reference to create an OpenURL link. The OpenURL for
the above paper would be:
This can then be appended to an OpenURL resolver's base URL to provide
an interface to an OpenURL-enabled resource. e.g.
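
As a separate illustration of the idea (not ParaTools' own output), the
Python sketch below turns parsed fields into an OpenURL-style query string
and appends it to a resolver base URL. The key names follow the OpenURL 0.1
convention (genre, aulast, auinit, date, atitle, title, volume, spage,
epage); the function build_openurl_query and the resolver address
http://resolver.example.org/openurl are hypothetical, and treating "5" as
the volume is an assumption.

```python
from urllib.parse import urlencode


def build_openurl_query(fields):
    """Build an OpenURL 0.1 style query string from parsed citation fields."""
    params = {
        "genre": "article",
        "aulast": fields.get("aulast"),
        "auinit": fields.get("auinit"),
        "date": fields.get("year"),
        "atitle": fields.get("title"),        # article title
        "title": fields.get("publication"),   # journal title
        "volume": fields.get("volume"),
        "spage": fields.get("spage"),
        "epage": fields.get("epage"),
    }
    # Drop fields the parser could not fill in, then percent-encode the rest.
    return urlencode({key: value for key, value in params.items() if value})


# Hypothetical resolver address, used only for illustration.
RESOLVER_BASE = "http://resolver.example.org/openurl"

fields = {
    "aulast": "Jewell",
    "auinit": "M",
    "year": "2002",
    "title": "A useful paper",
    "publication": "Journal of Useful Papers",
    "volume": "5",
    "spage": "10",
    "epage": "20",
}

query = build_openurl_query(fields)
link = RESOLVER_BASE + "?" + query
print(link)
```

Keeping the query string separate from the base URL means the same query
can be sent to whichever resolver a reader's institution provides.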
The ParaTools package has many applications, including:
- Converting reference lists into valid OpenURLs
- Converting existing metadata into valid OpenURLs
- Collecting metadata from references to carry out internal searches
- Extracting reference lists from documents
Mike Jewell <mike at mikesroom.org> | A clash of doctrine is not a disaster --
http://www.mikesroom.org | it is an opportunity.