Corporate addresses in Web of Science v5
loet at LEYDESDORFF.NET
Mon Jul 18 08:41:17 EDT 2011
Unlike the WoS4 interface, the new WoS5 interface (introduced yesterday)
does not contain a unique delimiter of the address information in each
record. (WoS4 used a period for this.) As a result, it is not always
possible to determine unambiguously whether a new line is a continuation of
the previous address or the start of a new address.
I proceeded as follows. If author names are coupled to the addresses, the
names are placed between brackets, and these brackets can be used to
delineate the address information unambiguously. In this case, the relations
between authors and corporate addresses are stored in a file csau.dbf.
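
The bracket rule can be sketched in Python as follows. This is only an
illustration and not the code used in ISI.exe: the function name
split_bracketed_addresses, the regular expression, and the assumed layout
"[Author; Author] Organization, City, Country" are hypothetical.

```python
import re

# Assumed layout: "[Author A; Author B] Organization, City, Country".
# Each bracketed author group opens a new address; the address runs
# until the next opening bracket or the end of the field.
BRACKETED = re.compile(r"\[([^\]]*)\]\s*([^\[]*)")


def split_bracketed_addresses(c1_text):
    """Return a list of (author_names, address) pairs."""
    # Line breaks carry no meaning here, because the brackets delimit.
    joined = " ".join(line.strip() for line in c1_text.splitlines())
    pairs = []
    for authors, address in BRACKETED.findall(joined):
        names = [a.strip() for a in authors.split(";") if a.strip()]
        pairs.append((names, address.strip()))
    return pairs
```

Each returned pair corresponds to one author-address relation of the kind
stored in csau.dbf; writing the dbf file itself is left out of the sketch.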
If there are no author names, the address field can be on one or two lines.
The following rules are used:
1. if a line ends with a comma, the next line is considered a
continuation of the address information;
2. if the next line contains no comma or only a single one, that next
line is also considered a continuation of the previous one;
3. in all other cases, the two lines are concatenated and the commas in
the result are counted. If this number is five or larger, the two lines are
considered separate addresses. Five was chosen because in some cases an
entry with four commas can still be a single address, but one with six
commas almost never is. However, errors are possible on both sides because some
addresses contain only two commas and some individual addresses more than
In some older records, the field for the reprint (corresponding) author (RP)
contains additional information (Costas & Iribarren-Maestro, Scientometrics,
2007). This field is compared with the addresses already found: the first
subfield (the organization) is tested for being the same, and likewise the
country name. A new address is added only if this test fails. This address
is given the number 999 (CSNR in CS.DBF) in order to distinguish it clearly
from the other addresses, which are numbered consecutively and harvested
from the address fields (C1).
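
A minimal sketch of this RP test in Python, under stated assumptions: the
RP address has already been separated from the author name, the organization
is the first comma-separated subfield and the country the last, and the
comparison ignores case and a trailing period. The names subfields and
add_rp_address are hypothetical; only the number 999 comes from the
description above.

```python
RP_CSNR = 999  # number reserved for an address taken from the RP field


def subfields(address):
    """Split an address on commas, normalising case and trailing periods."""
    return [p.strip().rstrip(".").lower() for p in address.split(",") if p.strip()]


def add_rp_address(addresses, rp_address):
    """addresses is a list of (csnr, address) pairs harvested from C1.

    Returns a new list; the RP address is appended with number 999 only
    if no existing address has both the same organization (first
    subfield) and the same country (last subfield).
    """
    rp = subfields(rp_address)
    if not rp:
        return list(addresses)
    for _, address in addresses:
        parts = subfields(address)
        if parts and parts[0] == rp[0] and parts[-1] == rp[-1]:
            return list(addresses)  # test succeeds: nothing is added
    return list(addresses) + [(RP_CSNR, rp_address)]
```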
The above procedure with the test for five commas will unavoidably
generate some errors. However, this is a consequence of the restructuring
of the address field in the new WoS5 interface. In the files which I
tested, I also noticed that short addresses may be country-specific
(e.g., Germany). Most addresses, however, contain three or four commas. Two
consecutive addresses with two commas each (and no author information
between brackets) can be erroneously concatenated. Addresses with five or
more commas may erroneously be split into two addresses (for example, if
the line break is not at a comma).
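
The three comma rules for addresses without bracketed author names can be
sketched in Python as follows. This is an illustration under assumptions,
not the code in ISI.exe: the function name merge_address_lines and the
threshold parameter are hypothetical, and the sketch does not enforce the
limit of two lines per address.

```python
def merge_address_lines(lines, threshold=5):
    """Group raw address lines into addresses using the comma rules."""
    addresses = []
    for line in (raw.strip() for raw in lines):
        if not line:
            continue
        if not addresses:
            addresses.append(line)
            continue
        previous = addresses[-1]
        combined = previous + " " + line
        if previous.endswith(","):
            # Rule 1: a trailing comma means the address continues.
            addresses[-1] = combined
        elif line.count(",") <= 1:
            # Rule 2: a line with at most one comma is a continuation.
            addresses[-1] = combined
        elif combined.count(",") >= threshold:
            # Rule 3: too many commas for one address, so start a new one.
            addresses.append(line)
        else:
            addresses[-1] = combined
    return addresses
```

With this sketch, the lines "Univ Bonn, Bonn, Germany" and "Univ Koln, Koln,
Germany" (invented examples) are merged into one address because the
concatenation has only four commas, which is the first kind of error
described above.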
I will respond to feedback suggesting improvements. The different programs
using the address field will be replaced as demand for this arises. All
programs will now be Win32 although they keep the same user interface (using
the C-prompt). The upgrade to Win32 is needed because 64-bit computers can
no longer handle the 16-bit programs under DOS.
This morning I replaced ISI.exe at http://www.leydesdorff.net/software/isi
with a new version. Feedback is appreciated. I'll replace the other programs
before December 31, when v4 becomes obsolete.
Best wishes (and apologies for crosspostings),
Professor, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111
loet at leydesdorff.net
Visiting Professor, ISTIC, <http://www.istic.ac.cn/Eng/brief_en.html>
Beijing; Honorary Fellow, SPRU, <http://www.sussex.ac.uk/spru/> University