Corporate addresses in Web of Science v5

Loet Leydesdorff loet at LEYDESDORFF.NET
Mon Jul 18 18:56:23 EDT 2011

> Why, oh why, isn’t it easier to figure out which author goes with which
> address after all this time?


Dear Christina, 


Thomson Reuters solved precisely this problem a few years ago, when it began
coupling authors to addresses by placing the author names between [brackets]
in the address field. In these cases, I use the brackets as delimiters. The
problem is mainly with older records (before 2008?), in which these couplings
are not in the data.
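
As an illustration of using the brackets as delimiters, here is a minimal
Python sketch. It is not the code in my programs; the sample field text, the
regular expression, and the function name split_bracketed are assumptions made
up for this example, and it presumes that author names inside the brackets are
separated by semicolons.

```python
import re

# Hypothetical address field: each address is preceded by its authors in [brackets].
C1_FIELD = (
    "[Smith, J; Jones, A] Univ Example, Dept Phys,\n"
    "   Sometown, ST 12345 USA\n"
    "[Jones, A] Example Inst Technol, Lab Metr, Othercity, Netherlands"
)

# One bracketed author group followed by everything up to the next "[".
BRACKETED = re.compile(r"\[([^\]]*)\]\s*([^\[]*)")


def split_bracketed(field):
    """Return a list of (author_names, address) pairs, using brackets as delimiters."""
    pairs = []
    for authors, address in BRACKETED.findall(field):
        # Assumption: authors inside the brackets are separated by semicolons.
        names = [a.strip() for a in authors.split(";") if a.strip()]
        # Collapse line breaks and runs of spaces inside the address.
        pairs.append((names, " ".join(address.split())))
    return pairs


if __name__ == "__main__":
    for names, address in split_bracketed(C1_FIELD):
        print(names, "->", address)
```

Because the brackets mark where each address begins, line breaks inside an
address no longer matter in this case.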


In my relational database, the coupling is recorded in the file csau.dbf,
which contains only the relations between the files au.dbf and cs.dbf.
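
To show what such a link file amounts to, here is a small Python sketch that
builds three row lists corresponding to au, cs and csau. The column layout
(record number, address number, author number) and the function name
build_tables are assumptions for this example; the actual structure of the
.dbf files is not spelled out in this message.

```python
def build_tables(record_nr, pairs):
    """Build author (au), address (cs) and link (csau) rows for one record.

    `pairs` is a list of (author_names, address) tuples for that record.
    The column layout is hypothetical, chosen only for illustration.
    """
    au, cs, csau = [], [], []
    author_nr = {}  # author name -> sequence number within this record
    for cs_nr, (names, address) in enumerate(pairs, start=1):
        cs.append((record_nr, cs_nr, address))
        for name in names:
            if name not in author_nr:
                author_nr[name] = len(author_nr) + 1
                au.append((record_nr, author_nr[name], name))
            # The link row holds only keys: which address goes with which author.
            csau.append((record_nr, cs_nr, author_nr[name]))
    return au, cs, csau


if __name__ == "__main__":
    sample = [(["Smith, J", "Jones, A"], "Univ A"), (["Jones, A"], "Inst B")]
    for table in build_tables(1, sample):
        print(table)
```

An author with two addresses then appears once in au and twice in csau.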


Best wishes, 



From: ASIS&T Special Interest Group on Metrics
[mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Pikas, Christina K.
Sent: Monday, July 18, 2011 10:49 PM
Subject: Re: [SIGMETRICS] Corporate addresses in Web of Science v5


Hi All-

I shared this with a colleague of mine who is working with me on a science
mapping project and he made a very logical point: How odd it is that WoS
data are getting more difficult to parse instead of easier! The prevailing
movement is to make data more interoperable and more machine-friendly.


Why, oh why, isn’t it easier to figure out which author goes with which
address after all this time?


Thank you for your efforts, Loet.






Christina K Pikas


The Johns Hopkins University Applied Physics Laboratory

Christina.Pikas at

(240) 228 4812 (DC area)

(443) 778 4812 (Baltimore area)





From: ASIS&T Special Interest Group on Metrics
[mailto:SIGMETRICS at] On Behalf Of Loet Leydesdorff
Sent: Monday, July 18, 2011 8:41 AM
Subject: [SIGMETRICS] Corporate addresses in Web of Science v5


Dear colleagues, 


Unlike the WoS4 interface, the new WoS5 interface (introduced yesterday)
does not contain a unique delimiter for the address information in each
record. (WoS4 used a period for this.) It may therefore be impossible to
determine unambiguously whether a new line is a continuation of the
previous address or a new address.


I used the following rules: If author names are coupled to the addresses,
these are placed between brackets and these brackets can be used for an
unambiguous delineation of the address information. The relations between
authors and corporate addresses are stored in a file csau.dbf in this case. 


If there are no author names, the address field can be on one or two lines.
The following rules are used: 


1.      if the line ends with a comma, the next line is considered as a
continuation of the address information;

2.      if the next line contains no commas or only a single one, this line
is also considered as a continuation of the previous one; 

3.      in all other cases, the two lines are concatenated and the commas in
the result are counted. If this number is five or larger, the two lines are
considered as separate addresses. The threshold of five is chosen because in
some cases four commas can still belong to a single address, but six
commas almost never do. However, errors are possible on both sides because some
addresses contain only two commas and some individual addresses more than
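
The three rules above can be sketched in Python as follows. This is an
illustration only, not the code of ISI.exe: the function names, the sample
lines, and the choice to treat an address as spanning at most two lines (as
stated above) are assumptions of this sketch.

```python
def is_continuation(first, second, threshold=5):
    """Decide whether `second` continues the address started on `first`."""
    if first.endswith(","):          # rule 1: line ends with a comma
        return True
    if second.count(",") <= 1:       # rule 2: next line has at most one comma
        return True
    # rule 3: concatenate and count; `threshold` or more commas means two addresses
    return (first + " " + second).count(",") < threshold


def merge_address_lines(lines, threshold=5):
    """Group raw lines into addresses of one or two lines each."""
    addresses = []
    open_line = None  # first line of an address that may still get a continuation
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if open_line is None:
            open_line = line
        elif is_continuation(open_line, line, threshold):
            addresses.append(open_line + " " + line)
            open_line = None
        else:
            addresses.append(open_line)
            open_line = line
    if open_line is not None:
        addresses.append(open_line)
    return addresses


if __name__ == "__main__":
    sample = [  # hypothetical address lines without bracketed author names
        "Univ Example, Dept Phys,",
        "Sometown, Netherlands",
        "Example Inst Technol, Lab Metr, Othercity, ST 12345 USA",
        "Third Univ, Fac Sci, Anytown, England",
    ]
    for address in merge_address_lines(sample):
        print(address)
```

The threshold is a parameter so that the effect of using four or six instead
of five can be tried on a given download.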


In some cases of older records, the field for the responding author (RP)
contains additional information (Costas & Iribarren-Maestro, Scientometrics,
2007). This field is tested on whether the first subfield (the organization)
is the same and similarly for the country name. A new address is only added
if this test fails. The number of this address (CSNR in CS.DBF) is 999 in
order to distinguish it clearly from the other addresses numbered
consecutively and harvested from the address fields (C1).
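
A minimal Python sketch of this test is given below. It assumes that the
author name has already been stripped from the RP field, that the country is
the last comma-separated subfield, and that the comparison is made against each
address already harvested from C1; the function names and the (number,
address) row layout are hypothetical.

```python
def subfields(address):
    """Split an address into its comma-separated subfields, dropping a final period."""
    return [p.strip() for p in address.strip().rstrip(".").split(",") if p.strip()]


def add_reprint_address(cs_rows, rp_address, rp_nr=999):
    """Append the RP address as number 999 unless a C1 address already matches.

    `cs_rows` is a list of (address_nr, address) tuples harvested from C1.
    A match means the same first subfield (organization) and the same last
    subfield (assumed here to hold the country name).
    """
    rp = subfields(rp_address)
    if not rp:
        return list(cs_rows)
    for _, address in cs_rows:
        parts = subfields(address)
        if (parts
                and parts[0].lower() == rp[0].lower()
                and parts[-1].lower() == rp[-1].lower()):
            return list(cs_rows)  # test passes: no new address is added
    return list(cs_rows) + [(rp_nr, rp_address)]


if __name__ == "__main__":
    rows = [(1, "Univ Example, Dept Phys, Sometown, Netherlands")]
    print(add_reprint_address(rows, "Univ Example, Lab X, Sometown, Netherlands."))
    print(add_reprint_address(rows, "Other Inst, Othercity, Germany"))
```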


The above procedure with the test for five commas will unavoidably
generate some errors. However, this is a consequence of the restructuring
of the address field in the new WoS5 interface. From the files which I
tested, I also noted that short address information may be country specific
(e.g., Germany). Most addresses, however, contain three or four commas. Two
consecutive addresses with two commas each (and no author information between
brackets) can be erroneously concatenated. Addresses with five or more
commas may erroneously be split into two addresses (if the line break
is not at a comma, etc.).


I will respond to feedback suggesting improvements. The different programs
using the address field will be replaced as demand arises. All
programs will now be Win32, although they keep the same user interface (using
the C-prompt). The upgrade to Win32 is needed because 64-bit computers can
no longer handle 16-bit programs under DOS.


This morning I replaced ISI.exe at
with a new version. Feedback is appreciated. I’ll replace the other programs
before December 31, when v4 becomes obsolete.


Best wishes (and apologies for crosspostings), 




Loet Leydesdorff 

Professor, University of Amsterdam
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111

loet at ;
Visiting Professor, ISTIC, Beijing; Honorary Fellow, SPRU, University
of Sussex

