Contents of Journal of Informetrics, 3(1): 1-90, January 2009

Eugene Garfield eugene.garfield at THOMSON.COM
Tue Feb 3 16:12:40 EST 2009


Journal of Informetrics
Volume 3, Issue 1, Pages 1-90 (January 2009)

ABSTRACTS, AUTHOR ADDRESSES, ETC. FOLLOW BELOW AFTER CONTENTS PAGE:

Edwin A. Henneken, Michael J. Kurtz, Alberto Accomazzi, Carolyn S. Grant,
Donna Thompson, Elizabeth Bohlen, Stephen S. Murray
Use of astronomical literature—A report on usage patterns		
Pg. 1

Daniel Torres-Salinas, Henk F. Moed
Library Catalog Analysis as a tool in studies of social sciences and 
humanities: An exploratory study of published book titles in Economics	
Pg. 9

Lutz Bornmann, Werner Marx, Hermann Schier, Erhard Rahm, Andreas Thor,
Hans-Dieter Daniel
Convergent validity of bibliometric Google Scholar data in the field of 
chemistry—Citation counts for papers  that were accepted by Angewandte 
Chemie International Edition or rejected but published elsewhere,
using Google Scholar, Science Citation Index, Scopus, and Chemical
Abstracts
Pg. 27

Eleftheria Vasileiadou
Stabilisation operationalised: Using time series analysis to understand the 
dynamics of research collaboration					
Pg. 36		

Per Ahlgren, Cristian Colliander
Document–document similarity approaches and science mapping: Experimental 
comparison of five approaches
Pg. 49
									
Raf Guns, Ronald Rousseau
Real and rational variants of the h-index and the g-index	
Pg. 64

Laila Khreisat
A machine learning approach for Arabic text classification using N-gram 
frequency statistics		
Pg. 72	
		
Amir Hosein Keyhanipour, Maryam Piroozmand, Kambiz Badie
A GP-adaptive web ranking discovery framework based on combinative content 
and context features	
Pg. 78




----------------------------------------------------------------------------
Use of astronomical literature—A report on usage patterns 
Edwin A. Henneken, Michael J. Kurtz, Alberto Accomazzi, Carolyn S. Grant,
Donna Thompson, Elizabeth Bohlen and Stephen S. Murray

Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, 
MA 02138, USA

Abstract
In this paper we present a number of metrics for usage of the SAO/NASA 
Astrophysics Data System (ADS). Since the ADS is used by the entire 
astronomical community, these are indicative of how the astronomical 
literature is used. We will show how the use of the ADS has changed both 
quantitatively and qualitatively. We will also show that different types of 
users access the system in different ways. Finally, we show how use of the 
ADS has evolved over the years in various regions of the world.

Address for correspondence:
Edwin A. Henneken
Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, 
MA 02138, USA
ehenneken at cfa.harvard.edu

Journal of Informetrics  Volume 3, Issue 1, January 2009, Pages 1-8
http://dx.doi.org/10.1016/j.joi.2008.10.001
 -----------------------------------------------
                                                                    
Library Catalog Analysis as a tool in studies of social sciences and 
humanities: An exploratory study of published book titles in Economics

Daniel Torres-Salinas (a) and Henk F. Moed (b)

(a) Evaluación de la Ciencia y de la Comunicación Científica, Centro de
Investigación Médica Aplicada, Universidad de Navarra, Pamplona, Spain
(b) Centre for Science and Technology Studies (CWTS), Leiden University, The
Netherlands

Abstract
This paper explores the use of Library Catalog Analysis (LCA), defined as 
the application of bibliometric or informetric techniques to a set of 
library online catalogs, to describe quantitatively a scientific-scholarly 
field on the basis of published book titles. It focuses on its value as a 
tool in studies of Social Sciences and Humanities, especially its cognitive 
structures, main book publishers and the research performance of its 
actors. The paper proposes an analogy model between traditional citation 
analysis of journal articles and Library Catalog Analysis of book titles. 
It presents the outcomes of an exploratory study of book titles in 
Economics included in 42 academic library catalogs from 7 countries. It 
describes the process of data collection and cleaning, and applies a series 
of indicators and thematic mapping techniques. It illustrates how LCA can 
be fruitfully used to assess book production and research performance at 
the level of an individual researcher, a research department, an entire 
country and a book publisher. It discusses a number of issues that should 
be addressed in follow-up studies and concludes that LCA of published book 
titles can be developed into a powerful and useful tool in studies of 
Social Sciences and Humanities.

Address for correspondence:
Henk F. Moed
Leiden University, Leiden, NETHERLANDS
moed at cwts.leidenuniv.nl

Journal of Informetrics  Volume 3, Issue 1, January 2009, Pages 9-26
http://dx.doi.org/10.1016/j.joi.2008.10.002

---------------------------------------------------------------------

Convergent validity of bibliometric Google Scholar data in the field of 
chemistry—Citation counts for papers that were accepted by Angewandte 
Chemie International Edition or rejected but published elsewhere, using 
Google Scholar, Science Citation Index, Scopus, and Chemical Abstracts

Lutz Bornmann (a), Werner Marx (b), Hermann Schier (b), Erhard Rahm (c),
Andreas Thor (c) and Hans-Dieter Daniel (a, d)

(a) ETH Zurich, Professorship for Social Psychology and Research on Higher
Education, Zähringerstr. 24, CH-8092 Zurich, Switzerland
(b) Max Planck Institute for Solid State Research, Heisenbergstraße 1,
D-70569 Stuttgart, Germany
(c) University of Leipzig, Department of Computer Science, PF 100920,
D-04009 Leipzig, Germany
(d) University of Zurich, Evaluation Office, Mühlegasse 21, CH-8001 Zurich,
Switzerland

Abstract
Examining a comprehensive set of papers (n = 1837) that were accepted for 
publication by the journal Angewandte Chemie International Edition (one of 
the prime chemistry journals in the world) or rejected by the journal but 
then published elsewhere, this study tested the extent to which the use of 
the freely available database Google Scholar (GS) can be expected to yield 
valid citation counts in the field of chemistry. Analyses of citations for 
the set of papers returned by three fee-based databases – Science Citation 
Index, Scopus, and Chemical Abstracts – were compared to the analysis of 
citations found using GS data. Whereas the analyses using citations 
returned by the three fee-based databases show very similar results, the 
results of the analysis using GS citation data differed greatly from the 
findings using citations from the fee-based databases. Our study therefore 
supports, on the one hand, the convergent validity of citation analyses 
based on data from the fee-based databases and, on the other hand, the lack 
of convergent validity of the citation analysis based on the GS data.

Address for correspondence:
Lutz Bornmann
ETH Zurich, Professorship for Social Psychology and Research on Higher 
Education, Zähringerstr. 24, CH-8092 Zurich, Switzerland
bornmann at gess.ethz.ch

Journal of Informetrics  Volume 3, Issue 1, January 2009, Pages 27-35
http://dx.doi.org/10.1016/j.joi.2008.11.001
------------------------------------------------------

Stabilisation operationalised: Using time series analysis to understand the 
dynamics of research collaboration

Eleftheria Vasileiadou
Institute for Environmental Studies (IVM), Vrije Universiteit Amsterdam, De 
Boelelaan 1085, 1081 HV Amsterdam, The Netherlands

Abstract
The aim of the paper is to investigate the use of online data and time 
series analysis, in order to study the dynamics of new types of research 
collaboration in a systematic way. Two international research teams were 
studied for more than 3 years, and quantitative data about their internet 
use together with observation of their collaboration patterns were 
gathered. Time series analysis (ARIMA modelling) was performed on their use 
of internet, and specific types of models related to specific ways of 
conducting research at a distance. The paper proposes the use of online 
data and ARIMA models to identify the stabilisation of a complex system, 
such as a research team, and investigate everyday research practices.

Address for correspondence:
Eleftheria Vasileiadou
Institute for Environmental Studies (IVM), Vrije Universiteit Amsterdam, De 
Boelelaan 1085, 1081 HV Amsterdam, The Netherlands
eleftheria.vasileiadou at gmail.com

Journal of Informetrics    Volume 3, Issue 1, January 2009, Pages 36-48
http://dx.doi.org/10.1016/j.joi.2008.11.002

---------------------------------------------------------------------

Document–document similarity approaches and science mapping: Experimental 
comparison of five approaches

Per Ahlgren (a) and Cristian Colliander (b)

(a) Department of e-Resources, University Library, Stockholm University,
SE-106 91 Stockholm, Sweden
(b) University Library, Jönköping University, SE-551 11 Jönköping, Sweden

Abstract
This paper treats document–document similarity approaches in the context of 
science mapping. Five approaches, involving nine methods, are compared 
experimentally. We compare text-based approaches, the citation-based 
bibliographic coupling approach, and approaches that combine text-based 
approaches and bibliographic coupling. Forty-three articles, published in 
the journal Information Retrieval, are used as test documents. We 
investigate how well the approaches agree with a ground truth subject 
classification of the test documents, when the complete linkage method is 
used, and under two types of similarities, first-order and second-order. 
The results show that it is possible to achieve a very good approximation 
of the classification by means of automatic grouping of articles. One
text-only method and one combination method, under second-order similarities in
both cases, give rise to cluster solutions that to a large extent agree 
with the classification.

Address for correspondence:

Per Ahlgren
Department of e-Resources, University Library, Stockholm University, SE-106 
91 Stockholm, Sweden
per.ahlgren at sub.su.se

Journal of Informetrics  Volume 3, Issue 1, January 2009, Pages 49-63
http://dx.doi.org/10.1016/j.joi.2008.11.003
-------------------------------------------------------

Real and rational variants of the h-index and the g-index

Raf Guns (a) and Ronald Rousseau (b, c, d)

(a) University of Antwerp, IBW, Venusstraat 35, City Campus, 2000 Antwerpen,
Belgium
(b) KHBO (Association K.U.Leuven), Industrial Sciences and Technology,
Zeedijk 101, B-8400 Oostende, Belgium
(c) Hasselt University, Universitaire Campus, B-3590 Diepenbeek, Belgium
(d) K.U.Leuven, Steunpunt O&O Indicatoren and Dept. MSI, Dekenstraat 2,
B-3000 Leuven, Belgium

Abstract
The definitions of the rational and real-valued variants of the h-index and 
g-index are reviewed. It is shown how they can be obtained both graphically 
and by calculation. Formulae are derived expressing the exact relations 
between the h-variants and between the g-variants. Subsequently these 
relations are examined. In a citation context the real h-index is often, 
but not always, smaller than the rational h-index. It is also shown that 
the relation between the real and the rational g-index depends on the 
number of citations of the article ranked g + 1. Maximum differences
between h, h_r and h_rat on the one hand and between g, g_r and g_rat on
the other are determined.
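
As a point of reference for the abstract above, the following is a minimal
sketch of the standard integer-valued h-index and g-index, assuming the usual
definitions (h: the largest rank h such that the article at rank h has at
least h citations; g: the largest rank g such that the top g articles
together have at least g squared citations). The function names are
hypothetical, g is capped here at the number of articles, and the real and
rational variants discussed in the paper are not reproduced.

```python
def h_index(citations):
    """Largest h such that the h-th most cited article has >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g articles have >= g*g citations in total.

    Assumption: g is capped at the number of articles (no fictitious
    zero-citation articles are appended).
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    example = [10, 8, 5, 4, 3]  # illustrative citation counts, not real data
    print(h_index(example), g_index(example))
```

For the illustrative list above, the fourth article has 4 citations and the
fifth has 3, so h is 4; all five articles together have 30 citations, which
is at least 25, so g is 5.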

Address for correspondence:
Raf Guns
University of Antwerp, IBW, Venusstraat 35, City Campus, 2000 Antwerpen, 
Belgium
raf.guns at ua.ac.be

Journal of Informetrics  Volume 3, Issue 1, January 2009, Pages 64-71
http://dx.doi.org/10.1016/j.joi.2008.11.004
----------------------------------------------

A machine learning approach for Arabic text classification using N-gram 
frequency statistics

Laila Khreisat
Dept. of Computer Science, Math and Physics, Fairleigh Dickinson 
University, 285 Madison Ave., Madison, NJ 07940, USA

Abstract
In this paper a machine learning approach for classifying Arabic text 
documents is presented. To handle the high dimensionality of text 
documents, embeddings are used to map each document (instance) into R (the 
set of real numbers) representing the tri-gram frequency statistics 
profiles for a document. Classification is achieved by computing a 
dissimilarity measure, called the Manhattan distance, between the profile 
of the instance to be classified and the profiles of all the instances in 
the training set. The class (category) to which an instance (document) 
belongs is the one with the least computed Manhattan measure. The Dice 
similarity measure is used to compare the performance of the method. Results
show that tri-gram text classification using the Dice measure outperforms 
classification using the Manhattan measure.
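
The classification scheme described in the abstract above can be sketched
roughly as follows. This is only an illustration under stated assumptions:
profiles are taken to be relative frequencies of character tri-grams, the
Dice measure is computed on the sets of tri-grams present in each profile,
and all function names and the toy training data are hypothetical rather
than taken from the paper. The strings are Latin-script for readability; the
same code works on any Unicode text.

```python
from collections import Counter


def trigram_profile(text):
    """Map a document to relative frequencies of its character tri-grams."""
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def manhattan(p, q):
    """Manhattan (L1) distance between two tri-gram profiles."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def dice(p, q):
    """Dice similarity on tri-gram sets: 2|A & B| / (|A| + |B|)."""
    a, b = set(p), set(q)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def classify(doc, training, use_dice=False):
    """Return the label of the closest training document.

    training is a list of (text, label) pairs. With Manhattan the smallest
    distance wins; with Dice the largest similarity wins.
    """
    profile = trigram_profile(doc)
    if use_dice:
        best = max(training, key=lambda t: dice(profile, trigram_profile(t[0])))
    else:
        best = min(training, key=lambda t: manhattan(profile, trigram_profile(t[0])))
    return best[1]


if __name__ == "__main__":
    # Toy training set, purely illustrative.
    training = [
        ("sports football match goal", "sport"),
        ("stock market bank finance", "economy"),
    ]
    print(classify("football goal", training))
    print(classify("football goal", training, use_dice=True))
```

Both calls in the usage example assign the query to the "sport" document,
since it shares most of its tri-grams with that text and none with the other.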

Address for correspondence:
Laila Khreisat
Dept. of Computer Science, Math and Physics, Fairleigh Dickinson 
University, 285 Madison Ave., Madison, NJ 07940, USA
Khreisat at fdu.edu

Journal of Informetrics   Volume 3, Issue 1, January 2009, Pages 72-77
http://dx.doi.org/10.1016/j.joi.2008.11.005
------------------------------------

A GP-adaptive web ranking discovery framework based on combinative content 
and context features

Amir Hosein Keyhanipour, Maryam Piroozmand and Kambiz Badie

IT Research Faculty, Iran Telecommunication Research Center (ITRC),
Tehran, Iran

Abstract
Ranking is a crucial task in web information retrieval systems. The
dynamic nature of information resources, together with continuous changes
in the information demands of users, has made it very difficult to provide
effective methods for data mining and document ranking. To address these
challenges, this paper proposes an adaptive ranking algorithm named GPRank.
This algorithm, which is a function discovery framework, uses relatively
simple features of web documents to provide suitable rankings through a
multi-layer/multi-population genetic programming architecture. Experiments
show that GPRank performs better than well-known ranking techniques and
also better than its full mode edition.

Address for correspondence:
Amir Hosein Keyhanipour
IT Research Faculty, Iran Telecommunication Research Center (ITRC), Tehran, 
Iran
keyhanipour at yahoo.com

Journal of Informetrics   Volume 3, Issue 1, January 2009, Pages 78-89
http://dx.doi.org/10.1016/j.joi.2008.11.006
---------------------------------------------


