Several papers of possible interest to Sig Metrics readers
Eugene Garfield
eugene.garfield at THOMSONREUTERS.COM
Fri Oct 28 13:27:57 EDT 2011
--------------------------------------------------------------------------
TITLE: Zipf's law for all the natural cities in the United
States: a geospatial perspective (Article, English)
AUTHOR: Jiang, B; Jia, T
SOURCE: INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION
SCIENCE 25 (8 SP ISS). 2011. p.1269-1281 TAYLOR &
FRANCIS LTD, ABINGDON
SEARCH TERM(S): ZIPF* item_title
KEYWORDS: natural cities; power law; data-intensive geospatial
computing; scaling of geographic space
KEYWORDS+: POWER-LAW; DISTRIBUTIONS
ABSTRACT: This article provides a new geospatial perspective on
whether or not Zipf's law holds for all cities or for the largest cities
in the United States using a massive dataset and its computing. A major
problem around this issue is how to define cities or city boundaries.
Most of the investigations of Zipf's law rely on the demarcations of
cities imposed by census data, for example, metropolitan areas and census-
designated places. These demarcations or definitions (of cities) are
criticized for being subjective or even arbitrary. Alternative solutions
to defining cities are suggested, but they still rely on census data for
their definitions. In this article we demarcate urban agglomerations by
clustering street nodes (including intersections and ends), forming what
we call natural cities. Based on the demarcation, we found that Zipf's
law holds remarkably well for all the natural cities (over 2-4 million in
total) across the United States. There is little sensitivity for the
holding with respect to the clustering resolution used for demarcating
the natural cities. This is a big contrast to urban areas, as defined in
the census data, which do not hold stable for Zipf's law.
AUTHOR ADDRESS: B Jiang, Univ Gavle, Div Geomat, Dept Technol & Built
Environm, Gavle, Sweden
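
As a rough illustration of what testing Zipf's law on a set of city sizes involves, the sketch below ranks the sizes and fits the slope of log(size) against log(rank) by ordinary least squares. The function name `zipf_exponent` and the sample sizes are hypothetical. Least squares on a log-log plot is a simplification chosen for brevity, not the estimation procedure reported by the authors.

```python
import math

def zipf_exponent(sizes):
    """Estimate the Zipf exponent as the negated slope of log(size) vs. log(rank)."""
    ranked = sorted((s for s in sizes if s > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive sizes")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Hypothetical sizes that follow size = 1000 / rank exactly,
# so the estimated exponent should be very close to 1.
sizes = [1000 / rank for rank in range(1, 51)]
exponent = zipf_exponent(sizes)
```

An exponent near 1 is what the rank-size form of Zipf's law predicts.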
--------------------------------------------------------------------------
TITLE: Interpretations and misinterpretations of scientometric
data in the report of the Royal Society about the scientific landscape in
2011 (Article, English)
AUTHOR: Jacso, P
SOURCE: ONLINE INFORMATION REVIEW 35 (4). 2011. p.669-682
EMERALD GROUP PUBLISHING LIMITED, BINGLEY
SEARCH TERM(S): GARFIELD E rauth; SCIENTOMETRIC* item_title;
GARFIELD E SCIENTIST 10:11 1996
KEYWORDS: Sciences; Reports; Research; Publications; Measurement;
Search output; Databases; China; United Kingdom
KEYWORDS+: H-INDEX; RESEARCH PERFORMANCE; GOOGLE SCHOLAR; SCIENCE;
CHINA; IMPACT; DATABASES; SCOPUS; PRODUCTIVITY;
PUBLICATION
ABSTRACT: Purpose - This paper aims to discuss some caveats about
the findings of Part 1 of the Royal Society's report from the perspective
of the choice and reliability of the source base, and the bibliometric
and scientometric indicators.
Design/methodology/approach - The paper argues that the Royal Society's
report gives too much emphasis to the growth rate of the publications of
Chinese researchers when interpolating those data and forecasting that,
within the decade and possibly as early as 2013, China will be ahead of
even the USA in terms of the number of publications.
Findings - In an era when the "publish or perish" slogan is replaced by
the "get cited or perish" mantra, the report barely discusses how much
China is behind the world average and especially the above countries in
terms of the most important scientometric indicators that take into
account the productivity/quantity aspect and the citedness of
publications as a proxy for quality.
Originality/value - The paper illustrates that there are much better
measures for the assessment of research activity than the one-dimensional
productivity numbers, such as the h-index or the uncitedness rate, and
the citations/publication rate where China is far below and the USA is
far above the world average scores, and uses some charts to paint a more
realistic picture of the scientific landscape.
AUTHOR ADDRESS: P Jacso, Univ Hawaii Manoa, Honolulu, HI 96822 USA
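
The three indicators named in the abstract (h-index, uncitedness rate, citations per publication) can each be computed from a list of per-paper citation counts. The sketch below uses their usual definitions. The function names and the sample counts are hypothetical and are not taken from the paper or the Royal Society report.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

def uncitedness_rate(citations):
    """Share of papers that have received no citations."""
    if not citations:
        raise ValueError("no papers given")
    return sum(1 for count in citations if count == 0) / len(citations)

def citations_per_publication(citations):
    """Mean number of citations per paper."""
    if not citations:
        raise ValueError("no papers given")
    return sum(citations) / len(citations)

# Hypothetical citation counts for seven papers
counts = [10, 8, 5, 4, 3, 0, 0]
summary = (h_index(counts), uncitedness_rate(counts), citations_per_publication(counts))
```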
--------------------------------------------------------------------------
TITLE: Comparative Recall and Precision of Simple and Expert
Searches in Google Scholar and Eight Other Databases (Article, English)
AUTHOR: Walters, WH
SOURCE: PORTAL-LIBRARIES AND THE ACADEMY 11 (4). OCT 2011.
p.971-1006 JOHNS HOPKINS UNIV PRESS, BALTIMORE
SEARCH TERM(S): SEGLEN PO J AM SOC INFORM SCI 43:628 1992
KEYWORDS+: LATER-LIFE MIGRATION; MULTIDISCIPLINARY FIELD; WEB;
COVERAGE; SCIENCE; ARTICLES; JOURNALS
ABSTRACT: This study evaluates the effectiveness of simple and
expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE,
PubMed, Social Sciences Citation Index, Social Sciences Full Text, and
Sociological Abstracts. It assesses the recall and precision of 32
searches in the field of later-life migration: nine simple keyword
searches and 23 expert searches constructed by demography librarians at
three top universities. For simple searches, Google Scholar's recall and
precision are well above average. For expert searches, the relative
effectiveness of GS depends on the number of results users are willing to
examine. Although Google Scholar's expert-search performance is just
average within the first fifty search results, GS is one of the few
databases that retrieves relevant results with reasonably high precision
after the fiftieth hit. The results also show that simple searches in GS,
GEOBASE, PubMed, and Sociological Abstracts have consistently higher
recall and precision than expert searches. This can be attributed not to
differences in expert-search effectiveness, but to the unusually strong
performance of simple searches in those four databases.
AUTHOR ADDRESS: WH Walters, Menlo Coll, Atherton, CA 94027 USA
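
Recall and precision, and their dependence on how many results a user examines, can be sketched as follows. Precision is the share of examined results that are relevant, and recall is the share of all relevant items that were found. The function name `precision_recall`, the `cutoff` parameter, and the sample data are hypothetical. The sketch assumes the result list contains no duplicates.

```python
def precision_recall(retrieved, relevant, cutoff=None):
    """Precision and recall over the first `cutoff` results (all results if None)."""
    examined = retrieved if cutoff is None else retrieved[:cutoff]
    relevant_set = set(relevant)
    hits = sum(1 for item in examined if item in relevant_set)
    precision = hits / len(examined) if examined else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical ranked result list and set of relevant documents
retrieved = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f", "g"}

all_results = precision_recall(retrieved, relevant)
first_two = precision_recall(retrieved, relevant, cutoff=2)
```

Varying `cutoff` mirrors the comparison in the abstract between the first fifty results and those after the fiftieth hit.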
--------------------------------------------------------------------------
TITLE: Correlation between Download and Citation and Download-
citation Deviation Phenomenon for Some Papers in Chinese Medical Journals
(Article, English)
AUTHOR: Liu, XL; Fang, HL; Wang, MY
SOURCE: SERIALS REVIEW 37 (3). SEP 2011. p.157-161 ELSEVIER INC,
SAN DIEGO
SEARCH TERM(S):
GARFIELD E JAMA-J AM MED ASSOC 295:90 2006;
GARFIELD E SCIENCE 122:108 1955
KEYWORDS+: IMPACT FACTOR; OPEN ACCESS; METRICS; NUMBER
ABSTRACT: The authors collected the numbers of citations and
downloads from 2005 to 2009 of papers in five Chinese general
ophthalmological journals: Recent Advances in Ophthalmology, Chinese
Ophthalmic Research, Ophthalmology in China, Journal of Clinical
Ophthalmology and Chinese Journal of Practical Ophthalmology, published
in 2005 from the Chinese Academic Journals Full-text Database and the
Chinese Citation Database in Chinese National Knowledge Infrastructure
(CNKI) to determine the correlation between download and citation and the
peak time of download frequency (DF). The citations from 2000 to 2009 of
papers published in 2000 were collected to determine the peak time of
citation frequency (CF) of medical papers. There is a highly positive
correlation between DF and CF (r = 4.91, P = 0.000). Serials Review 2011;
37:157-161. (C) 2011 Elsevier Inc. All rights reserved.
AUTHOR ADDRESS: XL Liu, Xinxiang Med Univ, Henan Res Ctr Sci Journals,
Xinxiang 453003, Henan Province, Peoples R China
ISSN: 0098-7913
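
A correlation between download and citation frequencies of the kind reported above is typically a Pearson coefficient. The sketch below computes it from paired counts. The function name `pearson_r` and the sample figures are hypothetical and are not data from the study. By construction the coefficient always lies between -1 and 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-paper download and citation counts
downloads = [120, 95, 60, 40, 10]
citations = [12, 9, 7, 3, 1]
r = pearson_r(downloads, citations)
```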
--------------------------------------------------------------------------
TITLE: Journal Self-citation Analysis of Some Chinese Sci-tech
Periodicals (Article, English)
AUTHOR: Xia, XD; Wu, YW
SOURCE: SERIALS REVIEW 37 (3). SEP 2011. p.171-173 ELSEVIER INC,
SAN DIEGO
SEARCH TERM(S): GARFIELD E rauth;
HIRSCH JE P NATL ACAD SCI USA 102:16569 2005;
CITATION item_title; CITATION ANALYS* item_title;
CITATION* item_title; JOURNAL item_title;
GARFIELD E JAMA-J AM MED ASSOC 295:90 2006
KEYWORDS+: IMPACT FACTOR; PAGERANK; INDEX
ABSTRACT: This study investigates self-citation rates of 222
Chinese journals within seven groups including 76 journals of agronomy
(34.2 percent), 57 of biology (25.7 percent), 28 of environmental science
and technology (12.6 percent), 15 of forestry (6.8 percent), 24 of
academic journals of agricultural university (10.8 percent), 9 of aquatic
sciences (4.1 percent), and 13 of animal husbandry and veterinary
medicine (5.9 percent). The average self-citation rates range from 2
percent to 67 percent in 2006, 1 percent to 68 percent in 2007 and 0
percent to 67 percent in 2008. There is a significant difference in self-
citation rate between most groups of journals. The self-citation rate is
positively and significantly correlated with the journal's impact factor in
2006 for all 222 journals (N = 222, R-2 = 0.194, P = 0.004) (P<0.05).
However, the self-citation rate is not significantly correlated with the
journal's impact factor in 2007 (N = 222, R-2 = 0.114, P = 0.091) and
2008 (N = 222, R-2 = 0.112, P = 0.096) (P<0.05) for the 222 journals. The
relationship between self-citation rate and journal impact factor is
discussed. Serials Review 2011; 37:171-173. (C) 2011 Elsevier Inc. All
rights reserved.
AUTHOR ADDRESS: XD Xia, China Natl Rice Res Inst, Editorial Off, Hangzhou
310006, Zhejiang, Peoples R China
--------------------------------------------------------------------------
TITLE: Characterizing and Modeling Citation Dynamics (Article,
English)
AUTHOR: Eom, YH; Fortunato, S
SOURCE: PLOS ONE 6 (9). SEP 22 2011. p.NIL_331-NIL_337 PUBLIC
LIBRARY SCIENCE, SAN FRANCISCO
Open access journal
SEARCH TERM(S): GARFIELD E rauth; PRICE DJD rauth;
SEGLEN PO J AM SOC INFORM SCI 43:628 1992;
CITATION item_title; CITATION* item_title;
GARFIELD E SCIENCE 122:108 1955
KEYWORDS+: PREFERENTIAL ATTACHMENT; SCIENTIFIC PUBLICATION; RANDOM
NETWORKS; DISTRIBUTIONS; COMPETITION; EVOLUTION; SCIENCE;
IMPACT; TAILS
ABSTRACT: Citation distributions are crucial for the analysis and
modeling of the activity of scientists. We investigated bibliometric data
of papers published in journals of the American Physical Society,
searching for the type of function which best describes the observed
citation distributions. We used the goodness of fit with Kolmogorov-
Smirnov statistics for three classes of functions: log-normal, simple
power law and shifted power law. The shifted power law turns out to be
the most reliable hypothesis for all citation networks we derived, which
correspond to different time spans. We find that citation dynamics is
characterized by bursts, usually occurring within a few years since
publication of a paper, and the burst size spans several orders of
magnitude. We also investigated the microscopic mechanisms for the
evolution of citation networks, by proposing a linear preferential
attachment with time dependent initial attractiveness. The model
successfully reproduces the empirical citation distributions and accounts
for the presence of citation bursts as well.
AUTHOR ADDRESS: YH Eom, Inst Sci Interchange, Complex Networks & Syst
Lagrange Lab, Turin, Italy
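
Linear preferential attachment with an initial attractiveness can be sketched as a small simulation: each new paper cites existing papers with probability proportional to their current citation count plus a constant. The abstract describes a time-dependent attractiveness; the constant `attractiveness` used here is a deliberate simplification, and the function name, parameters, and values are all hypothetical.

```python
import random

def simulate_citations(n_papers, refs_per_paper, attractiveness, seed=0):
    """Grow a citation network; return the citation count of each paper."""
    if attractiveness <= 0:
        raise ValueError("attractiveness must be positive")
    rng = random.Random(seed)
    cites = []
    for _ in range(n_papers):
        if cites:
            # Weight of each existing paper: citations so far plus attractiveness
            weights = [count + attractiveness for count in cites]
            n_refs = min(refs_per_paper, len(cites))
            targets = set()
            while len(targets) < n_refs:
                targets.add(rng.choices(range(len(cites)), weights=weights)[0])
            for target in targets:
                cites[target] += 1
        cites.append(0)  # the new paper starts uncited
    return cites

# Hypothetical run: 200 papers, 3 references each, attractiveness 1.0
cites = simulate_citations(n_papers=200, refs_per_paper=3, attractiveness=1.0)
```

Sorting `cites` in descending order gives a quick view of how unevenly citations accumulate under this rule.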
--------------------------------------------------------------------------
TITLE: Whetting the Appetite of Scientists: Producing Summaries
Tailored to the Citation Context (Article, English)
AUTHOR: Wan, S; Paris, C; Dale, R
SOURCE: JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT
CONFERENCE ON DIGITAL LIBRARIES. 2009. p.59-68 ASSOC
COMPUTING MACHINERY, NEW YORK
SEARCH TERM(S): CITED ARTICLE abstract; CITATION item_title;
CITATION* item_title
KEYWORDS: Information needs; Information browsing; Scientific
Literature; Biomedical Researchers; User Modeling and
Interactive IR; Summarization
ABSTRACT: The amount of scientific material available
electronically is forever increasing. This makes reading the published
literature, whether to stay up-to-date on a topic or to get up to speed
on a new topic, a difficult task. Yet, this is an activity in which all
researchers must be engaged on a regular basis. Based on a user
requirements analysis, we developed a new research tool, called the
Citation-Sensitive In-Browser Summariser (CSIBS), which supports
researchers in this browsing task. CSIBS enables readers to obtain
information about a citation at the point at which they encounter it.
This information is aimed at enabling the reader to determine whether or
not to invest the time in exploring the cited article further, thus
alleviating information overload. CSIBS builds a summary of the cited
document, bringing together metadata about the document and a citation-
sensitive preview that exploits the citation context to retrieve the
sentences from the cited document that are relevant at this point. This
paper briefly presents our user requirements analysis, then describes the
system and, finally, discusses the observations from an initial pilot
study. We found that CSIBS facilitates the relevancy judgment task, by
increasing the users' self-reported confidence in making such judgements.
AUTHOR ADDRESS: S Wan, CSIRO, ICT Ctr, Sydney, NSW, Australia
--------------------------------------------------------------------------
TITLE: CEBBIP: A Parser of Bibliographic Information in Chinese
Electronic Books (Article, English)
AUTHOR: Gao, LC; Tang, Z; Lin, XF
SOURCE: JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT
CONFERENCE ON DIGITAL LIBRARIES. 2009. p.73-76 ASSOC
COMPUTING MACHINERY, NEW YORK
SEARCH TERM(S): BIBLIOGRAPHIC* item_title
KEYWORDS: Metadata extraction; Digital Library; Chinese Electronic
Book; Bibliography; Machine learning
ABSTRACT: Bibliographic information is essential for many digital
library applications, such as citation analysis, academic searching and
topic discovery. And bibliographic data extraction has attracted a great
deal of attention in recent years. In this paper, we address the problem
of automatic extraction of bibliographic data in Chinese electronic books
and propose a tool called CEBBIP for the task, which includes three main
systems: data preprocessing, data parsing and data postprocessing. In the
data preprocessing system, the tool adopts a rules-based method to locate
citation data in a book and to segment citation data into citation
strings of individual referencing literature. And a learning-based
approach, Conditional Random Fields (CRF), is employed to parse citation
strings in the data parsing system. Finally, the tool takes advantage of
document intrinsic local format consistency to enhance citation data
segmentation and parsing through clustering techniques. CEBBIP has been
used in a commercial E-book production system. Experimental results show
that CEBBIP's precision rate is very high. More specifically, adopting the
document intrinsic local format consistency obviously improves the
citation data segmenting and parsing accuracy.
AUTHOR ADDRESS: LC Gao, Peking Univ, Inst Comp Sci & Technol, Beijing
100871, Peoples R China
--------------------------------------------------------------------------
TITLE: Learning to Assess the Quality of Scientific Conferences:
A Case Study in Computer Science (Article, English)
AUTHOR: Martins, WS; Goncalves, MA; Laender, AHF; Pappa, GL
SOURCE: JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT
CONFERENCE ON DIGITAL LIBRARIES. 2009. p.193-202 ASSOC
COMPUTING MACHINERY, NEW YORK
SEARCH TERM(S): MACROBERTS MH rauth;
HIRSCH JE P NATL ACAD SCI USA 102:16569 2005
KEYWORDS: Machine Learning; Classification; Digital Library;
Conference Assessment
KEYWORDS+: IMPACT FACTOR; JOURNALS; INDEX
ABSTRACT: Assessing the quality of scientific conferences is an
important and useful service that can be provided by digital libraries
and similar systems. This is especially true for fields such as Computer
Science and Electrical Engineering, where conference publications are
crucial. However, the majority of the existing approaches for assessing
the quality of publication venues have been proposed for journals. In this
paper, we characterize a large number of features that can be used as
criteria to assess the quality of scientific conferences and study how
these several features can be automatically combined by means of machine
learning techniques to effectively perform this task. Among the features
studied are citations, submission and acceptance rates, tradition of the
conference, and reputation of the program committee members. Among our
several findings, we can cite that: (1) separating high quality
conferences from medium and low quality ones can be performed quite
effectively, but separating the last two types is a much harder task; and
(2) citation features followed by those associated with the tradition of
the conference are the most important ones for the task.
AUTHOR ADDRESS: WS Martins, Univ Fed Minas Gerais, Dept Comp Sci,
BR-31270901 Belo Horizonte, MG, Brazil
--------------------------------------------------------------------------
TITLE: Building a Thailand Researcher Network Based on a
Bibliographic Database (Article, English)
AUTHOR: Haruechaiyasak, C; Kongthon, A; Thaiprayoon, S
SOURCE: JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT
CONFERENCE ON DIGITAL LIBRARIES. 2009. p.391 ASSOC
COMPUTING MACHINERY, NEW YORK
SEARCH TERM(S): BIBLIOGRAPHIC* item_title
KEYWORDS: Expertise retrieval; social network; R&D management
ABSTRACT: Among many practical and domain-specific tasks, expertise
retrieval (ER) has recently gained increasing attention in the
information retrieval and knowledge management communities. ER can be
broadly classified into two tasks: expert finding and expert profiling.
The expert finding task aims to identify a list of people who possess
certain knowledge specified by the input query [1, 3]. The expert
profiling, on the other hand, focuses on identifying the area of
expertise associated with a given person [2]. To construct an expert
profile, two types of information which can be used to describe an expert
are topical and social information. The topical information represents
the domain and degree of knowledge that an expert possesses. The social
information measures an association aspect among experts such as research
project collaboration, publication co-authoring and program committee
assignment.
This paper describes our ongoing project to design and implement an
expert retrieval system with the scope on researchers who work in
Thailand. The first step is to build expert profiles for each researcher.
To identify expertise in different research areas, we could use many
different forms of evidence such as curriculum vitae, personal homepage
or professional profiles from social networking websites, e.g., LinkedIn.
However, there are two main difficulties in using these information
sources. The first is due to the distributed nature of the information
sources. Gathering individual profiles from the public information
source, i.e., the web, could be very tedious. An intelligent information
extraction algorithm is required to understand different document
templates. Also, the profile collection is most likely to be incomplete,
since some researchers do not make their profiles public for
privacy reasons. The second problem is due to the inconsistency of terms
used to describe the area of expertise, e.g., association rule mining
(hyponym) vs data mining (hypernym) or avian flu virus vs H5N1 (synonym).
Although integrating information from multiple sources could be very
helpful for providing more supported information, we leave this issue as
our future work.
In our current system prototype, we assume that the areas of expertise
among researchers can be extracted from bibliographic databases. We use
the Science Citation Index (SCI) database to provide the information for
representing the expert profiles. From the SCI database, we queried and
retrieved publications covering the years 2001 to 2008 by specifying
the affiliation equal to "Thailand". The results contain a set of
approximately 23,000 publications. We downloaded and extracted four
related fields including authors (denoted by AU), controlled terms
(denoted by ID), keywords (denoted by DE) and subject category (denoted
by SC).
To build a researcher network, we consider two types of relationships:
direct and indirect. The direct (or social) relationship is defined as
the co-authoring degree between one researcher and others. The co-
authoring degree between two researchers, co-authoring(A,B), can be
calculated based on the co-occurrence frequency between A and B found in
the field AU of 23,000 retrieved records. The indirect (or topical)
relationship is defined when two researchers have publications under the
same topics. The topical degree between two researchers, topical(A,B),
can be calculated based on the similarity measure between two sets of
extracted keywords, keyword(A) and keyword(B), representing researchers A
and B, respectively. The keyword set can be extracted from the fields ID,
DE and SC. An author with high frequencies on particular keywords is
considered an expert in the corresponding research topics.
AUTHOR ADDRESS: C Haruechaiyasak, Natl Elect & Comp Technol Ctr NECTEC,
Human Language Technol Lab HLT, Thailand Sci Pk, Klongluang
12120, Pathumthani, Thailand
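
The two relationship measures described in the abstract, co-authoring(A,B) and topical(A,B), can be sketched as follows. The field names AU, ID, DE and SC come from the abstract. The abstract does not say which similarity measure is used for the keyword sets, so Jaccard similarity is an assumption here, and the function names and sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coauthoring_degrees(records):
    """Count how often each pair of authors co-occurs in the AU field."""
    degrees = Counter()
    for record in records:
        authors = sorted(set(record["AU"]))
        for pair in combinations(authors, 2):
            degrees[pair] += 1
    return degrees

def keyword_sets(records):
    """Collect each author's keywords from the ID, DE and SC fields."""
    keywords = {}
    for record in records:
        terms = set()
        for field in ("ID", "DE", "SC"):
            terms.update(record.get(field, []))
        for author in set(record["AU"]):
            keywords.setdefault(author, set()).update(terms)
    return keywords

def topical_degree(keywords_a, keywords_b):
    """Jaccard similarity of two keyword sets (an assumed measure)."""
    union = keywords_a | keywords_b
    return len(keywords_a & keywords_b) / len(union) if union else 0.0

# Hypothetical records in the shape of the four extracted fields
records = [
    {"AU": ["A", "B"], "ID": ["rice"], "DE": ["genome"], "SC": ["Agronomy"]},
    {"AU": ["A", "B", "C"], "ID": ["rice"], "DE": ["yield"], "SC": ["Agronomy"]},
    {"AU": ["C"], "ID": ["shrimp"], "DE": ["yield"], "SC": ["Fisheries"]},
]
coauthoring = coauthoring_degrees(records)
keywords = keyword_sets(records)
topical_ac = topical_degree(keywords["A"], keywords["C"])
```

Plain sets discard keyword frequencies. Ranking authors as experts on a topic, as the abstract describes, would need per-author keyword counts instead.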
--------------------------------------------------------------------------
TITLE: Journal Ranking Based on Social Information (Article,
English)
AUTHOR: Wang, JL; Gao, K; Ren, YL; Li, G
SOURCE: JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT
CONFERENCE ON DIGITAL LIBRARIES. 2009. p.453 ASSOC
COMPUTING MACHINERY, NEW YORK
SEARCH TERM(S): JOURNAL item_title
KEYWORDS: Journal ranking; Social information; Mining
ABSTRACT: Recently, literature analysis has become a hot issue in
academic studies. In order to quantify the importance of journals and
provide researchers with target vehicles for their work, this poster
proposes a novel approach based on the social information through
considering the potential relationship between journal quality and
authors' affiliation. Based on the formula proposed in this work, the
importance of journals can be estimated and ranked.
AUTHOR ADDRESS: JL Wang, Qingdao Technol Univ, Sch Comp Engn, Qingdao,
Peoples R China