Wouters, P; Menczer, F; Amitay, E; Prime-Claverie, C; Beigbeder, M... Papers in JASIST 55(14) December 2004

Wed Dec 22 15:54:52 EST 2004

The following articles appeared in the December issue of JASIST.  I thought
they might be of interest to members of the SIG-Metrics List.

TITLE: Formally citing the web (Article, English)

AUTHOR: Wouters, P; de Vries, R

SOURCE: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
AND TECHNOLOGY 55 (14). DEC 2004. p.1250-1260 JOHN WILEY
& SONS INC, HOBOKEN

ABSTRACT: How do authors refer to Web-based information sources in
their formal scientific publications? It is not yet well known how
scientists and scholars actually include new types of information
sources, available through the new media, in their published work. This
article reports on a comparative study of the lists of references in 38
scientific journals in five different scientific and social scientific
fields. The fields are sociology, library and information science,
biochemistry and biotechnology, neuroscience, and the mathematics of
computing. As is well known, references, citations, and hyperlinks play
different roles in academic publishing and communication. Our study
focuses on hyperlinks as attributes of references in formal scholarly
publications. The study developed and applied a method to analyze the
differential roles of publishing media in the analysis of scientific and
scholarly literature references. The present secondary databases that
include reference and citation data (the Web of Science) cannot be used
for this type of research. By the automated processing and analysis of
the full text of scientific and scholarly articles, we were able to
extract the references and hyperlinks contained in these references in
relation to other features of the scientific and scholarly literature.
Our findings show that hyperlinking references are indeed, as expected,
abundantly present in the formal literature. They also tend to cite more
recent literature than the average reference. The large majority of the
references are to Web instances of traditional scientific journals. Other
types of Web-based information sources are less well represented in the
lists of references, except in the case of pure e-journals. We conclude
that this can be explained by taking the role of the publisher into
account. Indeed, it seems that the shift from print-based to electronic
publishing has created new roles for the publisher. By shaping the way
scientific references are hyperlinking to other information sources, the
publisher may have a large impact on the availability of scientific and
scholarly information.

AUTHOR ADDRESS: P Wouters, NIWI KNAW, Nerdi, POB 95110, NL-1090 HC
Amsterdam, Netherlands

ISSN: 1532-2882
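
The abstract above mentions extracting references and the hyperlinks they
contain from article full text. As a rough illustration only (this is not the
authors' method; the URL pattern and the function names are assumptions made
for this sketch), the share of references carrying a hyperlink could be
estimated like this:

```python
import re

# Assumed pattern for this sketch: an http(s) URL runs until whitespace
# or a closing quote/bracket. Real reference lists need more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_hyperlinks(reference):
    """Return the URLs found in one reference string.

    Trailing sentence punctuation is stripped, since references
    often end with a full stop directly after the URL.
    """
    return [url.rstrip(".,;") for url in URL_PATTERN.findall(reference)]


def share_with_hyperlinks(references):
    """Return the fraction of references that contain at least one URL."""
    if not references:
        return 0.0
    linked = sum(1 for ref in references if extract_hyperlinks(ref))
    return linked / len(references)
```

Given a list of reference strings from one article, share_with_hyperlinks
returns a value between 0 and 1 that can be compared across journals or
fields.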

--------------------------------------------------------------------------

TITLE: Lexical and semantic clustering by web links (Article,
English)

AUTHOR: Menczer, F

SOURCE: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
AND TECHNOLOGY 55 (14). DEC 2004. p.1261-1269 JOHN WILEY
& SONS INC, HOBOKEN

ABSTRACT: Recent Web-searching and -mining tools are combining text
and link analysis to improve ranking and crawling algorithms. The central
assumption behind such approaches is that there is a correlation between
the graph structure of the Web and the text and meaning of pages. Here I
formalize and empirically evaluate two general conjectures drawing
connections from link information to lexical and semantic Web content.
The link-content conjecture states that a page is similar to the pages
that link to it, and the link-cluster conjecture that pages about the
same topic are clustered together. These conjectures are often simply
assumed to hold, and Web search tools are built on such assumptions. The
present quantitative confirmation sheds light on the connection between
the success of the latest Web-mining techniques and the small world
topology of the Web, with encouraging implications for the design of
better crawling algorithms.

AUTHOR ADDRESS: F Menczer, Indiana Univ, Sch Informat, Dept Comp Sci,
Bloomington, IN 47408 USA

ISSN: 1532-2882
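
The link-content conjecture in the abstract above says that a page is similar
to the pages that link to it. A minimal sketch of how one might check this on
a small collection follows. It assumes plain term-frequency vectors and cosine
similarity; the paper's own similarity measures may differ, and all names here
are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a simple term-frequency vector from whitespace-split text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity_to_inlinks(page, pages, inlinks):
    """Average similarity between a page and the pages that link to it.

    pages maps page id -> text; inlinks maps page id -> list of the
    ids of pages linking to it. Both structures are assumptions.
    """
    sources = inlinks.get(page, [])
    if not sources:
        return 0.0
    target = term_vector(pages[page])
    total = sum(cosine_similarity(target, term_vector(pages[s])) for s in sources)
    return total / len(sources)
```

Comparing this average against the similarity between randomly paired pages
gives a crude indication of whether linked pages are lexically closer than
unlinked ones.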

--------------------------------------------------------------------------

TITLE: Trend detection through temporal link analysis (Article,
English)

AUTHOR: Amitay, E; Carmel, D; Herscovici, M; Lempel, R; Soffer, A

SOURCE: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
AND TECHNOLOGY 55 (14). DEC 2004. p.1270-1281 JOHN WILEY
& SONS INC, HOBOKEN

ABSTRACT: Although time has been recognized as an important
dimension in the co-citation literature, to date it has not been
incorporated into the analogous process of link analysis on the Web. In
this paper, we discuss several aspects and uses of the time dimension in
the context of Web information retrieval. We describe the ideal case
where search engines track and store temporal data for each of the pages
in their repository, assigning timestamps to the hyperlinks embedded
within the pages. We introduce several applications which benefit from
the availability of such timestamps. To demonstrate our claims, we use a
somewhat simplistic approach, which dates links by approximating the age
of the page's content. We show that by using this crude measure alone it
is possible to detect and expose significant events and trends. We
predict that by using more robust methods for tracking modifications in
the content of pages, search engines will be able to provide results that
are more timely and better reflect current real-life trends than those
they provide today.

AUTHOR ADDRESS: E Amitay, IBM Res Lab Haifa, IL-31905 Haifa, Israel

ISSN: 1532-2882
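
The abstract above describes assigning timestamps to hyperlinks and using them
to expose events and trends. As a simple illustration (not the authors'
implementation; it assumes each link to a topic already carries an approximate
date, and the function names are hypothetical), links can be bucketed by month
and the busiest month reported:

```python
from collections import Counter
from datetime import date


def links_per_month(link_dates):
    """Count dated links per (year, month) bucket."""
    return Counter((d.year, d.month) for d in link_dates)


def peak_month(link_dates):
    """Return the (year, month) with the most links, or None if empty.

    A spike in new links to a topic in one month is treated here as
    a crude signal of an event or trend.
    """
    counts = links_per_month(link_dates)
    if not counts:
        return None
    return max(counts, key=counts.get)
```

For example, peak_month([date(2004, 3, 2), date(2004, 3, 9), date(2004, 5, 1)])
picks out March 2004 as the month with the most new links.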

--------------------------------------------------------------------------

TITLE: Transposition of the cocitation method with a view to
classifying web pages (Article, English)

AUTHOR: Prime-Claverie, C; Beigbeder, M

SOURCE: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
AND TECHNOLOGY 55 (14). DEC 2004. p.1282-1289 JOHN WILEY
& SONS INC, HOBOKEN

ABSTRACT: The Web is a huge source of information, and one of the
main problems facing users is finding documents which correspond to their
requirements. Apart from the problem of thematic relevance, the documents
retrieved by search engines do not always meet the users' expectations.
The document may be too general, or conversely too specialized, or of a
different type from what the user is looking for, and so forth. We think
that adding metadata to pages can considerably improve the process of
searching for information on the Web. This article presents a possible
typology for Web sites and pages, as well as a method for propagating
metadata values, based on the study of the Web graph and more
specifically the method of cocitation in this graph.

AUTHOR ADDRESS: C Prime-Claverie, Ecole Natl Super Mines, Lab RIM G2I, 158
Cours Fauriel, F-42023 St Etienne, France

ISSN: 1532-2882
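
The abstract above relies on cocitation in the Web graph: two pages are
cocited when some third page links to both. A minimal sketch of counting
cocitations follows. The input structure and the function name are
assumptions for illustration, and the metadata propagation step from the
paper is not covered.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(outlinks):
    """Count how often each pair of pages is linked from the same page.

    outlinks maps a citing page id -> list of the page ids it links to.
    Returns a Counter keyed by alphabetically ordered (page, page) pairs.
    """
    counts = Counter()
    for targets in outlinks.values():
        # De-duplicate and sort so each pair has one canonical key.
        for pair in combinations(sorted(set(targets)), 2):
            counts[pair] += 1
    return counts
```

Pairs with high counts are candidates for belonging to the same class, which
is the intuition behind using cocitation to classify pages.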