From garfield at CODEX.CIS.UPENN.EDU Mon Aug 4 16:26:10 2003
From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield)
Date: Mon, 4 Aug 2003 16:26:10 -0400
Subject: Keylock CJ, "Mark Melton's geomorphology and geography's quantitative revolution" Transactions of the Institute of British Geographers 28(2):142-157, 2003
Message-ID:

Christopher J. Keylock : c.keylock at geog.leeds.ac.uk

Full Text Available at : http://www.blackwell-synergy.com/links/doi/10.1111/1475-5661.00084/abs/

This paper contains some interesting citation analyses as reflected in captions for Tables 1 and 2 and figures 1, 2, and 3. Also note "Sociologies of Scientific Publication" on page 143 of this paper.

TITLE : Mark Melton's geomorphology and geography's quantitative revolution
AUTHOR : Keylock CJ
JOURNAL: TRANSACTIONS OF THE INSTITUTE OF BRITISH GEOGRAPHERS 28 (2): 142-157 2003

Document type: Review
Language: English
Cited References: 126
Times Cited: 0

Abstract: Mark Melton published some important papers in the late 1950s that have had a significant influence upon the subsequent development of geomorphology. Two of these papers were published in the same journal in the same year, and have a similar number of total citations, and these are compared in this study. Although both papers present novel empirical findings and discuss innovative conceptual frameworks, the extent and manner to which they have been used within geography and geology differs quite markedly. This reveals marked differences in the conceptual frameworks and research priorities of the two groups of scientists, which may help explain why geomorphology has proceeded differently on the two sides of the Atlantic since the quantitative revolution.
Author Keywords: quantitative revolution, geomorphology, history of ideas, methodology, systems analysis, thermodynamics

KeyWords Plus: NATURAL CHANNEL NETWORKS, FRACTAL RIVER NETWORKS, DRAINAGE SYSTEMS, ENVIRONMENTAL CONSTRAINTS, GEOLOGICAL-SOCIETY, AMERICA-BULLETIN, EVOLUTION, MODEL, SPACE, TIME

Addresses: Keylock CJ, Univ Leeds, Sch Geog, Woodhouse Lane, Leeds LS2 9JT, W Yorkshire, England
Univ Leeds, Sch Geog, Leeds LS2 9JT, W Yorkshire, England

Publisher: INST BRITISH GEOGRAPHERS, LONDON
IDS Number: 695VT
ISSN: 0020-2754

When responding, please attach my original message
_______________________________________________________________________
Eugene Garfield, PhD.
email: garfield at codex.cis.upenn.edu
home page: www.eugenegarfield.org
Tel: 215-243-2205 Fax 215-387-1266
President, The Scientist LLC.
www.the-scientist.com
Chairman Emeritus, ISI www.isinet.com
Past President, American Society for Information Science and Technology (ASIS&T) www.asis.org
_______________________________________________________________________

From loet at LEYDESDORFF.NET Tue Aug 5 05:48:39 2003
From: loet at LEYDESDORFF.NET (Loet Leydesdorff)
Date: Tue, 5 Aug 2003 11:48:39 +0200
Subject: Triple Helix Issue of Scientometrics 58(2), (forthcoming)
Message-ID:

Topical Issue of Scientometrics 58(2), forthcoming October 2003

The Triple Helix of University-Industry-Government Relations
Loet Leydesdorff and Martin Meyer

The Triple Helix of university-industry-government relations provides a neo-evolutionary model of the process of innovation that is amenable to measurement. Economic exchange, intellectual organization, and geographical constraints can be considered as different dynamics that interact in a knowledge-based economy as a complex system. Differentiation spans the systems of innovation, while performative integration enables organizations to retain wealth from knowledge. Because of the systematic organization of interfaces among the subsystems under study, different perspectives can be expected in the reflection. Consequences for the heuristics, the research design, and normative implications are specified and the organization of the issue is further explained.

a. The geographical perspective on systems of innovation

1. Danell, Rickard & Olle Persson, "Regional R&D Activities and Interactions in the Swedish Triple Helix"

The Swedish innovation system is analysed in terms of the interaction between academia, government and the private sector. For each of 21 Swedish regions we analyse the distribution of research activities, doctoral employment, and publication output, as well as the flow of doctoral graduates and the distribution of co-authorship links across regions and sectors. The three main urban regions have about 75 percent of all R&D activities and outputs.
They also have a more balanced supply of academic, governmental and private research activities than the smaller regions, and the interactions among sectors within these regions are more intense. The inter-regional flow of PhDs is also to the advantage of the big regions. So far, decentralization of the academic sector does not seem to have had a similar decentralizing effect on private R&D. Unless this imbalance changes, smaller regions will continue to be net exporters of skill and knowledge to the big regions.

2. Goktepe, Devrim, "The Triple Helix as a Model to Analyze the Israeli Magnet Program and Lessons for Late-Developing Countries like Turkey"

Although the systemic changes towards innovation networking between university-industry and governmental actors have recently found a place on the international policy and literature agenda, networking between the organizations and people - for the national survival, production and growth - has been deeply rooted in the Israeli system even before the establishment of the Israeli State in 1948. Internal and international constraints fostered the formation of personal links, as did institutional settings that promoted networking. This paper reviews the interaction of societal, organizational and cultural features that render innovation networks in Israel successful. The research focuses on the impacts of the Israeli Magnet Program on the Israeli R&D growth and performance. The implications of innovation networks for a late-developing country like Turkey are reviewed in the contexts of catching-up and cross-regional collaboration between the Israeli and Turkish industries and academies.

3. Verbeek, Arnold, Koenraad Debackere, & Marc Luwel, "Science cited in patents: A geographic 'flow' analysis of bibliographic citation patterns in patents"
The interplay and cross-fertilization between science and technology, but also the specific role of science for technological development, have received ample attention in both the research and the policy communities. It is in this context that the concepts of "absorptive capacity" and "knowledge spillovers" play an important role. We operationalize the science-technology link by quantifying and modeling bibliographic references to the scientific literature as they occur in patents. This approach allows exploring the associative patterns between science creation (as emerging from the scientific literature) and technology development (as emerging from the patent literature). In the current paper, we focus on an analysis of the geographic distribution of the science citation patterns in patents, singling out two fields of (different) technological development, namely biotechnology and information technology. In both fields, the science citation flows from the European, Japanese and US science bases into USPTO and EPO-patents are explored and modeled. Intensive geographic citation flows between the regions are identified, pointing (amongst others) to the strength of both the US and the European science bases as sources for technological activity and creativity around the world.

b. University-industry relations in a knowledge-based economy

4. Bhattacharya, Sujit, & Martin Meyer, "Large Firms and the Science/Technology Interface: Patents, Patent Citations, and Scientific Output of Multinational Corporations in Thin Films"

Firms operating in science-based technological fields reflect some of the complexities of the science-technology interaction. The present study attempts to investigate these interactions by analyzing patent citations, publication and patent outputs of multinational corporations (MNCs) in "thin film" technology. In particular we explore different characteristics of knowledge production and knowledge utilization of these firms.
The results indicate no correlation between intensity of research activity and patents produced by the MNCs. The relationship between scientific and technological knowledge generation as well as the linkage between science and technology appear to be firm-specific rather than dependent on a technological or industrial sector. The dispersion of journal sources for the majority of patent citations of scientific literature as well as for the majority of scientific outputs is narrow. Basic journals play an important role in patent citation as well as in addressing research of MNCs in thin-film technology.

5. Gray, Denis O., & Harm-Jan Steenhuis, "Quantifying the Benefits of Participating in an Industry University Research Center: An Examination of Research Cost Avoidance"

The challenges to conducting valid and complete outcome evaluations of cooperative research activities, like the National Science Foundation Industry/University Cooperative Research Centers (IUCRC) Program, are daunting. The current study tries to make a small but important contribution to this area by attempting to develop quantitative estimates of one center benefit: R&D cost avoidance. Cost avoidance is operationalized as R&D costs industrial members would have incurred but did not, because they participated in university-based industrial consortia, minus the costs of belonging to the consortia. Data were collected from a total of 18 industrial sponsors from three IUCRCs on 35 different research projects. Findings indicate that some firms do avoid R&D costs by participating in an IUCRC but the prevalence of this benefit varies across centers and across firms. The implications of these findings for policy, practice and future research are discussed.

6. Ranga, Liana Marina, Koenraad Debackere, & Nick von Tunzelmann, "Entrepreneurial Universities and the Dynamics of Academic Knowledge Production: a case study of basic versus applied research in Belgium"
This paper explores issues related to the impact of Science-Industry relationships on the knowledge production of academic research groups, in particular on the alleged shift to the more applied research end under the influence of business partners' needs. Our findings from a case study of the Belgian Katholieke Universiteit Leuven (K.U. Leuven) show a significant steady growth over time of publications produced by academic research groups involved in University-Industry linkages, closely related to factors both internal and external to the university that have stimulated academic entrepreneurial behaviour. On an aggregated level for 1985-2000, basic research publications appear to be more present than applied ones, both in total numbers and in growth rates. Our findings show that applied and basic research publications generally rose together in the same year. No clear and generalised evidence of a shift towards the applied research end determined by the involvement in U-I linkages was found, the weak indications of such a shift within groups coming only for groups that already have a high applied versus basic orientation. These results suggest that the academic research groups examined have developed a record of applied publications without affecting their basic research publications and, rather than differentiating between applied and basic research publications, it is the combination of basic and applied publications that consolidates the group's R&D potential. Accordingly, critical assessments of the University side of the emerging "Triple Helix" need to take into account the dynamic nature of the research dimension.

7. Meyer, Martin, Tatiana Goloubeva, & Jan Timm Utecht, "Towards Hybrid Triple Helix Indicators: A Study of University-related Patents and a Survey of Academic Inventors."

This paper presents work directed at capturing the entrepreneurial and collaborative activity of university researchers.
The Triple Helix points to the emergence of the entrepreneurial university as well as to an increasing overlay of activities in universities, industry and government. This study explores ways in which patent-based metrics could be utilized in a Triple Helix context, and how hybrid indicators could be developed by combining patent with survey data. More specifically, it aims to develop indicators that connect technological inventiveness of university researchers to both funding organizations and users, as well as to entrepreneurial activities by academics. The paper develops a simplified model of the innovation process to benchmark the relevance of the indicators to the Triple Helix. An analysis of Finnish academic patents illustrates that patent data can already provide useful indicators but, on its own, cannot provide information about how academic patents are interconnected with government or industry through funding or utilization links. An exclusive analysis of patents can point to patent concentrations on certain universities, to inventors and assignees, or to potential gaps in translating applied science into industrial technology. However, the patent data had to be combined with an inventor survey in order to relate academic patents more to their Triple Helix environment. The survey indicated that most patented academic inventions are connected to (often publicly funded) scientific research by the inventors and tend to be utilized in large firms rather than in start-up companies founded by academic entrepreneurs.

8. Cozzens, Susan & Kamau Bobb, "Measuring the Relationship between High Technology Development Strategies and Wage Inequality"

Growing income and wage inequality in a range of countries has raised concern. High-technology development may be contributing to this inequality, by encouraging higher wages at the upper end of the income distribution. Most studies of the possibility of this effect have used generic, aggregated data.
In this paper, we introduce the possibility of linking wage inequality directly to specific industrial strategies using the Theil index of inequality. This measure portrays the portion of wage inequality that is attributable to wages in specific industries. We illustrate this concept with data from U.S. states.

c. The intellectual organization of knowledge-based innovations

4. Bhattacharya, Sujit, Hildrun Kretschmer, & Martin Meyer, "Characterizing Intellectual Spaces between Science and Technology"

The paper presents a methodology for studying the interactions between science and technology. Our approach rests mostly on patent citation and co-word analysis. In particular, this study aims to delineate intellectual spaces in thin-film technology in terms of science/technology interaction. The universe of thin-film patents can be viewed as the macro-level and starting point of our analysis. Applying a bottom-up approach, intellectual spaces at the micro-level are defined by tracing prominent concepts in publications, patents, and their citations of scientific literature. In another step, co-word analysis is used to generate meso-level topics and sub-topics. Overlapping structures and specificities that emerge are explored in the light of theoretical understanding of science-technology interactions. In particular, one can distinguish prominent concepts among patent citations that either co-occur in both thin-film publications and patents or reach out to one of the two sides. Future research may address the question to what extent one can interpret directionality into this.

5. Heimeriks, Gaston & Peter van den Besselaar, "Mapping Communications and Collaboration in Heterogeneous Research Networks"

The aim of this mainly methodological paper is to present an approach for researching the triple helix of university-industry-government relations as a heterogeneous and multi-layered communication network.
The layers included are: the formal scholarly communication in academic journals, the communication network based on project collaborations, and finally the communication of information over the "virtual" network of web links. The approach is applied to typical "Mode 2" fields such as biotechnology, while using a variety of data sources. We present some of the initial findings, which indicate the different structures and functions of the three layers of communication.

6. Glänzel, Wolfgang & Martin Meyer, "Patents Cited in the Scientific Literature: An Exploratory Study of 'Reverse' Citation Relations"

This paper reports on a new approach to study the linkage between science and technology. Unlike most contributions to this area we do not trace citations of scientific literature in patents but explore citations of patents in scientific literature. Our analysis is based on papers recorded in the 1996-2000 annual volumes of the CD Edition of Science Citation Index® (SCI) of the Institute for Scientific Information (ISI) and patent data provided by the US Patent and Trademark Office. Almost 30,000 US patents were cited by scientific research papers. We analysed the citation links by scientific fields and technological sectors. Chemistry-related subfields tended to cite patents more than other scientific areas. Among technological sectors, chemicals clearly dominates, followed by drugs and medical patents as the most frequently cited categories. Further analyses included a country-ranking based on inventor-addresses of the cited patents, a more detailed inspection of the ten most cited patents, and an analysis of class-field transfers. The paper concludes with suggestions for future research. One of them is to compare our "reverse" citation data with "regular" patent citation data within the same classification system to see whether citations occur, irrespective of their directionality, in the same fields of science and technology.
Another question is how one should interpret reverse citation linkages.

7. Ortega Priego, José Luis, "A Vector Space Model as methodological approach to the Triple Helix dimensionality: A comparative study of Biology and Biomedicine Centers of two European National Research Councils from a Webometric View."

The aim of this paper is to propose a Vector Space Model as a new methodological approach which allows us to present the relationships between the elements of the Triple Helix Model (University, Industry, Government) in a spatial model by using the webpages of the National Research Councils of Germany and Spain as examples. Outlinks of the Biomedicine and Biology centres of these national councils were analysed with the intention of representing graphically these relationships through the Vector Space Model that allows for Multidimensional Scaling in three dimensions. Results show a map with the differences and similarities between the Spanish and German cases. It may be concluded that these results could become a qualitative indicator of a scientific and technical reality.

8. Leydesdorff, Loet, "The Mutual Information of University-Industry-Government Relations: An Indicator of the Triple Helix Dynamics"

University-industry-government relations provide a networked infrastructure for knowledge-based innovation systems. This infrastructure organizes the dynamic fluxes locally and the knowledge base remains emergent given these conditions. Whereas the relations between the institutions can be measured as variables, the interacting fluxes generate a probabilistic entropy. The mutual information among the three institutional dimensions provides us with an indicator of this entropy. When this indicator is negative, self-organization can be expected. The self-organizing dynamic may temporarily be stabilized in the overlay of communications among the carrying agencies.
The various dynamics of Triple Helix relations at the global and national levels, in different databases, and in different regions of the world, are distinguished by applying this indicator to scientometric and webometric data.

_____
Loet Leydesdorff
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam
Tel.: +31-20- 525 6598; fax: +31-20- 525 3681
loet at leydesdorff.net ; http://www.leydesdorff.net/
The Challenge of Scientometrics ; The Self-Organization of the Knowledge-Based Society

From garfield at CODEX.CIS.UPENN.EDU Tue Aug 5 10:48:36 2003
From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield)
Date: Tue, 5 Aug 2003 10:48:36 -0400
Subject: Hou JY, Zhang YC "Effectively finding relevant Web pages from linkage information" IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 15 (4): 940-951 JUL-AUG 2003
Message-ID:

Jingyu Hou jingyu at deakin.edu.au
Yanchun Zhang yzhang at csm.vu.edu.au

FULL TEXT AVAILABLE AT : http://sci.vu.edu.au/~yzhang/papers/115730-final.pdf

TITLE : Effectively finding relevant Web pages from linkage information
AUTHOR : Hou JY, Zhang YC
JOURNAL : IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 15 (4): 940-951 JUL-AUG 2003

Document type: Article
Language: English
Cited References: 28
Times Cited: 0

Abstract: This paper presents two hyperlink analysis-based algorithms to find relevant pages for a given Web page (URL). The first algorithm comes from the extended cocitation analysis of the Web pages. It is intuitive and easy to implement. The second one takes advantage of linear algebra theories to reveal deeper relationships among the Web pages and to identify relevant pages more precisely and effectively. The experimental results show the feasibility and effectiveness of the algorithms. These algorithms could be used for various Web applications, such as enhancing Web search.
The ideas and techniques in this work would be helpful to other Web-related researches.

Author Keywords: World Wide Web, Web search, information retrieval, hyperlink analysis, singular value decomposition (SVD)

Addresses: Hou JY, Deakin Univ, Sch Informat Technol, Melbourne, Vic 3125, Australia
Deakin Univ, Sch Informat Technol, Melbourne, Vic 3125, Australia
Victoria Univ Technol, Sch Comp Sci & Math, Melbourne, Vic 8001, Australia

Publisher: IEEE COMPUTER SOC, LOS ALAMITOS
IDS Number: 696RY
ISSN: 1041-4347

From Peter.van.den.Besselaar at NIWI.KNAW.NL Wed Aug 6 02:54:20 2003
From: Peter.van.den.Besselaar at NIWI.KNAW.NL (Peter van.den.Besselaar)
Date: Wed, 6 Aug 2003 08:54:20 +0200
Subject: Descriptive statistics, inferential statistics, rhetorical statistics
Message-ID:

In a contribution to this list, Loet Leydesdorff replied to my brief communication in JASIST (2003-1) "Empirical evidence for self-organization?". My reply - as letter to the editor - is now published in JASIST 2003-9:

"Descriptive statistics, inferential statistics, rhetorical statistics"

Loet Leydesdorff (2003) argues that my analysis (Van den Besselaar, 2003) is not correct and not relevant. In his argument, however, he mixes up samples and populations, and he incorrectly uses concepts such as 'significance' and 'eigenstructures'.

Leydesdorff's data are attributes of the papers in "a carefully selected set" of biotechnology journals. In other words, it is not a sample from a larger set of journals, and therefore he analyzes on the level of the population. Applying statistical techniques on a population is descriptive statistics. Of course statistical packages like SPSS calculate 'significance levels' but these belong to the realm of inferential statistics, that is generalizing from random samples to populations. In his claim that my "simulation results usually did not pass the significance tests provided by SPSS" and that his "results using bibliometric data did pass these tests", he is confusing samples and populations. As there is no sample whatsoever, using the qualification 'significant' is irrelevant and misplaced.
The same holds when he uses the results of the simulations to conclude that "the network of words does also not significantly correlate with the geographical division."

Samples come into play when testing the quality of the discriminant analysis. I have drawn random samples, and use the sample statistics (the discriminant functions) to predict the population parameters. As every random sample fails to do this, one has to conclude that using discriminant analysis for describing the relation between 'title words' and 'region of origin' is wrong. This can be explained by the large number of unique observations in the data, and this also explains the results of the simulations (Van den Besselaar, Heimeriks 1998, pp 98-100). Leydesdorff argues that this test of the DA is not relevant because "one cannot expect any significant correlation between the eigenstructures of highly specific samples." Of course one does not expect this in case of highly specific samples, but my test shows that the eigenstructures of random samples are completely different.

Leydesdorff states that I misread and selectively quote his paper, as he is not doing first order data analysis. He tries to develop a 'new methodology for second order theorizing' to answer 'what-if questions' about the interaction between the global knowledge production system and regional institutionalization. I do not have problems with this type of question, but the 'new methodology' needs clarification: what can we conclude from the 'significant' correlations between the 'regional word sets' with the word sets representing the 'intellectual space' (Leydesdorff & Heimeriks 2001, p.1268)? First, the mapping of the intellectual space is based on a very weak factor structure (Leydesdorff & Heimeriks 2001, 1266). Second, the regional word sets are highly questionable (Van den Besselaar 2003).
Additionally, I showed that the positions of the three regions within the intellectual space change from year to year (E-mail communication, November 1998). This change is so implausible that one should seriously doubt the adequacy of the methods used to measure these positions: the discriminant analysis. The conclusion is that, despite the 'significant' results, the 'new methodology' is not convincing. What remains is an example of rhetorical statistics. From loet at LEYDESDORFF.NET Wed Aug 6 03:23:12 2003 From: loet at LEYDESDORFF.NET (Loet Leydesdorff) Date: Wed, 6 Aug 2003 09:23:12 +0200 Subject: Descriptive statistics, inferential statistics, rhetorical statistics Message-ID: Rejoinder to Van den Besselaar's Letter entitled "Descriptive statistics, inferential statistics, rhetorical statistics." Van den Besselaar (2003) illustrates his argument with a quote of our conclusion that "the network of words does not significantly correlate with the geographical division" (Leydesdorff & Heimeriks, 2001, p. 1266). However, this conclusion was entirely based on replicating the simulations suggested by Van den Besselaar in previous exchanges, i.e., on random samplings. Van den Besselaar & Heimeriks (2000, pp. 89-93) was for that reason provided as a reference. Indeed, the inference cannot be based on the descriptive statistics. Loet Leydesdorff From ronald.rousseau at KHBO.BE Sat Aug 9 13:46:31 2003 From: ronald.rousseau at KHBO.BE (Ronald Rousseau) Date: Sat, 9 Aug 2003 19:46:31 +0200 Subject: BRS compactness Message-ID: Dear Colleagues, Gene Garfield asked me to put this on the list. I hope it is useful for some of you.
Best regards, Ronald Rousseau *********************************************************************** BRS-compactness in networks: Theoretical considerations related to cohesion in citation graphs, collaboration networks and the internet Mathematical and Computer Modelling Volume 37, Issues 7-8, April 2003, Pages 879-899 L. Egghe and R. Rousseau Abstract Compactness as introduced by Botafogo, Rivlin and Shneiderman, in short: BRS-compactness, is studied in general, as it can be used to describe the cohesion of parts of the internet or collaboration networks, and in the particular case of a unidirectional network, such as a citation graph. It is shown that the connection coefficient is an upper bound for the BRS-compactness value of a network. During our investigations, we derive an upper bound for the generalized Wiener index of a directed graph. Several networks are constructed and their BRS-compactness values are calculated. Author Keywords: BRS-compactness; Networks; Hyperlinks; Internet; Citation networks; Collaboration graphs; Generalized Wiener index; Sum of distances in a graph ************************************************************************* Ronald Rousseau International Program Chair, 9th ISSI Conference - Beijing KHBO - Industrial Sciences and Technology Zeedijk 101 B-8400 Oostende Belgium Guest Professor at the Library of the Chinese Academy of Sciences Honorary Professor Henan Normal University (Xinxiang, China) E-mail: ronald.rousseau at khbo.be web page: users.pandora.be/ronald.rousseau ------------------------------------------- | Please visit www.cscd.ac.cn/issi2003 | | the site of the Beijing ISSI conference | =========================================== From isidro at CINDOC.CSIC.ES Mon Aug 11 07:45:19 2003 From: isidro at CINDOC.CSIC.ES (Isidro F. Aguillo) Date: Mon, 11 Aug 2003 13:45:19 +0200 Subject: Cybermetrics: Call for papers Message-ID: Cybermetrics is not dead Since 1997, Cybermetrics, the electronic journal devoted to informetrics/scientometrics/cybermetrics/webometrics, is the main scientometrics-related website. We think it is playing an important role as reliable provider of information, news, bibliography, software and even data for our community. Unfortunately the number of quality manuscripts we received was not very high and the papers finally accepted for publication are becoming scarce. But did you know that Rousseau's article (Cybermetrics, 1997, "sitations") is cited at least 43 times in the Web of Science alone, being his most-cited paper! It is becoming evident that web visibility is much higher than printed visibility (at least for most journals). We are planning to update both the design and contents of the e-journal but we badly need additional contributions from the scientometrics/informetrics community so we invite all of you to submit your manuscripts for editorial reviewing. Thanks in advance -- ^ Isidro F. Aguillo isidro at cindoc.csic.es CINDOC-CSIC Joaquin Costa, 22 28002 Madrid.
SPAIN +34-630858997 www.cindoc.csic.es/cybermetrics ^ From harnad at ECS.SOTON.AC.UK Mon Aug 11 19:10:55 2003 From: harnad at ECS.SOTON.AC.UK (Stevan Harnad) Date: Tue, 12 Aug 2003 00:10:55 +0100 Subject: Free Access vs. Open Access In-Reply-To: Message-ID: On Mon, 11 Aug 2003, Matthew Cockerill wrote: >sh> "The use one makes of those full texts is to read them, >sh> print them off, quote/comment them, cite them, and use >sh> their *contents* in further research, building on them. >sh> What is "re-use"? And what is "redistribution" (when >sh> everyone on the planet with access to the web has access >sh> to the full-text of every such article)?" > > Having free access to articles on the publisher's website would certainly > offer progress compared to the current status quo. But it would not offer > anything like the benefits of true open access. Free access to the current 20,000 journals (2 million articles yearly) would be like the difference between night and day. Compared to that, the difference between "free" and "true open" access amounts to just a few degrees of luminosity. But let me agree at once that if free access were gerrymandered so all the user could do was to browse the text on-screen, without being able to download, save, grep, or print-off, then that would indeed arbitrarily limit free access's usefulness. How many (if any) of the several million free-access refereed-journal articles currently on the web, however -- whether BOAI-1, BOAI-2, or otherwise -- are gerrymandered in that way? If (as I suspect) the answer is "very few" or even "none that I know of," then this hypothetical constraint is not worth another moment's thought or energy diverted from the real task at hand, which is to turn night into day, as soon as possible.
> Here are just some of the > reasons why re-use and re-distribution rights are vital to open access: > > (1) Digital permanence - it is not enough for the publisher to be the only > body which curates the full archive of published research content. To ensure > long term digital permanence of the scientific record, it is vital that > articles should be deposited with multiple archives, and redistributable > from and between those archives. It seems to me that this is conflating (arbitrarily) two completely independent matters. One is toll-free online *access* to the articles in the 20K journals that are currently only accessible via tolls. The other is the *preservation* of that toll-based corpus. Well, preservation of that toll-based corpus was always a concern, in on-paper days as in on-line days, and the concern has nothing whatsoever to do with free (or open) access! We could have a failsafe preservation system without free access, or we could have a failsafe preservation with free access; or we could have an uncertain preservation system without free access (as we do now) or an uncertain preservation system with free access (bringing the present system out into the light of day). The preservation burden has to be (and will be, and is being) faced in any case. Why on earth should that entirely orthogonal longterm task be coupled in *any way* to the immediate and urgent problem of free access today? And why should "open access" be linked with or defined in terms of the eventual solution to the preservation problem, one way or the other? (This is not an argument for indifference to preservation: it is an argument for decoupling two completely independent desiderata.) > (2) A flexible choice of tools for searching and browsing > The reason that Google exists is because the web is free for anyone to > download and index. 
As a result, there is competition among search engines, > and Google had the incentive to develop a better system for indexing web > pages, which has since driven other search engine companies to improve the > tools they offer. > > Compare this with the situation with scientific research. If the research > resides only on the publisher's site, you don't have a free choice of what > tools you use to search and browse it - you are stuck with what that > particular publisher provides you with. We are quite squarely in the domain of hypotheticals here. (Which publisher's free-access corpus, inaccessible to google, are we talking about?) But let us suppose that a publisher provides free access -- not gerrymandered free access, but free access that allows downloading, saving, grepping and printing: First, I will bet that such a publisher will want to maximize the visibility and impact of his contents by allowing at least the indexing metadata to be harvested, both by google, and by the OAI search engines specializing in the refereed journal literature. But even if we get doubly hypothetical here, and suppose the publisher does *not* disclose the metadata to harvesters, there is still a super-simple solution: Every author has an online CV. Their CV will contain the metadata for every one of their journal publications. (Such CVs can and will be OAI-compliant: http://paracite.eprints.org/cgi-bin/rae_front.cgi ). Add the URL for the free-access full-text on the publisher's website to your CV entry and the circle is closed. (Better still, also self-archive the full text in your own institutional OAI-compliant repository!) End of story. > This ties in with developments in Grid computing (e.g. > http://www.escience-grid.org.uk/ ). With open access, published research > would be available "on tap" via the grid, and scientists would be able to > use their preferred choice of grid tools to access the data, rather than > being stuck with the tools provided by the publisher. 
As stated above, the CV/OAI gambit already trivially takes care of closing the circle. I agree, though, that for many research purposes, it is beneficial to have not just the metadata but the full-text inverted and indexed, as well as agent-harvestable. Again, if the publisher's free-access site doesn't do this, the author's institutional site certainly can and will. In fact, authors and their institutions are the ones with the most direct interest in making sure their own research output is maximally usable in this way. http://www.ecs.soton.ac.uk/~harnad/Temp/unto-others.html Let us not, however, conflate article-text archiving with data-archiving. Data-archiving is important too, but it is an extra: an independent new bonus of the online era, having nothing to do with the question of toll-free access to article-texts. In the paper era, raw data were not published, just summarized in what was published. Eventually data will no doubt be incorporated into online publications in some way, but until then there is certainly no need for authors to wait! They can publish their article, as before, and, in addition, self-archive the data on which their article is based in their own OAI-compliant institutional research repository (the same repository in which the full-text of their article can and should be self-archived too, whether it appears in an open-access journal, a toll-access journal, or a toll-access journal that offers toll-free access too). Again, the online CV can close the circle, if it is not already closed of its own accord. And this way, although it is functionally independent, data-archiving can help speed the progress toward toll-free full-text access too. > (3) Datamining > > With a million or so biomedical research articles being published each year, > the sheer volume of output is an obstacle to the comprehension and synthesis > of the results reported in that research.
If the XML of the articles can be > brought together in one place then the tools of datamining can be applied to > it to extract useful but non-obvious information. Agreed. See above. But before we get carried away with the potential perks, let's not forget the still absent basics: Let there be Light (toll-free full-text access), now! Leave the Solar-Energy and Club-Med projects for when we already have our daily fill of photons. > The simplest type of datamining is citation analysis > > Currently you need to pay ISI a lot of money to find out what cites what, > but with true open access, citation analysis becomes trivial. Perhaps not quite trivial. (There's still the problem of parsing, identifying and linking the citations for all those articles without the ultimate mark-up: But we're working on it: http://opcit.eprints.org/ ). But again, this is an independent perk, because you could have universal citation linking and analysis even *without* toll-free full-text access! For an article's reference list, like its indexing metadata (and its accompanying empirical data) can all be self-archived by the author (guess where?). We are in fact promoting this solution for royalty-based books, whose authors, unlike journal article-authors, are unlikely to want to make their full-texts accessible toll-free. Their metadata and reference lists, however, are another matter, and can (and will) be tucked into the institutional OAI-compliant repository too, with a new indicator of global book citation impact as the harvestable reward. 
http://www.ariadne.ac.uk/issue35/harnad/ > So, for example, if you view a PubMed record: > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11667947&dopt=Abstract > you already get links to all the full text articles in PubMed Central which > cite that PubMed item > http://www.pubmedcentral.gov/tocrender.fcgi?action=cited&tool=pubmed&pubmedid=11667947 And if you look at citebase, you will see how this generalizes to the entire OAI-compliant literature: http://citebase.eprints.org/cgi-bin/search > The more true open access research that is published and archived at PubMed > Central, the more useful this becomes for biomedical researchers. [Sure, > "screen-scraping" HTML from free articles displayed on publisher sites could > give some citation information, but with nothing like the ease, accuracy and > reliability that it can be obtained with the use of XML data, as at PubMed > Central]. Fine. But I'd rather have toll-free access to all 20K journals right now, rather than waiting for these XML perks -- wouldn't you? Again, toll-free access is one thing -- and extremely important, already reachable, and already overdue -- and potential perks such as citation-based navigation are another. Let there be light first; then we can worry about calibrating the photometers on our Yashicas. > Beyond citation analysis, there are many other forms of datamining that are > possible: > For more information see: > http://www.biomedcentral.com/info/about/datamining/ > > e.g. Research articles can be mined for details of protein interactions > http://bioinfo.mshri.on.ca/prebind/ See above. Right now, it is an indisputable fact that open-access publishing today (BOAI-2) is the solution only for that 5% of the literature (of 20K journals) that has a suitable open-access journal today. The immediate solution for all the rest is self-archiving (BOAI-1), rather than continuing to wait for more open-access journals to spawn and grow.
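As an illustration of the citation analysis discussed in this exchange (finding which articles cite a given item, as in the PubMed Central "cited by" links): once reference lists are available as data, whether self-archived by authors or taken from XML, "what cites what" amounts to inverting those lists. The following is only a minimal sketch under that assumption. The article identifiers and function names are hypothetical, invented for the example, and the hard part mentioned above (parsing, identifying and linking raw citation strings) is not attempted.

```python
from collections import defaultdict


def build_cited_by(reference_lists):
    """Invert {citing_id: [cited_id, ...]} into {cited_id: [citing_id, ...]}.

    Assumes every reference has already been resolved to an identifier.
    """
    cited_by = defaultdict(set)
    for citing, refs in reference_lists.items():
        for cited in refs:
            cited_by[cited].add(citing)
    # Sort the citing articles so the output is deterministic.
    return {cited: sorted(citers) for cited, citers in cited_by.items()}


def citation_counts(cited_by):
    """Count the distinct citing articles for each cited item."""
    return {cited: len(citers) for cited, citers in cited_by.items()}


# Hypothetical identifiers, for illustration only.
reference_lists = {
    "article-A": ["article-C", "article-D"],
    "article-B": ["article-C"],
    "article-C": ["article-D"],
}

cited_by = build_cited_by(reference_lists)
counts = citation_counts(cited_by)

for cited in sorted(cited_by):
    print(cited, "is cited by", cited_by[cited])
```

The same inversion works whether the reference lists come from open-access full texts or from reference lists deposited separately, which is the point made above about citation linking being independent of full-text access.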
(If, in the meanwhile, toll-access publishers also want to help hasten things along by providing free access, they are certainly welcome to do so! I still regret -- for the sake of open access -- that the BOAI http://www.soros.org/openaccess/sign2.shtml?o was not ready to count it as publisher support of open access if a toll-access journal supported author self-archiving of their articles http://www.ecs.soton.ac.uk/~harnad/Temp/rcoptable.gif: *Of course* that is publisher support for open access! By the same token, I would certainly consider it as publisher support for open access if a toll-access journal made its full-text contents publicly accessible online toll-free. Even if it was gerrymandered full-text access -- as long as they also supported self-archiving!) > And as scientific content is increasingly marked up using richer forms of > semantically meaningful XML (e.g. CML for chemical structures, MathML for > equations), the value of datamining will continue to increase. All true. And it will all prevail eventually. But we need free access *now*. http://www.ecs.soton.ac.uk/~harnad/Temp/che.htm > The BioLINK group are using BioMed Central's open access corpus as the raw > material for a datamining competition, designed to stimulate progress in the > development of tools for biological datamining. > http://www.pdg.cnb.uam.es/BioLINK/BioCreative_task2.html That is commendable and welcome. But it must not be forgotten what percentage of the annual biological journal literature that sample actually represents. We must not be held back to that small percentage because we are informed that mere free access is not good enough -- not "true open access." Such rarefied fussiness does not serve the cause of either free or open access at this point. > (4) Derivative works and compilations > Say that a scientist performs a meta-analysis on a group of published > clinical trials, and wants to make available the conclusions of that > research. 
Or perhaps a datamining researcher has taken a corpus of 1000 > articles on breast cancer, and established some interesting conclusions. All very welcome and valuable (indeed, inevitable) developments in the online age. But I'd rather that progress toward free access for all 20K did not wait for these perks. Indeed, the sooner we have free access, the sooner the rest will come too. > In a true open access environment, each is free to post the results of their > research, *along with* the actual corpus of data which the research was > based on (effectively, the raw data for that research). > But in a non-open access environment, that raw data (i.e. the research > articles) cannot be redistributed, which makes it far more difficult than it > needs to be for other scientists to reproduce, critique and follow up the > work. I am afraid I have to disagree. As already noted above, authors are as free to self-archive (in their institutional repositories) the empirical data underlying their toll-access publications as they are to do so with the data underlying their open-access publications. Data-archiving is another thing for which there is no point sitting around awaiting the era of universal open-access publishing. Data-archiving will encourage article self-archiving, and both will hasten the era of universal open-access. > Similarly, a scientist may wish to make a point by assembling a collection > of certain articles or article fragments (perhaps they wish to assemble a > comparison of the methods used for a certain technique). > In an open access world, as long as they cite the sources, they are > completely free to create and redistribute that compilation. Such a > selective compilation may in itself be an extremely useful contribution to > science. I can't follow this at all. A compilation is a list of articles, whether online or on-paper, whether toll-access or open-access. If the full-texts of the articles are *free* access, all the compilation need list is their URLs.
(Ditto for article "fragments": try section number, paragraph number, or even [yech!] PDF page number.) > (5) Print redistribution rights - the National Health Service, for example, > should be able to redistribute thousands of printed copies of an important > research article (which it may have funded) to its doctors if it wishes to > do so. It should not have to pay a hefty copyright fee for the privilege. I have no views on this, but it has nothing to do with open access, which even in the strict BOAI definition refers to online access, not to multiple printing and redistribution rights. Besides, this is all becoming moot in the online era: Why distribute print copies instead of URLs, if the texts are publicly accessible online toll-free? (I think it is a big mistake, and clouds the issue, to try to link online toll-free access arguments with paper-printing rights. Don't forget that those worthy paper-based arguments would have been just as worthy in the paper era. So surely they are *not* what has changed in the online era.) > Certainly, print redistribution will likely become less significant in the > future, but there is no logical reason that the scientific community should > not be free to exchange and distribute the research that it has created in > print form, as well as online. The case for multiple printing rights is *much* weaker than the case for toll-free online access. Please let us not needlessly weaken the case for free access by handicapping it with such needless extra burdens. Free access will erode the need to print, even as it erodes publisher opposition to printing. But now, all fussing about print "redistribution" rights does is provoke needless opposition, to no good purpose. Keep it light, till everyone sees the light. 
Stevan Harnad NOTE: A complete archive of the ongoing discussion of providing open access to the peer-reviewed research literature online is available at the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03): http://amsci-forum.amsci.org/archives/september98-forum.html or http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html Discussion can be posted to: september98-forum at amsci-forum.amsci.org From M.Davis at UNSW.EDU.AU Wed Aug 13 03:59:05 2003 From: M.Davis at UNSW.EDU.AU (Mari Davis) Date: Wed, 13 Aug 2003 17:59:05 +1000 Subject: Clements & Wang "Who cites what? Economic Record 79, 2003 Message-ID: Subscribers may be interested to know about this article that was published in June this year in an Australian journal, "The Economic Record". Clements, K. W. and Wang, P. (2003) Who cites what? Economic Record, 79 (245):229-244. Abstract: The paper analyses citations in the work of a large number of PhD students. We show that the pattern of citations of journal articles, books, and other reference material differs substantially across areas within economics. An investigation of reciprocal citations reveals a surprisingly low degree of communication among Group of Eight universities [see note below: the top ranked research universities in Australia] and a high propensity to cite authors from the same institution, especially supervisors. We also analyse the Australian share of cited works, and identify journals, articles and authors that PhD students value highly. 
Author Affiliation: Economic Research Centre, Department of Economics, The University of Western Australia, Western Australia, Australia Note: The Go8 universities include: The University of Adelaide, South Australia; Australian National University, Canberra, ACT; University of Melbourne, Victoria; Monash University, Clayton, Victoria; University of New South Wales, Sydney, NSW; University of Queensland, Brisbane, Qld; University of Sydney, NSW; and University of Western Australia, Perth, WA. For more information about the work of the Group of Eight universities, see: http://www.go8.edu.au/about.html Mari Davis John Metcalfe Research Fellow Co-Director of Bibliometric & Informetric Research Group (BIRG) School of Information Systems, Technology and Management The University of New South Wales Quadrangle Level 2 Sydney NSW 2052 Australia m.davis at unsw.edu.au http://birg.web.unsw.edu.au/ Tel: +61 2 9385 7127 Fax: +61 2 9662 4061 From garfield at CODEX.CIS.UPENN.EDU Fri Aug 15 17:15:57 2003 From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield) Date: Fri, 15 Aug 2003 16:15:57 -0500 Subject: Nature article: Impact factors: just part of a research treadmill Message-ID: Your friend or colleague garfield at codex.cis.upenn.edu thought this article from Nature would be of particular interest to you. Their message is below. Nature article: Impact factors: just part of a research treadmill The address is: http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v424/n6950/full/424723b_fs.html Their message: Poignant message about publish or perish in Brazil. EG
From M.Davis at UNSW.EDU.AU Wed Aug 20 03:27:54 2003 From: M.Davis at UNSW.EDU.AU (Mari Davis) Date: Wed, 20 Aug 2003 17:27:54 +1000 Subject: Davis & Wilson (2003) Research Contributions in Ophthalmology: Australia's productivity Message-ID: These 2 items on research productivity in the field of ophthalmology are published in an issue of Clinical & Experimental Ophthalmology (CEO), a journal that listserv readers may not readily come across in searching for informetric studies. The first is a research paper, the second the editorial. The editor points out that some important issues are raised in the paper and he goes on to speculate about underlying trends, possible causes and consequences for Australian research. ************* 1. Davis, M. and Wilson, C.S. (2003) Research contributions in ophthalmology: Australia's productivity. Clinical and Experimental Ophthalmology, 31(4): 286-293. ISSN 1442-6404 Author Affiliation: School of Information Systems, Technology and Management, The Bibliometric & Informetric Research Group, University of New South Wales, Sydney NSW, Australia. Abstract: In 2000, the Australian and New Zealand Journal of Ophthalmology (ANZJO) changed title to Clinical and Experimental Ophthalmology. At this time, a review of Australia's contributions to the literature over the previous 21 years appears timely. Bibliometric indicators are used extensively to assess research performance; they offer views of a field that might not otherwise be apparent. We explore publication output data to construct a picture of ophthalmology that may be of benefit to researchers and ophthalmologists. Methods: Science Citation Index and Social Sciences Citation Index were used to collate data on ophthalmology research literature from 1980 to 2000. The paper focuses particularly on Australia's contribution to this literature, including publication frequency vis-à-vis the world, collaboration, and the journals in which Australian researchers frequently publish.
Comparison is also made for other countries of similar scientific stature or language. Results: Since 1980, Australia has ranked in the top ten nations contributing to world research. Its contribution was close to world average in the 1980s, but increasing numbers of researchers and papers show Australia exceeding the world average during the 1990s. Most collaboration by Australians is within Australia. Although fewer in numbers, collaborative papers with overseas researchers include 28 other countries. Data on the journals in which Australians publish show that Australians continue to publish in their own regional journal. Conclusions: This paper, one of a series on the literature of the vision sciences, provides some initial benchmarks on Australia's standing and contribution to the field of Ophthalmology. References: Sims, JL, McGhee CNJ. Clinical and Experimental Ophthalmology 2003; 31: 1-9. Davis M, Wilson CS, Hood WW. Scientometrics 1999; 46: 399-416. Kumbar M. Akhtary, S. Library Science with a Slant to Documentation and Information Studies 1998; 35: 201-207. Davis M, Wilson CS. Scientometrics 2001a; 52: 395-410. Davis M. Wilson CS. Proceedings, 2nd Berlin Workshop on Scientometrics and Informetrics - Collaboration in Science and in Technology. 2001b; 47-61. Ugolini D, Cimmino MA, Casilli, C, Mela GS. Scientometrics 2001; 52: 45-58. Australian Science and Technology Council (ASTEC). Profiles of Australian Science. 1989. Schubert A, Glänzel W, Braun T. Scientometrics 1989; 16:3-478. Institute for Scientific Information (ISI). Science in Australia, 1997-2001. At ********* 2. Editorial: McMenamin, Paul G. (2003) Looking into the mirror: research productivity in Australian ophthalmology. Clinical & Experimental Ophthalmology, 31(4):281-283. Author Affiliation: School of Anatomy and Human Biology, University of Western Australia, Perth, WA Australia. References: Davis M. & Wilson CS. CEO 2003, 31(4):286-93. Sims JL & McGhee CN. CEO 2003, 31(1):14-22.
Asker DA, Glasziou PP, DelMar,CB. MedJAust 2001, 175:340-41 Wierzbicki AS, Reynolds TM. J Clin Pathol 2002, 55:495-8. McGhee CN 2003. Editorial CEO 2003, 31:1-3. Lee AJ et al 2003. CEO 2003, 31:331-35. Tretiach m, VanDriel D, Gillies MC 2003. CEO 2003, 31:348-53 Martins a, et al 2003. CEO 2003, 31:354-56. Mari Davis John Metcalfe Research Fellow School of Information Systems, Technology and Management The University of New South Wales Quadrangle Level 2 Sydney NSW 2052 Australia m.davis at unsw.edu.au http://birg.web.unsw.edu.au/ Tel: +61 2 9385 7127 Fax: +61 2 9662 4061 From Andrea.Scharnhorst at NIWI.KNAW.NL Wed Aug 20 07:37:39 2003 From: Andrea.Scharnhorst at NIWI.KNAW.NL (Andrea Scharnhorst) Date: Wed, 20 Aug 2003 13:37:39 +0200 Subject: JCMC special issue Message-ID: The actual JCMC special issue contains articles also relevant for webometrics. Journal of Computer Mediated Communication Special issue: Internet Networks: The Form and the Feel http://www.ascusc.org/jcmc/vol8/issue4/ Editors: Anne Beaulieu and Han Woo Park * Anne Beaulieu: "Combining Approaches for the Study of Networks on the Internet" (Editor's Introduction) * Iina Hellsten: "Focus on Metaphors: The Case of 'Frankenfood' on the Web" * Devan Rosen, Joseph Woelfel, Dean Krikorian, George A. Barnett: "Procedures for Analyses of Online Communities" * Kirsten A. Foot, Steven M. Schneider, Meghan Dougherty, Michael Xenos, Elena Larsen: "Analyzing Linking Practices: Candidate Sites in the 2002 US Electoral Web Sphere" * Paul Wouters, Diana Gerbec: "Interactive Internet? Studying Mediated Interaction with Publicly Available Search Engines" * Andrea Scharnhorst: "Complex Networks and the Web: Insights from Nonlinear Physics" * Han Woo Park, Mike Thelwall: "Hyperlink Analyses of the World Wide Web: A Review" JCMC is a free peer-reviewed academic on-line journal sponsored by the Annenberg School for Communication at the University of Southern California. 
(from the editor's introduction) Networks and the study of Internet phenomena are in many ways inseparable. Beyond the power of the metaphor, though, and the obvious kinship of certain approaches to the study of the Internet such as 'network analysis,' the relation between research methods and the constitution of networks as empirical objects must be articulated in correspondence with every research question. Two approaches seem to prevail in Internet scholarship so far: substantial analysis on a case-by-case basis on the one hand, and formal network analysis on the other. Networks have therefore been studied in terms of their substance, for example via the common cultures of individuals who socialize through the Internet. New forms of expression have also been identified. Another stream of research has addressed the more formal aspects of networks, often using automated tools that render these networks quantitatively. Given this distinction between formal and substantive approaches, Internet studies seem to be reproducing some of the distinctions between qualitative and quantitative styles that have been deplored across most social sciences, from psychology to communication. Dr. Andrea Scharnhorst NERDI Netherlands Institute for Scientific Information Services (NIWI) KNAW Joan Muyskenweg 25 Postbus 95110 1090 HC Amsterdam The Netherlands Tel: +20 4628 670 www.niwi.knaw.nl/nerdi
From garfield at CODEX.CIS.UPENN.EDU Mon Aug 25 13:52:57 2003 From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield) Date: Mon, 25 Aug 2003 13:52:57 -0400 Subject: Takeo Nakayama, Tsuguya Fukui, Shunichi Fukuhara, Kiichiro Tsutani, Shigeaki Yamazaki, "Comparison Between Impact Factors and Citations in Evidence-Based Practice Guidelines" JAMA 290(6):755-756, August 13 2003 Message-ID: Takeo Nakayama : nakayama at pbh.med.kyoto-u.ac.jp REPRINTED WITH PERMISSION FROM THE AUTHOR Title Comparison Between Impact Factors and Citations in Evidence-Based Practice Guidelines Journal JAMA. 290(6):755-756. August 13, 2003 Authors Takeo Nakayama, Tsuguya Fukui, Shunichi Fukuhara, Kiichiro Tsutani, Shigeaki Yamazaki To the Editor: Impact factors of medical journals are calculated as the total number of current citations of articles published in a journal during the previous 2 calendar years divided by the total number of designated articles published in that journal during the same period [1]. Thus, impact factors indicate the annual average number of citations of articles that have appeared in a given journal. Impact factors are widely regarded as a quality ranking for scientific journals. Concerns have arisen, however, that scientific communities might be overly reliant on impact factors to assess the worth of scientific publications, as these numbers may be artificially inflated in a number of ways [2-4]. A related problem is that entire scientific disciplines tend to be evaluated based on average impact factors of their collective journals [5]. Despite these concerns, researchers who have published in journals with high impact factors may be more likely to be rewarded by their institutions. We attempted to validate this interpretation of impact factors by analyzing the bibliographic citations of articles used to support the Guide to Clinical Preventive Services [6] by the US Preventive Services Task Force (USPSTF).
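As a reading aid, the impact factor definition quoted in the letter above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `impact_factor`, its parameter names, and the example counts (1200 citations, 400 articles) are hypothetical and do not come from the letter or from ISI.

```python
def impact_factor(current_citations: int, articles_prev_two_years: int) -> float:
    """Impact factor as described in the letter.

    current_citations: citations received this year by articles the journal
        published during the previous 2 calendar years.
    articles_prev_two_years: number of designated (citable) articles the
        journal published during those same 2 years.
    """
    if articles_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one article")
    return current_citations / articles_prev_two_years


# Hypothetical journal: 400 articles over the previous two years,
# cited 1200 times in the current year.
example = impact_factor(current_citations=1200, articles_prev_two_years=400)
print(f"Impact factor: {example:.2f}")
```

Read this way, the figure is an average over all of a journal's recent articles, which is why it says little about how often any single article is cited.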
The USPSTF guidelines are generally thought to reflect the highest level of scientific evidence. Thus, we hypothesized that the guidelines should be supported by a larger number of citations from journals with high impact factors. Methods We assessed all references in all 25 chapters of the current version of the USPSTF guidelines and counted the number of references from each journal listed. The journals' impact factors for 2001 were obtained from the Institute for Scientific Information and the Journal Citation Reports (Science/Social Science; version for 2001) [7]. Counts were recombined for journals that were renamed. Results Among the total 1740 citations in the reference sections of the 25 chapters, 1531 were from scientific journals and 209 were from academic books and official reports. The most cited journal was JAMA with 135 citations, followed by the American Journal of Preventive Medicine (102), BMJ (77), and The Lancet (70). Fifty-six journals had articles cited more than 5 times, comprising a total of 1185 citations. Of these 56 journals, 6 (11%) had an impact factor of more than 10.0; 10 (18%) had an impact factor of 5.0 to 10.0; 11 (20%) had an impact factor of 3.0 to 5.0; and 28 (51%) had impact factors of less than 3.0. Of this latter group, 11 (20%) had impact factors of less than 2.0. Only 7 journals (13%) appeared in the top 100 journals ranked by impact factors (2001). The median impact factor of these 56 journals was 2.76. There was a significant correlation between impact factors and times cited in the USPSTF guidelines (Kendall r = 0.26, P = .005). Comment We found that the number of citations by the USPSTF guidelines roughly parallels the impact factors for the respective journals. Journals with low impact factors, however, were also cited frequently as providing important evidence.
This finding may reflect the fact that journals that focus on preventive services tend to have lower impact factors than do journals in other scientific disciplines. Some of the possible domains of impact of journal articles that cannot be measured by impact factors are changes in readers' knowledge, practice, clinical outcomes, funding priorities for research, and prompting of further learning. Overreliance on impact factors may undervalue the unique contributions of individual areas of research. In the field of clinical or preventive medicine, in particular, citation analyses on evidence-based practice guidelines may be a more accurate assessment of the contributions of individual journals and researchers. Although we only assessed the area of preventive health services, we suspect that this general conclusion may extend to other areas of scientific inquiry. Acknowledgment: We thank Ms Akiko Yoshida for assistance in editing the manuscript. Takeo Nakayama, MD, PhD Department of Medical System Informatics Tsuguya Fukui, MD, PhD Department of General Medicine and Clinical Epidemiology Shunichi Fukuhara, MD, DMsc Department of Healthcare Research Kyoto University Graduate School of Medicine Kyoto, Japan Kiichiro Tsutani, MD, PhD Department of Pharmacoeconomics Graduate School of Pharmaceutical Sciences The University of Tokyo Tokyo, Japan Shigeaki Yamazaki, PhD Department of Library and Information Science Aichi Shukutoku University Aichi, Japan 1. Garfield E. Which medical journals have the greatest impact? Ann Intern Med. 1986;105:313-320. 2. Seglen PO. Why the impact factor of journals should not be used for evaluating research? BMJ. 1997;314:498-502. 3. Smith R. Journal accused of manipulating impact factor. BMJ. 1997;314:463. 4. Hemmingsson A, Mygind T, Skjennald A, Edgren J. Manipulation of impact factors by editors of scientific journals. AJR Am J Roentgenol. 2002;178:767. 5. Valdecasas AG, Castroviejo S, Marcus LF.
Reliance on the citation index undermines the study of biodiversity. Nature. 2000;403:698. 6. US Preventive Services Task Force. Guide to clinical preventive services, second edition. Available at: http://www.odphp.osophs.dhhs.gov/pubs/guidecps/. Accessed June 25, 2003. 7. ISI Web of Knowledge. Available at: http://www.isinet.com/isi/products/citation/jcr/jcrweb. Accessibility verified (paid subscription) July 24, 2003. When responding, please attach my original message _______________________________________________________________________ Eugene Garfield, PhD. email: garfield at codex.cis.upenn.edu home page: www.eugenegarfield.org Tel: 215-243-2205 Fax 215-387-1266 President, The Scientist LLC. www.the-scientist.com Chairman Emeritus, ISI www.isinet.com Past President, American Society for Information Science and Technology (ASIS&T) www.asis.org _______________________________________________________________________ From Garfield at CODEX.CIS.UPENN.EDU Tue Aug 26 11:10:37 2003 From: Garfield at CODEX.CIS.UPENN.EDU (Garfield, Eugene) Date: Tue, 26 Aug 2003 11:10:37 -0400 Subject: FW: [Asis-l] JASIST TOC, Volume 54, #12 Message-ID: fyi When responding, please attach my original message __________________________________________________ Eugene Garfield, PhD. email: garfield at codex.cis.upenn.edu home page: http://www.eugenegarfield.org/ Tel: 215-243-2205 Fax 215-387-1266 President, The Scientist LLC. http://www.the-scientist.com/ 3535 Market St., Phila.
PA 19104-3389 Chairman Emeritus, ISI http://www.isinet.com/ 3501 Market Street, Philadelphia, PA 19104-3302 Past President, American Society for Information Science and Technology (ASIS&T) http://www.asis.org/ -----Original Message----- From: Richard Hill [mailto:rhill at asis.org] Sent: Monday, August 25, 2003 2:37 PM To: asis-l at asis.org; nancy at cni.org; journals at bubl.ac.uk; jesse at listserv.utk.edu; bwilson at cnri.reston.va.us; jhatzakos at asis.org; AMY.E.FRIEDLANDER at cpmx.mail.saic.com; Einat Amitay; irlist at sheffield.ac.uk Subject: [Asis-l] JASIST TOC, Volume 54, #12 Journal of the American Society for Information Science and Technology Volume 54, Number 12. October 2003 [Note: at the end of this message are URLs for viewing contents of JASIST from past issues. Below, the contents of Bert Boyce's "In this Issue" and from Loren Mendelsohn's Introduction to "Perspectives on...Chemistry Journals: The Transition from Paper to Electronic with Lessons for Other Disciplines" has been cut into the Table of Contents.] CONTENTS EDITORIAL In This Issue Bert R. Boyce 1079 RESEARCH Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works Scott Nicholson Published online 7 July 2003 1081 Nicholson suggests the use of data mining techniques to discover patterns in the world wide web's pages needed for automated collection development for academic digital libraries. 
Possible techniques include logistic regression, where the variable combinations that best predict classes are discovered and used to predict membership of new observations; memory-based reasoning, like N-neighbor non-parametric analysis, where a distance function between new and existing observations allows a choice among pre-classified neighbors; decision/classification trees, where rules for dividing a large set are made on the basis of the best discriminating variable; and neural networks, where neurons accept 0-1 measurements for each variable and weigh and combine variables until the optimal weight combination for the training set is determined. Forty-two librarians ranked selection criteria from the literature and suggested additional criteria. Low ranked criteria were removed and new suggestions added with iterations until consensus was reached. These criteria were made operational in a Perl program that analyzed web pages. 4500 scholarly pages were identified for use as a training set, and 500 from other sites as a test set. An additional 4500 non-scholarly pages were identified for the training set and 500 for the test set. Values were collected by the program for each criterion, creating surrogate records for the pages. Logistic regression correctly classified 463 scholarly pages and 473 random pages. N-neighbor non-parametric analysis correctly classified 438 scholarly pages and 475 random pages. The classification tree method correctly classified 478 scholarly pages and 480 random pages. Neural networks correctly classified 465 scholarly pages and 469 random pages. Accuracy (precision) varied between 93.75% and 96%, while return (recall) varied from 87.6% to 95.6%. While the classification tree method provided the highest values, all models were effective. Overlap in Bibliographic Databases William W. Hood and Concepcion S.
Wilson Published online 16 June 2003 1091 From over 100 DIALOG databases Hood and Wilson locate about 15,600 records for a period from 1965 to 1993 on Fuzzy Set Theory by searching "fuzzy" and extracting by hand a list of pertinent records. The data was then cleaned and standardized and a combination of two duplicate detection keys were used to locate overlapping records found in more than one database. The frequency distribution shows no overlap occurs for 63.26% of the records, 12.29% were duplicated once, and .03% were duplicated 12 times, the highest rate. The distribution would appear to fit the inverse power law but an exponential curve provides a better fit. Looking at the papers found in only one database, 42% of the 5815 found in SCISEARCH are unique and represent 15.7% of the total record set. Intra-database duplicates were found in 28 databases. MATHSCI, which retains originals when they are amended, had a 17.8% duplication rate in the fuzzy set literature. While the PASCAL double indexing accounted for its .5% duplication rate, the .4% rate in SCISEARCH resulted from new records with references being added when the original had been previously entered without references. Overall intra-database duplication is quite low. Overlapping records correlate with overlapping DIALOG OneSearch categories. The Experience of Libraries Across Time: Thematic Analysis of Undergraduate Recollections of Library Experiences Jacqueline Kracker and Howard R. Pollio Published online 11 June 2003 1104 Kracker and Pollio look at the patron's impressions of libraries by way of the qualitative research techniques of content analysis and phenomenological inquiry in which one identifies reoccurring themes in recorded dialogs concerning a topic and the ground upon which they occur. Thus the meaning of the concept for that individual may be identified in terms of their direct experience. 
One hundred and eighteen undergraduate students enrolled in a freshman psychology course volunteered as subjects. Each was asked to provide, along with basic demographic data, a short description of three specific incidents related to libraries, and a longer description of one of these incidents. The incidents were categorized into six school level categories and five type of library categories resulting in 708 coded events. With the self considered as the ground, themes having to do with atmosphere, size and abundance, organization/rules and their effect, what I do in a library, and memories were identified. This allows one to formulate a typical library experience for a 19 year old college student, an experience that changes during different educational periods. Intermediary's Information Seeking, Inquiring Minds, and Elicitation Styles Mei-Mei Wu and Ying-Hsang Liu Published online 18 July 2003 1117 Wu and Liu are concerned with finding the linguistic styles used by intermediaries in their conduct of interactions with those with information needs, and with determining if certain mind sets can be associated with such styles. Thirty patrons' interactions with one of five different intermediaries were video and audio taped while an observer kept notes. Participants responded to questionnaires on their perceptions of the process and general user satisfaction and users were interviewed on audio tape post search. Using seven categories of linguistic form, ten categories of elicitation purpose, and seven categories of communication function, the texts were analyzed and a chi-square test showed differences in each among intermediaries and identified three styles termed situational (differing with user needs), functional (no functional differences), and stereotypical (purposes, functions and forms are constant).
The mind set of the intermediary determined by analysis of discourse led to three types: problem detection (focus on reexpressing and understanding the need), query formulation (focus on terminology), and database instruction (focus on proper selection and use of databases). No linkage between styles and mind sets was established. PERSPECTIVES ON ... CHEMISTRY JOURNALS: THE TRANSITION FROM PAPER TO ELECTRONIC WITH LESSONS FOR OTHER DISCIPLINES Introduction and Overview: Chemistry Journals: The Transition From Paper to Electronic With Lessons for Other Disciplines Loren D. Mendelsohn Published online 18 July 2003 1136 The articles in this Perspectives have been selected from papers presented at the Tri-Society Symposium, held on June 9, 2002, in Los Angeles, California. They discuss a broad spectrum of issues that have been raised as an increasing number of libraries convert from paper to online journal subscriptions, ranging from broad questions addressing the process of the changeover to studies of more specific issues. Taken together, they provide a useful overview of the process and contribute significantly to the scholarship in this field. Moreover, these articles have broader applications. The questions raised by the transition from print to electronic are not related solely to chemical information or even science and technology information; since scholarly journals in all disciplines are making the transition from print to electronic, similar questions can be raised with regard to all disciplines. New Knowledge Management Systems: The Implications for Data Discovery, Collection Development, and the Changing Role of the Librarian David Stern Published online 18 July 2003 1138 David Stern's introductory essay raises several questions concerned with the trend toward electronic journals.
By highlighting such issues as complex differential pricing plans, the development of new and complex tools for data manipulation, and how these factors affect the role of the librarian, he provides a framework for reading and understanding many of the issues discussed in the subsequent articles. Making the Transition From Print to Electronic Serial Collections: A New Model for Academic Chemistry Libraries? Tina E. Chrzastowski Published online 18 July 2003 1141 In examining the feasibility of moving from paper to electronic journals in a particular library, Tina E. Chrzastowski proposes and evaluates a new model for the academic chemistry library. In so doing, she establishes a list of basic factors and criteria that must be evaluated by any institution considering this transition. Changing Use Patterns of Print Journals in the Digital Age: Impacts of Electronic Equivalents on Print Chemistry Journal Use K. T. L. Vaughan Published online 18 July 2003 1149 K.T.L. Vaughan examines the transition from a different perspective, focusing instead on how the use of paper copies of journals is affected by making available electronic copies of those same journals. By exploring this particular aspect of the question, she provides data that will help library administrators evaluate the utility of retaining paper copies in an increasingly electronic environment. Linking of Errata: Current Practices in Online Physical Sciences Journals Emily L. Poworoznek Published online 18 July 2003 1153 One of the central questions raised by the change from paper to electronic has to do with the nature of the copy of record. Emily L. Poworoznek examines the treatment of errata in electronic journals by a large group of commercial and professional society publishers, pointing out the significance of this issue for the integrity of the scientific record. 
She further compares these new approaches with the traditional manner of handling errata in printed journals, and discusses indexing under both systems, recommending the necessity of standards that will function under the electronic serials rubric. Managing Tradeoffs in the Electronic Age A. Ben Wagner Published online 18 July 2003 1160 A. Ben Wagner's historical analysis provides an excellent wrap-up, reviewing the introduction and development of electronic resources over the past three decades and analyzing the gains and losses involved in the transition. His paper provides a framework for decision-making in this area. BOOK REVIEWS The Accidental Systems Librarian, by Rachel Singer Gordon Lisa A. Ennis Published online 7 July 2003 1165 Library Information Systems: From Library Automation to Distributed Information Access Solutions, by Thomas R. Kochtanek and Joseph R. Matthews Brenda Chawner Published online 7 July 2003 1166 Impact of Digital Technology on Library Collections and Resource Sharing, edited by Sul H. Lee William J. Wheeler Published online 7 July 2003 1167 Persuasive Technology: Using Computers to Change What We Think and Do, by B. J. Fogg Anastasis D. Petrou, Ph.D. Published online 7 July 2003 1168 CALL FOR PAPERS Special Topic Issue of JASIST: Multilingual Information Systems Published online 12 June 2003 ------------------------------------------------------ The ASIS web site contains the Table of Contents and brief abstracts as above from January 1993 (Volume 44) to date. The John Wiley Interscience site includes issues from 1986 (Volume 37) to date. Guests have access only to tables of contents and abstracts. Registered users of the interscience site have access to the full text of these issues and to preprints. 
Executive Director American Society for Information Science and Technology 1320 Fenwick Lane, Suite 510 Silver Spring, MD 20910 FAX: (301) 495-0810 PHONE: (301) 495-0900 http://www.asis.org ________________________________________ Asis-l mailing list Asis-l at asis.org http://mail.asis.org/mailman/listinfo/asis-l From harnad at ECS.SOTON.AC.UK Wed Aug 27 15:47:04 2003 From: harnad at ECS.SOTON.AC.UK (Stevan Harnad) Date: Wed, 27 Aug 2003 20:47:04 +0100 Subject: Request for journal/article/field statistics from Ulrichs and ISI Message-ID: This is a request for some statistical information from any colleague who has access to the Ulrichs Directory: http://www.ulrichsweb.com/ulrichsweb/ and to ISI http://www.isinet.com/isi/ In order to get a clear idea of current progress toward open access, I would be very grateful if someone could provide me with the following data (for 2002 and 2003): From Ulrich's: (1a) (TJ) The total number of refereed (peer-reviewed) journals Ulrich's indexes currently (24,116 active was latest figure) (1b) (TA) What is the total annual *article* count for that total of 24,116 refereed journals (or estimate, or average per journal)? (1c) If possible, it would be a huge help if the TJ and TA count could be broken down by fields, e.g., total number of journals (tj) and total number of articles (ta) in Physics (tjp, tap), Chemistry (tcj, tca), Linguistics (tlj, tla), etc.
From ISI: (2a) The same data as above, but only for the journals indexed by ISI: TJ(ISI) (last count: 8676) (2b) Same data as 1b, but for ISI only: TA(ISI): total articles, estimate, or average. (2c) Same discipline breakdown as 1c, for ISI: tj and ta for physics, etc. These statistics would be a great help to me. I will then post the data, and some analyses I will do with them. Stevan Harnad harnad at cogsci.soton.ac.uk Professor of Cognitive Science Department of Electronics and Computer Science University of Southampton Highfield, Southampton SO17 1BJ UNITED KINGDOM phone: +44 23-80 592-582 fax: +44 23-80 592-865 http://www.ecs.soton.ac.uk/~harnad/ From garfield at CODEX.CIS.UPENN.EDU Thu Aug 28 14:20:37 2003 From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield) Date: Thu, 28 Aug 2003 14:20:37 -0400 Subject: "Deciphering Impact Factors (Editorial)" and "Supplementary Information" Nature Neuroscience 6(8):783, August 2003 Message-ID: NATURE NEUROSCIENCE has kindly made available the following two papers in full-text format to members of SIG-Metrics List TITLE : Deciphering Impact Factors (Editorial) http://www.nature.com/cgi-taf/DynaPage.taf?file=/neuro/journal/v6/n8/full/nn0803-783.html and Supplementary Information http://www.nature.com/neuro/journal/v6/n8/suppinfo/nn0803-783_S1.html AUTHOR: Charles Jennings JOURNAL: Nature Neuroscience 6(8):783, August 2003 From garfield at CODEX.CIS.UPENN.EDU Thu Aug 28 15:14:39 2003 From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield) Date: Thu, 28 Aug 2003 15:14:39 -0400 Subject: Lomnicki A. "Impact factors reward and promote excellence - The system is unkind but effective. Others would do less good for developing countries (Letter. English)" Nature 424 (6948):p.487 (31 July 2003) Message-ID: Adam Lomnicki : lomnicki at eko.uj.edu.pl Title : Impact factors reward and promote excellence - The system is unkind but effective. Others would do less good for developing countries (Letter. English) Author : Lomnicki, A.
Journal : Nature 424 (6948):p.487 (31 July 2003)

Full-text posted with permission of the author:

Nature 424, 487 (31 July 2003); doi:10.1038/424487a

IMPACT FACTORS REWARD AND PROMOTE EXCELLENCE

Sir - I expected to read some robust criticism of Peter A. Lawrence's Commentary "The politics of publication" (Nature 422, 259-261; 2003), so I was surprised at the chorus of approval in Correspondence (Nature 423, 479-480 & 585; 2003, and Nature 424, 14; 2003). These views and proposals require rebuttal. I believe that the present system of evaluation is the only one possible, and that Lawrence's apparently utopian proposals would do more harm than good.

It is a cliché that modern societies can hardly function without science, and that science has become very expensive and highly specialized, hence requiring an evaluation system. There are two socially justifiable reasons for supporting science. First, scientists make discoveries that increase our knowledge, understanding and predictive power. However, many well-educated people in all fields of science are needed to translate these into progress. Universities can produce these people and can test their skill and knowledge, but they cannot test the skill and knowledge of their own teachers, which has to be done through the engagement of the teachers themselves in scientific activity. The second important reason for supporting science, therefore, is to teach students and to maintain a group of specialists in different fields who can adapt the newest scientific achievements to their society. Politicians and others who fund science need a tool to identify these people.

In the best laboratories, the first reason for maintaining science alone is considered paramount. But the second reason is vital to all modern societies, including those unable to produce Nobel prizewinners.

Scientists maintain the polite fiction that all of them are equal and do equally good science. But this is not the case.
The best laboratories make the most important scientific discoveries. A little lower are those in which less important discoveries are made, but which contain researchers who fully understand what others are doing and who can apply this knowledge. At the bottom are places where people only pretend to do science and are unable to follow progress in their field.

The system of rewards in science must assure promotion of the best laboratories, improvement of the decent and denial of public funds to the worst. Neither international congresses nor big international programmes can make this objective distinction between good and poor science, so some other means of evaluation are required.

The appearance of the Science Citation Index (SCI) in the 1960s was a breakthrough in the development of objective numerical methods for the evaluation of science and scientists. This can be seen by comparing now with then, and by looking at places where numerical methods of evaluation are unknown.

In countries far behind the scientific leaders, scientists are no less numerous, and many universities and scientific journals are supported by public funds. These journals publish many papers but have very low circulations and an insignificant impact on other scientists. This is a waste for the society supporting such research, as the scientists cannot make important discoveries, convey or build on discoveries made by others, or follow developments in their own field. Sometimes this can be seen in rich countries too. In the 1960s and 1970s it was a waste of time to browse German and French journals on ecology and evolutionary biology. This state of affairs changed completely after young researchers started to be rewarded for publishing in journals with a high impact factor: now German and French researchers in these fields write papers that are well worth reading.

Evaluation of scientists on the basis of the impact factor and other indices is like the market economy: the system is wrong and unjust, but other systems are much worse. Thousands of books have been written on the evils of capitalism, and now we have articles on the evils of evaluations derived from citation indices. The authors of these articles ignore the global effect of applying this system and concentrate instead on particular cases: a paper got many citations despite being published in a journal with a low impact factor, or a poor paper was cited many times. Evaluation based on citations is a statistical method that has to be used on large samples and carefully applied to avoid pitfalls. Arguments against the system should be statistical, not particular.

Critics of this evaluation system propose a utopia with high moral standards. They want science managers and journal editors not to be narrow specialists but to be able to evaluate scientists in all different fields within, say, ecology or molecular biology. With the present extent of specialization this seems hardly possible. In this utopia, managers and editors would be absolutely honest and not guided by their own scientific interests, predilections or aversions. The entire system relies on the best side of human nature.

Abandonment of objective methods of science evaluation derived from the SCI would be most dangerous in developing countries and others where science is not first-rate. It would keep their societies from knowing how far behind their scientific institutions are. Worse, it would remove a tool for rewarding researchers who attempt to do good science and for eliminating those who do not.

Adam Łomnicki
Institute of Environmental Sciences, Jagiellonian University, ul. Ingardena 6, 30-060 Kraków, Poland