From garfield at CODEX.CIS.UPENN.EDU Tue Nov 4 15:01:50 2003
From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield)
Date: Tue, 4 Nov 2003 15:01:50 -0500
Subject: Nancy L. Pelzer, William H. Wiese, "Bibliometric study of grey literature in core veterinary medical journals" J Med Libr Assoc. 2003 October; 91 (4): 434-441
Message-ID:

Nancy L. Pelzer: npelzer at iastate.edu; William H. Wiese: wwiese at iastate.edu

Full Text at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=209509

TITLE
Bibliometric study of grey literature in core veterinary medical journals

AUTHOR
Nancy L. Pelzer, M.A.
Associate Professor and Team Leader, Cataloging Dept.
William Roberts Parks and Ellen Sorge Parks Library
Iowa State University, Ames, Iowa 50011

William H. Wiese, M.A.
Associate Professor and Veterinary Medical Librarian
Veterinary Medical Library
Iowa State University, Ames, Iowa 50011

JOURNAL
J Med Libr Assoc. 2003 October; 91 (4): 434-441

ABSTRACT

Objectives: Grey literature has been perceived by many as belonging to the primary sources of information and has become an accepted method of nonconventional communication in the sciences and medicine. Since little is known about the use and nature of grey literature in veterinary medicine, a systematic study was done to analyze and characterize the bibliographic citations appearing in twelve core veterinary journals.

Methods: Citations from 2,159 articles published in twelve core veterinary journals in 2000 were analyzed to determine the portion of citations from grey literature. Those citations were further analyzed and categorized according to the type of publication.

Results: Citation analysis yielded 55,823 citations, of which 3,564 (6.38%) were considered to be grey literature. Four veterinary specialties, internal medicine, pathology, theriogenology, and microbiology, accounted for 70% of the total number of articles.
Three small-animal clinical practice journals cited about 2.5-3% grey literature, less than half that of journals with basic research orientations, where results ranged from almost 6% to approximately 10% grey literature. Nearly 90% of the grey literature appeared as conferences, government publications, and corporate organization literature.

Conclusions: The results corroborate other reported research that the incidence of grey literature is lower in medicine and biology than in some other fields, such as aeronautics and agriculture. As in other fields, use of the Internet and the Web has greatly expanded the communication process among veterinary professionals. The appearance of closed community email forums and specialized discussion groups within the veterinary profession is an example of what could become a new kind of grey literature.

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=209509

When responding, please attach my original message
_______________________________________________________________________
Eugene Garfield, PhD.
email: garfield at codex.cis.upenn.edu
home page: www.eugenegarfield.org
Tel: 215-243-2205 Fax 215-387-1266
President, The Scientist LLC. www.the-scientist.com
Chairman Emeritus, ISI www.isinet.com
Past President, American Society for Information Science and Technology (ASIS&T) www.asis.org
_______________________________________________________________________

From garfield at CODEX.CIS.UPENN.EDU Tue Nov 4 15:12:43 2003
From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield)
Date: Tue, 4 Nov 2003 15:12:43 -0500
Subject: Atlas MC. "Emerging ethical issues in instructions to authors of high-impact biomedical journals" J Med Libr Assoc. 2003 October; 91 (4): 442-449
Message-ID:

Michel C. Atlas: mcatlas at louisville.edu

Full Text available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=209510&rendertype=abstract

AUTHOR
Michel C.
Atlas, M.L.S., AHIP, Professor
Kornhauser Health Sciences Library
University of Louisville, Kentucky 40292

TITLE
Emerging ethical issues in instructions to authors of high-impact biomedical journals

JOURNAL
J Med Libr Assoc. 2003 October; 91 (4): 442-449

ABSTRACT

Public interest in issues concerning the maintenance of high ethical standards in the conduct of scientific research and its publication has been increasing. Some of the developments in these issues as reflected in the publication of the medical literature are traced here. This paper attempts to determine whether public interest is reflected in the specific requirements for authors for manuscript preparation as stated in the "Instructions to Authors" for articles being prepared for submission to 124 "high-impact" journals. The instructions to authors of these journals were read on the Web for references to ethical standards or requirements. The ethical issues that the instructions most often covered were specifically related to the individual journal's publication requirements. The results suggest that the editors and publishers of the biomedical literature are concerned with promoting and protecting the rights of the subjects of the experiments in the articles they publish, and that, while these concerns are not yet paramount, they are evolving and growing.

From FNHavemann at COMPUSERVE.DE Mon Nov 10 11:17:46 2003
From: FNHavemann at COMPUSERVE.DE (Frank Havemann)
Date: Mon, 10 Nov 2003 17:17:46 +0100
Subject: Fwd: CfP: March 2004, ROORKEE, INDIA
Message-ID:

CALL FOR PAPERS:

INTERNATIONAL WORKSHOP ON WEBOMETRICS, INFORMETRICS AND SCIENTOMETRICS & 5TH COLLNET MEETING
2-5 March 2004, ROORKEE, INDIA

(Announcement with details on http://www.collnet.de)

Deadline for submitting abstracts (1 or 2 pages): November 30, 2003

Please send your abstracts to:

Yogendra Singh
Librarian
Central Library
Indian Institute of Technology (IIT) Roorkee
Roorkee-247667 U.A.
India
Tel: +91-1332-285239, 277587
Fax: +91-1332-277587, 273560
E-mail: yogi at iitr.ernet.in, rulib at iitr.ernet.in, r_kundra at yahoo.com

Please also send a copy to:
E-mail: kretschmer.h at onlinehome.de

With kind regards,
Hildrun Kretschmer
Program Chair
The Netherlands and Germany
----------------------------------------------------------------
Dr.sc.phil., Dr.oec., Dipl.-psych.
Honorary Professor, Henan Normal University, Xinxiang, China
Research Associate, Nerdi, NIWI, KNAW, Amsterdam, The Netherlands
Private Lecturer, Free University Berlin, Germany
COLLNET Co-ordinator
Homepage: http://www.h-kretschmer.de
E-mail: kretschmer.h at onlinehome.de

(forwarded by Frank Havemann, Berlin, fh at wissenschaftsforschung.de)

From harnad at ECS.SOTON.AC.UK Fri Nov 14 17:29:15 2003
From: harnad at ECS.SOTON.AC.UK (Stevan Harnad)
Date: Fri, 14 Nov 2003 22:29:15 +0000
Subject: How to compare research impact of toll- vs. open-access research
In-Reply-To: <200306162055.QAA05784@maury.cfa.harvard.edu>
Message-ID:

[Posted with permission from Michael Kurtz, Astrophysics, Harvard]

On Fri, 14 Nov 2003, Michael Kurtz wrote:

> You may be interested in:
> http://listserv.nd.edu/cgi-bin/wa?A2=ind0311&L=pamnet&D=1&O=D&P=1632
> which is a report by the librarian liaison of the AAS Pub board meeting.
> The relevant paragraph (at the end) is:
>
> Finally, there was a very interesting brief report from Greg Schwarz
> (from the ApJ editorial office) on some work he's doing tracking
> citation rates of papers published in the ApJ based on whether they were
> posted on astro-ph or not. He studied samples from July-Dec. 1999 and
> July-Dec. 2002. The first interesting point is that 72% of the papers
> published in the latter period had appeared on astro-ph, although the
> submission rate to the server seems to be leveling off.
> He also noted that the number of authors per paper has been increasing
> along with the total length and that most astro-ph submissions are after
> acceptance by the journal. The really fascinating conclusion he's drawn,
> at least from my perspective, is that ApJ papers that were also on
> astro-ph have a citation rate that is _twice_ that of papers not on the
> preprint server. Moreover, this higher citation rate appears to continue
> once the time gap disappears (that is, papers on astro-ph are viewed
> about nine months ahead of the journal paper, but after several years of
> availability, the astro-ph papers are still being cited at a
> significantly higher rate).
>
> You have shown some similar work already, but this seems nicely done.
> With the majority of ApJ papers going to astro-ph those which are not
> preprinted (and which are less referenced) seem the oddballs.
>
> I have been assuming that the higher citation rates for papers which are
> preprinted was due to the preprinting; perhaps the effect is that lower
> quality/interest papers are not preprinted.

Can I ask for a clarification (because the word "preprinted," unlike "self-archived," is somewhat ambiguous):

Are you specifically referring here to the prepublication part of an article's timeline? That is, is your point that in astrophysics, where the publishers' versions are all effectively "open access" by the time they appear (in that they are all available to the entire worldwide astrophysical research community via site-licenses to the relatively small and closed group of journals involved), there are *still* twice as many citations of those papers that were self-archived before publication (as either pre-refereeing preprints or post-refereeing postprints or both) as of those that only became openly accessible when they became available from the publisher?
That would be very useful news both for the value of open access to eprints (preprints and postprints) in general and the value of prepublication self-archiving in particular, suggesting that (if we take Steve Lawrence's figures for the overall citation advantage of free online access to eprints over its alternatives -- online or on-paper -- which is a citation advantage of 4.5) a two-fold advantage already comes from free access to the prepublication phase alone.

The causality, of course, is uncertain here, as you note: Is it that earlier open-access enhances the citation counts, or that the better articles are the ones that are being self-archived earlier?

In any case, it is certainly a vote both for open access and for early self-archiving!

Cheers, Stevan

Date: Fri, 14 Nov 2003 15:19:23 -0500
From: Michael Kurtz

Hi Stevan,

First I should note that I personally have nothing to do with this study; I have only read Sarah's report. The author, Greg Schwarz (gschwarz #@# as.arizona.edu), would certainly be a better source as to what he is saying. Certainly you may post my message about it.

To your question: Greg is referring to papers which have been deposited in the ArXiv, normally astro-ph; thus they are self-archived in advance of publication (preprinted). There are other avenues for astronomy articles to be preprinted; he seems from the description not to be taking them into account.

In your terminology he notes that most of the articles were submitted to the ArXiv after they were accepted (thus are post-refereeing postprints); there is no requirement for this by any astronomy journal, but it has long been the common practice, since before preprints became electronic.

So the answer to your first paragraph question is YES!
Greg may know if there is a difference in citation rate for papers which were deposited in the ArXiv before they were accepted (pre-refereeing preprints) vs after they were accepted (postprints); this would help to clear up the causality issue, as the preprints were self-archived earlier.

In any event this is a huge vote for the importance of self-archiving.

Best wishes
Michael

From loet at LEYDESDORFF.NET Sat Nov 15 06:07:49 2003
From: loet at LEYDESDORFF.NET (Loet Leydesdorff)
Date: Sat, 15 Nov 2003 12:07:49 +0100
Subject: the knowledge base of an economy
Message-ID:

* apologies for cross-postings

The Knowledge Base of an Economy: What is it? Can it be measured? Can it be modeled?
European Association for Evolutionary and Political Economics, Maastricht, November 8, 2003

1. Introduction
   1.1 What is the knowledge base of an economy?
   1.2 The operation of the knowledge base
   1.3 A micro-foundation of the Triple Helix model
2. How can the knowledge base of an economy be measured?
   2.1 Next-order systems formation as a result of innovation at interfaces
   2.2 Operationalization in terms of the Triple Helix model
3. Can the knowledge base (of an economy) be simulated?
   3.1 Anticipation at the level of the social system
   3.2 Anticipatory modeling
   3.3 Strong anticipation and technological determination
4. The complex dynamics of a knowledge-based economy

available at http://www.leydesdorff.net/eaepe03/index.htm or http://www.leydesdorff.net/eaepe03/knowledgebase.pdf

_____
Loet Leydesdorff
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam
Tel.: +31-20- 525 6598; fax: +31-20- 525 3681
loet at leydesdorff.net ; http://www.leydesdorff.net/

The Challenge of Scientometrics ; The Self-Organization of the Knowledge-Based Society
From harnad at ECS.SOTON.AC.UK Sat Nov 15 16:03:01 2003
From: harnad at ECS.SOTON.AC.UK (Stevan Harnad)
Date: Sat, 15 Nov 2003 21:03:01 +0000
Subject: Interoperability - subject classification/terminology
In-Reply-To: <1068906324.3fb637548daf5@webmail.shef.ac.uk>
Message-ID:

On Sat, 15 Nov 2003, Prof. Tom Wilson wrote:

> Stevan Harnad says:
>
>s> Please remember that most researchers currently search their abstracts databases
>s> and their toll-access journal content databases without the help of any subject
>s> classification taxonomies. This will continue to be the case for the open-access
>s> full-text database, once it grows to a significant size. Journal articles --
>s> especially when they include inverted full-text -- are not, and never
>s> were, searched via prepackaged subject classifications or taxonomies
>s> or aggregations.

> I think that Stevan is a little too sweeping in his generalisation here. In the
> days before machine searching, pretty well all abstracting journals were
> organized according to some subject specific classification scheme: Chemical
> Abstracts, Metallurgical Abstracts, Nuclear Science Abstracts are among those I
> searched on behalf of scientists in that dim and distant past. At that time
> users certainly relied upon those classification schemes to help them to reduce
> the volume of material they needed to search.

I agree completely. But we are now in the days of machine searching, done by the researchers themselves, for themselves, google-style. When search is restricted to the inverted full-text corpus of the annual 2.5 million articles published in the planet's 24,000 refereed journals, there is no need whatsoever to rely on classification schemes.

http://www.eprints.org/self-faq/#26.Classification

> Those classification schemes
> continue today in the print versions and online versions generally offer the
> possibility of a search by class.

Yes, but does anyone bother to use them (online)?
> The debate about the cheapness of simplistic
> Boolean searching (which puts the costs on the user to disentangle the useful
> from the useless) versus the cost (to the producer) of high quality subject
> indexing and classification has never been settled - and doubtless never will
> be.

But one thing is sure: It is irrelevant to the issue of open access, and certainly not something to wait for!

Stevan Harnad

> ___________________________________________________
> Professor T.D. Wilson, PhD
> Publisher/Editor in Chief
> Information Research
> InformationR.net
> University of Sheffield
> Sheffield S10 2TN, UK
> e-mail: t.d.wilson at shef.ac.uk
> Web site: http://InformationR.net/
> ___________________________________________________

From j.hartley at PSY.KEELE.AC.UK Mon Nov 17 03:45:15 2003
From: j.hartley at PSY.KEELE.AC.UK (J. Hartley)
Date: Mon, 17 Nov 2003 08:45:15 -0000
Subject: Interoperability - subject classification/terminology
Message-ID:

Colleagues interested in this debate might like to have a copy of our paper "How useful are 'key words' in scientific journals" published in J. Info. Sc., 2003, 29, 5, 433-438.

James Hartley & Ron Kostoff
(Copies available from j.hartley at psy.keele.ac.uk)

From harnad at ECS.SOTON.AC.UK Tue Nov 18 12:48:59 2003
From: harnad at ECS.SOTON.AC.UK (Stevan Harnad)
Date: Tue, 18 Nov 2003 17:48:59 +0000
Subject: Interoperability - subject classification/terminology
In-Reply-To: <886EF25AF8BEF64EB89A820EF84064FF026719EE@UCMAIL4>
Message-ID:

On Tue, 18 Nov 2003, Franklin, Rosemary (franklra) wrote:

> The only quibble with your bet is that humanities scholars/researchers often
> work in the realm of abstract (soft) ideas and arguments which are not so
> easily searched and retrieved, while the sciences are concrete (hard) with
> data and vocabulary more easily discovered. How do you search nuances?

I don't know of any evidence that inverted full-text boolean search is any less effective in one field than another. (Does anyone have any such evidence?)

Stevan Harnad

> -----Original Message-----
> From: Stevan Harnad [mailto:harnad at ecs.soton.ac.uk]
> Sent: Friday, November 14, 2003 12:07 PM
> To: BOAI Forum
> Cc: september98-forum at amsci-forum.amsci.org
> Subject: [BOAI] Re: Interoperability - subject
> classification/terminology [bcc][faked-from][mx]
>
> On Thu, 13 Nov 2003, Franklin, Rosemary (franklra) wrote:
>
> > Generally you are searching in natural language, depending on the fields
> > tagged and how the file is organized. Portals such as the HUMBUL site and
> > others organized around broad subject areas are value-added OAI searching
> > and have controlled vocabulary added, or they are in the process of adding.
>
> I would like to make a bet about values that will prove to be worth and not worth
> adding to a full-text corpus of refereed research journal articles.
> (Note that this bet pertains *only* to the refereed journal article corpus,
> but that does include all disciplines, including the humanities):
>
> Until and unless XML tagging of the full-texts themselves prevails -- a
> desirable outcome that is largely independent of the urgent goal of open
> access -- nothing will come even close to matching (let alone beating)
> the power of boolean search over the inverted full-texts, google-style
> (but restricted to the OAI-compliant domain).
>
> Please remember that most researchers currently search their abstracts
> databases and their toll-access journal content databases without the help
> of any subject classification taxonomies. This will continue to be the case
> for the open-access full-text database, once it grows to a significant size.
> Journal articles -- especially when they include inverted full-text -- are
> not, and never were, searched via prepackaged subject classifications or
> taxonomies or aggregations. And even those taxonomies and aggregations that
> exist were generated by machine analysis of the database rather than by human
> classification. (In other words, they were generated by "semantic-web"
> -- i.e., syntactic-web! -- computations on the full-text database.)
>
> See Subject Thread:
> "Interoperability - subject classification/terminology"
> http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2384.html
>
> I know that especially in the humanities, many scholars and librarians are
> betting otherwise. It will be interesting to see what the outcome turns out
> to be.
>
> But let it be stressed again: This has nothing to do with open access,
> except inasmuch as it is extremely important not to hold back open access
> for even one microsecond in order to wait for classification/taxonomy values
> to be added -- any more than open access should be delayed in any way to
> wait for preservation values to be added.
>
> The intuitive point to keep in mind is that we are talking about OAI
> eprint space, not google space. Needle/haystack problems in google space
> vanish when it is contracted to just the OAI eprint subspace. OAI eprint
> space consists of the yearly 2,500,000 articles in the planet's 24,000
> peer-reviewed journals in all fields and languages, before (preprints) and
> after peer review (postprints).
>
> http://www.eprints.org/self-faq/#What-is-Eprint
>
> Stevan Harnad
>
> NOTE: Complete archive of the ongoing discussion of providing open
> access to the peer-reviewed research literature online is available at
> the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):
> http://amsci-forum.amsci.org/archives/september98-forum.html
> http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html
> Posted discussion to: september98-forum at amsci-forum.amsci.org
>
> Dual Open-Access Strategy:
> BOAI-2 ("gold"): Publish your article in a suitable open-access
> journal whenever one exists.
> BOAI-1 ("green"): Otherwise, publish your article in a suitable
> toll-access journal and also self-archive it.
> http://www.soros.org/openaccess/read.shtml
> http://www.ecs.soton.ac.uk/~harnad/Temp/berlin.htm
> http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0026.gif
> http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0021.gif
> http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0024.gif
> http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0028.gif

From isidro at CINDOC.CSIC.ES Wed Nov 19 03:15:02 2003
From: isidro at CINDOC.CSIC.ES (Isidro F.
Aguillo)
Date: Wed, 19 Nov 2003 09:15:02 +0100
Subject: SARS in Cybermetrics
In-Reply-To: <200009251706.NAA30175@infomed.sld.cu>
Message-ID:

Dear colleagues:

The ejournal Cybermetrics (http://www.cindoc.csic.es/cybermetrics) has just published the following article:

Newspaper Coverage of SARS: A Comparison among Canada, Hong Kong, Mainland China and Western Europe
Linda Chee Yuk Chan, Bihui Jin, Ronald Rousseau, Liwen Vaughan, Yuan Yu

A quantitative analysis of newspaper coverage of SARS was conducted, where the occurrence of the word SARS in newspaper articles, rather than newspaper content, was examined. Data were collected from six newspapers representing Canada, mainland China, Hong Kong, and Western Europe. These data were then compared with the World Health Organization's data on SARS cases and SARS deaths. A brief history of SARS is also provided to place the results of the study in the context of the SARS events. The analysis finds not only a similarity between the two western media examined, but also a contrast between the western media and the Chinese media in SARS coverage. The study demonstrates the usefulness of informetric methods in analyzing popular media.

The paper is available free of charge at the following URL:
http://www.cindoc.csic.es/cybermetrics/articles/v6i1p1.html

You are cordially invited to contribute to one of the most read journals in our discipline and with a high rate of citations.

Thanks in advance,

Isidro F. Aguillo
isidro at cindoc.csic.es
CINDOC-CSIC
Joaquin Costa, 22
28002 Madrid. SPAIN
+34-630858997
www.cindoc.csic.es/cybermetrics

From palvarez at UNEX.ES Wed Nov 19 04:18:32 2003
From: palvarez at UNEX.ES (Pedro Álvarez Martínez)
Date: Wed, 19 Nov 2003 10:18:32 +0100
Subject: Interoperability - subject classification/terminology
Message-ID:

Dear Stevan,

Maybe the technique developed in the papers "The Rasch model. Measuring information from Keywords: The diabetes field" (Journal of the American Society for Information Science 47 (6): 468-476, 1996) and "Measuring information through topical subheadings of the Medline database: a case study" (Journal of Information Science, 25(5) 1999, pp. 395-402) will help.

Pedro Alvarez
Full Prof., PhD., Diplomate in Measurement

From palvarez at UNEX.ES Wed Nov 19 05:57:53 2003
From: palvarez at UNEX.ES (Pedro Álvarez Martínez)
Date: Wed, 19 Nov 2003 11:57:53 +0100
Subject: Interoperability - subject classification/terminology
Message-ID:

Dear Stevan,

The most recent work using the technique in the previous papers has been applied to Economics.

"There are 19 different areas in Economics. Research papers in Economics are assigned to each area according to their topical content described by their descriptors. Codifying areas and descriptors allows Boolean algebra to be applied, and latent information related amongst areas can be obtained. Processing data from the Econlit database, the Quantitative Methods descriptors were selected, and the presence of these descriptors was found in all 19 Economics areas. The identity of the Quantitative Methods descriptors for each area reveals the quantitative research profile in that area".

Pedro Alvarez
Prof. PhD. D.M.
> > > > http://www.eprints.org/self-faq/#What-is-Eprint > > > > Stevan Harnad > > > > NOTE: Complete archive of the ongoing discussion of providing open > > access to the peer-reviewed research literature online is available at > > the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03): > > http://amsci-forum.amsci.org/archives/september98-forum.html > > http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html > > Posted discussion to: september98-forum at amsci-forum.amsci.org > > > > Dual Open-Access Strategy: > > BOAI-2 ("gold"): Publish your article in a suitable open-access > > journal whenever one exists. > > BOAI-1 ("green"): Otherwise, publish your article in a suitable > > toll-access journal and also self-archive it. > > http://www.soros.org/openaccess/read.shtml > > http://www.ecs.soton.ac.uk/~harnad/Temp/berlin.htm > > http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0026.gif > > http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0021.gif > > http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0024.gif > > http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0028.gif > > From johannes.stegmann at MEDIZIN.FU-BERLIN.DE Wed Nov 19 07:47:14 2003 From: johannes.stegmann at MEDIZIN.FU-BERLIN.DE (Johannes Stegmann) Date: Wed, 19 Nov 2003 13:47:14 +0100 Subject: Interoperability - subject classification/terminology In-Reply-To: <001301c3ae7e$2c5a65e0$9819319e@casa> Message-ID: >> > >> > Please remember that most researchers currently search their abstracts >> > databases and their toll-access journal content databases without the help of any >> > subject classification taxonomies. This may be so. But in PubMed, for example, the searcher's input is matched ("invisibly") against a vocabulary, and once it has matched a thesaurus term (i.e.
a MeSH term) or an included MeSH synonym, PubMed's search engine searches the database for the MeSH terms (including exploding the terms) in addition to running a simple free-text search. This can be followed by viewing "Details". Regards, Johannes ------------------------------------------------------- Dr. Johannes Stegmann Charité - University Medicine Berlin Joint Facility of Free University and Humboldt-University Campus Benjamin Franklin Hindenburgdamm 30, 12203 Berlin, Germany johannes.stegmann at medizin.fu-berlin.de Tel.: +49 30 8445 2035 Fax: +49 30 8445 4454 Homepage: http://www.medizin.fu-berlin.de/medbib/home.html From harnad at ECS.SOTON.AC.UK Wed Nov 19 14:29:32 2003 From: harnad at ECS.SOTON.AC.UK (Stevan Harnad) Date: Wed, 19 Nov 2003 19:29:32 +0000 Subject: Interoperability - subject classification/terminology In-Reply-To: <3FBB8D7D.24962.160A5B5@localhost> Message-ID: On Wed, 19 Nov 2003, Chris Korycinski wrote: > > But we are not talking here about books or book-indexing! We are > > talking about the annual 2.5 million full-text refereed-journal > > articles. > > ... in subjects outside science, remember. I understand fully. My bet (that inverted full-text boolean search is all that is needed to navigate the entire refereed-journal corpus, all 24,000 journals' worth, and would beat human classification any day) does apply to all disciplines, both science and non-science. But my bet does not apply to books and book indexes (although -- without wagering! -- I do believe that software-based indexing and navigation will prevail there too, as "semantic-web" tools grow and improve for navigating large text corpora -- in any/every domain). > My original comments apply just as well to articles as to books - many > of the 'books' she works on are papers or conference proceedings. If a set of articles is gathered together and published as an indexed book, then that is a book! Nolo contendere.
Book users don't have online powers over the book; moreover, navigating just a local book is a much narrower and more focussed task than navigating the whole of the journal literature in a field. But please don't forget that journals never had (and never needed) subject indices, the way books do, and one of the reasons is probably that, apart from happening to have been accepted for the same issue, their articles don't have much to do with one another -- and if they *were* gathered into a book as a collection, they wouldn't be in the *same* book, but many different, topic-specific ones. Journal article space never had or needed a subject index in paper days; a fortiori, it needs it even less in online days, with the possibility of boolean inverted-text searching (as well as other digital prestidigitation, such as similarity matching, latent semantic indexing, citation-linking, citation-ranking, download-ranking, co-citation analysis, etc.). > The reality is that these areas are intrinsically different as they often > (I'm by no means saying always!) deal with concepts/points-of-view > rather than facts. And concepts lie closer to the realms of metadata > and hence are intractable by naive and simplistic schemes such as > keyword/inverted file indexing. I'm not sure I disagree. I agree that book-space, especially in some subjects, needs something more than just keyword and inverted full-text searching. But my guess is that that something more will turn out to consist of further text-analytic software tools. But if by metadata you mean that human judgment will have to do the tagging and sorting, as in human indexing days, I doubt it (though I make no bets, outside the one area I am pretty sure about: the annual 2,500,000 articles in the planet's 24,000 refereed journals -- across all disciplines and languages). > Have a look at the example I gave... it was edited out of my posting! Apologies. Here it is again: > It is concepts, not words people want.
The same concept is often expressed in > different words, or, to take another example: "Major announced in Westminster that > Maastricht was totally unacceptable". > > Is this about Westminster? Majors? The Netherlands? No. Try "British foreign > policy" or something similar (depending on the thrust of the book). My guess? This particular example, and countless others like it, are already a piece of cake for some of the more sophisticated digital-text processors I mentioned above. > Believe me, this is a simple example compared to many sociological or philosophical > texts and any inverted-file style of 'indexing' would produce complete rubbish. Indexing, yes, but software text-analyzers? I don't suggest you make any wagers! But my bet about the refereed journal corpus stands! Cheers, Stevan Some references on LSI and SW: http://lsa.colorado.edu/ http://www.cs.utk.edu/~lsi/ http://javelina.cet.middlebury.edu/lsa/out/lsa_definition.htm http://www.w3.org/2001/sw/ http://citebase.eprints.org/cgi-bin/search http://citeseer.nj.nec.com/cs http://citeseer.nj.nec.com/white96similarity.html From montagna at UN.ORG Wed Nov 19 17:26:38 2003 From: montagna at UN.ORG (Maria Laura Montagna) Date: Wed, 19 Nov 2003 17:26:38 -0500 Subject: Maria Laura Montagna/NY/UNO is out of the office. Message-ID: I will be out of the office starting 18/11/2003 and will not return until 24/11/2003. I will respond to your message when I return. From harnad at ECS.SOTON.AC.UK Thu Nov 20 04:59:05 2003 From: harnad at ECS.SOTON.AC.UK (Stevan Harnad) Date: Thu, 20 Nov 2003 09:59:05 +0000 Subject: Interoperability - subject classification/terminology In-Reply-To: <536WHE1TWSJIQKPJ98C7JIUS1V4WB6.3fbc84d3@your-z3mdcejeuo> Message-ID: On Thu, 20 Nov 2003, Barry Mahon wrote: > 19/11/2003 19:29:32, Stevan Harnad wrote: > > > >But please don't forget that journals never had (and never needed) subject > >indices, the way books do > > What about abstracting and indexing services??
(1) The discussion was about whether there is any need for a human-generated subject index when a full-text inverted index is available for boolean search. With abstract/indexing services only article titles and abstracts are available for searching, not article full-texts. (Open access to the 2,500,000 annual articles in the 24,000 peer-reviewed journals means open access to the full text.) (2) Even with abstract/indexing services it would be interesting to find out which users and how many do and do not use the subject index, and why and why not (and how long the subject index will continue to be a human-generated one -- if it still is at all -- in the era of automatic tools such as latent semantic indexing and the other new similarity and classification metrics). Stevan Harnad From garfield at CODEX.CIS.UPENN.EDU Fri Nov 21 16:04:45 2003 From: garfield at CODEX.CIS.UPENN.EDU (Eugene Garfield) Date: Fri, 21 Nov 2003 16:04:45 -0500 Subject: Corby K. "Constructing core journal lists: Mixing science and alchemy" PORTAL-LIBRARIES AND THE ACADEMY 3 (2): 207-217 APR 2003 Message-ID: Katherine Corby : corby at mail.lib.msu.edu TITLE : Constructing core journal lists: Mixing science and alchemy AUTHOR : Corby K JOURNAL : PORTAL-LIBRARIES AND THE ACADEMY 3 (2): 207-217 APR 2003 Document type: Article Language: English Cited References: 36 Times Cited: 0 Abstract: Via an overview of core journal studies, emphasizing the social sciences and education, this review looks for best practices in both motivation and methodology. Selection decisions receive particular focus. Lack of correlation between methods is indicative of the complexity of the topic and the need for judgment in design and use. 
KeyWords Plus: CITATION, FACULTY
Addresses: Corby K, Michigan State Univ Lib, E Lansing, MI USA Michigan State Univ Lib, E Lansing, MI USA
Publisher: JOHNS HOPKINS UNIV PRESS, JOURNALS PUBLISHING DIVISION, 2715 NORTH CHARLES ST, BALTIMORE, MD 21218-4319 USA
IDS Number: 684XT

Cited Author Cited Work Volume Page Year ID
*ERIC PROC REF FAC CIJE SOURC J IND 2003
*ISI J CIT REP 2000
*RR BOWK CO ULR INT PER DIR 1987
BAYER AE REV HIGH EDUC 6 111 1983
BONK WJ BUILDING LIB COLLECT 137 1979
BRADFORD SC DOCUMENTATION 1950
BROADUS RN COLL RES LIBR 46 30 1985
BUDD JM J HIGH EDUC 61 84 1990
DAVIS P PORTAL-LIBR ACAD 2 155 2002
DOREIAN P INFORM PROCESS MANAG 25 205 1989
DOREIAN P INFORMATION PROCESSI 25 206 1989
DOREIAN P INFORMATION PROCESSI 25 207 1989
FUNKHOUSER ET HUM COMMUN RES 22 563 1996
GARFIELD E ESSAYS INFORMATION S 4 479 1979
HARDESTY L SERIALS LIB 16 139 1989
HUGHES J LIBR ACQUIS PRACT TH 19 403 1995
JOSWICK KE COLL RES LIB 58 54 1997
JOSWICK KE COLL RES LIBR 58 48 1997
KOONG KS FACULTY USAGE HIGHER 1989
LINE MB COLL RES LIB 36 394 1975
LINE MB COLL RES LIBR 36 393 1975
LINE MB COLLECTION MANAGEMEN 2 315 1978
LUCE TS ED RES 7 8 1978
MACK T LIBR INFORM SCI RES 13 131 1991
NISONGER TE J AM SOC INFORM SCI 50 1005 1999
OBRIEN NP ED GUIDE REFERENCE I 2000
PAN E COLLECTION MANAGEMEN 2 32 1978
SANDISON A J AM SOC INFORM SCI 26 351 1975
SCALES PA J DOC 32 20 1976
SLATER BM B MED LIB ASS 82 70 1994
SMART JC AM EDUC RES J 18 399 1981
SMART JC AM EDUC RES J 18 407 1981
SMART JC RES HIGH EDUC 19 175 1983
TESTA J ISI DATABASE J S NOV 2002
WALLACE DP LIBR INFORM SCI RES 11 59 1989
WALLACE DP LISR LIB INFORMATION 11 69 1989

When responding, please attach my original message
_______________________________________________________________________
Eugene Garfield, PhD. email: garfield at codex.cis.upenn.edu home page: www.eugenegarfield.org Tel: 215-243-2205 Fax 215-387-1266 President, The Scientist LLC.
www.the-scientist.com Chairman Emeritus, ISI www.isinet.com Past President, American Society for Information Science and Technology (ASIS&T) www.asis.org _______________________________________________________________________ ISSN: 1531-2542 From kate.mccain at CIS.DREXEL.EDU Sat Nov 22 04:02:06 2003 From: kate.mccain at CIS.DREXEL.EDU (Kate McCain) Date: Sat, 22 Nov 2003 04:02:06 -0500 Subject: Kate McCain/Drexel_IST is out of the office. Message-ID: I will be out of the office starting 11/22/2003 and will not return until 12/01/2003. From Garfield at CODEX.CIS.UPENN.EDU Mon Nov 24 12:25:43 2003 From: Garfield at CODEX.CIS.UPENN.EDU (Garfield, Eugene) Date: Mon, 24 Nov 2003 12:25:43 -0500 Subject: FW: Electronic Archives--The Nightmare continues Message-ID: Thought this would interest our members. Best wishes for Thanksgiving. For all our friends outside the US best greeting for the approaching New Year. Gene Garfield When responding, please attach my original message __________________________________________________ Eugene Garfield, PhD. email: garfield at codex.cis.upenn.edu home page: http://www.eugenegarfield.org/ Tel: 215-243-2205 Fax 215-387-1266 President, The Scientist LLC. http://www.the-scientist.com/ 3535 Market St., Phila. 
PA 19104-3389 Chairman Emeritus, ISI http://www.isinet.com/ 3501 Market Street, Philadelphia, PA 19104-3302 Past President, American Society for Information Science and Technology (ASIS&T) http://www.asis.org/ -----Original Message----- From: Lucy Rowland [mailto:lrowland at uga.edu] Sent: Monday, November 24, 2003 11:39 AM To: grapevine at listserv.uga.edu; BSDNET-L; North Carolina Chapter Subject: Electronic Archives--The Nightmare continues http://www.washingtonpost.com/wp-dyn/articles/A8730-2003Nov23.html On the Web, Research Work Proves Ephemeral Electronic Archivists Are Playing Catch-Up in Trying to Keep Documents From Landing in History's Dustbin By Rick Weiss Washington Post Staff Writer Monday, November 24, 2003; Page A08 It was in the mundane course of getting a scientific paper published that physician Robert Dellavalle came to the unsettling realization that the world was dissolving before his eyes. The world, that is, of footnotes, references and Web pages. Dellavalle, a dermatologist with the Veterans Affairs Medical Center in Denver, had co-written a research report featuring dozens of footnotes -- many of which referred not to books or journal articles but, as is increasingly the case these days, to Web sites that he and his colleagues had used to substantiate their findings. Problem was, it took about two years for the article to wind its way to publication. And by that time, many of the sites they had cited had moved to other locations on the Internet or disappeared altogether, rendering useless all those Web addresses -- also known as uniform resource locators (URLs) -- they had provided in their footnotes. "Every time we checked, some were gone and others had moved," said Dellavalle, who is on the faculty at the University of Colorado Health Sciences Center. "We thought, 'This is an interesting phenomenon itself. We should look at this.'
" He and his co-workers have done just that, and what they have found is not reassuring to those who value having a permanent record of scientific progress. In research described in the journal Science last month, the team looked at footnotes from scientific articles in three major journals -- the New England Journal of Medicine, Science and Nature -- at three months, 15 months and 27 months after publication. The prevalence of inactive Internet references grew during those intervals from 3.8 percent to 10 percent to 13 percent. "I think of it like the library burning in Alexandria," Dellavalle said, referring to the 48 B.C. sacking of the ancient world's greatest repository of knowledge. "We've had all these hundreds of years of stuff available by interlibrary loan, but now things just a few years old are disappearing right under our noses really quickly." Dellavalle's concerns reflect those of a growing number of scientists and scholars who are nervous about their increasing reliance on a medium that is proving far more ephemeral than archival. In one recent study, one-fifth of the Internet addresses used in a Web-based high school science curriculum disappeared over 12 months. Another study, published in January, found that 40 percent to 50 percent of the URLs referenced in articles in two computing journals were inaccessible within four years. "It's a huge problem," said Brewster Kahle, digital librarian at the Internet Archive in San Francisco. "The average lifespan of a Web page today is 100 days. This is no way to run a culture." Of course, even conventional footnotes often lead to dead ends. Some experts have estimated that as many as 20 percent to 25 percent of all published footnotes have typographical errors, which can lead people to the wrong volume or issue of a sought-after reference, said Sheldon Kotzin, chief of bibliographic services at the National Library of Medicine in Bethesda. 
But the Web's relentless morphing affects a lot more than footnotes. People are increasingly dependent on the Web to get information from companies, organizations and governments. Yet, of the 2,483 British government Web sites, for example, 25 percent change their URL each year, said David Worlock of Electronic Publishing Services Ltd. in London. That matters in part because some documents exist only as Web pages -- for example, the British government's dossier on Iraqi weapons. "It only appeared on the Web," Worlock said. "There is no definitive reference where future historians might find it." Web sites become inaccessible for many reasons. In some cases individuals or groups that launched them have moved on and have removed the material from the global network of computer systems that makes up the Web. In other cases the sites' handlers have moved the material to a different virtual address (the URL that users type in at the top of the browser page) without providing a direct link from the old address to the new one. When computer users try to access a URL that has died or moved to a new location, they typically get what is called a "404 Not Found" message, which reads in part: "The page cannot be displayed. The page you are looking for is currently unavailable." So common are such occurrences today, and so iconic has that message become in the Internet era, that at least one eclectic band has named itself "404 Not Found," and humorists have launched countless knockoffs of the page -- including www.mamselle.ca/error.html, which looks like a standard error page but scolds people for spending too much time on their computers ("This page cannot be displayed because you need some fresh air . . .") and www.coxar.pwp.blueyonder.co.uk, which offers political commentary about the U.S. war in Iraq ("The weapons you are looking for are currently unavailable.").
Not all apparently inaccessible Web sites are really beyond reach. Several organizations, including the popular search engine Google and Kahle's Internet Archive (www.archive.org), are taking snapshots of Web pages and archiving them as fast as they can so they can be viewed even after they are pulled down from their sites. The Internet Archive already contains more than 200 terabytes of information (a terabyte is a million million bytes) -- equivalent to about 200 million books. Every month it is adding 20 more terabytes, equivalent to the number of words in the entire Library of Congress. "We're trying to make sure there's a good historical record of at least some subsets of the Web, and at least some record of other parts," Kahle said. "We're injecting the past into the present." But with an estimated 7 million new pages added to the Web every day, archivists can do little more than play catch-up. So others are creating new indexing and retrieval systems that can find Web pages that have wandered to new addresses. One such system, known as DOI (for digital object identifier), assigns a virtual but permanent bar code of sorts to participating Web pages. Even if the page moves to a new URL address, it can always be found via its unique DOI. Standard browsers cannot by themselves find documents by their DOIs. For now, at least, users must use go-between "registration agencies" -- such as one called CrossRef -- and "handle servers," which together work like digital switchboards to lead subscribers to the DOI-labeled pages they seek. A hodgepodge of other retrieval systems is cropping up, as well -- all part of the increasingly desperate effort to keep the ballooning Web's thoughts accessible. If it all sounds complicated, it is. But consider the stakes: The Web contains unfathomably more information than did the Alexandria library. If our culture ends up unable to retrieve and use that information, then all that knowledge will, in effect, have gone up in smoke. 
Research editor Margot Williams contributed to this report. Lucy M. Rowland, MS, MLS, CNU Head, Science Collections & Research Facilities University of Georgia Libraries Athens, GA 30602-7412 lrowland at uga.edu +1-706-542-6643 FAX: +1-706-542-7907 www.libs.uga.edu/science/science.html "Human subtlety will never devise an invention more beautiful, more simple, or more direct than does Nature." --Leonardo da Vinci "Always do right. It will gratify some people and astonish the rest." --Mark Twain ________________________________________________________________________ This email has been scanned for all viruses by the MessageLabs Email Security System. For more information on a proactive email security service working around the clock, around the globe, visit http://www.messagelabs.com ________________________________________________________________________ From harnad at ECS.SOTON.AC.UK Tue Nov 25 06:52:40 2003 From: harnad at ECS.SOTON.AC.UK (Stevan Harnad) Date: Tue, 25 Nov 2003 11:52:40 +0000 Subject: Measuring cumulating research impact loss across fields and time Message-ID: On Tue, 25 Nov 2003, [identity deleted] wrote: > Dear Prof. Harnad, > > Do you have any notes that go with your Open Access PowerPoint presentation > http://www.ecs.soton.ac.uk/~harnad/Temp/openaccess.ppt > - specifically in the slide 25/52 (Quo usque tandem > patientia nostra?) where does the data come from for the 2 graphs - > "What we stand to gain" and "Yearly, Monthly, Daily Impact Losses" come > from and how has it been calculated? > http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0025.gif It is based on the 336% impact-loss estimate from the Lawrence study (bottom-left corner). It simply cumulates that impact-loss to show how big it really is, and how it is growing with time.
With collaborators at UQaM, Southampton, Oldenburg and Loughborough we are now extending the Lawrence study (which was on a sample from computer science) to the entire 10-year ISI database from 1992-2002 (about ten million articles) across all disciplines, in order (1) to show the relative growth of open access across time, by discipline, and (2) to estimate the relative impact advantage (in terms of citation counts) that open access provides, across time, by discipline. Our method is first to compute the citation count for each of the ten million articles indexed in the ISI database (using an algorithm that takes each indexed article's reference list and fuzzy-matches each cited article to the article it cites, whenever that too is in the database). Then we send a software agent to the web to check, for each of those ten million articles (again by fuzzy-matching), whether a full-text of it is accessible toll-free on the web. We then compare, display and extrapolate, year by year, field by field, journal by journal, (1) the number and (2) citation counts for articles that are and are not openly accessible. These will be the actual data, replacing the Lawrence estimate in that slide. We will then convert those impact losses into research income losses for universities and research institutions, and use those data to show university administrators, quantitatively, why it is that they need to extend existing "publish or perish" policy to "publish *and* provide open access to your publications" (in order to maximize research impact -- and income). 
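The counting-and-comparison procedure described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the method the study team actually uses: the function names `count_citations` and `oa_advantage`, the record fields (`title`, `field`, `year`, `open_access`) and the sample data are all hypothetical, and the standard-library `difflib` matcher merely stands in for whatever fuzzy-matching algorithm is really applied to the ISI data.

```python
import difflib
from collections import Counter, defaultdict
from statistics import mean


def count_citations(reference_lists, indexed_titles, cutoff=0.85):
    """Count citations per indexed title by fuzzy-matching cited strings.

    difflib is only a stand-in for a real citation matcher (assumption).
    """
    counts = Counter({title: 0 for title in indexed_titles})
    for references in reference_lists:
        for cited in references:
            # Keep only the single best candidate above the similarity cutoff.
            best = difflib.get_close_matches(
                cited.lower(), indexed_titles, n=1, cutoff=cutoff
            )
            if best:
                counts[best[0]] += 1
    return counts


def oa_advantage(articles, counts):
    """Ratio of mean citations (open access / toll access) per (field, year)."""
    groups = defaultdict(lambda: {True: [], False: []})
    for article in articles:
        key = (article["field"], article["year"])
        groups[key][article["open_access"]].append(counts[article["title"]])
    ratios = {}
    for key, group in groups.items():
        # Skip groups missing either kind of article, or with no
        # toll-access citations to divide by.
        if group[True] and group[False] and mean(group[False]) > 0:
            ratios[key] = mean(group[True]) / mean(group[False])
    return ratios


# Hypothetical toy data; titles are stored lower-case for matching.
articles = [
    {"title": "free online availability and impact",
     "field": "cs", "year": 2001, "open_access": True},
    {"title": "subject indexing in journal databases",
     "field": "cs", "year": 2001, "open_access": False},
]
reference_lists = [
    ["Free online availabilty and impact",       # misspelled on purpose
     "Subject indexing in journal databases"],
    ["Free online availability and impact"],
    ["Free Online Availability and Impact"],
]
counts = count_citations(reference_lists, [a["title"] for a in articles])
ratios = oa_advantage(articles, counts)
print(counts)
print(ratios)
```

Grouping by (field, year) before taking the ratio mirrors the "year by year, field by field" comparison in the text; a journal-level breakdown would only need a different grouping key.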
The hypothesis is that the only thing holding back immediate universal open-access provision by researchers and their institutions today is ignorance about (1) the magnitude of the needless accumulating impact losses, and about (2) the simple, legal, and virtually cost-free way that those losses can be immediately reversed through the dual open-access strategy of (i) publishing in an open-access journal wherever a suitable one exists (5%), and (ii) self-archiving all toll-access publications otherwise (95%). Meanwhile, keep using those powerpoints to encourage open-access provision! Stevan Harnad NOTE: A complete archive of the ongoing discussion of providing open access to the peer-reviewed research literature online is available at the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03): http://amsci-forum.amsci.org/archives/september98-forum.html http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html Post discussion to: september98-forum at amsci-forum.amsci.org Dual Open-Access Strategy: BOAI-2 ("gold"): Publish your article in a suitable open-access journal whenever one exists. BOAI-1 ("green"): Otherwise, publish your article in a suitable toll-access journal and also self-archive it. 
http://www.soros.org/openaccess/read.shtml http://www.ecs.soton.ac.uk/~harnad/Temp/berlin.htm http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0026.gif http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0021.gif http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0024.gif http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0028.gif From harnad at ECS.SOTON.AC.UK Wed Nov 26 07:09:47 2003 From: harnad at ECS.SOTON.AC.UK (Stevan Harnad) Date: Wed, 26 Nov 2003 12:09:47 +0000 Subject: Measuring cumulating research impact loss across fields and time In-Reply-To: Message-ID: On Wed, 26 Nov 2003, David Spurrett & Subbiah Arunachalam wrote: >ds> I look forward to the results of the empirical study you describe. >ds> I would be curious to know... whether >ds> there was a further pattern that related (a) the extent to which >ds> publications by authors at particular institutions cited research >ds> materials available through open access, with (b) their local >ds> institutional budget for expenditure on journals. > >sa> Stevan Harnad talked about a study on the relative >sa> citation rates of open-access and toll-access articles >sa> he is conducting in collaboration with UQaM, >sa> Southampton, Oldenburg and Loughborough. When will the >sa> results become available? Will there be any interim >sa> reports? I am curious to know. The study is ongoing and we will report the results (as a pre-refereeing preprint!) as soon as they are available. But meanwhile, much information inheres in -- and many telling estimates can be made from -- the data that are already available. David Spurrett's & Subbiah Arunachalam's queries suggest the following preliminary analysis, which can already be done by anyone on the basis of the data already available. 
(I will ask our super-talented team at Southampton if they can squeeze it in, along with all the other ongoing studies!): We know from the Lawrence study (below) that the citation enhancement factor for open- vs. toll-access is about 4.5 in computer science (4.5 times as many citations for open- vs. toll-access articles in the same venue). http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0006.gif We know from the Eysenck and Smith RAE outcome study in Psychology (and from the Oppenheim studies in other disciplines) that the correlation between RAE outcome and citation impact is about .90 (in Psychology). http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0007.gif We also know the 2001 RAE outcome, rank-ordering every department in every university in the UK http://www.hero.ac.uk/rae/submissions/ and we also know the size of the funding and the funding difference associated with each rank. Hence it is very easy to take those rank orders, for each discipline, and calculate -- based on that discipline's correlation between its RAE rank and its citation impact -- the estimated income increase that would arise from the rank increase induced by the impact increase caused by open access! In particular, it would be possible to illustrate how the rank order would change if, for example, the research output of the lowest-ranked department in each discipline became open-access, and gained a 2-fold, 3-fold, 4-fold, or 4.5-fold increase in impact (depending on how close it came to the Lawrence 4.5 estimate -- which might itself be an underestimate in some disciplines!). The RAE/impact correlation would predict what rank that department would get, and the RAE/funding correlation would predict how much more money that would translate into.
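The rank-shift estimate outlined above can be sketched in a few lines of Python. This is an illustration only, assuming (as a simplification of the reported correlation of about .90) that departments are ranked purely by citation impact. The function name `rank_after_boost`, the department labels and the impact figures are hypothetical, and the step from rank to funding is left out because the actual funding scale is not given here.

```python
def rank_after_boost(impact_by_dept, dept, factor):
    """Return the 1-based rank of `dept` after multiplying its impact by `factor`.

    Assumes rank is determined by citation impact alone (a simplification).
    """
    boosted = dict(impact_by_dept)          # leave the caller's data untouched
    boosted[dept] = boosted[dept] * factor
    ordered = sorted(boosted, key=boosted.get, reverse=True)
    return ordered.index(dept) + 1


# Hypothetical citation-impact scores for four departments in one discipline.
toy_impact = {"A": 120.0, "B": 60.0, "C": 40.0, "D": 25.0}

# The lowest-ranked department goes open access and gains a 2- to 4.5-fold boost.
for factor in (1, 2, 3, 4, 4.5):
    print(factor, rank_after_boost(toy_impact, "D", factor))
```

Only one department is boosted at a time, which matches the competitive, first-come scenario in the text; scaling every department by the same factor would leave the ordering unchanged.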
Obviously if *all* the articles in all disciplines suddenly became open-access overnight, there would not be such a dramatic change in rankings (though it would give some research a better fighting chance), because all impact would simply be scaled up. (*Simply scaled up*! But that in itself would represent a huge benefit to research progress and productivity.)

But never mind that. We must appeal to our lower instincts, in trying to persuade individual researchers and their institutions that open access is in their interests. So the above data should be taken in a first-come, first-served competitive spirit: Right now, it is definitely not the case that *all* articles are open access. Almost all are not. Nor is the transition happening overnight (as it could have done, already a decade ago).
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0024.gif

So the incentive to self-archive comes from the fact that those who do it *now* stand the best chance of changing the relative research impact-ranking (and hence the research funding) in their favor: and the study I've sketched would estimate by just how much. A dimensionless picture of the size of the increment is already visible in:
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0025.gif

The RAE data are open-access, so anyone can do this study. But I will try to persuade the Southampton team to do it, in order to provide ammunition for those who are hard at work trying to inform university administrators and research funders about the benefits to be expected from mandating open-access provision for all their research output.

[A slight correction to David Spurrett's query about the correlation between

> "(a) the extent to which publications by authors at particular
> institutions cited research materials available through open access,
> with (b) their local institutional budget for expenditure on journals."

First, that's the wrong correlation.
We've agreed it's not journal budget expenditures that will persuade researchers to self-archive, but research income. Second, we can already answer the question: That correlation is zero, because the small volume of open access that exists so far has not led to any toll-cancellations, in any discipline (including Physics, where self-archiving and open-access are most advanced). The correlation *might* change eventually, but that will not be a *cause* of universal open access, but an *effect*:
http://www.nature.com/nature/debates/e-access/Articles/harnad.html#B1 ]

Lawrence, S. (2001) Free online availability substantially increases a paper's impact. Nature Web Debates.
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html

Kurtz, Michael J.; Eichhorn, Guenther; Accomazzi, Alberto; Grant, Carolyn S.; Demleitner, Markus; Murray, Stephen S.; Martimbeau, Nathalie; Elwell, Barbara. (submitted) The NASA Astrophysics Data System: Sociology, Bibliometrics, and Impact.
http://cfa-www.harvard.edu/~kurtz/jasis-abstract.html

the forthcoming Schwartz et al.
study
http://listserv.nd.edu/cgi-bin/wa?A2=ind0311&L=pamnet&D=1&O=D&P=1632

the work of Andrew Odlyzko:
http://www.dtc.umn.edu/~odlyzko/doc/complete.html

and Tim Brody's remarkable citebase usage and citation impact calculator
http://citebase.eprints.org/cgi-bin/search

as well as his usage/citation impact correlator
http://citebase.eprints.org/analysis/correlation.php
which can predict later citation impact from earlier usage (download) impact using variable time-windows and ranges for the Physics ArXiv (you need the latest Java to be able to use it).

Smith, Andrew, & Eysenck, Michael (2002) "The correlation between RAE ratings and citation counts in psychology," June 2002
http://psyserver.pc.rhbnc.ac.uk/citations.pdf

Oppenheim, Charles (1995) The correlation between citation counts and the 1992 Research Assessment Exercises ratings for British library and information science departments, Journal of Documentation, 51:18-27.

Oppenheim, Charles (1998) The correlation between citation counts and the 1992 research assessment exercise ratings for British research in genetics, anatomy and archaeology, Journal of Documentation, 53:477-87.
http://dois.mimas.ac.uk/DoIS/data/Articles/julkokltny:1998:v:54:i:5:p:477-487.html

Holmes, Alison & Oppenheim, Charles (2001) Use of citation analysis to predict the outcome of the 2001 Research Assessment Exercise for Unit of Assessment (UoA) 61: Library and Information Management.
http://www.shef.ac.uk/~is/publications/infres/paper103.html

Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne.
http://www.ariadne.ac.uk/issue35/harnad/

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open access to the peer-reviewed research literature online is available at the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):

http://amsci-forum.amsci.org/archives/september98-forum.html
http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Post discussion to: september98-forum at amsci-forum.amsci.org

Dual Open-Access Strategy:
BOAI-2 ("gold"): Publish your article in a suitable open-access journal whenever one exists.
BOAI-1 ("green"): Otherwise, publish your article in a suitable toll-access journal and also self-archive it.
http://www.soros.org/openaccess/read.shtml
http://www.ecs.soton.ac.uk/~harnad/Temp/berlin.htm
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0026.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0021.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0024.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0028.gif