A new metrics-related book focused on academic search engines
David Wojick
dwojick at CRAIGELLACHIE.US
Fri Oct 10 17:38:54 EDT 2014
Dear Stephen,
Your paper is 42 pages long. Can you point to the section where you explain
the semantic nature of linking and citation? So far as I know neither
relation is semantic.
David
David Wojick
http://insidepublicaccess.com/
At 11:26 AM 10/10/2014, you wrote:
>Administrative info for SIGMETRICS (for example unsubscribe):
>http://web.utk.edu/~gwhitney/sigmetrics.html
>David and Jeroen,
>I explain the bases of how Google works semantically by links in my
>following arXiv posting:
>
>Eugene Garfield, Francis Narin, and PageRank: The Theoretical Bases of the
>Google Search Engine
>Authors: Stephen J. Bensman <http://arxiv.org/find/cs/1/au:+Bensman_S/0/1/0/all/0/1>
>(Submitted on 13 Dec 2013)
>Abstract: This paper presents a test of the validity of using Google
>Scholar to evaluate the publications of researchers by comparing the
>premises on which its search engine, PageRank, is based, to those of
>Garfield's theory of citation indexing. It finds that the premises are
>identical and that PageRank and Garfield's theory of citation indexing
>validate each other.
>Subjects:
>Information Retrieval (cs.IR); Digital Libraries (cs.DL); Physics and
>Society (physics.soc-ph)
>Cite as:
>arXiv:1312.3872 [cs.IR] <http://arxiv.org/abs/1312.3872>
>
>(or arXiv:1312.3872v1 [cs.IR] <http://arxiv.org/abs/1312.3872v1> for this
>version)
>
>You will see that Garfield's theory of citation indexing is based upon the
>premise that subject sets are better defined by links than by words. This
>is the same basis on which the Google search engine operates.
>
>Our new paper is entitled POWER-LAW DISTRIBUTIONS, THE H-INDEX, AND
>GOOGLE SCHOLAR (GS) CITATIONS: A TEST OF THEIR RELATIONSHIP WITH ECONOMICS
>NOBELISTS, and here is its abstract:
>This paper comprises an analysis of whether Google Scholar (GS) can
>construct documentary sets relevant for the evaluation of the works of
>researchers. The researchers analyzed were two samples of Nobelists in
>economics: an original sample of five laureates downloaded in September,
>2011; and a validating sample of laureates downloaded in October,
>2013. Two methods were utilized to conduct this analysis. The first is
>distributional. Here it is shown that the distributions of the laureates'
>works by total GS citations belong within the Lotkaian or power-law
>domain, whose major characteristic is an asymptote or tail to the
>right. It also proves that this asymptote is conterminous with the
>laureates' h-indexes, which demarcate their core œuvre. This overlap is
>proof of both the ability of GS to form relevant documentary sets and the
>validity of the h-index. The second method is semantic. This method
>shows that the extreme outliers at the right tip of the tail (a signature
>feature of the economists' distributions) are not random events but related
>by subject to contributions to the discipline for which the laureates were
>awarded this prize. Another interesting finding is the important role
>played by working papers in the dissemination of new economic knowledge.
>This is what I mean by semantic: the works with the highest GS cites were
>on topics and contributions for which the laureates were awarded the
>prize. Semantically that is dead on. When this paper is finally posted
>on arXiv, I would appreciate it if you would vet it before we submit it to
>a journal with dictatorial referees.
>Respectfully,
>
>Stephen J Bensman
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA 70803
>
>
>
>
>
>
>
>From: ASIS&T Special Interest Group on Metrics
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Bosman, J.M. (Jeroen)
>Sent: Friday, October 10, 2014 9:57 AM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic
>search engines
>
>
>Stephen,
>
>Maybe I should just have patience and wait for your paper. But when you
>say that it "works semantically by links", do you mean that it takes
>citations into account for its hybrid ranking? That is a fact and
>something MAS does as well. Or are you suggesting that GS also looks at
>links pointing to the web pages of the articles? The latter would be
>new(s) for me.
>
>One of the differences between G and GS, by the way, is that G stopped
>interpreting each space as a Boolean AND years ago, but GS still does, as
>far as I can tell.
>
>Best regards,
>Jeroen
>
>
>On 10 Oct 2014, at 16:37, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>
>Jeroen,
>This is a revolution with deep roots. Garfield laid out the main premise
>of the Google search engine in an article he published in Science in 1955
>on citation indexing. It is an accelerating revolution that now is
>reaching warp speed.
>
>The main reason Google delivers more relevant sets than Microsoft is that
>it semantically works by links and not words. This enables it to take
>advantage of the power-law linkage structure of the WWW to zero in on the
>most important and relevant documents.
>
>I wish to hell that arXiv would finally post our working paper, where we
>prove all this with economics Nobelists. Then I can vet our theories.
>
>Respectfully,
>
>Stephen J Bensman, Ph.D
>LSU Libraries
>Lousiana State University
>Baton Rouge, LA 70803
>
>PS I am a historian by training, and there is nothing that is outdated for
>me. Older, highly cited stuff is of the greatest interest, for we may be
>looking at the influence of time and the degree of incorporation.
>
>From: ASIS&T Special Interest Group on Metrics
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU]
>On Behalf Of Bosman, J.M. (Jeroen)
>Sent: Thursday, October 09, 2014 4:41 PM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic
>search engines
>
>
>Stephen,
>
>Thanks for your insightful elaboration. The ideas stem from about 1935
>(Otlet), 1945 (Bush) and 1955 (Garfield), the implementation from the
>early sixties in SCI, further ideas in 1976 (Narin) and 1989 (Berners-Lee)
>and Google elaborated on that in 1996 with PageRank and a hybrid ranking. So I
>doubt that the revolution takes just a decade. It already has taken some
>decades and will take some more decades, for the change is not restricted
>to discovery but includes distribution as well, just as with the printing
>press and scholarly journal. So probably the 'revolution' will only be
>complete when at some point in the future the academic book, journal and
>paper are replaced by instant production/publication/discovery, for
>instance in a smart nanopublications type of way? Also I think that for
>the system to collapse Google Scholar is not a conditio sine qua non.
>ArXiv (1991) and Citeseer (1998) are way older than GS and together they
>have revolutionized search and distribution more than GS has done, albeit
>in a much more restricted field of physics and information science.
>
>On a less theoretical note, you say that MAS has been proven wrong and
>Google Scholar may be right. But every other day I have to tell my
>students that in order to get relevant stuff they need to use GS pubyear
>filters, because if they don't they will end up using highly cited but
>outdated stuff. Over 95% of my students (>500 each year) had never
>realised this! By the way, I am not saying that MAS does a better job in
>this respect and I am a fan of Google Scholar.
>
>Best,
>Jeroen Bosman
>@jeroenbosman
>
>On 9 Oct 2014, at 22:27, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>
>Jeroen
>Here is a summary of what I think we are involved in with academic
>search engines:
>
>Academic search engines are an extremely complex topic, since we are now
>engaged in an information revolution on the same scale as the invention of
>the printing press in the 15th century and the scientific journal in the
>17th century, except what was accomplished took centuries then, and we
>will do it in a decade or so now. One facet of this information
>revolution is that what was once semantically defined by words is now
>semantically defined by linkages. On top of it, this information
>revolution is entwined with a scientific revolution on the power-law
>distributional structure of nature and society that was launched as a
>result of the development of the World Wide Web.
>
>Given the complexity of this thing, we need some sort of standardization,
>so we can better deal with it. There has to be some sort of agreement on
>what is right and what is wrong. MAS seems to be based on a system (the number
>of word tokens in a given document) that was proven wrong and ineffective in
>semantically defining relevant document sets. For me it is very hard to
>grasp that a Googlebot crawled out of a garage in Palo Alto in 2004, and
>suddenly an entire system began to collapse and be replaced by something
>else. This took less than 10 years. The Chinese have a curse about
>living in interesting times, and our times are sure interesting in this sense.
>
>Respectfully,
>
>Stephen J Bensman
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA 70803
>USA
>
>
>
>
>From: ASIS&T Special Interest Group on Metrics
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU]
>On Behalf Of Bosman, J.M. (Jeroen)
>Sent: Thursday, October 09, 2014 2:40 PM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic
>search engines
>
>
>Isidro, Stephen, Enrique,
>
>Thanks. I already downloaded the book and started reading. However, I do
>not applaud the fact that MAS is coming to a standstill. I think it offers
>some very nice options and even unique things (AFAIK) such as the citation
>contexts. I also do not understand why it is necessary to have a single
>standard in order to be able to assess how the WWW revolutionizes the
>scholarly information system. Stephen, could you elaborate on why you
>think that is necessary? Could that assessment not include various
>parallel lines of development of these systems? And perhaps we already
>need an addendum to the book with today's news of the launch of Paperity.
>
>Best,
>Jeroen
>
>
>
>
>
>
>On 9 Oct 2014, at 18:23, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>Enrique,
>Thank you for this information. It simplifies matters. At least MAS no
>longer needs to be taken into account, and we can focus on Google
>Scholar. If we are going to make assessments on how the WWW is
>revolutionizing the scientific/scholarly information system, we have to
>have a single standard, and that is Google. The problems are complex
>enough without the need to compare competitive systems. Life was better
>and easier when the SCI was the single standard just as it was when peer
>ratings were the only standard.
>
>SB.
>
>
>
>From: ASIS&T Special Interest Group on Metrics
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU]
>On Behalf Of Enrique Orduña
>Sent: Thursday, October 09, 2014 9:47 AM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic
>search engines
>
>
>Dear friends,
>
>Interesting issues all of them. And of course I already purchased a copy
>of Ortega's book :)
>
>As regards Microsoft Academic Search and PoP software, we must take into
>account that MAS is completely outdated. This issue is detected by Ortega
>in his book. Moreover, it was published by the EC3 Research Group by means of a
>working paper a few months ago. A more in-depth analysis has been performed,
>which has been recently accepted for publication, where we study this drop
>of coverage according to disciplines, universities and journals.
>
>Therefore, MAS cannot be used now for quantitative purposes. Additionally,
>the MAS API does not work properly with queries that return hit count
>estimates surpassing 1,000 results. We can also add the sometimes
>unknown legal considerations in the reuse of Bing results due to Microsoft
>copyright.
>
>Finally, some official voices from Microsoft announced that MAS results
>will be integrated into Bing results, in an ongoing process.
>
>As regards Google Scholar, as Isidro said, the "site" command may be used both
>in Google and Google Scholar. But be careful, because search commands are
>changing in Scholar. For example, the combination of "site" and "filetype"
>stopped working. In any case, the site command in Google and Bing sometimes
>gives us unexpected results in terms of coverage.
>
>Best,
>
>Enrique
>
>On Thu, Oct 9, 2014 at 4:32 PM, Stephen J Bensman
><notsjb at lsu.edu> wrote:
>
>Isidro,
>Thanks for the information. I am looking forward to hearing from
>Jose. He and I are already in close contact on these matters. I
>definitely want you two to vet the paper we have done. It should be ready
>soon. I screwed up in posting it on arXiv, and it may take a while to
>correct my stupidity of submitting the damn thing multiple times, because
>I did not know what I was doing.
>
>You have already answered one of my questions. The former Yahoo search
>engine was based upon AltaVista, which defined documentary sets by
>words. It was this system that Page tested and rejected as delivering
>incoherent, irrelevant sets. Instead Page incorporated Garfield's theory
>of citation indexing, which defines relevant sets by linkages. He
>strengthened this by also incorporating Narin's influential method. Doing
>this delivered clearer, more relevant sets than AltaVista. Multiple
>linkages are better at semantically defining sets than multiple token
>words. If your book presents these facts, then I can strangle Microsoft
>Academic in its cradle, as Churchill once said of a certain political
>system that now seems to have come back into vogue.
>
>I hope to get the book and hear from Jose.
>
>Respectfully,
>
>Stephen J Bensman, Ph.D
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA 70803
>USA
>
>
>
>-----Original Message-----
>From: ASIS&T Special Interest Group on Metrics
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
>Sent: Thursday, October 09, 2014 9:07 AM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic
>search engines
>
>
>Dear Stephen,
>
>Ooops!
>
>Sorry, I am not the author of the book. It was written by my collaborator
>and friend José Luis Ortega, also in this forum, so you can expect an
>answer from him soon.
>
>But I can give a few hints on some of your questions. Bing is using the
>technology of the former Yahoo search engine. I do not know exactly the
>way Bing works, but my feeling is they are using visits as the main criterion.
>Probably there are far more variables involved, but the number of visits plays
>a similar role to links in Google's PageRank. Of course, it is also
>possible that links are taken into account as well.
>
>Microsoft Academic Search is a completely different animal. It is really a
>traditional bibliographic database, but I must admit that although
>they are using the h-index, I am unable to understand the rankings they
>publish. To my knowledge, MAS and Bing are completely independent
>products. By contrast, Google and Google Scholar are closely interlinked.
>
>Regarding web indicators, I use the number of webpages under different levels
>of web addresses, for example the number of webpages on the web servers of
>your university:
>
>site:lsu.edu
>
>This syntax is valid for Google, Bing and even Google Scholar.
>
>Best regards,
>
>
>
>On 09/10/2014 15:36, Stephen J Bensman wrote:
> >
> > Isidro,
 > > Thanks for writing this book, Academic Search Engines: A Quantitative
 > > Outlook. I am having LSU Libraries buy a copy of it, so you have sold at
 > > least one. I hope that you have discussed the differences between how
 > > the Google and Microsoft search engines operate. I understand how
 > > PageRank operates, but I do not understand how Bing operates. All I know
 > > is that you obtain much better results with Google than with Microsoft,
 > > which seems to be quite new. I have tested them both.
> >
 > > For your information, Harzing has now interfaced her PoP program with
 > > Microsoft Academic as well as Google Scholar. Now you can really run
 > > comparative tests between Google and Microsoft. You seem to get better
 > > results with her PoP than with the Microsoft Academic site itself. At
 > > least her rankings are much better, although it is quite obvious from her
 > > program that Microsoft coverage is much weaker.
> >
 > > As a matter of curiosity, what metric did you use to measure the
 > > quantitative aspects? You cannot use standard bibliographic
 > > classifications such as number of books, journals, journal articles,
 > > working papers, etc., because I do not think that either Google or
 > > Microsoft can identify these. The Web has no authority structure
 > > whatever. You are not dealing with OCLC WorldCat. It must be something
 > > like megabytes of data or something like that.
> >
 > > We are finishing a paper on how Google Scholar operates. I'd like you
 > > to vet it when we have it ready.
> >
> > Respectfully,
> >
> > Stephen J Bensman, Ph.D.
> > LSU Libraries
 > > Louisiana State University
> > Baton Rouge, LA 70803
> > USA
> >
> >
> > -----Original Message-----
> > From: ASIS&T Special Interest Group on Metrics
> > [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
> > Sent: Wednesday, October 08, 2014 6:27 AM
 > > To: SIGMETRICS at LISTSERV.UTK.EDU
> > Subject: [SIGMETRICS] A new metrics-related book focused on academic
> > search engines
> >
> >
> > José Luis Ortega. Academic Search Engines: A Quantitative Outlook.
> > Elsevier, 2014. Chandos Information Professional Series ISBN
> > 1780634722, 9781780634722
> >
> >
 > > http://store.elsevier.com/Academic-Search-Engines/Jose-Luis-Ortega/isbn-9781843347910/
> >
> >
 > > Academic Search Engines intends to run through the current panorama of
 > > the academic search engines through a quantitative approach that analyses
 > > the reliability and consistency of these services. The objective is to
 > > describe the main characteristics of these engines, to highlight their
 > > advantages and drawbacks, and to discuss the implications of these new
 > > products for the future of scientific communication and their impact on
 > > research measurement and evaluation. In short, Academic Search
 > > Engines presents a summary view of the new challenges that the Web sets for
 > > scientific activity through the most novel and innovative searching
 > > services available on the Web.
> >
> > Key Features:
 > > · This is the first approach to analyze search engines exclusively
 > >   addressed to the research community in an integrative handbook.
 > > · This book is not merely a description of the web functionalities of
 > >   these services; it is a scientific review of the most outstanding
 > >   characteristics of each platform, discussing their significance with
 > >   recent investigations.
 > > · This book introduces an original methodology based on a quantitative
 > >   analysis of the covered data through the extensive use of crawlers and
 > >   harvesters which allow going in depth into how these engines are working.
> >
 > > José Luis Ortega (CCHS-CSIC) is a web researcher in the Spanish
 > > National Research Council (CSIC). He achieved a fellowship in the
 > > Cybermetrics Lab of the CSIC, where he finished his doctoral studies
 > > (2003-8). In 2005, he was employed by the Virtual Knowledge Studio of the
 > > Royal Netherlands Academy of Sciences and Arts, and in 2008 he took up a
 > > position as information scientist in the CSIC. He now continues his
 > > collaboration with the Cybermetrics Lab in research areas such as
 > > webometrics, web usage mining, visualization of information, academic
 > > search engines and social networks for scientists.
> >
>
>
>--
>
>************************************
>Isidro F. Aguillo, HonDr.
>The Cybermetrics Lab, IPP-CSIC
>Grupo Scimago
>Madrid. SPAIN
>
>isidro.aguillo at csic.es
>ORCID 0000-0001-8927-4873
>ResearcherID: A-7280-2008
>Scholar Citations SaCSbeoAAAAJ
>Twitter @isidroaguillo
>Rankings Web webometrics.info
>************************************
>
>
>
>