A new metrics-related book focused on academic search engines

David Wojick dwojick at CRAIGELLACHIE.US
Fri Oct 10 17:38:54 EDT 2014


Dear Stephen,

Your paper is 42 pages long. Can you point to the section where you explain 
the semantic nature of linking and citation? So far as I know neither 
relation is semantic.

David

David Wojick
http://insidepublicaccess.com/


At 11:26 AM 10/10/2014, you wrote:
>Administrative info for SIGMETRICS (for example unsubscribe): 
>http://web.utk.edu/~gwhitney/sigmetrics.html
>David and Jeroen,
>I explain the bases of how Google works semantically by links in my 
>following arXiv posting:
>
>Eugene Garfield, Francis Narin, and PageRank: The Theoretical Bases of the 
>Google Search Engine
>Authors: Stephen J. Bensman
>(Submitted on 13 Dec 2013)
>Abstract: This paper presents a test of the validity of using Google 
>Scholar to evaluate the publications of researchers by comparing the 
>premises on which its search engine, PageRank, is based, to those of 
>Garfield's theory of citation indexing. It finds that the premises are 
>identical and that PageRank and Garfield's theory of citation indexing 
>validate each other.
>Subjects:
>Information Retrieval (cs.IR); Digital Libraries (cs.DL); Physics and 
>Society (physics.soc-ph)
>Cite as:
>arXiv:1312.3872 [cs.IR] <http://arxiv.org/abs/1312.3872>
>
>(or arXiv:1312.3872v1 [cs.IR] <http://arxiv.org/abs/1312.3872v1> for this 
>version)
>
>You will see that Garfield’s theory of citation indexing is based upon the 
>premise that subject sets are better defined by links than by words.  This 
>is the same basis on which the Google search engine operates.
>
>Our new paper is entitled “POWER-LAW DISTRIBUTIONS, THE H-INDEX, AND 
>GOOGLE SCHOLAR (GS) CITATIONS: A TEST OF THEIR RELATIONSHIP WITH ECONOMICS 
>NOBELISTS,” and here is its abstract:
>“This paper comprises an analysis of whether Google Scholar (GS) can 
>construct documentary sets relevant for the evaluation of the works of 
>researchers.  The researchers analyzed were two samples of Nobelists in 
>economics: an original sample of five laureates downloaded in September, 
>2011; and a validating sample of laureates downloaded in October, 
>2013.  Two methods were utilized to conduct this analysis.  The first is 
>distributional.  Here it is shown that the distributions of the laureates’ 
>works by total GS citations belong within the Lotkaian or power-law 
>domain, whose major characteristic is an asymptote or “tail” to the 
>right.  It also proves that this asymptote is conterminous with the 
>laureates’ h-indexes, which demarcate their core œuvre.  This overlap is 
>proof of both the ability of GS to form relevant documentary sets and the 
>validity of the h-index.  The second method is semantic.  This method 
>shows that the extreme outliers at the right tip of the tail—a signature 
>feature of the economists’ distributions—are not random events but related 
>by subject to contributions to the discipline for which the laureates were 
>awarded this prize.  Another interesting finding is the important role 
>played by working papers in the dissemination of new economic knowledge.”
>This is what I mean by semantic—the works with the highest GS cites were 
>on topics and contributions for which the laureates were awarded the 
>prize.  Semantically that is dead on.  When this paper is finally posted 
>on arXiv, I would appreciate it if you would vet it before we submit it to 
>a journal with dictatorial referees.
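
[Editorial illustration] The abstract above treats the h-index as the boundary of a researcher's core works. A minimal sketch of that calculation follows, assuming only a plain list of per-work citation counts. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def h_index(citations):
    """Return the largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


def h_core(citations):
    """Return the citation counts of the works that fall inside the h-core."""
    ranked = sorted(citations, reverse=True)
    return ranked[:h_index(citations)]


# Hypothetical citation counts for one author's works.
sample_counts = [1, 10, 3, 5]
sample_h = h_index(sample_counts)
sample_core = h_core(sample_counts)
```

On a heavily skewed list of counts, the h-core picks out the few works in the right-hand tail, which is the overlap the abstract describes.
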
>Respectfully,
>
>Stephen J Bensman
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA 70803
>
>From: ASIS&T Special Interest Group on Metrics 
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Bosman, J.M. (Jeroen)
>Sent: Friday, October 10, 2014 9:57 AM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic 
>search engines
>
>Stephen,
>
>Maybe I should just have patience and wait for your paper. But by saying 
>that it "works semantically by links", do you mean that it takes citations 
>into account for its hybrid ranking?  That is a fact and something MAS does as 
>well. Or are you suggesting that GS also looks at links pointing to the 
>web pages of the articles? The latter would be new(s) for me.
>
>One of the differences between G and GS is btw that G has years ago 
>stopped interpreting each space as a Boolean AND, but GS still does, as 
>far as I can tell.
>
>Best regards,
>Jeroen
>
>
>On 10 Oct 2014, at 16:37, "Stephen J Bensman" 
><notsjb at LSU.EDU> wrote:
>Jeroen,
>This is a revolution with deep roots.  Garfield laid out the main premise 
>of the Google search engine in an article he published in Science in 1955 
>on citation indexing.  It is an accelerating revolution that now is 
>reaching warp speed.
>
>The main reason Google delivers more relevant sets than Microsoft is that 
>it semantically works by links and not words.  This enables it to take 
>advantage of the power-law linkage structure of the WWW to zero in on the 
>most important and relevant documents.
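
[Editorial illustration] To make the idea of ranking by links rather than by words concrete, the sketch below runs a textbook PageRank power iteration on a tiny link graph. The graph, the damping factor of 0.85, the iteration count and the function name are all assumptions chosen for illustration. This is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share (random-jump term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: most pages point at "hub".
toy_links = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
scores = pagerank(toy_links)
top_page = max(scores, key=scores.get)
```

No page text is consulted at any point. The ordering comes entirely from who links to whom, which is the contrast with word-token counting drawn in this thread.
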
>
>I wish to hell that arXiv would finally post our working paper, where we 
>prove all this with economics Nobelists.  Then I can vet our theories.
>
>Respectfully,
>
>Stephen J Bensman, Ph.D
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA 70803
>
>PS I am a historian by training, and there is nothing that is outdated for 
>me.  Older, highly cited stuff is of the greatest interest, for we may be 
>looking at the influence of time and the degree of incorporation.
>
>From: ASIS&T Special Interest Group on Metrics 
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU] 
>On Behalf Of Bosman, J.M. (Jeroen)
>Sent: Thursday, October 09, 2014 4:41 PM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic 
>search engines
>
>Stephen,
>
>Thanks for your insightful elaboration. The ideas stem from about 1935 
>(Otlet), 1945 (Bush) and 1955 (Garfield), the implementation from the 
>early sixties in SCI, further ideas in 1976 (Narin) and 1989 (Berners-Lee) 
>and Google elaborated on that in 1996 with PageRank and a hybrid . So I 
>doubt that the revolution takes just a decade. It already has taken some 
>decades and will take some more decades, for the change is not restricted 
>to discovery but includes distribution as well, just as with the printing 
>press and scholarly journal. So probably the 'revolution' will only be 
>complete when at some point in the future the academic book, journal and 
>paper are replaced by instant production/publication/discovery, for 
>instance in a smart nanopublications type of way? Also I think that for 
>the system to collapse Google Scholar is not a conditio sine qua non. 
>ArXiv (1991) and Citeseer (1998) are way older than GS and together they 
>have revolutionized search and distribution more than GS has done, albeit 
>in a much more restricted field of physics and information science.
>
>On a less theoretical note, you say that MAS has been proven wrong and 
>Google Scholar may be right. But every other day I have to tell my 
>students that in order to get relevant stuff they need to use GS pubyear 
>filters, because if they don't they will end up using highly cited but 
>outdated stuff. Over 95% of my students (>500 each year) had never 
>realised this! By the way, I am not saying that MAS does a better job in 
>this respect and I am a fan of Google Scholar.
>
>Best,
>Jeroen Bosman
>@jeroenbosman
>
>On 9 Oct 2014, at 22:27, "Stephen J Bensman" 
><notsjb at LSU.EDU> wrote:
>Jeroen
>Here is a summary of what I think we are involved in with academic 
>search engines:
>
>“Academic search engines are an extremely complex topic, since we are now 
>engaged in an information revolution on the same scale as the invention of 
>the printing press in the 15th century and the scientific journal in the 
>17th century, except what was accomplished took centuries then, and we 
>will do it in a decade or so now.  One facet of this information 
>revolution is that what was once semantically defined by words is now 
>semantically defined by linkages.  On top of it, this information 
>revolution is entwined with a scientific revolution on the power-law 
>distributional structure of nature and society that was launched as a 
>result of the development of the World Wide Web.”
>
>Given the complexity of this thing, we need some sort of standardization, 
>so we can better deal with it.  There has to be some sort of agreement on 
>what is right and what is wrong.  MAS seems to be based on a system (number 
>of word tokens in a given document) that was proven wrong and ineffective in 
>semantically defining relevant document sets.  For me it is very hard to 
>grasp that a Googlebot crawled out of a garage in Palo Alto in 2004, and 
>suddenly an entire system began to collapse and be replaced by something 
>else.  This took less than 10 years.  The Chinese have a curse about 
>living in interesting times, and our times are sure interesting in this sense.
>
>Respectfully,
>
>Stephen J Bensman
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA 70803
>USA
>
>
>
>
>From: ASIS&T Special Interest Group on Metrics 
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU] 
>On Behalf Of Bosman, J.M. (Jeroen)
>Sent: Thursday, October 09, 2014 2:40 PM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic 
>search engines
>
>Isidro, Stephen, Enrique,
>
>Thanks. I already downloaded the book and started reading. However, I do 
>not applaud the fact that MAS is coming to a standstill. I think it offers 
>some very nice options and even unique things (AFAIK) such as the citation 
>contexts. I also do not understand why it is necessary to have a single 
>standard in order to be able to assess how the WWW revolutionizes the 
>scholarly information system. Stephen, could you elaborate on why you 
>think that is necessary? Could that assessment not include various 
>parallel lines of development of these systems? And perhaps we already 
>need an addendum to the book with today's news of the launch of Paperity.
>
>Best,
>Jeroen
>
>On 9 Oct 2014, at 18:23, "Stephen J Bensman" 
><notsjb at LSU.EDU> wrote:
>Enrique,
>Thank you for this information.  It simplifies matters.  At least MAS no 
>longer needs to be taken into account, and we can focus on Google 
>Scholar.  If we are going to make assessments on how the WWW is 
>revolutionizing the scientific/scholarly information system, we have to 
>have a single standard, and that is Google.  The problems are complex 
>enough without the need to compare competitive systems.  Life was better 
>and easier when the SCI was the single standard just as it was when peer 
>ratings were the only standard.
>
>SB.
>
>
>
>From: ASIS&T Special Interest Group on Metrics 
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU] 
>On Behalf Of Enrique Orduña
>Sent: Thursday, October 09, 2014 9:47 AM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic 
>search engines
>
>Dear friends,
>
>Interesting issues all of them. And of course I already purchased a copy 
>of Ortega's book :)
>
>As regards Microsoft Academic Search and PoP software, we must take into 
>account that MAS is completely outdated. This issue is detected by Ortega 
>in his book. Moreover, it was reported by the EC3 Research Group in a 
>working paper a few months ago. A more in-depth analysis has been performed, 
>which has recently been accepted for publication, where we study this drop 
>in coverage according to disciplines, universities and journals.
>
>Therefore, MAS cannot be used now for quantitative purposes. Additionally, 
>the MAS API does not work properly with queries that return hit count 
>estimates surpassing 1,000 results. We can also add the sometimes 
>unknown legal considerations in the reuse of Bing results due to Microsoft 
>copyright.
>
>Finally, some official voices from Microsoft announced that MAS results 
>will be integrated into Bing results, in an ongoing process.
>
>As regards Google Scholar, as Isidro said, the "site" command may be used 
>both in Google and Google Scholar. But be careful, because search commands 
>are changing in Scholar. For example, the combination of "site" and 
>"filetype" stopped working. In any case, the site command in Google and 
>Bing sometimes gives us unexpected results in terms of coverage.
>
>Best,
>
>Enrique
>
>On Thu, Oct 9, 2014 at 4:32 PM, Stephen J Bensman 
><notsjb at lsu.edu> wrote:
>Isidro,
>Thanks for the information.  I am looking forward to hearing from 
>Jose.  He and I are already in close contact on these matters.  I 
>definitely want you two to vet the paper we have done.  It should be ready 
>soon.  I screwed up in posting it on arXiv, and it may take a while to 
>correct my stupidity of submitting the damn thing multiple times, because 
>I did not know what I was doing.
>
>You have already answered one of my questions.  The former Yahoo search 
>engine was based upon AltaVista, which defined documentary sets by 
>words.  It was this system that Page tested and rejected as delivering 
>incoherent, irrelevant sets.  Instead Page incorporated Garfield's theory 
>of citation indexing, which defines relevant sets by linkages.  He 
>strengthened this by also incorporating Narin's influential method.  Doing 
>this delivered clearer, more relevant sets than AltaVista.  Multiple 
>linkages are better at semantically defining sets than multiple token 
>words.   If your book presents these facts, then I can strangle Microsoft 
>Academic in its cradle, as Churchill once said of a certain political 
>system that now seems to have come back into vogue.
>
>I hope to get the book and hear from Jose.
>
>Respectfully,
>
>Stephen J Bensman, Ph.D
>LSU Libraries
>Louisiana State University
>Baton Rouge, LA 70803
>USA
>
>
>
>-----Original Message-----
>From: ASIS&T Special Interest Group on Metrics 
>[mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
>Sent: Thursday, October 09, 2014 9:07 AM
>To: SIGMETRICS at LISTSERV.UTK.EDU
>Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic 
>search engines
>
>Dear Stephen,
>
>Ooops!
>
>Sorry, I am not the author of the book. It was written by my collaborator 
>and friend José Luis Ortega, also in this forum, so you can expect an 
>answer from him soon.
>
>But I can give a few hints to some of your questions. Bing is using the 
>technology of the former Yahoo search engine. I do not know exactly the 
>way Bing works, but my feeling is that they are using visits as the main 
>criterion. Probably there are far more variables involved, but the number 
>of visits plays a similar role to links in Google's PageRank. Of course, 
>it is also possible that links are taken into account.
>
>Microsoft Academic Search is a completely different animal. Really it is a 
>traditional bibliographic database, but I must recognize that although 
>they are using h-index, I am unable to understand the rankings they 
>publish. To my knowledge, MAS and Bing are completely independent 
>products. On the contrary, Google and Google Scholar are closely interlinked.
>
>Regarding web indicators I use number of webpages under different levels 
>of web addresses, like for example number of webpages in the webservers of 
>your university
>
>site:lsu.edu
>
>This syntax is valid for Google, Bing and even Google Scholar.
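
[Editorial illustration] A small sketch of how such a site: query could be turned into a search URL for each engine. The base URLs below are the engines' public search pages as commonly seen in a browser address bar, and the helper name and dictionary are hypothetical. Whether a given engine still honours the operator, or permits automated requests, is not something this sketch can confirm.

```python
from urllib.parse import urlencode

# Assumed public search endpoints; these are not an official API.
SEARCH_ENDPOINTS = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "scholar": "https://scholar.google.com/scholar",
}


def site_query_url(engine, domain):
    """Build a search URL that restricts results to one web domain."""
    query = urlencode({"q": f"site:{domain}"})
    return f"{SEARCH_ENDPOINTS[engine]}?{query}"


# Example: the query from the message above, for all three engines.
lsu_urls = {name: site_query_url(name, "lsu.edu") for name in SEARCH_ENDPOINTS}
```

The hit count reported on each results page is the raw web indicator. As noted elsewhere in this thread, such counts are estimates and can behave unexpectedly.
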
>
>Best regards,
>
>
>
>On 09/10/2014 15:36, Stephen J Bensman wrote:
> > Isidro,
 > > Thanks for writing this book-- Academic Search Engines: A Quantitative
 > > Outlook.  I am having LSU Libraries buy a copy of it, so you have sold at
 > > least one.  I hope that you have discussed the differences between how
 > > the Google and Microsoft search engines operate.  I understand how
 > > PageRank operates, but I do not understand how Bing operates.  All I know
 > > is that you obtain much better results with Google than with Microsoft,
 > > which seems to be quite new.  I have tested them both.
 > >
 > > For your information, Harzing has now interfaced her PoP program with
 > > Microsoft Academic as well as Google Scholar.  Now you can really run
 > > comparative tests between Google and Microsoft.  You seem to get better
 > > results with her PoP than with the Microsoft Academic site itself.  At
 > > least her rankings are much better, although it is quite obvious from her
 > > program that Microsoft coverage is much weaker.
 > >
 > > As a matter of curiosity, what metric did you use to measure the
 > > quantitative aspects?  You cannot use standard bibliographic
 > > classifications such as number of books, journals, journal articles,
 > > working papers, etc. etc., because I do not think that either Google or
 > > Microsoft can identify these.  The Web has no authority structure
 > > whatever.  You are not dealing with OCLC WorldCat.  It must be something
 > > like megabytes of data or something like that.
 > >
 > > We are finishing a paper on how Google Scholar operates.  I'd like you
 > > to vet it when we have it ready.
> >
> > Respectfully,
> >
> > Stephen J Bensman, Ph.D.
> > LSU Libraries
 > > Louisiana State University
> > Baton Rouge, LA 70803
> > USA
> >
> >
> > -----Original Message-----
> > From: ASIS&T Special Interest Group on Metrics
> > [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
> > Sent: Wednesday, October 08, 2014 6:27 AM
 > > To: SIGMETRICS at LISTSERV.UTK.EDU
> > Subject: [SIGMETRICS] A new metrics-related book focused on academic
> > search engines
> >
> > José Luis Ortega. Academic Search Engines: A Quantitative Outlook.
> > Elsevier, 2014. Chandos Information Professional Series ISBN
> > 1780634722, 9781780634722
> >
 > > http://store.elsevier.com/Academic-Search-Engines/Jose-Luis-Ortega/isbn-9781843347910/
> >
> >
 > > Academic Search Engines intends to run through the current panorama of
 > > the academic search engines through a quantitative approach that analyses
 > > the reliability and consistency of these services. The objective is to
 > > describe the main characteristics of these engines, to highlight their
 > > advantages and drawbacks, and to discuss the implications of these new
 > > products for the future of scientific communication and their impact on
 > > research measurement and evaluation. In short, Academic Search
 > > Engines presents a summary view of the new challenges that the Web poses
 > > to scientific activity through the most novel and innovative searching
 > > services available on the Web.
 > >
 > > Key Features:
 > > · This is the first approach to analyze search engines exclusively
 > > addressed to the research community in an integrative handbook.
 > > · This book is not merely a description of the web functionalities of
 > > these services; it is a scientific review of the most outstanding
 > > characteristics of each platform, discussing their significance with
 > > recent investigations.
 > > · This book introduces an original methodology based on a quantitative
 > > analysis of the covered data through the extensive use of crawlers and
 > > harvesters, which allow going in depth into how these engines are working.
 > >
 > > José Luis Ortega (CCHS-CSIC) is a web researcher in the Spanish
 > > National Research Council (CSIC). He obtained a fellowship in the
 > > Cybermetrics Lab of the CSIC, where he finished his doctoral studies
 > > (2003-8). In 2005, he was employed by the Virtual Knowledge Studio of the
 > > Royal Netherlands Academy of Arts and Sciences, and in 2008 he took up a
 > > position as information scientist in the CSIC. He now continues his
 > > collaboration with the Cybermetrics Lab in research areas such as
 > > webometrics, web usage mining, visualization of information, academic
 > > search engines and social networks for scientists.
> >
>
>
>--
>
>************************************
>Isidro F. Aguillo, HonDr.
>The Cybermetrics Lab, IPP-CSIC
>Grupo Scimago
>Madrid. SPAIN
>
>isidro.aguillo at csic.es
>ORCID 0000-0001-8927-4873
>ResearcherID: A-7280-2008
>Scholar Citations SaCSbeoAAAAJ
>Twitter @isidroaguillo
>Rankings Web webometrics.info
>************************************
>
>