Aw: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines

Enrique Orduña riorma at GMAIL.COM
Fri Oct 10 04:42:33 EDT 2014


Dear colleagues,

Stephen, I do not believe that having Google Scholar as a single standard
is good; I would prefer some market competition.

The models provided by Google and Microsoft were completely different.
Google has won. But MAS provided better functionalities, robustness, etc.
The models the two companies followed in creating researcher profiles
may have been the key to the decline of MAS. In Google, people control their
profiles directly and reduce errors every day by correcting the
information themselves.

Otherwise I agree with the comments of David about the differences between
Google and Google Scholar. Perhaps some journal editors should understand
that academic journals are websites as well.

The relevance of the document set returned for a specific query is essential
to a successful academic search engine. Mixing information retrieval systems
and science metrics gives this interesting scene, in which the contribution of
Jose Luis is of much interest.
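
To make the ranking trade-off raised in this thread concrete, here is a
minimal sketch. Everything in it is an assumption made for illustration: the
scoring formula, the weights, and the two sample documents are invented, and
this is not the actual algorithm of Google Scholar or MAS. It only shows how
a ranking that weights citations heavily can put an old, highly cited but
off-topic paper ahead of a recent paper that matches the query.

```python
import math

def score(doc, query_terms, w_cite, w_match):
    # Hypothetical score: a weighted sum of log-scaled citations and the
    # fraction of query terms that appear in the title.
    words = set(doc["title"].lower().split())
    match = sum(t in words for t in query_terms) / len(query_terms)
    return w_cite * math.log1p(doc["citations"]) + w_match * match

def rank(docs, query, w_cite, w_match):
    # Sort documents from highest to lowest score for the given query.
    terms = query.lower().split()
    return sorted(docs, key=lambda d: score(d, terms, w_cite, w_match),
                  reverse=True)

# Invented sample documents (not real bibliographic records).
docs = [
    {"title": "Old classic paper on citation analysis",
     "year": 1955, "citations": 5000},
    {"title": "Coverage of academic search engines",
     "year": 2014, "citations": 12},
]

query = "academic search engines"
citation_heavy = rank(docs, query, w_cite=1.0, w_match=1.0)
match_heavy = rank(docs, query, w_cite=0.1, w_match=5.0)

print([d["year"] for d in citation_heavy])
print([d["year"] for d in match_heavy])
```

With the citation-heavy weights the 1955 paper comes first although none of
its title words match the query; with more weight on word matching the 2014
paper comes first.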

enrique



On Fri, Oct 10, 2014 at 10:26 AM, Jose Luis Ortega <jose_ortega at gmx.net>
wrote:

> Administrative info for SIGMETRICS (for example unsubscribe):
> http://web.utk.edu/~gwhitney/sigmetrics.html
>
> Dear Jeroen,
>
> Thank you for starting to read the book. I completely agree with you
> regarding the retrieval problems of GS, specifically the ranking algorithm.
> It is suitable for general web pages but not entirely for research
> documents. It gives excessive weight to citations and less to word matching,
> so the first documents are always old papers with a lot of
> citations but irrelevant to the query. This is an interesting point because
> we talk too much about research evaluation, citations, h-index, etc. in search
> engines but we forget the main utility of these services: retrieving
> information. And this facet, I think, reveals several important gaps in
> every academic search engine.
>
> On MAS updating, I consider that MAS is at a standstill because its last
> update was in 2012. This is a serious problem because its data are so
> old that it is impossible to stay informed about new scientific results.
>
> Regards
>
> José Luis Ortega
> Cybermetrics Lab
> *Sent:* Thursday, 09 October 2014 at 23:40
> *From:* "Bosman, J.M. (Jeroen)" <j.bosman at UU.NL>
> *To:* SIGMETRICS at LISTSERV.UTK.EDU
> *Subject:* Re: [SIGMETRICS] A new metrics-related book focused on
> academic search engines
> Stephen,
>
> Thanks for your insightful elaboration. The ideas stem from about 1935
> (Otlet), 1945 (Bush) and 1955 (Garfield), the implementation from the early
> sixties in SCI, further ideas in 1976 (Narin) and 1989 (Berners-Lee) and
> Google elaborated on that in 1996 with PageRank and a hybrid . So I doubt
> that the revolution takes just a decade. It already has taken some
> decades and will take some more decades, for the change is not restricted
> to discovery but includes distribution as well, just as with the printing
> press and scholarly journal. So probably the 'revolution' will only be
> complete when at some point in the future the academic book, journal and
> paper are replaced by instant production/publication/discovery, for
> instance in a smart nanopublications type of way? Also I think that for the
> system to collapse Google Scholar is not a conditio sine qua non. ArXiv
> (1991) and Citeseer (1998) are way older than GS and together they have
> revolutionized search and distribution more than GS has done, albeit in a
> much more restricted field of physics and information science.
>
> On a less theoretical note, you say that MAS has been proven wrong and
> Google Scholar may be right. But every other day I have to tell my
> students that in order to get relevant stuff they need to use GS pubyear
> filters, because if they don't they will end up using highly cited but
> outdated stuff. Over 95% of my students (>500 each year) had never realised
> this! By the way, I am not saying that MAS does a better job in this
> respect and I am a fan of Google Scholar.
>
> Best,
> Jeroen Bosman
> @jeroenbosman
>
> On 9 Oct 2014, at 22:27, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>
>
> Jeroen
>
> Here is a summary of what I think we are involved in with academic
> search engines:
>
>
>
> “Academic search engines are an extremely complex topic, since we are now
> engaged in an information revolution on the same scale as the invention of
> the printing press in the 15th century and the scientific journal in the
> 17th century, except what was accomplished took centuries then, and we
> will do it in a decade or so now.  One facet of this information revolution
> is that what was once semantically defined by words is now semantically
> defined by linkages.  On top of it, this information revolution is entwined
> with a scientific revolution on the power-law distributional structure of
> nature and society that was launched as a result of the development of the
> World Wide Web.”
>
>
>
> Given the complexity of this thing, we need some sort of standardization,
> so we can better deal with it.  There has to be some sort of agreement on
> what is right and what is wrong.  MAS seems to be based on a system—number
> of word tokens in given document—that was proven wrong and ineffective in
> semantically defining relevant document sets.  For me it is very hard to
> grasp that a Googlebot crawled out of a garage in Palo Alto in 2004, and
> suddenly an entire system began to collapse and be replaced by something
> else.  This took less than 10 years.  The Chinese have a curse about living
> in interesting times, and our times are sure interesting in this sense.
>
>
>
> Respectfully,
>
> Stephen J Bensman
> LSU Libraries
> Louisiana State University
> Baton Rouge, LA 70803
> USA
>
> *From:* ASIS&T Special Interest Group on Metrics [mailto:
> SIGMETRICS at LISTSERV.UTK.EDU] *On Behalf Of *Bosman, J.M. (Jeroen)
> *Sent:* Thursday, October 09, 2014 2:40 PM
> *To:* SIGMETRICS at LISTSERV.UTK.EDU
> *Subject:* Re: [SIGMETRICS] A new metrics-related book focused on
> academic search engines
>
> Isidro, Stephen, Enrique,
>
>
>
> Thanks. I already downloaded the book and started reading. However, I do
> not applaud the fact that MAS is coming to a standstill. I think it offers
> some very nice options and even unique things (AFAIK) such as the citation
> contexts. I also do not understand why it is necessary to have a single
> standard in order to be able to assess how the WWW revolutionizes the
> scholarly information system. Stephen, could you elaborate on why you think
> that is necessary? Could that assessment not include various parallel lines
> of development of these systems? And perhaps we already need an addendum to
> the book with today's news of the launch of Paperity.
>
>
>
> Best,
>
> Jeroen
>
>
>
>
>
>
>
> On 9 Oct 2014, at 18:23, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>
>  Enrique,
>
> Thank you for this information.  It simplifies matters.  At least MAS no
> longer needs to be taken into account, and we can focus on Google Scholar.
> If we are going to make assessments on how the WWW is revolutionizing the
> scientific/scholarly information system, we have to have a single standard,
> and that is Google.  The problems are complex enough without the need to
> compare competing systems.  Life was better and easier when the SCI was
> the single standard, just as it was when peer ratings were the only standard.
>
>
>
> SB.
>
>
>
>
>
>
>
> *From:* ASIS&T Special Interest Group on Metrics [mailto:
> SIGMETRICS at LISTSERV.UTK.EDU] *On Behalf Of *Enrique Orduña
> *Sent:* Thursday, October 09, 2014 9:47 AM
> *To:* SIGMETRICS at LISTSERV.UTK.EDU
> *Subject:* Re: [SIGMETRICS] A new metrics-related book focused on
> academic search engines
>
> Dear friends,
>
>
>
> Interesting issues all of them. And of course I already purchased a copy
> of Ortega's book :)
>
>
>
> As regards Microsoft Academic Search and the PoP software, we must take into
> account that MAS is completely outdated. Ortega notes this issue
> in his book. Moreover, the EC3 Research Group reported it in a
> working paper a few months ago. A more in-depth analysis has been performed
> and recently accepted for publication, in which we study this drop
> in coverage by discipline, university and journal.
>
>
>
> Therefore, MAS cannot be used now for quantitative purposes. Additionally,
> the MAS API does not work properly with queries that return hit count
> estimates surpassing 1,000 results. Finally, we can add the sometimes
> unknown legal considerations in the reuse of Bing results due to Microsoft
> copyright.
>
>
>
> Finally, some official voices from Microsoft announced that MAS results
> will be integrated into Bing results, in an ongoing process.
>
>
>
> As regards Google Scholar, as Isidro said, the "site" command may be used both
> in Google and Google Scholar. But be careful, because search commands are
> changing in Scholar. For example, the combination of "site" and "filetype"
> stopped working. In any case, the site command in Google and Bing sometimes
> gives us unexpected results in terms of coverage.
>
>
>
> Best,
>
>
>
> Enrique
>
>
>
> On Thu, Oct 9, 2014 at 4:32 PM, Stephen J Bensman <notsjb at lsu.edu> wrote:
>
> Isidro,
> Thanks for the information.  I am looking forward to hearing from Jose.
> He and I are already in close contact on these matters.  I definitely want
> you two to vet the paper we have done.  It should be ready soon.  I screwed
> up in posting it on arXiv, and it may take a while to correct my
> stupidity of submitting the damn thing multiple times, because I did not
> know what I was doing.
>
> You have already answered one of my questions.  The former Yahoo search
> engine was based upon AltaVista, which defined documentary sets by words.
> It was this system that Page tested and rejected as delivering incoherent,
> irrelevant sets.  Instead Page incorporated Garfield's theory of citation
> indexing, which defines relevant sets by linkages.  He strengthened this by
> also incorporating Narin's influential method.  Doing this delivered
> clearer, more relevant sets than AltaVista.  Multiple linkages are better at
> semantically defining sets than multiple token words.   If your book
> presents these facts, then I can strangle Microsoft Academic in its cradle,
> as Churchill once said of a certain political system that now seems to have
> come back into vogue.
>
> I hope to get the book and hear from Jose.
>
> Respectfully,
>
> Stephen J Bensman, Ph.D
> LSU Libraries
> Louisiana State University
> Baton Rouge, LA 70803
> USA
>
>
>
> -----Original Message-----
> From: ASIS&T Special Interest Group on Metrics [mailto:
> SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
>
> Sent: Thursday, October 09, 2014 9:07 AM
> To: SIGMETRICS at LISTSERV.UTK.EDU
> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic
> search engines
>
> Dear Stephen,
>
> Ooops!
>
> Sorry, I am not the author of the book. It was written by my collaborator
> and friend José Luis Ortega, also in this forum, so you can expect an
> answer from him soon.
>
> But I can give a few hints on some of your questions. Bing is using the
> technology of the former Yahoo search engine. I do not know exactly the way
> Bing works but my feeling is they are using visits as the main criterion.
> Probably there are far more variables involved, but the number of visits plays
> a similar role to links in Google's PageRank. Of course, it is
> possible that links are also taken into account.
>
> Microsoft Academic Search is a completely different animal. Really it is a
> traditional bibliographic database, but I must recognize that although they
> are using h-index, I am unable to understand the rankings they publish. To
> my knowledge, MAS and Bing are completely independent products. On the
> contrary, Google and Google Scholar are closely interlinked.
>
> Regarding web indicators, I use the number of webpages under different levels
> of web addresses, for example the number of webpages on the web servers of
> your university:
>
> site:lsu.edu
>
> This syntax is valid for Google, Bing and even Google Scholar.
>
> Best regards,
>
>
>
> On 09/10/2014 15:36, Stephen J Bensman wrote:
> > Isidro,
> > Thanks for writing this book-- Academic Search Engines: A Quantitative
> > Outlook.  I am having LSU Libraries buy a copy of it, so you have sold at
> > least one.  I hope that you have discussed the differences between how the
> > Google and Microsoft search engines operate.  I understand how PageRank
> > operates, but I do not understand how Bing operates.  All I know is that
> > you obtain much better results with Google than with Microsoft, which seems
> > to be quite new.  I have tested them both.
> >
> > For your information, Harzing has now interfaced her PoP program with
> > Microsoft Academic as well as Google Scholar.  Now you can really run
> > comparative tests between Google and Microsoft.  You seem to get better
> > results with her PoP than with the Microsoft Academic site itself.  At
> > least her rankings are much better, although it is quite obvious from her
> > program that Microsoft coverage is much weaker.
> >
> > As a matter of curiosity, what metric did you use to measure the
> > quantitative aspects?  You cannot use standard bibliographic
> > classifications such as number of books, journals, journal articles,
> > working papers, etc. etc., because I do not think that either Google or
> > Microsoft can identify these.  The Web has no authority structure
> > whatever.  You are not dealing with OCLC WorldCat.  It must be something
> > like megabytes of data or something like that.
> >
> > We are finishing a paper on how Google Scholar operates.  I'd like you
> > to vet it when we have it ready.
> >
> > Respectfully,
> >
> > Stephen J Bensman, Ph.D.
> > LSU Libraries
> > Louisiana State University
> > Baton Rouge, LA 70803
> > USA
> >
> >
> > -----Original Message-----
> > From: ASIS&T Special Interest Group on Metrics
> > [mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
> > Sent: Wednesday, October 08, 2014 6:27 AM
> > To: SIGMETRICS at LISTSERV.UTK.EDU
> > Subject: [SIGMETRICS] A new metrics-related book focused on academic
> > search engines
> >
> > José Luis Ortega. Academic Search Engines: A Quantitative Outlook.
> > Elsevier, 2014. Chandos Information Professional Series ISBN
> > 1780634722, 9781780634722
> >
> > http://store.elsevier.com/Academic-Search-Engines/Jose-Luis-Ortega/isbn-9781843347910/
> >
> >
> > Academic Search Engines intends to run through the current panorama of
> > academic search engines through a quantitative approach that analyses
> > the reliability and consistency of these services. The objective is to
> > describe the main characteristics of these engines, to highlight their
> > advantages and drawbacks, and to discuss the implications of these new
> > products for the future of scientific communication and their impact on
> > research measurement and evaluation. In short, Academic Search Engines
> > presents a summary view of the new challenges that the Web poses for
> > scientific activity through the most novel and innovative searching
> > services available on the Web.
> >
> > Key Features:
> > · This is the first approach to analyze search engines exclusively
> > addressed to the research community in an integrative handbook.
> > · This book is not merely a description of the web functionalities of
> > these services; it is a scientific review of the most outstanding
> > characteristics of each platform, discussing their significance in light
> > of recent investigations.
> > · This book introduces an original methodology based on a quantitative
> > analysis of the covered data through the extensive use of crawlers and
> > harvesters, which allow an in-depth look at how these engines work.
> >
> > José Luis Ortega (CCHS-CSIC) is a web researcher in the Spanish National
> > Research Council (CSIC). He achieved a fellowship in the Cybermetrics Lab
> > of the CSIC, where he finished his doctoral studies (2003-8). In 2005, he
> > was employed by the Virtual Knowledge Studio of the Royal Netherlands
> > Academy of Sciences and Arts, and in 2008 he took up a position as
> > information scientist in the CSIC. He now continues his collaboration with
> > the Cybermetrics Lab in research areas such as webometrics, web usage
> > mining, visualization of information, academic search engines and social
> > networks for scientists.
> >
>
>
> --
>
> ************************************
> Isidro F. Aguillo, HonDr.
> The Cybermetrics Lab, IPP-CSIC
> Grupo Scimago
> Madrid. SPAIN
>
> isidro.aguillo at csic.es
> ORCID 0000-0001-8927-4873
> ResearcherID: A-7280-2008
> Scholar Citations SaCSbeoAAAAJ
> Twitter @isidroaguillo
> Rankings Web webometrics.info
> ************************************