A new metrics-related book focused on academic search engines

David Wojick dwojick at CRAIGELLACHIE.US
Sat Oct 11 16:44:12 EDT 2014


Stephen,

You can count citations per se and I have no objection to that metric. However, each citation has logical meaning and that too is an interesting field, my field.
David

On Oct 11, 2014, at 3:38 PM, Stephen J Bensman <notsjb at LSU.EDU> wrote:

> Administrative info for SIGMETRICS (for example unsubscribe): http://web.utk.edu/~gwhitney/sigmetrics.html
> David,
> 
> There is a whole school of behavioral theory that rejects the validity of citations due to lack of knowledge of the motivation.  This is nonsense.  We should stick with Garfield--a citation or hyperlink indicates an association of ideas.  The rest is statistics and probability.  You can call it logic or semantics or whatever you want--but the links define sets relevant to the query.  That is what we are after.  I do not know why it works but it does.  Page justified it by stating that the higher the number of citations or links, the more it indicates consensus of human judgment.  Therefore, it is a measure of what is in the human mind.  We found that a large number of GS cites was consistent with the judgment of the Nobel committees that selected these guys.  Page was right.  It works.  Case closed.  What was surprising was Krugman.  He is best known as a New York Times op-ed writer, but his highly cited items were academic works and one working paper on his work on economic geography.  That is a fine distinction because the conservatives hate him and are always lambasting him in the press.  There was a theory that the Europeans gave him the prize because he hated George Bush.  No, GS indicates that his prize was for his academic work and not his political fulminations.
> 
>  
> 
> I wish that arXiv would just classify the damn thing any way it wants.  I classified it as "Computer Science--Information Retrieval" but there is more to it than just that.  A lot of probability analysis.
> 
>  
> 
> SB 
> 
> From: ASIS&T Special Interest Group on Metrics <SIGMETRICS at LISTSERV.UTK.EDU> on behalf of David Wojick <dwojick at CRAIGELLACHIE.US>
> Sent: Saturday, October 11, 2014 1:22 PM
> To: SIGMETRICS at LISTSERV.UTK.EDU
> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines
>  
> Just to elaborate (because I have done a lot of work on the logic of citation) consider the simple case where a paper uses a single number and cites another paper as the source of that number. The logic of the citation is "I got this number here" or perhaps "I got this number here and I accept their results" or some such. One of the deep problems with citation is that the logic of the citation is often quite vague. That is, just what a citation is saying is not always clear. But in no case is this citation relation semantic in nature. It is part of the reasoning presented in the citing paper, which makes it subject to logical analysis, not just semantic analysis.
> 
> I hope this helps. The logic of citation is an interesting field.
> 
> David
> 
> David Wojick
> http://insidepublicaccess.com/
> 
> At 09:50 AM 10/11/2014, you wrote:
>> Stephen,
>> 
>> It looks like there is some of the usual confusion here but they do say this:
>>  "In semantic search the idea is to search for what you really mean by that phrase and find words and concepts that are associated with your phrase. For instance, when you search for a phrase containing "java," are you talking about coffee, an island, or a programming language?"
>> http://google.about.com/od/s/g/semantic_search.htm
>> 
>> Finding other words or phrases is indeed a semantic effort. A thesaurus is good here. So is term vector similarity, for that matter, because it looks at all the words in the document. There is a lot of semantics in search technology. But the nature of the relations presented in links and citations is logical, not semantic.
>> 
>> David
>> 
>> On Oct 11, 2014, at 9:24 AM, Stephen J Bensman <notsjb at LSU.EDU> wrote:
>> 
>>> 
>>> David,
>>> 
>>> You are probably right in your analysis below, but the term I keep running across particularly in respect to Google is "semantic."  I am posting the URL for an example below:
>>> 
>>>  
>>> 
>>> http://davidamerland.com/google-semantic-search.html
>>> 
>>>  
>>> 
>>> Google is trying to make its program more "semantically" capable.  The basic premise is that citations/hyperlinks link similar ideas and therefore construct relevant subject sets.
>>> 
>>>  
>>> 
>>> The contribution of Francis Narin is discussed on pp. 16-18 of that article.  Here it is shown that cites from documents with many inlinks themselves create sets that accord more closely with human judgment.  Page built this concept into Google.   Garfield solved it by restricting coverage only to the most highly cited journals.  All these people are helped by the fact that citations/hyperlinks follow power-law distributions, and Google consciously takes this into account, whereas others do not.  Kleinberg points this out.  Google does a good job in creating order out of the chaos of the WWW, where there is no authority structure to guide you.  It is really a wonder.
>>> 
>>>  
>>> 
>>> What I am particularly interested to learn from Jose is how Microsoft operates.  It is a failure.  If I can better understand its operation, I can better understand why Google works so well.
>>> 
>>>  
>>> 
>>> Respectfully,
>>> 
>>> Steve B.
>>> 
>>> 
>>> From: ASIS&T Special Interest Group on Metrics < SIGMETRICS at LISTSERV.UTK.EDU> on behalf of David Wojick <dwojick at CRAIGELLACHIE.US >
>>> Sent: Saturday, October 11, 2014 6:33 AM
>>> To: SIGMETRICS at LISTSERV.UTK.EDU
>>> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines 
>>>  
>>> Stephen,
>>> 
>>> Ideas are expressed as propositions, not individual words. The science of the relations between propositions is logic, not semantics. For example, many years ago I discovered a basic way in which the sentences in a document, or a group of documents on a given topic, are related. I called it the issue tree. This structure is a logical form, not semantic.
>>> 
>>> For example, one sentence may offer evidence for a claim made by another sentence. Or it may provide an example (as this sentence does) or an explanation, etc. These are not semantic relations. The same is true for citations and other referential links. The meaning of the relation is not like the meaning of a word, rather it is a relation between whole thoughts.
>>> 
>>> In fact a lot of what is called the semantic web is not semantic, rather it is propositional, hence a matter of logic. There is much confusion about this.
>>> 
>>> David
>>> 
>>> On Oct 11, 2014, at 6:38 AM, Stephen J Bensman <notsjb at LSU.EDU> wrote:
>>> 
>>>> 
>>>> David,
>>>> 
>>>> It is in the first paragraph, where I discuss Garfield's concept of citation indexing.  I quote:
>>>> 
>>>> "Eugene Garfield is the creator of citation indexing. In his landmark book on the subject Garfield (1983) gave the following conceptual definition of citation indexing: 
>>>> 
>>>> The concept of citation indexing is simple…. Citations are the formal, explicit linkages between papers that have particular points in common. A citation index is built around these linkages. It lists publications that have been cited and identifies the sources of the citations. Anyone conducting a literature search can find from one to dozens of additional papers on a subject just by knowing one that has been cited. And every paper that is found provides a list of new citations with which to continue the search. (p. 1)
>>>> 
>>>> In an article entitled "Citation Indexes for Science" published in the journal Science Garfield (1955) set forth the basic reasons for developing a citation index. Later in life Garfield (1987a) deemed this article "my most important paper" (p. 16). In his Science article Garfield (1955) stated that a primary advantage of a citation index over conventional alphabetical and subject indexes was that its different construction allowed it to bring together material that would never be collated by the usual subject indexing. Garfield here described a citation index as "an association-of-ideas index" (p. 108) that allowed the reader as much leeway as he needed. In his opinion, conventional indexes were inadequate, because scientists were often concerned with a particular idea rather than a complete concept, and the basic problem was to build subject indexes that can anticipate the infinite number of possible approaches that scientists may require in order to bridge the gap between the subject approach of those who create the documents and the subject approach of those who seek the information. Garfield stated that the utility of a citation index had to be considered from the viewpoint of the transmission of ideas. Thus, Garfield  justified citation indexing as better able to deliver a set of relevant documents in response to a scientist’s search query."
>>>> 
>>>>  
>>>> 
>>>> Thus, citations and hyperlinks connect ideas to form relevant document sets.  Semantics is the science of meaning, and, if this is not semantics, then what is?  We found that the economists' papers highest in GS cites were precisely the ones for which they were awarded the prize.  In other words, GS had defined the economists perfectly by subject.
>>>> 
>>>>  
>>>> 
>>>> Respectfully,
>>>> 
>>>> SB
>>>> 
>>>> PS arXiv still has our article on hold.  Ironically they think that it should possibly have a different classification.  Hoisted on my own petard.  What a joke.
>>>> 
>>>>  
>>>> 
>>>>  
>>>> From: ASIS&T Special Interest Group on Metrics < SIGMETRICS at LISTSERV.UTK.EDU> on behalf of David Wojick <dwojick at CRAIGELLACHIE.US >
>>>> Sent: Friday, October 10, 2014 4:38 PM
>>>> To: SIGMETRICS at LISTSERV.UTK.EDU
>>>> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines 
>>>>  
>>>> Dear Stephen,
>>>> 
>>>> Your paper is 42 pages long. Can you point to the section where you explain the semantic nature of linking and citation? So far as I know neither relation is semantic.
>>>> 
>>>> David
>>>> 
>>>> David Wojick
>>>> http://insidepublicaccess.com/
>>>> 
>>>> 
>>>> At 11:26 AM 10/10/2014, you wrote:
>>>>> David and Jeroen,
>>>>> I explain the bases of how Google works semantically by links in my following arXiv posting:
>>>>>  
>>>>> Eugene Garfield, Francis Narin, and PageRank: The Theoretical Bases of the Google Search Engine
>>>>> Authors: Stephen J. Bensman
>>>>> (Submitted on 13 Dec 2013)
>>>>> Abstract: This paper presents a test of the validity of using Google Scholar to evaluate the publications of researchers by comparing the premises on which its search engine, PageRank, is based, to those of Garfield's theory of citation indexing. It finds that the premises are identical and that PageRank and Garfield's theory of citation indexing validate each other. 
>>>>> Subjects: 
>>>>> Information Retrieval (cs.IR); Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
>>>>> Cite as: 
>>>>> arXiv:1312.3872 [cs.IR]
>>>>>  
>>>>> (or arXiv:1312.3872v1 [cs.IR] for this version)
>>>>>  
>>>>> You will see that Garfield’s theory of citation indexing is based upon the premise that subject sets are better defined by links than by words.  This is the same basis on which the Google search engine operates.  
>>>>>  
>>>>> Our new paper is entitled “POWER-LAW DISTRIBUTIONS, THE H-INDEX, AND GOOGLE SCHOLAR (GS) CITATIONS: A TEST OF THEIR RELATIONSHIP WITH ECONOMICS NOBELISTS,” and here is its abstract:
>>>>> “This paper comprises an analysis of whether Google Scholar (GS) can construct documentary sets relevant for the evaluation of the works of researchers.  The researchers analyzed were two samples of Nobelists in economics: an original sample of five laureates downloaded in September, 2011; and a validating sample of laureates downloaded in October, 2013.  Two methods were utilized to conduct this analysis.  The first is distributional.  Here it is shown that the distributions of the laureates’ works by total GS citations belong within the Lotkaian or power-law domain, whose major characteristic is an asymptote or “tail” to the right.  It also proves that this asymptote is conterminous with the laureates’ h-indexes, which demarcate their core œuvre.  This overlap is proof of both the ability of GS to form relevant documentary sets and the validity of the h-index.  The second method is semantic.  This method shows that the extreme outliers at the right tip of the tail--a signature feature of the economists’ distributions--are not random events but related by subject to contributions to the discipline for which the laureates were awarded this prize.  Another interesting finding is the important role played by working papers in the dissemination of new economic knowledge.”
>>>>> This is what I mean by semantic--the works with the highest GS cites were on topics and contributions for which the laureates were awarded the prize.  Semantically that is dead on.  When this paper is finally posted on arXiv, I would appreciate it if you would vet it before we submit to a journal with dictatorial referees.
>>>>> Respectfully,
>>>>>  
>>>>> Stephen J Bensman
>>>>> LSU Libraries
>>>>> Louisiana State University
>>>>> Baton Rouge, LA 70803
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>> From: ASIS&T Special Interest Group on Metrics [ mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Bosman, J.M. (Jeroen)
>>>>> Sent: Friday, October 10, 2014 9:57 AM
>>>>> To: SIGMETRICS at LISTSERV.UTK.EDU
>>>>> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines
>>>>>  
>>>>> Stephen,
>>>>>  
>>>>> Maybe I should just have patience and wait for your paper. But do you mean by "works semantically by links" that it takes citations into account for its hybrid ranking?  That is a fact and something MAS does as well. Or are you suggesting that GS also looks at links pointing to the web pages of the articles? The latter would be new(s) for me.
>>>>>  
>>>>> One of the differences between G and GS is btw that G has years ago stopped interpreting each space as a Boolean AND, but GS still does, as far as I can tell.
>>>>>  
>>>>> Best regards,
>>>>> Jeroen
>>>>> 
>>>>> 
>>>>> On 10 Oct 2014, at 16:37, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>>>>> Jeroen,
>>>>> This is a revolution with deep roots.  Garfield laid out the main premise of the Google search engine in an article he published in Science in 1955 on citation indexing.  It is an accelerating revolution that now is reaching warp speed.
>>>>>  
>>>>> The main reason Google delivers more relevant sets than Microsoft is that it semantically works by links and not words.  This enables it to take advantage of the power-law linkage structure of the WWW to zero in on the most important and relevant documents.
>>>>>  
>>>>> I wish to hell that arXiv would finally post our working paper, where we prove all this with economics Nobelists.  Then I can vet our theories.
>>>>>  
>>>>> Respectfully,
>>>>>  
>>>>> Stephen J Bensman, Ph.D
>>>>> LSU Libraries
>>>>> Louisiana State University
>>>>> Baton Rouge, LA 70803
>>>>>  
>>>>> PS I am a historian by training, and there is nothing that is outdated for me.  Older, highly cited stuff is of the greatest interest, for we may be looking at the influence of time and the degree of incorporation.
>>>>>  
>>>>> From: ASIS&T Special Interest Group on Metrics [ mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Bosman, J.M. (Jeroen)
>>>>> Sent: Thursday, October 09, 2014 4:41 PM
>>>>> To: SIGMETRICS at LISTSERV.UTK.EDU
>>>>> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines
>>>>>  
>>>>> Stephen,
>>>>>  
>>>>> Thanks for your insightful elaboration. The ideas stem from about 1935 (Otlet), 1945 (Bush) and 1955 (Garfield), the implementation from the early sixties in SCI, further ideas in 1976 (Narin) and 1989 (Berners-Lee) and Google elaborated on that in 1996 with PageRank and a hybrid. So I doubt that the revolution takes just a decade. It already has taken some decades and will take some more decades, for the change is not restricted to discovery but includes distribution as well, just as with the printing press and scholarly journal. So probably the 'revolution' will only be complete when at some point in the future the academic book, journal and paper are replaced by instant production/publication/discovery, for instance in a smart nanopublications type of way? Also I think that for the system to collapse Google Scholar is not a conditio sine qua non. ArXiv (1991) and Citeseer (1998) are way older than GS and together they have revolutionized search and distribution more than GS has done, albeit in a much more restricted field of physics and information science.
>>>>> 
>>>>> On a less theoretical note, you say that MAS has been proven wrong and Google Scholar may be right. But every other day I have to tell my students that in order to get relevant stuff they need to use GS pubyear filters, because if they don't they will end up using highly cited but outdated stuff. Over 95% of my students (>500 each year) had never realised this! By the way, I am not saying that MAS does a better job in this respect and I am a fan of Google Scholar.
>>>>>  
>>>>> Best,
>>>>> Jeroen Bosman
>>>>> @jeroenbosman
>>>>> 
>>>>> On 9 Oct 2014, at 22:27, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>>>>> Jeroen
>>>>> Here is a summary of what I think we are involved in with academic search engines:
>>>>>  
>>>>> “Academic search engines are an extremely complex topic, since we are now engaged in an information revolution on the same scale as the invention of the printing press in the 15th century and the scientific journal in the 17th century, except what was accomplished took centuries then, and we will do it in a decade or so now.  One facet of this information revolution is that what was once semantically defined by words is now semantically defined by linkages.  On top of it, this information revolution is entwined with a scientific revolution on the power-law distributional structure of nature and society that was launched as a result of the development of the World Wide Web.”
>>>>>  
>>>>> Given the complexity of this thing, we need some sort of standardization, so we can better deal with it.  There has to be some sort of agreement on what is right and what is wrong.  MAS seems to be based on a system--number of word tokens in a given document--that was proven wrong and ineffective in semantically defining relevant document sets.  For me it is very hard to grasp that a Googlebot crawled out of a garage in Palo Alto in 2004, and suddenly an entire system began to collapse and be replaced by something else.  This took less than 10 years.  The Chinese have a curse about living in interesting times, and our times are sure interesting in this sense.
>>>>>  
>>>>> Respectfully,
>>>>>  
>>>>> Stephen J Bensman
>>>>> LSU Libraries
>>>>> Louisiana State University
>>>>> Baton Rouge, LA 70803
>>>>> USA
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>> From: ASIS&T Special Interest Group on Metrics [ mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Bosman, J.M. (Jeroen)
>>>>> Sent: Thursday, October 09, 2014 2:40 PM
>>>>> To: SIGMETRICS at LISTSERV.UTK.EDU
>>>>> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines
>>>>>  
>>>>> Isidro, Stephen, Enrique,
>>>>>  
>>>>> Thanks. I already downloaded the book and started reading. However, I do not applaud the fact that MAS is coming to a standstill. I think it offers some very nice options and even unique things (AFAIK) such as the citation contexts. I also do not understand why it is necessary to have a single standard in order to be able to assess how the WWW revolutionizes the scholarly information system. Stephen, could you elaborate on why you think that is necessary? Could that assessment not include various parallel lines of development of these systems? And perhaps we already need an addendum to the book with today's news of the launch of Paperity.
>>>>>  
>>>>> Best,
>>>>> Jeroen
>>>>>  
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 9 Oct 2014, at 18:23, "Stephen J Bensman" <notsjb at LSU.EDU> wrote:
>>>>> Enrique,
>>>>> Thank you for this information.  It simplifies matters.  At least MAS no longer needs to be taken into account, and we can focus on Google Scholar.  If we are going to make assessments on how the WWW is revolutionizing the scientific/scholarly information system, we have to have a single standard, and that is Google.  The problems are complex enough without the need to compare competitive systems.  Life was better and easier when the SCI was the single standard, just as it was when peer ratings were the only standard.
>>>>>  
>>>>> SB.
>>>>>  
>>>>>  
>>>>>  
>>>>> From: ASIS&T Special Interest Group on Metrics [ mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Enrique Orduña
>>>>> Sent: Thursday, October 09, 2014 9:47 AM
>>>>> To: SIGMETRICS at LISTSERV.UTK.EDU
>>>>> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines
>>>>>  
>>>>> Dear friends,
>>>>>  
>>>>> Interesting issues all of them. And of course I already purchased a copy of Ortega's book :)
>>>>>  
>>>>> As regards Microsoft Academic Search, and PoP software, we must take into account that MAS is completely outdated. This issue is detected by Ortega in his book. Moreover it was published by EC3 Research group by means of a working paper a few months ago. A more in-depth analysis has been performed, which has been recently accepted for publication, where we study this drop of coverage according to disciplines, universities and journals.
>>>>>  
>>>>> Therefore, MAS cannot be used now for quantitative purposes. Additionally, the MAS API does not work properly with queries that return hit count estimates surpassing 1,000 results. And finally we can add the sometimes unknown legal considerations in the reuse of Bing results due to Microsoft copyright.
>>>>>  
>>>>> Finally, some official voices from Microsoft announced that MAS results will be integrated into Bing results, in an ongoing process.
>>>>>  
>>>>> As regards Google Scholar, as Isidro said, the "site" command may be used both in Google and Google Scholar. But be careful, because search commands are changing in Scholar. For example the combination of "site" and "filetype" stopped working. In any case, the site command in Google and Bing sometimes gives us unexpected results in terms of coverage.
>>>>>  
>>>>> Best,
>>>>>  
>>>>> Enrique
>>>>>  
>>>>> On Thu, Oct 9, 2014 at 4:32 PM, Stephen J Bensman <notsjb at lsu.edu> wrote:
>>>>> Isidro,
>>>>> Thanks for the information.  I am looking forward to hearing from Jose.  He and I are already in close contact on these matters.  I definitely want you two to vet the paper we have done.  It should be ready soon.  I screwed up in posting it on arXiv, and it may take a while to correct my stupidity of submitting the damn thing multiple times, because I did not know what I was doing.
>>>>> You have already answered one of my questions.  The former Yahoo research engine was based upon AltaVista, which defined documentary sets by words.  It was this system that Page tested and rejected as delivering incoherent, irrelevant sets.  Instead Page incorporated Garfield's theory of citation indexing, which defines relevant sets by linkages.  He strengthened this by also incorporating Narin's influential method.  Doing this delivered clearer, more relevant sets than AltaVista.  Multiple linkages are better at semantically defining sets than multiple token words.   If your book presents these facts, then I can strangle Microsoft Academic in its cradle, as Churchill once said of a certain political system that now seems to have come back into vogue.
>>>>> 
>>>>> I hope to get the book and hear from Jose.
>>>>> Respectfully,
>>>>> Stephen J Bensman, Ph.D
>>>>> LSU Libraries
>>>>> Louisiana State University
>>>>> Baton Rouge, LA 70803
>>>>> USA
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: ASIS&T Special Interest Group on Metrics [ mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
>>>>> Sent: Thursday, October 09, 2014 9:07 AM
>>>>> To: SIGMETRICS at LISTSERV.UTK.EDU
>>>>> Subject: Re: [SIGMETRICS] A new metrics-related book focused on academic search engines
>>>>> Dear Stephen,
>>>>> Ooops!
>>>>> Sorry, I am not the author of the book. It was written by my collaborator and friend José Luis Ortega, also in this forum, so you can expect an answer from him soon.
>>>>> But I can give a few hints to some of your questions. Bing is using the technology of the former Yahoo search engine. I do not know exactly the way Bing works but my feeling is they are using visits as the main criterion.
>>>>> Probably there are far more variables involved, but the number of visits plays a similar role to links in Google's PageRank. Of course, it is also possible that links are taken into account.
>>>>> Microsoft Academic Search is a completely different animal. Really it is a traditional bibliographic database, but I must recognize that although they are using h-index, I am unable to understand the rankings they publish. To my knowledge, MAS and Bing are completely independent products. On the contrary, Google and Google Scholar are closely interlinked.
>>>>> Regarding web indicators, I use the number of webpages under different levels of web addresses, for example the number of webpages on the web servers of your university:
>>>>> 
>>>>> site:lsu.edu 
>>>>> This syntax is valid for Google, Bing and even Google Scholar.
>>>>> Best regards,
>>>>> 
>>>>> 
>>>>> On 09/10/2014 15:36, Stephen J Bensman wrote:
>>>>> > Isidro,
>>>>> > Thanks for writing this book-- Academic Search Engines: A Quantitative Outlook.  I am having LSU Libraries buy a copy of it, so you have sold at least one.  I hope that you have discussed the differences between how the Google and Microsoft search engines operate.  I understand how PageRank operates, but I do not understand how Bing operates.  All I know is that you obtain much better results with Google than with Microsoft, which seems to be quite new.  I have tested them both.
>>>>> >
>>>>> > For your information, Harzing has now interfaced her PoP program with Microsoft Academic as well as Google Scholar.  Now you can really run comparative tests between Google and Microsoft.  You seem to get better results with her PoP than with the Microsoft  Academic site itself.  At least her rankings are much better, although it is quite obvious from her program that Microsoft coverage is much weaker.
>>>>> >
>>>>> > As a matter of curiosity, what metric did you use to measure the quantitative aspects?  You cannot use standard bibliographic classifications such as number of books, journals, journal articles, working papers, etc. etc., because I do not think that either Google or Microsoft can identify these.  The Web has no authority structure whatever.  You are not dealing with OCLC WorldCat.  It must be something like megabytes of data or something like that.
>>>>> >
>>>>> > We are finishing a paper on how Google Scholar operates.  I'd like you to vet it when we have it ready.
>>>>> >
>>>>> > Respectfully,
>>>>> >
>>>>> > Stephen J Bensman, Ph.D.
>>>>> > LSU Libraries
>>>>> > Louisiana State University
>>>>> > Baton Rouge, LA 70803
>>>>> > USA
>>>>> >
>>>>> >
>>>>> > -----Original Message-----
>>>>> > From: ASIS&T Special Interest Group on Metrics
>>>>> > [ mailto:SIGMETRICS at LISTSERV.UTK.EDU] On Behalf Of Isidro F. Aguillo
>>>>> > Sent: Wednesday, October 08, 2014 6:27 AM
>>>>> > To: SIGMETRICS at LISTSERV.UTK.EDU
>>>>> > Subject: [SIGMETRICS] A new metrics-related book focused on academic
>>>>> > search engines
>>>>> >
>>>>> > José Luis Ortega. Academic Search Engines: A Quantitative Outlook.
>>>>> > Elsevier, 2014. Chandos Information Professional Series ISBN
>>>>> > 1780634722, 9781780634722
>>>>> >
>>>>> > http://store.elsevier.com/Academic-Search-Engines/Jose-Luis-Ortega/isbn-9781843347910/
>>>>> >
>>>>> >
>>>>> > Academic Search Engines intends to run through the current panorama of the academic search engines through a quantitative approach that analyses the reliability and consistency of these services. The objective is to describe the main characteristics of these engines, to highlight their advantages and drawbacks, and to discuss the implications of these new products for the future of scientific communication and their impact on research measurement and evaluation. In short, Academic Search Engines presents a summary view of the new challenges that the Web sets for scientific activity through the most novel and innovative searching services available on the Web.
>>>>> >
>>>>> > Key Features:
>>>>> > · This is the first approach to analyze search engines exclusively addressed to the research community in an integrative handbook.
>>>>> > · This book is not merely a description of the web functionalities of these services; it is a scientific review of the most outstanding characteristics of each platform, discussing their significance with recent investigations.
>>>>> > · This book introduces an original methodology based on a quantitative analysis of the covered data through the extensive use of crawlers and harvesters which allow going in depth into how these engines are working.
>>>>> >
>>>>> > José Luis Ortega (CCHS-CSIC) is a web researcher in the Spanish National Research Council (CSIC). He achieved a fellowship in the Cybermetrics Lab of the CSIC, where he finished his doctoral studies (2003-8). In 2005, he was employed by the Virtual Knowledge Studio of the Royal Netherlands Academy of Sciences and Arts, and in 2008 he took up a position as information scientist in the CSIC. He now continues his collaboration with the Cybermetrics Lab in research areas such as webometrics, web usage mining, visualization of information, academic search engines and social networks for scientists.
>>>>> >
>>>>> 
>>>>> --
>>>>> ************************************
>>>>> Isidro F. Aguillo, HonDr.
>>>>> The Cybermetrics Lab, IPP-CSIC
>>>>> Grupo Scimago
>>>>> Madrid. SPAIN
>>>>> 
>>>>> isidro.aguillo at csic.es
>>>>> ORCID 0000-0001-8927-4873
>>>>> ResearcherID: A-7280-2008
>>>>> Scholar Citations SaCSbeoAAAAJ
>>>>> Twitter @isidroaguillo
>>>>> Rankings Web webometrics.info
>>>>> ************************************
>>>>> 
>>>>> 
>>>>>  
>>>> 
>>> 
>> 
> 