[Sigmetrics] 1findr: research discovery & analytics platform

Kevin Boyack kboyack at mapofscience.com
Wed Apr 25 16:50:01 EDT 2018


Éric,

… and thanks to you for being so transparent about what you’re doing! 

Kevin

 

From: SIGMETRICS <sigmetrics-bounces at asist.org> On Behalf Of Éric Archambault
Sent: Wednesday, April 25, 2018 9:05 AM
To: Anne-Wil Harzing <anne at harzing.com>; sigmetrics at mail.asis.org
Subject: Re: [Sigmetrics] 1findr: research discovery & analytics platform

 

Anne-Wil,

Thank you so much for this review. We need that kind of feedback to prioritize development. 

Thanks a lot for the positive comments. We are happy that they reflect our design decisions. Now, on to the niggles (all fair points about the current version).

An important distinction of our system, at this stage of development, is its emphasis on scholarly/scientific/research work published in peer-reviewed, quality-controlled journals (e.g. we don't index trade journals or popular-science magazines such as New Scientist; this is not a judgment on quality, as many of them are stunningly good, they are just not the type of source we focus on for now). This emphasis stems from work conducted several years ago for the European Commission. Science-Metrix was contracted to measure the proportion of peer-reviewed journal articles available in OA. We discovered ("discovery" being a big term considering what follows) that 1) OA articles were hard to find and count (the numerator of the percentage), and 2) there was no database comprising all peer-reviewed journals (the denominator). Consequently, we had to work by sampling, though hard-core bibliometricians like those of us at Science-Metrix prefer working with population-level measurement.

At Science-Metrix, our bibliometric company, we have been using licensed bibliometric versions of the Web of Science and Scopus. They are great tools: very high-quality data (obvious to anyone who has worked on big bibliographic metadata), extensive coverage, and loads of high-quality, expensive-to-implement smart enrichment. However, when measuring, we noticed, as did many others, that these databases emphasize Western production to the detriment of the Global South, of emerging countries (especially in Asia), and even of the old Cold War foe in which the West lost interest after the fall of the Wall. 1findr addresses this: it aims to find as much OA as possible and to index everything peer-reviewed and academic-level that is published in journals. We aim to expand to other types of content with a rationally designed indexing strategy, but this is what we are obstinately focusing on for now.

-We are working on linking all the papers within 1findr with references/citations. This will create the first rationally designed citation network: from all peer-reviewed journals to all peer-reviewed journals, regardless of language, country or field of research (we won't get there easily or soon). We feel this is a scientifically sound way to measure. Conferences and books are also important, but when we currently take them into account in citation counts, we are working with extremely non-random lumps of indexed material, and no one can say what the effect on measured citations is. My educated guess is that the effect is extremely biased: book coverage is extremely linguistically biased, and conference proceedings indexing is extremely field-biased (proportionately far more computer science and engineering than other fields). If we want to turn scientometrics into a proper science, we need proper measurement tools. This is the long-term direction of 1findr. It won't remain solely in the discovery field; it will become a scientifically designed tool to measure research, with clearly documented strengths and weaknesses.

-We still need to improve our coverage of OA. Though we find twice as many freely downloadable papers in journals as Dimensions does, ImpactStory finds about 8% OA for papers with a DOI for which we haven't yet found a copy (one reason we have more OA as a percentage of journal articles is that 1findr finds much OA for articles without DOIs). We are working on characterizing a sample of papers which are not OA on the 1findr side but which ImpactStory finds in OA. A glimpse at the data reveals that some of these are false positives, but some reflect approaches used by ImpactStory that we have not yet implemented (Heather and Jason are smart, and we can all learn from them, thanks to their generosity). There are also transient problems we experienced while building 1findr. For example, at the moment we have challenges with our existing Wiley dataset and need to update our harvester for Wiley's site. It would be nice to have their collaboration, but they have been ignoring my emails for the last two months. A shame: we're only making their papers more discoverable and helping users worldwide find papers for which article processing charges were paid. We need the cooperation of publishers to do justice to the wealth of their content, especially hybrid OA papers.

-We know we have several papers displaying a "404". We are improving the oaFindr link resolver built into 1findr to reduce this. We also need to scan more frequently for changes (we have to be careful there, as we don't want to overwhelm servers; many of the servers we harvest from are truly slow and we want to be nice guys), and we need to continue implementing smarter mechanisms to avoid 404s. The transiency of OA is a huge challenge. We have addressed several of the issues, but this takes time, and our team has a finite size while facing, as you note, several challenges and big ambitions at the same time.
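For what it's worth, the politeness constraint described above (never re-hitting a slow server too often) is typically handled with a per-host minimum delay. A minimal sketch in Python, purely illustrative and not 1science's actual harvester (the class name and the 5-second delay are my invention):

```python
import time
from urllib.parse import urlparse

class PoliteScheduler:
    """Tracks the last request time per host and enforces a minimum delay."""

    def __init__(self, min_delay=5.0):
        self.min_delay = min_delay   # seconds between hits to the same host
        self.last_hit = {}           # host -> time of last request

    def wait_time(self, url, now=None):
        """Seconds to wait before this URL may be fetched politely."""
        host = urlparse(url).netloc
        now = time.monotonic() if now is None else now
        last = self.last_hit.get(host)
        return 0.0 if last is None else max(0.0, self.min_delay - (now - last))

    def record(self, url, now=None):
        """Note that a request to this URL's host was just made."""
        self.last_hit[urlparse(url).netloc] = (
            time.monotonic() if now is None else now)
```

A real harvester would additionally honour robots.txt directives and back off when a server returns errors or responds slowly.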

-We are rewriting our "help" center. Please be aware that a query with no quotes applies full stemming; single quotes also apply stemming, but the words need to appear in the same order in the results; double quotes should be used for non-stemmed, exact matches. This is a really powerful way of searching.

Fuel cell = finds articles containing fuel and cell(s), in any order

'fuel cell' = finds articles with either "fuel cell" or "fuel cells"

"fuel cell" = finds articles strictly containing "fuel cell" (won't return articles with only "fuel cells")
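The three quoting modes can be mimicked with a toy matcher. This is only an illustration of the behaviour described above, not 1findr's actual query engine, and the plural-stripping `stem` function is a deliberately crude stand-in for a real stemmer:

```python
def stem(word):
    """Crude plural-stripping stemmer: 'cells' -> 'cell'."""
    w = word.lower()
    return w[:-1] if w.endswith("s") else w

def matches(query, text, mode):
    """mode: 'unquoted' (stemmed, any order), 'single' (stemmed phrase,
    order preserved), 'double' (exact, non-stemmed phrase)."""
    q_words = query.split()
    t_words = text.lower().split()
    if mode == "unquoted":
        return set(map(stem, q_words)) <= set(map(stem, t_words))
    if mode == "single":
        hay = [stem(w) for w in t_words]
        needle = [stem(w) for w in q_words]
    else:  # "double"
        hay = t_words
        needle = [w.lower() for w in q_words]
    n = len(needle)
    return any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))

print(matches("fuel cell", "Cells that burn fuel", "unquoted"))  # True
print(matches("fuel cell", "fuel cells at work", "single"))      # True
print(matches("fuel cell", "fuel cells at work", "double"))      # False
```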

Once again, thanks for the review, and apologies for the lengthy reply.

 

Éric

 

Eric Archambault, PhD

CEO  |  Chef de la direction

C. 1.514.518.0823

eric.archambault at science-metrix.com

science-metrix.com  |  1science.com

 

From: SIGMETRICS <sigmetrics-bounces at asist.org <mailto:sigmetrics-bounces at asist.org> > On Behalf Of Anne-Wil Harzing
Sent: April-24-18 5:11 PM
To: sigmetrics at mail.asis.org <mailto:sigmetrics at mail.asis.org> 
Subject: Re: [Sigmetrics] 1findr: research discovery & analytics platform

 

Dear all,

I was asked (with a very short time-frame) to comment on 1Findr for an article in Nature (which I am not sure has actually appeared). I was given temporary login details for the Advanced interface. 

As per normal with these kinds of requests, only one of my comments was actually used. So I am posting all of them here in case they are of use to anyone (and to Eric and his team in fine-tuning the system).

================

As I had a very limited amount of time to provide my comments, I tried out 1Findr by searching for my own name (I have about 150 publications including journal articles, books, book chapters, software, web publications and white papers) and some key terms in my own field (international management). 


What I like


Simple and intuitive user interface with fast responses to search requests, much faster than some competitor products whose websites can take ages to load. The flexibility of the available search options clearly reflects the fact that this is an offering built by people with a background in scientometrics.

A search for my own name showed that coverage at the author level is good: it finds more of my publications than both the Web of Science and Scopus, but fewer than Google Scholar and Microsoft Academic. It is approximately on par with CrossRef and Dimensions, though all three services (CrossRef, Dimensions and Findr) have unique publications that the others don't cover.

As far as I could assess, topic searches worked well with flexible options to search in title, keywords and abstracts. However, I have not tried these in detail.

Provides a very good set of subjects for filtering searches that – for the disciplines I can evaluate – shows much better knowledge of academic disciplines and disciplinary boundaries than is reflected in some competitor products. I particularly like the fact that there is more differentiation in the Applied Sciences, the Economic and Social Sciences and Arts & Humanities than in some other databases. This was sorely needed.

There is a quick summary of altmetrics such as tweets, Facebook postings and Mendeley readers. Again I like the fact that a simple presentation is used, rather than the "bells & whistles" approach with the flashy graphics of some other providers. This keeps the website snappy and provides an instant overview.

There is good access to OA versions and a “1-click” download of all available OA versions [for a maximum of 40 publications at once as this is the upper limit of the number of records on a page]. I like the fact that it finds OA versions from my personal website (www.harzing.com <http://www.harzing.com> ) as well as OA versions in university repositories and gold OA versions. However, it doesn’t find all OA versions of my papers (see dislike below).


What I dislike


Although I like the fact that Findr doesn't try to be anything and everything, which would lead to a cluttered user interface, for me the fact that it doesn't offer citation metrics limits its usefulness. Although I understand its focus is on finding literature (which is fair enough), many academics – rightly or wrongly – use citation scores to decide which articles to prioritize for downloading and reading.

The fact that it doesn't yet find all Open Access versions that Google Scholar and Microsoft Academic do. All my publications are available in OA on my website, but Findr does not seem to find all of them. Findr also doesn't seem to source OA versions from ResearchGate. Also, several OA versions resulted in a "404. The requested resource is not found." error.

The fact that it only seems to cover journal articles. None of my books, book chapters, software, white papers or web publications were found. Although a focus on peer-reviewed work is understandable, I think coverage of books and book chapters is essential, and services like Google Scholar, Microsoft Academic and CrossRef do cover books.


Niggles


There are duplicate results for quite a few of my articles, usually "poorer" versions (i.e. without full text/abstract/altmetric scores); it would be good if the duplicates could be removed and only the "best" version kept.

Automatic stemming of searches is awkward if you try to search for author names in the "general" search (as many users will do). In my case (Harzing) it results in hundreds of articles on the Harz mountains, obscuring all of my output.
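The Harzing/Harz collision described above is exactly what a Porter-style suffix-stripping rule produces. A toy illustration (my own sketch, not 1findr's actual analyzer; the stripping rules are deliberately simplified):

```python
def crude_stem(word):
    """Toy '-ing'/'-s' stripping rules in the spirit of Porter stemming."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]
    if w.endswith("s") and len(w) > 3:
        return w[:-1]
    return w

# The surname and the mountain range collapse to the same index term,
# so an unquoted author search also matches geology papers:
print(crude_stem("Harzing"))  # harz
print(crude_stem("Harz"))     # harz
```

This is why search engines typically index author fields with a separate, non-stemming analyzer, or why users must fall back on exact-match quoting.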

The preferred search syntax should be made clearer, as many users will search for authors with initials only (as this is what works best in other databases). In Findr this returns very few results, as there are "exact" matches only, whereas in other databases initial searches are interpreted as initial + wildcard.

More generally, the system needs better author disambiguation. Some of my articles can only be found when searching for "a-w harzing", a very specific rendition of my name.

When exporting citations, the order seems to revert to alphabetical order by first author, not the order that was on the screen.


Best wishes,
Anne-Wil 

Prof. Anne-Wil Harzing

Professor of International Management
Middlesex University London, Business School

Web: Harzing.com <https://harzing.com>  - Twitter: @awharzing <https://twitter.com/awharzing>  - Google Scholar: Citation Profile <https://scholar.google.co.uk/citations?user=v0sDYGsAAAAJ> 
New: Latest blog post <https://harzing.com/blog/.latest?redirect>  - Surprise: Random blog post <https://harzing.com/blog/.random>  - Finally: Support Publish or Perish <https://harzing.com/resources/publish-or-perish/donations>  

On 24/04/2018 21:51, Bosman, J.M. (Jeroen) wrote:

Of course there is much more to say about 1Findr. What I have seen so far is that the coverage back to 1944 is very much akin to that of Dimensions, probably because both derive the bulk of their records from Crossref.

 

Full-text search is relatively rare among these systems. Google Scholar does it. Dimensions does it on a subset. And some publisher platforms support it, as do some OA aggregators.

 

Apart from these two aspects (coverage and full text search support), there are a lot of aspects and (forthcoming) 1Findr functionalities that deserve scrutiny, not least the exact method of OA detection (and version priority) of course.

 

Jeroen Bosman

Utrecht University Library


  _____  


From: SIGMETRICS [sigmetrics-bounces at asist.org <mailto:sigmetrics-bounces at asist.org> ] on behalf of David Wojick [dwojick at craigellachie.us <mailto:dwojick at craigellachie.us> ]
Sent: Tuesday, April 24, 2018 8:59 PM
To: Mark C. Wilson
Cc: sigmetrics at mail.asis.org <mailto:sigmetrics at mail.asis.org> 
Subject: Re: [Sigmetrics] 1findr: research discovery & analytics platform

There is a joke that what is called "rapid prototyping" actually means fielding the beta version. In that case every user is a beta tester.

It is fast and the filter numbers are useful in themselves. Some of the hits are a bit mysterious. It may have unique metric capabilities. Too bad that advanced search is not available for free.

David

At 02:34 PM 4/24/2018, Mark C. Wilson wrote:

Searching for my own papers I obtained some wrong records and the link to arXiv was broken. It does return results very quickly and many are useful. I am not sure whether 1science intended to use everyone in the world as beta-testers.



On 25/04/2018, at 06:16, David Wojick <dwojick at craigellachie.us <mailto:dwojick at craigellachie.us>  > wrote:

It appears not to be doing full-text search, which is a significant limitation. I did a search on "chaotic" for 2018 and got 527 hits. Almost all had the term in the title, and almost all of the remainder had it in the abstract. Normally with full text, the hits with the term only in the body text are many times more numerous than those with it in the title, often orders of magnitude more.

But the scope is impressive, as is the ability to filter for OA.

David

David Wojick, Ph.D.
Formerly Senior Consultant for Innovation
DOE OSTI https://www.osti.gov/ 


At 08:00 AM 4/24/2018, you wrote:


Greetings everyone,
 
Today, 1science announced the official launch of 1findr, its platform for research discovery and analytics. Indexing 90 million articles, of which 27 million are available in OA, it represents the largest curated collection worldwide of scholarly research. The platform aims to include all articles published in peer-reviewed journals, in all fields of research, in all languages and from every country.
 
Here are a few resources if you’re interested in learning more:
 
•  Access the 1findr platform: www.1findr.com
•  Visit the 1findr website: www.1science.com/1findr
•  Send in your questions: 1findr at 1science.com
•  See the press release: www.1science.com/1findr-public-launch
 
Sincerely,
 
Grégoire
 
Grégoire Côté
President | Président 
Science-Metrix 
1335, Mont-Royal E
Montréal, QC  H2J 1Y6
Canada
 
LinkedIn: www.linkedin.com/company/science-metrix-inc  |  Twitter: twitter.com/ScienceMetrix
T. 1.514.495.6505 x115
T. 1.800.994.4761
F. 1.514.495.6523
gregoire.cote at science-metrix.com <mailto:gregoire.cote at science-metrix.com> 
www.science-metrix.com <http://www.science-metrix.com/> 
 
 
 
 


_______________________________________________
SIGMETRICS mailing list
SIGMETRICS at mail.asis.org <mailto:SIGMETRICS at mail.asis.org> 
http://mail.asis.org/mailman/listinfo/sigmetrics







 
