OA Growth Monitoring Needs a Google Data-Mining Exemption

Stevan Harnad harnad at ECS.SOTON.AC.UK
Mon Aug 26 07:45:06 EDT 2013


On 2013-08-26, at 6:12 AM, "Bosman, J.M." (Utrecht University Library) wrote (in SIGMETRICS):

> Do you know..
> 1)   How many of the freely available full-text versions are “black OA”, i.e. shared against copyright? I know many examples of that in, for instance, ResearchGate, which is indexed by Google Scholar….

There are technically two kinds of "Black OA": 

(B1) Third-party piracy -- X posting the articles of Y. This is very unlikely to be in Institutional Repositories.

(B2) Authors self-archiving their own articles (Green OA), ignoring any publisher embargo (most of arXiv would have been B2 for years, until the publishers altered their policy and endorsed immediate, unembargoed Green OA self-archiving).

We will soon have separate data for Green OA growth in UK institutional repositories (mandated and unmandated).

(Let others count the proportion of that Green OA that is B2: I'm more interested in burying publishers' damaging and unjustified access embargoes than in praising, enforcing or reinforcing them!)

But let it be noted that access provided after an embargo is Delayed Access (DA), not OA, which is immediate (and permanent). 

In many if not most fields of research the critical growth period for new research uptake is within the first year of publication (if not earlier, for preprints), although this may only be expressed and measurable as citations somewhat later. This is the research progress that (some) publishers are trying to suppress in order to sustain their subscription revenues at all costs (to research) by trying to embargo Green OA self-archiving. 

(It is ironic also, and instructive, that in fields where the critical growth period for new research uptake is longer than a year, publishers are trying to impose even longer embargoes on Green OA self-archiving.)

The publishing tail, still trying to keep wagging the research dog, come what may...

> 2)   To what extent [can] the growth of available OA versions be explained by increasing numbers of green OA versions of which the embargo period has ended and to what extent to more general acceptance of OA by scholars? It seems likely that the first effect will be more pronounced 6-24 months after a period of exceptional growth of self-archiving in repositories etc.

The empirical part of question 2 would be answered by the data that answer question 1. 

The rest seems circular: 

Yes, by definition, OA growth during embargoes will take place during embargoes, not after, whereas OA growth after embargoes have elapsed will take place after embargoes have elapsed, not before. 

And yes, whatever is actually being done is a sign of "acceptance" of doing it (by authors, I should think, since users looking for articles are ready to accept whatever they can find, at least for Gratis OA (read-only), if not Libre OA (read-write)!).

Stevan Harnad
 
> 
>> On Fri, Aug 23, 2013 at 6:58 AM, Sean Burns wrote:
>> Although a harvester would be very nice, sampling theory and some manual work does the trick too... [in my dissertation] I took the sample in May 2010 and collected bibliometric and other relevant data from Google Scholar in July 2010, July 2011, and July 2012.
>> 
>> 
>> 
>> 
> On Fri, Aug 23, 2013 at 6:58 AM Stevan Harnad wrote:
> 
> Yes, hand-sampling can and does provide valuable information. 
>  
> But, as I said, for systematic ongoing monitoring of the global time-course of OA growth across institutions, disciplines and nations, hand-sampling is excruciatingly difficult and time-consuming, holding research that could greatly benefit the worldwide research community (as well as Google and Google Scholar) to a scale and pace that is more suitable for a doctoral dissertation.
>  
> Historically speaking, if a few projects designed to monitor the ongoing global growth and distribution of OA were allowed to do machine data-mining in Google space, the growth rate of OA would be dramatically accelerated (and thereby also the size and functionality of Google Scholar space).
>  
> Otherwise, efforts to enrich Google Scholar space are relegated to the same fate as attempts to enrich vendors, spammers, napsters or phishermen.
>  
> Stevan Harnad
>  
>  
> 
> > This is a response to a query regarding Eric Archambault's report on
> > OA Growth by Adam G Dunn in Science Insider: "I find it difficult to
> > believe that the authors of the study managed to create a harvester
> > that could identify and verify the pdfs linked to by Google Scholar
> > when Google Scholar actively blocks IP addresses when they identify
> > crawling."
> >
> > Our own "harvester" attempts to gather the all-important data on OA
> > growth were blocked by Google.
> >
> > It is completely understandable and justifiable that Google shields
> > its increasingly vital global database and search mechanisms from the
> > countless and incessant worldwide attempts at exploitation by
> > commercial interests, spammers, and malware that could bring Google to
> > its knees if not rigorously and relentlessly blocked.
> >
> > But in the very special (and tiny) case of scientific research
> > articles it would not only be a great help to the worldwide research
> > community but to Google (and Google Scholar) itself if Google granted
> > special individual exemptions for important international studies like
> > Eric Archambault's, which was commissioned by the European Union to
> > monitor the global growth rate of open access to research.
> >
> > Google and Google Scholar would become all the richer as research
> > databases if data like Eric's (and our own) were not made so
> > excruciatingly difficult and time-consuming to gather by Google's
> > blanket blockage of automated data-mining.
> >
> >
> > (We do not trawl books, so Google's agreements with publishers are not
> > violated or at issue in any way. We just want to trawl for articles
> > whose metadata match the metadata from Web of Science or SCOPUS
> > and have been made freely accessible on the web; nor do we want their
> > full-texts: just to check whether they are there!)
> >
> > Stevan Harnad
> >
> >
> 
>  
