How to Compare IRs and CRs

David E. Wojick dwojick at HUGHES.NET
Sun Feb 10 13:38:53 EST 2008

Steve, I do not regard establishing and maintaining IRs in tens of thousands of institutions around the world, and getting millions of authors to archive their writings each year, as simple or costing "next to nothing." (Clearly you do, so we have little basis upon which to discuss policy issues.) By coincidence, I come from a regulatory background, where I specialized in analyzing the human cost of information requirements. In the US we have what is called a "burden budget" for federal regulations, which I helped develop. Your version of OA looks to be quite burdensome indeed.

In any case, my delivery solution does not require more metadata. I tend to regard metadata as obsolete and burdensome as a search tool, and I prefer full text (except, of course, for Sigmetrics-like meta-analysis and visualization, where metadata can be very useful). My approach is exemplified by and the new , both of which probably already exceed plain Google in science content. (Google Scholar is largely pay-per-paper, not OA.)

Our approach involves external federation of existing collections, which imposes no new burden on the institution, unlike Google's sitemap protocol. We are starting with the biggest collections and working our way down. If I have to find, federate, and then track every college and institute IR in the world, it will not be easy; hence my concern.

You also seem to be claiming that no further work is needed to improve the findability of raw access scientific content until your (utopian?) vision of universal OA is completed. Needless to say, I do not agree. There is much to be done, even if OA fails to materialize. One of our working principles is not to wait for visions to come true.

David Wojick

"David E. Wojick, PhD" <WojickD at>
Senior Consultant for Innovation
Office of Scientific and Technical Information
US Department of Energy

>On Sun, 10 Feb 2008, dwojick at wrote:
>> My point is that one should not consider (and design) OA in isolation.
>It is not at all clear why not, David. One does not have to redesign
>the web, publishing, or science, to attain 100% OA. One need merely
>self-archive in one's IR.
>> OA should be viewed as part of a systematic change in the way we do
>> science.
>But why, when reaching 100% OA is simple and reachable -- just a matter
>of a few keystrokes, and the only thing universities and funders need do
>is mandate them -- whereas systematically changing the way we do science
>is complicated, and not at all within obvious reach?
>> Or, to put it another way, OA has to be justified in terms of
>> the benefits it will provide. OA is disruptive and costly so the
>> benefits must be correspondingly great.
>What disruptive and costly effects? IRs cost next to nothing; keystrokes
>cost nothing; mandates cost nothing.
>Are we speculating, then, about the possible future of journal publishing
>after Green OA self-archiving is mandated and reaches 100%? (It will
>convert to Gold OA publishing. But what does that have to do with the
>scientific and scholarly research community? Publishing is a service
>industry and will adapt itself to the needs of research. Is research
>instead supposed to adapt itself to the needs of the publishing industry?)
>> The benefits of OA in science lie in increased efficiency of
>> communication. What I call better, faster science. But access is only
>> part of the communication process. I am working the other part --
>Agreed that access is only part of it. But it is a necessary part,
>indeed an essential prerequisite. And it is an immediately doable
>part: The way to do it is for universities and funders to mandate
>Green OA self-archiving in the researcher's own OAI-compliant
>Institutional Repository (IR).
>That's immediately reachable, right now. Then we can worry about other
>[NB: Recall that I am only talking about OA's target content: journal
>articles.]
>> getting the stuff to the people who need it as efficiently as possible
>> (findability). My point is that my part of the system has something to
>> say about your part.
>But you can't find what's not there: Green OA IR mandates will provide
>the missing content, and then we can see whether there's truly any
>residual findability problem at all.
>> Less metaphorically, OA design issues like IR
>> versus CR need to consider the delivery (or findability) issue, perhaps
>> even being determined by them.
>IF it were the case that direct CR (Central Repository) deposit could
>deliver 100% of the target OA content and IF direct CR deposit were also
>somehow essential for findability, you would be quite right.
>But direct CR deposit cannot and will not deliver 100% of the target
>OA content (thematic CRs cannot cover all of research output space,
>exhaustively and non-redundantly, and institutions and funders are the
>entities that have the interests, and the means, to mandate deposit;
>"themes" are not); and harvesting content to CR search services will
>provide the findability. So both the conditional IFs are counterfactual.
>> My specific point was that your IR solution to OA looks like it
>> creates problems with my delivery solution. Perhaps we can discuss this.
>I would be happy to discuss it. My guess is that your delivery solution
>calls for richer metadata than OAI. Fine. If the richer metadata really
>prove necessary, either CRs can harvest the OAI metadata from the IRs
>and enrich them, or, once the IRs are at last capturing all their own
>research output, the IRs themselves can be persuaded (by the advantages
>of your delivery solution) to enrich their own metadata requirements.
>But direct CR deposit is a nonstarter, either way, because it will not
>generate 100% OA content -- and it is totally unnecessary.
>[NB: Again, recall that I am only talking about OA's target content:
>journal articles.]
>> As for the research, it was very preliminary. We just took one issue
>> of each of several major journals, in physics and chemistry, and
>> manually (intelligently) searched the web for each article. Starting by
>> author typically worked better than by title or text. We got a good
>> success rate. I should point out that much, perhaps most, of web
>> available science is not on Google. It is in the deep web.
>Depositing on arbitrary websites, let alone in the deep web, is obviously
>nonoptimal. Mandates to deposit in OAI-compliant IRs will solve that.
>Stevan Harnad
>If you have adopted or plan to adopt a policy of providing Open Access
>to your own research article output, please describe your policy at:
>    BOAI-1 ("Green"): Publish your article in a suitable toll-access journal
>    BOAI-2 ("Gold"): Publish your article in an open-access journal if/when
>    a suitable one exists.
>    in BOTH cases self-archive a supplementary version of your article
>    in your own institutional repository.

More information about the SIGMETRICS mailing list