From dorothy.waugh at emory.edu Mon May 1 09:30:08 2017
From: dorothy.waugh at emory.edu (Waugh, Dorothy F.)
Date: Mon, 1 May 2017 13:30:08 +0000
Subject: [Pasig-discuss] The Archivist's Guide to KryoFlux
Message-ID: <512DD020-9827-421E-9D9E-FBEBC79A0E16@emory.edu>

(With apologies for cross-posting)

An initial draft of The Archivist's Guide to KryoFlux is now open for comment and review at goo.gl/ZZxxAJ.

The Archivist's Guide to KryoFlux aims to provide a helpful resource for practitioners working with floppy disks in an archival context. This DRAFT of the Guide will remain open for comments from the digital archives community from May 1 through November 1, 2017. Once revisions have been incorporated, a version of the document will be freely available on GitHub.

Whether you already use a KryoFlux at your institution or are considering purchasing one, please take a look at the guide, put it to the test, and give us your feedback! You can either add your comments to the guide itself or send an email to archivistsguidetokryoflux at gmail.com. Your feedback will be enormously helpful as we go through an additional round of revisions in late 2017, so please, please do get in touch if you have any comments or questions.

With thanks,
The Archivist's Guide to KryoFlux working group

Dorothy Waugh
Digital Archivist
Stuart A. Rose Manuscript, Archives, and Rare Book Library
Emory University
540 Asbury Circle
Atlanta, GA 30322-2870
Tel: (404) 727.2471
Email: dorothy.waugh at emory.edu

"The Stuart A. Rose Manuscript, Archives, & Rare Book Library collects and connects stories of human experience, promotes access and learning, and offers opportunities for dialogue for all wise hearts who seek knowledge."

Read the Rose Library blog: https://scholarblogs.emory.edu/marbl/
Like the Rose Library on Facebook: https://www.facebook.com/emorymarbl
Follow the Rose Library on Twitter: https://twitter.com/EmoryMARBL

________________________________
This e-mail message (including any attachments) is for the sole use of the intended recipient(s) and may contain confidential and privileged information. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this message (including any attachments) is strictly prohibited. If you have received this message in error, please contact the sender by reply e-mail message and destroy all copies of the original message (including attachments).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rose_signature[CVcI][5].png
Type: image/png
Size: 12836 bytes
Desc: rose_signature[CVcI][5].png
URL:

From Inge.Angevaare at KB.nl Mon May 1 10:22:47 2017
From: Inge.Angevaare at KB.nl (Inge Angevaare)
Date: Mon, 1 May 2017 14:22:47 +0000
Subject: [Pasig-discuss] unsubscribe
Message-ID: <887EC26076B8864CB585EC40E46385D854AA87A8@MBX-SRV-P100.wpakb.kb.nl>

Inge Angevaare
Managing editor, www.kb.nl and http://bibliotheekenbasisvaardigheden.nl
Marketing and Services
Koninklijke Bibliotheek
T 06 11776725
E inge.angevaare at kb.nl
From sean.killen at oracle.com Mon May 1 10:52:12 2017
From: sean.killen at oracle.com (Sean Killen)
Date: Mon, 1 May 2017 10:52:12 -0400
Subject: [Pasig-discuss] unsubscribe
In-Reply-To: <887EC26076B8864CB585EC40E46385D854AA87A8@MBX-SRV-P100.wpakb.kb.nl>
References: <887EC26076B8864CB585EC40E46385D854AA87A8@MBX-SRV-P100.wpakb.kb.nl>
Message-ID:

Unsubscribe

Please pardon the typos. I sent from an iPhone.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sserbicki at ea.com Mon May 1 19:35:40 2017
From: sserbicki at ea.com (Serbicki, Stefan)
Date: Mon, 1 May 2017 23:35:40 +0000
Subject: [Pasig-discuss] The Archivist's Guide to KryoFlux
In-Reply-To: <512DD020-9827-421E-9D9E-FBEBC79A0E16@emory.edu>
References: <512DD020-9827-421E-9D9E-FBEBC79A0E16@emory.edu>
Message-ID:

At Electronic Arts, we used KryoFlux boards to recover data from approx. six thousand 3.5" and 5.25" floppies dating back to the 80s and early 90s. We had a ~95% success rate, which was quite astounding considering that a good portion of the media had exceeded its theoretical lifetime of 25-30 years.
Getting the data off the disks was only part of the overall project. Our final goal was to obtain "loose" files that could be read or executed. As several of the datasets consisted of backups in various formats, for various platforms, made with obsolete software, considerable work had to be done after achieving a successful KryoFlux extraction.

In fact, our work is ongoing. Currently we are focusing on restoring backups made with Fastback 2.0. We have managed to do this successfully for two titles: F-22 Interceptor and LHX Attack Chopper. The original backups were split into parts and stored on 5.25" floppies. We used virtual machines to recreate the original environment in which the backups were made. The final output yielded Betas and the game source code.

I'll be happy to write a paper describing the steps we took from beginning to end to recover that data if there is interest.

------------------
Stefan Serbicki
Technical Lead - IP Preservation
Electronic Arts
209 Redwood Shores Parkway
Redwood City, CA 94065
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 12836 bytes
Desc: image001.png
URL:

From j.meyerson at austin.utexas.edu Mon May 1 21:01:34 2017
From: j.meyerson at austin.utexas.edu (Meyerson, Jessica W)
Date: Tue, 2 May 2017 01:01:34 +0000
Subject: [Pasig-discuss] From soup to nuts (or to a continuum of meaningful reuse): Kryoflux data triage to emulated access
Message-ID:

Huge thanks to Dorothy and all of the members of the Archivist's Guide to KryoFlux Working Group for this awesome contribution to the preservation community!

And Stefan - your offer to write up next steps towards a mountable, executable object would be useful to a broad audience, to be sure: things you tested that failed, how you discerned the appropriate disktype wrapper to make a mountable image, and the components of the emulated environment necessary to ultimately provide access.

Best,
Jessica

Jessica Meyerson, MSIS, CA
Digital Archivist
Briscoe Center for American History
The University of Texas at Austin
2300 Red River St. Stop D1100
Austin TX, 78712-1426
(512) 495-4405
j.meyerson at austin.utexas.edu
http://www.cah.utexas.edu/
http://www.softwarepreservationnetwork.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Walker.Sampson at Colorado.EDU Tue May 2 17:16:26 2017
From: Walker.Sampson at Colorado.EDU (Walker Sampson)
Date: Tue, 2 May 2017 21:16:26 +0000
Subject: [Pasig-discuss] The Archivist's Guide to KryoFlux
Message-ID:

Hi Stefan,

I'd certainly be interested in that paper. Particularly, what you all decided to do with that ~5% of floppies that were not successful. Just try again? Setting more retries in the KryoFlux software? Cleaning the platter, swapping drives, recalibration, cleaning the drive head, etc.? I would venture users are interested in the troubleshooting steps there. Also, are you keeping the raw track data KryoFlux makes?

Regardless, happy to hear it's been a successful project.

All best,

Walker Sampson
Digital Archivist, MSIS, CA
Special Collections and Archives
University of Colorado Boulder
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 12837 bytes
Desc: image001.png
URL:

From dwilcox at duraspace.org Thu May 4 10:09:23 2017
From: dwilcox at duraspace.org (David Wilcox)
Date: Thu, 4 May 2017 09:09:23 -0500
Subject: [Pasig-discuss] JOIN US at Fedora Camp in Texas
Message-ID:

You are invited to join experienced trainers and Fedora gurus at Fedora Camp, to be held October 16-18 at the Perry-Castañeda Library at the University of Texas, Austin. Fedora is the robust, modular, open source repository platform for the management and dissemination of digital content.
Fedora 4, the latest production version of Fedora, features vast improvements in scalability, linked data capabilities, research data support, modularity, ease of use and more. Fedora Camp offers everyone a chance to dive in and learn all about Fedora. Training will begin with the basics and build toward more advanced concepts; no prior Fedora experience is required. Participants can expect to come away with a deep-dive Fedora learning experience coupled with multiple opportunities for applying hands-on techniques.

Previous Fedora Camps include the inaugural camp held at Duke University, the West Coast camp at Caltech, and the most recent, the NYC camp held at Columbia University. Betsy Coles, Caltech Library Services and Fedora Camp attendee, said, "The material covered was comprehensive, which I needed. It was also pitched at an appropriate level for me. I was able to keep up with the hands-on exercises without becoming completely befuddled. I thought the organization of the material and the hands-on exercises were very well done. I also appreciated the chance to interact with others with both similar and different interests."

The camp curriculum provides a comprehensive overview of Fedora 4 by exploring such topics as:
- Core & Integrated features
- Data modeling and linked data
- Hydra and Islandora
- Migrating to Fedora 4
- Deploying Fedora 4 in production
- Preservation Services

A knowledgeable team of instructors from the Fedora community will lead you through the curriculum: David Wilcox - Fedora Product Manager, Andrew Woods - Fedora Technical Lead, Bethany Seeger - Amherst College, Aaron Birkland - Johns Hopkins University, Mike Durbin - University of Virginia.

View the detailed agenda. Register here. Please note that the early bird discount will be offered until August 14, and that accommodations are available at a discounted rate.

--
David Wilcox
Fedora Product Manager
DuraSpace
dwilcox at duraspace.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From debra.weiss at colorado.edu Mon May 8 09:09:04 2017
From: debra.weiss at colorado.edu (Debra Weiss)
Date: Mon, 8 May 2017 13:09:04 +0000
Subject: [Pasig-discuss] Job Opening: Digital Library Software Architect at University of Colorado Boulder
Message-ID:

The University of Colorado Boulder is seeking applicants for the position of digital library software architect to support University Libraries software applications and digital initiatives. This is a permanent full-time position. For the complete posting with information on how to apply, please see: https://cu.taleo.net/careersection/jobdetail.ftl?job=09351&lang=en

Debra Weiss
Director of Libraries Information Technology
184 UCB
University of Colorado Boulder Libraries
Boulder, CO 80309
303-492-3965
http://www.colorado.edu/libraries/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 2229 bytes
Desc: image002.jpg
URL:

From arthurpasquinelli at gmail.com Wed May 10 15:20:30 2017
From: arthurpasquinelli at gmail.com (Arthur Pasquinelli)
Date: Wed, 10 May 2017 12:20:30 -0700
Subject: [Pasig-discuss] 11th Annual Creative Storage Conference - Special Offer
Message-ID: <74080b9e-1050-7824-651b-a320b061d89c@gmail.com>

On May 24, 2017 at the DoubleTree Hotel in Culver City, CA, the 11th annual Creative Storage Conference will explore every aspect of digital storage and rich media (www.creativestorage.org). This includes discussions of digital archiving and preservation. If you are interested in attending, we would like to offer you a $150 discount off of early registration using this link: https://cs2017.eventbrite.com?discount=onefiftyoff37168524

If you like the Southern California area, I hope that you can join us.

Thomas Coughlin
Coughlin Associates
408-202-5098
tom at tomcoughlin.com
www.tomcoughlin.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Tim.Gollins at nrscotland.gov.uk Fri May 12 07:33:34 2017
From: Tim.Gollins at nrscotland.gov.uk (Tim.Gollins at nrscotland.gov.uk)
Date: Fri, 12 May 2017 11:33:34 +0000
Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs
Message-ID:

Dear PASIG

I have been thinking recently about the challenge of managing "physical" AIPs on offline or nearline storage, and how to optimise or simplify the use of managed storage media in a tape-based (robotic) Hierarchical Storage Management (HSM) system. By "physical" AIPs I mean that the actual structure of the AIP written to the storage system is sufficiently self-describing that even if the management or other elements of a DP system were to be lost to a disaster, the entire collection could be fully reinstated reliably from the stored AIPs alone.

I have also been thinking about the huge benefits of adopting the concepts of "Minimal Ingest" (MI) and "Autonomous Preservation Tools" (APT) in a new Digital Archive solution.

One of the potential effects of the MI and APT concepts is that, over time, while (of course) the original bit streams will never need to be updated, the metadata packaged in the AIP will need to change relatively often (through the life of the AIP). This is of course in addition to any new renderings of the bit streams produced for preservation purposes (manifestations, as termed in some systems).

If updating the AIP involves the AIP being "loaded", "modified" and "stored" again as a whole, then this will result in significant "churn" of the offline or nearline media (i.e. tapes) in an HSM - which I would like to avoid. I think it would be really great if the AIP representation could accommodate the concept of an "update IP" (perhaps UIP?) where the UIP contains a "delta" of the original AIP - the full AIP then being interpreted as the original as modified by a series of deltas. This would effectively result in AIPs (and UIPs) becoming WORM objects, with clear benefits that I perceive in managing their reliable and safe storage.

I am not sufficiently familiar with the detail of all the different AIP models or implementations, so I was wondering if anyone in the team would be able to comment on whether they know of any AIP models, specifications or implementations that would support such a use case.

I have just posted a version of this question to the E-ARK LinkedIn group, so my apologies to those who see it twice.
Many thanks

Tim

Tim Gollins | Head of Digital Archiving and Director of the NRS Digital Preservation Programme
National Records of Scotland | West Register House | Edinburgh EH2 4DF
+ 44 (0)131 535 1431 / + 44 (0)7974 922614 | tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk
Preserving the past | Recording the present | Informing the future
Follow us on Twitter: @NatRecordsScot | http://twitter.com/NatRecordsScot

**********************************************************************
This e-mail (and any files or other attachments transmitted with it) is intended solely for the attention of the addressee(s). Unauthorised use, disclosure, storage, copying or distribution of any part of this e-mail is not permitted. If you are not the intended recipient please destroy the email, remove any copies from your system and inform the sender immediately by return.

Communications with the Scottish Government may be monitored or recorded in order to secure the effective operation of the system and for other lawful purposes. The views or opinions contained within this e-mail may not necessarily reflect those of the Scottish Government.
**********************************************************************

From neil at jefferies.org Fri May 12 08:05:42 2017
From: neil at jefferies.org (Neil Jefferies)
Date: Fri, 12 May 2017 13:05:42 +0100
Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs
In-Reply-To:
References:
Message-ID: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net>

Tim,

If we store AIPs unpackaged, as a collection of files in a folder, then object updates could just be a new folder with symlinks to the unchanged parts and the updated parts in place in the folder.

The object "location" would be a parent folder for all these version folders - for example, a pairtree (or triple-tree for faster scanning/rebuilds) based on object UUID. Version folders would be named according to date or version number (date might make Memento-compliant access simpler). Creating a new version clones the current version (including links) with a new name and then replaces the updated parts in situ. The final act is to update a "current" symlink in the object. Any update failure will mean "current" is not updated and the partial clone can be discarded.

This assumes most updates are metadata and that a diff won't save much compared to a complete new XML file or whatever. I am also assuming that metadata won't be wrappered either (so you can forget METS), so that different types are stored in the most suitable format and are accessed only when required. The problems with round-tripping packaged AIPs for updates rather than diff-ing are repeated by METS wrappering.

These may be a virtual folder/filesystem presentation, and underneath an HSM would retrieve files from wherever when it is actually accessed.
HSM policy in something like SAM-QFS/Versity/Cray TAS can ensure folders are kept intact when moved to other storage (we could even dereference symlinks when dealing with tape). This can be done with a POSIX filesystem and not much code - Ben O'Steen started something along these lines here: https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-what-does-it-do%3F

Fedora also has a versioning object store that could support this kind of model, but it also adds a fair bit of complexity to be Linked Data Platform compliant.

In my parlance I would probably equate "Minimal Ingest" with "Sheer Curation" and APT with Asynchronous Message Driven Workers.

Neil
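To make the folder-per-version approach above easier to picture, here is a minimal Python sketch of the clone-then-swap update it describes. It is an illustration only, not code from Fedora, DataBank or any other system mentioned in this thread: the object-root layout (for instance a pairtree path derived from the object UUID), the v0001-style version-folder names, the "current" symlink and the create_new_version function are all assumptions made for the example.

    # Illustrative sketch (assumptions as noted above): each new version is a
    # folder that symlinks unchanged files from the previous version, carries
    # the updated files itself, and is published by atomically repointing a
    # "current" symlink.
    import os
    import shutil
    from pathlib import Path

    def create_new_version(object_root, updated_files):
        """updated_files maps relative paths within the AIP to local files
        holding the new content. Returns the new version folder."""
        root = Path(object_root)
        current = root / "current"
        prev = current.resolve() if current.is_symlink() else None

        existing = [p for p in root.iterdir()
                    if p.is_dir() and p.name.startswith("v")]
        new = root / "v{:04d}".format(len(existing) + 1)
        new.mkdir(parents=True)

        # Clone the previous version: directories are recreated, unchanged
        # files become symlinks back to the previous version's copies.
        if prev is not None:
            for item in prev.rglob("*"):
                rel = item.relative_to(prev)
                if item.is_dir():
                    (new / rel).mkdir(parents=True, exist_ok=True)
                elif str(rel) not in updated_files:
                    (new / rel).parent.mkdir(parents=True, exist_ok=True)
                    os.symlink(item, new / rel)

        # Add the updated or new files in place.
        for rel, src in updated_files.items():
            dest = new / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)

        # Final act: swap the "current" symlink atomically. If anything above
        # failed, "current" still points at the old version and the partial
        # clone can simply be deleted.
        tmp = root / "current.tmp"
        if tmp.is_symlink() or tmp.exists():
            tmp.unlink()
        os.symlink(new.name, tmp)
        os.replace(tmp, current)
        return new

Because the publish step is a single rename of the "current" symlink, a reader never sees a half-built version, and completed version folders are never modified again once written - which is what lets them behave as WORM objects on tape-backed storage.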
> > Many thanks > > Tim > Tim Gollins | Head of Digital Archiving and Director of the NRS > Digital Preservation Programme > National Records of Scotland | West Register House | Edinburgh EH2 4DF > + 44 (0)131 535 1431 / + 44 (0)7974 922614 | > tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk > > Preserving the past | Recording the present | Informing the future > Follow us on Twitter: @NatRecordsScot | > http://twitter.com/NatRecordsScot > > > ********************************************************************** > This e-mail (and any files or other attachments transmitted with it) > is intended solely for the attention of the addressee(s). Unauthorised > use, disclosure, storage, copying or distribution of any part of this > e-mail is not permitted. If you are not the intended recipient please > destroy the email, remove any copies from your system and inform the > sender immediately by return. > > Communications with the Scottish Government may be monitored or > recorded in order to secure the effective operation of the system and > for other lawful purposes. The views or opinions contained within this > e-mail may not necessarily reflect those of the Scottish Government. > > > Tha am post-d seo (agus faidhle neo ceanglan c?mhla ris) dhan neach > neo luchd-ainmichte a-mh?in. Chan eil e ceadaichte a chleachdadh ann > an d?igh sam bith, a? toirt a-steach c?raichean, foillseachadh neo > sgaoileadh, gun chead. Ma ?s e is gun d?fhuair sibh seo le gun > fhiosd?, bu choir cur ?s dhan phost-d agus lethbhreac sam bith air an > t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun d?il. > > Dh?fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba > air a chl?radh neo air a sgr?dadh airson dearbhadh gu bheil an siostam > ag obair gu h-?ifeachdach neo airson adhbhar laghail eile. Dh?fhaodadh > nach eil beachdan anns a? phost-d seo co-ionann ri beachdan > Riaghaltas na h-Alba. > ********************************************************************** > > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at > http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss From Tim.Gollins at nrscotland.gov.uk Fri May 12 08:18:10 2017 From: Tim.Gollins at nrscotland.gov.uk (Tim.Gollins at nrscotland.gov.uk) Date: Fri, 12 May 2017 12:18:10 +0000 Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs In-Reply-To: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net> References: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net> Message-ID: Hi Neil Brilliant - Most helpful and thought provoking. The fact that Fedora has the idea of a versioning Object store is particularly interesting. I think there are a couple of distinctions between Minimal Ingest and Sheer Curation but (from a quick glance at Google articles) they are appear very closely related. I think APT uses something like Asynchronous Message Driven Workers. Very many thanks indeed, especially for such a swift an comprehensive response. 
Tim

Tim Gollins | Head of Digital Archiving and Director of the NRS Digital Preservation Programme
National Records of Scotland | West Register House | Edinburgh EH2 4DF
+ 44 (0)131 535 1431 / + 44 (0)7974 922614 | tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk
Preserving the past | Recording the present | Informing the future
Follow us on Twitter: @NatRecordsScot | http://twitter.com/NatRecordsScot
From jfarmer at cambridgecomputer.com Fri May 12 09:33:11 2017
From: jfarmer at cambridgecomputer.com (Jacob Farmer)
Date: Fri, 12 May 2017 09:33:11 -0400
Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs
In-Reply-To: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net>
References: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net>
Message-ID: <63d06e35b40be1c7d0ff6e5613950844@mail.gmail.com>

Two warnings and two suggestions:

Warnings:

1) Symlinks and Housekeeping -- It is a common practice to use symlinks to make versioned file collections. If you do this, you should have some kind of housekeeping process that ensures the symlinks are all working correctly. If files ever have to get migrated, symlinks can break.

2) Check with your file system vendor -- Most removable media file systems have some built-in limitations on the number of inodes (files) that you can have in one file system. If you generate a lot of symlinks, you might overwhelm the file system. Your vendor will know.

Suggestions:

1) Hashes for file names -- If your application software maintains a hash for each file, you might consider naming the file according to the hash. Use the first two digits for the parent directory, the next two digits for a sub-directory, and the next two digits for a further sub-directory. Then use the full hash for the file name. This turns your POSIX file system into an object store with uniquely named objects. As a safeguard, you might maintain a separate table or list that associates path names with hashes. (A short sketch of this layout follows below.)

2) Consider using hard links instead of symlinks -- You might use hard links instead of symlinks, presuming that the files are all in the same file system. You still have to watch for file count issues, but you have less housekeeping to do.

I hope that helps.

Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage"
Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com
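As a concrete illustration of the hash-based naming in suggestion 1 above, here is a short Python sketch. It is an assumption-laden example rather than a feature of any particular product: the store root, the side-car index file and the choice of SHA-256 are all invented for the illustration.

    # Sketch of a hash-named layout: the first three pairs of hex digits become
    # nested directories and the full hash is the file name (all names below
    # are illustrative assumptions).
    import hashlib
    import os
    import shutil

    def file_hash(path):
        """Stream the file through SHA-256 (assumed here as the hash of choice)."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        return h.hexdigest()

    def store_file(src_path, store_root, index_path):
        """Copy src_path into <store_root>/ab/cd/ef/<full hash> and record the
        original path against the hash as a safeguard, as suggested above."""
        digest = file_hash(src_path)
        dest_dir = os.path.join(store_root, digest[0:2], digest[2:4], digest[4:6])
        os.makedirs(dest_dir, exist_ok=True)
        dest = os.path.join(dest_dir, digest)
        if not os.path.exists(dest):
            shutil.copy2(src_path, dest)   # identical content is stored only once
        with open(index_path, "a", encoding="utf-8") as index:
            index.write(digest + "  " + src_path + "\n")
        return dest

A side effect of naming files this way is that a later fixity check only has to recompute the hash and compare it with the file name, and identical content is naturally deduplicated.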
----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From twalsh at cca.qc.ca Fri May 12 10:15:39 2017
From: twalsh at cca.qc.ca (Tim Walsh)
Date: Fri, 12 May 2017 14:15:39 +0000
Subject: [Pasig-discuss] Digital repository storage benchmarking
Message-ID: <8B597316-5049-40E0-A7C4-4F7431E69E76@cca.qc.ca>

Dear PASIG,

I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions' configurations online. It's very possible that this question has been asked before on-list, but I wasn't able to find anything in the list archives.

For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those "various media" will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we'd like to benchmark our plans against other institutions.

I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are:

* Could you point me to published/available resources outlining other institutions' digital repository storage configurations?
* Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (Any information sent off-list will be kept strictly confidential.)

Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications you are using to manage the creation and maintenance of backups.

Thank you!
Tim

- - -

Tim Walsh
Archiviste, Archives numériques
Archivist, Digital Archives

Centre Canadien d'Architecture
Canadian Centre for Architecture
1920, rue Baile, Montréal, Québec H3H 2S6
T 514 939 7001 x 1532
F 514 939 7020
www.cca.qc.ca

Please consider the environment before printing this email
This email may contain confidential information.
If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you.

From preservation.guide at gmail.com Fri May 12 10:30:10 2017
From: preservation.guide at gmail.com (Richard Wright)
Date: Fri, 12 May 2017 14:30:10 +0000
Subject: Re: [Pasig-discuss] Digital repository storage benchmarking
In-Reply-To: <8B597316-5049-40E0-A7C4-4F7431E69E76@cca.qc.ca>
References: <8B597316-5049-40E0-A7C4-4F7431E69E76@cca.qc.ca>
Message-ID:

Tim and all -- quite a few case studies in the presentations at this conference from a few years ago: http://www.digitalpreservation.gov/meetings/storage14.html
> > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at > http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > -- Regards, Richard Richard Wright +44 7724 717 981 preservationguide.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tim.Gollins at nrscotland.gov.uk Fri May 12 10:54:17 2017 From: Tim.Gollins at nrscotland.gov.uk (Tim.Gollins at nrscotland.gov.uk) Date: Fri, 12 May 2017 14:54:17 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <8B597316-5049-40E0-A7C4-4F7431E69E76@cca.qc.ca> Message-ID: Time, Richard Very many thanks from me too ? the answer to this question also helps me understand more in the context of my own recent question on WORM AIPs . All the best Tim Tim Gollins | Head of Digital Archiving and Director of the NRS Digital Preservation Programme National Records of Scotland | West Register House | Edinburgh EH2 4DF + 44 (0)131 535 1431 / + 44 (0)7974 922614 | tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk Preserving the past | Recording the present | Informing the future Follow us on Twitter: @NatRecordsScot | http://twitter.com/NatRecordsScot From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Richard Wright Sent: 12 May 2017 15:30 To: Tim Walsh; pasig-discuss at asis.org Subject: Re: [Pasig-discuss] Digital repository storage benchmarking Tim and all -- quite a few case studies in the presentations at this conference from a few years ago: http://www.digitalpreservation.gov/meetings/storage14.html On Fri, 12 May 2017 at 15:18 Tim Walsh > wrote: Dear PASIG, I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions? configurations online. It?s very possible that this question has been asked before on-list, but I wasn?t able to find anything in the list archives. For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those ?various media? will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we?d like to benchmark our plans against other institutions. I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: * Could you point me to published/available resources outlining other institutions? digital repository storage configurations? * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? 
(any information sent off-list will be kept strictly confidential) Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications are you using to manage the creation and maintainance of backups. Thank you! Tim - - - Tim Walsh Archiviste, Archives num?riques Archivist, Digital Archives Centre Canadien d?Architecture Canadian Centre for Architecture 1920, rue Baile, Montr?al, Qu?bec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca Pensez ? l?environnement avant d?imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n??tes pas le destinataire pr?vu, veuillez nous en aviser imm?diatement. Merci ?galement de supprimer le pr?sent courriel et d?en d?truire toute copie. This email may contain confidential information. If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you. ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -- Regards, Richard Richard Wright +44 7724 717 981 preservationguide.co.uk ______________________________________________________________________ This email has been scanned by the Symantec Email Security.cloud service. For more information please visit http://www.symanteccloud.com ______________________________________________________________________ *********************************** ******************************** This email has been received from an external party and has been swept for the presence of computer viruses. ******************************************************************** ********************************************************************** This e-mail (and any files or other attachments transmitted with it) is intended solely for the attention of the addressee(s). Unauthorised use, disclosure, storage, copying or distribution of any part of this e-mail is not permitted. If you are not the intended recipient please destroy the email, remove any copies from your system and inform the sender immediately by return. Communications with the Scottish Government may be monitored or recorded in order to secure the effective operation of the system and for other lawful purposes. The views or opinions contained within this e-mail may not necessarily reflect those of the Scottish Government. Tha am post-d seo (agus faidhle neo ceanglan c?mhla ris) dhan neach neo luchd-ainmichte a-mh?in. Chan eil e ceadaichte a chleachdadh ann an d?igh sam bith, a? toirt a-steach c?raichean, foillseachadh neo sgaoileadh, gun chead. Ma ?s e is gun d?fhuair sibh seo le gun fhiosd?, bu choir cur ?s dhan phost-d agus lethbhreac sam bith air an t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun d?il. Dh?fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba air a chl?radh neo air a sgr?dadh airson dearbhadh gu bheil an siostam ag obair gu h-?ifeachdach neo airson adhbhar laghail eile. Dh?fhaodadh nach eil beachdan anns a? phost-d seo co-ionann ri beachdan Riaghaltas na h-Alba. 
********************************************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sheila.Morrissey at ithaka.org Fri May 12 13:43:56 2017 From: Sheila.Morrissey at ithaka.org (Sheila Morrissey) Date: Fri, 12 May 2017 17:43:56 +0000 Subject: [Pasig-discuss] FW: Digital repository storage benchmarking References: <8B597316-5049-40E0-A7C4-4F7431E69E76@cca.qc.ca> Message-ID: Hello, Tim, At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. Currently, the Portico archive comprises roughly 77.7 million digital objects (we call them "Archival Units", or AUs), amounting to over 400 TB and made up of 1.3 billion files. We maintain 3 copies of the archive: 2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage. We create and maintain backups (including fixity checks) using our own custom-written software. I hope this is helpful. Best regards, Sheila Sheila M. Morrissey Senior Researcher ITHAKA 100 Campus Drive Suite 100 Princeton NJ 08540 609-986-2221 sheila.morrissey at ithaka.org ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh Sent: Friday, May 12, 2017 10:16 AM To: pasig-discuss at asis.org Subject: [Pasig-discuss] Digital repository storage benchmarking Dear PASIG, I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions' configurations online. It's very possible that this question has been asked before on-list, but I wasn't able to find anything in the list archives. For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those "various media" will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we'd like to benchmark our plans against other institutions. I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: * Could you point me to published/available resources outlining other institutions' digital repository storage configurations? * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (any information sent off-list will be kept strictly confidential) Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications are you using to manage the creation and maintenance of backups. Thank you! 
Tim - - - Tim Walsh Archiviste, Archives num?riques Archivist, Digital Archives Centre Canadien d?Architecture Canadian Centre for Architecture 1920, rue Baile, Montr?al, Qu?bec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca Pensez ? l?environnement avant d?imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n??tes pas le destinataire pr?vu, veuillez nous en aviser imm?diatement. Merci ?galement de supprimer le pr?sent courriel et d?en d?truire toute copie. This email may contain confidential information. If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you. ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss From randy_stern at harvard.edu Fri May 12 13:58:46 2017 From: randy_stern at harvard.edu (Stern, Randy) Date: Fri, 12 May 2017 17:58:46 +0000 Subject: [Pasig-discuss] FW: Digital repository storage benchmarking Message-ID: <85DD9199-4D53-4754-8802-6C3171BD3BC4@harvard.edu> Harvard is similar ? 2 disk copies in geographically distributed sites on, and one tape copy in a third location. We also have a 4th copy on tape in a tape library that is creating the tapes we remove off site to the third location. We run fixity checks on the disk copies, but not the tape copy. We currently have in excess of 200TB for each copy. We currently store preservation and real-time access copies of files in the same storage system with the same storage policies. We expect that to change in the future, with likely delivery copy storage in the cloud. Randy On 5/12/17, 1:43 PM, "Sheila Morrissey" wrote: Hello, Tim, At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. Currently, the Portico archive is comprised of roughly 77.7 million digital objects (we call them "Archival Units", or AUs); comprising over 400 TB; made up of 1.3 billion files. We maintain 3 copies of the archive: 2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage. We create and maintain backups (including fixity checks) using our own custom-written software. I hope this helpful. Best regards, Sheila Sheila M. Morrissey Senior Researcher ITHAKA 100 Campus Drive Suite 100 Princeton NJ 08540 609-986-2221 sheila.morrissey at ithaka.org ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh Sent: Friday, May 12, 2017 10:16 AM To: pasig-discuss at asis.org Subject: [Pasig-discuss] Digital repository storage benchmarking Dear PASIG, I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions? 
configurations online. It?s very possible that this question has been asked before on-list, but I wasn?t able to find anything in the list archives. For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those ?various media? will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we?d like to benchmark our plans against other institutions. I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: * Could you point me to published/available resources outlining other institutions? digital repository storage configurations? * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (any information sent off-list will be kept strictly confidential) Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications are you using to manage the creation and maintainance of backups. Thank you! Tim - - - Tim Walsh Archiviste, Archives num?riques Archivist, Digital Archives Centre Canadien d?Architecture Canadian Centre for Architecture 1920, rue Baile, Montr?al, Qu?bec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca Pensez ? l?environnement avant d?imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n??tes pas le destinataire pr?vu, veuillez nous en aviser imm?diatement. Merci ?galement de supprimer le pr?sent courriel et d?en d?truire toute copie. This email may contain confidential information. If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you. ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss From charlottekostelic at gmail.com Fri May 12 14:23:36 2017 From: charlottekostelic at gmail.com (Charlotte Kostelic) Date: Fri, 12 May 2017 19:23:36 +0100 Subject: [Pasig-discuss] Call for Proposals - NDSA 2017 Pittsburgh, PA Message-ID: The National Digital Stewardship Alliance (NDSA ) invites proposals for Digital Preservation 2017: ?Preservation is Political,? to be held in Pittsburgh, Pennsylvania, October 25-26, 2017. 
Digital Preservation is the major meeting and conference of the NDSA?open to members and non-members alike?focusing on tools, techniques, theories and methodologies for digital stewardship and preservation, data curation, the content lifecycle, and related issues. Our 2017 meeting is held in partnership with our host organization, the Digital Library Federation (DLF ). Separate calls are being issued for the DLF Liberal Arts Colleges Pre-Conference (22 October) and 2017 DLF Forum (23-24 October)?all happening in the same location. Proposals are due by May 22th at 11:59pm Pacific Time. About the NDSA and Digital Preservation 2017: The National Digital Stewardship Alliance is a consortium of more than 160 organizations committed to the long-term preservation and stewardship of digital information and cultural heritage, for the benefit of present and future generations. Digital Preservation 2017 (#digipres17 ) will help to chart future directions for both the NDSA and digital stewardship, and is expected to be a crucial venue for intellectual exchange, community-building, development of best practices, and national-level agenda-setting in the field. The conference will be held at the Westin Convention Center ?where downtown buzz meets restorative sleep?, just blocks from historic Market Square , The Andy Warhol Museum , boutiques, restaurants, and nightlife. The NDSA strives to create a safe, accessible, welcoming, and inclusive event, and will operate under the DLF Forum?s Code of Conduct . Submissions: 250-word proposals describing the presentation/demo/poster are invited (500 words for full panel sessions). Please also include a 50-word short abstract for the program if your submission is selected. Submit proposals online: https://conftool.pro/dlf2017/. Deadline: May 9th, 2017 at 11:59pm PT. We especially encourage proposals that speak to our conference theme, ?Preservation is Political.? This core theme emerged from a discussion of strategic topics, our practice, our mission and the challenges. Submissions are invited in the following lengths and formats: Talks/Demos: Presentations and demonstrations are allocated 30 minutes each. Speakers should reserve time for interactive exchanges on next steps, possible NDSA community action, and discussion or debate. Panels: Panel discussions with 4 or more speakers will be given a dedicated session. Organizers are especially encouraged to include as diverse an array of perspectives and voices as possible, and to reserve time for audience Q&A. Minute Madness: Share your ideas in 60 seconds or less as part of the opening plenary of the conference. Presenters will have the option to display posters during the reception that follows. (Guidelines for poster sizes will be provided on acceptance.) Lunchtime Working Group Meetings: NDSA working and interest group chairs are invited to propose group meetings or targeted collaboration sessions. (Lunch provided.) All submissions will be peer-reviewed by NDSA?s volunteer Program Committee. Presenters will be notified in July and guaranteed a registration slot at the conference. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tab.butler at mlb.com Fri May 12 14:43:32 2017 From: tab.butler at mlb.com (Butler, Tab) Date: Fri, 12 May 2017 18:43:32 +0000 Subject: [Pasig-discuss] FW: Digital repository storage benchmarking In-Reply-To: <85DD9199-4D53-4754-8802-6C3171BD3BC4@harvard.edu> References: <85DD9199-4D53-4754-8802-6C3171BD3BC4@harvard.edu> Message-ID: Tim, At Major League Baseball, we are focused mostly on archiving the broadcast game video feeds, along with pregame, postgame, and individual camera iso feeds for each game. The content includes both the home and away team broadcasts, with and without graphics. Essentially, we record 7 hours plus of content for every 1 hour of baseball played. We also record and archive all the MLB Network content that is produced, which is between 12 and 18 hours of live content per day. We will archive the entire broadcast show of record, and the individual elements that make up a show. All in, we are recording over 1,000 hours of content per day. This equates to 50+ TB of content being added to our archive per day. We have both an active on-line disk tier (2 SANs - each 2.88 PB) for recording, editing, and on-line storage, and a data tape archive that supports Partial File Restore (PFR) of video files. We load balance recording content across the two SANs... American League on one SAN, and National League on the other... and all edits (96 high performance / 54 desktop machines) access both SANs. Once content is written to a SAN, it is auto-archived to tape, as per our DIAMOND asset management system (home grown). We started archiving on LTO-4 in 2008, and are currently on Oracle T10000-D. We are migrating content from LTO-4 to T10K-D tape within a tape group... We have both an 'On-Site' tape sub-group and an 'Off-Site' tape sub-group for each of our Tape Groups. Tape Groups include "Games with Graphics" (Dirty) and "Games without Graphics" (Clean)... the Dirty off-site tapes go to a separate off-site location from the Clean off-site tapes. We break up all of our Off-Site Tape Groups between two geographically distributed locations, as well. We are using the Oracle DIVArchive middleware, which computes a checksum value that is compared to the stored database value each time a file is copied, moved, or restored. We are performing between 1,000 and 2,000 PFRs / restores per day. Currently we have over 45,000 LTO-4s and over 10,000 T10K-D tapes, growing at the rate of 125,000 hours of content per year. If you would like more details regarding archiving video content, feel free to reach out to me. Sincerely, Tab Tab Butler | Sr. Director - Media Management & Post Production | MLB Network | 40 Hartz Way, Suite 10 | Secaucus, NJ 07094 (201) 520-6252 Office | (646) 498-1662 Cell tab.butler at mlb.com -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Stern, Randy Sent: Friday, May 12, 2017 1:59 PM To: Sheila Morrissey ; pasig-discuss at asis.org Subject: Re: [Pasig-discuss] FW: Digital repository storage benchmarking Harvard is similar: 2 disk copies in geographically distributed sites, and one tape copy in a third location. We also have a 4th copy on tape in a tape library that is creating the tapes we remove off site to the third location. We run fixity checks on the disk copies, but not the tape copy. We currently have in excess of 200 TB for each copy. We currently store preservation and real-time access copies of files in the same storage system with the same storage policies. 
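
Several of the configurations described in this thread (Portico, Harvard, MLB Network) rely on fixity checking: recomputing a checksum for each stored file and comparing it against a value recorded at ingest, on every copy, move, or restore. A minimal sketch of that kind of audit is below, in Python; the manifest format, the choice of SHA-256, and all of the names are illustrative assumptions, not any institution's actual software.

    import hashlib
    from pathlib import Path

    def sha256_of(path, chunk_size=1024 * 1024):
        # Stream the file so large audiovisual files are not read into memory at once.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def audit_replica(replica_root, manifest):
        # manifest: dict of relative file path -> expected hex digest recorded at ingest.
        # Returns a list of (relative_path, problem) pairs; an empty list means the replica is clean.
        problems = []
        root = Path(replica_root)
        for rel_path, expected in manifest.items():
            target = root / rel_path
            if not target.is_file():
                problems.append((rel_path, "missing"))
            elif sha256_of(target) != expected:
                problems.append((rel_path, "checksum mismatch"))
        return problems

    # Hypothetical use against two replica mount points:
    # manifest = {"AU-0001/data/report.pdf": "9f86d081884c7d65..."}
    # for root in ("/mnt/onsite", "/mnt/offsite"):
    #     print(root, audit_replica(root, manifest))

Pointed at each replica in turn (on-site disk, off-site disk, a copy restored from tape), a loop like this is roughly the pattern that the scheduled, custom-written fixity tools described above appear to automate.
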
We expect that to change in the future, with likely delivery copy storage in the cloud. Randy On 5/12/17, 1:43 PM, "Sheila Morrissey" wrote: Hello, Tim, At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. Currently, the Portico archive is comprised of roughly 77.7 million digital objects (we call them "Archival Units", or AUs); comprising over 400 TB; made up of 1.3 billion files. We maintain 3 copies of the archive: 2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage. We create and maintain backups (including fixity checks) using our own custom-written software. I hope this helpful. Best regards, Sheila Sheila M. Morrissey Senior Researcher ITHAKA 100 Campus Drive Suite 100 Princeton NJ 08540 609-986-2221 sheila.morrissey at ithaka.org ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh Sent: Friday, May 12, 2017 10:16 AM To: pasig-discuss at asis.org Subject: [Pasig-discuss] Digital repository storage benchmarking Dear PASIG, I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions? configurations online. It?s very possible that this question has been asked before on-list, but I wasn?t able to find anything in the list archives. For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those ?various media? will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we?d like to benchmark our plans against other institutions. I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: * Could you point me to published/available resources outlining other institutions? digital repository storage configurations? * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (any information sent off-list will be kept strictly confidential) Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications are you using to manage the creation and maintainance of backups. Thank you! Tim - - - Tim Walsh Archiviste, Archives num?riques Archivist, Digital Archives Centre Canadien d?Architecture Canadian Centre for Architecture 1920, rue Baile, Montr?al, Qu?bec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca Pensez ? 
l?environnement avant d?imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n??tes pas le destinataire pr?vu, veuillez nous en aviser imm?diatement. Merci ?galement de supprimer le pr?sent courriel et d?en d?truire toute copie. This email may contain confidential information. If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you. ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss From neil at jefferies.org Fri May 12 16:42:55 2017 From: neil at jefferies.org (Neil Jefferies) Date: Fri, 12 May 2017 21:42:55 +0100 Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs In-Reply-To: References: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net> <63d06e35b40be1c7d0ff6e5613950844@mail.gmail.com> Message-ID: <81d799913e0c44c2d1d46d9ddd9fbd23@imap.plus.net> Jacob, This is the key point of my argument - the definition of object you have is not the definition of an object that an archive wants to preserve. I'm speaking for people like Tim and I - others are quite happy to build what I term bit-museums. Likewise, what you consider preservation (immutability of a bitstream) is not quite the same as ours - retention of knowledge content - which requires mutability but with immutable previous versions and provenance/audit records. As long as this disconnect between technology and requirements remains the case, object stores are actually of limited use for us in preservation and archiving without considerable additional work. The 'metadata' that most object stores support (key-value pairs) is pretty useless as far as our metadata requirements go - in the end we have to store XML or triples as separate files/objects. This was an issue when I reviewed the StorageTek 5800 code builds way back and frankly object storage hasn't moved on much. Fedora, for all its faults, does actually provide an object view that is meaningful - something that can be a node in a linked-data graph. It can be arbitrarily complex but equally, could comprise only metadata. It is almost never a file. Neil On 2017-05-12 20:29, Jacob Farmer wrote: > Hi, Neil. Great points. Indeed, hard links only work in a single file > system, but they continue pointing to and fro when a file is otherwise > moved > or renamed. 
> > I personally think of POSIX file systems as object stores that have > weak > addressing, limited metadata, and that offer mutability as the default. > > My preferred definition of an object store is a device that stores > objects. > My preferred definition of an object is any piece of data that can be > individually addressed and manipulated. > So, by that definition, POSIX file systems are object stores, so are > hard > drives. So is Microsoft exchange, etc. > > If you name a file according to a hash or a UUID (the hash could be the > UUID), then you have a form of persistent address. As long as no one > messes > with your file system, the address scheme stays intact. > > > -----Original Message----- > From: Neil Jefferies [mailto:neil at jefferies.org] > Sent: Friday, May 12, 2017 11:25 AM > To: Jacob Farmer > Subject: RE: [Pasig-discuss] WORM (Write Once Read Many) AIPs > > Good point on the housekeeping! > > Most (reasonable) filesystems allow you specify the inode numbers at > creation but yes, it is hard to change afterwards! > > But I would really, really avoid hard links - they only work within a > single > filesystem so they can't be used in tiered or virtual storage systems > and > even break quota controls on regular filesystems. Scale up thus becomes > very > difficult with hard links. Symlinks also make it explicit when you are > dealing with a reference and can tell you which version of the object > held > the original - useful provenance that hard links don't capture. > > My personal feeling is no for hashes, yes for UUID's (or other suitably > unique object ID). This allows us to keep all versions of an object in > the > same root path even though it varies. And don't store at a file level - > this > shotguns object fragments all over the store and make rebuilds > horrible. > Many current object stores do this - and consequently don't version > effectively - I wish people would understand objects are not files. > UUID's > are also consistent in terms of computational time and hashes very much > aren't. > > There's a big difference in robustness between needing just filesystem > metadata to find an object in storage and requiring filesystem metadata > (because underneath all object stores are filesystems - even Seagates > "object" hard drives), object store metadata to map paths to hashes, > and > object metadata to find all the bits that make up a composite object. > > ...and yes, I am saying that most object store vendors have got it > wrong. At > least as far as archiving is concerned. And they ought to consider why > every > object store ends up presenting itself as a POSIX filesystem. > > Neil > > > On 2017-05-12 14:33, Jacob Farmer wrote: >> Two warnings and two suggestions: >> >> Warnings: >> >> 1) Symlinks and Housekeeping -- It is a common practice to use >> symlinks to make versioned file collections. If you do this, you >> should have some kind of housekeeping processes that ensure that the >> symlinks are all working correctly. If files ever have to get >> migrated, symlinks can break. >> >> 2) Check with your file system vendor -- Most removable media file >> systems have some built in limitations on the number of inodes (files) >> that you can have in one file system. If you generate a lot of >> symlinks, you might overwhelm the file system. Your vendor will know. >> >> Suggestions: >> >> 1) Hashes for file names -- If your application software maintains a >> hash for each file, you might consider naming the file according to >> the hash. 
>> Use the first two digits for the parent directory, the next two digits >> for sub-diretory, the next two digits for sub-directory. Then use the >> full hash for the file name. This turns your POSIX file system into >> an object store with uniquely named objects. >> >> As a safeguard, you might maintain a separate table or list that >> associates path names with hashes. >> >> 2) Consider using hard links instead of symlinks -- You might use >> hard links instead of symlinks, presuming that the files are all in >> the same file system. You still have to watch for file count issues, >> but you have less housekeeping to do. >> >> I hope that helps. >> >> >> Jacob Farmer | Chief Technology Officer | Cambridge Computer | >> "Artists In Data Storage" >> Phone 781-250-3210 | jfarmer at CambridgeComputer.com | >> www.CambridgeComputer.com >> >> >> >> >> -----Original Message----- >> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf >> Of Neil Jefferies >> Sent: Friday, May 12, 2017 8:06 AM >> To: Tim.Gollins at nrscotland.gov.uk >> Cc: pasig-discuss at mail.asis.org >> Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs >> >> Tim, >> >> If we store AIP's unpackaged, as a collection of files in a folder, >> then object updates could just be a new folder with symlinks to the >> unchanged parts and the updated parts in place in the folder. The >> object "location" >> would be a parent folder for all these version folders - for example, >> a pairtree (or triple-tree for faster scanning/rebuilds) based on >> object UUID. >> Version folders would be named accoprding to date or version number >> (date might make Memento compliant access simpler). >> Creating anew version clones the current verion (including links) with >> a new name and then replaces the updated parts in situ. Final act is >> to update a "current" symlink in the object. Any update failure will >> mean "current" >> is >> not updated an the partial clone can be discarded. >> >> This assumes most updates are metadata and that a diff won't save much >> compared to a complete new XML file or whatever. I am also assuming >> that metadata won't be wrappered either (so you can forget METS) so >> that different types are stored in the most stuiable format and are >> accessed only when required. The problems with roundtripping packaged >> AIP's for updates rather than diff-ing are repeated by METS >> wrappering. >> >> These may be a virtual folder/filesytem presentation and underneath an >> HSM would retrieve files from wherever when it is actually accessed. >> HSM policy in soemthing like SAM-QFS/Versity/Cray TAS can ensure >> folders are kep intact when moved to other storage (we could even >> dereference symlinks when dealing with tape). >> >> This can be done with a POSIX filesystem and not muich code - Ben >> O'Steen started something along these lines here: >> https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-what >> -does-it-do%3F >> >> Fedora also also a versioning object store that could support this >> kind of model but also adds a fair bit of complexity to be >> Linked-Data_platform compliant. >> >> In my paralance I would probably equate "Minimal Ingest" with "Sheer >> Curation" and APT with Asynchronous Message Driven Workers. 
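
Jacob's first suggestion quoted above, naming each file after its hash and fanning the leading character pairs out into directory levels, is straightforward to sketch. The fragment below is only an illustration of that layout under stated assumptions (SHA-256 as the hash, three two-character directory levels, invented function names); it is not code from any existing object store.

    import hashlib
    import shutil
    from pathlib import Path

    def sharded_path(store_root, hex_digest):
        # root/ab/cd/ef/<full digest>: the first three character pairs become
        # directory levels and the full digest becomes the file name.
        return Path(store_root, hex_digest[0:2], hex_digest[2:4], hex_digest[4:6], hex_digest)

    def ingest_file(store_root, source_path):
        # Hash the file, copy it into the store under its digest, and return both.
        digest = hashlib.sha256(Path(source_path).read_bytes()).hexdigest()
        destination = sharded_path(store_root, digest)
        destination.parent.mkdir(parents=True, exist_ok=True)
        if not destination.exists():  # identical content ends up stored only once
            shutil.copy2(source_path, destination)
        return digest, destination

    # digest, stored_at = ingest_file("/archive/objects", "accession/1234/letter.tif")

As the suggestion also notes, a separate table associating original path names with digests is a sensible safeguard; an append-only text file is enough for that purpose.
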
>> >> Neil >> >> >> On 2017-05-12 12:33, Tim.Gollins at nrscotland.gov.uk wrote: >>> Dear PASIG >>> >>> I have been thinking recently about the challenge of managing >>> "physical" AIPs on offline or near line storage and how to optimise >>> or simplify the use of managed storage media in a tape based >>> (robotic) Hierarchical Storage Management (HSM) system. By "physical" >>> AIPs I mean that the actual structure of the AIP written to the >>> storage system is sufficiently self-describing that even if the >>> management or other elements of a DP system were to be lost to a >>> disaster then the entire collection could be fully re-instated >>> reliably from the stored AIPs alone. >>> >>> I have also been thinking about the huge benefits of adopting the >>> concepts of "Minimal Ingest" (MI) and "Autonomous Preservation Tools" >>> (APT) in a new Digital Archive solution. >>> >>> One of the potential effects of the MI and APT concepts is that over >>> time it is clear that while (of course) the original bit streams will >>> never need to be updated, the metadata packaged in the AIP will need >>> to change relatively often (through the life of the AIP) . This is of >>> course in addition to any new renderings of the bit streams produced >>> for preservation purposes (manifestations as termed in some systems). >>> >>> If to update the AIP the process involves the AIP being "loaded" and >>> "Modified" and "Stored" again as a whole then this will result in >>> significant "churn" of the offline or near line media (i.e. tapes) in >>> a HSM - which I would like to avoid. I think it would be really great >>> if the AIP representation could accommodate the concept of an "update >>> IP" (perhaps UIP?) where the UIP contains a "delta" of the original >>> AIP - the full AIP then being interpreted as the original as modified >>> by a series of deltas. This would then effectively result in AIPs >>> (and >>> UIPs) becoming WORM objects with clear benefits that I perceive in >>> managing their reliable and safe storage. >>> >>> I am not sufficiently familiar with the detail of all the different >>> AIP models or implementations, I was wondering if anyone in the team >>> would be able to comment on whether the they know of any AIP models, >>> specifications or implementations that would support such a use >>> case. >>> >>> I have just posted a version of this question to the E-Ark Linked in >>> Group so my apologies to those who see it twice. >>> >>> Many thanks >>> >>> Tim >>> Tim Gollins | Head of Digital Archiving and Director of the NRS >>> Digital Preservation Programme National Records of Scotland | West >>> Register House | Edinburgh EH2 4DF >>> + 44 (0)131 535 1431 / + 44 (0)7974 922614 | >>> tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk >>> >>> Preserving the past | Recording the present | Informing the future >>> Follow us on Twitter: @NatRecordsScot | >>> http://twitter.com/NatRecordsScot >>> >>> >>> ********************************************************************* >>> * This e-mail (and any files or other attachments transmitted with >>> it) is intended solely for the attention of the addressee(s). >>> Unauthorised use, disclosure, storage, copying or distribution of any >>> part of this e-mail is not permitted. If you are not the intended >>> recipient please destroy the email, remove any copies from your >>> system and inform the sender immediately by return. 
>>> >>> Communications with the Scottish Government may be monitored or >>> recorded in order to secure the effective operation of the system and >>> for other lawful purposes. The views or opinions contained within >>> this e-mail may not necessarily reflect those of the Scottish >>> Government. >>> >>> >>> Tha am post-d seo (agus faidhle neo ceanglan c?mhla ris) dhan neach >>> neo luchd-ainmichte a-mh?in. Chan eil e ceadaichte a chleachdadh ann >>> an d?igh sam bith, a? toirt a-steach c?raichean, foillseachadh neo >>> sgaoileadh, gun chead. Ma ?s e is gun d?fhuair sibh seo le gun >>> fhiosd?, bu choir cur ?s dhan phost-d agus lethbhreac sam bith air an >>> t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun d?il. >>> >>> Dh?fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba >>> air a chl?radh neo air a sgr?dadh airson dearbhadh gu bheil an >>> siostam ag obair gu h-?ifeachdach neo airson adhbhar laghail eile. >>> Dh?fhaodadh nach eil beachdan anns a? phost-d seo co-ionann ri >>> beachdan Riaghaltas na h-Alba. >>> ********************************************************************* >>> * >>> >>> >>> >>> ---- >>> To subscribe, unsubscribe, or modify your subscription, please visit >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >>> _______ >>> PASIG Webinars and conference material is at >>> http://www.preservationandarchivingsig.org/index.html >>> _______________________________________________ >>> Pasig-discuss mailing list >>> Pasig-discuss at mail.asis.org >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >> >> ---- >> To subscribe, unsubscribe, or modify your subscription, please visit >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> _______ >> PASIG Webinars and conference material is at >> http://www.preservationandarchivingsig.org/index.html >> _______________________________________________ >> Pasig-discuss mailing list >> Pasig-discuss at mail.asis.org >> http://mail.asis.org/mailman/listinfo/pasig-discuss From jfarmer at cambridgecomputer.com Fri May 12 16:51:06 2017 From: jfarmer at cambridgecomputer.com (Jacob Farmer) Date: Fri, 12 May 2017 16:51:06 -0400 Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs In-Reply-To: <81d799913e0c44c2d1d46d9ddd9fbd23@imap.plus.net> References: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net> <63d06e35b40be1c7d0ff6e5613950844@mail.gmail.com> <81d799913e0c44c2d1d46d9ddd9fbd23@imap.plus.net> Message-ID: <945351555fe73d699a190d9d7d4fd135@mail.gmail.com> Great point. I think of the whole things as a stack. There is the metadata and bits that defines an object from the preservation point of view. Then there is a storage device that defines an object a specific set of bits to serve up. In the case of my software, Starfish, we think of ourselves as a middleware that can define the object in some intermediate form. At the end of the day, though, an object is any piece of data that can be addressed and manipulated. That piece of data should have a permanent address, unique identifiers, and some metadata that gives it meaning. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Neil Jefferies Sent: Friday, May 12, 2017 4:43 PM To: Jacob Farmer Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs Jacob, This is the key point of my argument - the definition of object you have is not the definition of an object that an archive wants to preserve. I'm speaking for people like Tim and I - others are quite happy to build what I term bit-museums. 
Likewise, what you consider preservation (immutability of a bitstream) is not quite the same as ours - retention of knowledge content - which requires mutability but with immutable previous versions and provenance/audit records. As long as this disconnect between technology and requirements remains the case, object stores are actually of limited use for us in preservation and archiving without considerable additional work. The 'metadata' that most object stores support (key-value pairs) is pretty useless as far as our metadata requirements go - in the end we have to store XML or triples as separate files/objects. This was an issue when I reviewed the StorageTek 5800 code builds way back and frankly object storage hasn't moved on much. Fedora, for all its faults, does actually provide an object view that is meaningful - something that can be a node in a linked-data graph. It can be arbitrarily complex but equally, could comprise only metadata. It is almost never a file. Neil On 2017-05-12 20:29, Jacob Farmer wrote: > Hi, Neil. Great points. Indeed, hard links only work in a single > file system, but they continue pointing to and fro when a file is > otherwise moved or renamed. > > I personally think of POSIX file systems as object stores that have > weak addressing, limited metadata, and that offer mutability as the > default. > > My preferred definition of an object store is a device that stores > objects. > My preferred definition of an object is any piece of data that can be > individually addressed and manipulated. > So, by that definition, POSIX file systems are object stores, so are > hard drives. So is Microsoft exchange, etc. > > If you name a file according to a hash or a UUID (the hash could be > the UUID), then you have a form of persistent address. As long as no > one messes with your file system, the address scheme stays intact. > > > -----Original Message----- > From: Neil Jefferies [mailto:neil at jefferies.org] > Sent: Friday, May 12, 2017 11:25 AM > To: Jacob Farmer > Subject: RE: [Pasig-discuss] WORM (Write Once Read Many) AIPs > > Good point on the housekeeping! > > Most (reasonable) filesystems allow you specify the inode numbers at > creation but yes, it is hard to change afterwards! > > But I would really, really avoid hard links - they only work within a > single filesystem so they can't be used in tiered or virtual storage > systems and even break quota controls on regular filesystems. Scale up > thus becomes very difficult with hard links. Symlinks also make it > explicit when you are dealing with a reference and can tell you which > version of the object held the original - useful provenance that hard > links don't capture. > > My personal feeling is no for hashes, yes for UUID's (or other > suitably unique object ID). This allows us to keep all versions of an > object in the same root path even though it varies. And don't store at > a file level - this shotguns object fragments all over the store and > make rebuilds horrible. > Many current object stores do this - and consequently don't version > effectively - I wish people would understand objects are not files. > UUID's > are also consistent in terms of computational time and hashes very > much aren't. 
> > There's a big difference in robustness between needing just filesystem > metadata to find an object in storage and requiring filesystem > metadata (because underneath all object stores are filesystems - even > Seagates "object" hard drives), object store metadata to map paths to > hashes, and object metadata to find all the bits that make up a > composite object. > > ...and yes, I am saying that most object store vendors have got it > wrong. At least as far as archiving is concerned. And they ought to > consider why every object store ends up presenting itself as a POSIX > filesystem. > > Neil > > > On 2017-05-12 14:33, Jacob Farmer wrote: >> Two warnings and two suggestions: >> >> Warnings: >> >> 1) Symlinks and Housekeeping -- It is a common practice to use >> symlinks to make versioned file collections. If you do this, you >> should have some kind of housekeeping processes that ensure that the >> symlinks are all working correctly. If files ever have to get >> migrated, symlinks can break. >> >> 2) Check with your file system vendor -- Most removable media file >> systems have some built in limitations on the number of inodes >> (files) that you can have in one file system. If you generate a lot >> of symlinks, you might overwhelm the file system. Your vendor will know. >> >> Suggestions: >> >> 1) Hashes for file names -- If your application software maintains a >> hash for each file, you might consider naming the file according to >> the hash. >> Use the first two digits for the parent directory, the next two >> digits for sub-diretory, the next two digits for sub-directory. Then >> use the full hash for the file name. This turns your POSIX file >> system into an object store with uniquely named objects. >> >> As a safeguard, you might maintain a separate table or list that >> associates path names with hashes. >> >> 2) Consider using hard links instead of symlinks -- You might use >> hard links instead of symlinks, presuming that the files are all in >> the same file system. You still have to watch for file count issues, >> but you have less housekeeping to do. >> >> I hope that helps. >> >> >> Jacob Farmer | Chief Technology Officer | Cambridge Computer | >> "Artists In Data Storage" >> Phone 781-250-3210 | jfarmer at CambridgeComputer.com | >> www.CambridgeComputer.com >> >> >> >> >> -----Original Message----- >> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf >> Of Neil Jefferies >> Sent: Friday, May 12, 2017 8:06 AM >> To: Tim.Gollins at nrscotland.gov.uk >> Cc: pasig-discuss at mail.asis.org >> Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs >> >> Tim, >> >> If we store AIP's unpackaged, as a collection of files in a folder, >> then object updates could just be a new folder with symlinks to the >> unchanged parts and the updated parts in place in the folder. The >> object "location" >> would be a parent folder for all these version folders - for example, >> a pairtree (or triple-tree for faster scanning/rebuilds) based on >> object UUID. >> Version folders would be named accoprding to date or version number >> (date might make Memento compliant access simpler). >> Creating anew version clones the current verion (including links) >> with a new name and then replaces the updated parts in situ. Final >> act is to update a "current" symlink in the object. Any update >> failure will mean "current" >> is >> not updated an the partial clone can be discarded. 
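
The versioning scheme described just above (clone the current version folder as symlinks, write only the changed files into the clone, and only then repoint a "current" symlink) can be sketched in a few lines. This is a rough illustration assuming that versions live in sibling folders under one object folder, that "current" is a relative symlink to the latest of them, and that at least one version already exists; it is not code from RDFDatabank, Fedora, or any other system.

    import os
    from pathlib import Path

    def create_new_version(object_root, version_name, updated_files):
        # updated_files: dict of file name -> path of the replacement content.
        object_root = Path(object_root)
        current = object_root / "current"              # relative symlink, e.g. -> "v0002"
        previous = object_root / os.readlink(current)  # the version folder it points at
        new_version = object_root / version_name
        new_version.mkdir()

        # Unchanged parts become symlinks into the previous version, which also
        # records which version held the original copy of each file.
        for entry in previous.iterdir():
            if entry.name not in updated_files:
                (new_version / entry.name).symlink_to(Path("..") / previous.name / entry.name)

        # Updated parts are written in place in the new folder.
        for name, source in updated_files.items():
            (new_version / name).write_bytes(Path(source).read_bytes())

        # Final act: repoint "current". If anything above failed we never reach
        # this line, so "current" still points at the old version and the
        # partial clone can simply be deleted.
        tmp = object_root / "current.tmp"
        tmp.symlink_to(version_name)
        tmp.replace(current)

Because the swap of "current" is a single rename, readers either see the old complete version or the new complete version, never a half-built one.
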
>> >> This assumes most updates are metadata and that a diff won't save >> much compared to a complete new XML file or whatever. I am also >> assuming that metadata won't be wrappered either (so you can forget >> METS) so that different types are stored in the most stuiable format >> and are accessed only when required. The problems with roundtripping >> packaged AIP's for updates rather than diff-ing are repeated by METS >> wrappering. >> >> These may be a virtual folder/filesytem presentation and underneath >> an HSM would retrieve files from wherever when it is actually accessed. >> HSM policy in soemthing like SAM-QFS/Versity/Cray TAS can ensure >> folders are kep intact when moved to other storage (we could even >> dereference symlinks when dealing with tape). >> >> This can be done with a POSIX filesystem and not muich code - Ben >> O'Steen started something along these lines here: >> https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-wha >> t >> -does-it-do%3F >> >> Fedora also also a versioning object store that could support this >> kind of model but also adds a fair bit of complexity to be >> Linked-Data_platform compliant. >> >> In my paralance I would probably equate "Minimal Ingest" with "Sheer >> Curation" and APT with Asynchronous Message Driven Workers. >> >> Neil >> >> >> On 2017-05-12 12:33, Tim.Gollins at nrscotland.gov.uk wrote: >>> Dear PASIG >>> >>> I have been thinking recently about the challenge of managing >>> "physical" AIPs on offline or near line storage and how to optimise >>> or simplify the use of managed storage media in a tape based >>> (robotic) Hierarchical Storage Management (HSM) system. By "physical" >>> AIPs I mean that the actual structure of the AIP written to the >>> storage system is sufficiently self-describing that even if the >>> management or other elements of a DP system were to be lost to a >>> disaster then the entire collection could be fully re-instated >>> reliably from the stored AIPs alone. >>> >>> I have also been thinking about the huge benefits of adopting the >>> concepts of "Minimal Ingest" (MI) and "Autonomous Preservation Tools" >>> (APT) in a new Digital Archive solution. >>> >>> One of the potential effects of the MI and APT concepts is that over >>> time it is clear that while (of course) the original bit streams >>> will never need to be updated, the metadata packaged in the AIP will >>> need to change relatively often (through the life of the AIP) . This >>> is of course in addition to any new renderings of the bit streams >>> produced for preservation purposes (manifestations as termed in some >>> systems). >>> >>> If to update the AIP the process involves the AIP being "loaded" and >>> "Modified" and "Stored" again as a whole then this will result in >>> significant "churn" of the offline or near line media (i.e. tapes) >>> in a HSM - which I would like to avoid. I think it would be really >>> great if the AIP representation could accommodate the concept of an >>> "update IP" (perhaps UIP?) where the UIP contains a "delta" of the >>> original AIP - the full AIP then being interpreted as the original >>> as modified by a series of deltas. This would then effectively >>> result in AIPs (and >>> UIPs) becoming WORM objects with clear benefits that I perceive in >>> managing their reliable and safe storage. 
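
The "update IP" (UIP) idea quoted above reads naturally as a base AIP manifest plus an ordered series of write-once deltas, with the original packages never rewritten on tape. A rough sketch of how a reader might resolve the current state follows; the delta structure (an "upsert" map plus a "remove" list) is invented purely to illustrate the idea and is not a feature of any existing AIP specification.

    def resolve_aip_state(base_manifest, update_ips):
        # base_manifest: dict of logical name -> reference (bitstream checksum,
        # metadata file, rendering, ...), as recorded in the original AIP.
        # update_ips: the UIP deltas in the order they were written, each a dict
        # with an optional "upsert" map and an optional "remove" list.
        state = dict(base_manifest)
        for uip in update_ips:
            state.update(uip.get("upsert", {}))
            for name in uip.get("remove", []):
                state.pop(name, None)
        return state

    # Hypothetical example: metadata refreshed twice, original bitstream untouched.
    # current = resolve_aip_state(
    #     {"payload/report.doc": "sha256:9f86d0...", "metadata/ead.xml": "v1"},
    #     [{"upsert": {"metadata/ead.xml": "v2"}},
    #      {"upsert": {"metadata/premis.xml": "v1"}}],
    # )
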
>>> >>> I am not sufficiently familiar with the detail of all the different >>> AIP models or implementations, I was wondering if anyone in the team >>> would be able to comment on whether the they know of any AIP models, >>> specifications or implementations that would support such a use >>> case. >>> >>> I have just posted a version of this question to the E-Ark Linked in >>> Group so my apologies to those who see it twice. >>> >>> Many thanks >>> >>> Tim >>> Tim Gollins | Head of Digital Archiving and Director of the NRS >>> Digital Preservation Programme National Records of Scotland | West >>> Register House | Edinburgh EH2 4DF >>> + 44 (0)131 535 1431 / + 44 (0)7974 922614 | >>> tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk >>> >>> Preserving the past | Recording the present | Informing the future >>> Follow us on Twitter: @NatRecordsScot | >>> http://twitter.com/NatRecordsScot >>> >>> >>> ******************************************************************** >>> * >>> * This e-mail (and any files or other attachments transmitted with >>> it) is intended solely for the attention of the addressee(s). >>> Unauthorised use, disclosure, storage, copying or distribution of >>> any part of this e-mail is not permitted. If you are not the >>> intended recipient please destroy the email, remove any copies from >>> your system and inform the sender immediately by return. >>> >>> Communications with the Scottish Government may be monitored or >>> recorded in order to secure the effective operation of the system >>> and for other lawful purposes. The views or opinions contained >>> within this e-mail may not necessarily reflect those of the Scottish >>> Government. >>> >>> >>> Tha am post-d seo (agus faidhle neo ceanglan c?mhla ris) dhan neach >>> neo luchd-ainmichte a-mh?in. Chan eil e ceadaichte a chleachdadh ann >>> an d?igh sam bith, a? toirt a-steach c?raichean, foillseachadh neo >>> sgaoileadh, gun chead. Ma ?s e is gun d?fhuair sibh seo le gun >>> fhiosd?, bu choir cur ?s dhan phost-d agus lethbhreac sam bith air >>> an t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun d?il. >>> >>> Dh?fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba >>> air a chl?radh neo air a sgr?dadh airson dearbhadh gu bheil an >>> siostam ag obair gu h-?ifeachdach neo airson adhbhar laghail eile. >>> Dh?fhaodadh nach eil beachdan anns a? phost-d seo co-ionann ri >>> beachdan Riaghaltas na h-Alba. 
>>> ******************************************************************** >>> * >>> * >>> >>> >>> >>> ---- >>> To subscribe, unsubscribe, or modify your subscription, please visit >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >>> _______ >>> PASIG Webinars and conference material is at >>> http://www.preservationandarchivingsig.org/index.html >>> _______________________________________________ >>> Pasig-discuss mailing list >>> Pasig-discuss at mail.asis.org >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >> >> ---- >> To subscribe, unsubscribe, or modify your subscription, please visit >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> _______ >> PASIG Webinars and conference material is at >> http://www.preservationandarchivingsig.org/index.html >> _______________________________________________ >> Pasig-discuss mailing list >> Pasig-discuss at mail.asis.org >> http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss From twalsh at cca.qc.ca Fri May 12 17:15:15 2017 From: twalsh at cca.qc.ca (Tim Walsh) Date: Fri, 12 May 2017 21:15:15 +0000 Subject: [Pasig-discuss] FW: Digital repository storage benchmarking In-Reply-To: References: <85DD9199-4D53-4754-8802-6C3171BD3BC4@harvard.edu> Message-ID: <25E51070-A934-448B-ADDD-B33AFE88F8A4@cca.qc.ca> Thank you to Tab, Randy, Sheila, Richard, et al. Very interesting and helpful responses! Best, Tim - - - Tim Walsh Archiviste, Archives num?riques Archivist, Digital Archives Centre Canadien d?Architecture Canadian Centre for Architecture T 514 939 7001 x 1532 www.cca.qc.ca On 2017-05-12, 2:43 PM, "Pasig-discuss on behalf of Butler, Tab" wrote: Tim, At Major League Baseball, we are focused mostly on archiving the broadcast game video feeds, along with pregame, postgame, and individual camera iso feeds for each game. The content includes both the home and away team broadcasts, with and without graphics. Essentially, we record 7 hours plus of content for every 1 hour of baseball played. We also record and archive all the MLB Network content that is produced, which is between 12 - 18 hours of live content per day. We will archive the entire broadcast show of record, and the individual elements that make up a show. All in, we are recording over 1,000 hours of content per day. This equates to 50+ TB of content being added to our archive per day. We have both an active on-line disk tier (2 SAN's - each 2.88 PB) for recording, editing, and on-line storage, and a data tape archive that supports Partial File Restore (PFR) of video files. We load balance recording content across the two SAN's... American League on one SAN, and National League on the other... and all edits (96 high performance / 54 desktop machines) access both SAN's. Once content is written to a SAN, it is auto archived to tape, as per our DIAMOND asset management system (home grown). We started archiving on LTO-4 in 2008, and are currently on Oracle T10000-D. We are migrating content from LTO-4 to T10K-D tape within a tape group... We have both an 'On-Site' tape sub-group, and an 'Off-Site' tape sub-group for each of our Tape Groups. 
Tape Groups include "Games with Graphics" (Dirty) and "Games without Graphics" (Clean)... the Dirty off-site tapes go to a separate off site location than the Clean off-site tapes. We break up all of our Off-Site Tape Groups between two geographically distributed locations, as well. We are using the Oracle DIVArichive middleware, which performs a checksum value that is compared to the stored database value, each time a file is copied, moved, or restored. We are performing between 1,000 to 2,000 PFR / Restores per day. Currently we have over 45,000 LTO-4's and over 10,000 T10K-D tapes, growing at the rate of 125,000 hours of content per year. If you would like more details regarding archiving video content, feel free to reach out to me. Sincerely, Tab Tab Butler | Sr. Director - Media Management & Post Production| MLB Network | 40 Hartz Way, Suite 10 | Secaucus, NJ 07094 (201) 520-6252 Office | (646) 498-1662 Cell tab.butler at mlb.com -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Stern, Randy Sent: Friday, May 12, 2017 1:59 PM To: Sheila Morrissey ; pasig-discuss at asis.org Subject: Re: [Pasig-discuss] FW: Digital repository storage benchmarking Harvard is similar ? 2 disk copies in geographically distributed sites on, and one tape copy in a third location. We also have a 4th copy on tape in a tape library that is creating the tapes we remove off site to the third location. We run fixity checks on the disk copies, but not the tape copy. We currently have in excess of 200TB for each copy. We currently store preservation and real-time access copies of files in the same storage system with the same storage policies. We expect that to change in the future, with likely delivery copy storage in the cloud. Randy On 5/12/17, 1:43 PM, "Sheila Morrissey" wrote: Hello, Tim, At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. Currently, the Portico archive is comprised of roughly 77.7 million digital objects (we call them "Archival Units", or AUs); comprising over 400 TB; made up of 1.3 billion files. We maintain 3 copies of the archive: 2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage. We create and maintain backups (including fixity checks) using our own custom-written software. I hope this helpful. Best regards, Sheila Sheila M. Morrissey Senior Researcher ITHAKA 100 Campus Drive Suite 100 Princeton NJ 08540 609-986-2221 sheila.morrissey at ithaka.org ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh Sent: Friday, May 12, 2017 10:16 AM To: pasig-discuss at asis.org Subject: [Pasig-discuss] Digital repository storage benchmarking Dear PASIG, I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions? configurations online. It?s very possible that this question has been asked before on-list, but I wasn?t able to find anything in the list archives. 
For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those "various media" will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we'd like to benchmark our plans against other institutions. I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: * Could you point me to published/available resources outlining other institutions' digital repository storage configurations? * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (any information sent off-list will be kept strictly confidential) Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications are you using to manage the creation and maintenance of backups. Thank you! Tim - - - Tim Walsh Archiviste, Archives numériques Archivist, Digital Archives Centre Canadien d'Architecture Canadian Centre for Architecture 1920, rue Baile, Montréal, Québec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca Pensez à l'environnement avant d'imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n'êtes pas le destinataire prévu, veuillez nous en aviser immédiatement. Merci également de supprimer le présent courriel et d'en détruire toute copie. This email may contain confidential information. If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you.
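Several of the configurations described in this thread rely on scheduled fixity checks against stored checksum values (for example, Portico's custom-written software, or the checksum comparison performed on every copy and restore in the MLB setup). A minimal sketch of such an audit, assuming a plain-text manifest of expected SHA-256 digests; the manifest format and paths here are illustrative, not any institution's actual tooling.

import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large objects do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit(manifest_path, storage_root):
    """Compare every object listed in the manifest ('<hex digest>  <relative path>' per line)
    against what is actually on disk; report missing and corrupted files."""
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        target = Path(storage_root) / rel_path
        if not target.is_file():
            failures.append((rel_path, "missing"))
        elif sha256_of(target) != expected:
            failures.append((rel_path, "checksum mismatch"))
    return failures

if __name__ == "__main__":
    for rel_path, problem in audit("manifest-sha256.txt", "/archive/objects"):
        print(f"{problem}: {rel_path}")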
---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss From jmorley at stanford.edu Fri May 12 17:28:30 2017 From: jmorley at stanford.edu (Julian M. Morley) Date: Fri, 12 May 2017 21:28:30 +0000 Subject: [Pasig-discuss] FW: Digital repository storage benchmarking In-Reply-To: <25E51070-A934-448B-ADDD-B33AFE88F8A4@cca.qc.ca> References: <85DD9199-4D53-4754-8802-6C3171BD3BC4@harvard.edu> <25E51070-A934-448B-ADDD-B33AFE88F8A4@cca.qc.ca> Message-ID: Tim, Moab - used here at Stanford Libraries - is a POSIX-based paradigm that allows incremental updates without involving symlinks. We use it in conjunction with UUIDs (not hashes) and Fedora to define the AIPs used in the Stanford Digital Repository. There?s a white paper describing Moab here: http://journal.code4lib.org/articles/8482#2.5 -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries On 5/12/17, 2:15 PM, "Pasig-discuss on behalf of Tim Walsh" wrote: >Thank you to Tab, Randy, Sheila, Richard, et al. Very interesting and helpful responses! > >Best, >Tim > >- - - > >Tim Walsh >Archiviste, Archives num?riques >Archivist, Digital Archives > >Centre Canadien d?Architecture >Canadian Centre for Architecture >T 514 939 7001 x 1532 >www.cca.qc.ca > >On 2017-05-12, 2:43 PM, "Pasig-discuss on behalf of Butler, Tab" wrote: > > Tim, > > At Major League Baseball, we are focused mostly on archiving the broadcast game video feeds, along with pregame, postgame, and individual camera iso feeds for each game. The content includes both the home and away team broadcasts, with and without graphics. Essentially, we record 7 hours plus of content for every 1 hour of baseball played. We also record and archive all the MLB Network content that is produced, which is between 12 - 18 hours of live content per day. We will archive the entire broadcast show of record, and the individual elements that make up a show. > > All in, we are recording over 1,000 hours of content per day. 
This equates to 50+ TB of content being added to our archive per day. > > We have both an active on-line disk tier (2 SAN's - each 2.88 PB) for recording, editing, and on-line storage, and a data tape archive that supports Partial File Restore (PFR) of video files. We load balance recording content across the two SAN's... American League on one SAN, and National League on the other... and all edits (96 high performance / 54 desktop machines) access both SAN's. > > Once content is written to a SAN, it is auto archived to tape, as per our DIAMOND asset management system (home grown). We started archiving on LTO-4 in 2008, and are currently on Oracle T10000-D. We are migrating content from LTO-4 to T10K-D tape within a tape group... > > We have both an 'On-Site' tape sub-group, and an 'Off-Site' tape sub-group for each of our Tape Groups. Tape Groups include "Games with Graphics" (Dirty) and "Games without Graphics" (Clean)... the Dirty off-site tapes go to a separate off site location than the Clean off-site tapes. We break up all of our Off-Site Tape Groups between two geographically distributed locations, as well. > > We are using the Oracle DIVArichive middleware, which performs a checksum value that is compared to the stored database value, each time a file is copied, moved, or restored. We are performing between 1,000 to 2,000 PFR / Restores per day. > > Currently we have over 45,000 LTO-4's and over 10,000 T10K-D tapes, growing at the rate of 125,000 hours of content per year. > > If you would like more details regarding archiving video content, feel free to reach out to me. > > Sincerely, > > Tab > > > > Tab Butler | Sr. Director - Media Management & Post Production| MLB Network | 40 Hartz Way, Suite 10 | Secaucus, NJ 07094 > (201) 520-6252 Office | (646) 498-1662 Cell > > tab.butler at mlb.com > > > -----Original Message----- > From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Stern, Randy > Sent: Friday, May 12, 2017 1:59 PM > To: Sheila Morrissey ; pasig-discuss at asis.org > Subject: Re: [Pasig-discuss] FW: Digital repository storage benchmarking > > Harvard is similar ? 2 disk copies in geographically distributed sites on, and one tape copy in a third location. We also have a 4th copy on tape in a tape library that is creating the tapes we remove off site to the third location. We run fixity checks on the disk copies, but not the tape copy. We currently have in excess of 200TB for each copy. > > We currently store preservation and real-time access copies of files in the same storage system with the same storage policies. We expect that to change in the future, with likely delivery copy storage in the cloud. > > Randy > > On 5/12/17, 1:43 PM, "Sheila Morrissey" wrote: > > > Hello, Tim, > > At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. > > Currently, the Portico archive is comprised of roughly 77.7 million digital objects (we call them "Archival Units", or AUs); comprising over 400 TB; made up of 1.3 billion files. > > We maintain 3 copies of the archive: 2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage. We create and maintain backups (including fixity checks) using our own custom-written software. > > I hope this helpful. > > Best regards, > Sheila > > > Sheila M. 
Morrissey > Senior Researcher > ITHAKA > 100 Campus Drive > Suite 100 > Princeton NJ 08540 > 609-986-2221 > sheila.morrissey at ithaka.org > > ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. > > > > -----Original Message----- > From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh > Sent: Friday, May 12, 2017 10:16 AM > To: pasig-discuss at asis.org > Subject: [Pasig-discuss] Digital repository storage benchmarking > > Dear PASIG, > > I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions? configurations online. It?s very possible that this question has been asked before on-list, but I wasn?t able to find anything in the list archives. > > For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those ?various media? will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we?d like to benchmark our plans against other institutions. > > I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: > > * Could you point me to published/available resources outlining other institutions? digital repository storage configurations? > * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (any information sent off-list will be kept strictly confidential) > > Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications are you using to manage the creation and maintainance of backups. > > Thank you! > Tim > > - - - > > Tim Walsh > Archiviste, Archives num?riques > Archivist, Digital Archives > > Centre Canadien d?Architecture > Canadian Centre for Architecture > 1920, rue Baile, Montr?al, Qu?bec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca > > > Pensez ? l?environnement avant d?imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n??tes pas le destinataire pr?vu, veuillez nous en aviser imm?diatement. Merci ?galement de supprimer le pr?sent courriel et d?en d?truire toute copie. > This email may contain confidential information. If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you. 
> > ---- > To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > > >---- >To subscribe, unsubscribe, or modify your subscription, please visit >http://mail.asis.org/mailman/listinfo/pasig-discuss >_______ >PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html >_______________________________________________ >Pasig-discuss mailing list >Pasig-discuss at mail.asis.org >http://mail.asis.org/mailman/listinfo/pasig-discuss From jmorley at stanford.edu Fri May 12 17:36:55 2017 From: jmorley at stanford.edu (Julian M. Morley) Date: Fri, 12 May 2017 21:36:55 +0000 Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs Message-ID: <53A9BB8F-F59C-4370-A7CC-BC490969BAEA@stanford.edu> Apologies - I replied to the wrong Tim! -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries On 5/12/17, 2:28 PM, "Julian M. Morley" wrote: > >Tim, > >Moab - used here at Stanford Libraries - is a POSIX-based paradigm that allows incremental updates without involving symlinks. We use it in conjunction with UUIDs (not hashes) and Fedora to define the AIPs used in the Stanford Digital Repository. > >There?s a white paper describing Moab here: >http://journal.code4lib.org/articles/8482#2.5 > > >-- >Julian M. 
Morley >Technology Infrastructure Manager >Digital Library Systems & Services >Stanford University Libraries > > From jonathan.tilbury at preservica.com Sun May 14 06:54:59 2017 From: jonathan.tilbury at preservica.com (Jonathan Tilbury) Date: Sun, 14 May 2017 10:54:59 +0000 Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs In-Reply-To: <945351555fe73d699a190d9d7d4fd135@mail.gmail.com> References: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net> <63d06e35b40be1c7d0ff6e5613950844@mail.gmail.com> <81d799913e0c44c2d1d46d9ddd9fbd23@imap.plus.net> <945351555fe73d699a190d9d7d4fd135@mail.gmail.com> Message-ID: Tim, I have always thought of the "autonomous AIP" zipped up and held on a storage device as a residue of paper-thinking. When dealing with paper storage it is possible to bundle up the papers and some description and put it in a box onto a shelf. If you need the artefact, you get all of the box. The paper is unlikely to be updated or changed during its lifetime. This really does not map well onto the digital world. There are lots of changes that result in the "AIP" being changed, for example changes in descriptive metadata, structure (parentage), security settings, technical metadata (during a re-characterisation) and audit trail. You may also add extra files to the AIP and most importantly generate new representations for access or digital masters following a migration. This makes the idea of a single immutable AIP redundant. Addressing this, we need to ask why we are worrying. I think you answered this well by saying the content plus all of the metadata listed above must be accessible outside of whatever system you are using to re-build the collection should disaster happen or should you want to change system provider. To enable this you need all of the digital objects plus metadata (description, technical, security, structure, audit trail, fixity) to be held in a place and in a way that can be machine read. This does not imply physical zipped AIPs, just that the data is there and is understandable. Physical (zipped) AIPs are difficult to work with. Whenever you need to access a file you need to unpack the zip, which is cumbersome and slow. This happens for download, rendering, and fixity checking. This overhead has no benefit and several risks. Also, it brings into question what fixity checking actually means when the storage container is being changed all the time. These problems become particularly acute when we have to address the large flat collections we are now seeing more of. I have always thought a better approach is to save the digital objects (files) in an object store (for example a file drive, tape store, cloud storage), and to make sure these never change using fixity validation. All of the metadata can be written to the object store as well, and either updated or new versions written as it is updated. These digital objects (files and metadata) can be stored in multiple locations in different technologies. In Preservica we support both approaches through the range of storage adapters we include. Each has its own way of renaming the digital objects, but the use of objects with a UUID naming convention is preferred. We strongly recommend against the use of physical AIPs. All of the objects, once stored, can then be checked for fixity on a rotating basis or when accessed. By storing to multiple storage adapters you can even self-heal if someone does mess with your file system.
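A minimal sketch of the pattern described above: immutable content objects keyed by UUID, with metadata appended as new versions rather than rewritten in place, and fixity recorded at ingest so later validation has something to compare against. The key layout, metadata naming and class shape are illustrative assumptions, not Preservica's storage adapters or API.

import hashlib
import uuid
from pathlib import Path

class SimpleObjectStore:
    """Toy object store on a POSIX filesystem: content bitstreams are written once under a
    UUID key and never modified; metadata updates are appended as new versions."""

    def __init__(self, root):
        self.root = Path(root)

    def ingest(self, data: bytes) -> str:
        """Store an immutable bitstream; returns the object's UUID."""
        object_id = str(uuid.uuid4())
        obj_dir = self.root / object_id
        (obj_dir / "metadata").mkdir(parents=True)
        (obj_dir / "content").write_bytes(data)
        # Record fixity at ingest time.
        (obj_dir / "content.sha256").write_text(hashlib.sha256(data).hexdigest())
        return object_id

    def add_metadata_version(self, object_id: str, xml: str) -> Path:
        """Write descriptive/technical metadata as a new version; earlier versions are kept."""
        meta_dir = self.root / object_id / "metadata"
        next_version = len(list(meta_dir.glob("v*.xml"))) + 1
        path = meta_dir / f"v{next_version:04d}.xml"
        path.write_text(xml)
        return path

    def verify(self, object_id: str) -> bool:
        """Fixity check: recompute the content digest and compare to the stored value."""
        obj_dir = self.root / object_id
        expected = (obj_dir / "content.sha256").read_text().strip()
        actual = hashlib.sha256((obj_dir / "content").read_bytes()).hexdigest()
        return actual == expected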
As for exiting the system, we allow cloud edition users to replicate all of the content plus metadata to a remote store using SFTP in such a way that the physical directory structure mimics the logical collection structure. If they want to leave they have all the content safe in a place of their choosing. I would be very interested in people's comments on whether we should still support Physical (zipped) AIPs. Jon ============= Jon Tilbury CTO, Preservica ============= -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Neil Jefferies Sent: Friday, May 12, 2017 4:43 PM To: Jacob Farmer Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs Jacob, This is the key point of my argument - the definition of object you have is not the definition of an object that an archive wants to preserve. I'm speaking for people like Tim and me - others are quite happy to build what I term bit-museums. Likewise, what you consider preservation (immutability of a bitstream) is not quite the same as ours - retention of knowledge content - which requires mutability but with immutable previous versions and provenance/audit records. As long as this disconnect between technology and requirements remains the case, object stores are actually of limited use for us in preservation and archiving without considerable additional work. The 'metadata' that most object stores support (key-value pairs) is pretty useless as far as our metadata requirements go - in the end we have to store XML or triples as separate files/objects. This was an issue when I reviewed the StorageTek 5800 code builds way back and frankly object storage hasn't moved on much. Fedora, for all its faults, does actually provide an object view that is meaningful - something that can be a node in a linked-data graph. It can be arbitrarily complex but equally, could comprise only metadata. It is almost never a file. Neil On 2017-05-12 20:29, Jacob Farmer wrote: > Hi, Neil. Great points. Indeed, hard links only work in a single > file system, but they continue pointing to and fro when a file is > otherwise moved or renamed. > > I personally think of POSIX file systems as object stores that have > weak addressing, limited metadata, and that offer mutability as the > default. > > My preferred definition of an object store is a device that stores > objects. > My preferred definition of an object is any piece of data that can be > individually addressed and manipulated. > So, by that definition, POSIX file systems are object stores, so are > hard drives. So is Microsoft Exchange, etc. > > If you name a file according to a hash or a UUID (the hash could be > the UUID), then you have a form of persistent address. As long as no > one messes with your file system, the address scheme stays intact. > > > -----Original Message----- > From: Neil Jefferies [mailto:neil at jefferies.org] > Sent: Friday, May 12, 2017 11:25 AM > To: Jacob Farmer > Subject: RE: [Pasig-discuss] WORM (Write Once Read Many) AIPs > > Good point on the housekeeping! > > Most (reasonable) filesystems allow you to specify the inode numbers at > creation but yes, it is hard to change afterwards! > > But I would really, really avoid hard links - they only work within a > single filesystem so they can't be used in tiered or virtual storage > systems and even break quota controls on regular filesystems. Scale up > thus becomes very difficult with hard links.
Symlinks also make it > explicit when you are dealing with a reference and can tell you which > version of the object held the original - useful provenance that hard > links don't capture. > > My personal feeling is no for hashes, yes for UUIDs (or other > suitably unique object ID). This allows us to keep all versions of an > object in the same root path even though it varies. And don't store at > a file level - this shotguns object fragments all over the store and > makes rebuilds horrible. > Many current object stores do this - and consequently don't version > effectively - I wish people would understand objects are not files. > UUIDs > are also consistent in terms of computational time and hashes very > much aren't. > > There's a big difference in robustness between needing just filesystem > metadata to find an object in storage and requiring filesystem > metadata (because underneath all object stores are filesystems - even > Seagate's "object" hard drives), object store metadata to map paths to > hashes, and object metadata to find all the bits that make up a > composite object. > > ...and yes, I am saying that most object store vendors have got it > wrong. At least as far as archiving is concerned. And they ought to > consider why every object store ends up presenting itself as a POSIX > filesystem. > > Neil > > > On 2017-05-12 14:33, Jacob Farmer wrote: >> Two warnings and two suggestions: >> >> Warnings: >> >> 1) Symlinks and Housekeeping -- It is a common practice to use >> symlinks to make versioned file collections. If you do this, you >> should have some kind of housekeeping processes that ensure that the >> symlinks are all working correctly. If files ever have to get >> migrated, symlinks can break. >> >> 2) Check with your file system vendor -- Most removable media file >> systems have some built-in limitations on the number of inodes >> (files) that you can have in one file system. If you generate a lot >> of symlinks, you might overwhelm the file system. Your vendor will know. >> >> Suggestions: >> >> 1) Hashes for file names -- If your application software maintains a >> hash for each file, you might consider naming the file according to >> the hash. >> Use the first two digits for the parent directory, the next two >> digits for sub-directory, the next two digits for sub-directory. Then >> use the full hash for the file name. This turns your POSIX file >> system into an object store with uniquely named objects. >> >> As a safeguard, you might maintain a separate table or list that >> associates path names with hashes. >> >> 2) Consider using hard links instead of symlinks -- You might use >> hard links instead of symlinks, presuming that the files are all in >> the same file system. You still have to watch for file count issues, >> but you have less housekeeping to do. >> >> I hope that helps.
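A minimal sketch of the "hashes for file names" suggestion above: derive a sharded storage path from a file's hash (two hex characters per directory level, full digest as the file name) and, as the safeguard, keep a separate record of which original path maps to which digest. The three-level depth, SQLite table and example paths are illustrative choices, not part of the original suggestion.

import hashlib
import shutil
import sqlite3
from pathlib import Path

def sharded_path(store_root, hex_digest):
    """aa/bb/cc/<full digest>: two hex characters per directory level, full hash as file name."""
    return Path(store_root) / hex_digest[0:2] / hex_digest[2:4] / hex_digest[4:6] / hex_digest

def store_file(source, store_root, db):
    """Copy a file into the hash-addressed store and record the original path -> digest
    mapping, so the store can be audited or rebuilt from either direction."""
    digest = hashlib.sha256(Path(source).read_bytes()).hexdigest()
    destination = sharded_path(store_root, digest)
    destination.parent.mkdir(parents=True, exist_ok=True)
    if not destination.exists():  # identical content is only stored once
        shutil.copy2(source, destination)
    db.execute("INSERT OR REPLACE INTO objects (original_path, sha256) VALUES (?, ?)",
               (str(source), digest))
    db.commit()
    return destination

if __name__ == "__main__":
    con = sqlite3.connect("object-index.db")
    con.execute("CREATE TABLE IF NOT EXISTS objects (original_path TEXT PRIMARY KEY, sha256 TEXT)")
    store_file("accession/report.pdf", "/archive/objects", con)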
>> >> >> Jacob Farmer | Chief Technology Officer | Cambridge Computer | >> "Artists In Data Storage" >> Phone 781-250-3210 | jfarmer at CambridgeComputer.com | >> www.CambridgeComputer.com >> >> >> >> >> -----Original Message----- >> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf >> Of Neil Jefferies >> Sent: Friday, May 12, 2017 8:06 AM >> To: Tim.Gollins at nrscotland.gov.uk >> Cc: pasig-discuss at mail.asis.org >> Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs >> >> Tim, >> >> If we store AIP's unpackaged, as a collection of files in a folder, >> then object updates could just be a new folder with symlinks to the >> unchanged parts and the updated parts in place in the folder. The >> object "location" >> would be a parent folder for all these version folders - for example, >> a pairtree (or triple-tree for faster scanning/rebuilds) based on >> object UUID. >> Version folders would be named accoprding to date or version number >> (date might make Memento compliant access simpler). >> Creating anew version clones the current verion (including links) >> with a new name and then replaces the updated parts in situ. Final >> act is to update a "current" symlink in the object. Any update >> failure will mean "current" >> is >> not updated an the partial clone can be discarded. >> >> This assumes most updates are metadata and that a diff won't save >> much compared to a complete new XML file or whatever. I am also >> assuming that metadata won't be wrappered either (so you can forget >> METS) so that different types are stored in the most stuiable format >> and are accessed only when required. The problems with roundtripping >> packaged AIP's for updates rather than diff-ing are repeated by METS >> wrappering. >> >> These may be a virtual folder/filesytem presentation and underneath >> an HSM would retrieve files from wherever when it is actually accessed. >> HSM policy in soemthing like SAM-QFS/Versity/Cray TAS can ensure >> folders are kep intact when moved to other storage (we could even >> dereference symlinks when dealing with tape). >> >> This can be done with a POSIX filesystem and not muich code - Ben >> O'Steen started something along these lines here: >> https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-wha >> t >> -does-it-do%3F >> >> Fedora also also a versioning object store that could support this >> kind of model but also adds a fair bit of complexity to be >> Linked-Data_platform compliant. >> >> In my paralance I would probably equate "Minimal Ingest" with "Sheer >> Curation" and APT with Asynchronous Message Driven Workers. >> >> Neil >> >> >> On 2017-05-12 12:33, Tim.Gollins at nrscotland.gov.uk wrote: >>> Dear PASIG >>> >>> I have been thinking recently about the challenge of managing >>> "physical" AIPs on offline or near line storage and how to optimise >>> or simplify the use of managed storage media in a tape based >>> (robotic) Hierarchical Storage Management (HSM) system. By "physical" >>> AIPs I mean that the actual structure of the AIP written to the >>> storage system is sufficiently self-describing that even if the >>> management or other elements of a DP system were to be lost to a >>> disaster then the entire collection could be fully re-instated >>> reliably from the stored AIPs alone. >>> >>> I have also been thinking about the huge benefits of adopting the >>> concepts of "Minimal Ingest" (MI) and "Autonomous Preservation Tools" >>> (APT) in a new Digital Archive solution. 
>>> >>> One of the potential effects of the MI and APT concepts is that over >>> time it is clear that while (of course) the original bit streams >>> will never need to be updated, the metadata packaged in the AIP will >>> need to change relatively often (through the life of the AIP) . This >>> is of course in addition to any new renderings of the bit streams >>> produced for preservation purposes (manifestations as termed in some >>> systems). >>> >>> If to update the AIP the process involves the AIP being "loaded" and >>> "Modified" and "Stored" again as a whole then this will result in >>> significant "churn" of the offline or near line media (i.e. tapes) >>> in a HSM - which I would like to avoid. I think it would be really >>> great if the AIP representation could accommodate the concept of an >>> "update IP" (perhaps UIP?) where the UIP contains a "delta" of the >>> original AIP - the full AIP then being interpreted as the original >>> as modified by a series of deltas. This would then effectively >>> result in AIPs (and >>> UIPs) becoming WORM objects with clear benefits that I perceive in >>> managing their reliable and safe storage. >>> >>> I am not sufficiently familiar with the detail of all the different >>> AIP models or implementations, I was wondering if anyone in the team >>> would be able to comment on whether the they know of any AIP models, >>> specifications or implementations that would support such a use >>> case. >>> >>> I have just posted a version of this question to the E-Ark Linked in >>> Group so my apologies to those who see it twice. >>> >>> Many thanks >>> >>> Tim >>> Tim Gollins | Head of Digital Archiving and Director of the NRS >>> Digital Preservation Programme National Records of Scotland | West >>> Register House | Edinburgh EH2 4DF >>> + 44 (0)131 535 1431 / + 44 (0)7974 922614 | >>> tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk >>> >>> Preserving the past | Recording the present | Informing the future >>> Follow us on Twitter: @NatRecordsScot | >>> http://twitter.com/NatRecordsScot >>> >>> >>> ******************************************************************** >>> * >>> * This e-mail (and any files or other attachments transmitted with >>> it) is intended solely for the attention of the addressee(s). >>> Unauthorised use, disclosure, storage, copying or distribution of >>> any part of this e-mail is not permitted. If you are not the >>> intended recipient please destroy the email, remove any copies from >>> your system and inform the sender immediately by return. >>> >>> Communications with the Scottish Government may be monitored or >>> recorded in order to secure the effective operation of the system >>> and for other lawful purposes. The views or opinions contained >>> within this e-mail may not necessarily reflect those of the Scottish >>> Government. >>> >>> >>> Tha am post-d seo (agus faidhle neo ceanglan c?mhla ris) dhan neach >>> neo luchd-ainmichte a-mh?in. Chan eil e ceadaichte a chleachdadh ann >>> an d?igh sam bith, a? toirt a-steach c?raichean, foillseachadh neo >>> sgaoileadh, gun chead. Ma ?s e is gun d?fhuair sibh seo le gun >>> fhiosd?, bu choir cur ?s dhan phost-d agus lethbhreac sam bith air >>> an t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun d?il. >>> >>> Dh?fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba >>> air a chl?radh neo air a sgr?dadh airson dearbhadh gu bheil an >>> siostam ag obair gu h-?ifeachdach neo airson adhbhar laghail eile. >>> Dh?fhaodadh nach eil beachdan anns a? 
phost-d seo co-ionann ri >>> beachdan Riaghaltas na h-Alba. >>> ******************************************************************** >>> * >>> * >>> >>> >>> >>> ---- >>> To subscribe, unsubscribe, or modify your subscription, please visit >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >>> _______ >>> PASIG Webinars and conference material is at >>> http://www.preservationandarchivingsig.org/index.html >>> _______________________________________________ >>> Pasig-discuss mailing list >>> Pasig-discuss at mail.asis.org >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >> >> ---- >> To subscribe, unsubscribe, or modify your subscription, please visit >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> _______ >> PASIG Webinars and conference material is at >> http://www.preservationandarchivingsig.org/index.html >> _______________________________________________ >> Pasig-discuss mailing list >> Pasig-discuss at mail.asis.org >> http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss From peter.burnhill at ed.ac.uk Fri May 12 08:25:21 2017 From: peter.burnhill at ed.ac.uk (BURNHILL Peter) Date: Fri, 12 May 2017 12:25:21 +0000 Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs In-Reply-To: References: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net>, Message-ID: <1E70721C-7F4C-4163-A910-8857FCE8286B@ed.ac.uk> Yes, I appreciated that too. Peter Peter Burnhill University of Edinburgh Mobile: +44 (0) 774 0763 119 ps Am writing 'on the go' so pl excuse brevity On 12 May 2017, at 1:23 pm, "Tim.Gollins at nrscotland.gov.uk" > wrote: Hi Neil Brilliant - Most helpful and thought provoking. The fact that Fedora has the idea of a versioning Object store is particularly interesting. I think there are a couple of distinctions between Minimal Ingest and Sheer Curation but (from a quick glance at Google articles) they are appear very closely related. I think APT uses something like Asynchronous Message Driven Workers. Very many thanks indeed, especially for such a swift an comprehensive response. 
Tim Tim Gollins | Head of Digital Archiving and Director of the NRS Digital Preservation Programme National Records of Scotland | West Register House | Edinburgh EH2 4DF + 44 (0)131 535 1431 / + 44 (0)7974 922614 | tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk Preserving the past | Recording the present | Informing the future Follow us on Twitter: @NatRecordsScot | http://twitter.com/NatRecordsScot -----Original Message----- From: Neil Jefferies [mailto:neil at jefferies.org] Sent: 12 May 2017 13:06 To: Gollins T (Tim) Cc: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs Tim, If we store AIP's unpackaged, as a collection of files in a folder, then object updates could just be a new folder with symlinks to the unchanged parts and the updated parts in place in the folder. The object "location" would be a parent folder for all these version folders - for example, a pairtree (or triple-tree for faster scanning/rebuilds) based on object UUID. Version folders would be named accoprding to date or version number (date might make Memento compliant access simpler). Creating anew version clones the current verion (including links) with a new name and then replaces the updated parts in situ. Final act is to update a "current" symlink in the object. Any update failure will mean "current" is not updated an the partial clone can be discarded. This assumes most updates are metadata and that a diff won't save much compared to a complete new XML file or whatever. I am also assuming that metadata won't be wrappered either (so you can forget METS) so that different types are stored in the most stuiable format and are accessed only when required. The problems with roundtripping packaged AIP's for updates rather than diff-ing are repeated by METS wrappering. These may be a virtual folder/filesytem presentation and underneath an HSM would retrieve files from wherever when it is actually accessed. HSM policy in soemthing like SAM-QFS/Versity/Cray TAS can ensure folders are kep intact when moved to other storage (we could even dereference symlinks when dealing with tape). This can be done with a POSIX filesystem and not muich code - Ben O'Steen started something along these lines here: https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-what-does-it-do%3F Fedora also also a versioning object store that could support this kind of model but also adds a fair bit of complexity to be Linked-Data_platform compliant. In my paralance I would probably equate "Minimal Ingest" with "Sheer Curation" and APT with Asynchronous Message Driven Workers. Neil On 2017-05-12 12:33, Tim.Gollins at nrscotland.gov.uk wrote: Dear PASIG I have been thinking recently about the challenge of managing "physical" AIPs on offline or near line storage and how to optimise or simplify the use of managed storage media in a tape based (robotic) Hierarchical Storage Management (HSM) system. By "physical" AIPs I mean that the actual structure of the AIP written to the storage system is sufficiently self-describing that even if the management or other elements of a DP system were to be lost to a disaster then the entire collection could be fully re-instated reliably from the stored AIPs alone. I have also been thinking about the huge benefits of adopting the concepts of "Minimal Ingest" (MI) and "Autonomous Preservation Tools" (APT) in a new Digital Archive solution. 
One of the potential effects of the MI and APT concepts is that over time it is clear that while (of course) the original bit streams will never need to be updated, the metadata packaged in the AIP will need to change relatively often (through the life of the AIP) . This is of course in addition to any new renderings of the bit streams produced for preservation purposes (manifestations as termed in some systems). If to update the AIP the process involves the AIP being "loaded" and "Modified" and "Stored" again as a whole then this will result in significant "churn" of the offline or near line media (i.e. tapes) in a HSM - which I would like to avoid. I think it would be really great if the AIP representation could accommodate the concept of an "update IP" (perhaps UIP?) where the UIP contains a "delta" of the original AIP - the full AIP then being interpreted as the original as modified by a series of deltas. This would then effectively result in AIPs (and UIPs) becoming WORM objects with clear benefits that I perceive in managing their reliable and safe storage. I am not sufficiently familiar with the detail of all the different AIP models or implementations, I was wondering if anyone in the team would be able to comment on whether the they know of any AIP models, specifications or implementations that would support such a use case. I have just posted a version of this question to the E-Ark Linked in Group so my apologies to those who see it twice. Many thanks Tim Tim Gollins | Head of Digital Archiving and Director of the NRS Digital Preservation Programme National Records of Scotland | West Register House | Edinburgh EH2 4DF + 44 (0)131 535 1431 / + 44 (0)7974 922614 | tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk Preserving the past | Recording the present | Informing the future Follow us on Twitter: @NatRecordsScot | http://twitter.com/NatRecordsScot ********************************************************************** This e-mail (and any files or other attachments transmitted with it) is intended solely for the attention of the addressee(s). Unauthorised use, disclosure, storage, copying or distribution of any part of this e-mail is not permitted. If you are not the intended recipient please destroy the email, remove any copies from your system and inform the sender immediately by return. Communications with the Scottish Government may be monitored or recorded in order to secure the effective operation of the system and for other lawful purposes. The views or opinions contained within this e-mail may not necessarily reflect those of the Scottish Government. Tha am post-d seo (agus faidhle neo ceanglan c?mhla ris) dhan neach neo luchd-ainmichte a-mh?in. Chan eil e ceadaichte a chleachdadh ann an d?igh sam bith, a? toirt a-steach c?raichean, foillseachadh neo sgaoileadh, gun chead. Ma ?s e is gun d?fhuair sibh seo le gun fhiosd?, bu choir cur ?s dhan phost-d agus lethbhreac sam bith air an t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun d?il. Dh?fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba air a chl?radh neo air a sgr?dadh airson dearbhadh gu bheil an siostam ag obair gu h-?ifeachdach neo airson adhbhar laghail eile. Dh?fhaodadh nach eil beachdan anns a? phost-d seo co-ionann ri beachdan Riaghaltas na h-Alba. 
********************************************************************** ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ______________________________________________________________________ This email has been scanned by the Symantec Email Security.cloud service. For more information please visit http://www.symanteccloud.com ______________________________________________________________________ *********************************** ******************************** This email has been received from an external party and has been swept for the presence of computer viruses. ******************************************************************** ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Steve.Knight at dia.govt.nz Sun May 14 19:43:36 2017 From: Steve.Knight at dia.govt.nz (Steve Knight) Date: Sun, 14 May 2017 23:43:36 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <8B597316-5049-40E0-A7C4-4F7431E69E76@cca.qc.ca> Message-ID: Hi Tim At the National library of New Zealand, we are storing about 210TB of digital objects in our permanent repository. We have a 25TB online cache, with an online copy of all the digital objects sitting on disk. Three tape copies of the objects are made as soon as they enter into the disk archive. 1 copy remains within the tape library (nearline), the other 2 copies are sent offsite (offline). We use Oracle SAM-QFS to manage the storage policies and automatic tierage. We have a similar treatment for our 100TB of Test data, which has 1 less offsite tape copy. We are currently looking at replacing this storage architecture with a mix of Hitachi's HDI and HCP S30 object storage products and our cloud provider's object storage offering. The cloud provider storage includes replication across 3 geographic locations providing both higher availability and higher resilience than we currently have. By moving to an all online solution we hope to increase overall performance and make savings through utilising object storage and exiting some services related to current backup and restore processes. Regards Steve -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Sheila Morrissey Sent: Saturday, 13 May 2017 5:44 a.m. To: pasig-discuss at asis.org Subject: [Pasig-discuss] FW: Digital repository storage benchmarking Hello, Tim, At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. 
Currently, the Portico archive is comprised of roughly 77.7 million digital objects (we call them "Archival Units", or AUs); comprising over 400 TB; made up of 1.3 billion files. We maintain 3 copies of the archive: 2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage. We create and maintain backups (including fixity checks) using our own custom-written software. I hope this helpful. Best regards, Sheila Sheila M. Morrissey Senior Researcher ITHAKA 100 Campus Drive Suite 100 Princeton NJ 08540 609-986-2221 sheila.morrissey at ithaka.org ? ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways.? We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh Sent: Friday, May 12, 2017 10:16 AM To: pasig-discuss at asis.org Subject: [Pasig-discuss] Digital repository storage benchmarking Dear PASIG, I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions? configurations online. It?s very possible that this question has been asked before on-list, but I wasn?t able to find anything in the list archives. For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those ?various media? will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we?d like to benchmark our plans against other institutions. I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: * Could you point me to published/available resources outlining other institutions? digital repository storage configurations? * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (any information sent off-list will be kept strictly confidential) Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications are you using to manage the creation and maintainance of backups. Thank you! Tim - - - Tim Walsh Archiviste, Archives num?riques Archivist, Digital Archives Centre Canadien d?Architecture Canadian Centre for Architecture 1920, rue Baile, Montr?al, Qu?bec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca Pensez ? l?environnement avant d?imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n??tes pas le destinataire pr?vu, veuillez nous en aviser imm?diatement. Merci ?galement de supprimer le pr?sent courriel et d?en d?truire toute copie. This email may contain confidential information. 
If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you. ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss From BUNTONGA at mailbox.sc.edu Sun May 14 21:41:22 2017 From: BUNTONGA at mailbox.sc.edu (BUNTON, GLENN) Date: Mon, 15 May 2017 01:41:22 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <8B597316-5049-40E0-A7C4-4F7431E69E76@cca.qc.ca> Message-ID: This discussion of the various digital repository storage approaches has been very enlightening and useful so far. I appreciate all the excellent details. There is one piece of information, however, that is missing. Cost? Both initial implementation outlay and ongoing costs. Any general sense of costs would be greatly appreciated. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Steve Knight Sent: Sunday, May 14, 2017 6:44 PM To: 'Sheila Morrissey' ; pasig-discuss at asis.org Subject: Re: [Pasig-discuss] Digital repository storage benchmarking Hi Tim At the National library of New Zealand, we are storing about 210TB of digital objects in our permanent repository. We have a 25TB online cache, with an online copy of all the digital objects sitting on disk. Three tape copies of the objects are made as soon as they enter into the disk archive. 1 copy remains within the tape library (nearline), the other 2 copies are sent offsite (offline). We use Oracle SAM-QFS to manage the storage policies and automatic tierage. We have a similar treatment for our 100TB of Test data, which has 1 less offsite tape copy. We are currently looking at replacing this storage architecture with a mix of Hitachi's HDI and HCP S30 object storage products and our cloud provider's object storage offering. The cloud provider storage includes replication across 3 geographic locations providing both higher availability and higher resilience than we currently have. By moving to an all online solution we hope to increase overall performance and make savings through utilising object storage and exiting some services related to current backup and restore processes. Regards Steve -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Sheila Morrissey Sent: Saturday, 13 May 2017 5:44 a.m. To: pasig-discuss at asis.org Subject: [Pasig-discuss] FW: Digital repository storage benchmarking Hello, Tim, At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. Currently, the Portico archive is comprised of roughly 77.7 million digital objects (we call them "Archival Units", or AUs); comprising over 400 TB; made up of 1.3 billion files. 
----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From jake.carroll at uq.edu.au Sun May 14 23:01:19 2017
From: jake.carroll at uq.edu.au (Jake Carroll)
Date: Mon, 15 May 2017 03:01:19 +0000
Subject: [Pasig-discuss] Digital repository storage benchmarking
Message-ID: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au>

Certainly interesting.

At the Queensland Brain Institute and the Australian Institute of Bioengineering and Nanotechnology at the University of Queensland, we have around 8.5PB of data under management across our HSM platforms. We currently use Oracle HSM for this task.

We have 256TB of online "cache" for the data landing location, split across 6 different filesystems that are tuned differently for different types of workloads and different tasks. These workloads are generally categorised into a few functions:

- High IO, large serial writes from instruments
- Low IO, large serial writes from instruments
- High IO, granular "many files, many IOPS" from instruments and computational factors
- Low IO, granular "many files, low IOPS" from instruments and computational factors
- Generic group share
- Generic user dir

It is an interesting thing to manage and run statistical modelling on in terms of performance analysis and micro-benchmarking of data movement patterns. All the filesystems above are provisioned on 16Gbit/sec FC-connected Hitachi HUS-VM, 10K SAS. The metadata for these filesystems is around 10 terabytes of Hitachi Accelerated Advanced Flash storage. We have around 3.8 billion files/unique objects under management.

We run a "disk based copy" (we call that copy1) which is our disk-based VSN or vault. It is around 1PB of ZFS-managed storage sitting inside the very large Hitachi HUS-VM platform. Our Copy2 and Copy3 are 2 * T10000D Oracle tape media copies in SL3000 storage silos, geographically distributed.

We do some interesting things with our tape infrastructure, including DIV-always-on, proactive data protection sweeps inside the HSM and continuous validation checks against the media. We also run STA (tape analytics tools) extra-data-path so we can see *exactly* what each drive is doing at all times. Believe me, we see things that would baffle and boggle the mind (and probably create a healthy sense of paranoia!) if you knew exactly what was going on "inside there".
We use finely tuned policy for data automation of movement between tiers so as to minimally impact user experience. Our HSM supports offline file mapping to the Windows client, so people can tell when their files and objects are "offline". It is a useful semantic and great for usability for people.

We ZFS scrub the disk copy for "always on disk consistency", and we use tpverify commands on the tape media to consistently check the media itself. We're experimenting with implementing fixity shortly too, as the filesystem supports it.

As for going "all online", at our scale we just can't afford it yet to walk away from "cold tape" principles. We're just too big. We'd love to rid ourselves of the complexities of it, and consider a full cloud-based consumption model, but having crunched the very hard numbers of things such as AWS Glacier and S3, it is a long (long) way more expensive than the relative TCOs of running it "on premise" at this stage. My hope is that this will change soon and I can start experimenting with one of my copies being a "cloud library".

Interesting thread, this...

-jc
----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From william.kilbride at dpconline.org Mon May 15 04:02:27 2017
From: william.kilbride at dpconline.org (William Kilbride)
Date: Mon, 15 May 2017 08:02:27 +0000
Subject: [Pasig-discuss] Digital repository storage benchmarking
In-Reply-To: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au>
References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au>
Message-ID:

Hi All, Hi Tim

This is a super thread and I am learning a tonne. On the subject of costs I can make a recommendation and a request.

The Curation Costs Exchange is a useful thing and well worth a look for anyone looking at comparative costs across the digital preservation lifecycle, including storage. It's not been mentioned yet in the discussions, I assume because everyone is already aware of it. But have a look: http://www.curationexchange.org/

The conclusion we drew from the 4C project was that financial planning was a core skill in preservation planning. So to be a 'trusted' repository an institution should be able to demonstrate certain skills in financial planning and be transparent about it. It's expressed more elegantly in the 4C project roadmap: http://www.4cproject.eu/roadmap/

Now the request: there's a network effect here. The more agencies share data, the more useful the data becomes. So can I encourage you all to share that information (anonymously or identifiably) via the costs exchange?

All best wishes,

William
----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From neil.jefferies at bodleian.ox.ac.uk Mon May 15 04:26:45 2017
From: neil.jefferies at bodleian.ox.ac.uk (Neil Jefferies)
Date: Mon, 15 May 2017 08:26:45 +0000
Subject: [Pasig-discuss] WORM (Write Once Read Many) AIPs
In-Reply-To:
References: <7290d680d2d83ae9c5d4a88371bb6147@imap.plus.net> <63d06e35b40be1c7d0ff6e5613950844@mail.gmail.com> <81d799913e0c44c2d1d46d9ddd9fbd23@imap.plus.net> <945351555fe73d699a190d9d7d4fd135@mail.gmail.com>
Message-ID: <48E9420A4871584593FC3D435EF345AAEEED7C15@MBX10.ad.oak.ox.ac.uk>

I pretty much agree, although I do think there is a use case for (mostly) immutable AIPs, such as retention of material for legal reasons. However, I can't see any reason to package them other than leveraging whatever logical grouping the underlying storage facility provides - which may be a folder in a filesystem or a tar file on tape. Anything additional just adds overhead and increased scope for errors. If you are moving the object then a package should be undone anyway, since you *should* be adding provenance information to cover the move at the very least.

On a more pragmatic level, until there is a better appreciation of the fact that OAIS is a model rather than a design template, I think there will be people who demand physical AIPs - but that is why we are here!
Neil Jefferies MA MBA
Head of Innovation
Bodleian Digital Library Systems and Services
Osney One
Osney Mead
OX2 0EW
T: +44 1865 2-80588

-----Original Message-----
From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jonathan Tilbury
Sent: 14 May 2017 11:55
To: pasig-discuss at asis.org
Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs

Tim,

I have always thought of the "autonomous AIP" zipped up and held on a storage device as a residue of paper-thinking. When dealing with paper storage it is possible to bundle up the papers and some description and put it in a box onto a shelf. If you need the artefact, you get all of the box. The paper is unlikely to be updated or changed during its lifetime.

This really does not map well onto the digital world. There are lots of changes that result in the AIP being changed, for example changes in descriptive metadata, structure (parentage), security settings, technical metadata (during a re-characterisation) and audit trail. You may also add extra files to the AIP and, most importantly, generate new representations for access or digital masters following a migration. This makes the idea of a single immutable AIP redundant.

Addressing this, we need to ask why we are worrying. I think you answered this well by saying the content plus all of the metadata listed above must be accessible outside of whatever system you are using, to re-build the collection should disaster happen or should you want to change system provider. To enable this you need all of the digital objects plus metadata (description, technical, security, structure, audit trail, fixity) to be held in a place and in a way that can be machine read. This does not imply physical zipped AIPs, just that the data is there and is understandable.

Physical (zipped) AIPs are difficult to work with. Whenever you need to access a file you need to unpack the zip, which is cumbersome and slow. This happens for download, rendering, and fixity checking. This overhead has no benefit and several risks. Also, it brings into question what fixity checking actually means when the storage container is being changed all the time. These problems become particularly acute when we have to address the large flat collections we are now seeing more of.

I have always thought a better approach is to save the digital objects (files) in an object store (for example a file drive, tape store, or cloud storage), and to make sure these never change using fixity validation. All of the metadata can be written to the object store as well, and either updated or new versions written as it is updated. These digital objects (files and metadata) can be stored in multiple locations in different technologies. In Preservica we support both approaches through the range of storage adapters we include. Each has its own way of renaming the digital objects, but the use of objects with a UUID naming convention is preferred. We strongly recommend against the use of physical AIPs.

All of the objects, once stored, can then be checked for fixity on a rotating basis or when accessed. By storing to multiple storage adapters you can even self-heal if someone does mess with your file system.

As for exiting the system, we allow cloud edition users to replicate all of the content plus metadata to a remote store using SFTP in such a way that the physical directory structure mimics the logical collection structure. If they want to leave, they have all the content safe in a place of their choosing.

I would be very interested in people's comments on whether we should still support physical (zipped) AIPs.

Jon

=============
Jon Tilbury
CTO, Preservica
=============
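To make the storage layout Jon describes concrete, here is a minimal editorial sketch (not Preservica's storage adapter code or API) of UUID-named content objects that are never rewritten, with metadata changes stored as new, numbered versions beside the unchanged bitstream. The root path, file names and JSON record structure are all invented for illustration.

    import hashlib
    import json
    import uuid
    from datetime import datetime, timezone
    from pathlib import Path

    STORE = Path("/archive/objectstore")  # illustrative location only

    def ingest(content: bytes, descriptive_metadata: dict) -> str:
        """Write an immutable content object named by a fresh UUID plus a first metadata version."""
        object_id = str(uuid.uuid4())
        obj_dir = STORE / object_id
        obj_dir.mkdir(parents=True)
        (obj_dir / "content").write_bytes(content)  # never rewritten after this point
        record = {
            "sha256": hashlib.sha256(content).hexdigest(),  # fixity value for later rotating audits
            "created": datetime.now(timezone.utc).isoformat(),
            "descriptive": descriptive_metadata,
        }
        (obj_dir / "metadata.v0001.json").write_text(json.dumps(record, indent=2))
        return object_id

    def update_metadata(object_id: str, descriptive_metadata: dict) -> Path:
        """Metadata edits never touch the content object: write a new numbered metadata version."""
        obj_dir = STORE / object_id
        versions = sorted(obj_dir.glob("metadata.v*.json"))
        record = json.loads(versions[-1].read_text())
        record["descriptive"] = descriptive_metadata
        record["updated"] = datetime.now(timezone.utc).isoformat()
        new_path = obj_dir / f"metadata.v{len(versions) + 1:04d}.json"
        new_path.write_text(json.dumps(record, indent=2))
        return new_path

Replicating the same directories to a second store and comparing the recorded sha256 values is, in outline, what makes the rotating fixity checks and "self-heal" behaviour Jon mentions possible.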
-----Original Message-----
From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Neil Jefferies
Sent: Friday, May 12, 2017 4:43 PM
To: Jacob Farmer
Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs

Jacob,

This is the key point of my argument - the definition of object you have is not the definition of an object that an archive wants to preserve. I'm speaking for people like Tim and me - others are quite happy to build what I term bit-museums. Likewise, what you consider preservation (immutability of a bitstream) is not quite the same as ours - retention of knowledge content - which requires mutability but with immutable previous versions and provenance/audit records. As long as this disconnect between technology and requirements remains the case, object stores are actually of limited use for us in preservation and archiving without considerable additional work.

The 'metadata' that most object stores support (key-value pairs) is pretty useless as far as our metadata requirements go - in the end we have to store XML or triples as separate files/objects. This was an issue when I reviewed the StorageTek 5800 code builds way back, and frankly object storage hasn't moved on much.

Fedora, for all its faults, does actually provide an object view that is meaningful - something that can be a node in a linked-data graph. It can be arbitrarily complex but, equally, could comprise only metadata. It is almost never a file.

Neil

On 2017-05-12 20:29, Jacob Farmer wrote:
> Hi, Neil. Great points. Indeed, hard links only work in a single file system, but they continue pointing to and fro when a file is otherwise moved or renamed.
>
> I personally think of POSIX file systems as object stores that have weak addressing, limited metadata, and that offer mutability as the default.
>
> My preferred definition of an object store is a device that stores objects. My preferred definition of an object is any piece of data that can be individually addressed and manipulated. So, by that definition, POSIX file systems are object stores; so are hard drives. So is Microsoft Exchange, etc.
>
> If you name a file according to a hash or a UUID (the hash could be the UUID), then you have a form of persistent address. As long as no one messes with your file system, the address scheme stays intact.
>
> -----Original Message-----
> From: Neil Jefferies [mailto:neil at jefferies.org]
> Sent: Friday, May 12, 2017 11:25 AM
> To: Jacob Farmer
> Subject: RE: [Pasig-discuss] WORM (Write Once Read Many) AIPs
>
> Good point on the housekeeping!
>
> Most (reasonable) filesystems allow you to specify the inode numbers at creation but yes, it is hard to change afterwards!
>
> But I would really, really avoid hard links - they only work within a single filesystem so they can't be used in tiered or virtual storage systems, and they even break quota controls on regular filesystems. Scale-up thus becomes very difficult with hard links. Symlinks also make it explicit when you are dealing with a reference and can tell you which version of the object held the original - useful provenance that hard links don't capture.
>
> My personal feeling is no for hashes, yes for UUIDs (or other suitably unique object IDs). This allows us to keep all versions of an object in the same root path even though it varies.
> And don't store at a file level - this shotguns object fragments all over the store and makes rebuilds horrible. Many current object stores do this - and consequently don't version effectively - I wish people would understand objects are not files. UUIDs are also consistent in terms of computational time and hashes very much aren't.
>
> There's a big difference in robustness between needing just filesystem metadata to find an object in storage, and requiring filesystem metadata (because underneath all object stores are filesystems - even Seagate's "object" hard drives), object store metadata to map paths to hashes, and object metadata to find all the bits that make up a composite object.
>
> ...and yes, I am saying that most object store vendors have got it wrong. At least as far as archiving is concerned. And they ought to consider why every object store ends up presenting itself as a POSIX filesystem.
>
> Neil
>
> On 2017-05-12 14:33, Jacob Farmer wrote:
>> Two warnings and two suggestions:
>>
>> Warnings:
>>
>> 1) Symlinks and Housekeeping -- It is a common practice to use symlinks to make versioned file collections. If you do this, you should have some kind of housekeeping process that ensures that the symlinks are all working correctly. If files ever have to get migrated, symlinks can break.
>>
>> 2) Check with your file system vendor -- Most removable media file systems have some built-in limitations on the number of inodes (files) that you can have in one file system. If you generate a lot of symlinks, you might overwhelm the file system. Your vendor will know.
>>
>> Suggestions:
>>
>> 1) Hashes for file names -- If your application software maintains a hash for each file, you might consider naming the file according to the hash. Use the first two digits for the parent directory, the next two digits for a sub-directory, and the next two digits for a further sub-directory. Then use the full hash for the file name. This turns your POSIX file system into an object store with uniquely named objects. As a safeguard, you might maintain a separate table or list that associates path names with hashes.
>>
>> 2) Consider using hard links instead of symlinks -- You might use hard links instead of symlinks, presuming that the files are all in the same file system. You still have to watch for file count issues, but you have less housekeeping to do.
>>
>> I hope that helps.
>>
>> Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage"
>> Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com
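As an editorial illustration of the sharding scheme Jacob describes (first two hex digits of the hash as the parent directory, the next two and the next two as sub-directories, and the full hash as the file name), here is a minimal Python sketch; the storage root is invented, and the whole-file read is a simplification you would replace with streaming for large objects:

    import hashlib
    import shutil
    from pathlib import Path

    def sharded_path(store_root: Path, digest: str) -> Path:
        """aa/bb/cc/<full hash>: two-hex-digit directories, full hash as the file name."""
        return store_root / digest[0:2] / digest[2:4] / digest[4:6] / digest

    def store(store_root: Path, source: Path) -> Path:
        """Copy a file into the hash-named store and return where it landed."""
        digest = hashlib.sha256(source.read_bytes()).hexdigest()
        target = sharded_path(store_root, digest)
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():  # identical content resolves to the same path, so it is stored once
            shutil.copy2(source, target)
        return target

    # e.g. store(Path("/archive/objects"), Path("report.pdf"))

The separate table of path names against hashes that Jacob suggests as a safeguard would be maintained outside this layout.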
>> -----Original Message-----
>> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Neil Jefferies
>> Sent: Friday, May 12, 2017 8:06 AM
>> To: Tim.Gollins at nrscotland.gov.uk
>> Cc: pasig-discuss at mail.asis.org
>> Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs
>>
>> Tim,
>>
>> If we store AIPs unpackaged, as a collection of files in a folder, then object updates could just be a new folder with symlinks to the unchanged parts and the updated parts in place in the folder. The object "location" would be a parent folder for all these version folders - for example, a pairtree (or triple-tree for faster scanning/rebuilds) based on object UUID. Version folders would be named according to date or version number (date might make Memento-compliant access simpler). Creating a new version clones the current version (including links) with a new name and then replaces the updated parts in situ. The final act is to update a "current" symlink in the object. Any update failure will mean "current" is not updated and the partial clone can be discarded.
>>
>> This assumes most updates are metadata and that a diff won't save much compared to a complete new XML file or whatever. I am also assuming that metadata won't be wrappered either (so you can forget METS), so that different types are stored in the most suitable format and are accessed only when required. The problems with round-tripping packaged AIPs for updates rather than diff-ing are repeated by METS wrappering.
>>
>> These may be a virtual folder/filesystem presentation, and underneath an HSM would retrieve files from wherever when they are actually accessed. HSM policy in something like SAM-QFS/Versity/Cray TAS can ensure folders are kept intact when moved to other storage (we could even dereference symlinks when dealing with tape).
>>
>> This can be done with a POSIX filesystem and not much code - Ben O'Steen started something along these lines here: https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-what-does-it-do%3F
>>
>> Fedora also has a versioning object store that could support this kind of model, but it also adds a fair bit of complexity to be Linked Data Platform compliant.
>>
>> In my parlance I would probably equate "Minimal Ingest" with "Sheer Curation" and APT with Asynchronous Message Driven Workers.
>>
>> Neil
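A minimal sketch of the clone-with-symlinks update Neil outlines, assuming one directory per object (found via a pairtree or similar, outside the scope of the sketch) that already contains at least one numbered version folder and a "current" symlink. The two-step order, build the new version first and repoint "current" last, follows his description; the naming and the helper itself are invented for illustration.

    import os
    from pathlib import Path

    def new_version(object_root: Path, updated_files: dict) -> Path:
        """Create version N+1: symlinks to the unchanged files of the current version,
        updated files written in place, then 'current' repointed as the final act."""
        current = (object_root / "current").resolve()
        versions = [p for p in object_root.iterdir() if p.is_dir() and p.name.startswith("v")]
        new_dir = object_root / f"v{len(versions) + 1:04d}"
        new_dir.mkdir()

        for existing in current.iterdir():            # unchanged parts become symlinks
            if existing.name not in updated_files:
                (new_dir / existing.name).symlink_to(existing)
        for name, data in updated_files.items():      # changed or new parts are real files
            (new_dir / name).write_bytes(data)

        # Final act: atomically repoint 'current'. A failure before this line leaves
        # 'current' naming a complete older version, and the partial clone can be deleted.
        tmp_link = object_root / "current.tmp"
        if tmp_link.is_symlink():
            tmp_link.unlink()
        tmp_link.symlink_to(new_dir)
        os.replace(tmp_link, object_root / "current")
        return new_dir

Note the housekeeping caveat raised elsewhere in the thread: the symlinks point into earlier version folders, so those folders and links need to be checked and kept intact when content is moved between storage tiers.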
This would then effectively >>> result in AIPs (and >>> UIPs) becoming WORM objects with clear benefits that I perceive in >>> managing their reliable and safe storage. >>> >>> I am not sufficiently familiar with the detail of all the different >>> AIP models or implementations, I was wondering if anyone in the team >>> would be able to comment on whether the they know of any AIP models, >>> specifications or implementations that would support such a use >>> case. >>> >>> I have just posted a version of this question to the E-Ark Linked in >>> Group so my apologies to those who see it twice. >>> >>> Many thanks >>> >>> Tim >>> Tim Gollins | Head of Digital Archiving and Director of the NRS >>> Digital Preservation Programme National Records of Scotland | West >>> Register House | Edinburgh EH2 4DF >>> + 44 (0)131 535 1431 / + 44 (0)7974 922614 | >>> tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk >>> >>> Preserving the past | Recording the present | Informing the future >>> Follow us on Twitter: @NatRecordsScot | >>> http://twitter.com/NatRecordsScot >>> >>> >>> ******************************************************************** >>> * >>> * This e-mail (and any files or other attachments transmitted with >>> it) is intended solely for the attention of the addressee(s). >>> Unauthorised use, disclosure, storage, copying or distribution of >>> any part of this e-mail is not permitted. If you are not the >>> intended recipient please destroy the email, remove any copies from >>> your system and inform the sender immediately by return. >>> >>> Communications with the Scottish Government may be monitored or >>> recorded in order to secure the effective operation of the system >>> and for other lawful purposes. The views or opinions contained >>> within this e-mail may not necessarily reflect those of the Scottish >>> Government. >>> >>> >>> Tha am post-d seo (agus faidhle neo ceanglan c?mhla ris) dhan neach >>> neo luchd-ainmichte a-mh?in. Chan eil e ceadaichte a chleachdadh ann >>> an d?igh sam bith, a? toirt a-steach c?raichean, foillseachadh neo >>> sgaoileadh, gun chead. Ma ?s e is gun d?fhuair sibh seo le gun >>> fhiosd?, bu choir cur ?s dhan phost-d agus lethbhreac sam bith air >>> an t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun d?il. >>> >>> Dh?fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba >>> air a chl?radh neo air a sgr?dadh airson dearbhadh gu bheil an >>> siostam ag obair gu h-?ifeachdach neo airson adhbhar laghail eile. >>> Dh?fhaodadh nach eil beachdan anns a? phost-d seo co-ionann ri >>> beachdan Riaghaltas na h-Alba. 
----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From gail at trumantechnologies.com Mon May 15 12:48:42 2017
From: gail at trumantechnologies.com (gail at trumantechnologies.com)
Date: Mon, 15 May 2017 09:48:42 -0700
Subject: [Pasig-discuss] Proposed Digital Preservation Storage Criteria ver. 2 for community discussion
Message-ID: <20170515094842.b554e26909f2beaf9f8ddbf6be9a6600.5a9e0b4cf7.wbe@email09.godaddy.com>

An HTML attachment was scrubbed...
URL:

From randy_stern at harvard.edu Tue May 16 10:05:12 2017
From: randy_stern at harvard.edu (Stern, Randy)
Date: Tue, 16 May 2017 14:05:12 +0000
Subject: [Pasig-discuss] Digital repository storage benchmarking
In-Reply-To:
References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au>
Message-ID: <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu>

Re costs: For Harvard Library's Digital Repository Service - 2 disk copies plus 2 tape copies - as of July 1, the cost of storage for depositors to the DRS is $1.25/GB/year. This figure is moderately close to the storage hardware costs. The storage cost does not include staff costs, preservation activities, or server costs associated with the core DRS software services, tools, and databases.

Randy
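For a rough sense of scale, applying Harvard's published DRS rate to the roughly 200 TB the CCA expects to manage gives the back-of-the-envelope figure below. This is an editorial calculation only; the two services differ in copy counts and in what the price includes, so it is a ballpark rather than a quote.

    # Harvard DRS rate applied to CCA's projected holdings (illustrative only).
    rate_per_gb_year = 1.25            # USD per GB per year, as quoted above for the DRS
    holdings_gb = 200 * 1000           # about 200 TB, using decimal units (1 TB = 1000 GB)
    print(f"${rate_per_gb_year * holdings_gb:,.0f} per year")  # -> $250,000 per year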
---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss

From luispo at gmail.com Tue May 16 13:53:39 2017 From: luispo at gmail.com (Louis Suárez-Potts) Date: Tue, 16 May 2017 13:53:39 -0400 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> Message-ID:

> Now the request: there's a network effect here. The more agencies share data the more useful the data becomes. So can I encourage you all to share that information (anonymously or identifiably) via the costs exchange?

Hi

I'm all for sharing this data, as well as other relevant information, including accounts of how we do things, even when they are mistakes. But an email list is not the best venue; something more pliable, like a wiki or its equivalent? I'm sure there are options. And equally sure that this particular issue has complications related to political location that do need to be made clear, as political mandates (must be within certain political boundaries, say) affect cost, inter alia.
Cheers,
Louis

> On 2017-05-16, at 10:05, Stern, Randy wrote:
>
> Re costs: For Harvard Library's Digital Repository Service - 2 disk copies plus 2 tape copies - as of July 1, the cost of storage for depositors to the DRS is $1.25/GB/year for storage. This figure is moderately close to the storage hardware costs. The storage cost does not include staff costs, preservation activities, or server costs associated with the core DRS software services, tools, and databases.
>
> Randy
>
> On 5/15/17, 4:02 AM, "Pasig-discuss on behalf of William Kilbride" wrote:
>
> Hi All, Hi Tim
>
> This is a super thread and I am learning a tonne. On the subject of costs I can make a recommendation and request ...
>
> The Curation Costs Exchange is a useful thing and well worth a look for anyone looking at comparative costs across the digital preservation lifecycle including storage. It's not been mentioned yet in the discussions, I assume because everyone is already aware of it. But have a look: http://www.curationexchange.org/
>
> The conclusion we drew from the 4C project was that financial planning was a core skill in preservation planning. So to be a 'trusted' repository an institution should be able to demonstrate certain skills in financial planning and be transparent about it. It's expressed more elegantly in the 4C project roadmap: http://www.4cproject.eu/roadmap/
>
> Now the request: there's a network effect here. The more agencies share data the more useful the data becomes. So can I encourage you all to share that information (anonymously or identifiably) via the costs exchange?
>
> All best wishes,
>
> William
>
> -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jake Carroll Sent: 15 May 2017 04:01 To: pasig-discuss at asis.org Subject: Re: [Pasig-discuss] Digital repository storage benchmarking
>
> Certainly interesting.
>
> At the Queensland Brain Institute and the Australian Institute of Bioengineering and Nanotechnology at the University of Queensland, we have around 8.5PB of data under management across our HSM platforms. We currently use Oracle HSM for this task.
>
> We have 256TB of online 'cache' for the data landing location, split across 6 different filesystems that are tuned differently for different types of workloads and different tasks. These workloads are generally categorised into a few functions:
>
> • High IO, large serial writes from instruments
> • Low IO, large serial writes from instruments
> • High IO, granular 'many files, many IOPS' from instruments and computational factors
> • Low IO, granular 'many files, low IOPS' from instruments and computational factors
> • Generic group share
> • Generic user dir
>
> It is an interesting thing to manage and run statistical modelling on in terms of performance analysis and micro-benchmarking of data movement patterns. All the filesystems above are provisioned on 16Gbit/sec FC-connected Hitachi HUS-VM, 10K SAS.
>
> The metadata for these filesystems is around 10 terabytes of Hitachi Accelerated Advanced Flash storage. We have around 3.8 billion files/unique objects under management.
>
> We run a 'disk based copy' (we call that copy1), which is our disk-based VSN or vault. It is around 1PB of ZFS-managed storage sitting inside the very large Hitachi HUS-VM platform.
>
> Our Copy2 and Copy3 are 2 * T10000D Oracle tape media copies in SL3000 storage silos, geographically distributed.
> We do some interesting things with our tape infrastructure, including DIV-always-on, proactive data protection sweeps inside the HSM, and continuous validation checks against the media. We also run STA (tape analytics tools) extra-data-path so we can see *exactly* what each drive is doing at all times. Believe me, we see things that would baffle and boggle the mind (and probably create a healthy sense of paranoia!) if you knew exactly what was going on 'inside there'.
>
> We use finely tuned policy for data automation of movement between tiers so as to minimally impact user experience. Our HSM supports offline file mapping to the Windows client, so people can tell when their files and objects are 'offline'. It is a useful semantic and great for usability.
>
> Interesting thread, this...
>
> -jc
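The 'always on disk consistency' scrubbing Jake describes for the ZFS disk copy is easy to automate alongside checks like the manifest routine above. The sketch below shows one way to drive it from cron with the standard zpool commands; the pool name is a hypothetical placeholder, and the tape-side checks he mentions (tpverify, DIV, STA) are not shown because their invocation is specific to the Oracle HSM products.

import subprocess
import sys

POOL = "copy1pool"  # hypothetical pool name for the on-disk preservation copy

def start_scrub(pool):
    # "zpool scrub" returns immediately; the scrub itself runs in the background,
    # typically for hours on a pool of any real size.
    subprocess.run(["zpool", "scrub", pool], check=True)

def pools_are_healthy():
    # "zpool status -x" prints "all pools are healthy" when no pool reports errors.
    result = subprocess.run(["zpool", "status", "-x"],
                            check=True, capture_output=True, text=True)
    return "all pools are healthy" in result.stdout

if __name__ == "__main__":
    # Typical cron usage: run "scrub" monthly and "check" daily, since a scrub
    # started now will not have finished by the time this script exits.
    if sys.argv[1] == "scrub":
        start_scrub(POOL)
    elif not pools_are_healthy():
        sys.exit("zpool reports errors - do not trust this copy until investigated")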
From william.kilbride at dpconline.org Wed May 17 04:18:18 2017 From: william.kilbride at dpconline.org (William Kilbride) Date: Wed, 17 May 2017 08:18:18 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> Message-ID:

Hi Louis,

Yes you're quite right: a list is great but it's not ideal for sharing this kind of information.

I really do encourage you therefore to look seriously at the Curation Costs Exchange. The 4C project took quite a lot of time to manage not only the legal constraints and anonymization issues but also the different accountancy approaches that can make it hard to compare data meaningfully. Please do take a look (all!): http://www.curationexchange.org/ The more we put in, the more we will get out...

W :-)
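For anyone trying to turn the per-unit figures in this thread into a budget line, the arithmetic is simple. The snippet below applies the $1.25/GB/year Harvard DRS depositor rate quoted by Randy above to a collection of roughly the 200 TB Tim describes; note that, per Randy's caveat, that rate covers the storage itself (two disk plus two tape copies) and excludes staff, preservation activities, and server costs. The decimal TB-to-GB conversion is an assumption made here for the example.

# Illustrative only: the rate and collection size come from the messages above;
# the decimal conversion (1 TB = 1,000 GB) is an assumption of this example.
RATE_USD_PER_GB_YEAR = 1.25   # Harvard DRS depositor rate quoted by Randy
COLLECTION_TB = 200           # approximate collection size Tim mentions
GB_PER_TB = 1000              # use 1024 instead for binary TiB

annual_cost = RATE_USD_PER_GB_YEAR * COLLECTION_TB * GB_PER_TB
print("about $%s per year" % format(annual_cost, ",.0f"))
# prints: about $250,000 per year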
---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss

From allasia at eurixgroup.com Wed May 17 06:37:50 2017 From: allasia at eurixgroup.com (Walter Allasia) Date: Wed, 17 May 2017 12:37:50 +0200 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To:
<DB6PR0202MB26163591FDE1B954E03FBB1895E70@DB6PR0202MB2616.eurprd02.prod.outlook.com> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <DB6PR0202MB2616142EFA1F4E14C65E229B95E10@DB6PR0202MB2616.eurprd02.prod.outlook.com> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <F4B89086-407A-4EEA-8040-B5D04B798487@gmail.com> <DB6PR0202MB26163591FDE1B954E03FBB1895E70@DB6PR0202MB2616.eurprd02.prod.outlook.com> Message-ID:

Hi William, All,

The 4C website is surely worth taking a look at. I got good advice and warnings on digital preservation costs there.

The greatest issue I'm still dealing with is how to get the budget for preservation from the actual stakeholders, who usually are not aware of what is needed and of what is running behind the scenes to keep stuff alive, safe and sound in the long term. Especially in the context of public administration, where I am operating right now, it seems that nobody cares about the cost of storage (or of digital preservation hardware) until managers are asked to plan their annual budget. Suddenly storage costs and hardware obsolescence become something that everybody wants to leave to someone else, because it dramatically cuts their overall budget and because it is not clear how to sell it.

That is exactly the point. My experience shows that people usually perceive digital archives as digital wells with contents buried inside. Managers at public administration offices are no different - even worse, since they have public money to manage. It's not possible to create business models from 'digital wells'. Offered services are the key factor. IT services lay the foundations for revenues. Every so often services are already making use of archives (many times unawares): costs MUST be shared. I believe that the planning of storage and preservation infrastructure MUST also be shared and agreed.

That's the reason why the National Broadcaster has LTO tapes running LTFS and at the same time disk arrays of nearly the same size: it's a use case of preservation infrastructure driven by offered services - that's the point.

Well, that's just my two cents; apologies for the long mail.

Walter Allasia

Walter Allasia, PhD
Project Manager at EURIXGroup {allasia at eurixgroup.com}
Adjunct professor at Physics, University of Torino {walter.allasia at unito.it}
Project Manager Consultant at CSI Piemonte {walter.allasia at consulenti.csi.it}
---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From bja at kb.dk Wed May 17 02:48:00 2017 From: bja at kb.dk (Bjarne Andersen) Date: Wed, 17 May 2017 06:48:00 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> Message-ID:

In Denmark, the Royal Danish Library has developed the open source BitRepository software (www.bitrepository.org). This software handles "nothing but" the preservation of bits. Very basically explained, it is a system for handling multiple copies of data on different "pillars" (different technologies, different locations, different organisations) to ensure copies of data that are as independent as possible. In our own collections we store and preserve more than 4 PB of unique content, meaning that we have over 15 PB of current capacity.

The Royal Danish Library offers bit preservation using this software to other national cultural heritage institutions. Our pricing model basically has two prices - one for ingest (first year) and one for following years (which includes a re-investment budget for periodic migration to new media/technology).

The prices are roughly (per TB/year):

Online (disk): ingest: 500 Euros, following years: 200 Euros
Nearline (tape inside robot): ingest: 156 Euros, following years: 68 Euros
Offline (tape moved to fire-safe box): ingest: 132 Euros, following years: 50 Euros
These are meant for long-term preservation, so there are access prices as well - of course higher for the tape-based storage, and especially for the offline model, where staff need to collect tapes from a box and mount them into the tape robot.

With these prices we can offer a 3-copy setup with e.g. 1 disk and 2 tapes for a total of 750 Euros/TB the first year and 300 Euros/TB in the following years. The prices include everything: hardware, staff, power, media migration, etc.

best
- Bjarne Andersen
Vicedirektør / Deputy Director General
It-udvikling og Infrastruktur / IT Development & Infrastructure
+45 89 46 21 65 / +45 25 66 23 53
bja at kb.dk
Det Kgl. Bibliotek / Royal Danish Library
Victor Albecks Vej 1, DK-8000 Aarhus C
+45 3347 4747
CVR 2898 8842 EAN 5798 000 792142
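Bjarne's per-pillar rates also make it easy to price other copy mixes. The sketch below simply sums the quoted first-year (ingest) and following-year rates for whichever combination of pillars is chosen; the example mix of one disk copy plus two offline tape copies is only one possible reading of '1 disk and 2 tapes', and it comes out close to the roughly 750/300 Euros per TB totals quoted above, which are themselves rounded figures.

# Per-TB/year rates quoted by Bjarne (EUR): (first year incl. ingest, following years).
RATES = {
    "online (disk)": (500, 200),
    "nearline (tape in robot)": (156, 68),
    "offline (tape in fire-safe box)": (132, 50),
}

def price(copy_mix):
    # Sum the published rates for a chosen mix of pillars, per TB stored.
    first_year = sum(RATES[tier][0] for tier in copy_mix)
    following = sum(RATES[tier][1] for tier in copy_mix)
    return first_year, following

if __name__ == "__main__":
    mix = ["online (disk)", "offline (tape in fire-safe box)", "offline (tape in fire-safe box)"]
    first_year, following = price(mix)
    print("first year: %d EUR/TB, following years: %d EUR/TB" % (first_year, following))
    # prints: first year: 764 EUR/TB, following years: 300 EUR/TB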
* Low IO, granular "many files, low IOPS" from instruments and computational factors * Generic group share * Generic user dir. It is an interesting thing to manage and run statistical modelling on in terms of performance analysis and micro-benchmarking of data movement patterns. All the filesystems above are provisioned on 16Gbit/sec FC-connected Hitachi HUS-VM, 10K SAS. The metadata for these filesystems is around 10 terabytes of Hitachi Accelerated Advanced Flash storage. We have around 3.8 billion files/unique objects under management. We run a "disk based copy" (we call that copy1), which is our disk-based VSN or vault. It is around 1PB of ZFS-managed storage sitting inside the very large Hitachi HUS-VM platform. Our Copy2 and Copy3 are 2 * T10000D Oracle tape media copies in SL3000 storage silos, geographically distributed. We do some interesting things with our tape infrastructure, including DIV-always-on, proactive data protection sweeps inside the HSM and continuous validation checks against the media. We also run STA (tape analytics tools) out of the data path, so we can see *exactly* what each drive is doing at all times. Believe me, we see things that would baffle and boggle the mind (and probably create a healthy sense of paranoia!) if you knew exactly what was going on "inside there". We use finely tuned policies to automate data movement between tiers so as to minimally impact the user experience. Our HSM supports offline file mapping to the Windows client, so people can tell when their files and objects are "offline". It is a useful signal and great for usability. We ZFS-scrub the disk copy for "always on" disk consistency, and we also use tpverify commands on the tape media to regularly check the media itself. We're experimenting with implementing fixity shortly too, as the filesystem supports it. As for going "all online": at our scale, we just can't afford yet to walk away from "cold tape" principles. We're just too big. We'd love to rid ourselves of the complexities of it and consider a full cloud-based consumption model, but having crunched the very hard numbers on things such as AWS Glacier and S3, it is a long (long) way more expensive than the relative TCO of running it "on premise" at this stage. My hope is that this will change soon and I can start experimenting with one of my copies being a "cloud library". Interesting thread, this... -jc On 15/5/17, 11:41 am, "Pasig-discuss on behalf of BUNTON, GLENN" wrote: This discussion of the various digital repository storage approaches has been very enlightening and useful so far. I appreciate all the excellent details. There is one piece of information, however, that is missing. Cost? Both initial implementation outlay and ongoing costs. Any general sense of costs would be greatly appreciated. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Steve Knight Sent: Sunday, May 14, 2017 6:44 PM To: 'Sheila Morrissey' ; pasig-discuss at asis.org Subject: Re: [Pasig-discuss] Digital repository storage benchmarking Hi Tim, At the National Library of New Zealand we are storing about 210TB of digital objects in our permanent repository. We have a 25TB online cache, with an online copy of all the digital objects sitting on disk. Three tape copies of the objects are made as soon as they enter the disk archive. One copy remains within the tape library (nearline); the other two copies are sent offsite (offline).
We use Oracle SAM-QFS to manage the storage policies and automatic tiering. We have a similar treatment for our 100TB of test data, which has one less offsite tape copy. We are currently looking at replacing this storage architecture with a mix of Hitachi's HDI and HCP S30 object storage products and our cloud provider's object storage offering. The cloud provider storage includes replication across 3 geographic locations, providing both higher availability and higher resilience than we currently have. By moving to an all-online solution we hope to increase overall performance and make savings through utilising object storage and exiting some services related to current backup and restore processes. Regards Steve -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Sheila Morrissey Sent: Saturday, 13 May 2017 5:44 a.m. To: pasig-discuss at asis.org Subject: [Pasig-discuss] FW: Digital repository storage benchmarking Hello, Tim, At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content. Currently, the Portico archive comprises roughly 77.7 million digital objects (we call them "Archival Units", or AUs), totalling over 400 TB and made up of 1.3 billion files. We maintain 3 copies of the archive: 2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage. We create and maintain backups (including fixity checks) using our own custom-written software. I hope this is helpful. Best regards, Sheila Sheila M. Morrissey Senior Researcher ITHAKA 100 Campus Drive Suite 100 Princeton NJ 08540 609-986-2221 sheila.morrissey at ithaka.org ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. -----Original Message----- From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh Sent: Friday, May 12, 2017 10:16 AM To: pasig-discuss at asis.org Subject: [Pasig-discuss] Digital repository storage benchmarking Dear PASIG, I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions' configurations online. It's very possible that this question has been asked before on-list, but I wasn't able to find anything in the list archives. For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those "various media" will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we'd like to benchmark our plans against other institutions. I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are: * Could you point me to published/available resources outlining other institutions' digital repository storage configurations?
* Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (Any information sent off-list will be kept strictly confidential.) Helpful details would include: the volume of digital objects being stored; how many copies of the data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications you are using to manage the creation and maintenance of backups. Thank you! Tim - - - Tim Walsh Archiviste, Archives numériques Archivist, Digital Archives Centre Canadien d'Architecture Canadian Centre for Architecture 1920, rue Baile, Montréal, Québec H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca Pensez à l'environnement avant d'imprimer ce message Please consider the environment before printing this email Ce courriel peut contenir des renseignements confidentiels. Si vous n'êtes pas le destinataire prévu, veuillez nous en aviser immédiatement. Merci également de supprimer le présent courriel et d'en détruire toute copie. This email may contain confidential information. If you are not the intended recipient, please advise us immediately and delete this email as well as any other copy. Thank you. ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss
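Several replies in this thread describe fixity checking as part of managing the backup copies (ZFS scrubs and tpverify at UQ, Portico's custom-written software, and similar audits elsewhere). For readers benchmarking their own setups, here is a minimal sketch of the generic checksum-manifest approach in Python; the function names, manifest layout, and choice of SHA-256 are illustrative assumptions, not any institution's actual tooling.

    import hashlib
    import json
    import pathlib

    def sha256(path, bufsize=1 << 20):
        """Stream a file through SHA-256 so large objects are never read whole into memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(bufsize), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def build_manifest(root):
        """Map every file under `root` (relative path -> checksum)."""
        root = pathlib.Path(root)
        return {str(p.relative_to(root)): sha256(p) for p in root.rglob("*") if p.is_file()}

    def audit(root, manifest_file="manifest.json"):
        """Compare the current tree against a stored manifest; return missing and altered files."""
        stored = json.loads(pathlib.Path(manifest_file).read_text())
        current = build_manifest(root)
        missing = sorted(set(stored) - set(current))
        altered = sorted(name for name in stored.keys() & current.keys() if stored[name] != current[name])
        return missing, altered

On ingest, the manifest produced by build_manifest() would be written out (ideally stored apart from the content); a scheduled audit() run then flags missing or altered files so they can be repaired from one of the other copies.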
From luispo at gmail.com Wed May 17 13:24:50 2017 From: luispo at gmail.com (Louis Suárez-Potts) Date: Wed, 17 May 2017 13:24:50 -0400 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> Message-ID: > On 2017-05-17, at 04:18, William Kilbride wrote: > > Hi Louis, > > Yes you're quite right: a list is great but it's not ideal for sharing this kind of information. > > I really do encourage you therefore to look seriously at the Curation Costs Exchange. The 4C project took quite a lot of time to manage not only the legal constraints and anonymization issues but also the different accountancy approaches that can make it hard to compare data meaningfully. Please do take a look (all!): http://www.curationexchange.org/ The more we put in the more we will get out... > > W :-) > Thanks, William. I'll look, also with an eye to how such a site would be of use to other open source projects. There are a lot out there, and though there are meta-organisations like Software Conservancy and others, the focus is seldom on finance, let alone the difference trans-nationalisation makes. louis Louis Suárez-Potts, PhD Strategist & Co-Founder Age of Peers www.ageofpeers.com/ Skype: louisiam Twitter: @luispo Tel: +1.416.625.3843 From dave at dpn.org Wed May 17 17:11:44 2017 From: dave at dpn.org (David Pcolar) Date: Wed, 17 May 2017 17:11:44 -0400 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> Message-ID: <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> The Digital Preservation Network (www.dpn.org) is a membership organization dedicated to the long-term preservation of scholarly output. We have a cooperative distributed model that significantly reduces the risks surrounding content preservation. DPN ensures the secure preservation of stored content by leveraging a heterogeneous network that spans diverse geographic, technical, and institutional environments. DPN's preservation process can be expressed in five steps: (1) Content is deposited into the system via an Ingest Node; (2) Content is replicated to at least two other Replicating Nodes and stored in varied repository infrastructures; (3) Content is checked via bit auditing and repair services to ensure the content remains the same over time; (4) Destroyed or corrupted content is restored by DPN; (5) As service provider Nodes enter and leave DPN, content is redistributed to maintain the continuity of preservation services into the far future. We have providers with Disk, Tape, and Cloud infrastructures, replicating copies across the continental US. Our base service model is 3 copies with a preservation assurance of 20 years with a single payment. Current membership pricing is $20,000/year, which includes 5TB of deposits per year.
Deposits above 5TB are a single payment of $2750/TB for a 20-year term ($137.50/TB/Yr) Please contact us for more information: Mary Molinaro Executive Director, Digital Preservation Network mary at dpn.org Dave Pcolar Technical Officer, Digital Preservation Network dave at dpn.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwilcox at duraspace.org Thu May 18 12:01:56 2017 From: dwilcox at duraspace.org (David Wilcox) Date: Thu, 18 May 2017 12:01:56 -0400 Subject: [Pasig-discuss] INVITATION: Fedora and Hydra Camp at Oxford Message-ID: DuraSpace and Data Curation Experts are pleased to invite you to attend the Fedora and Hydra Camp at Oxford University, Sept 4 - 8, 2017. The camp will be hosted by Oxford University , Oxford, UK and is supported by Jisc . Training begins with the basics and build toward more advanced concepts?no prior Fedora or Hydra experience is required. Participants can expect to come away with a deep dive Fedora and Hydra learning experience coupled with multiple opportunities for applying hands-on techniques working with experienced trainers from both communities. Registration is limited to the first 40 applicants so register here soon ! An early bird discount is available until July 10. Background Fedora is the robust, modular, open source repository platform for the management and dissemination of digital content. Fedora 4, the latest production version of Fedora, features vast improvements in scalability, linked data capabilities, research data support, modularity, ease of use and more. Hydra is a repository solution that is being used by institutions worldwide to provide access to their digital content (see map ). Hydra provides a versatile and feature rich environment for end-users and repository administrators alike. About Fedora Camp Previous Fedora Camps include the inaugural camp held at Duke University, the West Coast camp at CalTech, and the most recent, NYC camp held at Columbia University. Hydra Camps have been held throughout the US and in the UK and the Republic of Ireland. Most recently, DCE hosted the inaugural Advanced Hydra Camp focusing on advanced Hydra developer skills. The upcoming combined camp curriculum will provide a comprehensive overview of Fedora and Hydra by exploring such topics as: Core & Integrated features Data modeling and linked data Content and Metadata management Migrating to Fedora 4 Deploying Fedora and Hydra in production Ruby, Rails, and collaborative development using Github Introductory Blacklight including search and faceting Preservation Services The curriculum will be delivered by a knowledgeable team of instructors from the Fedora and Hydra communities: David Wilcox (DuraSpace), Andrew Woods (DuraSpace), Mark Bussey (Data Curation Experts), Bess Sadler (Data Curation Experts), Julie Allinson (University of London). -- David Wilcox Fedora Product Manager DuraSpace dwilcox at duraspace.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From katherine at educopia.org Thu May 18 13:52:38 2017 From: katherine at educopia.org (Katherine Skinner) Date: Thu, 18 May 2017 13:52:38 -0400 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> Message-ID: <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> I love this thread--thank you for starting it, Tim! The MetaArchive Cooperative started preserving content with six institutions in 2004; it has grown to encompass more than 60 institutions, including through consortial memberships with several regional consortia (in Barcelona and Ohio) and a library alliance (HBCU). Our mission is to provide a strong preservation community as well as an affordable preservation solution for distributed digital preservation for a wide variety of memory-oriented organizations. Our members constantly learn from each other as they compare workflows, tools, approaches, and policies. More details, specific to your questions, Tim: we are actively preserving 1,200+ collections totaling 85TB of content (and that is slated to almost double in the next year) content is ingested via bags (BagIt) and can be submitted in a variety of ways every file is replicated 7 times and stored in 7 secure, geographically distributed locations on infrastructure that includes both physical servers (at some member institutions) and "cloud-based" and VM infrastructures content is regularly audited using LOCKSS voting and polling mechanisms when needed, content is repaired and metadata describing that event is created Other details that may be of interest: pricing is $500/TB for storage fees, plus an annual membership fee of between $3,000-$5,500 depending on the selected category some members host network infrastructure; others pay a small annual fee ($1000) to waive that responsibility MetaArchive is entirely run, owned, and controlled by its members--including pricing decisions Carly Dearborn (Purdue University) is the current Chair of the Steering Committee. If you are interested in learning more, please reach out to me (Katherine at Educopia.org) or Carly (cdearbor at purdue.edu) while the network's facilitator, Sam Meister, is out on paternity leave until early July. Katherine Skinner, PhD Executive Director, Educopia Institute http://educopia.org Working from Greensboro, NC katherine at educopia.org | 404 783 2534 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkramersmyth at gmail.com Thu May 18 14:18:53 2017 From: jkramersmyth at gmail.com (Jeanne Kramer-Smyth) Date: Thu, 18 May 2017 14:18:53 -0400 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> Message-ID: What would folks think of all of this amazing information being collected in a shared document somewhere? Jeanne On Thu, May 18, 2017 at 1:52 PM, Katherine Skinner wrote: > I love this thread--thank you for starting it, Tim! 
> > The MetaArchive Cooperative started preserving content with six > institutions in 2004; it has grown to encompass more than 60 institutions, > including through consortial memberships with several regional consortia > (in Barcelona and Ohio) and a library alliance (HBCU). > > Our mission is to provide a strong preservation community as well as an > affordable preservation solution for distributed digital preservation for a > wide variety of memory-oriented organizations. Our members constantly learn > from each other as they compare workflows, tools, approaches, and policies. > > More details, specific to your questions, Tim: > > - we are actively preserving 1,200+ collections totaling 85TB of > content (and that is slated to almost double in the next year) > - content is ingested via bags (BagIt) and can be submitted in a > variety of ways > - every file is replicated 7 times and stored in 7 secure, > geographically distributed locations on infrastructure that includes both > physical servers (at some member institutions) and "cloud-based" and VM > infrastructures > - content is regularly audited using LOCKSS voting and polling > mechanisms > - when needed, content is repaired and metadata describing that event > is created > > Other details that may be of interest: > > - pricing is $500/TB for storage fees, plus an annual membership fee > of between $3,000-$5,500 depending on the selected category > - some members host network infrastructure; others pay a small annual > fee ($1000) to waive that responsibility > - MetaArchive is entirely run, owned, and controlled by its > members--including pricing decisions > > Carly Dearborn (Purdue University) is the current Chair of the Steering > Committee. If you are interested in learning more, please reach out to me ( > Katherine at Educopia.org) or Carly (cdearbor at purdue.edu) while the > network's facilitator, Sam Meister, is out on paternity leave until early > July. > > > > > *Katherine Skinner, PhD* > Executive Director, Educopia Institute > http://educopia.org > > Working from Greensboro, NC > katherine at educopia.org | 404 783 2534 <(404)%20783-2534> > > > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www. > preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arthurpasquinelli at gmail.com Thu May 18 14:40:11 2017 From: arthurpasquinelli at gmail.com (Arthur Pasquinelli) Date: Thu, 18 May 2017 11:40:11 -0700 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> Message-ID: <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> I was just thinking the same thing since we have had some good discussions now and in the past. Since I have kept a copy of all past PASIG emails, I'll work on it with the other PASIG steering committee members. We are in the middle of some administrative work for PASIG right now, so I'll add this to the things being worked on. 
On 5/18/17 11:18 AM, Jeanne Kramer-Smyth wrote: > What would folks think of all of this amazing information being > collected in a shared document somewhere? > > Jeanne > > On Thu, May 18, 2017 at 1:52 PM, Katherine Skinner > > wrote: > > I love this thread--thank you for starting it, Tim! > > The MetaArchive Cooperative started preserving content with six > institutions in 2004; it has grown to encompass more than 60 > institutions, including through consortial memberships with > several regional consortia (in Barcelona and Ohio) and a library > alliance (HBCU). > > Our mission is to provide a strong preservation community as well > as an affordable preservation solution for distributed digital > preservation for a wide variety of memory-oriented organizations. > Our members constantly learn from each other as they compare > workflows, tools, approaches, and policies. > > More details, specific to your questions, Tim: > > * we are actively preserving 1,200+ collections totaling 85TB of > content (and that is slated to almost double in the next year) > * content is ingested via bags (BagIt) and can be submitted in a > variety of ways > * every file is replicated 7 times and stored in 7 secure, > geographically distributed locations on infrastructure that > includes both physical servers (at some member institutions) > and "cloud-based" and VM infrastructures > * content is regularly audited using LOCKSS voting and polling > mechanisms > * when needed, content is repaired and metadata describing that > event is created > > Other details that may be of interest: > > * pricing is $500/TB for storage fees, plus an annual membership > fee of between $3,000-$5,500 depending on the selected category > * some members host network infrastructure; others pay a small > annual fee ($1000) to waive that responsibility > * MetaArchive is entirely run, owned, and controlled by its > members--including pricing decisions > > Carly Dearborn (Purdue University) is the current Chair of the > Steering Committee. If you are interested in learning more, please > reach out to me (Katherine at Educopia.org > ) or Carly (cdearbor at purdue.edu > ) while the network's facilitator, Sam > Meister, is out on paternity leave until early July. > > > > > *Katherine Skinner, PhD* > Executive Director, Educopia Institute > http://educopia.org > > Working from Greensboro, NC > katherine at educopia.org | 404 783 > 2534 > > > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > > _______ > PASIG Webinars and conference material is at > http://www.preservationandarchivingsig.org/index.html > > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > > > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sschaefer at ucsd.edu Thu May 18 15:24:03 2017 From: sschaefer at ucsd.edu (Schaefer, Sibyl) Date: Thu, 18 May 2017 19:24:03 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> Message-ID: I?d like to chime in with some information about the Chronopolis Digital Preservation Network. We were originally funded by the Library of Congress NDIPP program and ingested our first production content in 2008. Chronopolis was designed to preserve hundreds of terabytes of digital data with minimal requirements on the data provider. The single, overriding commitment of the Chronopolis system is to preserve objects in such a way that they can be transmitted back to the original data providers in the exact form in which they were submitted. Chronopolis leverages high-speed networks, mass-scale storage capabilities, and the expertise of the partners in order to provide a geographically distributed, heterogeneous, and highly redundant archive system. Our partners include the University of California San Diego Library, the National Center for Atmospheric Research, The University of Maryland Institute for Advanced Computer Studies, and our newest partner, the Texas Digital Library. Features of the project include: ? Three geographically distributed copies of the data ? Curatorial audit reporting ? Development of best practices for data packaging and sharing We also serve as a founding node in the Digital Preservation Network and partner with DuraSpace to provide our services. We currently preserve over 50 TBs (150 replicated) of data. Our prices vary depending on the ingest mechanism, but the base rate for storage is $286/TB/year for three geographically-distributed copies. Best, Sibyl Sibyl Schaefer Chronopolis Program Manager // Digital Preservation Analyst University of California, San Diego From: Pasig-discuss on behalf of Arthur Pasquinelli Date: Thursday, May 18, 2017 at 11:40 AM To: "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Digital repository storage benchmarking I was just thinking the same thing since we have had some good discussions now and in the past. Since I have kept a copy of all past PASIG emails, I'll work on it with the other PASIG steering committee members. We are in the middle of some administrative work for PASIG right now, so I'll add this to the things being worked on. On 5/18/17 11:18 AM, Jeanne Kramer-Smyth wrote: What would folks think of all of this amazing information being collected in a shared document somewhere? Jeanne On Thu, May 18, 2017 at 1:52 PM, Katherine Skinner > wrote: I love this thread--thank you for starting it, Tim! The MetaArchive Cooperative started preserving content with six institutions in 2004; it has grown to encompass more than 60 institutions, including through consortial memberships with several regional consortia (in Barcelona and Ohio) and a library alliance (HBCU). Our mission is to provide a strong preservation community as well as an affordable preservation solution for distributed digital preservation for a wide variety of memory-oriented organizations. Our members constantly learn from each other as they compare workflows, tools, approaches, and policies. 
More details, specific to your questions, Tim: * we are actively preserving 1,200+ collections totaling 85TB of content (and that is slated to almost double in the next year) * content is ingested via bags (BagIt) and can be submitted in a variety of ways * every file is replicated 7 times and stored in 7 secure, geographically distributed locations on infrastructure that includes both physical servers (at some member institutions) and "cloud-based" and VM infrastructures * content is regularly audited using LOCKSS voting and polling mechanisms * when needed, content is repaired and metadata describing that event is created Other details that may be of interest: * pricing is $500/TB for storage fees, plus an annual membership fee of between $3,000-$5,500 depending on the selected category * some members host network infrastructure; others pay a small annual fee ($1000) to waive that responsibility * MetaArchive is entirely run, owned, and controlled by its members--including pricing decisions Carly Dearborn (Purdue University) is the current Chair of the Steering Committee. If you are interested in learning more, please reach out to me (Katherine at Educopia.org) or Carly (cdearbor at purdue.edu) while the network's facilitator, Sam Meister, is out on paternity leave until early July. Katherine Skinner, PhD Executive Director, Educopia Institute http://educopia.org Working from Greensboro, NC katherine at educopia.org | 404 783 2534 ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From randy_stern at harvard.edu Thu May 18 15:34:27 2017 From: randy_stern at harvard.edu (Stern, Randy) Date: Thu, 18 May 2017 19:34:27 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> Message-ID: <06489D92-8E52-454C-8EC6-0A9EF3B04AC0@harvard.edu> Thanks for sharing this, $286/TB/year for 3 copies ? are the copies tape only? Does this include real time access to disk copies, or is it a dark archive? It would be great to have all these factors broken out in the shared repository of informaiton that Art Pasquinelli wrotw about! Randy From: Pasig-discuss on behalf of "Schaefer, Sibyl" Date: Thursday, May 18, 2017 at 3:24 PM To: Arthur Pasquinelli , "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Digital repository storage benchmarking I?d like to chime in with some information about the Chronopolis Digital Preservation Network. 
We were originally funded by the Library of Congress NDIPP program and ingested our first production content in 2008. Chronopolis was designed to preserve hundreds of terabytes of digital data with minimal requirements on the data provider. The single, overriding commitment of the Chronopolis system is to preserve objects in such a way that they can be transmitted back to the original data providers in the exact form in which they were submitted. Chronopolis leverages high-speed networks, mass-scale storage capabilities, and the expertise of the partners in order to provide a geographically distributed, heterogeneous, and highly redundant archive system. Our partners include the University of California San Diego Library, the National Center for Atmospheric Research, The University of Maryland Institute for Advanced Computer Studies, and our newest partner, the Texas Digital Library. Features of the project include: • Three geographically distributed copies of the data • Curatorial audit reporting • Development of best practices for data packaging and sharing We also serve as a founding node in the Digital Preservation Network and partner with DuraSpace to provide our services. We currently preserve over 50 TBs (150 replicated) of data. Our prices vary depending on the ingest mechanism, but the base rate for storage is $286/TB/year for three geographically-distributed copies. Best, Sibyl Sibyl Schaefer Chronopolis Program Manager // Digital Preservation Analyst University of California, San Diego From: Pasig-discuss on behalf of Arthur Pasquinelli Date: Thursday, May 18, 2017 at 11:40 AM To: "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Digital repository storage benchmarking I was just thinking the same thing since we have had some good discussions now and in the past. Since I have kept a copy of all past PASIG emails, I'll work on it with the other PASIG steering committee members. We are in the middle of some administrative work for PASIG right now, so I'll add this to the things being worked on. On 5/18/17 11:18 AM, Jeanne Kramer-Smyth wrote: What would folks think of all of this amazing information being collected in a shared document somewhere? Jeanne On Thu, May 18, 2017 at 1:52 PM, Katherine Skinner > wrote: I love this thread--thank you for starting it, Tim! The MetaArchive Cooperative started preserving content with six institutions in 2004; it has grown to encompass more than 60 institutions, including through consortial memberships with several regional consortia (in Barcelona and Ohio) and a library alliance (HBCU). Our mission is to provide a strong preservation community as well as an affordable preservation solution for distributed digital preservation for a wide variety of memory-oriented organizations. Our members constantly learn from each other as they compare workflows, tools, approaches, and policies. 
More details, specific to your questions, Tim: * we are actively preserving 1,200+ collections totaling 85TB of content (and that is slated to almost double in the next year) * content is ingested via bags (BagIt) and can be submitted in a variety of ways * every file is replicated 7 times and stored in 7 secure, geographically distributed locations on infrastructure that includes both physical servers (at some member institutions) and "cloud-based" and VM infrastructures * content is regularly audited using LOCKSS voting and polling mechanisms * when needed, content is repaired and metadata describing that event is created Other details that may be of interest: * pricing is $500/TB for storage fees, plus an annual membership fee of between $3,000-$5,500 depending on the selected category * some members host network infrastructure; others pay a small annual fee ($1000) to waive that responsibility * MetaArchive is entirely run, owned, and controlled by its members--including pricing decisions Carly Dearborn (Purdue University) is the current Chair of the Steering Committee. If you are interested in learning more, please reach out to me (Katherine at Educopia.org) or Carly (cdearbor at purdue.edu) while the network's facilitator, Sam Meister, is out on paternity leave until early July. Katherine Skinner, PhD Executive Director, Educopia Institute http://educopia.org Working from Greensboro, NC katherine at educopia.org | 404 783 2534 ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sschaefer at ucsd.edu Thu May 18 15:41:44 2017 From: sschaefer at ucsd.edu (Schaefer, Sibyl) Date: Thu, 18 May 2017 19:41:44 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <06489D92-8E52-454C-8EC6-0A9EF3B04AC0@harvard.edu> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> <06489D92-8E52-454C-8EC6-0A9EF3B04AC0@harvard.edu> Message-ID: Hi Randy- The copies are all on hard disks, allowing us to run vigorous fixity checking routines. It is a dark archive, so there is no real time access to copies. Let me know if you have more questions! Best, Sibyl Sibyl Schaefer Chronopolis Program Manager // Digital Preservation Analyst University of California, San Diego From: "Stern, Randy" Date: Thursday, May 18, 2017 at 12:34 PM To: "Schaefer, Sibyl" , Arthur Pasquinelli , "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Digital repository storage benchmarking Thanks for sharing this, $286/TB/year for 3 copies ? are the copies tape only? 
Does this include real time access to disk copies, or is it a dark archive? It would be great to have all these factors broken out in the shared repository of informaiton that Art Pasquinelli wrotw about! Randy From: Pasig-discuss on behalf of "Schaefer, Sibyl" Date: Thursday, May 18, 2017 at 3:24 PM To: Arthur Pasquinelli , "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Digital repository storage benchmarking I?d like to chime in with some information about the Chronopolis Digital Preservation Network. We were originally funded by the Library of Congress NDIPP program and ingested our first production content in 2008. Chronopolis was designed to preserve hundreds of terabytes of digital data with minimal requirements on the data provider. The single, overriding commitment of the Chronopolis system is to preserve objects in such a way that they can be transmitted back to the original data providers in the exact form in which they were submitted. Chronopolis leverages high-speed networks, mass-scale storage capabilities, and the expertise of the partners in order to provide a geographically distributed, heterogeneous, and highly redundant archive system. Our partners include the University of California San Diego Library, the National Center for Atmospheric Research, The University of Maryland Institute for Advanced Computer Studies, and our newest partner, the Texas Digital Library. Features of the project include: • Three geographically distributed copies of the data • Curatorial audit reporting • Development of best practices for data packaging and sharing We also serve as a founding node in the Digital Preservation Network and partner with DuraSpace to provide our services. We currently preserve over 50 TBs (150 replicated) of data. Our prices vary depending on the ingest mechanism, but the base rate for storage is $286/TB/year for three geographically-distributed copies. Best, Sibyl Sibyl Schaefer Chronopolis Program Manager // Digital Preservation Analyst University of California, San Diego From: Pasig-discuss on behalf of Arthur Pasquinelli Date: Thursday, May 18, 2017 at 11:40 AM To: "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Digital repository storage benchmarking I was just thinking the same thing since we have had some good discussions now and in the past. Since I have kept a copy of all past PASIG emails, I'll work on it with the other PASIG steering committee members. We are in the middle of some administrative work for PASIG right now, so I'll add this to the things being worked on. On 5/18/17 11:18 AM, Jeanne Kramer-Smyth wrote: What would folks think of all of this amazing information being collected in a shared document somewhere? Jeanne On Thu, May 18, 2017 at 1:52 PM, Katherine Skinner > wrote: I love this thread--thank you for starting it, Tim! The MetaArchive Cooperative started preserving content with six institutions in 2004; it has grown to encompass more than 60 institutions, including through consortial memberships with several regional consortia (in Barcelona and Ohio) and a library alliance (HBCU). Our mission is to provide a strong preservation community as well as an affordable preservation solution for distributed digital preservation for a wide variety of memory-oriented organizations. Our members constantly learn from each other as they compare workflows, tools, approaches, and policies. 
More details, specific to your questions, Tim: * we are actively preserving 1,200+ collections totaling 85TB of content (and that is slated to almost double in the next year) * content is ingested via bags (BagIt) and can be submitted in a variety of ways * every file is replicated 7 times and stored in 7 secure, geographically distributed locations on infrastructure that includes both physical servers (at some member institutions) and "cloud-based" and VM infrastructures * content is regularly audited using LOCKSS voting and polling mechanisms * when needed, content is repaired and metadata describing that event is created Other details that may be of interest: * pricing is $500/TB for storage fees, plus an annual membership fee of between $3,000-$5,500 depending on the selected category * some members host network infrastructure; others pay a small annual fee ($1000) to waive that responsibility * MetaArchive is entirely run, owned, and controlled by its members--including pricing decisions Carly Dearborn (Purdue University) is the current Chair of the Steering Committee. If you are interested in learning more, please reach out to me (Katherine at Educopia.org) or Carly (cdearbor at purdue.edu) while the network's facilitator, Sam Meister, is out on paternity leave until early July. Katherine Skinner, PhD Executive Director, Educopia Institute http://educopia.org Working from Greensboro, NC katherine at educopia.org | 404 783 2534 ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkramersmyth at gmail.com Thu May 18 15:43:23 2017 From: jkramersmyth at gmail.com (Jeanne Kramer-Smyth) Date: Thu, 18 May 2017 15:43:23 -0400 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> Message-ID: Dear Arthur, It could even be something as simple as a shared Google Spreadsheet that people can add their institution's information to over time. It would be great if it could be a living document. Thanks! Jeanne On Thu, May 18, 2017 at 2:40 PM, Arthur Pasquinelli < arthurpasquinelli at gmail.com> wrote: > I was just thinking the same thing since we have had some good discussions > now and in the past. Since I have kept a copy of all past PASIG emails, > I'll work on it with the other PASIG steering committee members. We are in > the middle of some administrative work for PASIG right now, so I'll add > this to the things being worked on. 
> > > On 5/18/17 11:18 AM, Jeanne Kramer-Smyth wrote: > > What would folks think of all of this amazing information being collected > in a shared document somewhere? > > Jeanne > > On Thu, May 18, 2017 at 1:52 PM, Katherine Skinner > wrote: > >> I love this thread--thank you for starting it, Tim! >> >> The MetaArchive Cooperative started preserving content with six >> institutions in 2004; it has grown to encompass more than 60 institutions, >> including through consortial memberships with several regional consortia >> (in Barcelona and Ohio) and a library alliance (HBCU). >> >> Our mission is to provide a strong preservation community as well as an >> affordable preservation solution for distributed digital preservation for a >> wide variety of memory-oriented organizations. Our members constantly learn >> from each other as they compare workflows, tools, approaches, and policies. >> >> More details, specific to your questions, Tim: >> >> - we are actively preserving 1,200+ collections totaling 85TB of >> content (and that is slated to almost double in the next year) >> - content is ingested via bags (BagIt) and can be submitted in a >> variety of ways >> - every file is replicated 7 times and stored in 7 secure, >> geographically distributed locations on infrastructure that includes both >> physical servers (at some member institutions) and "cloud-based" and VM >> infrastructures >> - content is regularly audited using LOCKSS voting and polling >> mechanisms >> - when needed, content is repaired and metadata describing that event >> is created >> >> Other details that may be of interest: >> >> - pricing is $500/TB for storage fees, plus an annual membership fee >> of between $3,000-$5,500 depending on the selected category >> - some members host network infrastructure; others pay a small annual >> fee ($1000) to waive that responsibility >> - MetaArchive is entirely run, owned, and controlled by its >> members--including pricing decisions >> >> Carly Dearborn (Purdue University) is the current Chair of the Steering >> Committee. If you are interested in learning more, please reach out to me ( >> Katherine at Educopia.org) or Carly (cdearbor at purdue.edu) while the >> network's facilitator, Sam Meister, is out on paternity leave until early >> July. >> >> >> >> >> *Katherine Skinner, PhD* >> Executive Director, Educopia Institute >> http://educopia.org >> >> Working from Greensboro, NC >> katherine at educopia.org | 404 783 2534 <%28404%29%20783-2534> >> >> >> >> >> ---- >> To subscribe, unsubscribe, or modify your subscription, please visit >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> _______ >> PASIG Webinars and conference material is at >> http://www.preservationandarchivingsig.org/index.html >> _______________________________________________ >> Pasig-discuss mailing list >> Pasig-discuss at mail.asis.org >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> >> > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visithttp://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing listPasig-discuss at mail.asis.orghttp://mail.asis.org/mailman/listinfo/pasig-discuss > > > > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www. 
> preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey at coppul.ca Thu May 18 18:33:34 2017 From: corey at coppul.ca (Corey Davis) Date: Thu, 18 May 2017 15:33:34 -0700 Subject: [Pasig-discuss] Digital preservation advocacy document examples Message-ID: Hi folks, In COPPUL we've recently established a major digital preservation initiative (http://www.coppul.ca/blog/2017/04/coppul-builds-future-establishes-coppul-digital-preservation-network), and one of the things we're hoping to develop in the near future is something we're tentatively calling a "digital preservation advocacy toolkit." This would primarily consist of a graphics-heavy document or template intended to brief senior academic administrators. We want them to better understand the issues so our member libraries are in a better position to advocate for resources. There's some great stuff in the CESSDA cost-benefit advocacy toolkit (and thanks to the CESSDA folks for making this CC-BY), but I'm wondering if others out there have developed briefing documents or other resources for senior academic administrators in relation to digital preservation that they might be willing to share. Many thanks, Corey -- Corey Davis Digital Preservation Network Manager Council of Prairie and Pacific University Libraries (COPPUL) corey at coppul.ca (250) 472-5024 office (778) 677-5746 cell From Stephen.Abrams at ucop.edu Fri May 19 13:14:37 2017 From: Stephen.Abrams at ucop.edu (Stephen Abrams) Date: Fri, 19 May 2017 17:14:37 +0000 Subject: [Pasig-discuss] Digital repository storage benchmarking In-Reply-To: <06489D92-8E52-454C-8EC6-0A9EF3B04AC0@harvard.edu> References: <78ADB971-820E-4450-BDEB-1814B86B19F0@uq.edu.au> <2EBED878-C03B-41EB-BAA8-E36F949EF821@harvard.edu> <6B34DD3F-B3A8-4241-9971-C21BAC89F5BF@dpn.org> <6A4876CD-D687-4D07-87CD-E40986BFB1AD@educopia.org> <47b8bfa3-b73d-3842-b7e8-e29fa8914628@gmail.com> <06489D92-8E52-454C-8EC6-0A9EF3B04AC0@harvard.edu> Message-ID: CDL's Merritt repository supports long-term preservation and current (and long-term) access. All content is actively replicated and audited, with either 2 or at least 6 copies, depending upon how you count things. We rely on two cloud service providers: one, a private cloud at UCSD/SDSC, which itself manages 3 copies on independent arrays; and the other, AWS S3 (for bright content) and Glacier (for dark content), which manage at least 3 copies spread across availability zones. Both clouds perform local fixity audit, which Merritt overlays with its own audit (except for Glacier content, for which this would be cost prohibitive under Glacier's transactional fee structure). Merritt's nominal price point is $650/TB/year, but the cost accounting is done by adding up the used byte-days (billed at $0.00000000000178/B/day, i.e., $650 / 365 days / 10^12 bytes) over the service year. This lets our customers avoid having to worry about the timing of their contributions. So 1 TB deposited on the first day of the year will accrue the full $650 cost; that same 1 TB deposited on the last day of the year will accrue only $1.78 in cost. --sla Stephen Abrams Associate Director, UC Curation Center California Digital Library University of California, Office of the President Stephen.Abrams at ucop.edu +1 510-987-0370 -------------- next part -------------- An HTML attachment was scrubbed... URL:
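As a quick check of Merritt's byte-day arithmetic (a small sketch only; the rate comes straight from the figures above, 1 TB is taken as 10^12 bytes, and the helper name is ours):

    TB = 10**12                           # Merritt bills per byte; 1 TB = 10^12 bytes
    RATE_PER_BYTE_DAY = 650 / 365 / TB    # ~ $0.00000000000178 per byte per day

    def storage_cost(bytes_stored, days_stored):
        """Cost accrued for holding `bytes_stored` bytes for `days_stored` days."""
        return bytes_stored * days_stored * RATE_PER_BYTE_DAY

    print(round(storage_cost(1 * TB, 365), 2))  # held the full year -> 650.0
    print(round(storage_cost(1 * TB, 1), 2))    # held one day       -> 1.78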