[Pasig-discuss] FW: Digital repository storage benchmarking

Julian M. Morley jmorley at stanford.edu
Fri May 12 17:28:30 EDT 2017


Tim,

Moab - used here at Stanford Libraries - is a POSIX-based paradigm that allows incremental updates without involving symlinks. We use it in conjunction with UUIDs (not hashes) and Fedora to define the AIPs used in the Stanford Digital Repository.

There’s a white paper describing Moab here:
http://journal.code4lib.org/articles/8482#2.5
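
As a rough illustration of how incremental updates can work without symlinks, here is a minimal sketch. It is not Moab's actual manifest format: the function names, the "v0001" style labels, and the "data/" layout are assumptions made for this example. The idea is that each version directory stores only the files that changed, and a per-version inventory records which version directory physically holds each logical file.

```python
# Hypothetical sketch of version inventories; NOT Moab's real on-disk format.
# Each inventory maps a logical file path to the version that stores its bytes.

def version_name(n):
    """Return a zero-padded version label such as 'v0001' (assumed naming)."""
    return f"v{n:04d}"

def next_inventory(history, changed, deleted=()):
    """Build the inventory for a new version from the previous one.

    history: list of earlier inventories, oldest first
    changed: logical paths added or modified in the new version
    deleted: logical paths removed in the new version
    """
    new_version = version_name(len(history) + 1)
    inventory = dict(history[-1]) if history else {}
    for path in deleted:
        inventory.pop(path, None)
    for path in changed:
        # Only changed files are written into the new version directory.
        inventory[path] = new_version
    return inventory

def physical_path(inventory, logical_path):
    """Resolve a logical path to where its bytes live (assumed layout)."""
    return f"{inventory[logical_path]}/data/{logical_path}"

history = []
history.append(next_inventory(history, ["image.tif", "metadata.xml"]))
history.append(next_inventory(history, ["metadata.xml"]))  # metadata-only update
```

In this sketch, resolving "image.tif" against the second inventory still points into the first version directory, so the unchanged file is never copied or linked.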


-- 
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries








On 5/12/17, 2:15 PM, "Pasig-discuss on behalf of Tim Walsh" <pasig-discuss-bounces at asis.org on behalf of twalsh at cca.qc.ca> wrote:

>Thank you to Tab, Randy, Sheila, Richard, et al. Very interesting and helpful responses!
>
>Best,
>Tim
>
>- - -
> 
>Tim Walsh
>Archiviste, Archives numériques
>Archivist, Digital Archives
> 
>Centre Canadien d’Architecture
>Canadian Centre for Architecture
>T 514 939 7001 x 1532
>www.cca.qc.ca <http://www.cca.qc.ca/>
>
>On 2017-05-12, 2:43 PM, "Pasig-discuss on behalf of Butler, Tab" <pasig-discuss-bounces at asis.org on behalf of tab.butler at mlb.com> wrote:
>
>    Tim,
>    
>    At Major League Baseball, we are focused mostly on archiving the broadcast game video feeds, along with pregame, postgame, and individual camera iso feeds for each game.  The content includes both the home and away team broadcasts, with and without graphics.  Essentially, we record 7 hours plus of content for every 1 hour of baseball played.   We also record and archive all the MLB Network content that is produced, which is between 12 - 18 hours of live content per day.  We will archive the entire broadcast show of record, and the individual elements that make up a show.
>    
>    All in, we are recording over 1,000 hours of content per day.  This equates to 50+ TB of content being added to our archive per day.
>    
>    We have both an active on-line disk tier (2 SANs - each 2.88 PB) for recording, editing, and on-line storage, and a data tape archive that supports Partial File Restore (PFR) of video files.  We load balance recording content across the two SANs... American League on one SAN, and National League on the other... and all edits (96 high performance / 54 desktop machines) access both SANs.
>    
>    Once content is written to a SAN, it is auto archived to tape, as per our DIAMOND asset management system (home grown).  We started archiving on LTO-4 in 2008, and are currently on Oracle T10000-D.  We are migrating content from LTO-4 to T10K-D tape within a tape group...
>    
>    We have both an 'On-Site' tape sub-group, and an 'Off-Site' tape sub-group for each of our Tape Groups.  Tape Groups include "Games with Graphics" (Dirty) and "Games without Graphics" (Clean)... the Dirty off-site tapes go to a different off-site location than the Clean off-site tapes.  We break up all of our Off-Site Tape Groups between two geographically distributed locations, as well.
>    
>    We are using the Oracle DIVArchive middleware, which computes a checksum and compares it to the value stored in its database each time a file is copied, moved, or restored.  We are performing between 1,000 and 2,000 PFRs / restores per day.
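
For readers unfamiliar with this kind of check, here is a minimal sketch of comparing a freshly computed checksum with a stored value. It does not show how DIVArchive works internally: the function names, the choice of SHA-256, and the plain dictionary standing in for the database are all assumptions made for illustration.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a file's checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(path, stored_checksums, algorithm="sha256"):
    """Return True only if the file's current checksum matches the stored value.

    stored_checksums is a dict of path -> hex digest, a stand-in for whatever
    database a real system uses (an assumption for this sketch).
    """
    expected = stored_checksums.get(str(path))
    if expected is None:
        return False  # no recorded value, so the file cannot be verified
    return file_checksum(path, algorithm) == expected
```

A typical use would be to record file_checksum(path) at ingest, then call verify_fixity(path, stored) after every copy, move, or restore and flag any file for which it returns False.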
>    
>    Currently we have over 45,000 LTO-4's and over 10,000 T10K-D tapes, growing at the rate of 125,000 hours of content per year.
>    
>    If you would like more details regarding archiving video content, feel free to reach out to me.
>    
>    Sincerely,
>    
>    Tab
>    
>    
>    
>    Tab Butler | Sr. Director - Media Management & Post Production| MLB Network | 40 Hartz Way, Suite 10 | Secaucus, NJ  07094
>    (201) 520-6252 Office | (646) 498-1662 Cell
>    
>    tab.butler at mlb.com
>    
>    
>    -----Original Message-----
>    From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Stern, Randy
>    Sent: Friday, May 12, 2017 1:59 PM
>    To: Sheila Morrissey <Sheila.Morrissey at ithaka.org>; pasig-discuss at asis.org
>    Subject: Re: [Pasig-discuss] FW: Digital repository storage benchmarking
>    
>    Harvard is similar – 2 disk copies in geographically distributed sites, and one tape copy in a third location. We also have a 4th copy on tape in a tape library that creates the tapes we send off site to the third location. We run fixity checks on the disk copies, but not the tape copy. We currently have in excess of 200TB for each copy.
>    
>    We currently store preservation and real-time access copies of files in the same storage system with the same storage policies. We expect that to change in the future, with likely delivery copy storage in the cloud.
>    
>    Randy
>    
>    On 5/12/17, 1:43 PM, "Sheila Morrissey" <Sheila.Morrissey at ithaka.org> wrote:
>    
>    
>        Hello, Tim,
>    
>        At Portico (http://www.portico.org/digital-preservation/), we preserve e-journals, e-books, digitized historical collections, and other born-digital scholarly content.
>    
>        Currently, the Portico archive comprises roughly 77.7 million digital objects (we call them "Archival Units", or AUs), totaling over 400 TB and made up of 1.3 billion files.
>    
>        We maintain 3 copies of the archive:  2 on disk in geographically distributed data centers, and a 3rd copy in commercial cloud storage.  We create and maintain backups (including fixity checks) using our own custom-written software.
>    
>        I hope this is helpful.
>    
>        Best regards,
>        Sheila
>    
>    
>        Sheila M. Morrissey
>        Senior Researcher
>        ITHAKA
>        100 Campus Drive
>        Suite 100
>        Princeton NJ 08540
>        609-986-2221
>        sheila.morrissey at ithaka.org
>    
>        ITHAKA (www.ithaka.org) is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways.  We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico.
>    
>    
>    
>        -----Original Message-----
>        From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Tim Walsh
>        Sent: Friday, May 12, 2017 10:16 AM
>        To: pasig-discuss at asis.org
>        Subject: [Pasig-discuss] Digital repository storage benchmarking
>    
>        Dear PASIG,
>    
>        I am currently in the process of benchmarking digital repository storage setups with our Director of IT, and am having trouble finding very much information about other institutions’ configurations online. It’s very possible that this question has been asked before on-list, but I wasn’t able to find anything in the list archives.
>    
>        For context, we are a research museum with significant born-digital archival holdings preparing to manage about 200 TB of digital objects over the next 3 years, replicated several times on various media. The question is what precisely those “various media” will be. Currently, our plan is to store one copy on disk on-site, one copy on disk in a managed off-site facility, and a third copy on LTO sent to a third facility. Before we commit, we’d like to benchmark our plans against other institutions.
>    
>        I have been able to find information about the storage configurations for MoMA and the Computer History Museum (who each wrote blog posts or presented on this topic), but not very many others. So my questions are:
>    
>        * Could you point me to published/available resources outlining other institutions’ digital repository storage configurations?
>        * Or, if you work at an institution, would you be willing to share the details of your configuration on- or off-list? (any information sent off-list will be kept strictly confidential)
>    
>        Helpful details would include: amount of digital objects being stored; how many copies of data are being stored; which copies are online, nearline, or offline; which media are being used for which copies; and what services/software applications you are using to manage the creation and maintenance of backups.
>    
>        Thank you!
>        Tim
>    
>        - - -
>    
>        Tim Walsh
>        Archiviste, Archives numériques
>        Archivist, Digital Archives
>    
>        Centre Canadien d’Architecture
>        Canadian Centre for Architecture
>        1920, rue Baile, Montréal, Québec  H3H 2S6 T 514 939 7001 x 1532 F 514 939 7020 www.cca.qc.ca<http://www.cca.qc.ca/>
>    
>    
>    
>        ----
>        To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
>        _______
>        PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
>        _______________________________________________
>        Pasig-discuss mailing list
>        Pasig-discuss at mail.asis.org
>        http://mail.asis.org/mailman/listinfo/pasig-discuss
>    
>    
>    
>    
>    
>
>


