[Pasig-discuss] Risks of encryption & compression built into storage options?

Gail Truman gail at trumantechnologies.com
Tue Mar 21 10:27:39 EDT 2017


There's an NDSA working group that is currently trying to assess cloud vendor practices around fixity, durability promises, etc. Some people on this PASIG alias are part of that team and might want to chime in (I am one of them). There is also a preservation storage group outside of the NDSA team (but with member overlap) that started prior to iPRES last year, ran a workshop at iPRES, and is meeting regularly, with plans to publish findings and attend this coming iPRES. Folks from that team are also on this PASIG alias and can speak up too.

Gail

Gail Truman
Truman Technologies, LLC

Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 5026497


> On Mar 21, 2017, at 1:24 AM, Matthew Addis <matthew.addis at arkivum.com> wrote:
> 
> Interesting!  There’s a growing number of cloud services that provide bit-level preservation, which is good news as it’s evidence that there’s a growing market.  Along with Arkivum (largely UK but we do have customers in the US), there’s also DuraCloud (US) and some interesting ‘shared service’ options in specific domains, e.g. DPN for scholarly outputs.  
> 
> The guys at AVPreserve in the US profiled some of these using the NDSA preservation levels (https://www.avpreserve.com/papers-and-presentations/cloud-storage-vendor-profiles/) and the TNA in the UK commissioned a similar review (http://www.nationalarchives.gov.uk/documents/CloudStorage-Guidance_March-2015.pdf).  There’s also a bit about cloud for digital preservation in the DPC handbook which links to some vendors and case studies (http://www.dpconline.org/handbook/technical-solutions-and-tools/cloud-services)
> 
> I’m not aware of a recent survey of bit-preservation in the cloud - does anyone have any pointers?
> 
> Cheers,
> 
> Matthew
> 
> Matthew Addis
> Chief Technology Officer
>  
> tel: +44 1249 405060
> mob: +44 7703 393374
> email: matthew.addis at arkivum.com
> web: www.arkivum.com
> twitter: @arkivum
>  
> This message is confidential unless otherwise stated.
> Arkivum Limited is registered in England and Wales, company number 7530353. Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom
> 
> From: Pasig-discuss <pasig-discuss-bounces at asis.org> on behalf of "gail at trumantechnologies.com" <gail at trumantechnologies.com>
> Date: Monday, 20 March 2017 23:31
> To: Michal Růžička <ruzicka at ics.muni.cz>, "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?
> 
> Michal / all - I'm aware of a cloud vendor who has rolled out preservation services (including SHA-256 fixity checks, BagIt for transport, and some other features). Their data centers are in the US, so probably not useful for .cz, but others on this alias may find this useful.
> 
> Check out http://www.komodocloud.com/TruStore.html
> 
> Gail
> 
> 
> 
>  
>  
> Gail Truman
> Truman Technologies, LLC
> Certified Digital Archives Specialist, Society of American Archivists
>  
> Protecting the world's digital heritage for future generations
> www.trumantechnologies.com
> facebook/TrumanTechnologies
> https://www.linkedin.com/in/gtruman 
>  
> +1 510 502 6497
>  
>  
> 
> 
> -------- Original Message --------
> Subject: Re: [Pasig-discuss] Risks of encryption & compression built
> into storage options?
> From: Michal Růžička <ruzicka at ics.muni.cz>
> Date: Fri, March 17, 2017 12:06 pm
> To: <pasig-discuss at mail.asis.org>
> 
> Dear all,
> 
> I am very interested in this discussion. One short comment first and
> two questions next:
> 
> I do not think erasure coding is a good idea in an LTP system, as it
> significantly increases the complexity of the system and of the coding
> (which increases the probability of an error in implementation/process/...)
> and increases the interconnection of the data between the multiple storage
> areas. I am a big fan of isolation, independence, and
> as-simple-as-possible coding of the data replicas.
> 
> Now questions:
> 
> 1. What is the best LTP implementation methodology you can recommend
> to me? I do not mean the OAIS itself but practical recommendations on
> concrete implementations, methods and procedures for a relatively small
> (<< 1 PB) data archive.
> 
> 2. The Ceph distributed storage was mentioned in the e-mail cited
> below. I am aware of the Ceph use in the Dutch National Archive
> (http://widodh.o.auroraobjects.eu/talks/ceph_dutch_national_archive_2016.pdf#page=11&zoom=page-fit,-177,595).
> What do you think about the use of Ceph in an LTP system? Do you have
> any practical experience with Ceph or a strong opinion on this technology?
> 
> All the best,
> Michal
> 
> 
> On 17.3.2017 at 15:49, Paul Mather wrote:
> > On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) <jos.vanwezel at kit.edu
> > <mailto:jos.vanwezel at kit.edu>> wrote:
> > 
> >> Chris,
> >> do you happen to have any reference to the mathematical correctness or
> >> computation showing that 3 copies is optimal? Is the proof based on the
> >> standard ECC values that vendors list with their components (tapes, disks,
> >> transport lines, memory, etc.)? I'm asking because it's difficult to
> >> argue for the additional costs of a third copy without the math.
> >> Currently I can't tell my customers how much (as a percentage) extra
> >> security an additional copy will bring, even theoretically.
> > 
> > One thing I don't believe I've seen mentioned so far in regards to
> > redundancy costs is switching to erasure-resilient coding rather than
> > using plain replication. Explained briefly, erasure-resilient coding
> > represents a logical unit of data as k fragments. These k fragments
> > are then encoded into a larger unit of n fragments, n > k, where the
> > n-k extra fragments can be thought of as "parity" fragments. The n
> > encoded fragments may then be distributed across different disks,
> > racks, and data centres. The value is that *any* k out of n fragments
> > may be used to reconstitute the original logical unit of data. As n
> > grows larger, the probability of total data loss grows smaller, and,
> > conversely, the storage overhead and cost grow larger, allowing you to
> > choose your cost/risk balance. The main disadvantage of
> > erasure-resilient coding is that data I/O latency is increased due to
> > the inherently distributed nature of the storage approach. There are
> > comparisons between replication and erasure-resilient coding systems.
> > One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We
> > show that systems employing erasure codes have mean time to failures
> > many orders of magnitude higher than replicated systems with similar
> > storage and bandwidth requirements. More importantly, erasure-resilient
> > systems use an order of magnitude less bandwidth and storage to provide
> > similar system durability as replicated systems."
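
The cost/risk trade-off described above can be sketched numerically. This is a minimal illustration, not a durability model: it assumes every fragment (or copy) is lost independently with the same probability p over some period, which real systems rarely satisfy because failures are often correlated. The function names and the example parameters (a 10-of-14 code, p = 0.01) are hypothetical and not taken from the thread. Plain replication with r copies is treated as the special case k = 1, n = r.

```python
from math import comb


def storage_overhead(n: int, k: int) -> float:
    """Raw bytes stored per logical byte for a k-of-n scheme."""
    return n / k


def loss_probability(n: int, k: int, p: float) -> float:
    """Probability that fewer than k of n fragments survive.

    Assumes each fragment fails independently with probability p.
    Data is lost when more than n - k fragments fail.
    """
    return sum(
        comb(n, failed) * p**failed * (1 - p) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.01  # assumed per-fragment failure probability (illustrative only)
    schemes = {
        "2 copies": (2, 1),
        "3 copies": (3, 1),
        "10-of-14 erasure code": (14, 10),
    }
    for name, (n, k) in schemes.items():
        print(
            f"{name}: overhead {storage_overhead(n, k):.1f}x, "
            f"loss probability {loss_probability(n, k, p):.2e}"
        )
```

Under the independence assumption, each extra replica multiplies the loss probability by p, so going from two to three copies is easy to quantify. The same function shows how a k-of-n code can reach a comparable or lower loss probability at a smaller storage overhead.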
> > 
> > Erasure-resilient coding is becoming mainstream in Cloud storage and
> > object storage systems in general. I believe that Hadoop has recently
> > acquired an erasure-resilient coding storage option for HDFS as an
> > alternative to the standard replication model. This is due to the
> > increase in data set sizes, where erasure-resilient coding can offer
> > lower redundancy overheads than plain replication options, while still
> > offering the same or higher assurance levels on data availability. I
> > also believe Ceph and OpenStack Swift are supporting erasure-resilient
> > storage.
> > 
> > Cheers,
> > 
> > Paul.
> 
> 
> -- 
> ---------------------------------------------------------------
> Michal Růžička <ruzicka at ics.muni.cz>
> Phone: +420 549 49 6834
> Aleph Library Management System
> Library Information Centre, Institute of Computer Science
> Masaryk University, Czech Republic
> Office number C308, Botanická 68a, 602 00 Brno
> OpenPGP key: https://kic-internal.ics.muni.cz/~ruzicka/pgp-key/
> Fingerprint: 4791 027A B994 A183 C28C 9B89 33C1 5D8C 293E 15A9
> ---------------------------------------------------------------
> ----
> To subscribe, unsubscribe, or modify your subscription, please visit
> http://mail.asis.org/mailman/listinfo/pasig-discuss
> _______
> PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
> _______________________________________________
> Pasig-discuss mailing list
> Pasig-discuss at mail.asis.org
> http://mail.asis.org/mailman/listinfo/pasig-discuss