[Pasig-discuss] Risks of encryption & compression built into storage options?

Klein, Stephen SKlein at gc.cuny.edu
Tue Mar 21 08:15:16 EDT 2017


Preservica and DuraCloud are both cloud based and provide these services.

From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com
Sent: Monday, March 20, 2017 7:31 PM
To: Michal Růžička <ruzicka at ics.muni.cz>; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?

Michal /all - I'm aware of a cloud vendor who has rolled out preservation services (including SHA-256 fixity checks, BagIt for transport, and some other features). Their data centers are in US so probably not useful for .cz but others on this alias may find this useful.

Check out http://www.komodocloud.com/TruStore.html

Gail





Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists

Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman

+1 510 502 6497



-------- Original Message --------
Subject: Re: [Pasig-discuss] Risks of encryption & compression built
into storage options?
From: Michal Růžička <ruzicka at ics.muni.cz>
Date: Fri, March 17, 2017 12:06 pm
To: <pasig-discuss at mail.asis.org>

Dear all,

I am very interested in this discussion. One short comment first and
two questions next:

I do not think erasure coding is a good idea in an LTP system, as it
significantly increases the complexity of the system and of the coding
(which increases the probability of an error in the implementation, the
process, and so on) and increases the interdependence of the data across
the multiple storage areas. I am a big fan of isolation, independence, and
coding the data replicas as simply as possible.

Now questions:

1. What is the best LTP implementation methodology you can recommend?
I do not mean OAIS itself but practical recommendations on
concrete implementations, methods and procedures for a relatively small
(<< 1 PB) data archive.

2. The Ceph distributed storage system was mentioned in the e-mail cited
below. I am aware of the use of Ceph at the Dutch National Archive
(http://widodh.o.auroraobjects.eu/talks/ceph_dutch_national_archive_2016.pdf#page=11&zoom=page-fit,-177,595).
What do you think about the use of Ceph in an LTP system? Do you have
any practical experience with Ceph or a strong opinion on this technology?

All the best,
Michal


On 17.3.2017 at 15:49, Paul Mather wrote:
> On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) <jos.vanwezel at kit.edu>
> wrote:
>
>> Chris,
>> do you happen to have any reference to the mathematical correctness or
>> computation showing that 3 copies is optimal? Is the proof based on the
>> standard ecc values that vendors list with their components (tapes, disks,
>> transport lines, memory etc)? I'm asking because it's difficult to
>> argue for the additional costs of a third copy without the math.
>> Currently I can't tell my customers how much (as in percentage) extra
>> security an additional copy will bring, even theoretically.
>
> One thing I don't believe I've seen mentioned so far in regards to
> redundancy costs is switching to erasure-resilient coding rather than
> using plain replication. Explained briefly, erasure-resilient coding
> represents a logical unit of data as k fragments. These k fragments
> are then encoded into a larger unit of n fragments, n > k, where the
> n-k extra fragments can be thought of as "parity" fragments. The n
> encoded fragments may then be distributed across different disks,
> racks, and data centres. The value is that *any* k out of n fragments
> may be used to reconstitute the original logical unit of data. As n
> grows larger, the probability of total data loss grows smaller, and,
> conversely, the storage overhead and cost grows larger, allowing you to
> choose your cost/risk balance. The main disadvantage of
> erasure-resilient coding is that data I/O latency is increased due to
> the inherently distributed nature of the storage approach. There are
> comparisons between replication and erasure-resilient coding systems.
> One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We
> show that systems employing erasure codes have mean time to failures
> many orders of magnitude higher than replicated systems with similar
> storage and bandwidth requirements. More importantly, erasure-resilient
> systems use an order of magnitude less bandwidth and storage to provide
> similar system durability as replicated systems."
>
> Erasure-resilient coding is becoming mainstream in Cloud storage and
> object storage systems in general. I believe that Hadoop has recently
> acquired an erasure-resilient coding storage option for HDFS as an
> alternative to the standard replication model. This is due to the
> increase in data set sizes, where erasure-resilient coding can offer
> lower redundancy overheads than plain replication options, while still
> offering the same or higher assurance levels on data availability. I
> also believe Ceph and OpenStack Swift support erasure-resilient
> storage.
>
> Cheers,
>
> Paul.
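
To put rough numbers on the two points quoted above (how much an
additional copy buys, and how k-of-n erasure coding compares with plain
replication), here is a minimal Python sketch. It rests on a strong
simplifying assumption that is not from this thread: every copy or
fragment is lost independently with the same probability p over the
period considered. The value p = 0.01 and the 10-of-14 scheme are
hypothetical illustrations, not vendor figures, and real failures are
often correlated, so the output is a way to reason about the trade-off
rather than a durability guarantee.

```python
from math import comb


def loss_probability(k: int, n: int, p: float) -> float:
    """Probability that fewer than k of n fragments survive.

    Assumes each fragment is lost independently with probability p
    (a simplification; real failures are often correlated).
    Plain replication with r copies is the special case k=1, n=r.
    """
    max_tolerated = n - k  # losses the scheme can absorb
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(max_tolerated + 1, n + 1)
    )


def storage_overhead(k: int, n: int) -> float:
    """Raw storage consumed per unit of logical data."""
    return n / k


# Hypothetical schemes and a hypothetical per-fragment loss probability.
p = 0.01
schemes = {
    "2 copies": (1, 2),
    "3 copies": (1, 3),
    "erasure 10-of-14": (10, 14),
}

for name, (k, n) in schemes.items():
    print(
        f"{name:>18}: overhead x{storage_overhead(k, n):.2f}, "
        f"P(loss) = {loss_probability(k, n, p):.3e}"
    )
```

Under this independence model each extra full copy multiplies the loss
probability by p, at the cost of one more unit of raw storage. The
sketch says nothing about implementation complexity or coupling between
storage areas, which is the concern raised earlier in this message.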


--
---------------------------------------------------------------
Michal Růžička <ruzicka at ics.muni.cz>
Phone: +420 549 49 6834
Aleph Library Management System
Library Information Centre, Institute of Computer Science
Masaryk University, Czech Republic
Office number C308, Botanická 68a, 602 00 Brno
OpenPGP key: https://kic-internal.ics.muni.cz/~ruzicka/pgp-key/
Fingerprint: 4791 027A B994 A183 C28C 9B89 33C1 5D8C 293E 15A9
---------------------------------------------------------------
----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss