[Pasig-discuss] Risks of encryption & compression built into storage options?

Paul Mather pmather at vt.edu
Fri Mar 17 10:49:01 EDT 2017


On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) <jos.vanwezel at kit.edu> wrote:

> Chris,
> do you happen to have any reference for the mathematical correctness, or the computation, showing that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional cost of a third copy without the math. Currently I can't tell my customers how much (as in percentage) extra security an additional copy will bring, even theoretically.


One thing I don't believe I've seen mentioned so far with regard to redundancy costs is switching to erasure-resilient coding rather than using plain replication.  Explained briefly, erasure-resilient coding represents a logical unit of data as k fragments.  These k fragments are then encoded into a larger set of n fragments, n > k, where the n-k extra fragments can be thought of as "parity" fragments.  The n encoded fragments may then be distributed across different disks, racks, and data centres.  The value is that *any* k out of the n fragments may be used to reconstitute the original logical unit of data.  For a fixed k, as n grows larger the probability of total data loss grows smaller and, conversely, the storage overhead and cost grow larger, allowing you to choose your cost/risk balance.  The main disadvantage of erasure-resilient coding is that data I/O latency is increased due to the inherently distributed nature of the storage approach.

There are published comparisons between replication and erasure-resilient coding systems.  One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We show that systems employing erasure codes have mean time to failures many orders of magnitude higher than replicated systems with similar storage and bandwidth requirements. More importantly, erasure-resilient systems use an order of magnitude less bandwidth and storage to provide similar system durability as replicated systems."
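
As a rough illustration of the "any k out of n" idea, here is a minimal Python sketch of the loss probability for a k-of-n layout.  It rests on a strong assumption that is mine, not the cited paper's: every fragment fails independently with the same probability p over the period of interest.  The function name and the value p = 0.01 are hypothetical and chosen only for illustration.  Plain replication falls out as the special case k = 1.

```python
from math import comb


def loss_probability(n: int, k: int, p: float) -> float:
    """Probability that an object stored as n fragments, any k of which
    suffice to rebuild it, is lost.

    Assumes (hypothetically) that each fragment fails independently with
    probability p. The object is lost when more than n - k fragments fail.
    """
    return sum(
        comb(n, f) * p**f * (1 - p) ** (n - f)
        for f in range(n - k + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.01  # illustrative per-fragment failure probability, not a measured value

    # Plain replication is the case k = 1: data is lost only if all copies fail.
    for copies in (1, 2, 3, 4):
        print(f"{copies} copies: P(loss) = {loss_probability(copies, 1, p):.3e}")

    # An erasure-coded layout with hypothetical parameters k = 4, n = 6.
    print(f"k=4, n=6: P(loss) = {loss_probability(6, 4, p):.3e}")
```

Under this idealised model each additional independent copy multiplies the loss probability by the same factor p, so the absolute gain shrinks with every copy added.  Correlated failures (same site, same power grid, same vendor) are not captured at all, which is why the model should be read as a best case rather than a prediction.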

Erasure-resilient coding is becoming mainstream in cloud storage and object storage systems in general.  I believe that Hadoop has recently acquired an erasure-resilient coding storage option for HDFS as an alternative to the standard replication model.  This is due to the increase in data set sizes, where erasure-resilient coding can offer lower redundancy overheads than plain replication while still offering the same or higher assurance levels on data availability.  I also believe Ceph and OpenStack Swift support erasure-resilient storage.
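
To make the overhead comparison concrete, here is a small Python sketch that sets raw storage overhead against the number of fragment losses a layout can survive.  The class name and the two layouts (three full copies; k = 6 data fragments encoded into n = 9) are illustrative choices of mine, not configurations taken from any of the systems named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy layout: k fragments needed out of n stored."""

    name: str
    k: int  # fragments needed to rebuild the data
    n: int  # fragments actually stored

    @property
    def overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return self.n / self.k

    @property
    def tolerated_losses(self) -> int:
        # Any n - k fragments can be lost without losing the data.
        return self.n - self.k


# Hypothetical layouts for comparison only.
SCHEMES = [
    Scheme("3 full copies (k=1, n=3)", k=1, n=3),
    Scheme("erasure code (k=6, n=9)", k=6, n=9),
]

if __name__ == "__main__":
    for s in SCHEMES:
        print(
            f"{s.name}: {s.overhead:.2f}x raw storage, "
            f"survives loss of any {s.tolerated_losses} fragments"
        )
```

With these particular numbers the erasure-coded layout stores half as many raw bytes as three full copies while tolerating one more lost fragment.  The trade-off is that a read or repair has to gather k fragments from different places instead of fetching a single copy.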

Cheers,

Paul.




> 
> 
> regards
> 
> jos
> 
> -------- Original message --------
> From: Chris Wood <lw85381 at yahoo.com>
> Date: 17/03/2017 02:07 (GMT+01:00)
> To: "Raymond A. Clarke" <Raymond.Clarke1 at Verizon.net>, gail at trumantechnologies.com, 'Jeanne Kramer-Smyth' <jkramersmyth at worldbankgroup.org>, 'Robert Spindler' <rob.spindler at asu.edu>, pasig-discuss at mail.asis.org
> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?
> 
> Thanks Ray as always for a great summary. Now my three bits:
> 
> Three (3) copies please. One of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (remote) is more expensive but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four+ are all examples of ever-diminishing returns.
> 
> CW
> 
> On 3/16/2017 4:40 PM, Raymond A. Clarke wrote:
>> Hello All,
>> 
>> A few years back, I did some research on bit-rot and data corruption as it relates to the various media that data passes through on its way to and from the user.  Consider this simple example: as data travels from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the transit points. This is something that current technologies can help with, in part.  Back to the original question: how do we insure against corruption, whether from compression, encryption and/or transmission?  Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates and in handling compression and encryption.  But the “resting places” are only part of the problem.  Hence, in accordance with Gail’s suggestion and as Dr. Rosenthal has coined it, LOCKSS (“Lots of Copies Keep Stuff Safe”).
>> 
>> 
>> Take good care,
>> Raymond
>> 
>> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com
>> Sent: Thursday, March 16, 2017 5:10 PM
>> To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; Robert Spindler <rob.spindler at asu.edu>; pasig-discuss at mail.asis.org
>> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?
>> 
>> Hello again, Jeanne,
>> 
>> I think you're hitting on something that needs to be raised to (and pushed for with) vendors, and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out. I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object if it's been chunked) has been repaired/self-healed, or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if their durability and data corruption start to get really bad for whatever reason.
>> 
>> I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors and storage solution and cloud storage vendors have run the numbers, but am not sure if they're made public.
>> 
>> This alias has people from Oracle, Seagate and other storage companies on it, so I encourage them to please share any research they have on this.
>> 
>> 
>> Gail
>> 
>> 
>> 
>> Gail Truman
>> Truman Technologies, LLC
>> Certified Digital Archives Specialist, Society of American Archivists
>> 
>> Protecting the world's digital heritage for future generations
>> www.trumantechnologies.com
>> facebook/TrumanTechnologies
>> https://www.linkedin.com/in/gtruman
>> 
>> +1 510 502 6497
>> 
>> 
>> 
>> -------- Original Message --------
>> Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options?
>> From: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>
>> Date: Thu, March 16, 2017 1:44 pm
>> To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>, "Robert Spindler" <rob.spindler at asu.edu>, "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
>> 
>> Thanks Gail & Rob for your replies.
>> 
>> I am less worried about the scenario of someone stealing a drive – as Rob pointed out, if that is happening we have bigger problems.
>> 
>> I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased “handling” and change required to get from the stored version to the original version vs the decreased “handling” it would require if what I am pulling off the storage device is exactly what I sent to be stored.
>> 
>> I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the “pro” column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk.
>> 
>> I want to understand the risks related to the different storage architectures and the ever increasing number of “automatic” things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks?
>> 
>> Thank you,
>> 
>> Jeanne
>> Jeanne Kramer-Smyth
>> IT Officer, Information Management Services II
>> Information and Technology Solutions
>> WBG Library & Archives of Development
>> T 202-473-9803
>> E jkramersmyth at worldbankgroup.org
>> W www.worldbank.org
>> spellboundblog
>> jkramersmyth
>> A 1818 H St NW Washington, DC 20433
>> 
>> From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com]
>> Sent: Thursday, March 16, 2017 3:18 PM
>> To: Robert Spindler <rob.spindler at asu.edu>; Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; pasig-discuss at mail.asis.org
>> Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options?
>> 
>> Hi all, a good topic!
>> There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data.
>> 
>> Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. This, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible.
>> 
>> Gail
>> 
>> -------- Original Message --------
>> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?
>> From: Robert Spindler <rob.spindler at asu.edu>
>> Date: Thu, March 16, 2017 9:06 am
>> To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>, "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
>> 
>> At risk of starting a conversation, here are a couple basic issues from an archival standpoint:
>> 
>> Encryption: Who has the keys and what happens should a provider go out of business?
>> 
>> Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material it becomes more of an issue.
>> 
>> Short story: At a CNI meeting perhaps 15 years ago in a session about ebooks I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached public domain. Crickets.
>> 
>> Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc the provider has more problems than forensics.
>> 
>> Rob Spindler
>> University Archivist and Head
>> Archives and Special Collections
>> Arizona State University Libraries
>> Tempe AZ 85287-1006
>> 480.965.9277
>> http://www.asu.edu/lib/archives
>> 
>> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth
>> Sent: Thursday, March 16, 2017 8:54 AM
>> To: pasig-discuss at mail.asis.org
>> Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
>> 
>> Is anyone aware of active research into the risks to digital preservation that are posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome.
>> 
>> I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then un-encrypted and un-compressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost efficient (taking up less space).
>> 
>> I want to be sure that as we make our choices for long-term storage of permanent digital records we take these risks into account.
>> 
>> Thank you!
>> Jeanne
>> 
>> 
>> ----
>> To subscribe, unsubscribe, or modify your subscription, please visit
>> http://mail.asis.org/mailman/listinfo/pasig-discuss
>> _______
>> PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
>> _______________________________________________
>> Pasig-discuss mailing list
>> Pasig-discuss at mail.asis.org
>> http://mail.asis.org/mailman/listinfo/pasig-discuss
> 
> --
> ----------------------------------------------------
> Chris Wood
> Storage & Data Management
> Office:  408-782-2757 (Home Office)
> Office:  408-276-0730 (Work Office)
> Mobile:  408-218-7313 (Preferred)
> Email: lw85381 at yahoo.com
> ----------------------------------------------------