[Pasig-discuss] Risks of encryption & compression built into storage options?

BURNHILL Peter peter.burnhill at ed.ac.uk
Sun Mar 19 21:02:45 EDT 2017


Jos

My take would be a strong recommendation that you don't stick with 2 and that you go for 3 replicas instead, doing what you can to have each held under separate conditions.


 Peter Burnhill


On 19 Mar 2017, at 10:51 pm, van Wezel, Jos (SCC) <jos.vanwezel at kit.edu> wrote:

Hi Chris, thanks a lot. The paper is fun reading, especially the part about the analog movie archive :-)). Hopefully you do find the MPEG paper. My searches have returned nothing yet. (Was it ever published in some way?)

@all: Having read all posts thus far (great stuff, guys), clearly the engineering approach to the problem does not cut it at all. Reading between the lines, there seems to be a lot of experience with disasters where even a BER of 10^-99 and 4 copies won't help. :-) For now we'll stick with 2 copies, and 3 if requested explicitly by the client.

Groet

Jos


On 17/03/2017 17:48, Chris Wood wrote:
Jos:

I just knew somebody would ask this. Ha.  Several years ago several of us wrote
a paper for the MPEG (Moving Picture Experts Group) and a mathematician named
Jeff Bonwick figured out all the math.  I haven't found it yet in the junk heap
of my PC, but did find a companion paper written by the same set of authors.
It's not exactly what you are looking for, but close. It's more about Bit Error
Rates at a rather low level.  I will continue to look for the MPEG paper. It's
got to be somewhere. The Internet "never forgets", right?
Stay tuned as I keep looking.

CW

On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote:
Chris,
do you happen to have any reference to the mathematical correctness or
computation showing that 3 copies is optimal? Is the proof based on the standard
ECC values that vendors list with their components (tapes, disks, transport
lines, memory etc.)? I'm asking because it's difficult to argue for the
additional costs of a third copy without the math. Currently I can't tell my
customers how much (as a percentage) extra security an additional copy will
bring, even theoretically.

regards

jos


-------- Original message --------
From: Chris Wood <lw85381 at yahoo.com>
Date: 17/03/2017 02:07 (GMT+01:00)
To: "Raymond A. Clarke" <Raymond.Clarke1 at Verizon.net>,
gail at trumantechnologies.com, 'Jeanne Kramer-Smyth'
<jkramersmyth at worldbankgroup.org>, 'Robert Spindler' <rob.spindler at asu.edu>,
pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into
storage options?

Thanks Ray as always for a great summary. Now my three bits:

Three (3) copies please. One of which is in a remote location on a different
flood plain, electric grid, fault line etc., for the obvious reasons.
Mathematically, this has turned out to be the optimal number looked at with a
cost/benefit mindset. Kind of like: 2 is better than one, but a local problem
gets both copies. Three (remote) is more expensive but you get A LOT more data
resilience/persistence. Four costs a bunch more, but delivers just a little
bit more resilience. Four+ are all examples of ever-diminishing returns.
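
To put rough numbers on this argument, here is a minimal sketch in Python. It is not the math from the MPEG paper mentioned elsewhere in this thread. It assumes a toy model in which each copy fails on its own with probability p_copy per period, a site-wide disaster destroys every copy at a site with probability p_disaster, and the two sites fail independently. The function names and both probabilities are made up for illustration.

```python
def p_site_lost(n_copies: int, p_copy: float, p_disaster: float) -> float:
    """Chance that all n_copies kept at ONE site are lost in a given period."""
    if n_copies == 0:
        return 1.0  # nothing stored here, so this site cannot save the data
    # Either a site-wide disaster takes everything, or every copy fails on its own.
    return p_disaster + (1 - p_disaster) * p_copy ** n_copies


def p_total_loss(local: int, remote: int, p_copy: float, p_disaster: float) -> float:
    """Chance of losing every copy, with `local` copies at one site and `remote`
    copies at a second site that is assumed to fail independently of the first."""
    return (p_site_lost(local, p_copy, p_disaster)
            * p_site_lost(remote, p_copy, p_disaster))


if __name__ == "__main__":
    P_COPY, P_DISASTER = 0.01, 0.001  # assumed values, not measurements
    for label, local, remote in [
        ("1 copy", 1, 0),
        ("2 copies, same site", 2, 0),
        ("3 copies, one remote", 2, 1),
        ("4 copies, one remote", 3, 1),
    ]:
        print(f"{label:22s} {p_total_loss(local, remote, P_COPY, P_DISASTER):.2e}")
```

With these assumed inputs, two copies at one site are dominated by the site disaster (about 1.1e-3), adding a remote third copy drops the figure to about 1.2e-5, and a fourth copy at the primary site only moves it to about 1.1e-5. Real failures are rarely this independent, so treat the shape of the curve, not the values, as the point.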

CW

On 3/16/2017 4:40 PM, Raymond A. Clarke wrote:

Hello All,



A few years back, I did some research on bit-rot and data corruption, as it
relates to the various media that data passes through on its way to and
from the user.  Consider this simple example: as data travels from memory to HBA
to cable to air to cable and so on, bits can be lost along the way at any one of,
or several of, the media transit points. This is something that current
technologies can help with, in part.  Back to the original question: how do
we insure against corruption, whether from compression, encryption and/or
transmission?  Well, disk and tape ("data resting places", if you will) have
come a very long way in reducing bit-error rates, compression and encryption.
But the "resting places" are only part of the problem.  In accordance with
Gail's suggestion, and as Dr. Rosenthal has coined it: LOCKSS ("Lots of Copies
Keep Stuff Safe").
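
One way to see why the number of copies matters for repair, and not only for survival, is a majority vote over checksums. The Python sketch below is an illustration only and is not the LOCKSS protocol itself; the function names are hypothetical. With three copies a single damaged copy is outvoted, while with two differing copies there is no way to tell which one is correct.

```python
import hashlib
from collections import Counter
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest used as the fixity value for one copy."""
    return hashlib.sha256(data).hexdigest()


def repair_by_majority(copies: List[bytes]) -> Optional[List[bytes]]:
    """Return the copies with damaged ones replaced by the majority version,
    or None when no strict majority of copies agree."""
    if not copies:
        return None
    digests = [sha256_hex(c) for c in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None  # a tie or worse: cannot tell which copy is right
    good = copies[digests.index(winner)]
    return [good for _ in copies]


if __name__ == "__main__":
    original = b"permanent record"
    damaged = b"permanent recorD"
    print(repair_by_majority([original, original, damaged]))  # damaged copy outvoted
    print(repair_by_majority([original, damaged]))            # no majority with 2 copies
```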





Take good care,

Raymond



From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of
gail at trumantechnologies.com
Sent: Thursday, March 16, 2017 5:10 PM
To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; Robert Spindler
<rob.spindler at asu.edu>; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into
storage options?



Hello again, Jeanne,



I think you're hitting on something that needs to be raised to (and pushed
for with) vendors, and that is the need for "More transparency" and the
reporting to customers of "events" that are part of the provenance of a
digital object. The storage architectures do a good job of error detection
and self healing; however, they do not report this out. I'd like to (this is
my dream) have vendors report back to customers (as part of their SLA) when an
object (or part of an object if it's been chunked) has been
repaired/self-healed - or lost forever. I could then record this as a PREMIS
event. As you know, vendors "design for" 11x9s or 13x9s durability, but their
SLAs do not require them to tell us if their durability and data corruption
start to get really bad for whatever reason.
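
For a sense of scale, the durability figures can be turned into an expected number of lost objects. This Python sketch assumes that "11x9s" means a 99.999999999% chance that any single object survives a given year, and that losses are independent. Both are assumptions, since vendors define the figure in their own terms, and the function names are made up.

```python
def expected_annual_losses(n_objects: int, nines: int) -> float:
    """Expected number of objects lost per year at a design durability of
    `nines` nines, assuming independent per-object, per-year loss."""
    annual_loss_probability = 10.0 ** -nines
    return n_objects * annual_loss_probability


def years_per_expected_loss(n_objects: int, nines: int) -> float:
    """Average number of years between single-object losses under the same model."""
    return 1.0 / expected_annual_losses(n_objects, nines)


if __name__ == "__main__":
    for nines in (11, 13):
        print(nines, expected_annual_losses(10_000_000, nines),
              years_per_expected_loss(10_000_000, nines))
```

Under these assumptions, 10 million objects at 11 nines gives an expectation of about one lost object every 10,000 years. That is a design target, which is why reported repair and loss events would be needed to check it against what actually happens.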



I've not directly answered your question about whether the encryption,
dedupe, compression, and other things that can happen inside a storage system
are increasing the risk of corruption. I'll look around. I am sure the disk
vendors and storage solution and cloud storage vendors have run the numbers,
but am not sure if they've been made public.



This alias has people from Oracle, Seagate and other storage companies on it
so I encourage them to please share any research they have on this -





Gail







Gail Truman

Truman Technologies, LLC

Certified Digital Archives Specialist, Society of American Archivists



Protecting the world's digital heritage for future generations

www.trumantechnologies.com

facebook/TrumanTechnologies

https://www.linkedin.com/in/gtruman



+1 510 502 6497







   -------- Original Message --------
   Subject: RE: [Pasig-discuss] Risks of encryption & compression built
   into storage options?
   From: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>
   Date: Thu, March 16, 2017 1:44 pm
   To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>, "Robert
   Spindler" <rob.spindler at asu.edu>,
   "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>

   Thanks Gail & Rob for your replies.



   I am less worried about the scenario of someone stealing a drive – as Rob
   pointed out, if that is happening we have bigger problems.



   I do wonder if there are increased risks of bit-rot/file corruption with
   encryption, compression, and data deduplication. Have there been any
   studies on this? Could pulling a file off a drive that requires reversal
   of the auto-encryption and auto-compression in place at the system level
   mean a greater risk of bits flipping? I am trying to contrast the
   increased “handling” and change required to get from the stored version
   to the original version vs the decreased “handling” it would require if
   what I am pulling off the storage device is exactly what I sent to be stored.
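
Whatever the storage layer does in between, the "exactly what I sent" test can be run end to end with a fixity check: record a digest of the original before it is stored, then recompute it on what comes back. The Python sketch below uses zlib as a stand-in for a storage system's built-in compression, which is an assumption and not how any particular product works; the function names are hypothetical. It also shows the concern in miniature: one flipped bit in the stored, compressed form makes the whole object fail to restore, because damage in a compressed or encrypted stream can affect far more than one bit of the restored file.

```python
import hashlib
import zlib
from typing import Tuple


def store(data: bytes) -> Tuple[bytes, str]:
    """Compress data as a storage layer might, and record a digest of the ORIGINAL."""
    return zlib.compress(data), hashlib.sha256(data).hexdigest()


def verify(blob: bytes, expected_digest: str) -> bool:
    """Restore the stored blob and check it against the digest taken before storage."""
    try:
        restored = zlib.decompress(blob)
    except zlib.error:
        return False  # the stored form is too damaged to restore at all
    return hashlib.sha256(restored).hexdigest() == expected_digest


def flip_bit(blob: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of blob with a single bit inverted, simulating bit rot."""
    damaged = bytearray(blob)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


if __name__ == "__main__":
    data = b"permanent digital record\n" * 1000
    blob, digest = store(data)
    print("intact copy verifies:", verify(blob, digest))
    print("one flipped bit verifies:", verify(flip_bit(blob, len(blob) - 1), digest))
```

The digest is taken on the original bytes rather than on the stored form, so the check does not depend on trusting the encryption or compression steps themselves.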



   I am less worried about issues related to not being able to decrypt
   content. The storage solutions we are contemplating would remain under
   enough ongoing management that these issues should be avoidable. Since
   ensuring that non-public records remain secure is also very important,
   encryption gets some points in the “pro” column. I agree that having
   multiple copies in different storage architectures and with different
   vendors would also decrease risk.



   I want to understand the risks related to the different storage
   architectures and the ever increasing number of “automatic” things being
   done to digital objects in the process of them being stored and
   retrieved. Are there people doing work, independent of vendor claims, to
   document these types of risks?



   Thank you,



   Jeanne

   Jeanne Kramer-Smyth
   IT Officer, Information Management Services II
   Information and Technology Solutions
   WBG Library & Archives of Development
   T: 202-473-9803
   E: jkramersmyth at worldbankgroup.org
   W: www.worldbank.org
   Twitter: spellboundblog
   Skype: jkramersmyth
   LinkedIn: jkramersmyth
   A: 1818 H St NW Washington, DC 20433



   From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com]
   Sent: Thursday, March 16, 2017 3:18 PM
   To: Robert Spindler <rob.spindler at asu.edu>; Jeanne Kramer-Smyth
   <jkramersmyth at worldbankgroup.org>; pasig-discuss at mail.asis.org
   Subject: RE: [Pasig-discuss] Risks of encryption & compression built
   into storage options?



   Hi all, a good topic!

   There is new drive technology from Seagate (and probably other manufacturers)
   called "Self-Encrypting Drives" (SEDs) which can be used to solve the
   problem of a person stealing a drive and running off with data.



   Most cloud services now automatically provide "server-side encryption",
   which means the vendor is doing the encryption for all data at rest (as
   you point out, Jeanne). This is required by HIPAA for all health care
   data, and is now considered best practice for cloud vendors due to
   the very real risk of hacking. So, for archival, we need to weigh the
   data security provided by cloud storage services using server-side
   encryption against the risk of the vendor managing the encryption keys.
   Which IMO underscores the importance of having multiple copies of all
   your archival data -- with different vendors and storage architectures or
   media types if possible.



   Gail


















       -------- Original Message --------
       Subject: Re: [Pasig-discuss] Risks of encryption & compression built
       into storage options?
       From: Robert Spindler <rob.spindler at asu.edu>
       Date: Thu, March 16, 2017 9:06 am
       To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>,
       "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>

       At risk of starting a conversation, here are a couple basic issues
       from an archival standpoint:



       Encryption: Who has the keys and what happens should a provider go
       out of business?



       Compression: Lossy or lossless, and how does that compression act on
       different file formats (video/audio)? If this is frequently accessed
       material it becomes more of an issue.



       Short story: At a CNI meeting perhaps 15 years ago in a session about
       ebooks I asked a panel of vendors if they would give up the keys to
       encrypted e-books when they reached public domain. Crickets.



       Physical discs are not secure given the forensics software widely
       available today, but if someone can grab a physical disc the provider
       has more problems than forensics.



       Rob Spindler

       University Archivist and Head

       Archives and Special Collections

       Arizona State University Libraries

       Tempe AZ 85287-1006

       480.965.9277

       http://www.asu.edu/lib/archives



       From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On
       Behalf Of Jeanne Kramer-Smyth
       Sent: Thursday, March 16, 2017 8:54 AM
       To: pasig-discuss at mail.asis.org
       Subject: [Pasig-discuss] Risks of encryption & compression built
       into storage options?



       Is anyone aware of active research into the risks to digital
       preservation that are posed by built-in encryption and compression in
       both cloud and on-prem storage options? Any and all go-to sources for
       research and reading on these topics would be very welcome.



       I am being told by the staff who source storage solutions for my
       organization that encryption and compression are generally included
       at the hardware level. That content is automatically encrypted and
       compressed as it is written to disc – and then un-encrypted and
       un-compressed as it is pulled off disc in response to a request. It
       is advertised as both more secure (someone stealing a physical disc
       could not, in theory, extract its contents) and more cost efficient
       (taking up less space).



       I want to be sure that as we make our choices for long-term storage
       of permanent digital records we take these risks into account.



       Thank you!

       Jeanne








       --------------------------------------------------------------------------------

       ----
       To subscribe, unsubscribe, or modify your subscription, please visit
       http://mail.asis.org/mailman/listinfo/pasig-discuss
       _______
       PASIG Webinars and conference material is at
       http://www.preservationandarchivingsig.org/index.html
       _______________________________________________
       Pasig-discuss mailing list
       Pasig-discuss at mail.asis.org
       http://mail.asis.org/mailman/listinfo/pasig-discuss




--
----------------------------------------------------
Chris Wood
Storage & Data Management
Office:  408-782-2757 (Home Office)
Office:  408-276-0730 (Work Office)
Mobile:  408-218-7313 (Preferred)
Email: lw85381 at yahoo.com
----------------------------------------------------


--
Steinbuch Centre for Computing (SCC)
KIT - Campus Nord
Hermann von Helmholtzplatz 1
76344 Eggenstein - Leopoldshafen
☏ +49 721 60826305
Building 449, Room 122
Orcid ID: 0000-0003-0175-6216
