[Pasig-discuss] Risks of encryption & compression built into storage options?

van Wezel, Jos (SCC) jos.vanwezel at kit.edu
Sun Mar 19 18:41:37 EDT 2017


Hi Chris, thanks a lot. The paper is fun reading, especially the part about the
analog movie archive :-)). Hopefully you do find the MPEG paper. My searches have
returned nothing yet. (Was it ever published in some way?)

@all: Having read all posts thus far (great stuff, guys), it is clear that the
engineering approach to the problem does not cut it at all. Reading between the
lines, there seems to be a lot of experience with disasters where even a BER of
10^-99 and 4 copies won't help. :-) For now we'll stick with 2 copies, and 3 if
requested explicitly by the client.
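
As a rough illustration of why the copy count alone does not settle the
matter, here is a minimal Python sketch of a toy loss model. It is not the math
from the MPEG paper discussed in this thread. Everything in it is an assumption
made for illustration: the function names site_loss and loss_probability, the
per-copy loss probability p_copy, the per-site disaster probability q_site, and
the independence of copies and of sites. The numbers are made up.

```python
import math

def site_loss(copies_at_site, p_copy, q_site):
    """Probability that every copy held at one site is lost.

    Assumed model: either a site-wide disaster (probability q_site) takes
    all copies at once, or each copy fails independently (probability p_copy).
    """
    return q_site + (1.0 - q_site) * p_copy ** copies_at_site

def loss_probability(layout, p_copy, q_site):
    """Probability that all copies are lost, for copies spread over sites.

    layout lists the number of copies per site, e.g. [2, 1] means two
    copies at one site and one copy at a second, independent site.
    """
    return math.prod(site_loss(k, p_copy, q_site) for k in layout)

# Hypothetical inputs, chosen only to show the shape of the result.
P_COPY = 0.01   # chance one copy is lost to a media or handling failure
Q_SITE = 0.001  # chance a site-wide event destroys everything at a site

for layout in ([1], [2], [2, 1], [2, 2]):
    print(layout, loss_probability(layout, P_COPY, Q_SITE))
```

In this model, extra copies at the same site can never push the loss
probability below q_site, which is the "a local problem gets both copies"
point made further down; adding a site multiplies the per-site probabilities
instead. Real disasters are often correlated in ways this model ignores, so it
is a way of reasoning about the question, not a figure to quote to a client.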

Regards

Jos


On 17/03/2017 17:48, Chris Wood wrote:
> Jos:
>
> I just knew somebody would ask this. Ha.  Several years ago several of us wrote
> a paper for MPEG (the Moving Picture Experts Group) and a mathematician named
> Jeff Bonwick figured out all the math.  I haven't found it yet in the junk heap
> of my PC, but I did find a companion paper written by the same set of authors.
> It's not exactly what you are looking for, but close. It's more about Bit Error
> Rates at a rather low level.  I will continue to look for the MPEG paper. It's
> got to be somewhere. The Internet "never forgets", right?
> Stay tuned as I keep looking.
>
> CW
>
> On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote:
>> Chris,
>> do you happen to have any reference for the mathematical correctness or
>> computation showing that 3 copies is optimal? Is the proof based on the
>> standard ECC values that vendors list with their components (tapes, disks,
>> transport lines, memory etc.)? I'm asking because it's difficult to argue for
>> the additional costs of a third copy without the math. Currently I can't tell
>> my customers how much (as in percentage) extra security an additional copy
>> will bring, even theoretically.
>>
>> regards
>>
>> jos
>>
>>
>> -------- Original message --------
>> From: Chris Wood <lw85381 at yahoo.com>
>> Date: 17/03/2017 02:07 (GMT+01:00)
>> To: "Raymond A. Clarke" <Raymond.Clarke1 at Verizon.net>,
>> gail at trumantechnologies.com, 'Jeanne Kramer-Smyth'
>> <jkramersmyth at worldbankgroup.org>, 'Robert Spindler' <rob.spindler at asu.edu>,
>> pasig-discuss at mail.asis.org
>> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into
>> storage options?
>>
>> Thanks Ray as always for a great summary. Now my three bits:
>>
>> Three (3) copies please. One of which is in a remote location on a different
>> floodplain, electric grid, fault line etc. for the obvious reasons.
>> Mathematically, this has turned out to be the optimal number when looked at
>> with a cost/benefit mindset. Kind of like: 2 is better than one, but a local
>> problem gets both copies. Three (remote) is more expensive but you get A LOT
>> more data resilience/persistence. Four costs a bunch more, but delivers just a
>> little bit more resilience. Four+ are all examples of ever diminishing returns.
>>
>> CW
>>
>> On 3/16/2017 4:40 PM, Raymond A. Clarke wrote:
>>>
>>> Hello All,
>>>
>>>
>>>
>>> A few years back, I did some research on bit-rot and data corruption, as it
>>> relates to the various media that data passes through on its way to and
>>> from the user.  Consider this simple example: as data moves from memory to
>>> HBA to cable to air to cable and so on, bits can be lost along the way at any
>>> one of, or several of, the transit points. This is something that current
>>> technologies can help with, in part.  Back to the original question: “how do
>>> we insure against corruption, either from compression, encryption and/or
>>> transmission?”  Well, disk and tape (/data resting places/, if you will) have
>>> come a very long way in reducing bit-error rates, and in compression and
>>> encryption. But the “/resting places/” are only part of the problem.  In
>>> accordance with Gail’s suggestion, and as Dr. Rosenthal has coined it: LOCKSS
>>> (“Lots of Copies Keep Stuff Safe”).
>>>
>>>
>>>
>>>
>>>
>>> Take good care,
>>>
>>> Raymond
>>>
>>>
>>>
>>> *From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of
>>> *gail at trumantechnologies.com
>>> *Sent:* Thursday, March 16, 2017 5:10 PM
>>> *To:* Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; Robert Spindler
>>> <rob.spindler at asu.edu>; pasig-discuss at mail.asis.org
>>> *Subject:* Re: [Pasig-discuss] Risks of encryption & compression built into
>>> storage options?
>>>
>>>
>>>
>>> Hello again, Jeanne,
>>>
>>>
>>>
>>> I think you're hitting on something that needs to be raised to (and pushed
>>> for with) vendors, and that is the need for "More transparency" and the
>>> reporting to customers of "events" that are part of the provenance of a
>>> digital object. The storage architectures do a good job of error detection
>>> and self healing; however, they do not report this out. I'd like to (this is
>>> my dream) have vendors report back to customers (as part of their SLA) when an
>>> object (or part of an object if it's been chunked) has been
>>> repaired/self-healed - or lost forever. I could then record this as a PREMIS
>>> event. As you know, vendors "design for" 11x9s or 13x9s durability, but their
>>> SLAs do not require them to tell us if their durability and data corruption
>>> start to get really bad for whatever reason.
>>>
>>>
>>>
>>> I've not directly answered your question about whether the encryption,
>>> dedupe, compression, and other things that can happen inside a storage system
>>> are increasing the risk of corruption. I'll look around. I am sure the disk
>>> vendors and storage solution and cloud storage vendors have run the numbers,
>>> but am not sure if the results have been made public.
>>>
>>>
>>>
>>> This alias has people from Oracle, Seagate and other storage companies on it
>>> so I encourage them to please share any research they have on this -
>>>
>>>
>>>
>>>
>>>
>>> Gail
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Gail Truman
>>>
>>> Truman Technologies, LLC
>>>
>>> Certified Digital Archives Specialist, Society of American Archivists
>>>
>>>
>>>
>>> /*Protecting the world's digital heritage for future generations*/
>>>
>>> www.trumantechnologies.com <http://www.trumantechnologies.com>
>>>
>>> facebook/TrumanTechnologies
>>>
>>> https://www.linkedin.com/in/gtruman
>>>
>>>
>>>
>>> +1 510 502 6497
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>     -------- Original Message --------
>>>     Subject: RE: [Pasig-discuss] Risks of encryption & compression built
>>>     into storage options?
>>>     From: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>
>>>     Date: Thu, March 16, 2017 1:44 pm
>>>     To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>,
>>>     "Robert Spindler" <rob.spindler at asu.edu>,
>>>     "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
>>>
>>>     Thanks Gail & Rob for your replies.
>>>
>>>
>>>
>>>     I am less worried about the scenario of someone stealing a drive – as Rob
>>>     pointed out, if that is happening we have bigger problems.
>>>
>>>
>>>
>>>     I do wonder if there are increased risks of bit-rot/file corruption with
>>>     encryption, compression, and data deduplication. Have there been any
>>>     studies on this? Could pulling a file off a drive that requires reversal
>>>     of the auto-encryption and auto-compression in place at the system level
>>>     mean a greater risk of bits flipping? I am trying to contrast the
>>>     increased “handling” and change required to get from the stored version
>>>     to the original version vs the decreased “handling” it would require if
>>>     what I am pulling off the storage device is exactly what I sent to be stored.
>>>
>>>
>>>
>>>     I am less worried about issues related to not being able to decrypt
>>>     content. The storage solutions we are contemplating would remain under
>>>     enough ongoing management that these issues should be avoidable. Since
>>>     ensuring that non-public records remain secure is also very important,
>>>     encryption gets some points in the “pro” column. I agree that having
>>>     multiple copies in different storage architectures and with different
>>>     vendors would also decrease risk.
>>>
>>>
>>>
>>>     I want to understand the risks related to the different storage
>>>     architectures and the ever increasing number of “automatic” things being
>>>     done to digital objects in the process of them being stored and
>>>     retrieved. Are there people doing work, independent of vendor claims, to
>>>     document these types of risks?
>>>
>>>
>>>
>>>     Thank you,
>>>
>>>
>>>
>>>     Jeanne
>>>
>>>     *Jeanne Kramer-Smyth*
>>>     *IT Officer, Information Management Services II*
>>>     *Information and Technology Solutions*
>>>     *WBG Library & Archives of Development*
>>>     T: 202-473-9803
>>>     E: jkramersmyth at worldbankgroup.org
>>>     W: www.worldbank.org
>>>     Twitter: spellboundblog
>>>     Skype: jkramersmyth
>>>     LinkedIn: jkramersmyth
>>>     A: 1818 H St NW Washington, DC 20433
>>>
>>>
>>>
>>>     *From:* gail at trumantechnologies.com
>>>     *Sent:* Thursday, March 16, 2017 3:18 PM
>>>     *To:* Robert Spindler <rob.spindler at asu.edu>; Jeanne Kramer-Smyth
>>>     <jkramersmyth at worldbankgroup.org>; pasig-discuss at mail.asis.org
>>>     *Subject:* RE: [Pasig-discuss] Risks of encryption & compression built
>>>     into storage options?
>>>
>>>
>>>
>>>     Hi all, a good topic!
>>>
>>>     There is new drive technology from Seagate (and probably other
>>>     manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used
>>>     to solve the problem of a person stealing a drive and running off with data.
>>>
>>>
>>>
>>>     Most cloud services now automatically provide "server side encryption",
>>>     which means the vendor is doing the encryption for all data at rest (as
>>>     you point out, Jeanne). This is required by HIPAA for all health care
>>>     data, and is now considered best practice for cloud vendors due to
>>>     the very real risk of hacking. So, for archival, we need to weigh the
>>>     data security provided by cloud storage services using server side
>>>     encryption against the risk of the vendor managing the encryption keys.
>>>     This, IMO, underscores the importance of having multiple copies of all
>>>     your archival data -- with different vendors and storage architectures or
>>>     media types if possible.
>>>
>>>
>>>
>>>     Gail
>>>
>>>         -------- Original Message --------
>>>         Subject: Re: [Pasig-discuss] Risks of encryption & compression built
>>>         into storage options?
>>>         From: Robert Spindler <rob.spindler at asu.edu>
>>>         Date: Thu, March 16, 2017 9:06 am
>>>         To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>,
>>>         "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
>>>
>>>         At risk of starting a conversation, here are a couple basic issues
>>>         from an archival standpoint:
>>>
>>>
>>>
>>>         Encryption: Who has the keys and what happens should a provider go
>>>         out of business?
>>>
>>>
>>>
>>>         Compression: Lossy or lossless, and how does that compression act on
>>>         different file formats (video/audio)? If this is frequently accessed
>>>         material it becomes more of an issue.
>>>
>>>
>>>
>>>         Short story: At a CNI meeting perhaps 15 years ago in a session about
>>>         ebooks I asked a panel of vendors if they would give up the keys to
>>>         encrypted e-books when they reached public domain. Crickets.
>>>
>>>
>>>
>>>         Physical discs are not secure given the forensics software widely
>>>         available today, but if someone can grab a physical disc the provider
>>>         has more problems than forensics.
>>>
>>>
>>>
>>>         Rob Spindler
>>>
>>>         University Archivist and Head
>>>
>>>         Archives and Special Collections
>>>
>>>         Arizona State University Libraries
>>>
>>>         Tempe AZ 85287-1006
>>>
>>>         480.965.9277
>>>
>>>         http://www.asu.edu/lib/archives
>>>
>>>
>>>
>>>         *From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On
>>>         Behalf Of *Jeanne Kramer-Smyth
>>>         *Sent:* Thursday, March 16, 2017 8:54 AM
>>>         *To:* pasig-discuss at mail.asis.org
>>>         *Subject:* [Pasig-discuss] Risks of encryption & compression built
>>>         into storage options?
>>>
>>>
>>>
>>>         Is anyone aware of active research into the risks to digital
>>>         preservation that are posed by built-in encryption and compression in
>>>         both cloud and on-prem storage options? Any and all go-to sources for
>>>         research and reading on these topics would be very welcome.
>>>
>>>
>>>
>>>         I am being told by the staff who source storage solutions for my
>>>         organization that encryption and compression are generally included
>>>         at the hardware level. That content is automatically encrypted and
>>>         compressed as it is written to disc – and then un-encrypted and
>>>         un-compressed as it is pulled off disc in response to a request. It
>>>         is advertised as both more secure (someone stealing a physical disc
>>>         could not, in theory, extract its contents) and more cost efficient
>>>         (taking up less space).
>>>
>>>
>>>
>>>         I want to be sure that as we make our choices for long-term storage
>>>         of permanent digital records we take these risks into account.
>>>
>>>
>>>
>>>         Thank you!
>>>
>>>         Jeanne
>>>         --------------------------------------------------------------------------------
>>>
>>>         ----
>>>         To subscribe, unsubscribe, or modify your subscription, please visit
>>>         http://mail.asis.org/mailman/listinfo/pasig-discuss
>>>         _______
>>>         PASIG Webinars and conference material is at
>>>         http://www.preservationandarchivingsig.org/index.html
>>>         _______________________________________________
>>>         Pasig-discuss mailing list
>>>         Pasig-discuss at mail.asis.org <mailto:Pasig-discuss at mail.asis.org>
>>>         http://mail.asis.org/mailman/listinfo/pasig-discuss
>>
>> --
>> ----------------------------------------------------
>> Chris Wood
>> Storage & Data Management
>> Office:  408-782-2757 (Home Office)
>> Office:  408-276-0730 (Work Office)
>> Mobile:  408-218-7313 (Preferred)
>> Email: lw85381 at yahoo.com
>> ----------------------------------------------------
>

-- 
Steinbuch Centre for Computing (SCC)
KIT - Campus Nord
Hermann von Helmholtzplatz 1
76344 Eggenstein - Leopoldshafen
☏ +49 721 60826305
Building 449, Room 122
Orcid ID: 0000-0003-0175-6216
