[Pasig-discuss] Risks of encryption & compression built into storage options?

Chris Wood lw85381 at yahoo.com
Mon Mar 20 00:15:11 EDT 2017


Hi Jos:

I remember getting a nice hard copy of the booklet. I don't know if MPEG 
ever made it public. I thought that by now some institution would have 
posted it, but I can't find it (yet). Still looking.

Your comments about "other" bad things happening are spot on. In an IBM 
study several of us did about 20 years ago on the causes of data loss, 
human error won by a huge margin. In last place (fewest causal factors) 
were hardware failures. In between, in rough order, were application and 
data management software, incorrect documentation, device firmware (we 
used to call this microcode when we still had dial phones :-)), external 
events (power failures, storms, whatever) and a few other categories I 
forget. I do remember our RAS (Reliability, Availability and 
Serviceability) expert making the point that perfect replication code 
replicates corrupted data perfectly. Even more true today.
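
One way to act on the RAS point is to check a copy against a digest recorded at ingest before replicating it. A minimal sketch in Python, assuming a SHA-256 digest was stored when the object was first received; the function names are hypothetical and not taken from any product mentioned in this thread:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def replicate_if_intact(source: bytes, recorded_digest: str, replicas: list) -> bool:
    """Append a copy of source to replicas only if it still matches the
    digest recorded at ingest. Returns True when the copy was made."""
    if sha256_hex(source) != recorded_digest:
        return False  # refuse to propagate a corrupted source
    replicas.append(bytes(source))
    return True

original = b"permanent record"
digest_at_ingest = sha256_hex(original)  # assumed to be stored separately

replicas = []
ok = replicate_if_intact(original, digest_at_ingest, replicas)

# Simulate a single flipped bit in the source before the next replication.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]
bad = replicate_if_intact(corrupted, digest_at_ingest, replicas)
```

The check only helps if the recorded digest is kept somewhere the corruption cannot reach, which is itself an assumption of this sketch.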

You might find this a quick, interesting read on why NASA triplexed all 
the computers in the Space Shuttle and had two separate vendors write the 
code for them, with a sophisticated voting system for cases of 
non-agreement:
https://www.nap.edu/read/2222/chapter/5
It seemed to work fine, but the inter-booster gaskets did not, and it 
turned out the insulation tiles were not very good at resisting foreign 
object impact.
A good example of unknown and completely unexpected failure modes.
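
The voting idea carries over to stored copies: with three replicas, a single bad copy can be outvoted. A minimal sketch, assuming simple majority voting over whole values; `majority_vote` is a hypothetical helper and not a description of the Shuttle software:

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value held by a strict majority of replicas, or None
    when no value has more than half of the votes."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else None

agreed = majority_vote([b"abc", b"abc", b"abX"])  # one bad replica is outvoted
split = majority_vote([b"abc", b"abX", b"abY"])   # no majority, so None
```

Voting protects only against disagreement it can see. If all replicas were written from an already corrupted source, they agree on the wrong value.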

CW

On 3/19/2017 3:41 PM, van Wezel, Jos (SCC) wrote:
> Hi Chris, thanks a lot. The paper is fun reading, especially the part 
> about the analog movie archive :-)). Hopefully you do find the MPEG 
> paper. My searches have returned nothing yet. (Was it ever published in 
> some way?)
>
> @all: Having read all posts thus far (great stuff, guys), clearly the 
> engineering approach to the problem does not cut it at all. Reading 
> between the lines, there seems to be a lot of experience with disasters 
> where even a BER of 10^-99 and 4 copies won't help. :-) For now we'll 
> stick with 2 copies, and 3 if requested explicitly by the client.
>
> Groet
>
> Jos
>
>
> On 17/03/2017 17:48, Chris Wood wrote:
>> Jos:
>>
>> I just knew somebody would ask this. Ha. Several years ago several of us
>> wrote a paper for the MPEG (Motion Pictures Expert Group) and a
>> mathematician named Jeff Bonwick figured out all the math. I haven't
>> found it yet in the junk heap of my PC, but did find a companion paper
>> written by the same set of authors. It's not exactly what you are
>> looking for, but close. It's more about Bit Error Rates at a rather low
>> level. I will continue to look for the MPEG paper. It's got to be
>> somewhere. The Internet "never forgets", right?
>> Stay tuned as I keep looking.
>>
>> CW
>>
>> On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote:
>>> Chris,
>>> do you happen to have any reference for the mathematical correctness or
>>> computation showing that 3 copies is optimal? Is the proof based on the
>>> standard ECC values that vendors list with their components (tapes,
>>> disks, transport lines, memory etc.)? I'm asking because it's difficult
>>> to argue for the additional costs of a third copy without the math.
>>> Currently I can't tell my customers how much (as in percentage) extra
>>> security an additional copy will bring, even theoretically.
>>>
>>> regards
>>>
>>> jos
>>>
>>>
>>> -------- Original message --------
>>> From: Chris Wood <lw85381 at yahoo.com>
>>> Date: 17/03/2017 02:07 (GMT+01:00)
>>> To: "Raymond A. Clarke" <Raymond.Clarke1 at Verizon.net>,
>>> gail at trumantechnologies.com, 'Jeanne Kramer-Smyth'
>>> <jkramersmyth at worldbankgroup.org>, 'Robert Spindler'
>>> <rob.spindler at asu.edu>, pasig-discuss at mail.asis.org
>>> Subject: Re: [Pasig-discuss] Risks of encryption & compression built
>>> into storage options?
>>>
>>> Thanks Ray as always for a great summary. Now my three bits:
>>>
>>> Three (3) copies please. One of which is in a remote location on a
>>> different flood plain, electric grid, fault line etc., for the obvious
>>> reasons. Mathematically, this has turned out to be the optimal number
>>> when looked at with a cost/benefit mindset. Kind of like: 2 is better
>>> than one, but a local problem gets both copies. Three (one remote) is
>>> more expensive but you get A LOT more data resilience/persistence. Four
>>> costs a bunch more, but delivers just a little bit more resilience.
>>> Four+ are all examples of ever diminishing returns.
>>>
>>> CW
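
The shape of that cost/benefit argument can be illustrated with a toy model. This is only a sketch under loudly stated assumptions, not the math from the MPEG paper: each copy fails independently with probability `P_COPY`, each site is destroyed with probability `P_SITE`, a site event takes every copy at that site, and the two sites are independent. Both rates are made-up numbers.

```python
def loss_probability(local_copies: int, remote_copies: int,
                     p_copy: float, p_site: float) -> float:
    """Probability of losing every copy in one period.

    Illustrative assumptions: copies fail independently with probability
    p_copy, a site event (probability p_site) destroys all copies at that
    site, and the local and remote sites are independent of each other.
    """
    def site_loss(copies: int) -> float:
        if copies == 0:
            return 1.0  # nothing stored at this site, so nothing survives here
        return p_site + (1.0 - p_site) * p_copy ** copies

    return site_loss(local_copies) * site_loss(remote_copies)

P_COPY, P_SITE = 0.01, 0.001  # made-up rates, not measured values

scenarios = {
    "1 local": (1, 0),
    "2 local": (2, 0),
    "2 local + 1 remote": (2, 1),
    "3 local + 1 remote": (3, 1),
}
results = {name: loss_probability(local, remote, P_COPY, P_SITE)
           for name, (local, remote) in scenarios.items()}

for name, prob in results.items():
    print(f"{name:20s} {prob:.3e}")
```

Under these assumed rates the second local copy runs into the floor set by a site event, the remote copy removes that floor, and a fourth copy adds little. Different rates, or correlated failures such as human error, would shift the numbers.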
>>>
>>> On 3/16/2017 4:40 PM, Raymond A. Clarke wrote:
>>>>
>>>> Hello All,
>>>>
>>>>
>>>>
>>>> A few years back, I did some research on bit-rot and data corruption
>>>> as it relates to the various media that data passes through on its way
>>>> to and from the user. Consider this simple example: as data moves from
>>>> memory to HBA to cable to air to cable and so on, bits can be lost
>>>> along the way at any one of, or several of, the transit points. This is
>>>> something that current technologies can help with, in part. Back to the
>>>> original question: "how do we insure against corruption, either from
>>>> compression, encryption and/or transmission?" Well, disk and tape
>>>> ("data resting places", if you will) have come a very long way in
>>>> reducing bit-error rates and in handling compression and encryption.
>>>> But the "resting places" are only part of the problem. In line with
>>>> Gail's suggestion, and as Dr. Rosenthal coined it: LOCKSS ("Lots of
>>>> Copies Keep Stuff Safe").
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Take good care,
>>>>
>>>> Raymond
>>>>
>>>>
>>>>
>>>> *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On
>>>> Behalf Of* gail at trumantechnologies.com
>>>> *Sent:* Thursday, March 16, 2017 5:10 PM
>>>> *To:* Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; Robert
>>>> Spindler <rob.spindler at asu.edu>; pasig-discuss at mail.asis.org
>>>> *Subject:* Re: [Pasig-discuss] Risks of encryption & compression built
>>>> into storage options?
>>>>
>>>>
>>>>
>>>> Hello again, Jeanne,
>>>>
>>>>
>>>>
>>>> I think you're hitting on something that needs to be raised with (and
>>>> pushed for with) vendors, and that is the need for "more transparency"
>>>> and the reporting to customers of "events" that are part of the
>>>> provenance of a digital object. The storage architectures do a good job
>>>> of error detection and self-healing; however, they do not report this
>>>> out. I'd like to (this is my dream) have vendors report back to
>>>> customers (as part of their SLA) when an object (or part of an object
>>>> if it's been chunked) has been repaired/self-healed - or lost forever.
>>>> I could then record this as a PREMIS event. As you know, vendors
>>>> "design for" 11x9s or 13x9s durability, but their SLAs do not require
>>>> them to tell us if their durability and data corruption start to get
>>>> really bad for whatever reason.
>>>>
>>>>
>>>>
>>>> I've not directly answered your question about whether the encryption,
>>>> dedupe, compression, and other things that can happen inside a storage
>>>> system increase the risk of corruption. I'll look around. I am sure the
>>>> disk vendors and storage solution and cloud storage vendors have run
>>>> the numbers, but am not sure if they've been made public.
>>>>
>>>>
>>>>
>>>> This alias has people from Oracle, Seagate and other storage companies
>>>> on it, so I encourage them to please share any research they have on
>>>> this -
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Gail
>>>>
>>>> Gail Truman
>>>> Truman Technologies, LLC
>>>> Certified Digital Archives Specialist, Society of American Archivists
>>>>
>>>> /*Protecting the world's digital heritage for future generations*/
>>>>
>>>> www.trumantechnologies.com
>>>> facebook/TrumanTechnologies
>>>> https://www.linkedin.com/in/gtruman
>>>> +1 510 502 6497
>>>>
>>>>     -------- Original Message --------
>>>>     Subject: RE: [Pasig-discuss] Risks of encryption & compression built
>>>>     into storage options?
>>>>     From: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>
>>>>     Date: Thu, March 16, 2017 1:44 pm
>>>>     To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>,
>>>>     "Robert Spindler" <rob.spindler at asu.edu>,
>>>>     "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
>>>>
>>>>     Thanks Gail & Rob for your replies.
>>>>
>>>>
>>>>
>>>>     I am less worried about the scenario of someone stealing a drive – as
>>>>     Rob pointed out, if that is happening we have bigger problems.
>>>>
>>>>
>>>>
>>>>     I do wonder if there are increased risks of bit-rot/file corruption
>>>>     with encryption, compression, and data deduplication. Have there been
>>>>     any studies on this? Could pulling a file off a drive that requires
>>>>     reversal of the auto-encryption and auto-compression in place at the
>>>>     system level mean a greater risk of bits flipping? I am trying to
>>>>     contrast the increased “handling” and change required to get from the
>>>>     stored version to the original version vs the decreased “handling” it
>>>>     would require if what I am pulling off the storage device is exactly
>>>>     what I sent to be stored.
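
Whatever "handling" the storage layer adds, a digest taken before the object is sent lets the depositor check what comes back. The sketch below does not measure whether compression raises the corruption rate; it only shows the round-trip check. `zlib` stands in for the storage system's own compression, which is an assumption made for illustration, and the function names are hypothetical.

```python
import hashlib
import zlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def round_trip_matches(original: bytes) -> bool:
    """Compress, then decompress, and compare digests of input and output."""
    digest_sent = sha256_hex(original)
    stored = zlib.compress(original)     # stand-in for the storage layer
    retrieved = zlib.decompress(stored)
    return sha256_hex(retrieved) == digest_sent

record = b"permanent digital record " * 100
stored_form = zlib.compress(record)      # what actually sits on the device
matches = round_trip_matches(record)
```

Because the stored form differs from what was sent, a fixity check has to be run on the retrieved object, not on the bytes at rest, unless the storage system exposes its own checksums.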
>>>>
>>>>
>>>>
>>>>     I am less worried about issues related to not being able to decrypt
>>>>     content. The storage solutions we are contemplating would remain
>>>>     under enough ongoing management that these issues should be
>>>>     avoidable. Since ensuring that non-public records remain secure is
>>>>     also very important, encryption gets some points in the “pro” column.
>>>>     I agree that having multiple copies in different storage
>>>>     architectures and with different vendors would also decrease risk.
>>>>
>>>>
>>>>
>>>>     I want to understand the risks related to the different storage
>>>>     architectures and the ever increasing number of “automatic” things
>>>>     being done to digital objects in the process of them being stored and
>>>>     retrieved. Are there people doing work, independent of vendor claims,
>>>>     to document these types of risks?
>>>>
>>>>
>>>>
>>>>     Thank you,
>>>>
>>>>
>>>>
>>>>     Jeanne
>>>>
>>>>     *Jeanne Kramer-Smyth*
>>>>     *IT Officer, Information Management Services II*
>>>>     *Information and Technology Solutions*
>>>>     *WBG Library & Archives of Development*
>>>>     T: 202-473-9803
>>>>     E: jkramersmyth at worldbankgroup.org
>>>>     W: www.worldbank.org
>>>>     Twitter: spellboundblog
>>>>     Skype: jkramersmyth
>>>>     LinkedIn: jkramersmyth
>>>>     A: 1818 H St NW Washington, DC 20433
>>>>
>>>>
>>>>
>>>>
>>>>     *From:* gail at trumantechnologies.com
>>>>     [mailto:gail at trumantechnologies.com]
>>>>     *Sent:* Thursday, March 16, 2017 3:18 PM
>>>>     *To:* Robert Spindler <rob.spindler at asu.edu>; Jeanne Kramer-Smyth
>>>>     <jkramersmyth at worldbankgroup.org>; pasig-discuss at mail.asis.org
>>>>     *Subject:* RE: [Pasig-discuss] Risks of encryption & compression
>>>>     built into storage options?
>>>>
>>>>
>>>>
>>>>     Hi all, a good topic!
>>>>
>>>>     There is new drive technology from Seagate (and probably other
>>>>     manufacturers) called "Self-Encrypting Drives" (SEDs) which can be
>>>>     used to solve the problem of a person stealing a drive and running
>>>>     off with data.
>>>>
>>>>
>>>>
>>>>     Most cloud services now automatically provide "server side
>>>>     encryption", which means the vendor is doing the encryption for all
>>>>     data at rest (as you point out, Jeanne). This is required by HIPAA
>>>>     for all health care data, and is now considered best practice for
>>>>     cloud vendors due to the very real risk of hacking. So, for archival,
>>>>     we need to weigh the data security provided by cloud storage services
>>>>     using server side encryption against the risk of the vendor managing
>>>>     the encryption keys. Which IMO underscores the importance of having
>>>>     multiple copies of all your archival data -- with different vendors
>>>>     and storage architectures or media types if possible.
>>>>
>>>>
>>>>
>>>>     Gail
>>>>
>>>>
>>>>
>>>>         -------- Original Message --------
>>>>         Subject: Re: [Pasig-discuss] Risks of encryption & compression
>>>>         built into storage options?
>>>>         From: Robert Spindler <rob.spindler at asu.edu>
>>>>         Date: Thu, March 16, 2017 9:06 am
>>>>         To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>,
>>>>         "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
>>>>
>>>>         At risk of starting a conversation, here are a couple of basic
>>>>         issues from an archival standpoint:
>>>>
>>>>         Encryption: Who has the keys, and what happens should a provider
>>>>         go out of business?
>>>>
>>>>         Compression: Lossy or lossless, and how does that compression act
>>>>         on different file formats (video/audio)? If this is frequently
>>>>         accessed material it becomes more of an issue.
>>>>
>>>>         Short story: At a CNI meeting perhaps 15 years ago, in a session
>>>>         about ebooks, I asked a panel of vendors if they would give up the
>>>>         keys to encrypted e-books when they reached public domain.
>>>>         Crickets.
>>>>
>>>>         Physical discs are not secure given the forensics software widely
>>>>         available today, but if someone can grab a physical disc the
>>>>         provider has more problems than forensics.
>>>>
>>>>
>>>>
>>>>         Rob Spindler
>>>>
>>>>         University Archivist and Head
>>>>
>>>>         Archives and Special Collections
>>>>
>>>>         Arizona State University Libraries
>>>>
>>>>         Tempe AZ 85287-1006
>>>>
>>>>         480.965.9277
>>>>
>>>>         http://www.asu.edu/lib/archives
>>>>
>>>>
>>>>
>>>>         *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org]
>>>>         *On Behalf Of* Jeanne Kramer-Smyth
>>>>         *Sent:* Thursday, March 16, 2017 8:54 AM
>>>>         *To:* pasig-discuss at mail.asis.org
>>>>         *Subject:* [Pasig-discuss] Risks of encryption & compression built
>>>>         into storage options?
>>>>
>>>>
>>>>
>>>>         Is anyone aware of active research into the risks to digital
>>>>         preservation that are posed by built-in encryption and compression
>>>>         in both cloud and on-prem storage options? Any and all go-to
>>>>         sources for research and reading on these topics would be very
>>>>         welcome.
>>>>
>>>>         I am being told by the staff who source storage solutions for my
>>>>         organization that encryption and compression are generally
>>>>         included at the hardware level. That content is automatically
>>>>         encrypted and compressed as it is written to disc – and then
>>>>         un-encrypted and un-compressed as it is pulled off disc in
>>>>         response to a request. It is advertised as both more secure
>>>>         (someone stealing a physical disc could not, in theory, extract
>>>>         its contents) and more cost efficient (taking up less space).
>>>>
>>>>         I want to be sure that as we make our choices for long-term
>>>>         storage of permanent digital records we take these risks into
>>>>         account.
>>>>
>>>>
>>>>
>>>>         Thank you!
>>>>
>>>>         Jeanne
>>>>
>>>>
>>>>
>>>>
>>>> --------------------------------------------------------------------------------
>>>>
>>>>         ----
>>>>         To subscribe, unsubscribe, or modify your subscription, 
>>>> please visit
>>>>         http://mail.asis.org/mailman/listinfo/pasig-discuss
>>>>         _______
>>>>         PASIG Webinars and conference material is at
>>>> http://www.preservationandarchivingsig.org/index.html
>>>>         _______________________________________________
>>>>         Pasig-discuss mailing list
>>>>         Pasig-discuss at mail.asis.org 
>>>> <mailto:Pasig-discuss at mail.asis.org>
>>>>         http://mail.asis.org/mailman/listinfo/pasig-discuss
>>>>
>>>>
>>>>
>>>
>>
>>
>

-- 
----------------------------------------------------
Chris Wood
Storage & Data Management
Office:  408-782-2757 (Home Office)
Office:  408-276-0730 (Work Office)
Mobile:  408-218-7313 (Preferred)
Email: lw85381 at yahoo.com
----------------------------------------------------
