[Pasig-discuss] Risks of encryption & compression built into storage options?
Matthew Addis
matthew.addis at arkivum.com
Mon Mar 20 03:28:21 EDT 2017
Hi Chris, Jos,
There are some examples of the effects that bit-flips and other data corruptions have on compressed AV content in a report from the PrestoPRIME project. It includes links to work by Heydegger and others, e.g. on the impact of bit errors on JPEG2000. The report mainly covers AV, but it also references work on other compressed file formats, e.g. work by CERN on problems opening zips after bit-errors. See page 57 onwards.
https://eprints.soton.ac.uk/373760/1/373760.pdf
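To make the fragility concrete, here is a minimal Python sketch (not taken from the PrestoPRIME report) that flips one bit in an uncompressed byte string and one bit in its zlib-compressed form. The sample text, the helper names and the chosen bit positions are all assumptions made for illustration, and zlib only stands in for the AV and zip formats the report discusses.

```python
import random
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count differing positions; any length difference counts as damage too."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def damage_report(original: bytes, bit_index: int) -> dict:
    """Flip one bit in the raw bytes and one bit in the compressed bytes."""
    raw_damage = count_differing_bytes(original, flip_bit(original, bit_index))

    compressed = zlib.compress(original)
    # Wrap the bit position so it always lands inside the shorter stream.
    damaged_stream = flip_bit(compressed, bit_index % (len(compressed) * 8))
    try:
        recovered = zlib.decompress(damaged_stream)
        compressed_damage = count_differing_bytes(original, recovered)
    except zlib.error:
        # The stream no longer decodes, so treat every byte as lost.
        compressed_damage = len(original)

    return {"raw": raw_damage, "compressed": compressed_damage}


def make_sample(n_words: int = 5000, seed: int = 2017) -> bytes:
    """Build a reproducible text sample from a made-up vocabulary."""
    rng = random.Random(seed)
    words = ["tape", "disk", "archive", "video", "frame", "audio", "copy", "bit"]
    return " ".join(rng.choice(words) for _ in range(n_words)).encode("ascii")


if __name__ == "__main__":
    sample = make_sample()
    for position in (0, 4001):  # arbitrary positions: stream header and mid-stream
        print(position, damage_report(sample, position))
```

In the raw case exactly one byte changes. A flip in the first header byte makes the whole zlib stream undecodable; flips deeper in the stream usually derail the decoder or trip the checksum, though the exact outcome depends on where the bit lands.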
This was followed up by work in the DAVID project that did a more extensive survey of how AV content gets corrupted in practice within big AV archives. Note that bit-errors from storage, a.k.a. bit rot, were not a significant issue, at least not compared with all the other problems!
http://david-preservation.eu/wp-content/uploads/2013/10/DAVID-D2-1-INA-WP2-DamageAssessment_v1-20.pdf
The reports above cover some aspects of compression at the file-format level (jpeg, zip etc.) and not compression at the hardware level (e.g. LTO data tape). At Arkivum we turn compression off at the hardware level and instead let our clients choose whether or not to use compression at the application level. In practice, most people using our service already have compressed file formats, esp. images and video, because the reduced data volumes save storage, bandwidth etc. in their day-to-day workflows. Trying to add compression on top, e.g. at the LTO level, rarely adds any benefit.
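The point about stacking compression can be checked with a quick experiment. The sketch below uses Python's zlib as a stand-in for both the file-format compressor and the one layered on top (an assumption: tape drives use their own scheme), and the text sample is made up.

```python
import random
import zlib


def make_sample_text(n_words: int = 20000, seed: int = 42) -> bytes:
    """Build a reproducible, redundant text sample (made-up vocabulary)."""
    rng = random.Random(seed)
    vocabulary = ["preservation", "archive", "tape", "checksum",
                  "object", "storage", "copy", "bit"]
    return " ".join(rng.choice(vocabulary) for _ in range(n_words)).encode("ascii")


def compression_passes(data: bytes) -> tuple:
    """Return (first-pass ratio, second-pass ratio); below 1.0 means it shrank."""
    once = zlib.compress(data, 9)    # stands in for a compressed file format
    twice = zlib.compress(once, 9)   # stands in for compression added on top
    return len(once) / len(data), len(twice) / len(once)


if __name__ == "__main__":
    first, second = compression_passes(make_sample_text())
    print(f"first pass:  output is {first:.2%} of the input")
    print(f"second pass: output is {second:.2%} of the input")
```

The first pass removes most of the redundancy, so the second pass has almost nothing left to work with and its output stays close to the size of its input.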
Cheers,
Matthew
Matthew Addis
Chief Technology Officer
tel: +44 1249 405060
mob: +44 7703 393374
email: matthew.addis at arkivum.com
web: www.arkivum.com
twitter: @arkivum
This message is confidential unless otherwise stated.
Arkivum Limited is registered in England and Wales, company number 7530353. Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom
From: Pasig-discuss <pasig-discuss-bounces at asis.org> on behalf of Chris Wood <lw85381 at yahoo.com>
Date: Monday, 20 March 2017 04:15
To: "jos.vanwezel at kit.edu" <jos.vanwezel at kit.edu>
Cc: "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?
Hi Jos:
I remember getting a nice hard copy of the booklet. I don't know if MPEG ever made it public. I thought that by now some institution would have posted it, but I can't find it. (yet) Still looking.
Your comments about "other" bad things happening are spot on. In an IBM study several of us did about 20 years ago on the causes of data loss, human error won by a huge margin. In last place (fewest causal factors) were H/W failures. In between, in rough order, were application and data management software, incorrect documentation, device firmware (we used to call this microcode when we still had dial phones :-)), external events (power failures, storms, whatever) and a few other categories I forget. I do remember our RAS (Reliability, Availability and Serviceability) expert making the point that perfect replication code replicates corrupted data perfectly. Even more true today.
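One common guard against replicating corrupted data perfectly is to check fixity before and after every copy. The following is a minimal Python sketch of that idea, not something described in the IBM study; the function names and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_if_intact(source: Path, target: Path, expected_sha256: str) -> bool:
    """Copy source to target only if source still matches its recorded digest."""
    if sha256_of(source) != expected_sha256:
        return False  # refuse to propagate a copy that has already changed
    shutil.copyfile(source, target)
    # Verify the new copy as well, so a bad write is caught immediately.
    return sha256_of(target) == expected_sha256


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master.bin"
        master.write_bytes(b"archival payload" * 100)
        recorded = sha256_of(master)  # digest recorded at ingest time

        print("intact source replicated:",
              replicate_if_intact(master, Path(tmp) / "copy1.bin", recorded))

        master.write_bytes(b"Xrchival payload" * 100)  # simulate silent damage
        print("damaged source replicated:",
              replicate_if_intact(master, Path(tmp) / "copy2.bin", recorded))
```

The check only helps if the recorded digest is stored separately from the data it protects; otherwise the same event can damage both.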
You might find this a quick, interesting read: Why did NASA TRIPLEX all computers in the Space Shuttle and have two separate vendors write the code for them, with a sophisticated voting system for cases of non-agreement? https://www.nap.edu/read/2222/chapter/5
It seemed to work fine, but inter-booster gaskets did not and it turned out the insulation tiles were not very good at foreign object impact resistance.
A good example of unknown and completely unexpected failure modes.
CW
On 3/19/2017 3:41 PM, van Wezel, Jos (SCC) wrote:
Hi Chris, thanks a lot. The paper is fun reading especially about the analog movie archive :-)). Hopefully you do find the mpeg paper. My searches returned nothing yet. (was it ever published in some way?)
@all: Having read all posts thus far (great stuff guys), clearly the engineering approach to the problem does not cut it at all. Reading between the lines there seems to be a lot of experience with disasters where even a BER of 10^-99 and 4 copies won't help. :-) For now we'll stick with 2 copies and 3 if requested explicitly by the client.
Groet
Jos
On 17/03/2017 17:48, Chris Wood wrote:
Jos:
I just knew somebody would ask this. Ha. Several years ago several of us wrote
a paper for the MPEG (Moving Picture Experts Group) and a mathematician named
Jeff Bonwick figured out all the math. I haven't found it yet in the junk heap
of my PC, but did find a companion paper written by the same set of authors.
It's not exactly what you are looking for, but close. It's more about Bit Error
Rates at a rather low level. I will continue to look for the MPEG paper. It's
got to be somewhere. The Internet "never forgets", right?
Stay tuned as I keep looking.
CW
On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote:
Chris,
do you happen to have any reference to the mathematical correctness or
computation showing that 3 copies is optimal? Is the proof based on the standard ECC
values that vendors list with their components (tapes, disks, transport
lines, memory etc.)? I'm asking because it's difficult to argue for the
additional costs of a third copy without the math. Currently I can't tell my
customers how much (as in percentage) extra security an additional copy will
bring, even theoretically.
regards
jos
Sent from my Samsung Galaxy smartphone.
-------- Original message --------
From: Chris Wood <lw85381 at yahoo.com>
Date: 17/03/2017 02:07 (GMT+01:00)
To: "Raymond A. Clarke" <Raymond.Clarke1 at Verizon.net>,
gail at trumantechnologies.com, 'Jeanne Kramer-Smyth'
<jkramersmyth at worldbankgroup.org>, 'Robert Spindler' <rob.spindler at asu.edu>,
pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into
storage options?
Thanks Ray as always for a great summary. Now my three bits:
Three (3) copies please. One of which is in a remote location on a different
flood plain, electric grid, fault line etc. for the obvious reasons.
Mathematically, this has turned out to be the optimal number when looked at with a
cost/benefit mindset. Kind of like: 2 is better than one, but a local problem
gets both copies. Three (remote) is more expensive but you get A LOT more data
resilience/persistence. Four costs a bunch more, but delivers just a little
bit more resilience. Four+ are all examples of ever diminishing returns.
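The diminishing-returns shape can be illustrated with a toy model. This is not the math from the MPEG paper mentioned elsewhere in this thread; it is a sketch under two loudly stated assumptions: each copy is lost independently with probability p_independent per period, and a common-mode event (human error, a software bug) wipes out every copy at once with probability p_common. Both numbers below are invented purely to show the shape of the curve.

```python
def loss_probability(copies: int, p_independent: float, p_common: float) -> float:
    """Probability that all copies are lost in one period under the toy model.

    Either a common-mode event destroys everything, or every copy
    happens to fail independently in the same period.
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_common + (1.0 - p_common) * p_independent ** copies


if __name__ == "__main__":
    P_INDEPENDENT = 0.01   # hypothetical per-copy loss probability
    P_COMMON = 1e-6        # hypothetical common-mode loss probability

    previous = None
    for n in range(1, 6):
        p = loss_probability(n, P_INDEPENDENT, P_COMMON)
        gain = "" if previous is None else f"  ({previous / p:.1f}x better than {n - 1})"
        print(f"{n} copies: P(loss) = {p:.3e}{gain}")
        previous = p
```

Under these assumptions each extra copy helps until the independent term falls below the common-mode floor; after that, more copies buy almost nothing. Where that knee falls depends entirely on the two probabilities, which is why real figures for a given archive matter more than the model.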
CW
On 3/16/2017 4:40 PM, Raymond A. Clarke wrote:
Hello All,
A few years back, I did some research on bit rot and data corruption, as it
relates to the various media that data passes through on its way to and
from the user. Consider this simple example: as data moves from memory to HBA to
cable to air to cable and so on, bits can be lost along the way at any one of, or
several of, the transit points. This is something that current
technologies can help with, in part. Back to the original question: “how do
we insure against corruption, either from compression, encryption and/or
transmission?” Well, disk and tape (/data resting places/, if you will) have
come a very long way in reducing bit-error rates, and in compression and encryption.
But the /“resting places”/ are only part of the problem. In line with
Gail’s suggestion, and as Dr. Rosenthal has coined it: LOCKSS (“Lots of Copies
Keep Stuff Safe”).
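Multiple copies also make damage detectable by comparison. The sketch below is a deliberately simplified majority vote over copy digests, not the actual LOCKSS polling protocol; the site names and the function name are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def vote_on_copies(copies: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Return (majority digest, names of copies that disagree with it).

    If no digest is held by a strict majority, the digest is None and
    every copy is listed, since none can be trusted automatically.
    """
    if not copies:
        raise ValueError("no copies to compare")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        return None, sorted(copies)
    return winner, sorted(name for name, d in digests.items() if d != winner)


if __name__ == "__main__":
    good = b"master file contents"
    bad = b"master file c0ntents"  # simulated corruption at one site

    print(vote_on_copies({"site_a": good, "site_b": good, "site_c": bad}))
    print(vote_on_copies({"site_a": good, "site_b": bad}))
```

With three copies the odd one out can be identified and repaired from the others. With only two, a disagreement shows that something is wrong but not which copy is damaged, unless an independently stored checksum breaks the tie.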
Take good care,
Raymond
*From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of*
gail at trumantechnologies.com
*Sent:* Thursday, March 16, 2017 5:10 PM
*To:* Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; Robert Spindler
<rob.spindler at asu.edu>; pasig-discuss at mail.asis.org
*Subject:* Re: [Pasig-discuss] Risks of encryption & compression built into
storage options?
Hello again, Jeanne,
I think you're hitting on something that needs to be raised to (and pushed
for with) vendors, and that is the need for "More transparency" and the
reporting to customers of "events" that are part of the provenance of a
digital object. The storage architectures do a good job of error detection
and self healing; however, they do not report this out. I'd like to (this is
my dream) have vendors report back to customers (as part of their SLA) when an
object (or part of an object if it's been chunked) has been
repaired/self-healed - or lost forever. I could then record this as a PREMIS
event. As you know, vendors "design for" 11x9s or 13x9s durability, but their
SLAs do not require them to tell us if their durability and data corruption
starts to get really bad for whatever reason.
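As a sketch of what recording such a report could look like, the Python below turns a hypothetical vendor repair notice into a PREMIS-style event record. No storage vendor is known to emit notices in this shape; the notice fields are invented, the "repair" event type is a placeholder, and the PREMIS field names are written from memory of the data dictionary and have not been validated against the schema.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def repair_notice_to_premis_event(notice: dict, now: Optional[datetime] = None) -> dict:
    """Map a (hypothetical) vendor repair notice onto PREMIS-like event fields."""
    timestamp = now or datetime.now(timezone.utc)
    return {
        "eventIdentifier": {
            "eventIdentifierType": "UUID",
            "eventIdentifierValue": str(uuid.uuid4()),
        },
        "eventType": "repair",  # placeholder value chosen for illustration
        "eventDateTime": timestamp.isoformat(),
        "eventDetail": notice.get("detail", ""),
        "eventOutcomeInformation": {
            "eventOutcome": "success" if notice["repaired"] else "failure",
        },
        "linkingObjectIdentifier": {
            "linkingObjectIdentifierType": "local",
            "linkingObjectIdentifierValue": notice["object_id"],
        },
    }


if __name__ == "__main__":
    # Invented example of what a vendor might report under an SLA.
    notice = {
        "object_id": "obj-001",
        "repaired": True,
        "detail": "chunk 3 of 8 rebuilt from parity",
    }
    print(json.dumps(repair_notice_to_premis_event(notice), indent=2))
```

A notice with "repaired" set to False would be recorded as a failure outcome, which is the "lost forever" case.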
I've not directly answered your question about whether the encryption,
dedupe, compression, and other things that can happen inside a storage system
are increasing the risk of corruption. I'll look around. I am sure the disk
vendors and storage solution and cloud storage vendors have run the numbers,
but am not sure if they're made public.
This alias has people from Oracle, Seagate and other storage companies on it
so I encourage them to please share any research they have on this -
Gail
Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
/*Protecting the world's digital heritage for future generations*/
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497
-------- Original Message --------
Subject: RE: [Pasig-discuss] Risks of encryption & compression built
into storage options?
From: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>
Date: Thu, March 16, 2017 1:44 pm
To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>, "Robert
Spindler" <rob.spindler at asu.edu>,
"pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
Thanks Gail & Rob for your replies.
I am less worried about the scenario of someone stealing a drive – as Rob
pointed out, if that is happening we have bigger problems.
I do wonder if there are increased risks of bit-rot/file corruption with
encryption, compression, and data deduplication. Have there been any
studies on this? Could pulling a file off a drive that requires reversal
of the auto-encryption and auto-compression in place at the system level
mean a greater risk of bits flipping? I am trying to contrast the
increased “handling” and change required to get from the stored version
to the original version vs the decreased “handling” it would require if
what I am pulling off the storage device is exactly what I sent to be stored.
I am less worried about issues related to not being able to decrypt
content. The storage solutions we are contemplating would remain under
enough ongoing management that these issues should be avoidable. Since
ensuring that non-public records remain secure is also very important,
encryption gets some points in the “pro” column. I agree that having
multiple copies in different storage architectures and with different
vendors would also decrease risk.
I want to understand the risks related to the different storage
architectures and the ever increasing number of “automatic” things being
done to digital objects in the process of them being stored and
retrieved. Are there people doing work, independent of vendor claims, to
document these types of risks?
Thank you,
Jeanne
*Jeanne Kramer-Smyth*
*IT Officer, Information Management Services II*
*Information and Technology Solutions*
*WBG Library & Archives of Development*
T 202-473-9803
E jkramersmyth at worldbankgroup.org
W www.worldbank.org
Twitter: spellboundblog
Skype: jkramersmyth
LinkedIn: jkramersmyth
A 1818 H St NW Washington, DC 20433
*From:* gail at trumantechnologies.com [mailto:gail at trumantechnologies.com]
*Sent:* Thursday, March 16, 2017 3:18 PM
*To:* Robert Spindler <rob.spindler at asu.edu>; Jeanne Kramer-Smyth
<jkramersmyth at worldbankgroup.org>; pasig-discuss at mail.asis.org
*Subject:* RE: [Pasig-discuss] Risks of encryption & compression built
into storage options?
Hi all, a good topic!
There is new drive technology from Seagate (and probably other manufacturers)
called "Self-Encrypting Drives" (SEDs) which can be used to solve the
problem of a person stealing a drive and running off with data.
Most cloud services now automatically provide "server side encryption"
which means the vendor is doing the encryption for all data at rest (as
you point out Jeanne). This is required by HIPAA for all health care
data, and is now considered cloud best practice for cloud vendors due to
the very real risk of hacking. So, for archival, we need to weigh the
data security provided by cloud storage services using server side
encryption against the risk of the vendor managing the encryption keys.
Which IMO underscores the importance of having multiple copies of all
your archival data -- with different vendors and storage architectures or
media types if possible.
Gail
-------- Original Message --------
Subject: Re: [Pasig-discuss] Risks of encryption & compression built
into storage options?
From: Robert Spindler <rob.spindler at asu.edu>
Date: Thu, March 16, 2017 9:06 am
To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>,
"pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
At risk of starting a conversation, here are a couple basic issues
from an archival standpoint:
Encryption: Who has the keys and what happens should a provider go
out of business?
Compression: Lossy or lossless, and how does that compression act on
different file formats (video/audio)? If this is frequently accessed
material it becomes more of an issue.
Short story: At a CNI meeting perhaps 15 years ago in a session about
ebooks I asked a panel of vendors if they would give up the keys to
encrypted e-books when they reached public domain. Crickets.
Physical discs are not secure given the forensics software widely
available today, but if someone can grab a physical disc the provider
has more problems than forensics.
Rob Spindler
University Archivist and Head
Archives and Special Collections
Arizona State University Libraries
Tempe AZ 85287-1006
480.965.9277
http://www.asu.edu/lib/archives
*From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On
Behalf Of *Jeanne Kramer-Smyth
*Sent:* Thursday, March 16, 2017 8:54 AM
*To:* pasig-discuss at mail.asis.org
*Subject:* [Pasig-discuss] Risks of encryption & compression built
into storage options?
Is anyone aware of active research into the risks to digital
preservation that are posed by built in encryption and compression in
both cloud and on-prem storage options? Any and all go-to sources for
research and reading on these topics would be very welcome.
I am being told by the staff who source storage solutions for my
organization that encryption and compression are generally included
at the hardware level. That content is automatically encrypted and
compressed as it is written to disc – and then un-encrypted and
un-compressed as it is pulled off disc in response to a request. It
is advertised as both more secure (someone stealing a physical disc
could not, in theory, extract its contents) and more cost efficient
(taking up less space).
I want to be sure that as we make our choices for long-term storage
of permanent digital records we take these risks into account.
Thank you!
Jeanne
--------------------------------------------------------------------------------
----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at
http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss
--
----------------------------------------------------
Chris Wood
Storage & Data Management
Office: 408-782-2757 (Home Office)
Office: 408-276-0730 (Work Office)
Mobile: 408-218-7313 (Preferred)
Email: lw85381 at yahoo.com
----------------------------------------------------