[Pasig-discuss] Risks of encryption & compression built into storage options?

Matthew Addis matthew.addis at arkivum.com
Mon Mar 20 03:28:21 EDT 2017


Hi Chris, Jos,

There are some examples of the effects that bit-flips and other data corruptions have on compressed AV content in a report from the PrestoPRIME project. It links to work by Heydegger and others, e.g. on the impact of bit errors on JPEG2000. The report mainly covers AV, but it also references work on other compressed file formats, e.g. work by CERN on problems opening zips after bit errors. See page 57 onwards.
https://eprints.soton.ac.uk/373760/1/373760.pdf
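
To make the failure mode concrete, here is a small Python sketch. It is an illustration only and is not taken from the report: it flips single bits in an uncompressed buffer and in a zlib-compressed copy of the same buffer, then tallies what happens. The sample data, the helper names and the use of zlib as a stand-in for a compressed file format are all assumptions.

```python
import zlib
from typing import Dict, Iterable


def flip_bit(buf: bytes, bit_index: int) -> bytes:
    """Return a copy of buf with exactly one bit inverted."""
    damaged = bytearray(buf)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions at which two equal-length buffers differ."""
    return sum(x != y for x, y in zip(a, b))


def survey_compressed(stream: bytes, original: bytes,
                      bit_positions: Iterable[int]) -> Dict[str, int]:
    """Flip one bit at a time in a zlib stream and tally the outcomes."""
    outcome = {"unreadable": 0, "altered": 0, "unchanged": 0}
    for pos in bit_positions:
        try:
            result = zlib.decompress(flip_bit(stream, pos))
        except zlib.error:
            outcome["unreadable"] += 1  # stream rejected by the decoder
            continue
        outcome["altered" if result != original else "unchanged"] += 1
    return outcome


# Hypothetical sample data: 2000 short text records.
original = b"".join(b"frame %05d: sample payload\n" % i for i in range(2000))
compressed = zlib.compress(original)

# One flipped bit in the raw data damages exactly one byte.
raw_damaged = flip_bit(original, len(original) * 4)
print("raw bytes damaged:", differing_bytes(original, raw_damaged))

# The same kind of error in the compressed stream, tried at 64 positions.
middle = len(compressed) * 4
print(survey_compressed(compressed, original, range(middle, middle + 64)))
```

In the raw buffer the damage stays confined to one byte. In the compressed stream a single flipped bit usually makes the decoder reject the whole stream, because later symbols depend on earlier ones and zlib verifies a checksum over the output. Real AV codecs differ in the details, which is what the reports look at.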

This was followed up by work in the DAVID project, which did a more extensive survey of how AV content gets corrupted in practice within big AV archives. Note that bit errors from storage (a.k.a. bit rot) were not a significant issue, at least not compared with all the other problems!
http://david-preservation.eu/wp-content/uploads/2013/10/DAVID-D2-1-INA-WP2-DamageAssessment_v1-20.pdf

The reports above cover some aspects of compression at the file-format level (JPEG, zip etc.) and not compression at the hardware level (e.g. LTO data tape). At Arkivum we turn compression off at the hardware level and instead let our clients choose whether to use compression at the application level. In practice, most people using our service already have compressed file formats, esp. images and video, because the reduced data volumes save storage, bandwidth etc. in their day-to-day workflows. Trying to add compression on top, e.g. at the LTO level, rarely adds any benefit.

Cheers,

Matthew


Matthew Addis
Chief Technology Officer

tel:   +44 1249 405060
mob:     +44 7703 393374
email:   matthew.addis at arkivum.com
web:     www.arkivum.com
twitter: @arkivum

This message is confidential unless otherwise stated.
Arkivum Limited is registered in England and Wales, company number 7530353. Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom

From: Pasig-discuss <pasig-discuss-bounces at asis.org> on behalf of Chris Wood <lw85381 at yahoo.com>
Date: Monday, 20 March 2017 04:15
To: "jos.vanwezel at kit.edu" <jos.vanwezel at kit.edu>
Cc: "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?

Hi Jos:

I remember getting a nice hard copy of the booklet. I don't know if MPEG ever made it public. I thought that by now some institution would have posted it, but I can't find it (yet). Still looking.

Your comments about "other" bad things happening are spot on. In an IBM study several of us did about 20 years ago on the causes of data loss, human error won by a huge margin. In last place (fewest causal factors) were hardware failures. In between, in rough order, were application and data management software, incorrect documentation, device firmware (we used to call this microcode when we still had dial phones :-)), external events (power failures, storms, whatever) and a few other categories I forget. I do remember our RAS (Reliability, Availability and Serviceability) expert making the point that perfect replication code replicates corrupted data perfectly. Even more true today.
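
One common guard against replicating corruption is to check each object against a digest recorded at ingest before copying it anywhere. The Python sketch below illustrates the idea only. It is not how any product mentioned in this thread works, and the in-memory dictionaries, object name and function names are hypothetical stand-ins for real storage systems.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def replicate_if_intact(name: str,
                        source: Dict[str, bytes],
                        replicas: List[Dict[str, bytes]],
                        recorded: Dict[str, str]) -> bool:
    """Copy one object to every replica, but only if the source copy still
    matches the digest recorded when the object was first ingested."""
    data = source[name]
    if sha256_hex(data) != recorded[name]:
        return False  # refuse to spread a corrupted copy
    for store in replicas:
        store[name] = data
    return True


# Hypothetical in-memory "stores" standing in for real storage systems.
recorded = {"scan-0001.tif": sha256_hex(b"original bytes")}
primary = {"scan-0001.tif": b"original bytes"}
site_b: Dict[str, bytes] = {}

print(replicate_if_intact("scan-0001.tif", primary, [site_b], recorded))

primary["scan-0001.tif"] = b"original bytez"  # simulate silent corruption
site_c: Dict[str, bytes] = {}
print(replicate_if_intact("scan-0001.tif", primary, [site_c], recorded))
```

The first call copies the object and prints True; the second refuses and prints False, leaving site_c empty. The check only helps if the recorded digest is kept somewhere that does not share the failure modes of the data it describes.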

You might find this a quick, interesting read: why did NASA TRIPLEX all computers in the Space Shuttle and have two separate vendors write the code for them, with a sophisticated voting system for cases of non-agreement? https://www.nap.edu/read/2222/chapter/5
It seemed to work fine, but the inter-booster gaskets did not, and it turned out the insulation tiles were not very good at resisting foreign object impact.
A good example of unknown and completely unexpected failure modes.

CW

On 3/19/2017 3:41 PM, van Wezel, Jos (SCC) wrote:
Hi Chris, thanks a lot. The paper is fun reading, especially the part about the analog movie archive :-)). Hopefully you do find the MPEG paper. My searches have returned nothing yet. (Was it ever published in some way?)

@all: Having read all posts thus far (great stuff guys), it is clear the engineering approach to the problem does not cut it at all. Reading between the lines, there seems to be a lot of experience with disasters where even a BER of 10^-99 and 4 copies won't help. :-) For now we'll stick with 2 copies, and 3 if requested explicitly by the client.

Groet

Jos


On 17/03/2017 17:48, Chris Wood wrote:
Jos:

I just knew somebody would ask this. Ha. Several years ago several of us wrote
a paper for MPEG (the Moving Picture Experts Group), and a mathematician named
Jeff Bonwick figured out all the math. I haven't found it yet in the junk heap
of my PC, but did find a companion paper written by the same set of authors.
It's not exactly what you are looking for, but close. It's more about Bit Error
Rates at a rather low level. I will continue to look for the MPEG paper. It's
got to be somewhere. The Internet "never forgets", right?
Stay tuned as I keep looking.

CW

On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote:
Chris,
do you happen to have any reference for the mathematical correctness of, or
the computation behind, the claim that 3 copies is optimal? Is the proof based
on the standard ECC values that vendors list with their components (tapes,
disks, transport lines, memory etc.)? I'm asking because it's difficult to
argue for the additional costs of a third copy without the math. Currently I
can't tell my customers how much (as a percentage) extra security an
additional copy will bring, even theoretically.

regards

jos

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: Chris Wood <lw85381 at yahoo.com>
Date: 17/03/2017 02:07 (GMT+01:00)
To: "Raymond A. Clarke" <Raymond.Clarke1 at Verizon.net>,
gail at trumantechnologies.com, 'Jeanne Kramer-Smyth'
<jkramersmyth at worldbankgroup.org>, 'Robert Spindler' <rob.spindler at asu.edu>,
pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into
storage options?

Thanks Ray as always for a great summary. Now my three bits:

Three (3) copies please, one of which is in a remote location on a different
flood plain, electric grid, fault line etc. for the obvious reasons.
Mathematically, this has turned out to be the optimal number when looked at
with a cost/benefit mindset. Kind of like: 2 is better than one, but a local
problem gets both copies. Three (remote) is more expensive but you get A LOT
more data resilience/persistence. Four costs a bunch more, but delivers just a
little bit more resilience. Four+ are all examples of ever diminishing returns.
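
A back-of-the-envelope model shows the shape of this argument. It is a sketch under strong assumptions, not the math from the MPEG paper mentioned elsewhere in this thread: every copy fails independently with probability p_copy per year, every site is wiped out with probability p_site per year, and sites fail independently of each other. Both probabilities below are made-up illustrative values.

```python
from typing import Sequence


def loss_probability(copies_per_site: Sequence[int],
                     p_copy: float, p_site: float) -> float:
    """Probability that every copy is lost in one year.

    copies_per_site lists how many copies are kept at each site, e.g. [2, 1]
    means two local copies plus one copy at a remote site.
    """
    total = 1.0
    for k in copies_per_site:
        # A site loses all k copies if the site itself is destroyed, or if
        # the site survives and each copy fails on its own.
        total *= p_site + (1.0 - p_site) * p_copy ** k
    return total


P_COPY = 0.01   # assumed yearly loss probability of one copy
P_SITE = 0.001  # assumed yearly probability of a site-wide disaster

layouts = {
    "1 copy": [1],
    "2 copies, same site": [2],
    "3 copies, one remote": [2, 1],
    "4 copies, two per site": [2, 2],
}
for label, layout in layouts.items():
    print(f"{label:24s} {loss_probability(layout, P_COPY, P_SITE):.3e}")
```

With these made-up inputs, a second local copy cannot push the risk below the site-level probability, while moving the third copy off site multiplies two small numbers together. Each further copy removes a smaller absolute amount of risk for roughly the same cost, which is the diminishing-returns shape described above. Where the cost/benefit optimum falls depends on the real values of p_copy and p_site, the cost per copy, and how correlated the failures are.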

CW

On 3/16/2017 4:40 PM, Raymond A. Clarke wrote:

Hello All,



A few years back, I did some research on bit-rot and data corruption as it
relates to the various media that data passes through on its way to and
from the user. Consider this simple example: as data moves from memory to HBA
to cable to air to cable and so on, bits can be lost along the way at any one
of, or several of, the transit points. This is something that current
technologies can help with, in part. Back to the original question: “how do
we insure against corruption, whether from compression, encryption and/or
transmission?” Well, disk and tape (/data resting places/, if you will) have
come a very long way in reducing bit-error rates and in compression and
encryption. But the /“resting places”/ are only part of the problem. In line
with Gail’s suggestion, the answer is what Dr. Rosenthal has coined LOCKSS
(“Lots of Copies Keep Stuff Safe”).





Take good care,

Raymond



*From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of*
gail at trumantechnologies.com
*Sent:* Thursday, March 16, 2017 5:10 PM
*To:* Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; Robert Spindler
<rob.spindler at asu.edu>; pasig-discuss at mail.asis.org
*Subject:* Re: [Pasig-discuss] Risks of encryption & compression built into
storage options?



Hello again, Jeanne,



I think you're hitting on something that needs to be raised to (and pushed
for with) vendors, and that is the need for more transparency and the
reporting to customers of "events" that are part of the provenance of a
digital object. The storage architectures do a good job of error detection
and self-healing; however, they do not report this out. I'd like to (this is
my dream) have vendors report back to customers (as part of their SLA) when an
object (or part of an object if it's been chunked) has been
repaired/self-healed, or lost forever. I could then record this as a PREMIS
event. As you know, vendors "design for" 11x9s or 13x9s durability, but their
SLAs do not require them to tell us if durability or data corruption
starts to get really bad for whatever reason.
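
To put "11x9s" into numbers, here is a quick calculation. It rests on an assumed reading rather than any vendor's definition: that N nines of durability means each object independently survives a year with probability 1 - 10^-N. The collection size is a made-up example.

```python
def expected_annual_losses(nines: int, n_objects: int) -> float:
    """Expected number of objects lost per year at the stated durability."""
    p_loss = 10.0 ** -nines  # e.g. 11 nines -> 1e-11 per object per year
    return n_objects * p_loss


COLLECTION = 10_000_000  # hypothetical archive of ten million objects
for nines in (11, 13):
    lost = expected_annual_losses(nines, COLLECTION)
    print(f"{nines} nines: {lost:.1e} objects/year expected, "
          f"about one loss every {1 / lost:,.0f} years")
```

Under that reading, ten million objects at 11 nines come to an expected 10^-4 losses per year, i.e. on the order of one object every ten thousand years. These are design targets and say nothing about what has actually happened to a given object, which is why reporting of real repair and loss events would be valuable.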



I've not directly answered your question about whether the encryption,
dedupe, compression, and other things that can happen inside a storage system
are increasing the risk of corruption. I'll look around. I am sure the disk
vendors and storage solution and cloud storage vendors have run the numbers,
but am not sure if they've been made public.



This alias has people from Oracle, Seagate and other storage companies on it
so I encourage them to please share any research they have on this -





Gail







Gail Truman

Truman Technologies, LLC

Certified Digital Archives Specialist, Society of American Archivists



/*Protecting the world's digital heritage for future generations*/

www.trumantechnologies.com

facebook/TrumanTechnologies

https://www.linkedin.com/in/gtruman



+1 510 502 6497







    -------- Original Message --------
    Subject: RE: [Pasig-discuss] Risks of encryption & compression built
    into storage options?
    From: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>
    Date: Thu, March 16, 2017 1:44 pm
    To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>,
    "Robert Spindler" <rob.spindler at asu.edu>,
    "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>

    Thanks Gail & Rob for your replies.



    I am less worried about the scenario of someone stealing a drive – as Rob
    pointed out, if that is happening we have bigger problems.



    I do wonder if there are increased risks of bit-rot/file corruption with
    encryption, compression, and data deduplication. Have there been any
    studies on this? Could pulling a file off a drive that requires reversal
    of the auto-encryption and auto-compression in place at the system level
    mean a greater risk of bits flipping? I am trying to contrast the
    increased “handling” and change required to get from the stored version
    to the original version vs the decreased “handling” it would require if
    what I am pulling off the storage device is exactly what I sent to be stored.



    I am less worried about issues related to not being able to decrypt
    content. The storage solutions we are contemplating would remain under
    enough ongoing management that these issues should be avoidable. Since
    ensuring that non-public records remain secure is also very important,
    encryption gets some points in the “pro” column. I agree that having
    multiple copies in different storage architectures and with different
    vendors would also decrease risk.



    I want to understand the risks related to the different storage
    architectures and the ever increasing number of “automatic” things being
    done to digital objects in the process of them being stored and
    retrieved. Are there people doing work, independent of vendor claims, to
    document these types of risks?



    Thank you,



    Jeanne

    *Jeanne Kramer-Smyth*
    *IT Officer, Information Management Services II*
    *Information and Technology Solutions*
    *WBG Library & Archives of Development*

    T: 202-473-9803
    E: jkramersmyth at worldbankgroup.org
    W: www.worldbank.org
    Twitter: spellboundblog
    Skype: jkramersmyth
    LinkedIn: jkramersmyth
    A: 1818 H St NW Washington, DC 20433



    *From:* gail at trumantechnologies.com [mailto:gail at trumantechnologies.com]
    *Sent:* Thursday, March 16, 2017 3:18 PM
    *To:* Robert Spindler <rob.spindler at asu.edu>; Jeanne Kramer-Smyth
    <jkramersmyth at worldbankgroup.org>; pasig-discuss at mail.asis.org
    *Subject:* RE: [Pasig-discuss] Risks of encryption & compression built
    into storage options?



    Hi all, a good topic!

    There is new drive technology from Seagate (and probably other
    manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to
    solve the problem of a person stealing a drive and running off with data.



    Most cloud services now automatically provide "server side encryption",
    which means the vendor is doing the encryption for all data at rest (as
    you point out, Jeanne). This is required by HIPAA for all health care
    data, and is now considered best practice for cloud vendors due to
    the very real risk of hacking. So, for archival, we need to weigh the
    data security provided by cloud storage services using server side
    encryption against the risk of the vendor managing the encryption keys.
    Which IMO underscores the importance of having multiple copies of all
    your archival data -- with different vendors and storage architectures or
    media types if possible.



    Gail


















        -------- Original Message --------
        Subject: Re: [Pasig-discuss] Risks of encryption & compression built
        into storage options?
        From: Robert Spindler <rob.spindler at asu.edu>
        Date: Thu, March 16, 2017 9:06 am
        To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>,
        "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>

        At risk of starting a conversation, here are a couple of basic issues
        from an archival standpoint:



        Encryption: Who has the keys and what happens should a provider go
        out of business?



        Compression: Lossy or lossless, and how does that compression act on
        different file formats (video/audio)? If this is frequently accessed
        material it becomes more of an issue.



        Short story: At a CNI meeting perhaps 15 years ago in a session about
        ebooks I asked a panel of vendors if they would give up the keys to
        encrypted e-books when they reached public domain. Crickets.



        Physical discs are not secure given the forensics software widely
        available today, but if someone can grab a physical disc the provider
        has more problems than forensics.



        Rob Spindler

        University Archivist and Head

        Archives and Special Collections

        Arizona State University Libraries

        Tempe AZ 85287-1006

        480.965.9277

        http://www.asu.edu/lib/archives



        *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On
        Behalf Of* Jeanne Kramer-Smyth
        *Sent:* Thursday, March 16, 2017 8:54 AM
        *To:* pasig-discuss at mail.asis.org
        *Subject:* [Pasig-discuss] Risks of encryption & compression built
        into storage options?



        Is anyone aware of active research into the risks to digital
        preservation that are posed by built-in encryption and compression in
        both cloud and on-prem storage options? Any and all go-to sources for
        research and reading on these topics would be very welcome.



        I am being told by the staff who source storage solutions for my
        organization that encryption and compression are generally included
        at the hardware level: content is automatically encrypted and
        compressed as it is written to disc, and then un-encrypted and
        un-compressed as it is pulled off disc in response to a request. It
        is advertised as both more secure (someone stealing a physical disc
        could not, in theory, extract its contents) and more cost efficient
        (taking up less space).



        I want to be sure that as we make our choices for long-term storage
        of permanent digital records we take these risks into account.



        Thank you!

        Jeanne








        --------------------------------------------------------------------------------

        ----
        To subscribe, unsubscribe, or modify your subscription, please visit
        http://mail.asis.org/mailman/listinfo/pasig-discuss
        _______
        PASIG Webinars and conference material is at
        http://www.preservationandarchivingsig.org/index.html
        _______________________________________________
        Pasig-discuss mailing list
        Pasig-discuss at mail.asis.org
        http://mail.asis.org/mailman/listinfo/pasig-discuss




--
----------------------------------------------------
Chris Wood
Storage & Data Management
Office:  408-782-2757 (Home Office)
Office:  408-276-0730 (Work Office)
Mobile:  408-218-7313 (Preferred)
Email: lw85381 at yahoo.com
----------------------------------------------------

