From kdweeks at vt.edu Fri Mar 3 14:34:23 2017
From: kdweeks at vt.edu (Kimberli Weeks)
Date: Fri, 3 Mar 2017 14:34:23 -0500
Subject: [Pasig-discuss] Job opportunity: Virginia Tech Libraries, Software Engineer

Hi All!

We are looking for a talented Software Engineer to work at University Libraries, Virginia Tech.

=====

The University Libraries at Virginia Tech is recruiting for a Software Engineer. This is an opportunity to join a focused and successful team of engineers developing digital library and repository software solutions. The systems being developed support the management, preservation, and online discovery of research data and of the scholarly, scientific, and creative expression of researchers at Virginia Tech. The successful candidate will engage in digital preservation strategies and repository systems research within the Digital Library Development team and will support an expanding suite of data and informatics technologies within the University Libraries.

Virginia Tech Libraries is at the forefront of inventing the future of the research library. Our team seeks candidates to help us continuously innovate, improve agility, increase reliability, and support university functions through library systems.

Candidates for the Software Engineer position should review the required qualifications and apply here: https://listings.jobs.vt.edu/postings/74235

=====

We would also appreciate any ideas or suggestions you may have as we continue to identify strong candidates.

Thanks!
Kimberli

Kimberli Weeks
kdweeks at vt.edu
(540) 231-2674
Technical Director, Digital Library Development, Research & Informatics
University Libraries, Virginia Tech
http://scholar.lib.vt.edu

From ntay at stanford.edu Fri Mar 3 14:45:13 2017
From: ntay at stanford.edu (Nicholas Taylor)
Date: Fri, 3 Mar 2017 19:45:13 +0000
Subject: [Pasig-discuss] job opening: LOCKSS Partnerships Manager at Stanford Libraries

Help us advance resilient, networked digital preservation for communities.

The LOCKSS Program, a division of Digital Library Systems and Services at Stanford University Libraries, is seeking a partnerships manager to support the adaptation and utilization of LOCKSS networks and technologies for a growing array of communities and use cases. We offer generous salaries, a beautiful working environment, a diverse team of passionate and talented programmers and librarians, and the opportunity to have a meaningful impact in the field of digital preservation.

For further details, please see the following links:
* Job ad: https://library.stanford.edu/department/digital-library-systems-and-services-dlss/jobs/lockss-partnerships-manager
* LOCKSS Program: https://www.lockss.org/
* Digital Library Systems and Services: https://library.stanford.edu/department/digital-library-systems-and-services-dlss

From jackie at sdiwc.info Mon Mar 6 05:04:39 2017
From: jackie at sdiwc.info (Jackie Blanco)
Date: Mon, 06 Mar 2017 03:04:39 -0700
Subject: [Pasig-discuss] CfP: EECEA2017 - Philippines
Message-ID: <6c844b404fb2f72b45e8a27da567c1a4@sdiwc.info>

The Fourth International Conference on Electrical, Electronics, Computer Engineering and their Applications (EECEA2017)
University of Perpetual Help System DALTA, Las Piñas, Manila, Philippines
October 11-13, 2017
http://sdiwc.net/conferences/eecea2017/

The conference welcomes papers on research topics including, but not limited to:
*Electronics Engineering
*Electrical Engineering
*Computer Engineering

Researchers are encouraged to submit their work electronically.
All papers will be fully refereed by a minimum of two specialized referees. Before final acceptance, all referees' comments must be considered.

Keynote Speaker
Yoshiro Imai, Kagawa University, Japan
Keynote Title: Visualization and Data Mining

Important Dates
September 11, 2017 - Submission Deadline
September 21, 2017 - Notification of Acceptance
September 30, 2017 - Camera Ready and Registration Deadline

Special Session
Researchers are welcome to organize a special session. The session should focus on a very specific topic. Please send your session title and your CV to eecea17 [at] sdiwc [dot] net.

From jschne at stanford.edu Wed Mar 15 14:35:57 2017
From: jschne at stanford.edu (Josh Schneider)
Date: Wed, 15 Mar 2017 18:35:57 +0000
Subject: [Pasig-discuss] Personal Digital Archiving 2017 – Announcing the PDA Hackathon (Palo Alto, CA)

Stanford University Libraries will be hosting a Personal Digital Archiving (PDA) Hackathon from March 31 to April 1 at Stanford University, in Palo Alto, CA, in conjunction with the PDA 2017 conference. Join friends and fellow coders to create innovative solutions to the ongoing challenges facing individuals (including digital humanists and cultural heritage researchers) in the digital age, including how we can best mine, analyze, visualize, and use the immense variety of personal data that we are all creating. Visit the PDA Hackathon site to register and to learn more about prizes and the event schedule.

PDA Conference
The PDA 2017 conference will be held March 29-31 at Stanford University, in Palo Alto, CA, just a short commuter train ride from San Francisco. The conference program and schedule are now available here. There are just 25 spaces left for conference attendees, and we expect these to fill up fast, so please register ASAP if you would like to attend!
The PDA conference will shine a spotlight on projects and research by digital archivists, faculty, tool and service developers, independent researchers, and others engaged in the collection, preservation, and study of data shedding light on individuals, their families, and their communities. The conference will consist of presentations, panel discussions, posters/demos, and hands-on workshops. Keynote speakers include Kim Christen and Gary Wolf. More info about PDA 2017 (including travel/lodging recommendations) can be found here.

Note that the PDA conference is co-scheduled with several other conferences hosted by Stanford Libraries, including LDCX (March 27-29, 2017) and BDAX (March 28, 2017), in order to encourage co-attendance and cross-pollination.

We look forward to seeing as many of you as possible in Palo Alto!

Best,
Josh (on behalf of the PDA 2017 planning committee)

--
Josh Schneider
Assistant University Archivist
ePADD Community Manager
Special Collections & University Archives
Stanford University
josh.schneider at stanford.edu
650-497-6489

From jkramersmyth at worldbankgroup.org Thu Mar 16 11:53:38 2017
From: jkramersmyth at worldbankgroup.org (Jeanne Kramer-Smyth)
Date: Thu, 16 Mar 2017 15:53:38 +0000
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?

Is anyone aware of active research into the risks to digital preservation posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome.

I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level.
Content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is read back off the disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (it takes up less space).

I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account.

Thank you!
Jeanne

Jeanne Kramer-Smyth
IT Officer, Information Management Services II
Information and Technology Solutions
WBG Library & Archives of Development
T 202-473-9803
E jkramersmyth at worldbankgroup.org
W www.worldbank.org
Twitter: spellboundblog
Skype: jkramersmyth
LinkedIn: jkramersmyth
A 1818 H St NW Washington, DC 20433

From rob.spindler at asu.edu Thu Mar 16 12:06:48 2017
From: rob.spindler at asu.edu (Robert Spindler)
Date: Thu, 16 Mar 2017 16:06:48 +0000
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?

At risk of starting a conversation, here are a couple of basic issues from an archival standpoint:

Encryption: Who has the keys, and what happens should a provider go out of business?

Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue.

Short story: At a CNI meeting perhaps 15 years ago, in a session about e-books, I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached the public domain. Crickets.

Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics.

Rob Spindler
University Archivist and Head
Archives and Special Collections
Arizona State University Libraries
Tempe AZ 85287-1006
480.965.9277
http://www.asu.edu/lib/archives

From gail at trumantechnologies.com Thu Mar 16 15:18:01 2017
From: gail at trumantechnologies.com (gail at trumantechnologies.com)
Date: Thu, 16 Mar 2017 12:18:01 -0700
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
Message-ID: <20170316121801.b554e26909f2beaf9f8ddbf6be9a6600.039e21facb.wbe@email09.godaddy.com>

An HTML attachment was scrubbed...

From jkramersmyth at worldbankgroup.org Thu Mar 16 16:44:37 2017
From: jkramersmyth at worldbankgroup.org (Jeanne Kramer-Smyth)
Date: Thu, 16 Mar 2017 20:44:37 +0000
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
In-Reply-To: <20170316121801.b554e26909f2beaf9f8ddbf6be9a6600.039e21facb.wbe@email09.godaddy.com>
References: <20170316121801.b554e26909f2beaf9f8ddbf6be9a6600.039e21facb.wbe@email09.godaddy.com>

Thanks Gail & Rob for your replies.

I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems.

I do wonder if there are increased risks of bit rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive, which requires reversing the automatic encryption and compression applied at the system level, mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version back to the original with the reduced "handling" required if what I pull off the storage device is exactly what I sent to be stored.

I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column.

I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects as they are stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks?
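One common way to make the storage system's internal "handling" irrelevant is end-to-end fixity checking: record a checksum of the bytes as submitted, and re-verify it against the bytes the storage system hands back. The Python sketch below illustrates that idea under stated assumptions: zlib compression stands in for whatever transparent compression or encryption a storage system applies, SHA-256 is the assumed fixity algorithm, and the function names (`ingest`, `fixity_ok`, `flip_last_bit`) are hypothetical, not any vendor's API.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def ingest(original: bytes) -> tuple[bytes, str]:
    """Hypothetical ingest step: record fixity on the bytes exactly as
    submitted, BEFORE the storage layer touches them, then apply a
    simulated transparent transformation. zlib is only a stand-in for
    whatever compression/encryption the real system does internally."""
    fixity = sha256_hex(original)
    stored = zlib.compress(original)
    return stored, fixity


def fixity_ok(stored: bytes, fixity: str) -> bool:
    """Hypothetical retrieval check: reverse the simulated transformation
    and compare the result with the digest recorded at ingest. A failure
    to decode counts as a fixity failure, just like a digest mismatch."""
    try:
        retrieved = zlib.decompress(stored)
    except zlib.error:
        return False
    return sha256_hex(retrieved) == fixity


def flip_last_bit(data: bytes) -> bytes:
    """Simulate one flipped bit in the stored copy. The last four bytes of
    zlib output are its Adler-32 checksum, so this flip is always caught
    when the data is decompressed."""
    return data[:-1] + bytes([data[-1] ^ 0x01])


record = b"example record\n" * 100
stored, fixity = ingest(record)
print(fixity_ok(stored, fixity))                 # True: lossless round trip
print(fixity_ok(flip_last_bit(stored), fixity))  # False: corrupted copy
```

Because the digest is computed before the storage layer transforms the bytes and checked after it reverses them, the check does not need to know how, or whether, compression or encryption happened in between. It only says whether what came back matches what went in; it does not say where in the chain a bit flipped, and it repairs nothing, which is why it is usually paired with multiple independent copies.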
Thank you,
Jeanne

Jeanne Kramer-Smyth
IT Officer, Information Management Services II
Information and Technology Solutions
WBG Library & Archives of Development
T 202-473-9803
E jkramersmyth at worldbankgroup.org
W www.worldbank.org
Twitter: spellboundblog
Skype: jkramersmyth
LinkedIn: jkramersmyth
A 1818 H St NW Washington, DC 20433

From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com]
Sent: Thursday, March 16, 2017 3:18 PM
To: Robert Spindler ; Jeanne Kramer-Smyth ; pasig-discuss at mail.asis.org
Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options?

Hi all, a good topic!

There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data.

Most cloud services now automatically provide "server-side encryption," which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking.

So, for archival, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. Which IMO underscores the importance of having multiple copies of all your archival data -- with different vendors and storage architectures or media types if possible.
Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From dshr at stanford.edu Thu Mar 16 16:47:10 2017
From: dshr at stanford.edu (David Rosenthal)
Date: Thu, 16 Mar 2017 13:47:10 -0700
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?

On 03/16/2017 08:53 AM, Jeanne Kramer-Smyth wrote:
> I am being told by the staff who source storage solutions for my
> organization that encryption and compression are generally included
> at the hardware level. That content is automatically encrypted and
> compressed as it is written to disc - and then un-encrypted and
> un-compressed as it is pulled off disc in response to a request. It
> is advertised as both more secure (someone stealing a physical disc
> could not, in theory, extract its contents) and more cost efficient
> (taking up less space).
>
> I want to be sure that as we make our choices for long-term storage
> of permanent digital records that we take these risks into accounts.

Archival systems have to treat all media as unreliable, because they are. The path between the analog data on the disk platters and the unencrypted, uncompressed data at the SATA or SAS interface is enormously complex (you truly do not want to, and in fact cannot, know), but it is irrelevant to applications using the disks. Media should be treated as black boxes. Data goes in, data comes out. Some data returned will be bad. At some point the entire medium will die. Archival systems have to live with these facts.

Depending on your threat model, encrypting data at rest may be a good idea. Depending on the media to do it for you, and thus not knowing whether or how it is being done, may not be an adequate threat mitigation.

You may be interested in this blog post:
http://blog.dshr.org/2016/12/the-medium-term-prospects-for-long-term.html
especially the sections:
Does Long-Term Storage Need Long-Lived Media?
Does Long-Term Storage Need Ultra-Reliable Media?

David.

From gail at trumantechnologies.com Thu Mar 16 17:09:56 2017
From: gail at trumantechnologies.com (gail at trumantechnologies.com)
Date: Thu, 16 Mar 2017 14:09:56 -0700
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
Message-ID: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com>

An HTML attachment was scrubbed...

From evelyn at artefactual.com Thu Mar 16 18:54:16 2017
From: evelyn at artefactual.com (Evelyn McLellan)
Date: Thu, 16 Mar 2017 15:54:16 -0700
Subject: [Pasig-discuss] Fwd: Announcement: version 1.6 of Archivematica released today

Dear colleagues,

Please excuse my cross-posting.

We're pleased to announce that version 1.6 of Archivematica has been released. We have named the release to honour the memory of Nancy Deromedi, whose vision helped shape defining features of this release.

You can read the full release notes here: https://groups.google.com/d/msg/archivematica/AVP4ARbomA4/SbaGxXaGBgAJ

Some highlights:
- new functionality for managing backlogs and for the appraisal and arrangement of material
- AIP re-ingest, which allows you to re-run all major preservation services, including normalization for preservation
- new workflows with AtoM, ArchivesSpace and DSpace
- fixity reporting via the Archivematica Storage Service
- updated to use PRONOM v. 88
- bug fixes and updated tools
- and more!

We would welcome your feedback and comments via our user forum, linked above.

Warm regards,
Sarah Romkey

Sarah Romkey, MAS, MLIS
Archivematica Program Manager
Artefactual Systems
604-527-2056
@archivematica / @accesstomemory

From Raymond.Clarke1 at Verizon.net Thu Mar 16 19:40:59 2017
From: Raymond.Clarke1 at Verizon.net (Raymond A. Clarke)
Date: Thu, 16 Mar 2017 19:40:59 -0400
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
In-Reply-To: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com>
References: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com>
Message-ID: <02a701d29eae$c4722a00$4d567e00$@Verizon.net>

Hello All,

A few years back, I did some research on bit rot and data corruption as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of these transit points. This is something that current technologies can help with, in part.

Back to the original question: how do we insure against corruption, whether from compression, encryption, and/or transmission? Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates, and in compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion, the answer is LOCKSS ("Lots Of Copies Keep Stuff Safe"), as Dr. Rosenthal has coined it.

Take good care,
Raymond

From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com
Sent: Thursday, March 16, 2017 5:10 PM
To: Jeanne Kramer-Smyth ; Robert Spindler ; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?

Hello again, Jeanne,

I think you're hitting on something that needs to be raised with (and pushed for with) vendors, and that is the need for "more transparency" and for reporting to customers the "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out.
I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object, if it's been chunked) has been repaired/self-healed, or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if their durability degrades and data corruption starts to get really bad for whatever reason.

I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system increase the risk of corruption. I'll look around. I am sure the disk vendors, storage solution vendors, and cloud storage vendors have run the numbers, but I am not sure if they've been made public. This alias has people from Oracle, Seagate and other storage companies on it, so I encourage them to please share any research they have on this.

- Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options?
From: Jeanne Kramer-Smyth
Date: Thu, March 16, 2017 1:44 pm
To: gail at trumantechnologies.com, Robert Spindler, pasig-discuss at mail.asis.org

Thanks Gail & Rob for your replies.

I am less worried about the scenario of someone stealing a drive – as Rob pointed out, if that is happening we have bigger problems.

I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping?
I am trying to contrast the increased "handling" and change required to get from the stored version back to the original version with the decreased "handling" required if what I pull off the storage device is exactly what I sent to be stored.

I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk.

I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects as they are stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks?

Thank you,
Jeanne

Jeanne Kramer-Smyth
IT Officer, Information Management Services II
Information and Technology Solutions
WBG Library & Archives of Development
T 202-473-9803
E jkramersmyth at worldbankgroup.org
W www.worldbank.org
Twitter: spellboundblog, Skype: jkramersmyth, LinkedIn: jkramersmyth
A 1818 H St NW Washington, DC 20433

From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com]
Sent: Thursday, March 16, 2017 3:18 PM
To: Robert Spindler; Jeanne Kramer-Smyth; pasig-discuss at mail.asis.org
Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options?

Hi all, a good topic!

There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with the data.

Most cloud services now automatically provide "server-side encryption", which means the vendor encrypts all data at rest (as you point out, Jeanne).
This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival purposes, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. This, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible.

Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?
From: Robert Spindler <rob.spindler at asu.edu>
Date: Thu, March 16, 2017 9:06 am
To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>, "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>

At the risk of starting a conversation, here are a couple of basic issues from an archival standpoint:

Encryption: Who has the keys, and what happens should a provider go out of business?

Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue.

Short story: At a CNI meeting perhaps 15 years ago, in a session about ebooks, I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached the public domain. Crickets.

Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics.
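Rob's lossy-versus-lossless question, and Jeanne's question about the extra "handling" needed to reverse compression on retrieval, can be illustrated with a minimal Python sketch. It uses only the standard-library `zlib` and `hashlib` modules; the sample record and the bit positions flipped are arbitrary assumptions chosen for illustration, not measurements of any storage product. It first confirms that a lossless round trip preserves a SHA-256 fixity value, then applies the same single-bit error to an uncompressed copy and to a compressed copy.

```python
import hashlib
import zlib


def sha256(data: bytes) -> str:
    """Fixity value for a byte string (hex-encoded SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating bit rot."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def try_decompress(blob: bytes):
    """Decompress a zlib stream, or return None if it can no longer be read."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def bytes_damaged(original: bytes, recovered) -> int:
    """Count bytes of the original that come back wrong or not at all."""
    if recovered is None:  # unreadable stream: the whole record is lost
        return len(original)
    mismatched = sum(a != b for a, b in zip(original, recovered))
    return mismatched + abs(len(original) - len(recovered))


# Hypothetical sample record standing in for an archived file.
original = b"Lots of copies keep stuff safe. " * 200
stored = zlib.compress(original, 9)

# Lossless round trip: the fixity value computed before storage still matches.
assert sha256(zlib.decompress(stored)) == sha256(original)

# The same single-bit error, applied once to each representation.
raw_damage = bytes_damaged(original, flip_bit(original, 8 * 100))
# An arbitrary bit inside the compressed data, past the 2-byte zlib header.
compressed_damage = bytes_damaged(original, try_decompress(flip_bit(stored, 8 * 10 + 3)))

print(f"{len(original)} bytes stored as {len(stored)} compressed bytes")
print(f"one flipped bit, uncompressed copy: {raw_damage} byte(s) damaged")
print(f"one flipped bit, compressed copy:   {compressed_damage} byte(s) damaged")
```

In the uncompressed copy the flipped bit damages exactly one byte; in the compressed copy it typically makes the whole record unreadable, because either decoding fails outright or zlib's built-in Adler-32 check rejects the altered output. Encryption behaves differently depending on the cipher mode: with AES in XTS mode a flipped ciphertext bit scrambles the entire 16-byte block containing it, whereas in CTR mode it flips only the matching plaintext bit.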
Rob Spindler
University Archivist and Head
Archives and Special Collections
Arizona State University Libraries
Tempe AZ 85287-1006
480.965.9277
http://www.asu.edu/lib/archives

From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth
Sent: Thursday, March 16, 2017 8:54 AM
To: pasig-discuss at mail.asis.org
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?

Is anyone aware of active research into the risks to digital preservation posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome.

I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space).

I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account.

Thank you!
Jeanne

Jeanne Kramer-Smyth
IT Officer, Information Management Services II
Information and Technology Solutions
WBG Library & Archives of Development
T 202-473-9803
E jkramersmyth at worldbankgroup.org
W www.worldbank.org
Twitter: spellboundblog, Skype: jkramersmyth, LinkedIn: jkramersmyth
A 1818 H St NW Washington, DC 20433

----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From lw85381 at yahoo.com Thu Mar 16 21:03:47 2017
From: lw85381 at yahoo.com (Chris Wood)
Date: Thu, 16 Mar 2017 18:03:47 -0700
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
In-Reply-To: <02a701d29eae$c4722a00$4d567e00$@Verizon.net>
References: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com> <02a701d29eae$c4722a00$4d567e00$@Verizon.net>
Message-ID:

Thanks Ray, as always, for a great summary. Now my three bits:

Three (3) copies, please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like this: two is better than one, but a local problem gets both copies. Three (with one remote) is more expensive, but you get A LOT more data resilience/persistence. Four costs a bunch more but delivers just a little more resilience. Four or more are all examples of ever-diminishing returns.

CW

--
----------------------------------------------------
Chris Wood
Storage & Data Management
Office: 408-782-2757 (Home Office)
Office: 408-276-0730 (Work Office)
Mobile: 408-218-7313 (Preferred)
Email: lw85381 at yahoo.com
----------------------------------------------------

From jos.vanwezel at kit.edu Fri Mar 17 03:48:07 2017
From: jos.vanwezel at kit.edu (van Wezel, Jos (SCC))
Date: Fri, 17 Mar 2017 07:48:07 +0000
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
Message-ID: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu>

Chris, do you happen to have any reference to the mathematical correctness of, or a computation showing, that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional cost of a third copy without the math. Currently I can't tell my customers how much (as a percentage) extra security an additional copy will bring, even theoretically.

regards
jos

Sent from my Samsung Galaxy smartphone.
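Jos's question can at least be framed with a toy model. The Python sketch below is not the computation Chris refers to (the thread does not give one); it assumes illustrative annual rates, a 1% independent loss probability per copy and a 0.1% chance of a site-wide disaster that destroys every copy at that site, and it treats sites as independent of one another. The rates and the copy layouts are hypothetical.

```python
def p_site_lost(copies_at_site: int, p_copy: float, p_disaster: float) -> float:
    """Annual probability that every copy held at one site is lost.

    A site-wide event (flood, grid failure, earthquake) destroys all copies
    there at once; otherwise each copy fails independently with p_copy.
    """
    return p_disaster + (1 - p_disaster) * p_copy ** copies_at_site


def p_all_lost(layout: list, p_copy: float, p_disaster: float) -> float:
    """Annual probability of losing every copy, treating sites as independent.

    layout gives the number of copies at each site, e.g. [2, 1] means two
    local copies plus one remote copy.
    """
    risk = 1.0
    for copies_at_site in layout:
        risk *= p_site_lost(copies_at_site, p_copy, p_disaster)
    return risk


# Illustrative, assumed annual rates (not vendor figures).
P_COPY = 0.01       # independent loss of a single copy
P_DISASTER = 0.001  # site-wide event taking out every copy at that site

# Hypothetical layouts: up to two copies locally, then copies at one remote site.
LAYOUTS = {1: [1], 2: [2], 3: [2, 1], 4: [2, 2]}

previous = None
for n_copies, layout in LAYOUTS.items():
    risk = p_all_lost(layout, P_COPY, P_DISASTER)
    if previous is None:
        print(f"{n_copies} copies {layout}: P(all lost) = {risk:.2e}")
    else:
        print(
            f"{n_copies} copies {layout}: P(all lost) = {risk:.2e}, "
            f"{previous / risk:.0f}x lower, absolute reduction {previous - risk:.2e}"
        )
    previous = risk
```

Under these assumed rates, the second (local) copy and the fourth copy each cut the risk roughly tenfold, the remote third copy cuts it roughly ninety-fold, and the absolute reduction shrinks with every added copy, which is the diminishing-returns shape Chris describes. Different rates move the break-even point, so this is a framing for the cost argument rather than a proof. The model also ignores correlated failures, such as the human and software errors Neil Jefferies raises in his reply, which can hit every copy at once.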
Name: ATT00005.png Type: image/png Size: 170 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.png Type: image/png Size: 6577 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2190 bytes Desc: not available URL: From neil.jefferies at bodleian.ox.ac.uk Fri Mar 17 05:06:15 2017 From: neil.jefferies at bodleian.ox.ac.uk (Neil Jefferies) Date: Fri, 17 Mar 2017 09:06:15 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> Message-ID: <48E9420A4871584593FC3D435EF345AAEEEA1154@MBX10.ad.oak.ox.ac.uk> Jos It?s quite simple ? when one record is corrupted or altered then a checksum doesn?t always catch it - for instance an error upstream of the checksumming mechanism that gets committed to storage through legitimate channels. People tend to assume erroneously that software, hardware and people work correctly ? the most common causes of corruption are going to be human error and software faults. Ray has already described the long chain of error-prone transmission even simple operations entail without even getting into software, operating systems and human processes ? all of which are considerably less reliable than modern hardware. With that sort of error then you have two copies of supposedly correct information and no way of telling which is correct. If you have three copies maintained in a suitably independent way then the error should only affect one and it stands out clearly. However, ensuring that the three copies are genuinely independent is not simple ? human error can often propagate to all three. This is where versioning and audit trails become essential. In terms of encryption ? 
if someone else can't get the data out of the medium in the absence of additional hardware and software, then neither can you. Another thing to consider is that a bit error in uncompressed, unencrypted storage is a bit error in a single file or data structure and can often be corrected fairly easily. A bit error in encrypted or compressed storage can actually destroy a lot more data, because the nature of the algorithms used means that a single bit in storage corresponds, at least partially, to a number of bits in the source data. Neil Jefferies MA MBA Head of Innovation Bodleian Digital Library Systems and Services Osney One Osney Mead OX2 0EW T: +44 1865 2-80588
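Neil's argument for a third copy amounts to majority voting over fixity values. A minimal sketch in Python, assuming each copy can be read back as bytes and compared by SHA-256 digest. The function name `audit_copies` and the site names are hypothetical and are not part of any tool mentioned in this thread. With two copies that disagree there is no majority, so both must be treated as suspect; with three independent copies, a single corrupted one is outvoted. As Neil cautions, this only helps when the copies fail independently. An error committed to every copy through legitimate channels will simply win the vote, which is why versioning and audit trails are still needed.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def audit_copies(copies: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Compare independently stored copies of one object by SHA-256 digest.

    Returns (majority_digest, suspect_copy_names). If no digest is held by a
    strict majority of the copies, majority_digest is None and every copy is
    reported as suspect, since there is no way to tell which one is correct.
    """
    if not copies:
        raise ValueError("need at least one copy")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        # e.g. two copies that disagree: a 1-1 tie gives no basis for repair
        return None, sorted(copies)
    suspects = sorted(name for name, d in digests.items() if d != digest)
    return digest, suspects


if __name__ == "__main__":
    good = b"permanent record"
    bad = b"permanent recorD"  # simulated single-character corruption
    print(audit_copies({"site-a": good, "site-b": bad}))
    print(audit_copies({"site-a": good, "site-b": good, "site-c": bad}))
```

The strict-majority rule is deliberate. A tie is reported as "unknown" rather than resolved arbitrarily, mirroring Neil's observation that two copies give no way of telling which one is correct.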
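Neil's last point, that one flipped bit does more damage in compressed storage, can be illustrated with Python's standard `zlib` module standing in for storage-level compression. This is an assumption for illustration only. Storage systems use their own algorithms and typically compress fixed-size blocks rather than whole files, so the damage there would usually be confined to a block or chunk rather than a whole object. The helper names are hypothetical.

In raw storage, a flipped bit damages exactly one byte. Inside a zlib stream, later output depends on earlier bits, and zlib also verifies an Adler-32 checksum of the decompressed data. As a result, a flip in the compressed payload will almost always make the entire stream unreadable.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with the bit at bit_index inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def bytes_damaged_raw(original: bytes, bit_index: int) -> int:
    """Count bytes that differ after one bit flip in uncompressed storage."""
    damaged = flip_bit(original, bit_index)
    return sum(a != b for a, b in zip(original, damaged))


def survives_flip_compressed(original: bytes, bit_index: int) -> bool:
    """True if a zlib stream with one flipped bit still yields the original data."""
    stored = zlib.compress(original)
    damaged = flip_bit(stored, bit_index)
    try:
        return zlib.decompress(damaged) == original
    except zlib.error:
        # Invalid compressed data or an Adler-32 mismatch: nothing is returned,
        # so the whole object is lost, not just one byte.
        return False


if __name__ == "__main__":
    data = b"".join(b"record %05d\n" % i for i in range(2000))
    middle_bit = len(zlib.compress(data)) * 4  # roughly the middle of the stream
    print("raw bytes damaged:", bytes_damaged_raw(data, 12345))
    print("compressed copy recoverable:", survives_flip_compressed(data, middle_bit))
```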
From Raymond.Clarke1 at Verizon.net Fri Mar 17 09:45:10 2017 From: Raymond.Clarke1 at Verizon.net (Raymond A. Clarke) Date: Fri, 17 Mar 2017 09:45:10 -0400 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com> <02a701d29eae$c4722a00$4d567e00$@Verizon.net> Message-ID: <1312844D-7E81-462A-B2E9-8B3E77B99ECC@Verizon.net> Right on Chris. Thanks. Apologies for any typos, Sent from my iPhone Take good care, Ray > On Mar 16, 2017, at 9:03 PM, Chris Wood wrote: > > Thanks Ray as always for a great summary. Now my three bits: > > Three (3) copies please. One of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number looked at with a cost/benefit mindset.
Kind of like: 2 is better than one, but a local problem gets both copies. Three (remote) is more expensive but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four+ are all examples of ever-diminishing returns. > > CW > >> On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: >> Hello All, >> A few years back, I did some research on bit-rot and data corruption as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the transit points. This is something that current technologies can help with, in part. Back to the original question: how do we insure against corruption, whether from compression, encryption and/or transmission? Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates, compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion, and as Dr. Rosenthal has coined, LOCKSS ("Lots Of Copies Keep Stuff Safe"). >> Take good care, >> Raymond >> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com >> Sent: Thursday, March 16, 2017 5:10 PM >> To: Jeanne Kramer-Smyth ; Robert Spindler ; pasig-discuss at mail.asis.org >> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? >> Hello again, Jeanne, >> I think you're hitting on something that needs to be raised to (and pushed for with) vendors, and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out.
I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object, if it's been chunked) has been repaired/self-healed, or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if durability and data corruption start to get really bad for whatever reason. >> I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors, storage solution vendors and cloud storage vendors have run the numbers, but am not sure if they've made them public. >> This alias has people from Oracle, Seagate and other storage companies on it, so I encourage them to please share any research they have on this. >> Gail >> Gail Truman >> Truman Technologies, LLC >> Certified Digital Archives Specialist, Society of American Archivists >> Protecting the world's digital heritage for future generations >> www.trumantechnologies.com >> facebook/TrumanTechnologies >> https://www.linkedin.com/in/gtruman >> +1 510 502 6497 >> -------- Original Message -------- >> Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? >> From: Jeanne Kramer-Smyth >> Date: Thu, March 16, 2017 1:44 pm >> To: "gail at trumantechnologies.com", "Robert Spindler", "pasig-discuss at mail.asis.org" >> Thanks Gail & Rob for your replies. >> I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. >> I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this?
Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version to the original version vs. the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored. >> I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. >> I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? >> Thank you, >> Jeanne >> Jeanne Kramer-Smyth >> IT Officer, Information Management Services II >> Information and Technology Solutions >> WBG Library & Archives of Development >> T 202-473-9803 >> E jkramersmyth at worldbankgroup.org >> W www.worldbank.org >> spellboundblog >> jkramersmyth >> jkramersmyth >> A 1818 H St NW Washington, DC 20433 >> From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] >> Sent: Thursday, March 16, 2017 3:18 PM >> To: Robert Spindler ; Jeanne Kramer-Smyth ; pasig-discuss at mail.asis.org >> Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? >> Hi all, a good topic!
>> There is new drive technology from Seagate (probably other manufacturers too) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data. >> Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival purposes, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. Which, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible. >> Gail >> Gail Truman >> Truman Technologies, LLC >> Certified Digital Archives Specialist, Society of American Archivists >> Protecting the world's digital heritage for future generations >> www.trumantechnologies.com >> facebook/TrumanTechnologies >> https://www.linkedin.com/in/gtruman >> +1 510 502 6497 From pmather at vt.edu Fri Mar 17 10:49:01 2017 From: pmather at vt.edu (Paul Mather) Date: Fri, 17 Mar 2017 10:49:01 -0400 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> Message-ID: <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) wrote: > Chris, > do you happen to have any reference to the mathematical correctness or computation showing that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional cost of a third copy without the math. 
Currently I can't tell my customers how much extra security (as a percentage) an additional copy will bring, even theoretically. One thing I don't believe I've seen mentioned so far in regard to redundancy costs is switching to erasure-resilient coding rather than using plain replication. Explained briefly, erasure-resilient coding represents a logical unit of data as k fragments. These k fragments are then encoded into a larger unit of n fragments, n > k, where the n - k extra fragments can be thought of as "parity" fragments. The n encoded fragments may then be distributed across different disks, racks, and data centres. The value is that *any* k out of n fragments may be used to reconstitute the original logical unit of data. As n grows larger for a given k, the probability of total data loss grows smaller while the storage overhead and cost grow larger, allowing you to choose your cost/risk balance.
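Jos asked for the math behind "three copies", and Paul's comparison turns on the same arithmetic. A minimal sketch, under a strong assumption that Chris and Neil both point out is often unrealistic: every replica or fragment fails independently with the same probability p over some period. Correlated failures, such as a shared site, a software bug or a human error, are not modelled.

- With r plain replicas, the object is lost only if all r fail, which has probability p^r.
- With a (k, n) erasure code, it is lost when more than n - k of its n fragments fail.

The function names and example parameters are illustrative only and are not taken from Hadoop, Ceph or Swift.

```python
from math import comb


def replication_loss(p: float, replicas: int) -> float:
    """Probability that all replicas are lost, assuming independent failures."""
    return p ** replicas


def erasure_loss(p: float, k: int, n: int) -> float:
    """Probability that a (k, n) erasure-coded object cannot be reconstructed.

    Any k of the n fragments suffice, so the object is lost only when more
    than n - k fragments fail (independently, each with probability p).
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n - k + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # illustrative per-copy failure probability, not a vendor figure
    for r in (1, 2, 3, 4):
        print(f"{r} replica(s): overhead {r:.1f}x, loss {replication_loss(p, r):.1e}")
    print(f"(10, 16) code: overhead {16 / 10:.1f}x, loss {erasure_loss(p, 10, 16):.1e}")
```

Under these assumptions, each added replica multiplies the loss probability by p. The absolute gain therefore shrinks with every copy, which is one way to read Chris's "diminishing returns". The (10, 16) code reaches a lower loss probability than three replicas while using 1.6x rather than 3x storage, which is the direction of the result Paul quotes. Real figures depend heavily on how correlated the failures actually are.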
The main disadvantage of erasure-resilient coding is that data I/O latency is increased due to the inherently distributed nature of the storage approach. There are comparisons between replication and erasure-resilient coding systems. One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We show that systems employing erasure codes have mean time to failures many orders of magnitude higher than replicated systems with similar storage and bandwidth requirements. More importantly, erasure-resilient systems use an order of magnitude less bandwidth and storage to provide similar system durability as replicated systems." Erasure-resilient coding is becoming mainstream in cloud storage and object storage systems in general. I believe that Hadoop has recently acquired an erasure-resilient coding storage option for HDFS as an alternative to the standard replication model. This is due to the increase in data set sizes, where erasure-resilient coding can offer lower redundancy overheads than plain replication while still offering the same or higher assurance levels on data availability. I also believe Ceph and OpenStack Swift support erasure-resilient storage. Cheers, Paul. > > > regards > > jos > > Sent from my Samsung Galaxy smartphone. > > -------- Original message -------- > From: Chris Wood > > Date: 17/03/2017 02:07 (GMT+01:00) > To: "Raymond A. Clarke" >, gail at trumantechnologies.com , 'Jeanne Kramer-Smyth' >, 'Robert Spindler' >, pasig-discuss at mail.asis.org > Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? > > Thanks Ray as always for a great summary. Now my three bits: > > Three (3) copies please. One of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies.
Three (remote) is more expensive, but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four or more copies are all examples of ever-diminishing returns. > > CW > > On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: >> Hello All, >> >> A few years back, I did some research on bit-rot and data corruption as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the transit points. This is something that current technologies can help with, in part. Back to the original question: how do we guard against corruption, whether from compression, encryption and/or transmission? Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates and in compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion, and as Dr. Rosenthal put it, LOCKSS ("Lots of Copies Keep Stuff Safe"). >> >> >> Take good care, >> Raymond >> >> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com >> Sent: Thursday, March 16, 2017 5:10 PM >> To: Jeanne Kramer-Smyth ; Robert Spindler ; pasig-discuss at mail.asis.org >> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? >> >> Hello again, Jeanne, >> >> I think you're hitting on something that needs to be raised to (and pushed for with) vendors, and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out.
I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object if it's been chunked) has been repaired/self-healed - or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if durability drops or data corruption starts to get really bad for whatever reason. >> >> I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors, storage solution vendors and cloud storage vendors have run the numbers, but am not sure if they've been made public. >> >> This alias has people from Oracle, Seagate and other storage companies on it, so I encourage them to please share any research they have on this. >> >> >> Gail >> >> >> >> Gail Truman >> Truman Technologies, LLC >> Certified Digital Archives Specialist, Society of American Archivists >> >> Protecting the world's digital heritage for future generations >> www.trumantechnologies.com >> facebook/TrumanTechnologies >> https://www.linkedin.com/in/gtruman >> >> +1 510 502 6497 >> >> >> >> -------- Original Message -------- >> Subject: RE: [Pasig-discuss] Risks of encryption & compression built >> into storage options? >> From: Jeanne Kramer-Smyth > >> Date: Thu, March 16, 2017 1:44 pm >> To: "gail at trumantechnologies.com" >, "Robert >> Spindler" >, "pasig-discuss at mail.asis.org" >> > >> >> Thanks Gail & Rob for your replies. >> >> I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. >> >> I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this?
Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version to the original version vs the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored. >> >> I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. >> >> I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? >> >> Thank you, >> >> Jeanne >> Jeanne Kramer-Smyth >> IT Officer, Information Management Services II >> >> Information and Technology Solutions >> WBG Library & Archives of Development >> T >> 202-473-9803 >> E >> jkramersmyth at worldbankgroup.org >> W >> www.worldbank.org >> >> spellboundblog >> >> jkramersmyth >> >> jkramersmyth >> A >> 1818 H St NW Washington, DC 20433 >> >> >> >> From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] >> Sent: Thursday, March 16, 2017 3:18 PM >> To: Robert Spindler >; Jeanne Kramer-Smyth >; pasig-discuss at mail.asis.org >> Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? >> >> Hi all, a good topic!
>> >> There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data. >> >> Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. This, IMO, underscores the importance of having multiple copies of all your archival data -- with different vendors and storage architectures or media types if possible. >> >> Gail >> >> >> >> >> >> Gail Truman >> Truman Technologies, LLC >> Certified Digital Archives Specialist, Society of American Archivists >> >> Protecting the world's digital heritage for future generations >> www.trumantechnologies.com >> facebook/TrumanTechnologies >> https://www.linkedin.com/in/gtruman >> >> +1 510 502 6497 >> >> >> >> -------- Original Message -------- >> Subject: Re: [Pasig-discuss] Risks of encryption & compression built >> into storage options? >> From: Robert Spindler > >> Date: Thu, March 16, 2017 9:06 am >> To: Jeanne Kramer-Smyth >, >> "pasig-discuss at mail.asis.org" > >> At risk of starting a conversation, here are a couple of basic issues from an archival standpoint: >> >> Encryption: Who has the keys, and what happens should a provider go out of business? >> >> Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue.
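Rob's lossy/lossless distinction and Jeanne's worry about storage-side transformations both point to the same safeguard: record a fixity digest of the bytes you submit, and re-check it against whatever comes back, regardless of what the storage layer does in between. A minimal sketch, assuming SHA-256 as the fixity algorithm and Python's zlib as a stand-in for a storage system's transparent lossless compression (real hardware or cloud compression is not exposed this way); `ingest` and `retrieve` are hypothetical names.

```python
import hashlib
import zlib


def ingest(payload: bytes):
    """Record a SHA-256 fixity digest of the original bytes, then apply a
    stand-in for the storage layer's transparent lossless compression."""
    digest = hashlib.sha256(payload).hexdigest()
    stored = zlib.compress(payload, 9)
    return digest, stored


def retrieve(stored: bytes, expected_digest: str) -> bytes:
    """Reverse the storage-side transform and verify end-to-end fixity.
    zlib raises zlib.error on many corruptions (it carries its own Adler-32
    check); the SHA-256 comparison catches a valid-but-wrong result."""
    payload = zlib.decompress(stored)
    if hashlib.sha256(payload).hexdigest() != expected_digest:
        raise ValueError("fixity failure: retrieved bytes differ from ingested bytes")
    return payload


if __name__ == "__main__":
    record = b"Permanent record, series 42. " * 200
    digest, stored = ingest(record)
    assert retrieve(stored, digest) == record  # lossless round trip verifies

    damaged = bytearray(stored)
    damaged[len(damaged) // 2] ^= 0x01  # flip one bit "at rest"
    try:
        retrieve(bytes(damaged), digest)
        print("corruption not detected")
    except (zlib.error, ValueError) as exc:
        print("detected:", type(exc).__name__)
```

In the demo, a flipped bit is normally caught by zlib's own check or by the digest comparison. The digest is computed before any storage-side transform, so it does not rely on vendor-internal checks, which is the kind of independent evidence Jeanne asks about. Because the check demands bit-identical output, it only makes sense for lossless transforms: anything run through lossy compression, as in Rob's video/audio example, would fail it by design.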
>> >> >> Short story: At a CNI meeting perhaps 15 years ago, in a session about e-books, I asked a panel of vendors if they would give up the keys to encrypted e-books when they entered the public domain. Crickets. >> >> Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics. >> >> Rob Spindler >> University Archivist and Head >> Archives and Special Collections >> Arizona State University Libraries >> Tempe AZ 85287-1006 >> 480.965.9277 >> http://www.asu.edu/lib/archives >> >> From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth >> Sent: Thursday, March 16, 2017 8:54 AM >> To: pasig-discuss at mail.asis.org >> Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? >> >> Is anyone aware of active research into the risks to digital preservation that are posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome. >> >> I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space). >> >> I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account. >> >> Thank you!
>> Jeanne >> >> Jeanne Kramer-Smyth >> IT Officer, Information Management Services II >> >> Information and Technology Solutions >> WBG Library & Archives of Development >> T >> 202-473-9803 >> E >> jkramersmyth at worldbankgroup.org >> W >> www.worldbank.org >> >> spellboundblog >> >> jkramersmyth >> >> jkramersmyth >> A >> 1818 H St NW Washington, DC 20433 >> >> >> >> >> ---- >> To subscribe, unsubscribe, or modify your subscription, please visit >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> _______ >> PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html >> _______________________________________________ >> Pasig-discuss mailing list >> Pasig-discuss at mail.asis.org >> http://mail.asis.org/mailman/listinfo/pasig-discuss > > -- > ---------------------------------------------------- > Chris Wood > Storage & Data Management > Office: 408-782-2757 (Home Office) > Office: 408-276-0730 (Work Office) > Mobile: 408-218-7313 (Preferred) > Email: lw85381 at yahoo.com > ----------------------------------------------------
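Jos's question (how much does one more copy buy, even theoretically?) and Paul's replication-versus-erasure-coding comparison can be put into a first-order model. A minimal sketch, assuming every copy or fragment fails independently with the same probability p over some period (for example, before a repair completes); p and the function names are illustrative assumptions, not figures from any vendor. Independence is the big caveat: a shared flood plain, vendor or software stack makes failures correlated, which is exactly why Chris insists the third copy be remote.

```python
from math import comb


def loss_replication(p: float, copies: int) -> float:
    """Probability that every one of `copies` full replicas fails,
    assuming independent failures with per-copy probability p."""
    return p ** copies


def loss_erasure(p: float, k: int, n: int) -> float:
    """Probability that a k-of-n erasure-coded object is lost, i.e. that
    more than n - k of its n fragments fail, assuming independent failures."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n - k + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-copy failure probability within one repair window
    for copies in range(1, 5):
        print(f"{copies} replica(s), {copies}x storage: P(loss) = {loss_replication(p, copies):.1e}")
    print(f"6-of-9 erasure code, 1.5x storage: P(loss) = {loss_erasure(p, 6, 9):.1e}")
```

Under these assumptions each extra copy multiplies the loss probability by p, so with p = 0.01 going from two to three copies cuts the risk a hundredfold (1e-4 to 1e-6), yet the absolute reduction shrinks with every copy: roughly 0.0099, then 0.000099, then 0.00000099. That is one way to read Chris's "diminishing returns". The 6-of-9 code comes out at about 1.2e-6, close to three replicas at half the storage overhead, which is the trade-off Paul describes; Neil's caveats later in the thread (metadata dependence, whole-datacentre loss) are precisely the correlated failures this model leaves out.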
From neil.jefferies at bodleian.ox.ac.uk Fri Mar 17 11:34:59 2017 From: neil.jefferies at bodleian.ox.ac.uk (Neil Jefferies) Date: Fri, 17 Mar 2017 15:34:59 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> Message-ID: <48E9420A4871584593FC3D435EF345AAEEEA154C@MBX10.ad.oak.ox.ac.uk> However, erasure coding only protects against certain failure modes and is critically dependent on metadata for fragment reassembly. It is a very effective way of improving local resilience, more efficiently than RAID arrays, but once fragments are geographically distributed the benefits diminish. A disaster that takes out a datacentre will probably destroy enough fragments to make a rebuild impossible. Until recently, many erasure-coded systems also did not apply the same resilience to metadata, and so failed rather more often than theory would suggest. Most systems also tend to favour uniformity in the fragment storage platform, so they are also subject to systemic failures of technology stacks. In general, erasure coding use cases are targeted at availability rather than indestructibility.
From peter.burnhill at ed.ac.uk Fri Mar 17 06:19:28 2017 From: peter.burnhill at ed.ac.uk (BURNHILL Peter) Date: Fri, 17 Mar 2017 10:19:28 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> Message-ID: <955B6946-D656-4B96-93C7-25581981F984@ed.ac.uk> Two comments on the "why 3?" question: 1. If you have only 2 copies and, through bit rot or some other reason, the contents are found to differ, which is correct? As with statistical inference, a sample of 3 (or more) enables estimation of variance. 2. David Rosenthal must surely have written on the maths of how many copies LOCKSS needs; see http://blog.dshr.org/?m=1 hth Peter Peter Burnhill University of Edinburgh Peter Burnhill ISG Director of Business Development and Innovation University of Edinburgh Mobile: +44 (0) 774 0763 119 ps Am writing 'on the go' so pl excuse brevity On 17 Mar 2017, at 8:13 am, van Wezel, Jos (SCC) > wrote: Chris, do you happen to have any reference to the mathematical correctness or computation showing that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional costs of a third copy without the math.
Currently I can't tell my customers how much (as a percentage) extra security an additional copy will bring, even theoretically. regards jos Sent from my Samsung Galaxy smartphone. -------- Original message -------- From: Chris Wood > Date: 17/03/2017 02:07 (GMT+01:00) To: "Raymond A. Clarke" >, gail at trumantechnologies.com, 'Jeanne Kramer-Smyth' >, 'Robert Spindler' >, pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Thanks Ray as always for a great summary. Now my three bits: Three (3) copies please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc. for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (remote) is more expensive, but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four or more copies are all examples of ever-diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit-rot and data corruption as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the transit points. This is something that current technologies can help with, in part. Back to the original question: how do we guard against corruption, whether from compression, encryption and/or transmission? Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates and in compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion, and as Dr. Rosenthal put it, LOCKSS ("Lots of Copies Keep Stuff Safe").
Take good care, Raymond From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com Sent: Thursday, March 16, 2017 5:10 PM To: Jeanne Kramer-Smyth ; Robert Spindler ; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hello again, Jeanne, I think you're hitting on something that needs to be raised to (and pushed for with) vendors, and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out. I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object if it's been chunked) has been repaired/self-healed - or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if durability drops or data corruption starts to get really bad for whatever reason. I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors, storage solution vendors and cloud storage vendors have run the numbers, but am not sure if they've been made public.
This alias has people from Oracle, Seagate and other storage companies on it, so I encourage them to please share any research they have on this. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Jeanne Kramer-Smyth > Date: Thu, March 16, 2017 1:44 pm To: "gail at trumantechnologies.com" >, "Robert Spindler" >, "pasig-discuss at mail.asis.org" > Thanks Gail & Rob for your replies. I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version to the original version vs the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored. I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk.
I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects as they are stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? Thank you, Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] Sent: Thursday, March 16, 2017 3:18 PM To: Robert Spindler; Jeanne Kramer-Smyth; pasig-discuss at mail.asis.org Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi all, a good topic! There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data. Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. This, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible.
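On the question of who manages the keys: one option, not proposed in the thread itself, is for the archive to hold its own keys and split them among several custodians using Shamir's secret sharing, so that no single custodian (or vendor) can lose or misuse a key alone, yet any two of three can recover it. The following is a minimal textbook sketch for illustration, not production key-management code; the custodians and the 2-of-3 threshold are hypothetical choices.

```python
import secrets

# 2**521 - 1 is a Mersenne prime, comfortably larger than a 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, count: int) -> list[tuple[int, int]]:
    """Split `secret` into `count` shares; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be smaller than PRIME")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner's rule, modulo PRIME
            y = (y * x + coeff) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the integers modulo PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical custodians: the archive, an escrow agent, and an off-site safe.
key = secrets.randbits(256)  # stand-in for a 256-bit data-encryption key
shares = split_secret(key, threshold=2, count=3)
print(recover_secret(shares[:2]) == key)              # archive + escrow agent
print(recover_secret([shares[0], shares[2]]) == key)  # archive + off-site safe
```

A single share reveals nothing about the key, which is what makes this scheme suitable for escrow with parties who are not fully trusted.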
Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Robert Spindler Date: Thu, March 16, 2017 9:06 am To: Jeanne Kramer-Smyth, "pasig-discuss at mail.asis.org" At risk of starting a conversation, here are a couple of basic issues from an archival standpoint: Encryption: Who has the keys, and what happens should a provider go out of business? Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue. Short story: At a CNI meeting perhaps 15 years ago, in a session about e-books, I asked a panel of vendors if they would give up the keys to encrypted e-books when they entered the public domain. Crickets. Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics. Rob Spindler University Archivist and Head Archives and Special Collections Arizona State University Libraries Tempe AZ 85287-1006 480.965.9277 http://www.asu.edu/lib/archives From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth Sent: Thursday, March 16, 2017 8:54 AM To: pasig-discuss at mail.asis.org Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? Is anyone aware of active research into the risks to digital preservation posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome.
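Rob's lossy-or-lossless distinction matters for fixity. Compression applied transparently by a storage system has to be lossless, so the bytes that come back should be bit-identical and a checksum can confirm it; lossy codecs, common for audio and video, discard information by design, so the original bits cannot be recovered. A toy sketch, assuming `zlib` for the lossless case and simple quantization as a stand-in for a lossy codec (real codecs are far more sophisticated); the sample values are made up.

```python
import zlib

# Hypothetical 16-bit audio samples (small integers stand in for a real signal).
samples = [1000, 1003, 998, 1012, 1020, 1017, 1005, 999]
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Lossless: the decompressed bytes are identical to the input.
lossless = zlib.decompress(zlib.compress(raw))
print(lossless == raw)  # True

# Lossy (toy example): quantize each sample to a step of 8, discarding low-order detail.
STEP = 8
quantized = [round(s / STEP) for s in samples]  # what a lossy codec might keep
reconstructed = [q * STEP for q in quantized]   # what comes back on playback
print(reconstructed == samples)  # False
print(max(abs(a - b) for a, b in zip(samples, reconstructed)))  # 4
```

The lossy output may sound or look acceptable, but no checksum taken on the original will ever match it, which is why format-level compression choices belong in the preservation plan rather than being left to defaults.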
I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space). I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account. Thank you! Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss --
---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: not available URL: From peter.burnhill at ed.ac.uk Fri Mar 17 07:16:34 2017 From: peter.burnhill at ed.ac.uk (BURNHILL Peter) Date: Fri, 17 Mar 2017 11:16:34 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <48E9420A4871584593FC3D435EF345AAEEEA1154@MBX10.ad.oak.ox.ac.uk> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu>, <48E9420A4871584593FC3D435EF345AAEEEA1154@MBX10.ad.oak.ox.ac.uk> Message-ID: Just seen this post from Neil. Comment added inline below. In general, if planning for the long term, then reckon that the unusual will happen more often. Peter ________________________________ > It's quite simple: when one record is corrupted or altered, a checksum doesn't always catch it, for instance an error upstream of the checksumming mechanism that gets committed to storage through legitimate channels. People tend to assume, erroneously, that software, hardware and people work correctly; the most common causes of corruption are going to be human error and software faults. Ray has already described the long chain of error-prone transmission that even simple operations entail, without even getting into software, operating systems and human processes, all of which are considerably less reliable than modern hardware. > With that sort of error, you have two copies of supposedly correct information and no way of telling which is correct. If you have three copies maintained in a suitably independent way, then the error should only affect one, and it stands out clearly.
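Neil's two-versus-three argument can be made concrete with a simple majority vote over replicas. A minimal sketch with a hypothetical function name; it assumes the copies fail independently, which is exactly the assumption Neil warns is hard to guarantee in practice.

```python
import hashlib
from collections import Counter


def majority_copy(replicas: list[bytes]) -> bytes:
    """Return the content that a strict majority of replicas agree on.

    Replicas are compared by SHA-256 digest. With two disagreeing copies
    there is no majority, so the function refuses to guess.
    """
    digests = [hashlib.sha256(r).hexdigest() for r in replicas]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority: cannot tell which copy is correct")
    return replicas[digests.index(winner)]


good = b"original record"
bad = b"original recorc"  # one silently corrupted copy

print(majority_copy([good, bad, good]))  # b'original record'
try:
    majority_copy([good, bad])
except ValueError as err:
    print(err)
```

With two copies the vote is a tie and nothing can be decided; with three, a single bad copy is outvoted. If one error reaches all three copies (the propagation case), they agree on the wrong content and the vote cannot help.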
However, ensuring that the three copies are genuinely independent is not simple: human error can often propagate to all three. This is where versioning and audit trails become essential. Agreed, to both points: for detection and correction you need at least 3 replicates (for reasons of logic, and as the basis of statistical inference); for prevention you might need more than 3 (for more on that, set aside an hour or so and enjoy http://blog.dshr.org/2014/04/what-could-possibly-go-wrong.html ). > In terms of encryption: if someone else can't get the data out of the medium in the absence of additional hardware and software, then neither can you. The more opportunity for error, the more errors you get. From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of van Wezel, Jos (SCC) Sent: 17 March 2017 07:48 To: Chris Wood; Raymond A. Clarke; gail at trumantechnologies.com; 'Jeanne Kramer-Smyth'; 'Robert Spindler'; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Chris, do you happen to have any reference to the mathematical correctness of, or a computation showing, that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional cost of a third copy without the math. Currently I can't tell my customers how much (as a percentage) extra security an additional copy will bring, even theoretically. regards jos Sent from my Samsung Galaxy smartphone. -------- Original message -------- From: Chris Wood Date: 17/03/2017 02:07 (GMT+01:00) To: "Raymond A.
Clarke", gail at trumantechnologies.com, 'Jeanne Kramer-Smyth', 'Robert Spindler', pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Thanks, Ray, as always for a great summary. Now my three bits: Three (3) copies, please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (one remote) is more expensive, but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four and beyond are all examples of ever-diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit-rot and data corruption as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the transit points. This is something that current technologies can help with, in part. Back to the original question: how do we guard against corruption, whether from compression, encryption, and/or transmission? Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates, and in compression and encryption. But the "resting places" are only part of the problem. In line with Gail's suggestion, and as Dr. Rosenthal has coined it: LOCKSS ("Lots Of Copies Keep Stuff Safe").
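Jos's request for the math can at least be approached with a back-of-the-envelope model. This is not a proof that three copies is optimal; it assumes a hypothetical annual probability `p` that any one copy is lost and a hypothetical probability `c` of a site-wide event that destroys every copy at that site, both made up for illustration. Under full independence, each extra copy multiplies the chance of total loss by `p`, so the absolute gain from each additional copy shrinks; once copies share a site, the shared event dominates, which is the case for making one copy remote.

```python
def p_all_lost_independent(p: float, n: int) -> float:
    """Chance that all n copies are lost in a year if copies fail independently."""
    return p ** n


def p_all_lost_shared_site(p: float, local: int, remote: int, c: float) -> float:
    """Toy model: with probability c a site-wide event destroys every local copy.

    Otherwise local copies fail independently with probability p each.
    Remote copies are assumed independent of the site event and of each other.
    """
    local_all_lost = c + (1 - c) * p ** local
    return local_all_lost * p ** remote


p = 0.01   # hypothetical annual loss probability per copy
c = 0.001  # hypothetical annual probability of a site-wide disaster

for n in range(1, 5):
    total = p_all_lost_independent(p, n)
    gain = p_all_lost_independent(p, n - 1) - total  # absolute benefit of copy n
    print(f"{n} copies: P(all lost) = {total:.1e}, gain from copy {n} = {gain:.1e}")

print(f"2 local + 0 remote: {p_all_lost_shared_site(p, 2, 0, c):.2e}")
print(f"3 local + 0 remote: {p_all_lost_shared_site(p, 3, 0, c):.2e}")
print(f"2 local + 1 remote: {p_all_lost_shared_site(p, 2, 1, c):.2e}")
```

Real failures are correlated in more ways than this toy model captures (shared software, shared operators, shared vendors), so its figures are best read as upper bounds on the benefit of extra copies.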
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jonathan.tilbury at preservica.com Fri Mar 17 11:28:54 2017 From: jonathan.tilbury at preservica.com (Jonathan Tilbury) Date: Fri, 17 Mar 2017 15:28:54 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <1312844D-7E81-462A-B2E9-8B3E77B99ECC@Verizon.net> References: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com> <02a701d29eae$c4722a00$4d567e00$@Verizon.net> <1312844D-7E81-462A-B2E9-8B3E77B99ECC@Verizon.net> Message-ID: I can give you an example of how we address this for our Preservica Cloud Edition, hosted in Amazon Web Services. Fixity calculation: we calculate this on all files on the source machine (such as your laptop) from which you are loading the content. This is the earliest point at which we can calculate it, as it is the first time we come into contact with the content. You can use one of 4 fixity algorithms. Geographical separation: Amazon S3 and Glacier by default store each object (file) in at least three different data centres within a region, and may have multiple copies in each data centre. The data centres are located in safer locations (e.g. not in earthquake zones), typically within a 10km radius.
It's possible to send copies to another Amazon region, perhaps thousands of km away, if you consider this too much risk. It's also possible to write copies of the files and the metadata to a remote SFTP server. Fixity checking: each copy is checksummed by both us and Amazon on arrival, so we confirm it was not changed on the way in. Amazon check the fixity regularly; if corruption is detected, the system self-heals from one of the other copies. In addition, Preservica can be set up to cycle through the objects on S3 and run its own fixity checks, to confirm from its own perspective that the objects are still there. We also check the fixity when a file is retrieved, to ensure it is still uncorrupted. Encryption at rest: Amazon S3 can be set up to encrypt the information on disk and either manage the keys itself or leave key management to the customer (i.e. us). This of course introduces a risk of key loss, but it is possible to escrow the keys to keep them safe. However, as all you are protecting against is theft of the hardware from the data centre, you may choose not to encrypt the data on disk. Encryption in flight: we recommend setting all information transport to use HTTPS to reduce the risk of packet interception and inspection. I hope this helps. Jon Tilbury CTO, Preservica From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Raymond A. Clarke Sent: 17 March 2017 13:45 To: Chris Wood Cc: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Right on, Chris. Thanks. Apologies for any typos, Sent from my iPhone Take good care, Ray On Mar 16, 2017, at 9:03 PM, Chris Wood wrote: Thanks, Ray, as always for a great summary. Now my three bits: Three (3) copies, please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset.
Kind of like: 2 is better than one, buta local problem gets both copies. Three (remote) is more expensive but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four+ are all examples of ever diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit-rot and data corruption, as it relates to the various medium that data passes through, on its way to and from the user. Consider this simple example; as data from memory to HBA to cable to air to cable and so on, bits can be lost along way at any one of, or several of the medium transit points. This something that current technologies can help with, in part. Back to the original question, :how do we insure against corruption, either from compression, encryption? and/or transmission? Well disk and tape(data resting places, if you will) have a come very long way in reducing bit-error rates, compression and encryption. But the ?resting places? are only part of a problem. In accordance with Gail?s suggestion and as Dr. Rosenthal has coined, LOCKSS (?lot of copies keep stuff safe?). Take good care, Raymond From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com Sent: Thursday, March 16, 2017 5:10 PM To: Jeanne Kramer-Smyth ; Robert Spindler ; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hello again, Jeanne, I think you're hitting on something that needs to be raised to (and pushed for with) vendors, and that is the need for "More transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self healing; however, they do not report this out. 
I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when a object (or part of an object if it's been chunked) has been repaired/self-healed - or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if their durability and data corruption starts to get really bad for whatever reason. I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system is increasing the risk of corruption. I'll look around. I am sure the disk vendors and storage solution and cloud storage vendors have run the numbers, but am not sure if they're made public. This alias has people from Oracle, Seagate and other storage companies on it so I encourage them to please share any research they have on this - Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Jeanne Kramer-Smyth > Date: Thu, March 16, 2017 1:44 pm To: "gail at trumantechnologies.com" >, "Robert Spindler" >, "pasig-discuss at mail.asis.org" > Thanks Gail & Rob for your replies. I am less worried about the scenario of someone stealing a drive ? as Rob pointed out, if that is happening we have bigger problems. I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? 
I am trying to contrast the increased ?handling? and change required to get from the stored version to the original version vs the decreased ?handling? it would require if what I am pulling off the storage device is exactly what I sent to be stored. I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the ?pro? column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. I want to understand the risks related to the different storage architectures and the ever increasing number of ?automatic? things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? Thank you, Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org [http://siteresources.worldbank.org/NEWS/Images/twitter_logo.jpg] spellboundblog [http://siteresources.worldbank.org/NEWS/Images/skype_logo.jpg] jkramersmyth jkramersmyth A 1818 H St NW Washington, DC 20433 From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] Sent: Thursday, March 16, 2017 3:18 PM To: Robert Spindler >; Jeanne Kramer-Smyth >; pasig-discuss at mail.asis.org Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi all, a good topic! There is new drive technology from Seagate (probably other manufacturers) called "Self Encrypted Drives" (SEDs) which can be used to solve the problem of a person stealing a drive and running off with data. 
Most cloud services now automatically provide "server side encryption" which means the vendor is doing the encryption for all data at rest (as you point out Jeanne). This is required by HIPAA for all health care data, and is now considered cloud best practice for cloud vendors due to the very real risk of hacking. So, for archival, we need to weigh the data security provided by cloud storage services using server side encryption against the risk of the vendor managing the encryption keys. Which IMO underscores the importance of having multiple copies of all your archival data -- with different vendors and storage architectures or media types if possible. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Robert Spindler Date: Thu, March 16, 2017 9:06 am To: Jeanne Kramer-Smyth, "pasig-discuss at mail.asis.org" At risk of starting a conversation, here are a couple of basic issues from an archival standpoint: Encryption: Who has the keys and what happens should a provider go out of business? Compression: Lossy or Lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material it becomes more of an issue. Short story: At a CNI meeting perhaps 15 years ago in a session about ebooks I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached public domain. Crickets. Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc the provider has more problems than forensics.
Rob Spindler University Archivist and Head Archives and Special Collections Arizona State University Libraries Tempe AZ 85287-1006 480.965.9277 http://www.asu.edu/lib/archives From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth Sent: Thursday, March 16, 2017 8:54 AM To: pasig-discuss at mail.asis.org Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? Is anyone aware of active research into the risks to digital preservation that are posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome. I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level. That is, content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost efficient (taking up less space). I want to be sure that as we make our choices for long-term storage of permanent digital records we take these risks into account. Thank you!
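One way to make Jeanne's question about extra "handling" concrete is an end-to-end fixity check. Record a digest before the data goes in, then compare it with a digest of whatever comes back out, regardless of what the storage layer did in between. The Python sketch below is only an illustration under stated assumptions. zlib stands in for whatever transparent compression a device or service applies, encryption is not modelled, and the function names are hypothetical.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, recorded before ingest and checked after retrieval."""
    return hashlib.sha256(data).hexdigest()


def read_back(stored: bytes, expected: str) -> bool:
    """Undo the (assumed) storage-side compression and compare digests."""
    try:
        retrieved = zlib.decompress(stored)
    except zlib.error:
        return False  # the stored form no longer decodes at all
    return sha256_hex(retrieved) == expected


def round_trip_ok(original: bytes) -> bool:
    """Simulate a transparent write/read path: zlib is a stand-in for the device."""
    expected = sha256_hex(original)   # fixity value kept by the archive
    stored = zlib.compress(original)  # what the device might actually keep
    return read_back(stored, expected)


if __name__ == "__main__":
    record = b"permanent digital record " * 1000
    print(round_trip_ok(record))  # True
    damaged = bytearray(zlib.compress(record))
    damaged[-1] ^= 0x01  # flip one bit in zlib's Adler-32 trailer
    print(read_back(bytes(damaged), sha256_hex(record)))  # False
```

Here a single flipped bit in the stored form is caught by zlib's own Adler-32 trailer, so the read fails outright rather than returning altered bytes. A preservation system would probably want to record "could not read" and "read but changed" as separate events, even though this sketch reports both as False.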
Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 700 bytes Desc: image001.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image002.jpg Type: image/jpeg Size: 11482 bytes Desc: image002.jpg URL: From lw85381 at yahoo.com Fri Mar 17 12:48:40 2017 From: lw85381 at yahoo.com (Chris Wood) Date: Fri, 17 Mar 2017 09:48:40 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> Message-ID: <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com> Jos: I just knew somebody would ask this. Ha. Several years ago several of us wrote a paper for the MPEG (Moving Picture Experts Group) and a mathematician named Jeff Bonwick figured out all the math. I haven't found it yet in the junk heap of my PC, but did find a companion paper written by the same set of authors. It's not exactly what you are looking for, but close. It's more about Bit Error Rates at a rather low level. I will continue to look for the MPEG paper. It's got to be somewhere. The Internet "never forgets," right? Stay tuned as I keep looking. CW On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote: > Chris, > do you happen to have any reference to the mathematical correctness or > computation that 3 copies is optimal? Is the proof based on the standard > ecc values that vendors list with their components (tapes, disks, > transport lines, memory etc)? I'm asking because it's difficult to > argue for the additional costs of a third copy without the math. > Currently I can't tell my customers how much (as in percentage) extra > security an additional copy will bring, even theoretically. > > regards > > jos > > Sent from my Samsung Galaxy smartphone. > > -------- Original message -------- > From: Chris Wood > Date: 17/03/2017 02:07 (GMT+01:00) > To: "Raymond A. Clarke" , > gail at trumantechnologies.com, 'Jeanne Kramer-Smyth' > , 'Robert Spindler' > , pasig-discuss at mail.asis.org > Subject: Re: [Pasig-discuss] Risks of encryption & compression built > into storage options?
> > Thanks Ray as always for a great summary. Now my three bits: > > Three (3) copies please, one of which is in a remote location on a > different flood plain, electric grid, fault line etc. for the obvious > reasons. Mathematically, this has turned out to be the optimal number > looked at with a cost/benefit mindset. Kind of like: 2 is better than > one, but a local problem gets both copies. Three (remote) is more > expensive but you get A LOT more data resilience/persistence. Four > costs a bunch more, but delivers just a little bit more resilience. > Four+ are all examples of ever diminishing returns. > > CW > > On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: >> >> Hello All, >> >> A few years back, I did some research on bit-rot and data corruption, >> as it relates to the various media that data passes through on its >> way to and from the user. Consider this simple example: as data moves from >> memory to HBA to cable to air to cable and so on, bits can be lost >> along the way at any one, or several, of the medium transit points. >> This is something that current technologies can help with, in part. >> Back to the original question: how do we insure against corruption, >> either from compression, encryption and/or transmission? Well, disk >> and tape (data resting places, if you will) have come a very long >> way in reducing bit-error rates, compression and encryption. But the >> "resting places" are only part of the problem. In accordance with >> Gail's suggestion and as Dr. Rosenthal has coined, LOCKSS ("Lots Of >> Copies Keep Stuff Safe"). >> >> Take good care, >> >> Raymond >> >> *From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On >> Behalf Of *gail at trumantechnologies.com >> *Sent:* Thursday, March 16, 2017 5:10 PM >> *To:* Jeanne Kramer-Smyth ; Robert >> Spindler ; pasig-discuss at mail.asis.org >> *Subject:* Re: [Pasig-discuss] Risks of encryption & compression >> built into storage options?
>> >> Hello again, Jeanne, >> >> I think you're hitting on something that needs to be raised to (and >> pushed for with) vendors, and that is the need for "More >> transparency" and the reporting to customers of "events" that are >> part of the provenance of a digital object. The storage architectures >> do a good job of error detection and self healing; however, they do >> not report this out. I'd like to (this is my dream) have vendors >> report back to customers (as part of their SLA) when an object (or >> part of an object if it's been chunked) has been repaired/self-healed >> - or lost forever. I could then record this as a PREMIS event. As you >> know, vendors "design for" 11x9s or 13x9s durability, but their SLAs >> do not require them to tell us if their durability and data >> corruption starts to get really bad for whatever reason. >> >> I've not directly answered your question about whether the >> encryption, dedupe, compression, and other things that can happen >> inside a storage system are increasing the risk of corruption. I'll >> look around. I am sure the disk vendors and storage solution and >> cloud storage vendors have run the numbers, but am not sure if >> they've been made public. >> >> This alias has people from Oracle, Seagate and other storage >> companies on it so I encourage them to please share any research they >> have on this - >> >> Gail >> >> Gail Truman >> >> Truman Technologies, LLC >> >> Certified Digital Archives Specialist, Society of American Archivists >> >> Protecting the world's digital heritage for future generations >> >> www.trumantechnologies.com >> >> facebook/TrumanTechnologies >> >> https://www.linkedin.com/in/gtruman >> >> +1 510 502 6497 >> >> -------- Original Message -------- >> Subject: RE: [Pasig-discuss] Risks of encryption & compression built >> into storage options?
>> From: Jeanne Kramer-Smyth >> Date: Thu, March 16, 2017 1:44 pm >> To: "gail at trumantechnologies.com", "Robert >> Spindler", >> "pasig-discuss at mail.asis.org" >> >> Thanks Gail & Rob for your replies. >> >> I am less worried about the scenario of someone stealing a drive; >> as Rob pointed out, if that is happening we have bigger problems. >> >> I do wonder if there are increased risks of bit-rot/file >> corruption with encryption, compression, and data deduplication. >> Have there been any studies on this? Could pulling a file off a >> drive that requires reversal of the auto-encryption and >> auto-compression in place at the system level mean a greater risk >> of bits flipping? I am trying to contrast the increased >> "handling" and change required to get from the stored version to >> the original version vs the decreased "handling" it would require >> if what I am pulling off the storage device is exactly what I >> sent to be stored. >> >> I am less worried about issues related to not being able to >> decrypt content. The storage solutions we are contemplating would >> remain under enough ongoing management that these issues should >> be avoidable. Since ensuring that non-public records remain >> secure is also very important, encryption gets some points in the >> "pro" column. I agree that having multiple copies in different >> storage architectures and with different vendors would also >> decrease risk. >> >> I want to understand the risks related to the different storage >> architectures and the ever increasing number of "automatic" >> things being done to digital objects in the process of them being >> stored and retrieved. Are there people doing work, independent of >> vendor claims, to document these types of risks?
>> >> Thank you, >> >> Jeanne >> >> *Jeanne Kramer-Smyth* >> >> *IT Officer, Information Management Services II* >> >> *Information and Technology Solutions* >> >> *WBG Library & Archives of Development* >> >> T 202-473-9803 >> E jkramersmyth at worldbankgroup.org >> W www.worldbank.org >> Twitter: spellboundblog >> Skype: jkramersmyth >> LinkedIn: jkramersmyth >> A 1818 H St NW Washington, DC 20433 >> >> *From:*gail at trumantechnologies.com >> [mailto:gail at trumantechnologies.com] >> *Sent:* Thursday, March 16, 2017 3:18 PM >> *To:* Robert Spindler; Jeanne Kramer-Smyth; >> pasig-discuss at mail.asis.org >> *Subject:* RE: [Pasig-discuss] Risks of encryption & compression >> built into storage options? >> >> Hi all, a good topic! >> >> There is new drive technology from Seagate (and probably other >> manufacturers) called "Self-Encrypting Drives" (SEDs) which can be >> used to solve the problem of a person stealing a drive and >> running off with data. >> >> Most cloud services now automatically provide "server side >> encryption" which means the vendor is doing the encryption for >> all data at rest (as you point out Jeanne). This is required by >> HIPAA for all health care data, and is now considered cloud best >> practice for cloud vendors due to the very real risk of hacking. >> So, for archival, we need to weigh the data security provided by >> cloud storage services using server side encryption against the risk >> of the vendor managing the encryption keys.
Which IMO underscores >> the importance of having multiple copies of all your archival >> data -- with different vendors and storage architectures or media >> types if possible. >> >> Gail >> >> Gail Truman >> >> Truman Technologies, LLC >> >> Certified Digital Archives Specialist, Society of American Archivists >> >> Protecting the world's digital heritage for future generations >> >> www.trumantechnologies.com >> >> facebook/TrumanTechnologies >> >> https://www.linkedin.com/in/gtruman >> >> +1 510 502 6497 >> >> -------- Original Message -------- >> Subject: Re: [Pasig-discuss] Risks of encryption & >> compression built >> into storage options? >> From: Robert Spindler >> Date: Thu, March 16, 2017 9:06 am >> To: Jeanne Kramer-Smyth, >> "pasig-discuss at mail.asis.org" >> >> At risk of starting a conversation, here are a couple of basic >> issues from an archival standpoint: >> >> Encryption: Who has the keys and what happens should a >> provider go out of business? >> >> Compression: Lossy or Lossless, and how does that compression >> act on different file formats (video/audio)? If this is >> frequently accessed material it becomes more of an issue. >> >> Short story: At a CNI meeting perhaps 15 years ago in a >> session about ebooks I asked a panel of vendors if they would >> give up the keys to encrypted e-books when they reached >> public domain. Crickets. >> >> Physical discs are not secure given the forensics software >> widely available today, but if someone can grab a physical >> disc the provider has more problems than forensics.
>> >> Rob Spindler >> >> University Archivist and Head >> >> Archives and Special Collections >> >> Arizona State University Libraries >> >> Tempe AZ 85287-1006 >> >> 480.965.9277 >> >> http://www.asu.edu/lib/archives >> >> *From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] >> *On Behalf Of *Jeanne Kramer-Smyth >> *Sent:* Thursday, March 16, 2017 8:54 AM >> *To:* pasig-discuss at mail.asis.org >> *Subject:* [Pasig-discuss] Risks of encryption & compression >> built into storage options? >> >> Is anyone aware of active research into the risks to digital >> preservation that are posed by built-in encryption and >> compression in both cloud and on-prem storage options? Any >> and all go-to sources for research and reading on these >> topics would be very welcome. >> >> I am being told by the staff who source storage solutions for >> my organization that encryption and compression are generally >> included at the hardware level. That is, content is automatically >> encrypted and compressed as it is written to disc, and then >> decrypted and decompressed as it is pulled off disc in >> response to a request. It is advertised as both more secure >> (someone stealing a physical disc could not, in theory, >> extract its contents) and more cost efficient (taking up less >> space). >> >> I want to be sure that as we make our choices for long-term >> storage of permanent digital records we take these risks >> into account. >> >> Thank you!
>> >> Jeanne >> >> *Jeanne Kramer-Smyth* >> >> *IT Officer, Information Management Services II* >> >> *Information and Technology Solutions* >> >> *WBG Library & Archives of Development* >> >> T 202-473-9803 >> E jkramersmyth at worldbankgroup.org >> W www.worldbank.org >> Twitter: spellboundblog >> Skype: jkramersmyth >> LinkedIn: jkramersmyth >> A 1818 H St NW Washington, DC 20433 > > -- > ---------------------------------------------------- > Chris Wood > Storage & Data Management > Office: 408-782-2757 (Home Office) > Office: 408-276-0730 (Work Office) >
Mobile: 408-218-7313 (Preferred) > Email: lw85381 at yahoo.com > ---------------------------------------------------- -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 170 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 170 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 6577 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Archiving_Movies_in_a_Digital_World.pdf Type: application/pdf Size: 266120 bytes Desc: not available URL: From andrea_goethals at harvard.edu Fri Mar 17 13:37:01 2017 From: andrea_goethals at harvard.edu (Goethals, Andrea) Date: Fri, 17 Mar 2017 17:37:01 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <48E9420A4871584593FC3D435EF345AAEEEA1154@MBX10.ad.oak.ox.ac.uk> Message-ID: Regarding "With that sort of error then you have two copies of supposedly correct information and no way of telling which is correct.": I think it's important to also maintain fixity information (e.g. SHA or MD5 checksum values) outside of the storage systems, so that you can be sure that no copy of your content has changed. It can also come in very handy to check the completeness of transfers, e.g. when migrating to new storage.
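Andrea's suggestion of keeping fixity values outside the storage system can be sketched as a manifest of SHA-256 digests. The manifest is built at ingest, kept separately from the content (for example in a different system or on different media), and re-checked later or after a migration. This is a minimal Python sketch. The function names and the missing/changed/unexpected report format are illustrative assumptions, not part of any particular preservation tool.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of one file, read in chunks so large objects need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare what is under root now against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(manifest.keys() - current.keys()),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
        "unexpected": sorted(current.keys() - manifest.keys()),
    }
```

In use, the manifest would be written out (for example with `json.dump`) to a location outside the storage system being checked. `audit()` would then be run on a schedule and after each transfer, so that a migration that silently drops or alters files shows up as "missing" or "changed".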
Andrea From: Pasig-discuss on behalf of BURNHILL Peter Date: Friday, March 17, 2017 at 7:16 AM To: Neil Jefferies, "van Wezel, Jos (SCC)", Chris Wood, "Raymond A. Clarke", "gail at trumantechnologies.com", 'Jeanne Kramer-Smyth', 'Robert Spindler', "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Just seen this post from Neil. Comment added inline below. In general, if planning for the long term, then reckon that the unusual will happen more often. Peter ________________________________ > It's quite simple: when one record is corrupted or altered, a checksum doesn't always catch it - for instance, an error upstream of the checksumming mechanism that gets committed to storage through legitimate channels. People tend to assume erroneously that software, hardware and people work correctly; the most common causes of corruption are going to be human error and software faults. Ray has already described the long chain of error-prone transmission even simple operations entail, without even getting into software, operating systems and human processes, all of which are considerably less reliable than modern hardware. > With that sort of error then you have two copies of supposedly correct information and no way of telling which is correct. If you have three copies maintained in a suitably independent way then the error should only affect one and it stands out clearly. However, ensuring that the three copies are genuinely independent is not simple; human error can often propagate to all three. This is where versioning and audit trails become essential.
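Neil's point about two versus three copies can be shown with a simple majority vote over the digests of each copy. With two disagreeing copies there is no majority. With three independent copies, a single bad one stands out. This Python sketch is hypothetical and does not describe any vendor's repair logic. It also inherits the caveat in the message above: if the same upstream error was committed to all three copies, the vote will confidently pick the wrong answer.

```python
import hashlib
from collections import Counter
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def majority_vote(copies: dict[str, bytes]) -> tuple[Optional[str], list[str]]:
    """Compare copies by digest.

    Returns (agreed_digest, suspects). agreed_digest is None when no strict
    majority exists, in which case every copy is listed as a suspect.
    """
    if not copies:
        raise ValueError("need at least one copy")
    digests = {name: sha256_hex(data) for name, data in copies.items()}
    top_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):  # no strict majority: cannot tell which is right
        return None, sorted(digests)
    suspects = sorted(name for name, d in digests.items() if d != top_digest)
    return top_digest, suspects


if __name__ == "__main__":
    good, bad = b"permanent record", b"permanent recorD"
    print(majority_vote({"site_a": good, "site_b": bad}))
    print(majority_vote({"site_a": good, "site_b": bad, "site_c": good}))
```

With two copies the function can only report that they disagree. With three, it names the odd one out as the copy to repair. This is the "detection & correction" case, not protection against correlated errors.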
Agreed, to both points: for detection & correction you need at least 3 replicates (for reasons of logic, and as the basis of statistical inference); for prevention you might need more than 3 (and for more on that, set aside an hour or so and enjoy http://blog.dshr.org/2014/04/what-could-possibly-go-wrong.html ) > In terms of encryption, if someone else can't get the data out of the medium in the absence of additional hardware and software, then neither can you. The more opportunity for error, the more you get. From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of van Wezel, Jos (SCC) Sent: 17 March 2017 07:48 To: Chris Wood; Raymond A. Clarke; gail at trumantechnologies.com; 'Jeanne Kramer-Smyth'; 'Robert Spindler'; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Chris, do you happen to have any reference to the mathematical correctness or computation that 3 copies is optimal? Is the proof based on the standard ecc values that vendors list with their components (tapes, disks, transport lines, memory etc)? I'm asking because it's difficult to argue for the additional costs of a third copy without the math. Currently I can't tell my customers how much (as in percentage) extra security an additional copy will bring, even theoretically. regards jos Sent from my Samsung Galaxy smartphone. -------- Original message -------- From: Chris Wood Date: 17/03/2017 02:07 (GMT+01:00) To: "Raymond A.
Clarke", gail at trumantechnologies.com, 'Jeanne Kramer-Smyth', 'Robert Spindler', pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Thanks Ray as always for a great summary. Now my three bits: Three (3) copies please, one of which is in a remote location on a different flood plain, electric grid, fault line etc. for the obvious reasons. Mathematically, this has turned out to be the optimal number looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (remote) is more expensive but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four+ are all examples of ever diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit-rot and data corruption, as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the medium transit points. This is something that current technologies can help with, in part. Back to the original question: how do we insure against corruption, either from compression, encryption and/or transmission? Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates, compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion and as Dr. Rosenthal has coined, LOCKSS ("Lots Of Copies Keep Stuff Safe").
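Jos's question (how much does each extra copy buy?) has a simple textbook answer only under a strong assumption: each copy is lost independently, with the same probability p, over some period. Then the chance of losing all n copies is p^n. The absolute reduction from adding the nth copy is p^(n-1)(1-p), which shrinks with every copy. This matches the "diminishing returns" Chris describes. The Python sketch below just evaluates that model. The value p = 0.01 is a made-up illustration. As Neil and Peter point out, real copies are rarely fully independent, so these figures are best read as an optimistic estimate of risk.

```python
def p_all_lost(p: float, n: int) -> float:
    """Probability that all n copies are lost in one period, ASSUMING each
    copy is lost independently with the same probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("need at least one copy")
    return p ** n


def marginal_gain(p: float, n: int) -> float:
    """Absolute drop in loss probability from adding the nth copy (n >= 2)."""
    if n < 2:
        raise ValueError("the gain is defined from the second copy onward")
    return p_all_lost(p, n - 1) - p_all_lost(p, n)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-copy loss probability for the period
    print(f"1 copy:   P(all lost) = {p_all_lost(p, 1):.1e}")
    for n in range(2, 5):
        print(f"{n} copies: P(all lost) = {p_all_lost(p, n):.1e}, "
              f"gain from copy {n} = {marginal_gain(p, n):.1e}")
```

In relative terms each extra copy still cuts the loss probability by the same factor p. Only the absolute gain shrinks. Whether a third or fourth copy is "worth it" therefore depends on what a loss would cost compared with the cost of the copy, and on how correlated the copies really are.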
Take good care, Raymond From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com Sent: Thursday, March 16, 2017 5:10 PM To: Jeanne Kramer-Smyth; Robert Spindler; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hello again, Jeanne, I think you're hitting on something that needs to be raised to (and pushed for with) vendors, and that is the need for "More transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self healing; however, they do not report this out. I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object if it's been chunked) has been repaired/self-healed - or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if their durability and data corruption starts to get really bad for whatever reason. I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors and storage solution and cloud storage vendors have run the numbers, but am not sure if they've been made public.
This alias has people from Oracle, Seagate and other storage companies on it so I encourage them to please share any research they have on this - Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Jeanne Kramer-Smyth Date: Thu, March 16, 2017 1:44 pm To: "gail at trumantechnologies.com", "Robert Spindler", "pasig-discuss at mail.asis.org" Thanks Gail & Rob for your replies. I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version to the original version vs the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored. I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk.
I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? Thank you, Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] Sent: Thursday, March 16, 2017 3:18 PM To: Robert Spindler; Jeanne Kramer-Smyth; pasig-discuss at mail.asis.org Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi all, a good topic! There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs) which can be used to solve the problem of a person stealing a drive and running off with data. Most cloud services now automatically provide "server side encryption" which means the vendor is doing the encryption for all data at rest (as you point out Jeanne). This is required by HIPAA for all health care data, and is now considered cloud best practice for cloud vendors due to the very real risk of hacking.
So, for archival, we need to weigh the data security provided by cloud storage services using server side encryption with the risk of the vendor managing the encryption keys. This IMO underscores the importance of having multiple copies of all your archival data -- with different vendors and storage architectures or media types if possible. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Robert Spindler Date: Thu, March 16, 2017 9:06 am To: Jeanne Kramer-Smyth, "pasig-discuss at mail.asis.org" At risk of starting a conversation, here are a couple of basic issues from an archival standpoint: Encryption: Who has the keys and what happens should a provider go out of business? Compression: Lossy or Lossless and how does that compression act on different file formats (video/audio)? If this is frequently accessed material it becomes more of an issue. Short story: At a CNI meeting perhaps 15 years ago in a session about ebooks I asked a panel of vendors if they would give up the keys to encrypted e-books when they entered the public domain. Crickets. Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc the provider has more problems than forensics.
Rob Spindler University Archivist and Head Archives and Special Collections Arizona State University Libraries Tempe AZ 85287-1006 480.965.9277 http://www.asu.edu/lib/archives From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth Sent: Thursday, March 16, 2017 8:54 AM To: pasig-discuss at mail.asis.org Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? Is anyone aware of active research into the risks to digital preservation that are posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome. I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space). I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account. Thank you!
Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org spellboundblog jkramersmyth jkramersmyth A 1818 H St NW Washington, DC 20433 ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ----------------------------------------------------
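Jeanne's question (whether what is pulled off the storage device is exactly what was sent to be stored) and Rob's distinction between lossy and lossless compression can both be checked end to end, independently of vendor claims, by recording a fixity digest before data enters the storage layer and re-checking it after retrieval. The Python sketch below is a minimal illustration under stated assumptions: zlib stands in for whatever lossless compression a storage system applies transparently, SHA-256 is an assumed choice of digest, and `ingest`/`retrieve` are hypothetical functions, not any vendor's API. Encryption is not modelled, since the Python standard library has no block cipher.

```python
import hashlib
import zlib


def fixity(data: bytes) -> str:
    """SHA-256 digest as hex (the choice of algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def ingest(data: bytes) -> tuple[bytes, str]:
    """Hypothetical ingest step: take the digest of the ORIGINAL bytes,
    then apply lossless compression as a storage layer might."""
    return zlib.compress(data, 9), fixity(data)


def retrieve(stored: bytes, expected: str) -> bytes:
    """Hypothetical retrieval step: reverse the compression, then compare the
    result against the digest recorded at ingest, outside the storage layer."""
    data = zlib.decompress(stored)  # raises zlib.error if the stream is damaged
    if fixity(data) != expected:
        raise ValueError("fixity mismatch: retrieved bytes differ from ingested bytes")
    return data


if __name__ == "__main__":
    record = b"permanent digital record " * 1000
    stored, digest = ingest(record)
    # Lossless compression must give back bit-identical bytes.
    assert retrieve(stored, digest) == record
    print(f"{len(record)} bytes stored as {len(stored)} bytes, fixity OK")

    # Flip one bit in the stored (compressed) form to simulate corruption at rest.
    damaged = bytearray(stored)
    damaged[len(damaged) // 2] ^= 0x01
    try:
        retrieve(bytes(damaged), digest)
        print("corruption NOT detected")
    except (zlib.error, ValueError) as exc:
        print("corruption detected:", type(exc).__name__)
```

Because the digest is taken on the original bytes rather than on the stored form, the check says nothing about how the storage layer does its work; it only confirms that what comes back is bit-identical to what went in, which is the property an archive ultimately depends on.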
From dshr at stanford.edu Fri Mar 17 13:44:53 2017 From: dshr at stanford.edu (David Rosenthal) Date: Fri, 17 Mar 2017 10:44:53 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <955B6946-D656-4B96-93C7-25581981F984@ed.ac.uk> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <955B6946-D656-4B96-93C7-25581981F984@ed.ac.uk> Message-ID: <0db48b70-f79e-4b3f-d54c-b1e87c5bfa48@stanford.edu> On 03/17/2017 03:19 AM, BURNHILL Peter wrote: > 2. David Rosenthal must surely have written on the maths of how many copies for LOCKSS, see http://blog.dshr.org/?m=1 Models claiming to estimate loss probability from replication factor, whether true replication or erasure coding, are wildly optimistic and should be treated with great suspicion. There are two reasons: - The models are built on models of underlying failures. The data on which these failure models are typically based are (a) based on manufacturers' reliability claims, and (b) ignore failures upstream of the media. Much research shows that actual failures in the field are (a) vastly more likely than manufacturers' claims, and (b) more likely to be caused by system components other than the media. - The models almost always assume that the failures are un-correlated, because modeling correlated failures is much more difficult, and requires much more data than un-correlated failures.
In practice it has been known for decades that failures in storage systems are significantly correlated. Correlations among failures greatly raise the probability of data loss. For replicated systems, three replicas is the absolute minimum IF your threat model excludes all external or internal attacks. Otherwise four (see Byzantine Fault Tolerance). For (k of n) erasure coded systems the absolute minimum is three sites arranged so that k shards can be obtained from any two sites. This is because shards in a single site are subject to correlated failures (e.g. earthquake). David. PS - this discussion is based on a mis-apprehension of how disk technology works. See: https://en.wikipedia.org/wiki/Hardware-based_full_disk_encryption From there: "The drive except for bootup authentication operates just like any drive with no degradation in performance." The encrypted data is never visible outside the drive. So as far as systems using them are concerned, whether the drive encrypts or not is irrelevant. They have one additional failure mode over regular drives; they support a crypto erase command which renders the data inaccessible. The effect as far as the data is concerned is the same as a major head crash. Archival systems that fail if a head crashes are useless, so they must be designed to survive total loss of the data on a drive. There is thus no reason not to use self-encrypting drives, and many reasons why one might want to. But note that their use does not mean there is no reason for the system to encrypt the data sent to the drive (see my earlier mail). From Raymond.Clarke1 at Verizon.net Fri Mar 17 14:09:24 2017 From: Raymond.Clarke1 at Verizon.net (Raymond A. Clarke) Date: Fri, 17 Mar 2017 14:09:24 -0400 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? 
In-Reply-To: <48E9420A4871584593FC3D435EF345AAEEEA154C@MBX10.ad.oak.ox.ac.uk> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> <48E9420A4871584593FC3D435EF345AAEEEA154C@MBX10.ad.oak.ox.ac.uk> Message-ID: <067d01d29f49$9c2d7720$d4886560$@Verizon.net> Jos, Neil makes an extremely relevant point as it relates to the effectiveness of error correction coding across geographically dispersed locations. Although this is a very good conversation, I fear that it may have somewhat strayed away from your original question, which had to do with quantifying the effective cost of, say, two copies versus three. Consider the following: assuming the Bernoulli distribution is applicable, the probability of encountering corrupted data in one of two copies is 50% (which I suggest, with today's technology, is more of a worst-case scenario). For one corruption encounter in three copies, the probability would be 33.3%. One in four, 25%; one in five, 20%; and so on. As Chris suggested earlier, the amount of gain, or the difference in probability of encountering corrupt data, starts to diminish rather quickly after three copies. In fact, after four copies, the deltas are not worth the cost. It doesn't get a whole lot better. Now back to my earlier remarks: coupled with Neil's observations, I still feel that the "resting places" remain the best place where we can insure some measurable cost to quantifying data integrity. We can pretty readily estimate the cost of two copies versus three (i.e. media costs, network costs, floor space costs, management costs, etc.). Right now, compression is pretty much the best we can do regarding reducing physical storage capacity. Encryption is one of the basic steps that we can employ to address security within the I/O subsystem. At the end of the day, compressed and encrypted data is just data.
Data that conforms to some structure that must remain intact in order to ultimately turn that data into information. Take good care, Raymond From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Neil Jefferies Sent: Friday, March 17, 2017 11:35 AM To: Paul Mather; van Wezel, Jos (SCC) Cc: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? However, erasure coding only protects against certain failure modes and is critically dependent on metadata for fragment reassembly. It is a very effective way of improving local resilience, more efficiently than RAID arrays, but once geographically distributed the benefits diminish. A disaster that takes out a datacentre will probably destroy enough fragments to make rebuild impossible. Until recently many erasure coded systems also did not apply the same resilience to metadata and so were subject to failure rather more than theory would suggest. Most systems tend to also favour uniformity in the fragment storage platform so they are also subject to systemic failures of technology stacks. In general, erasure coding use cases are targeted at availability rather than indestructibility. From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Paul Mather Sent: 17 March 2017 14:49 To: van Wezel, Jos (SCC) Cc: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) wrote: Chris, do you happen to have any reference to the mathematical correctness or computation showing that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional costs of a third copy without the math.
Currently I can't tell my customers how much (as a percentage) extra security an additional copy will bring, even theoretically. One thing I don't believe I've seen mentioned so far in regards to redundancy costs is switching to erasure-resilient coding rather than using plain replication. Explained briefly, erasure-resilient coding represents a logical unit of data as k fragments. These k fragments are then encoded into a larger unit of n fragments, n > k, where the n-k extra fragments can be thought of as "parity" fragments. The n encoded fragments may then be distributed across different disks, racks, and data centres. The value is that *any* k out of n fragments may be used to reconstitute the original logical unit of data. As n grows larger, the probability of total data loss grows smaller, and, conversely, the storage overhead and cost grow larger, allowing you to choose your cost/risk balance. The main disadvantage of erasure-resilient coding is that data I/O latency is increased due to the inherently distributed nature of the storage approach. There are comparisons between replication and erasure-resilient coding systems. One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We show that systems employing erasure codes have mean time to failures many orders of magnitude higher than replicated systems with similar storage and bandwidth requirements. More importantly, erasure-resilient systems use an order of magnitude less bandwidth and storage to provide similar system durability as replicated systems." Erasure-resilient coding is becoming mainstream in Cloud storage and object storage systems in general. I believe that Hadoop has recently acquired an erasure-resilient coding storage option for HDFS as an alternative to the standard replication model.
This is due to the increase in data set sizes, where erasure-resilient coding can offer lower redundancy overheads than plain replication options, while still offering the same or higher assurance levels on data availability. I also believe Ceph and OpenStack Swift are supporting erasure-resilient storage. Cheers, Paul. regards jos Sent from my Samsung Galaxy smartphone. -------- Original message -------- From: Chris Wood <lw85381 at yahoo.com> Date: 17/03/2017 02:07 (GMT+01:00) To: "Raymond A. Clarke" <Raymond.Clarke1 at Verizon.net>, gail at trumantechnologies.com, 'Jeanne Kramer-Smyth' <jkramersmyth at worldbankgroup.org>, 'Robert Spindler' <rob.spindler at asu.edu>, pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Thanks Ray as always for a great summary. Now my three bits: Three (3) copies please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (remote) is more expensive but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four+ are all examples of ever-diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit-rot and data corruption, as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the transit points. This is something that current technologies can help with, in part. Back to the original question: how do we insure against corruption, whether from compression, encryption, and/or transmission?
Well, disk and tape (data "resting places", if you will) have come a very long way in reducing bit-error rates, and in compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion, and as Dr. Rosenthal has coined it: LOCKSS ("Lots Of Copies Keep Stuff Safe"). Take good care, Raymond From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com Sent: Thursday, March 16, 2017 5:10 PM To: Jeanne Kramer-Smyth; Robert Spindler; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hello again, Jeanne, I think you're hitting on something that needs to be raised to (and pushed for with) vendors, and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out. I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object, if it's been chunked) has been repaired/self-healed - or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if their durability and data corruption start to get really bad for whatever reason. I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors and storage solution and cloud storage vendors have run the numbers, but am not sure if they've made them public.
This alias has people from Oracle, Seagate and other storage companies on it so I encourage them to please share any research they have on this - Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org> Date: Thu, March 16, 2017 1:44 pm To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>, "Robert Spindler" <rob.spindler at asu.edu>, "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org> Thanks Gail & Rob for your replies. I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version to the original version vs the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored. I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column.
I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? Thank you, Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org spellboundblog jkramersmyth jkramersmyth A 1818 H St NW Washington, DC 20433 From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] Sent: Thursday, March 16, 2017 3:18 PM To: Robert Spindler <rob.spindler at asu.edu>; Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>; pasig-discuss at mail.asis.org Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi all, a good topic! There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs) which can be used to solve the problem of a person stealing a drive and running off with data. Most cloud services now automatically provide "server side encryption" which means the vendor is doing the encryption for all data at rest (as you point out Jeanne). This is required by HIPAA for all health care data, and is now considered cloud best practice for cloud vendors due to the very real risk of hacking. So, for archival, we need to weigh the data security provided by cloud storage services using server side encryption with the risk of the vendor managing the encryption keys. This IMO underscores the importance of having multiple copies of all your archival data -- with different vendors and storage architectures or media types if possible.
Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Robert Spindler <rob.spindler at asu.edu> Date: Thu, March 16, 2017 9:06 am To: Jeanne Kramer-Smyth <jkramersmyth at worldbankgroup.org>, "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org> At risk of starting a conversation, here are a couple of basic issues from an archival standpoint: Encryption: Who has the keys and what happens should a provider go out of business? Compression: Lossy or Lossless and how does that compression act on different file formats (video/audio)? If this is frequently accessed material it becomes more of an issue. Short story: At a CNI meeting perhaps 15 years ago in a session about ebooks I asked a panel of vendors if they would give up the keys to encrypted e-books when they entered the public domain. Crickets. Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc the provider has more problems than forensics. Rob Spindler University Archivist and Head Archives and Special Collections Arizona State University Libraries Tempe AZ 85287-1006 480.965.9277 http://www.asu.edu/lib/archives From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth Sent: Thursday, March 16, 2017 8:54 AM To: pasig-discuss at mail.asis.org Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? Is anyone aware of active research into the risks to digital preservation that are posed by built-in encryption and compression in both cloud and on-prem storage options?
Any and all go-to sources for research and reading on these topics would be very welcome. I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space). I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account. Thank you! Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org spellboundblog jkramersmyth jkramersmyth A 1818 H St NW Washington, DC 20433 _____ -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313
(Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- From ruzicka at ics.muni.cz Fri Mar 17 15:06:46 2017 From: ruzicka at ics.muni.cz (Michal Růžička) Date: Fri, 17 Mar 2017 20:06:46 +0100 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> Message-ID: <4218ba27-445d-a14f-044a-8ffc7dd095e8@ics.muni.cz> Dear all, I am very interested in this discussion. One short comment first and two questions next: I do not think erasure coding is a good idea in the LTP system as it significantly increases the complexity of the system and coding (increases the probability of an error in implementation/process/...) and increases interconnections of the data between the multiple storage areas. I am a big fan of isolation, independence and as-simple-as-possible coding of the data replicas as much as possible. Now questions: 1. What is the best LTP implementation methodology you would recommend? I do not mean the OAIS itself but practical recommendations on concrete implementations, methods and procedures for a relatively small (<< 1 PB) data archive. 2. The Ceph distributed storage was mentioned in the e-mail cited below.
I am aware of Ceph's use in the Dutch National Archive (http://widodh.o.auroraobjects.eu/talks/ceph_dutch_national_archive_2016.pdf#page=11&zoom=page-fit,-177,595). What do you think about the use of Ceph in an LTP system? Do you have any experience with Ceph in practice, or a strong opinion on this technology? All the best, Michal On 17.3.2017 at 15:49, Paul Mather wrote: > On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) wrote: > >> Chris, >> do you happen to have any reference to the mathematical correctness or >> computation showing that 3 copies is optimal? Is the proof based on the standard >> ECC values that vendors list for their components (tapes, disks, >> transport lines, memory, etc.)? I'm asking because it's difficult to >> argue for the additional costs of a third copy without the math. >> Currently I can't tell my customers how much (as a percentage) extra >> security an additional copy will bring, even theoretically. > > One thing I don't believe I've seen mentioned so far in regards to > redundancy costs is switching to erasure-resilient coding rather than > using plain replication. Explained briefly, erasure-resilient coding > represents a logical unit of data as k fragments. These k fragments > are then encoded into a larger unit of n fragments, n > k, where the > n-k extra fragments can be thought of as "parity" fragments. The n > encoded fragments may then be distributed across different disks, > racks, and data centres. The value is that *any* k out of n fragments > may be used to reconstitute the original logical unit of data. As n > grows larger, the probability of total data loss grows smaller, and, > conversely, the storage overhead and cost grow larger, allowing you to > choose your cost/risk balance. The main disadvantage of > erasure-resilient coding is that data I/O latency is increased due to > the inherently distributed nature of the storage approach. There are > comparisons between replication and erasure-resilient coding systems.
> One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We > show that systems employing erasure codes have mean time to failures > many orders of magnitude higher than replicated systems with similar > storage and bandwidth requirements. More importantly, erasure-resilient > systems use an order of magnitude less bandwidth and storage to provide > similar system durability as replicated systems." > > Erasure-resilient coding is becoming mainstream in Cloud storage and > object storage systems in general. I believe that Hadoop has recently > acquired an erasure-resilient coding storage option for HDFS as an > alternative to the standard replication model. This is due to the > increase in data set sizes, where erasure-resilient coding can offer > lower redundancy overheads than plain replication options, while still > offering the same or higher assurance levels on data availability. I > also believe Ceph and OpenStack Swift are supporting erasure-resilient > storage. > > Cheers, > > Paul. -- --------------------------------------------------------------- Michal Růžička Phone: +420 549 49 6834 Aleph Library Management System Library Information Centre, Institute of Computer Science Masaryk University, Czech Republic Office number C308, Botanická 68a, 602 00 Brno OpenPGP key: https://kic-internal.ics.muni.cz/~ruzicka/pgp-key/ Fingerprint: 4791 027A B994 A183 C28C 9B89 33C1 5D8C 293E 15A9 --------------------------------------------------------------- From dshr at stanford.edu Fri Mar 17 17:17:24 2017 From: dshr at stanford.edu (David Rosenthal) Date: Fri, 17 Mar 2017 14:17:24 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
In-Reply-To: References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <48E9420A4871584593FC3D435EF345AAEEEA1154@MBX10.ad.oak.ox.ac.uk> Message-ID: <664b4808-00c4-41a9-cad5-74586bf0cc80@stanford.edu> On 03/17/2017 10:37 AM, Goethals, Andrea wrote: > Regarding "> With that sort of error then you have two copies of > supposedly correct information and no way of telling which is > correct." - I think it's important to maintain fixity information > (e.g. SHA/MD5, etc.) values outside of the storage systems as well so > that you can be sure that any copy of your content hasn't changed. It > can also come in very handy to check the completeness of transfers, > e.g. when migrating to new storage. It's not quite that simple. See: http://blog.dshr.org/2017/03/sha1-is-dead.html David. From illtud.daniel at llgc.org.uk Fri Mar 17 12:59:49 2017 From: illtud.daniel at llgc.org.uk (Illtud Daniel) Date: Fri, 17 Mar 2017 16:59:49 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <48E9420A4871584593FC3D435EF345AAEEEA154C@MBX10.ad.oak.ox.ac.uk> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> <48E9420A4871584593FC3D435EF345AAEEEA154C@MBX10.ad.oak.ox.ac.uk> Message-ID: On 17/03/17 15:34, Neil Jefferies wrote: > However, erasure coding only protects against certain failure modes and > is critically dependent on metadata for fragment reassembly. This. And your metadata layer may be proprietary and opaque, and possibly require a particular version of a system to successfully rebuild fragments. Roll on robust open-source implementations with confidence-building distributed filesystem integrity checking, but I'd be wary of adopting anything other than that.
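Paul's description of k-of-n erasure coding, Neil's and Illtud's point that reassembly depends on fragment metadata, and David Rosenthal's rule that k shards should be obtainable from any two of three sites can be illustrated together with the simplest possible erasure code: two data fragments plus one XOR parity fragment (k = 2, n = 3), similar in spirit to single-parity RAID. This is a hypothetical Python sketch for illustration only; production systems generally use Reed-Solomon or similar codes that tolerate the loss of any n - k fragments for larger n. Note that `decode` cannot work without knowing each fragment's index and the original length, which is exactly the metadata dependency raised above.

```python
from itertools import combinations


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))


def encode(data: bytes) -> dict[int, bytes]:
    """Hypothetical k=2, n=3 encoder: two data fragments (indices 0 and 1)
    plus one XOR parity fragment (index 2)."""
    if len(data) % 2:
        data += b"\x00"  # pad to even length; the true length is kept as metadata
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    return {0: d0, 1: d1, 2: xor_bytes(d0, d1)}


def decode(fragments: dict[int, bytes], length: int) -> bytes:
    """Rebuild the original from ANY two of the three fragments.

    Both the fragment indices (dict keys) and the original length are
    metadata: without them the fragments cannot be reassembled.
    """
    if len(fragments) < 2:
        raise ValueError("need at least k=2 fragments")
    if 0 in fragments and 1 in fragments:
        d0, d1 = fragments[0], fragments[1]
    elif 0 in fragments:  # data fragment 1 lost: d1 = d0 XOR parity
        d0 = fragments[0]
        d1 = xor_bytes(d0, fragments[2])
    else:  # data fragment 0 lost: d0 = d1 XOR parity
        d1 = fragments[1]
        d0 = xor_bytes(d1, fragments[2])
    return (d0 + d1)[:length]


def survives_any_site_loss(shards_per_site: list[int], k: int) -> bool:
    """True if every pair of sites together holds at least k shards, so the
    data can be rebuilt from any two sites (three-site layout assumed)."""
    return all(a + b >= k for a, b in combinations(shards_per_site, 2))


if __name__ == "__main__":
    record = b"archival object"  # 15 bytes, so padding is exercised
    frags = encode(record)
    for lost in frags:  # drop each fragment (i.e. each site) in turn
        remaining = {i: f for i, f in frags.items() if i != lost}
        assert decode(remaining, len(record)) == record
    print(f"overhead: {len(frags) / 2}x raw storage vs 3.0x for three replicas")
    print(survives_any_site_loss([1, 1, 1], k=2))  # this sketch, one shard per site
    print(survives_any_site_loss([3, 3, 3], k=6))  # a 6-of-9 code spread evenly
    print(survives_any_site_loss([4, 4, 1], k=6))  # uneven: a pair with the 1-shard site holds only 5
```

With one fragment per site, losing any single site still leaves k = 2 fragments, and the raw storage overhead is n/k = 1.5x rather than the 3x of three full replicas. The placement helper prints True, True, False: an even 3/3/3 spread of a 6-of-9 code satisfies the three-site rule, while a 4/4/1 spread does not.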
-- Illtud Daniel illtud.daniel at llgc.org.uk Pennaeth TGCh Head of ICT Llyfrgell Genedlaethol Cymru National Library of Wales From dshr at stanford.edu Sat Mar 18 22:25:59 2017 From: dshr at stanford.edu (David Rosenthal) Date: Sat, 18 Mar 2017 19:25:59 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <4218ba27-445d-a14f-044a-8ffc7dd095e8@ics.muni.cz> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> <4218ba27-445d-a14f-044a-8ffc7dd095e8@ics.muni.cz> Message-ID: <53a1d94c-6dd3-b1df-3172-fbc2ef7260dc@stanford.edu> On 03/17/2017 12:06 PM, Michal Růžička wrote: > I do not think erasure coding is a good idea in the LTP system as > significantly increases the complexity of the system and coding > (increases probability of an error in implementation/process/...) and > increases interconnections of the data between the multiple storage > areas. I am a big fan of isolation, independence and > as-simple-as-possible coding of the data replicas as much as possible. Erasure coding can easily halve the cost. Running out of money is often a much more likely threat than "an error in implementation/process/..." Once again, it is important to base these discussions on an explicit threat model. Perhaps Michal's threat model excludes budgetary threats. If so, many of us will be very jealous, especially given the US administration's proposed budget. David. From matthew at addisdigital.co.uk Sun Mar 19 16:45:24 2017 From: matthew at addisdigital.co.uk (Matthew Addis) Date: Sun, 19 Mar 2017 20:45:24 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
In-Reply-To: <53a1d94c-6dd3-b1df-3172-fbc2ef7260dc@stanford.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <54C09C0B-6196-472B-9D05-44C87765D481@vt.edu> <4218ba27-445d-a14f-044a-8ffc7dd095e8@ics.muni.cz> <53a1d94c-6dd3-b1df-3172-fbc2ef7260dc@stanford.edu> Message-ID: <79bcf3c2-e646-046c-db19-0817ded4b249@addisdigital.co.uk> I feel a bit late to the party given so much interesting discussion so far, but could I humbly offer the following suggestions. These are based on a long history working on archival storage solutions and working with many large archives. Do a bit of risk assessment and think broadly about the threats to long-term data safety. Along with David, I'd suggest that economic factors are right up there near the top. Not having the budget, even temporarily, that's needed to sustain a programme of storage maintenance, upgrades, migrations, hosting fees, DR, auditing, testing exit plans etc. is often a bigger risk than using a specific storage technology or worrying about things like bit-rot. Next come people problems, e.g. outsider attacks such as hacking, staff not following procedures, skilled members of a team leaving and not handing over knowledge of how a system works or how to use it. You can have the most reliable storage technology on the planet, but it's not much good if content can easily be deleted by unauthorised staff or there are security vulnerabilities that open you up to ransomware or worse. Things that can help with thinking in the right direction include DRAMBORA, TRAC, TDR, DSA and other risk and audit criteria. If you are going to consider storage in detail, then treat storage as a system and recognise that modes of data loss include things like bugs in the software and firmware that write data to storage and read it back again.
Unreliable media or the risks that come from specific techniques such as compression or encryption have a part to play, but system-level errors and lack of error detection and handling are often a bigger issue, not least because as David points out this can lead to correlated errors, systematic failures and much higher data loss rates than you might expect from looking at individual components of the system in isolation. I'd suggest that diversity and independent copies of the data are your friends. Multiple copies in different geographic locations using different storage technologies that are kept as isolated as far as possible helps reduce correlated errors and stop data corruption/loss from spreading. If you want more information about storage error and failure modes in the real world then the proceedings of FAST are a great place to look. One copy of the data stored offline with a third-party can be a lifesaver. This provides a great 'firebreak' against the various threats to online systems or services - it's very hard for software bugs, hackers, disgruntled staff, replication of data corruption and other nasties to propagate to data that's stored in a box at the bottom of a salt mine with armed guards at the door. This is of course additional to the one or more copies of the data that are online so the data is also easy to access and use, which is often crucial to showing its value and economic sustainability. There's no magic number of copies when storing data, e.g. three. Treat mathematical models with caution. I know because I've made several in the past to simulate the risk of data loss so know what's involved. Models rarely cover all the threats or match the real world. Storage models and simulations are based on a lot of assumptions and are notoriously hard to calibrate. Instead of trying to justify a specific number of copies, I reckon you are better off following a maturity model, e.g. NDSA preservation levels or DPCMM, i.e. 
start with a recognised strategy and work from there. If you need to build a business case that justifies the cost of storing data properly and safely then I'd base it on NDSA preservation levels, the DPC handbook, what a big archive does or some other evidence of what's been found to work in practice rather than some mathematical model. There's nothing wrong with proprietary storage solutions per se, including erasure coding, and at some level or other we all use it - there's no such thing as an open source hard drive after all. As Neil says, the is to treat all storage as a black box and have independent checks and balances, e.g. using your own checksums and fixity checks to make sure data has been received by the storage system correctly and integrity hasn't been lost over time. Multiple independent copies of the data that are regularly checked give assurance that data is OK, and if one copy has issues then you'll pick it up quickly and can fix it using one of the other copies. Compression and encryption aren't necessarily risk multipliers if you choose carefully and there can sometimes be benefits. For example, intra-frame compression in video or counter-mode encryption mean that if one part of a file is corrupted then you don't necessarily lose everything. The upside is that compression can, in some circumstances, allow you to make more copies of the data for the same budget. For example, I'd take two lossless compressed video files stored in two geographically separate locations over a single copy stored in an uncompressed format that takes up twice the storage space. I'd also take erasure-coded storage plus an offline escrowed copy of the data over a single cloud provider irrespective of how many copies they make internally. IMHO, it's all about trading off costs, risks and the value of content. There isn't a one-size-fits-all solution.
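The advice from Andrea and Matthew (keep your own checksums outside the storage system, audit every copy against them, and repair a bad copy from a good one) can be sketched in a few lines of Python. This is a minimal illustration, not any particular repository's tooling: the function names are hypothetical, and it uses SHA-256 from the standard `hashlib` module rather than MD5 or SHA-1, given David's link about SHA-1. In practice the manifest would be stored somewhere independent of every copy it describes.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum, at ingest time."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def audit_copy(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare one copy against the independently held manifest."""
    problems = {}
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems[rel] = "missing"
        elif sha256_of(p) != expected:
            problems[rel] = "checksum mismatch"
    return problems


def repair(damaged: Path, good: Path, manifest: dict[str, str],
           problems: dict[str, str]) -> None:
    """Overwrite bad files, but only from a copy that itself verifies."""
    for rel in problems:
        src = good / rel
        if src.is_file() and sha256_of(src) == manifest[rel]:
            dst = damaged / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        copy_a, copy_b = Path(a), Path(b)
        for root in (copy_a, copy_b):
            (root / "report.txt").write_bytes(b"permanent record")
        manifest = build_manifest(copy_a)                        # keep outside both copies
        (copy_b / "report.txt").write_bytes(b"permanent recorD")  # simulated corruption
        problems = audit_copy(copy_b, manifest)
        assert problems == {"report.txt": "checksum mismatch"}
        repair(copy_b, copy_a, manifest, problems)
        assert audit_copy(copy_b, manifest) == {}
```

Because `repair` re-verifies the source before copying, a corrupted file cannot be "repaired" with another corrupted file, which is the failure David's earlier warning about correlated errors points at.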
Links to pretty much everything above are in some slides and speaker notes for a talk I do for the DPC: https://doi.org/10.6084/m9.figshare.4584859 There's also the DPC handbook that has sections on storage and fixity that build upon the NDSA levels and take a risk assessment approach: http://dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums http://dpconline.org/handbook/organisational-activities/storage If anyone wants to know how we apply the above at Arkivum then I'd be happy to do a follow-up post and talk about our approach to keeping data safe to the level that allows us to offer a data integrity guarantee. Cheers, Matthew Matthew Addis Chief Technology Officer Arkivum tel: +44 1249 405060 mob: +44 7703 393374 email: matthew.addis at arkivum.com web: www.arkivum.com twitter: @arkivum -------------- next part -------------- An HTML attachment was scrubbed... URL: From jos.vanwezel at kit.edu Sun Mar 19 18:41:37 2017 From: jos.vanwezel at kit.edu (van Wezel, Jos (SCC)) Date: Sun, 19 Mar 2017 23:41:37 +0100 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com> Message-ID: <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> Hi Chris, thanks a lot. The paper is fun reading, especially the part about the analog movie archive :-)). Hopefully you do find the MPEG paper. My searches returned nothing yet. (Was it ever published in some way?) @all: Having read all posts thus far (great stuff guys), clearly the engineering approach to the problem does not cut it at all. Reading between the lines there seems to be a lot of experience with disasters where even a BER of 10^99 and 4 copies won't help. :-) For now we'll stick with 2 copies and 3 if requested explicitly by the client.
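Jos asks how much extra safety one more copy buys, and Matthew warns that such models are hard to calibrate. The Python sketch below is a deliberately crude toy, not the MPEG paper's mathematics or any vendor's durability calculation: it assumes each copy is lost independently with probability `p` over some period, plus a common-mode probability `c` (shared software bug, operator error, budget cut) that destroys every copy at once. Both values in the example are invented for illustration.

```python
def p_lose_all(n: int, p: float, c: float = 0.0) -> float:
    """Toy probability that all n copies are lost in the same period.

    p -- assumed chance that any one copy is lost, independently of the others
    c -- assumed chance of a common-mode event that destroys every copy at once
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= c <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return c + (1.0 - c) * p ** n


if __name__ == "__main__":
    p, c = 0.01, 1e-5  # illustrative values only, not measurements
    for n in range(1, 5):
        independent = p_lose_all(n, p)
        correlated = p_lose_all(n, p, c)
        print(f"{n} copies: independent {independent:.1e}, "
              f"with common mode {correlated:.1e}")
```

Under pure independence each extra copy multiplies the risk by `p`, but as soon as `c` is non-zero the result can never fall below `c`, so the third and fourth copies buy progressively less. That is consistent with Chris's "diminishing returns" and with the argument that isolating and diversifying copies (reducing `c`) matters more than adding copies.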
Regards Jos On 17/03/2017 17:48, Chris Wood wrote: > Jos: > > I just knew somebody would ask this. Ha. Several years ago several of us wrote > a paper for the MPEG (Motion Pictures Expert Group) and a mathematician named > Jeff Bonwick figured out all the math. I haven't found it yet in the junk heap > of my PC, but did find a companion paper written by the same set of authors. > It's not exactly what you are looking for, but close. It's more about Bit Error > Rates at a rather low level. I will continue to look for the MPEG paper. It's > got to be somewhere. The Internet "never forgets", right? > Stay tuned as I keep looking. > > CW > > On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote: >> Chris, >> do you happen to have any reference to the mathematical correctness or >> computation that 3 copies is optimal? Is the proof based on the standard ECC >> values that vendors list with their components (tapes, disks, transport >> lines, memory etc.)? I'm asking because it's difficult to argue for the >> additional costs of a third copy without the math. Currently I can't tell my >> customers how much (as in percentage) extra security an additional copy will >> bring, even theoretically. >> >> regards >> >> jos >> >> Sent from my Samsung Galaxy smartphone. >> >> -------- Original message -------- >> From: Chris Wood >> Date: 17/03/2017 02:07 (GMT+01:00) >> To: "Raymond A. Clarke" , >> gail at trumantechnologies.com, 'Jeanne Kramer-Smyth' >> , 'Robert Spindler' , >> pasig-discuss at mail.asis.org >> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into >> storage options? >> >> Thanks Ray as always for a great summary. Now my three bits: >> >> Three (3) copies please. One of which is in a remote location on a different >> flood plain, electric grid, fault line etc. for the obvious reasons. >> Mathematically, this has turned out to be the optimal number looked at with a >> cost/benefit mindset.
Kind of like: 2 is better than one, but a local problem >> gets both copies. Three (remote) is more expensive but you get A LOT more data >> resilience/persistence. Four costs a bunch more, but delivers just a little >> bit more resilience. Four+ are all examples of ever diminishing returns. >> >> CW >> >> On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: >>> >>> Hello All, >>> >>> >>> >>> A few years back, I did some research on bit-rot and data corruption, as it >>> relates to the various media that data passes through on its way to and >>> from the user. Consider this simple example: as data moves from memory to HBA to >>> cable to air to cable and so on, bits can be lost along the way at any one, or >>> several, of the transit points. This is something that current >>> technologies can help with, in part. Back to the original question: how do >>> we insure against corruption, whether from compression, encryption and/or >>> transmission? Well, disk and tape (/data resting places/, if you will) have >>> come a very long way in reducing bit-error rates, compression and encryption. >>> But the "/resting places/" are only part of the problem. In accordance with >>> Gail's suggestion and as Dr. Rosenthal has coined, LOCKSS ("Lots of Copies >>> Keep Stuff Safe"). >>> >>> >>> >>> >>> >>> Take good care, >>> >>> Raymond >>> >>> >>> >>> *From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of >>> *gail at trumantechnologies.com >>> *Sent:* Thursday, March 16, 2017 5:10 PM >>> *To:* Jeanne Kramer-Smyth ; Robert Spindler >>> ; pasig-discuss at mail.asis.org >>> *Subject:* Re: [Pasig-discuss] Risks of encryption & compression built into >>> storage options? >>> >>> >>> >>> Hello again, Jeanne, >>> >>> >>> >>> I think you're hitting on something that needs to be raised to (and pushed >>> for with) vendors, and that is the need for "More transparency" and the >>> reporting to customers of "events" that are part of the provenance of a >>> digital object.
The storage architectures do a good job of error detection >>> and self healing; however, they do not report this out. I'd like to (this is >>> my dream) have vendors report back to customers (as part of their SLA) when a >>> object (or part of an object if it's been chunked) has been >>> repaired/self-healed - or lost forever. I could then record this as a PREMIS >>> event. As you know, vendors "design for" 11x9s or 13x9s durability, but their >>> SLAs do not require them to tell us if their durability and data corruption >>> starts to get really bad for whatever reason. >>> >>> >>> >>> I've not directly answered your question about whether the encryption, >>> dedupe, compression, and other things that can happen inside a storage system >>> is increasing the risk of corruption. I'll look around. I am sure the disk >>> vendors and storage solution and cloud storage vendors have run the numbers, >>> but am not sure if they're made public. >>> >>> >>> >>> This alias has people from Oracle, Seagate and other storage companies on it >>> so I encourage them to please share any research they have on this - >>> >>> >>> >>> >>> >>> Gail >>> >>> >>> >>> >>> >>> >>> >>> Gail Truman >>> >>> Truman Technologies, LLC >>> >>> Certified Digital Archives Specialist, Society of American Archivists >>> >>> >>> >>> /*Protecting the world's digital heritage for future generations*/ >>> >>> www.trumantechnologies.com >>> >>> facebook/TrumanTechnologies >>> >>> https://www.linkedin.com/in/gtruman >>> >>> >>> >>> +1 510 502 6497 >>> >>> >>> >>> >>> >>> >>> >>> -------- Original Message -------- >>> Subject: RE: [Pasig-discuss] Risks of encryption & compression built >>> into storage options? >>> From: Jeanne Kramer-Smyth >> > >>> Date: Thu, March 16, 2017 1:44 pm >>> To: "gail at trumantechnologies.com " >>> >, "Robert >>> Spindler" >, >>> "pasig-discuss at mail.asis.org " >>> > >>> >>> Thanks Gail & Rob for your replies. 
>>> >>> >>> >>> I am less worried about the scenario of someone stealing a drive - as Rob >>> pointed out, if that is happening we have bigger problems. >>> >>> >>> >>> I do wonder if there are increased risks of bit-rot/file corruption with >>> encryption, compression, and data deduplication. Have there been any >>> studies on this? Could pulling a file off a drive that requires reversal >>> of the auto-encryption and auto-compression in place at the system level >>> mean a greater risk of bits flipping? I am trying to contrast the >>> increased "handling" and change required to get from the stored version >>> to the original version vs the decreased "handling" it would require if >>> what I am pulling off the storage device is exactly what I sent to be stored. >>> >>> >>> >>> I am less worried about issues related to not being able to decrypt >>> content. The storage solutions we are contemplating would remain under >>> enough ongoing management that these issues should be avoidable. Since >>> ensuring that non-public records remain secure is also very important, >>> encryption gets some points in the "pro" column. I agree that having >>> multiple copies in different storage architectures and with different >>> vendors would also decrease risk. >>> >>> >>> >>> I want to understand the risks related to the different storage >>> architectures and the ever increasing number of "automatic" things being >>> done to digital objects in the process of them being stored and >>> retrieved. Are there people doing work, independent of vendor claims, to >>> document these types of risks?
>>> >>> >>> >>> Thank you, >>> >>> >>> >>> Jeanne >>> >>> *Jeanne Kramer-Smyth* >>> >>> *IT Officer, Information Management Services II* >>> >>> http://siteresources.worldbank.org/NEWS/Images/spacer.png >>> >>> *Information and Technology Solutions* >>> >>> *WBG Library & Archives of Development* >>> >>> T >>> >>> >>> >>> 202-473-9803 >>> >>> E >>> >>> >>> >>> jkramersmyth at worldbankgroup.org >>> >>> W >>> >>> >>> >>> www.worldbank.org >>> >>> >>> http://siteresources.worldbank.org/NEWS/Images/twitter_logo.jpg >>> >>> >>> >>> spellboundblog >>> >>> http://siteresources.worldbank.org/NEWS/Images/skype_logo.jpg >>> >>> >>> >>> jkramersmyth >>> >>> http://siteresources.worldbank.org/NEWS/Images/linkedin_logo.jpg >>> >>> >>> >>> jkramersmyth >>> >>> A >>> >>> >>> >>> 1818 H St NW Washington, DC 20433 >>> >>> http://siteresources.worldbank.org/NEWS/Images/spacer.png >>> >>> http://siteresources.worldbank.org/NEWS/Images/WBG_Information_and_Technology_Solutions.png >>> >>> >>> >>> *From:*gail at trumantechnologies.com >>> [mailto:gail at trumantechnologies.com] >>> *Sent:* Thursday, March 16, 2017 3:18 PM >>> *To:* Robert Spindler >> >; Jeanne Kramer-Smyth >>> >> >; pasig-discuss at mail.asis.org >>> >>> *Subject:* RE: [Pasig-discuss] Risks of encryption & compression built >>> into storage options? >>> >>> >>> >>> Hi all, a good topic! >>> >>> There is new drive technology from Seagate (probably other manufacturers) >>> called "Self Encrypted Drives" (SEDs) which can be used to solve the >>> problem of a person stealing a drive and running off with data. >>> >>> >>> >>> Most cloud services now automatically provide "server side encryption" >>> which means the vendor is doing the encryption for all data at rest (as >>> you point out Jeanne). This is required by HIPAA for all health care >>> data, and is now considered cloud best practice for cloud vendors due to >>> the very real risk of hacking. 
So, for archival, we need to weigh the >>> data security provided by cloud storage services using server side >>> encryption with the risk of the vendor managing the encryption keys. >>> Which IMO underscores the importance of having multiple copies of all >>> your archival data -- with different vendors and storage architectures or >>> media types if possible. >>> >>> >>> >>> Gail >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Gail Truman >>> >>> Truman Technologies, LLC >>> >>> Certified Digital Archives Specialist, Society of American Archivists >>> >>> >>> >>> /*Protecting the world's digital heritage for future generations*/ >>> >>> www.trumantechnologies.com >>> >>> facebook/TrumanTechnologies >>> >>> https://www.linkedin.com/in/gtruman >>> >>> >>> >>> +1 510 502 6497 >>> >>> >>> >>> >>> >>> >>> >>> -------- Original Message -------- >>> Subject: Re: [Pasig-discuss] Risks of encryption & compression built >>> into storage options? >>> From: Robert Spindler >> > >>> Date: Thu, March 16, 2017 9:06 am >>> To: Jeanne Kramer-Smyth >> >, >>> "pasig-discuss at mail.asis.org " >>> > >>> >>> At risk of starting a conversation, here are a couple basic issues >>> from an archival standpoint: >>> >>> >>> >>> Encryption: Who has the keys and what happens should a provider go >>> out of business? >>> >>> >>> >>> Compression: Lossy or Lossless and how does that compression act on >>> different file formats (video/audio). If this is frequently accessed >>> material it becomes more of an issue. >>> >>> >>> >>> Short story: At a CNI meeting perhaps 15 years ago in a session about >>> ebooks I asked a panel of vendors if they would give up the keys to >>> encrypted e-books when they reached public domain. Crickets. >>> >>> >>> >>> Physical discs are not secure given the forensics software widely >>> available today, but if someone can grab a physical disc the provider >>> has more problems than forensics. 
>>> >>> >>> >>> Rob Spindler >>> >>> University Archivist and Head >>> >>> Archives and Special Collections >>> >>> Arizona State University Libraries >>> >>> Tempe AZ 85287-1006 >>> >>> 480.965.9277 >>> >>> http://www.asu.edu/lib/archives >>> >>> >>> >>> *From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On >>> Behalf Of *Jeanne Kramer-Smyth >>> *Sent:* Thursday, March 16, 2017 8:54 AM >>> *To:* pasig-discuss at mail.asis.org >>> *Subject:* [Pasig-discuss] Risks of encryption & compression built >>> into storage options? >>> >>> >>> >>> Is anyone aware of active research into the risks to digital >>> preservation that are posed by built in encryption and compression in >>> both cloud and on-prem storage options? Any and all go-to sources for >>> research and reading on these topics would be very welcome. >>> >>> >>> >>> I am being told by the staff who source storage solutions for my >>> organization that encryption and compression are generally included >>> at the hardware level. That content is automatically encrypted and >>> compressed as it is written to disc - and then un-encrypted and >>> un-compressed as it is pulled off disc in response to a request. It >>> is advertised as both more secure (someone stealing a physical disc >>> could not, in theory, extract its contents) and more cost efficient >>> (taking up less space). >>> >>> >>> >>> I want to be sure that as we make our choices for long-term storage >>> of permanent digital records that we take these risks into account. >>> >>> >>> >>> Thank you!
>>> >>> Jeanne >>> >>> >>> >>> *Jeanne Kramer-Smyth* >>> >>> *IT Officer, Information Management Services II* >>> >>> http://siteresources.worldbank.org/NEWS/Images/spacer.png >>> >>> *Information and Technology Solutions* >>> >>> *WBG Library & Archives of Development* >>> >>> T >>> >>> >>> >>> 202-473-9803 >>> >>> E >>> >>> >>> >>> jkramersmyth at worldbankgroup.org >>> >>> >>> W >>> >>> >>> >>> www.worldbank.org >>> >>> >>> http://siteresources.worldbank.org/NEWS/Images/twitter_logo.jpg >>> >>> >>> >>> spellboundblog >>> >>> http://siteresources.worldbank.org/NEWS/Images/skype_logo.jpg >>> >>> >>> >>> jkramersmyth >>> >>> http://siteresources.worldbank.org/NEWS/Images/linkedin_logo.jpg >>> >>> >>> >>> jkramersmyth >>> >>> A >>> >>> >>> >>> 1818 H St NW Washington, DC 20433 >>> >>> http://siteresources.worldbank.org/NEWS/Images/spacer.png >>> >>> http://siteresources.worldbank.org/NEWS/Images/WBG_Information_and_Technology_Solutions.png >>> >>> >>> >>> >>> >>> -------------------------------------------------------------------------------- >>> >>> ---- >>> To subscribe, unsubscribe, or modify your subscription, please visit >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >>> _______ >>> PASIG Webinars and conference material is at >>> http://www.preservationandarchivingsig.org/index.html >>> _______________________________________________ >>> Pasig-discuss mailing list >>> Pasig-discuss at mail.asis.org >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >>> >>> >>> >>> ---- >>> To subscribe, unsubscribe, or modify your subscription, please visit >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >>> _______ >>> PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html >>> _______________________________________________ >>> Pasig-discuss mailing list >>> Pasig-discuss at mail.asis.org >>> http://mail.asis.org/mailman/listinfo/pasig-discuss >> >> -- >> ---------------------------------------------------- >> 
Chris Wood >> Storage & Data Management >> Office: 408-782-2757 (Home Office) >> Office: 408-276-0730 (Work Office) >> Mobile: 408-218-7313 (Preferred) >> Email: lw85381 at yahoo.com >> ---------------------------------------------------- > > -- > ---------------------------------------------------- > Chris Wood > Storage & Data Management > Office: 408-782-2757 (Home Office) > Office: 408-276-0730 (Work Office) > Mobile: 408-218-7313 (Preferred) > Email: lw85381 at yahoo.com > ---------------------------------------------------- > -- Steinbuch Centre for Computing (SCC) KIT - Campus Nord Hermann von Helmholtzplatz 1 76344 Eggenstein - Leopoldshafen Tel.: +49 721 60826305 Building 449, Room 122 Orcid ID: 0000-0003-0175-6216 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5063 bytes Desc: S/MIME Cryptographic Signature URL: From lw85381 at yahoo.com Mon Mar 20 00:15:11 2017 From: lw85381 at yahoo.com (Chris Wood) Date: Sun, 19 Mar 2017 21:15:11 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com> <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> Message-ID: <38ec899b-8104-1f22-ae7c-25af0dad29a4@yahoo.com> Hi Jos: I remember getting a nice hard copy of the booklet. I don't know if MPEG ever made it public. I thought that by now some institution would have posted it, but I can't find it (yet). Still looking. Your comments about "other" bad things happening are spot on. In an IBM study several of us did about 20 years ago on data loss causal agents, human error won by a huge margin. In last place (fewest causal factors) was H/W failures.
In between, in rough order, were application and data management software, incorrect documentation, device firmware (we used to call this microcode when we still had dial phones :-)), external events (power failures, storms, whatever) and a few other categories I forget. I do remember our RAS expert (Reliability, Availability and Serviceability) making the point that perfect replication code replicates corrupted data perfectly. Even more true today. You might find this a quick interesting read: Why did NASA TRIPLEX all computers in the Space Shuttle and have two separate vendors write the code for them, with a sophisticated voting system for cases of non-agreement? https://www.nap.edu/read/2222/chapter/5 It seemed to work fine, but inter-booster gaskets did not and it turned out the insulation tiles were not very good at foreign object impact resistance. A good example of unknown and completely unexpected failure modes. CW On 3/19/2017 3:41 PM, van Wezel, Jos (SCC) wrote: > Hi Chris, thanks a lot. The paper is fun reading, especially the part about the > analog movie archive :-)). Hopefully you do find the MPEG paper. My > searches returned nothing yet. (Was it ever published in some way?) > > @all: Having read all posts thus far (great stuff guys), clearly the > engineering approach to the problem does not cut it at all. Reading > between the lines there seems to be a lot of experience with disasters > where even a BER of 10^99 and 4 copies won't help. :-) For now we'll > stick with 2 copies and 3 if requested explicitly by the client. > > Regards > > Jos > > > On 17/03/2017 17:48, Chris Wood wrote: >> Jos: >> >> I just knew somebody would ask this. Ha. Several years ago several >> of us wrote >> a paper for the MPEG (Motion Pictures Expert Group) and a >> mathematician named >> Jeff Bonwick figured out all the math. I haven't found it yet in the >> junk heap >> of my PC, but did find a companion paper written by the same set >> of authors.
>> It's not exactly, what you are looking for, but close. It's more >> about Bit Error >> Rates at a rather low level. I will continue to look for the MPEG >> paper. It's >> got to be somewhere. The Internet "never forgets" Right? >> Stay tuned as I keep looking. >> >> CW >> >> On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote: >>> Chris, >>> do you happen to have any reference to the mathatical correctness or >>> computation that 3 copies is optimal. Is proof based on the standard >>> ecc >>> values that vendors list with their components (tapes, disks, >>> transport >>> lines, memory etc). I'm asking because its difficult to argue for the >>> additional costs of a third copy without the math. Currently I can't >>> tell my >>> customers how much (as in percentage) extra security an addittional >>> copy will >>> bring, even theoretically. >>> >>> regards >>> >>> jos >>> >>> Sent from my Samsung Galaxy smartphone. >>> >>> -------- Original message -------- >>> From: Chris Wood >>> Date: 17/03/2017 02:07 (GMT+01:00) >>> To: "Raymond A. Clarke" , >>> gail at trumantechnologies.com, 'Jeanne Kramer-Smyth' >>> , 'Robert Spindler' >>> , >>> pasig-discuss at mail.asis.org >>> Subject: Re: [Pasig-discuss] Risks of encryption & compression built >>> into >>> storage options? >>> >>> Thanks Ray as always for a great summary. Now my three bits: >>> >>> Three (3) copies please. One of which is in a remote location on a >>> different >>> flood plane, Electric grid, fault line etc. for the obvious reasons. >>> Mathematically, this has turned out to be the optimal number looked >>> at with a >>> cost/benefit mindset. Kind of like: 2 is better than one, buta >>> local problem >>> gets both copies. Three (remote) is more expensive but you get A LOT >>> more data >>> resilience/persistence. Four costs a bunch more, but delivers just a >>> little >>> bit more resilience. Four+ are all examples of ever diminishing >>> returns. >>> >>> CW >>> >>> On 3/16/2017 4:40 PM, Raymond A. 
Clarke wrote: >>>> >>>> Hello All, >>>> >>>> >>>> >>>> A few years back, I did some research on bit-rot and data >>>> corruption, as it >>>> relates to the various medium that data passes through, on its way >>>> to and >>>> from the user. Consider this simple example; as data from memory >>>> to HBA to >>>> cable to air to cable and so on, bits can be lost along way at any >>>> one of, or >>>> several of the medium transit points. This something that current >>>> technologies can help with, in part. Back to the original >>>> question, :how do >>>> we insure against corruption, either from compression, encryption? >>>> and/or >>>> transmission? Well disk and tape(/data resting places/, if you >>>> will) have a >>>> come very long way in reducing bit-error rates, compression and >>>> encryption. >>>> But the ?/resting places?/ are only part of a problem. In >>>> accordance with >>>> Gail?s suggestion and as Dr. Rosenthal has coined, LOCKSS (?lot of >>>> copies >>>> keep stuff safe?). >>>> >>>> >>>> >>>> >>>> >>>> Take good care, >>>> >>>> Raymond >>>> >>>> >>>> >>>> *From:*Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On >>>> Behalf Of >>>> *gail at trumantechnologies.com >>>> *Sent:* Thursday, March 16, 2017 5:10 PM >>>> *To:* Jeanne Kramer-Smyth ; Robert >>>> Spindler >>>> ; pasig-discuss at mail.asis.org >>>> *Subject:* Re: [Pasig-discuss] Risks of encryption & compression >>>> built into >>>> storage options? >>>> >>>> >>>> >>>> Hello again, Jeanne, >>>> >>>> >>>> >>>> I think you're hitting on something that needs to be raised to (and >>>> pushed >>>> for with) vendors, and that is the need for "More transparency" and >>>> the >>>> reporting to customers of "events" that are part of the provenance >>>> of a >>>> digital object. The storage architectures do a good job of error >>>> detection >>>> and self healing; however, they do not report this out. 
I'd like to >>>> (this is >>>> my dream) have vendors report back to customers (as part of their >>>> SLA) when a >>>> object (or part of an object if it's been chunked) has been >>>> repaired/self-healed - or lost forever. I could then record this as >>>> a PREMIS >>>> event. As you know, vendors "design for" 11x9s or 13x9s durability, >>>> but their >>>> SLAs do not require them to tell us if their durability and data >>>> corruption >>>> starts to get really bad for whatever reason. >>>> >>>> >>>> >>>> I've not directly answered your question about whether the encryption, >>>> dedupe, compression, and other things that can happen inside a >>>> storage system >>>> is increasing the risk of corruption. I'll look around. I am sure >>>> the disk >>>> vendors and storage solution and cloud storage vendors have run the >>>> numbers, >>>> but am not sure if they're made public. >>>> >>>> >>>> >>>> This alias has people from Oracle, Seagate and other storage >>>> companies on it >>>> so I encourage them to please share any research they have on this - >>>> >>>> >>>> >>>> >>>> >>>> Gail >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> Gail Truman >>>> >>>> Truman Technologies, LLC >>>> >>>> Certified Digital Archives Specialist, Society of American Archivists >>>> >>>> >>>> >>>> /*Protecting the world's digital heritage for future generations*/ >>>> >>>> www.trumantechnologies.com >>>> >>>> facebook/TrumanTechnologies >>>> >>>> https://www.linkedin.com/in/gtruman >>>> >>>> >>>> >>>> +1 510 502 6497 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -------- Original Message -------- >>>> Subject: RE: [Pasig-discuss] Risks of encryption & compression >>>> built >>>> into storage options? >>>> From: Jeanne Kramer-Smyth >>> > >>>> Date: Thu, March 16, 2017 1:44 pm >>>> To: "gail at trumantechnologies.com >>>> " >>>> >>> >, "Robert >>>> Spindler" >, >>>> "pasig-discuss at mail.asis.org " >>>> > >>>> >>>> Thanks Gail & Rob for your replies. 
>>>> >>>> I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. >>>> >>>> I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version to the original version vs. the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored. >>>> >>>> I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. >>>> >>>> I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks?
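Jeanne's question of whether the storage layer returns exactly what was sent can be checked independently of vendor claims with a fixity comparison: record a digest when an object goes into storage, recompute it on what comes back, and treat any mismatch as damage, whatever its cause (transparent encryption, compression, deduplication, or plain bit rot). A minimal Python sketch, assuming SHA-256 from the standard `hashlib` module and objects reachable as local files; `sha256_of` and `fixity_matches` are hypothetical helper names, not part of any storage product:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_matches(digest_at_ingest: str, retrieved: Path) -> bool:
    """True only if the retrieved object is bit-identical to what was ingested."""
    return sha256_of(retrieved) == digest_at_ingest
```

A match only shows that this object round-tripped correctly at the time of the check; it does not reveal how often the storage layer repaired data internally, which is why vendor-side event reporting would still be useful. Keeping the ingest digests outside the storage system under test avoids trusting that system to vouch for itself.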
>>>> >>>> Thank you, >>>> >>>> Jeanne >>>> >>>> *Jeanne Kramer-Smyth* >>>> *IT Officer, Information Management Services II* >>>> *Information and Technology Solutions* >>>> *WBG Library & Archives of Development* >>>> T 202-473-9803 >>>> E jkramersmyth at worldbankgroup.org >>>> W www.worldbank.org >>>> Twitter: spellboundblog >>>> Skype: jkramersmyth >>>> LinkedIn: jkramersmyth >>>> A 1818 H St NW Washington, DC 20433 >>>> >>>> *From:* gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] >>>> *Sent:* Thursday, March 16, 2017 3:18 PM >>>> *To:* Robert Spindler; Jeanne Kramer-Smyth; pasig-discuss at mail.asis.org >>>> *Subject:* RE: [Pasig-discuss] Risks of encryption & compression built into storage options? >>>> >>>> Hi all, a good topic! >>>> >>>> There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data. >>>> >>>> Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne).
This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. Which, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible. >>>> >>>> Gail >>>> >>>> Gail Truman >>>> Truman Technologies, LLC >>>> Certified Digital Archives Specialist, Society of American Archivists >>>> /*Protecting the world's digital heritage for future generations*/ >>>> www.trumantechnologies.com >>>> facebook/TrumanTechnologies >>>> https://www.linkedin.com/in/gtruman >>>> +1 510 502 6497 >>>> >>>> -------- Original Message -------- >>>> Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? >>>> From: Robert Spindler >>>> Date: Thu, March 16, 2017 9:06 am >>>> To: Jeanne Kramer-Smyth, "pasig-discuss at mail.asis.org" >>>> >>>> At the risk of starting a conversation, here are a couple of basic issues from an archival standpoint: >>>> >>>> Encryption: Who has the keys, and what happens should a provider go out of business? >>>> >>>> Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue.
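Rob's compression question, and the broader one of what a single bit error does to compressed data, can be illustrated with a small experiment. The Python sketch below uses the standard `zlib` module only as a stand-in (an assumption: real storage-layer compression, e.g. on LTO drives, uses its own algorithms and framing, so nothing here describes a specific product), and the helper names are hypothetical. It flips one bit in raw data and one bit in the zlib-compressed form of the same data:

```python
import zlib
from typing import Optional


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with one bit inverted (a simulated bit-rot event)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def bytes_changed(a: bytes, b: bytes) -> int:
    """Count differing byte positions; any length difference also counts as damage."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def damage_if_compressed(original: bytes, bit_index: int) -> Optional[int]:
    """Flip one bit in the zlib-compressed form of `original`, then decompress.

    Returns the number of damaged bytes, or None if zlib rejects the stream
    (for example because its header or Adler-32 check no longer matches).
    """
    corrupted = flip_bit(zlib.compress(original), bit_index)
    try:
        restored = zlib.decompress(corrupted)
    except zlib.error:
        return None
    return bytes_changed(original, restored)


record = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 100

# Uncompressed copy: one flipped bit damages exactly one byte.
print(bytes_changed(record, flip_bit(record, 1000)))  # 1

# Compressed copy: try a single-bit flip at every position of the compressed
# stream and count how many positions leave it undecodable.
n_bits = len(zlib.compress(record)) * 8
unreadable = sum(damage_if_compressed(record, i) is None for i in range(n_bits))
print(f"{unreadable} of {n_bits} single-bit flips made the stream unreadable")
```

In the raw copy the damage stays confined to one byte. In the compressed copy a flip usually makes zlib reject the whole stream, because its header check and Adler-32 checksum no longer match; that is loud failure rather than silent corruption, but it also means one bit can cost the entire object unless another copy exists.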
>>>> >>>> Short story: At a CNI meeting perhaps 15 years ago, in a session about e-books, I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached the public domain. Crickets. >>>> >>>> Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics. >>>> >>>> Rob Spindler >>>> University Archivist and Head >>>> Archives and Special Collections >>>> Arizona State University Libraries >>>> Tempe AZ 85287-1006 >>>> 480.965.9277 >>>> http://www.asu.edu/lib/archives >>>> >>>> *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of* Jeanne Kramer-Smyth >>>> *Sent:* Thursday, March 16, 2017 8:54 AM >>>> *To:* pasig-discuss at mail.asis.org >>>> *Subject:* [Pasig-discuss] Risks of encryption & compression built into storage options? >>>> >>>> Is anyone aware of active research into the risks to digital preservation that are posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome. >>>> >>>> I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space).
>>>> >>>> I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account. >>>> >>>> Thank you! >>>> >>>> Jeanne >>>> >>>> *Jeanne Kramer-Smyth* >>>> *IT Officer, Information Management Services II* >>>> *Information and Technology Solutions* >>>> *WBG Library & Archives of Development* >>>> T 202-473-9803 >>>> E jkramersmyth at worldbankgroup.org >>>> W www.worldbank.org >>>> Twitter: spellboundblog >>>> Skype: jkramersmyth >>>> LinkedIn: jkramersmyth >>>> A 1818 H St NW Washington, DC 20433 >>>> >>>> -------------------------------------------------------------------------------- >>>> ---- >>>> To subscribe, unsubscribe, or modify your subscription, please visit >>>> http://mail.asis.org/mailman/listinfo/pasig-discuss >>>> _______ >>>> PASIG Webinars and conference material is at >>>> http://www.preservationandarchivingsig.org/index.html >>>> _______________________________________________ >>>> Pasig-discuss mailing list >>>> Pasig-discuss at mail.asis.org >>>> http://mail.asis.org/mailman/listinfo/pasig-discuss
>>> >>> -- >>> ---------------------------------------------------- >>> Chris Wood >>> Storage & Data Management >>> Office: 408-782-2757 (Home Office) >>> Office: 408-276-0730 (Work Office) >>> Mobile: 408-218-7313 (Preferred) >>> Email: lw85381 at yahoo.com >>> ---------------------------------------------------- >> >> -- >> ---------------------------------------------------- >> Chris Wood >> Storage & Data Management >> Office: 408-782-2757 (Home Office) >> Office: 408-276-0730 (Work Office) >> Mobile: 408-218-7313 (Preferred) >> Email: lw85381 at yahoo.com >> ---------------------------------------------------- >> > -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.addis at arkivum.com Mon Mar 20 03:28:21 2017 From: matthew.addis at arkivum.com (Matthew Addis) Date: Mon, 20 Mar 2017 07:28:21 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <38ec899b-8104-1f22-ae7c-25af0dad29a4@yahoo.com> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com> <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> <38ec899b-8104-1f22-ae7c-25af0dad29a4@yahoo.com> Message-ID: Hi Chris, Jos, There are some examples of the effects that bit-flips and other data corruption have on compressed AV content in a report from the PrestoPRIME project. There are some links in there to work by Heydegger and others, e.g.
the impact of bit errors on JPEG2000. The report mainly covers AV, but there are some references in there to other compressed file formats, e.g. work by CERN on problems opening zips after bit errors. See page 57 onwards. https://eprints.soton.ac.uk/373760/1/373760.pdf This was followed up by work in the DAVID project, which did a more extensive survey of how AV content gets corrupted in practice within big AV archives. Note that bit errors from storage, a.k.a. bit rot, were not a significant issue, at least not compared with all the other problems! http://david-preservation.eu/wp-content/uploads/2013/10/DAVID-D2-1-INA-WP2-DamageAssessment_v1-20.pdf The reports above cover some aspects of compression at the file-format level (JPEG, zip, etc.) and not compression at the hardware level (e.g. LTO data tape). At Arkivum we turn compression off at the hardware level and instead let our clients choose whether to use compression at the application level. In practice, most people using our service already have compressed file formats, especially images and video, because the reduced data volumes save storage, bandwidth, etc. in their day-to-day workflows. Trying to add compression on top, e.g. at the LTO level, rarely adds any benefit. Cheers, Matthew Matthew Addis Chief Technology Officer tel: +44 1249 405060 mob: +44 7703 393374 email: matthew.addis at arkivum.com web: www.arkivum.com twitter: @arkivum This message is confidential unless otherwise stated. Arkivum Limited is registered in England and Wales, company number 7530353. Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom From: Pasig-discuss on behalf of Chris Wood Date: Monday, 20 March 2017 04:15 To: "jos.vanwezel at kit.edu" Cc: "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi Jos: I remember getting a nice hard copy of the booklet. I don't know if MPEG ever made it public.
I thought that by now some institution would have posted it, but I can't find it (yet). Still looking. Your comments about "other" bad things happening are spot on. In an IBM study several of us did about 20 years ago on the causal agents of data loss, human error won by a huge margin. In last place (fewest causal factors) was H/W failure. In between, in rough order, were application and data management software, incorrect documentation, device firmware (we used to call this microcode when we still had dial phones :-)), external events (power failures, storms, whatever) and a few other categories I forget. I do remember our RAS (Reliability, Availability and Serviceability) expert making the point that perfect replication code replicates corrupted data perfectly. Even more true today. You might find this a quick, interesting read: Why did NASA TRIPLEX all computers in the Space Shuttle and have two separate vendors write the code for them, with a sophisticated voting system for cases of non-agreement? https://www.nap.edu/read/2222/chapter/5 It seemed to work fine, but the inter-booster gaskets did not, and it turned out the insulation tiles were not very good at resisting foreign-object impact. A good example of unknown and completely unexpected failure modes. CW On 3/19/2017 3:41 PM, van Wezel, Jos (SCC) wrote: Hi Chris, thanks a lot. The paper is fun reading, especially the part about the analog movie archive :-)). Hopefully you do find the MPEG paper. My searches returned nothing yet. (Was it ever published in some way?) @all: Having read all posts thus far (great stuff, guys), clearly the engineering approach to the problem does not cut it at all. Reading between the lines, there seems to be a lot of experience with disasters where even a BER of 10^99 and 4 copies won't help. :-) For now we'll stick with 2 copies, and 3 if requested explicitly by the client. Regards, Jos On 17/03/2017 17:48, Chris Wood wrote: Jos: I just knew somebody would ask this. Ha.
Several years ago, several of us wrote a paper for MPEG (the Moving Picture Experts Group), and a mathematician named Jeff Bonwick figured out all the math. I haven't found it yet in the junk heap of my PC, but did find a companion paper written by the same set of authors. It's not exactly what you are looking for, but close. It's more about bit error rates at a rather low level. I will continue to look for the MPEG paper. It's got to be somewhere. The Internet "never forgets", right? Stay tuned as I keep looking. CW On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote: Chris, do you happen to have any reference to the mathematical correctness or computation showing that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional cost of a third copy without the math. Currently I can't tell my customers how much (as a percentage) extra security an additional copy will bring, even theoretically. regards jos Sent from my Samsung Galaxy smartphone. -------- Original message -------- From: Chris Wood Date: 17/03/2017 02:07 (GMT+01:00) To: "Raymond A. Clarke", gail at trumantechnologies.com, 'Jeanne Kramer-Smyth', 'Robert Spindler', pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Thanks Ray, as always, for a great summary. Now my three bits: Three (3) copies, please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (remote) is more expensive, but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience.
Four+ are all examples of ever-diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit-rot and data corruption as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of the transit points. This is something that current technologies can help with, in part. Back to the original question: how do we insure against corruption, whether from compression, encryption, and/or transmission? Well, disk and tape (/data resting places/, if you will) have come a very long way in reducing bit-error rates and in compression and encryption. But the "/resting places/" are only part of the problem. In accordance with Gail's suggestion, and as Dr. Rosenthal has coined it, LOCKSS ("lots of copies keep stuff safe"). Take good care, Raymond *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of* gail at trumantechnologies.com *Sent:* Thursday, March 16, 2017 5:10 PM *To:* Jeanne Kramer-Smyth; Robert Spindler; pasig-discuss at mail.asis.org *Subject:* Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hello again, Jeanne, I think you're hitting on something that needs to be raised with vendors (and pushed for), and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out. I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object, if it's been chunked) has been repaired/self-healed, or lost forever. I could then record this as a PREMIS event.
As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if their durability and data-corruption figures start to get really bad for whatever reason. I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors, storage solution vendors, and cloud storage vendors have run the numbers, but am not sure if they've made them public. This alias has people from Oracle, Seagate and other storage companies on it, so I encourage them to please share any research they have on this. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists /*Protecting the world's digital heritage for future generations*/ www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Jeanne Kramer-Smyth Date: Thu, March 16, 2017 1:44 pm To: "gail at trumantechnologies.com", "Robert Spindler", "pasig-discuss at mail.asis.org" Thanks Gail & Rob for your replies. I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive that requires reversal of the auto-encryption and auto-compression in place at the system level mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version to the original version vs. the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored.
I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? Thank you, Jeanne *Jeanne Kramer-Smyth* *IT Officer, Information Management Services II* *Information and Technology Solutions* *WBG Library & Archives of Development* T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 *From:* gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] *Sent:* Thursday, March 16, 2017 3:18 PM *To:* Robert Spindler; Jeanne Kramer-Smyth; pasig-discuss at mail.asis.org *Subject:* RE: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi all, a good topic! There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data.
Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. Which, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists /*Protecting the world's digital heritage for future generations*/ www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Robert Spindler Date: Thu, March 16, 2017 9:06 am To: Jeanne Kramer-Smyth, "pasig-discuss at mail.asis.org" At the risk of starting a conversation, here are a couple of basic issues from an archival standpoint: Encryption: Who has the keys, and what happens should a provider go out of business? Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue. Short story: At a CNI meeting perhaps 15 years ago, in a session about e-books, I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached the public domain. Crickets. Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics.
Rob Spindler University Archivist and Head Archives and Special Collections Arizona State University Libraries Tempe AZ 85287-1006 480.965.9277 http://www.asu.edu/lib/archives *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of* Jeanne Kramer-Smyth *Sent:* Thursday, March 16, 2017 8:54 AM *To:* pasig-discuss at mail.asis.org *Subject:* [Pasig-discuss] Risks of encryption & compression built into storage options? Is anyone aware of active research into the risks to digital preservation that are posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome. I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space). I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account. Thank you!
Jeanne *Jeanne Kramer-Smyth* *IT Officer, Information Management Services II* *Information and Technology Solutions* *WBG Library & Archives of Development* T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work
Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.burnhill at ed.ac.uk Sun Mar 19 21:02:45 2017 From: peter.burnhill at ed.ac.uk (BURNHILL Peter) Date: Mon, 20 Mar 2017 01:02:45 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com>, <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> Message-ID: <8F21139E-65C4-4763-B3D5-E585F6049C2D@ed.ac.uk> Jos, My take would be a strong recommendation that you don't stick with 2 and that you go for 3 replicates instead, doing what you can to have each held under separate conditions. Peter Burnhill On 19 Mar 2017, at 10:51 pm, van Wezel, Jos (SCC) wrote: Hi Chris, thanks a lot. The paper is fun reading, especially the part about the analog movie archive :-)). Hopefully you do find the MPEG paper. My searches returned nothing yet. (Was it ever published in some way?) @all: Having read all posts thus far (great stuff, guys), clearly the engineering approach to the problem does not cut it at all. Reading between the lines, there seems to be a lot of experience with disasters where even a BER of 10^99 and 4 copies won't help. :-) For now we'll stick with 2 copies, and 3 if requested explicitly by the client. Regards, Jos On 17/03/2017 17:48, Chris Wood wrote: Jos: I just knew somebody would ask this. Ha.
Several years ago, several of us wrote a paper for MPEG (the Moving Picture Experts Group), and a mathematician named Jeff Bonwick figured out all the math. I haven't found it yet in the junk heap of my PC, but did find a companion paper written by the same set of authors. It's not exactly what you are looking for, but close. It's more about bit error rates at a rather low level. I will continue to look for the MPEG paper. It's got to be somewhere. The Internet "never forgets", right? Stay tuned as I keep looking. CW On 3/17/2017 12:48 AM, van Wezel, Jos (SCC) wrote: Chris, do you happen to have any reference to the mathematical correctness or computation showing that 3 copies is optimal? Is the proof based on the standard ECC values that vendors list for their components (tapes, disks, transport lines, memory, etc.)? I'm asking because it's difficult to argue for the additional cost of a third copy without the math. Currently I can't tell my customers how much (as a percentage) extra security an additional copy will bring, even theoretically. regards jos Sent from my Samsung Galaxy smartphone. -------- Original message -------- From: Chris Wood Date: 17/03/2017 02:07 (GMT+01:00) To: "Raymond A. Clarke", gail at trumantechnologies.com, 'Jeanne Kramer-Smyth', 'Robert Spindler', pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Thanks Ray, as always, for a great summary. Now my three bits: Three (3) copies, please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (remote) is more expensive, but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience.
Four or more are all examples of ever-diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit rot and data corruption as they relate to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of these transit points. This is something that current technologies can help with, in part. Back to the original question: how do we guard against corruption, whether from compression, encryption, and/or transmission? Well, disk and tape ("data resting places", if you will) have come a very long way in reducing bit error rates, and in compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion, and as Dr. Rosenthal has coined it: LOCKSS ("Lots Of Copies Keep Stuff Safe"). Take good care, Raymond *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of* gail at trumantechnologies.com *Sent:* Thursday, March 16, 2017 5:10 PM *To:* Jeanne Kramer-Smyth; Robert Spindler; pasig-discuss at mail.asis.org *Subject:* Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hello again, Jeanne, I think you're hitting on something that needs to be raised with (and pushed for with) vendors, and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out. I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object if it's been chunked) has been repaired/self-healed, or lost forever. I could then record this as a PREMIS event.
As you know, vendors "design for" 11 nines or 13 nines of durability, but their SLAs do not require them to tell us if durability degrades and data corruption starts to get really bad for whatever reason. I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors, storage solution vendors, and cloud storage vendors have run the numbers, but I am not sure whether they've been made public. This alias has people from Oracle, Seagate, and other storage companies on it, so I encourage them to please share any research they have on this. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists /*Protecting the world's digital heritage for future generations*/ www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Jeanne Kramer-Smyth Date: Thu, March 16, 2017 1:44 pm To: "gail at trumantechnologies.com", "Robert Spindler", "pasig-discuss at mail.asis.org" Thanks Gail & Rob for your replies. I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. I do wonder if there are increased risks of bit rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive, which requires reversing the auto-encryption and auto-compression in place at the system level, mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version back to the original version vs. the decreased "handling" it would require if what I am pulling off the storage device is exactly what I sent to be stored.
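Jeanne's question, whether undoing system-level compression on retrieval makes a flipped bit more damaging, can be illustrated with a toy experiment. This is a minimal sketch, assuming Python's standard `zlib` as a stand-in for a storage system's internal compression (hardware compression such as LTO drive compression works differently) and a single inverted bit as the model of bit rot; the function names are hypothetical, chosen for illustration. It flips one bit in the stored form of the same payload, once stored raw and once stored compressed, reports what comes back, and shows the end-to-end fixity check (a SHA-256 digest recorded at ingest and recomputed on retrieval) that detects damage whichever layer caused it.

```python
import hashlib
import random
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (a simulated bit-rot event)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def bytes_differing(a: bytes, b: bytes) -> int:
    """Count differing byte positions; every byte of length mismatch also counts."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def fixity_ok(data: bytes, expected_sha256: str) -> bool:
    """End-to-end fixity check against the digest recorded at ingest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


def damage_after_flip(original: bytes, compress: bool, bit_index: int) -> str:
    """Flip one bit in the stored form of original and describe what is retrieved."""
    if not original:
        raise ValueError("need a non-empty payload")
    stored = zlib.compress(original) if compress else original
    damaged = flip_bit(stored, bit_index % (8 * len(stored)))
    if not compress:
        return f"{bytes_differing(original, damaged)} byte(s) differ"
    try:
        restored = zlib.decompress(damaged)
    except zlib.error as exc:
        # The decoder, or zlib's Adler-32 trailer check, rejected the stream,
        # so a one-shot decompress hands back nothing at all.
        return f"unreadable ({exc})"
    return f"{bytes_differing(original, restored)} byte(s) differ"


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so the payload is repeatable
    words = ["lots", "of", "copies", "keep", "stuff", "safe", "bit", "rot"]
    payload = " ".join(rng.choice(words) for _ in range(20_000)).encode()
    ingest_digest = hashlib.sha256(payload).hexdigest()  # recorded at deposit

    for compress in (False, True):
        label = "compressed" if compress else "uncompressed"
        print(f"{label:>12}: {damage_after_flip(payload, compress, bit_index=4000)}")
    print("fixity of untouched copy:", fixity_ok(payload, ingest_digest))
```

In the uncompressed case exactly one byte changes. In the compressed case a one-shot `zlib.decompress` will typically reject the whole stream, because the decoded output no longer matches zlib's Adler-32 trailer. That is the error-amplification concern in miniature, and it is why fixity is best checked against digests of the bytes actually deposited rather than trusted to the storage layer.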
I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic" things being done to digital objects in the process of their being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? Thank you, Jeanne *Jeanne Kramer-Smyth* *IT Officer, Information Management Services II* *Information and Technology Solutions* *WBG Library & Archives of Development* T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 *From:* gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] *Sent:* Thursday, March 16, 2017 3:18 PM *To:* Robert Spindler; Jeanne Kramer-Smyth; pasig-discuss at mail.asis.org *Subject:* RE: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi all, a good topic! There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data.
Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking. So, for archival purposes, we need to weigh the data security provided by cloud storage services using server-side encryption against the risk of the vendor managing the encryption keys. This, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists /*Protecting the world's digital heritage for future generations*/ www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Robert Spindler Date: Thu, March 16, 2017 9:06 am To: Jeanne Kramer-Smyth, "pasig-discuss at mail.asis.org" At the risk of starting a conversation, here are a couple of basic issues from an archival standpoint: Encryption: Who has the keys, and what happens should a provider go out of business? Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue. Short story: At a CNI meeting perhaps 15 years ago, in a session about e-books, I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached the public domain. Crickets. Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics.
Rob Spindler University Archivist and Head Archives and Special Collections Arizona State University Libraries Tempe AZ 85287-1006 480.965.9277 http://www.asu.edu/lib/archives *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] *On Behalf Of* Jeanne Kramer-Smyth *Sent:* Thursday, March 16, 2017 8:54 AM *To:* pasig-discuss at mail.asis.org *Subject:* [Pasig-discuss] Risks of encryption & compression built into storage options? Is anyone aware of active research into the risks to digital preservation posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome. I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is read back off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost-efficient (taking up less space). As we make our choices for long-term storage of permanent digital records, I want to be sure that we take these risks into account. Thank you!
Jeanne *Jeanne Kramer-Smyth* *IT Officer, Information Management Services II* *Information and Technology Solutions* *WBG Library & Archives of Development* T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth LinkedIn: jkramersmyth A 1818 H St NW Washington, DC 20433 -------------------------------------------------------------------------------- ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work
Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -- Steinbuch Centre for Computing (SCC) KIT - Campus Nord Hermann von Helmholtzplatz 1 76344 Eggenstein - Leopoldshafen Tel. +49 721 60826305 Building 449, Room 122 Orcid ID: 0000-0003-0175-6216 ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: not available URL: From lw85381 at yahoo.com Mon Mar 20 14:23:03 2017 From: lw85381 at yahoo.com (Chris Wood) Date: Mon, 20 Mar 2017 11:23:03 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com> <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> <38ec899b-8104-1f22-ae7c-25af0dad29a4@yahoo.com> Message-ID: <6780eeba-728a-44c7-e6c9-584f4376773f@yahoo.com> Thank you, Matthew. Very interesting. CW On 3/20/2017 12:28 AM, Matthew Addis wrote: > Hi Chris, Jos, > > There are some examples of the effects that bit flips and other data > corruption have on compressed AV content in a report from the > PrestoPRIME project. There are some links in there to work by Heydegger > and others, e.g. on the impact of bit errors on JPEG2000. The report mainly > covers AV, but there are some references in there to other > compressed file formats, e.g. work by CERN on problems opening zips > after bit errors. See page 57 onwards.
> https://eprints.soton.ac.uk/373760/1/373760.pdf > > This was followed up by work in the DAVID project that did a more > extensive survey of how AV content gets corrupted in practice within > big AV archives. Note that bit errors from storage, a.k.a. bit rot, > were not a significant issue, at least not compared with all the other > problems! > http://david-preservation.eu/wp-content/uploads/2013/10/DAVID-D2-1-INA-WP2-DamageAssessment_v1-20.pdf > > The reports above cover some aspects of compression at the file-format > level (JPEG, zip, etc.) and not compression at the hardware level (e.g. > LTO data tape). At Arkivum we turn compression off at the hardware > level and instead let our clients choose whether or not to use compression at > the application level. In practice, most people using our service > already use compressed file formats, esp. for images and video, because > the reduced data volumes save storage, bandwidth, etc. in > their day-to-day workflows. Trying to add compression on top, > e.g. at the LTO level, rarely adds any benefit. > > Cheers, > > Matthew > > > Matthew Addis > > Chief Technology Officer > > tel: +44 1249 405060 > > mob: +44 7703 393374 > > email: matthew.addis at arkivum.com > > web: www.arkivum.com > > twitter: @arkivum > > This message is confidential unless otherwise stated. > > Arkivum Limited is registered in England and Wales, company number > 7530353. Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom > > > From: Pasig-discuss on behalf of Chris Wood > Date: Monday, 20 March 2017 04:15 > To: "jos.vanwezel at kit.edu" > Cc: "pasig-discuss at mail.asis.org" > Subject: Re: [Pasig-discuss] Risks of encryption & compression built > into storage options? > > Hi Jos: > > I remember getting a nice hard copy of the booklet. I don't know if > MPEG ever made it public. I thought that by now some institution would > have posted it, but I can't find it (yet). Still looking.
> > Your comments about "other" bad things happening are spot on. In an IBM > study several of us did about 20 years ago on the causal agents of data loss, > human error won by a huge margin. In last place (fewest causal > factors) was H/W failure. In between, in rough order, were application > and data management software, incorrect documentation, device firmware > (we used to call this microcode when we still had dial phones :-)), > external events (power failures, storms, whatever), and a few other > categories I forget. I do remember our RAS (Reliability, > Availability and Serviceability) expert making the point that perfect > replication code replicates corrupted data perfectly. Even more true > today. > > You might find this a quick, interesting read: why NASA TRIPLEXed all > computers in the Space Shuttle and had two separate vendors write the > code for them, with a sophisticated voting system for cases of > non-agreement. https://www.nap.edu/read/2222/chapter/5 > It seemed to work fine, but the inter-booster gaskets did not, and it > turned out the insulation tiles were not very good at resisting foreign object > impact. > A good example of unknown and completely unexpected failure modes. > > CW > > On 3/19/2017 3:41 PM, van Wezel, Jos (SCC) wrote: >> Hi Chris, thanks a lot. The paper is fun reading, especially the part about the >> analog movie archive :-)). Hopefully you do find the MPEG paper. My >> searches have returned nothing yet. (Was it ever published in some way?) >> >> @all: Having read all posts thus far (great stuff, guys), clearly the >> engineering approach to the problem does not cut it at all. Reading >> between the lines, there seems to be a lot of experience with >> disasters where even a BER of 10^99 and 4 copies won't help. :-) For >> now we'll stick with 2 copies, and 3 if requested explicitly by the >> client. >> >> Groet >> >> Jos >> >> >> On 17/03/2017 17:48, Chris Wood wrote: >>> Jos: >>> >>> I just knew somebody would ask this. Ha.
Several years ago several >>> of us wrote >>> a paper for the MPEG (Motion Pictures Expert Group) and a >>> mathematician named >>> Jeff Bonwick figured out all the math. I haven't found it yet in >>> the junk heap >>> of my PC, but did find a companion paper written by by the same set >>> of authors. >>> It's not exactly, what you are looking for, but close. It's more >>> about Bit Error >>> Rates at a rather low level. I will continue to look for the MPEG >>> paper. It's >>> got to be somewhere. The Internet "never forgets" Right? >>> Stay tuned as I keep looking. >>> >>> CW >>> >>> -- >>> ---------------------------------------------------- >>> Chris Wood >>> Storage & Data Management >>> Office: 408-782-2757 (Home Office) >>> Office: 408-276-0730 (Work Office) >>> Mobile: 408-218-7313 (Preferred) >>> Email: lw85381 at yahoo.com >>> ---------------------------------------------------- >>> >> > > -- > ---------------------------------------------------- > Chris Wood > Storage & Data Management > Office: 408-782-2757 (Home Office) > Office: 408-276-0730 (Work Office) > Mobile: 408-218-7313 (Preferred) > Email:lw85381 at yahoo.com > ---------------------------------------------------- -- ---------------------------------------------------- Chris Wood Storage & Data Management Office: 408-782-2757 (Home Office) Office: 408-276-0730 (Work Office) Mobile: 408-218-7313 (Preferred) Email: lw85381 at yahoo.com ---------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From dshr at stanford.edu Mon Mar 20 17:27:48 2017 From: dshr at stanford.edu (David Rosenthal) Date: Mon, 20 Mar 2017 14:27:48 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
In-Reply-To: <8F21139E-65C4-4763-B3D5-E585F6049C2D@ed.ac.uk> References: <9758a8c7f6644c81bc8ad030c5540dff@kit-msx-29.kit.edu> <98f7186f-6bef-6f1b-82cd-ad2e4b662f9b@yahoo.com> <4ad27211-7696-55ef-646b-6d17f07c0e95@kit.edu> <8F21139E-65C4-4763-B3D5-E585F6049C2D@ed.ac.uk> Message-ID: <82420722-d32e-b8f0-c4d4-36b5ab5c1f65@stanford.edu> On 03/19/2017 06:02 PM, BURNHILL Peter wrote: > My take would be a strong recommendation that you don't stick with 2 & > that you go for 3 replicates instead, doing what you can to have > each held under separate conditions. Here are two of my blog posts on the question of "How few copies?" From 2011: http://blog.dshr.org/2011/03/how-few-copies.html From 2016: http://blog.dshr.org/2016/04/how-few-copies.html responding to this work from MIT: https://github.com/MIT-Informatics/PreservationSimulation David. From gail at trumantechnologies.com Mon Mar 20 19:31:29 2017 From: gail at trumantechnologies.com (gail at trumantechnologies.com) Date: Mon, 20 Mar 2017 16:31:29 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? Message-ID: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> An HTML attachment was scrubbed... URL: From matthew.addis at arkivum.com Tue Mar 21 04:24:30 2017 From: matthew.addis at arkivum.com (Matthew Addis) Date: Tue, 21 Mar 2017 08:24:30 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> References: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> Message-ID: Interesting! There's a growing number of cloud services that provide bit-level preservation, which is good news as it's evidence that there's a growing market. 
Along with Arkivum (largely UK but we do have customers in the US), there's also DuraCloud (US) and some interesting "shared service" options in specific domains, e.g. DPN for scholarly outputs. The guys at AVPreserve in the US profiled some of these using the NDSA preservation levels (https://www.avpreserve.com/papers-and-presentations/cloud-storage-vendor-profiles/) and the TNA in the UK commissioned a similar review (http://www.nationalarchives.gov.uk/documents/CloudStorage-Guidance_March-2015.pdf). There's also a bit about cloud for digital preservation in the DPC handbook which links to some vendors and case studies (http://www.dpconline.org/handbook/technical-solutions-and-tools/cloud-services) I'm not aware of a recent survey of bit-preservation in the cloud - does anyone have any pointers? Cheers, Matthew Matthew Addis Chief Technology Officer tel: +44 1249 405060 mob: +44 7703 393374 email: matthew.addis at arkivum.com web: www.arkivum.com twitter: @arkivum This message is confidential unless otherwise stated. Arkivum Limited is registered in England and Wales, company number 7530353. Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom From: Pasig-discuss on behalf of "gail at trumantechnologies.com" Date: Monday, 20 March 2017 23:31 To: Michal Růžička, "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Michal /all - I'm aware of a cloud vendor who has rolled out preservation services (including SHA-256 fixity checks, BagIt for transport, and some other features). Their data centers are in the US, so probably not useful for .cz, but others on this alias may find this useful. 
Check out http://www.komodocloud.com/TruStore.html Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Michal Růžička Date: Fri, March 17, 2017 12:06 pm To: > Dear all, I am very interested in this discussion. One short comment first and two questions next: I do not think erasure coding is a good idea in the LTP system as it significantly increases the complexity of the system and coding (increasing the probability of an error in implementation/process/...) and increases interconnections of the data between the multiple storage areas. I am a big fan of isolation, independence and as-simple-as-possible coding of the data replicas. Now questions: 1. What is the best LTP implementation methodology you can recommend to me? I do not mean the OAIS itself but practical recommendations on concrete implementations, methods and procedures for a relatively small (<< 1 PB) data archive. 2. The Ceph distributed storage was mentioned in the below cited e-mail. I am aware of the use of Ceph at the Dutch National Archive (http://widodh.o.auroraobjects.eu/talks/ceph_dutch_national_archive_2016.pdf#page=11&zoom=page-fit,-177,595). What do you think about the use of Ceph in an LTP system? Do you have any experience with Ceph in practice or a strong opinion on this technology? All the best, Michal On 17.3.2017 at 15:49, Paul Mather wrote: > On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) wrote: > >> Chris, >> do you happen to have any reference to the mathematical correctness or >> computation that 3 copies is optimal? 
Is the proof based on the standard >> ECC values that vendors list for their components (tapes, disks, >> transport lines, memory, etc.)? I'm asking because it's difficult to >> argue for the additional costs of a third copy without the math. >> Currently I can't tell my customers how much (as a percentage) extra >> security an additional copy will bring, even theoretically. > > One thing I don't believe I've seen mentioned so far with regard to > redundancy costs is switching to erasure-resilient coding rather than > using plain replication. Explained briefly, erasure-resilient coding > represents a logical unit of data as k fragments. These k fragments > are then encoded into a larger unit of n fragments, n > k, where the > n-k extra fragments can be thought of as "parity" fragments. The n > encoded fragments may then be distributed across different disks, > racks, and data centres. The value is that *any* k out of n fragments > may be used to reconstitute the original logical unit of data. For a > fixed k, as n grows larger the probability of total data loss grows smaller, while > the storage overhead (n/k) and cost grow larger, allowing you to > choose your cost/risk balance. The main disadvantage of > erasure-resilient coding is that data I/O latency is increased due to > the inherently distributed nature of the storage approach. There are > comparisons between replication and erasure-resilient coding systems. > One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We > show that systems employing erasure codes have mean time to failures > many orders of magnitude higher than replicated systems with similar > storage and bandwidth requirements. More importantly, erasure-resilient > systems use an order of magnitude less bandwidth and storage to provide > similar system durability as replicated systems." > > Erasure-resilient coding is becoming mainstream in cloud storage and > object storage systems in general. 
I believe that Hadoop has recently > acquired an erasure-resilient coding storage option for HDFS as an > alternative to the standard replication model. This is driven by the > increase in data set sizes, where erasure-resilient coding can offer > lower redundancy overheads than plain replication while still > offering the same or higher assurance levels on data availability. I > also believe Ceph and OpenStack Swift support erasure-resilient > storage. > > Cheers, > > Paul. -- --------------------------------------------------------------- Michal Růžička Phone: +420 549 49 6834 Aleph Library Management System Library Information Centre, Institute of Computer Science Masaryk University, Czech Republic Office number C308, Botanická 68a, 602 00 Brno OpenPGP key: https://kic-internal.ics.muni.cz/~ruzicka/pgp-key/ Fingerprint: 4791 027A B994 A183 C28C 9B89 33C1 5D8C 293E 15A9 --------------------------------------------------------------- From william.kilbride at dpconline.org Tue Mar 21 04:32:07 2017 From: william.kilbride at dpconline.org (William Kilbride) Date: Tue, 21 Mar 2017 08:32:07 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> Message-ID: Hi All, There was quite a bit of thinking about this in the context of the TIMBUS project: Mike Nolan / Intel were particularly interested in the topic then, as were colleagues at SAP. 
I am not sure to what extent the research was developed into a service, and the reports written then are a bit dated now, but I can follow up offline if colleagues are interested. W :-) 
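Jos van Wezel's question (how much does an additional copy buy, even theoretically?) and Paul Mather's k-of-n description of erasure-resilient coding can be made concrete with a toy model. The Python sketch below is illustrative and not taken from any poster or product: it assumes every replica or fragment is lost independently with the same probability p over some period, with no repair in between, and the function names and the value p = 0.01 are hypothetical. Under that assumption n full replicas are lost with probability p^n, while a k-of-n code is lost only when more than n - k fragments are gone.

```python
from math import comb


def p_loss_replication(p: float, copies: int) -> float:
    """Probability that every one of `copies` full replicas is lost.

    Assumes (hypothetically) independent losses, each with probability p.
    """
    return p ** copies


def p_loss_erasure(p: float, k: int, n: int) -> float:
    """Probability that a k-of-n erasure-coded object is lost.

    The object survives while at least k of its n fragments survive,
    so it is lost when more than n - k fragments are lost.
    Same independence assumption as above.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return sum(
        comb(n, lost) * p**lost * (1 - p) ** (n - lost)
        for lost in range(n - k + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.01  # hypothetical chance that one copy/fragment is lost in the period
    for copies in (2, 3, 4):
        print(f"{copies} replicas: {copies:.2f}x storage, "
              f"P(loss) = {p_loss_replication(p, copies):.1e}")
    for k, n in ((4, 6), (6, 9), (10, 14)):
        print(f"{k}-of-{n} erasure code: {n / k:.2f}x storage, "
              f"P(loss) = {p_loss_erasure(p, k, n):.1e}")
```

With p = 0.01, three replicas (3x storage) give a loss probability of 1e-6, while a 6-of-9 code gives roughly 1.2e-6 at 1.5x storage, which points in the same direction as the comparison Paul quotes. The independence assumption is the weak point: shared firmware, a single vendor, one site or one operator error can take out several copies at once, so real-world figures will be worse, which is one argument for holding each copy under separate conditions.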
From SKlein at gc.cuny.edu Tue Mar 21 08:15:16 2017 From: SKlein at gc.cuny.edu (Klein, Stephen) Date: Tue, 21 Mar 2017 12:15:16 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> References: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> Message-ID: Preservica and DuraCloud are both cloud based and provide these services. From gail at trumantechnologies.com Tue Mar 21 10:21:32 2017 From: gail at trumantechnologies.com (Gail Truman) Date: Tue, 21 Mar 2017 07:21:32 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> Message-ID: <65F61138-24AE-4339-8FD6-D01423FA7B00@trumantechnologies.com> Hi, yes, that's true, but it seems that they (DuraCloud and Preservica) run on Amazon and other public clouds, whereas, by the look of it, this offering comes from the cloud vendor itself. And they appear to have addressed how to import large volumes of data. Interesting. Gail Truman Truman Technologies, LLC Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 5026497 > On Mar 21, 2017, at 5:15 AM, Klein, Stephen wrote: > > Preservica and DuraCloud are both cloud based and provide these services. 
From gail at trumantechnologies.com Tue Mar 21 10:27:39 2017 From: gail at trumantechnologies.com (Gail Truman) Date: Tue, 21 Mar 2017 07:27:39 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> Message-ID: There's an NDSA working group that is currently trying to assess cloud vendor practices around fixity, durability promises, etc. Some people on this PASIG alias are part of the team and might want to chime in (I am one of those on the NDSA team). 
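Fixity checking, one of the cloud vendor practices being assessed here (Gail's earlier message also cites SHA-256 fixity checks and BagIt for transport), comes down to recomputing a checksum and comparing it with a stored value. The Python sketch below assumes a BagIt-style `manifest-sha256.txt` whose lines hold a hex digest followed by a path relative to the bag directory. The function names are hypothetical. The sketch does not decode the percent-encoding that the BagIt specification (RFC 8493) requires for some characters in file paths, and it does not describe any vendor's implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, streamed in chunks so large objects fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(bag_dir: Path) -> dict[str, str]:
    """Return {relative_path: problem} for every manifest entry that fails.

    An empty dict means every file listed in manifest-sha256.txt is present
    and matches its recorded digest.
    """
    problems: dict[str, str] = {}
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # Digest first, then the path (which may itself contain spaces).
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != expected.lower():
            problems[rel_path] = "checksum mismatch"
    return problems


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        payload = bag / "data" / "record.txt"
        payload.write_bytes(b"permanent record\n")
        (bag / "manifest-sha256.txt").write_text(
            f"{sha256_of(payload)}  data/record.txt\n", encoding="utf-8"
        )
        print(check_manifest(bag))  # {} : everything matches
        payload.write_bytes(b"permanent recorc\n")  # simulate silent corruption
        print(check_manifest(bag))  # {'data/record.txt': 'checksum mismatch'}
```

A scheduled job running a check like this against each copy is what turns "we keep N copies" into "we know N copies are still good"; a failed entry is repaired from a copy that still verifies.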
Also there's a preservation storage group outside of the NDSA team (but with member overlap) that started prior to iPRES last year, did a workshop at iPRES, and meets regularly, with plans to publish findings and attend this coming iPRES. Folks from that team are also on this PASIG alias and can speak up too. Gail Gail Truman Truman Technologies, LLC Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 > On Mar 21, 2017, at 1:24 AM, Matthew Addis wrote: > > Interesting! There's a growing number of cloud services that provide bit-level preservation, which is good news as it's evidence that there's a growing market. Along with Arkivum (largely UK but we do have customers in the US), there's also DuraCloud (US) and some interesting "shared service" options in specific domains, e.g. DPN for scholarly outputs. > > The guys at AVPreserve in the US profiled some of these using the NDSA preservation levels (https://www.avpreserve.com/papers-and-presentations/cloud-storage-vendor-profiles/) and the TNA in the UK commissioned a similar review (http://www.nationalarchives.gov.uk/documents/CloudStorage-Guidance_March-2015.pdf). There's also a bit about cloud for digital preservation in the DPC handbook which links to some vendors and case studies (http://www.dpconline.org/handbook/technical-solutions-and-tools/cloud-services) > > I'm not aware of a recent survey of bit-preservation in the cloud - does anyone have any pointers? > > Cheers, > > Matthew > > Matthew Addis > Chief Technology Officer > > tel: > +44 1249 405060 > mob: > +44 7703 393374 > email: > matthew.addis at arkivum.com > web: > www.arkivum.com > twitter: @arkivum > > This message is confidential unless otherwise stated. > Arkivum Limited is registered in England and Wales, company number 7530353.
Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom > > From: Pasig-discuss on behalf of "gail at trumantechnologies.com" > Date: Monday, 20 March 2017 23:31 > To: Michal Růžička , "pasig-discuss at mail.asis.org" > Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? > > Michal/all - I'm aware of a cloud vendor who has rolled out preservation services (including SHA-256 fixity checks, BagIt for transport, and some other features). Their data centers are in the US, so probably not useful for .cz, but others on this alias may find this useful. > > Check out http://www.komodocloud.com/TruStore.html > > Gail > > > > > > Gail Truman > Truman Technologies, LLC > Certified Digital Archives Specialist, Society of American Archivists > > Protecting the world's digital heritage for future generations > www.trumantechnologies.com > facebook/TrumanTechnologies > https://www.linkedin.com/in/gtruman > > +1 510 502 6497 > > > > > -------- Original Message -------- > Subject: Re: [Pasig-discuss] Risks of encryption & compression built > into storage options? > From: Michal Růžička > Date: Fri, March 17, 2017 12:06 pm > To: > > Dear all, > > I am very interested in this discussion. One short comment first and > two questions next: > > I do not think erasure coding is a good idea in the LTP system, as it > significantly increases the complexity of the system and coding > (increases probability of an error in implementation/process/...) and > increases interconnections of the data between the multiple storage > areas. I am a big fan of isolation, independence and > as-simple-as-possible coding of the data replicas as much as possible. > > Now the questions: > > 1. What is the best LTP implementation methodology you can recommend > to me? I do not mean the OAIS itself but practical recommendations on > concrete implementations, methods and procedures for a relatively small > (<< 1 PB) data archive. > > 2.
Ceph distributed storage was mentioned in the e-mail cited > below. I am aware of Ceph's use in the Dutch National Archive > (http://widodh.o.auroraobjects.eu/talks/ceph_dutch_national_archive_2016.pdf#page=11&zoom=page-fit,-177,595). > What do you think about the use of Ceph in an LTP system? Do you have > any experience with Ceph in practice or a strong opinion on this technology? > > All the best, > Michal > > > On 17.3.2017 at 15:49, Paul Mather wrote: > > On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) > > wrote: > > > >> Chris, > >> do you happen to have any reference to the mathematical correctness or > >> computation that 3 copies is optimal? Is the proof based on the standard > >> ECC values that vendors list with their components (tapes, disks, > >> transport lines, memory etc.)? I'm asking because it's difficult to > >> argue for the additional costs of a third copy without the math. > >> Currently I can't tell my customers how much (as in percentage) extra > >> security an additional copy will bring, even theoretically. > > > > One thing I don't believe I've seen mentioned so far with regard to > > redundancy costs is switching to erasure-resilient coding rather than > > using plain replication. Explained briefly, erasure-resilient coding > > represents a logical unit of data as k fragments. These k fragments > > are then encoded into a larger unit of n fragments, n > k, where the > > n-k extra fragments can be thought of as "parity" fragments. The n > > encoded fragments may then be distributed across different disks, > > racks, and data centres. The value is that *any* k out of n fragments > > may be used to reconstitute the original logical unit of data. As n > > grows larger, the probability of total data loss grows smaller, and, > > conversely, the storage overhead and cost grow larger, allowing you to > > choose your cost/risk balance.
The main disadvantage of > > erasure-resilient coding is that data I/O latency is increased due to > > the inherently distributed nature of the storage approach. There are > > comparisons between replication and erasure-resilient coding systems. > > One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We > > show that systems employing erasure codes have mean time to failures > > many orders of magnitude higher than replicated systems with similar > > storage and bandwidth requirements. More importantly, erasure-resilient > > systems use an order of magnitude less bandwidth and storage to provide > > similar system durability as replicated systems." > > > > Erasure-resilient coding is becoming mainstream in Cloud storage and > > object storage systems in general. I believe that Hadoop has recently > > acquired an erasure-resilient coding storage option for HDFS as an > > alternative to the standard replication model. This is due to the > > increase in data set sizes, where erasure-resilient coding can offer > > lower redundancy overheads than plain replication options, while still > > offering the same or higher assurance levels on data availability. I > > also believe Ceph and OpenStack Swift are supporting erasure-resilient > > storage. > > > > Cheers, > > > > Paul. > > > -- > --------------------------------------------------------------- > Michal Růžička > Phone: +420 549 49 6834 > Aleph Library Management System > Library Information Centre, Institute of Computer Science > Masaryk University, Czech Republic > Office number C308, Botanická
68a, 602 00 Brno > OpenPGP key: https://kic-internal.ics.muni.cz/~ruzicka/pgp-key/ > Fingerprint: 4791 027A B994 A183 C28C 9B89 33C1 5D8C 293E 15A9 > --------------------------------------------------------------- > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gail at trumantechnologies.com Tue Mar 21 10:43:20 2017 From: gail at trumantechnologies.com (Gail Truman) Date: Tue, 21 Mar 2017 07:43:20 -0700 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com> Message-ID: <2FED5A6F-245E-4032-9502-68ADD1EF0603@trumantechnologies.com> There's an NDSA working group that's currently trying to assess cloud vendor practices around fixity, durability promises, etc. Some people on this PASIG thread alias are part of the team and might want to chime in (I am one of those on the NDSA team).
Also there's a preservation storage group outside of the NDSA team (but with member overlap) that started prior to iPRES last year, did a workshop at iPRES, and meets regularly, with plans to publish findings and attend this coming iPRES. Folks from that team are also on this PASIG alias and can speak up too. Gail Gail Truman Truman Technologies, LLC Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 > On Mar 21, 2017, at 1:24 AM, Matthew Addis wrote: > > Interesting! There's a growing number of cloud services that provide bit-level preservation, which is good news as it's evidence that there's a growing market. Along with Arkivum (largely UK but we do have customers in the US), there's also DuraCloud (US) and some interesting "shared service" options in specific domains, e.g. DPN for scholarly outputs. > > The guys at AVPreserve in the US profiled some of these using the NDSA preservation levels (https://www.avpreserve.com/papers-and-presentations/cloud-storage-vendor-profiles/) and the TNA in the UK commissioned a similar review (http://www.nationalarchives.gov.uk/documents/CloudStorage-Guidance_March-2015.pdf). There's also a bit about cloud for digital preservation in the DPC handbook which links to some vendors and case studies (http://www.dpconline.org/handbook/technical-solutions-and-tools/cloud-services) > > I'm not aware of a recent survey of bit-preservation in the cloud - does anyone have any pointers? > > Cheers, > > Matthew > > Matthew Addis > Chief Technology Officer > > tel: > +44 1249 405060 > mob: > +44 7703 393374 > email: > matthew.addis at arkivum.com > web: > www.arkivum.com > twitter: @arkivum > > This message is confidential unless otherwise stated. > Arkivum Limited is registered in England and Wales, company number 7530353.
Registered Office: 24 Cornhill, London, EC3V 3ND, United Kingdom > > From: Pasig-discuss on behalf of "gail at trumantechnologies.com" > Date: Monday, 20 March 2017 23:31 > To: Michal Růžička , "pasig-discuss at mail.asis.org" > Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? > > Michal/all - I'm aware of a cloud vendor who has rolled out preservation services (including SHA-256 fixity checks, BagIt for transport, and some other features). Their data centers are in the US, so probably not useful for .cz, but others on this alias may find this useful. > > Check out http://www.komodocloud.com/TruStore.html > > Gail > > > > > > Gail Truman > Truman Technologies, LLC > Certified Digital Archives Specialist, Society of American Archivists > > Protecting the world's digital heritage for future generations > www.trumantechnologies.com > facebook/TrumanTechnologies > https://www.linkedin.com/in/gtruman > > +1 510 502 6497 > > > > > -------- Original Message -------- > Subject: Re: [Pasig-discuss] Risks of encryption & compression built > into storage options? > From: Michal Růžička > Date: Fri, March 17, 2017 12:06 pm > To: > > Dear all, > > I am very interested in this discussion. One short comment first and > two questions next: > > I do not think erasure coding is a good idea in the LTP system, as it > significantly increases the complexity of the system and coding > (increases probability of an error in implementation/process/...) and > increases interconnections of the data between the multiple storage > areas. I am a big fan of isolation, independence and > as-simple-as-possible coding of the data replicas as much as possible. > > Now the questions: > > 1. What is the best LTP implementation methodology you can recommend > to me? I do not mean the OAIS itself but practical recommendations on > concrete implementations, methods and procedures for a relatively small > (<< 1 PB) data archive. > > 2.
Ceph distributed storage was mentioned in the e-mail cited > below. I am aware of Ceph's use in the Dutch National Archive > (http://widodh.o.auroraobjects.eu/talks/ceph_dutch_national_archive_2016.pdf#page=11&zoom=page-fit,-177,595). > What do you think about the use of Ceph in an LTP system? Do you have > any experience with Ceph in practice or a strong opinion on this technology? > > All the best, > Michal > > > On 17.3.2017 at 15:49, Paul Mather wrote: > > On Mar 17, 2017, at 3:48 AM, van Wezel, Jos (SCC) > > wrote: > > > >> Chris, > >> do you happen to have any reference to the mathematical correctness or > >> computation that 3 copies is optimal? Is the proof based on the standard > >> ECC values that vendors list with their components (tapes, disks, > >> transport lines, memory etc.)? I'm asking because it's difficult to > >> argue for the additional costs of a third copy without the math. > >> Currently I can't tell my customers how much (as in percentage) extra > >> security an additional copy will bring, even theoretically. > > > > One thing I don't believe I've seen mentioned so far with regard to > > redundancy costs is switching to erasure-resilient coding rather than > > using plain replication. Explained briefly, erasure-resilient coding > > represents a logical unit of data as k fragments. These k fragments > > are then encoded into a larger unit of n fragments, n > k, where the > > n-k extra fragments can be thought of as "parity" fragments. The n > > encoded fragments may then be distributed across different disks, > > racks, and data centres. The value is that *any* k out of n fragments > > may be used to reconstitute the original logical unit of data. As n > > grows larger, the probability of total data loss grows smaller, and, > > conversely, the storage overhead and cost grow larger, allowing you to > > choose your cost/risk balance.
The main disadvantage of > > erasure-resilient coding is that data I/O latency is increased due to > > the inherently distributed nature of the storage approach. There are > > comparisons between replication and erasure-resilient coding systems. > > One such (https://dl.acm.org/citation.cfm?id=687814) concludes, "We > > show that systems employing erasure codes have mean time to failures > > many orders of magnitude higher than replicated systems with similar > > storage and bandwidth requirements. More importantly, erasure-resilient > > systems use an order of magnitude less bandwidth and storage to provide > > similar system durability as replicated systems." > > > > Erasure-resilient coding is becoming mainstream in Cloud storage and > > object storage systems in general. I believe that Hadoop has recently > > acquired an erasure-resilient coding storage option for HDFS as an > > alternative to the standard replication model. This is due to the > > increase in data set sizes, where erasure-resilient coding can offer > > lower redundancy overheads than plain replication options, while still > > offering the same or higher assurance levels on data availability. I > > also believe Ceph and OpenStack Swift are supporting erasure-resilient > > storage. > > > > Cheers, > > > > Paul. > > > -- > --------------------------------------------------------------- > Michal Růžička > Phone: +420 549 49 6834 > Aleph Library Management System > Library Information Centre, Institute of Computer Science > Masaryk University, Czech Republic > Office number C308, Botanická
68a, 602 00 Brno > OpenPGP key: https://kic-internal.ics.muni.cz/~ruzicka/pgp-key/ > Fingerprint: 4791 027A B994 A183 C28C 9B89 33C1 5D8C 293E 15A9 > --------------------------------------------------------------- > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From donna.shawhan at oracle.com Tue Mar 21 11:26:59 2017 From: donna.shawhan at oracle.com (Donna Shawhan) Date: Tue, 21 Mar 2017 08:26:59 -0700 (PDT) Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com> <02a701d29eae$c4722a00$4d567e00$@Verizon.net> <1312844D-7E81-462A-B2E9-8B3E77B99ECC@Verizon.net> Message-ID: A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 13523 bytes Desc: not available URL: From Christina.Tealdi at tessella.com Tue Mar 21 12:01:14 2017 From: Christina.Tealdi at tessella.com (Christina.Tealdi at tessella.com) Date: Tue, 21 Mar 2017 16:01:14 +0000 Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? 
In-Reply-To: References: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com> <02a701d29eae$c4722a00$4d567e00$@Verizon.net> <1312844D-7E81-462A-B2E9-8B3E77B99ECC@Verizon.net> Message-ID: Hi, Please kindly unsubscribe me from your mailing list. I have tried your unsubscribe link but it doesn't seem to be working. Many thanks. Christina Tealdi MCIPR, APRP Senior PR & Marketing Communications Manager E: christina.tealdi at tessella.com | D: +44 (0)1235 546 638 | M: +44 (0) 7799346453 | Skype: Christina.Tealdi | Tessella 26 The Quadrant, Abingdon Science Park, Abingdon, Oxfordshire, OX14 3YS Part of the Altran Group Please consider the environment and do not print this e-mail unless you really need to. This message is commercial in confidence and may be privileged. It is intended for the addressee only. Access to this message by anyone else is unauthorised and strictly prohibited. If you have received this message in error, please inform the sender immediately. Please note that messages sent or received by the Tessella e-mail system may be monitored and stored in an information retrieval system. From: Donna Shawhan To: Jonathan Tilbury Cc: pasig-discuss at mail.asis.org Date: 21/03/2017 15:37 Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Sent by: "Pasig-discuss" ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: image/gif Size: 2888 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2407 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 36121 bytes Desc: not available URL: From donna.shawhan at oracle.com Tue Mar 21 12:15:36 2017 From: donna.shawhan at oracle.com (Donna Shawhan) Date: Tue, 21 Mar 2017 09:15:36 -0700 (PDT) Subject: [Pasig-discuss] Risks of encryption & compression built into storage options? In-Reply-To: References: <20170316140956.b554e26909f2beaf9f8ddbf6be9a6600.ee7a29052e.wbe@email09.godaddy.com> <02a701d29eae$c4722a00$4d567e00$@Verizon.net> <1312844D-7E81-462A-B2E9-8B3E77B99ECC@Verizon.net> Message-ID: <2e884959-2b0d-447c-beb7-562ceb2c5912@default> Oracle provides archive solutions that transparently manage tiered storage infrastructures, from on-premises to cloud environments, to ensure long-term data access and lower costs. The software, hardware, and storage cloud services together address data integrity and security requirements as noted in this email thread. Included features specifically address the requirements stated by Jonathan and others. Fixity: accepts a checksum from the end-user application; supports five industry-standard checksums (MD5, SHA-1, SHA-256, SHA-384, SHA-512); verifies each read, write, and copy of a file from the application or for the archive workflow; validates that the file is unchanged, constant, and stable; stores the checksum in extended file attributes. Automated Data Integrity Validation: sets policy-based DIV audits; verifies on each read, on each write, or at periodic intervals; self-heals automatically (i.e.
if an error is found in one archive copy, the software will find the second archive copy and create a new one in place of the damaged one). Multiple copies: the software manages and stores multiple copies offline and offsite, with support for disk, tape, and cloud. Oracle Storage Cloud Archive Service: Encryption in transit: data is always encrypted in transit with SSL/TLS when storing, accessing, and organizing data. Encryption at rest: client-side or server-side, with customer- or Oracle-managed keys. Opportunities to learn more: Solution Brief: How to Implement Fixity (PDF), http://www.oracle.com/us/products/servers-storage/storage/storage-software/oracle-hsm-fixity-brief-3614882.pdf Solution Brief: Automated Data Integrity Validation (PDF), http://www.oracle.com/us/products/servers-storage/storage/storage-software/sb-storagetek-sam-div-2259409.pdf Video: Ensure long-term data access with Fixity and Automated Data Integrity Validation, https://www.oracle.com/storage/tape-storage/hierarchical-storage-manager/index.html?bcid=4974260789001 https://www.oracle.com/storage/tape-storage/index.html https://www.oracle.com/goto/hsm https://cloud.oracle.com/storage Best, Donna Shawhan From: Jonathan Tilbury [mailto:jonathan.tilbury at preservica.com] Sent: Friday, March 17, 2017 9:29 AM Cc: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? I can give you an example of how we address this for our Preservica Cloud Edition, hosted in Amazon Web Services. Fixity calculation: we calculate this on all files on the source machine (such as your laptop) from which you are loading the content. This is the earliest we can calculate it, as it is the first time we come into contact with the content. You can use one of four fixity algorithms. Geographical separation:
Amazon S3 and Glacier default to storing each object (file) in at least three different data centres within a region, and may have multiple copies in each data centre. The data centres are located in safer locations (e.g. not in earthquake zones), typically within a 10 km radius. It's possible to send copies to another Amazon zone perhaps 1,000s of km away if you consider this too much risk. It's also possible to write copies of the files and the metadata to a remote SFTP server. Fixity checking: each copy is checksummed by both us and Amazon on arrival. We confirm it was not changed on the way in. Amazon check the fixity regularly. If corruption is detected, the system will self-heal from one of the other copies. In addition, Preservica can be set up to cycle through the objects on S3 and do its own fixity checks, to ensure the objects are still there from its own perspective. We also check the fixity when a file is retrieved, to ensure it is still uncorrupted. Encryption at rest: Amazon S3 can be set up to encrypt the information on disk and either manage the keys themselves or leave it to the customer (i.e. us) to manage. This of course introduces a risk of key loss, but it is possible to escrow the keys to ensure they are safe. However, as all you are protecting against is theft of the hardware from the data centre, you may choose not to encrypt the data on disk. Encryption in flight: we recommend setting all information transport to use HTTPS to reduce the risk of packet interception and inspection. I hope this helps. Jon Tilbury CTO, Preservica From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Raymond A. Clarke Sent: 17 March 2017 13:45 To: Chris Wood Cc: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Right on Chris. Thanks. Apologies for any typos, Sent from my iPhone
Take good care, Ray On Mar 16, 2017, at 9:03 PM, Chris Wood wrote: Thanks Ray, as always, for a great summary. Now my three bits: three (3) copies, please, one of which is in a remote location on a different flood plain, electric grid, fault line, etc., for the obvious reasons. Mathematically, this has turned out to be the optimal number when looked at with a cost/benefit mindset. Kind of like: 2 is better than one, but a local problem gets both copies. Three (with one remote) is more expensive, but you get A LOT more data resilience/persistence. Four costs a bunch more, but delivers just a little bit more resilience. Four+ are all examples of ever-diminishing returns. CW On 3/16/2017 4:40 PM, Raymond A. Clarke wrote: Hello All, A few years back, I did some research on bit-rot and data corruption as it relates to the various media that data passes through on its way to and from the user. Consider this simple example: as data moves from memory to HBA to cable to air to cable and so on, bits can be lost along the way at any one, or several, of these transit points. This is something that current technologies can help with, in part. Back to the original question: how do we guard against corruption, whether from compression, encryption, and/or transmission? Well, disk and tape (data resting places, if you will) have come a very long way in reducing bit-error rates, and in compression and encryption. But the "resting places" are only part of the problem. In accordance with Gail's suggestion, and as Dr. Rosenthal has coined it, LOCKSS ("Lots of Copies Keep Stuff Safe"). Take good care, Raymond
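Jos's question earlier in the thread (how much extra protection does each additional copy buy?) and Chris's "three copies" rule can be made concrete with a small calculation. The Python sketch below assumes that every copy or fragment is lost independently, with the same probability p, before it can be repaired; p and the (k, n) layouts are hypothetical values, not vendor figures. Under that assumption, r full copies are lost only if all r fail (p^r), while a (k, n) erasure code, as Paul describes it, loses data only when more than n - k of its n fragments fail. Plain replication is the special case k = 1, n = r.

```python
from math import comb


def p_loss_replication(p: float, copies: int) -> float:
    """Probability that every one of `copies` full replicas is lost,
    assuming each is lost independently with probability p."""
    return p ** copies


def p_loss_erasure(p: float, k: int, n: int) -> float:
    """Probability that a (k, n) erasure-coded object is unrecoverable:
    more than n - k of its n fragments are lost (fewer than k survive),
    assuming each fragment is lost independently with probability p."""
    return sum(
        comb(n, lost) * p**lost * (1 - p) ** (n - lost)
        for lost in range(n - k + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.01  # hypothetical chance of losing one copy before it is repaired
    for copies in (1, 2, 3, 4):
        print(f"{copies} full copies: overhead {copies:.2f}x, "
              f"P(loss) = {p_loss_replication(p, copies):.1e}")
    for k, n in ((6, 9), (10, 14)):  # hypothetical erasure-code layouts
        print(f"({k}, {n}) erasure code: overhead {n / k:.2f}x, "
              f"P(loss) = {p_loss_erasure(p, k, n):.1e}")
```

In this model each extra full copy multiplies the loss probability by p while adding a whole copy's worth of storage, which is the cost/benefit trade-off Chris describes. The independence assumption is the weak point: a shared flood plain, power grid, vendor, or software bug makes failures correlated, which is why Chris insists on one remote copy and why Michal prefers simple, isolated replicas. Treat the output as an optimistic estimate, not a durability guarantee.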
From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of gail at trumantechnologies.com Sent: Thursday, March 16, 2017 5:10 PM To: Jeanne Kramer-Smyth; Robert Spindler; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options? Hello again, Jeanne, I think you're hitting on something that needs to be raised with (and pushed for with) vendors, and that is the need for "more transparency" and the reporting to customers of "events" that are part of the provenance of a digital object. The storage architectures do a good job of error detection and self-healing; however, they do not report this out. I'd like to (this is my dream) have vendors report back to customers (as part of their SLA) when an object (or part of an object, if it's been chunked) has been repaired/self-healed, or lost forever. I could then record this as a PREMIS event. As you know, vendors "design for" 11x9s or 13x9s durability, but their SLAs do not require them to tell us if their durability and data corruption start to get really bad for whatever reason. I've not directly answered your question about whether the encryption, dedupe, compression, and other things that can happen inside a storage system are increasing the risk of corruption. I'll look around. I am sure the disk vendors, storage solution vendors, and cloud storage vendors have run the numbers, but am not sure if they've been made public. This alias has people from Oracle, Seagate and other storage companies on it, so I encourage them to please share any research they have on this. Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? From: Jeanne Kramer-Smyth Date: Thu, March 16, 2017 1:44 pm To: "gail at trumantechnologies.com" , "Robert Spindler" , "pasig-discuss at mail.asis.org" Thanks Gail & Rob for your replies. I am less worried about the scenario of someone stealing a drive; as Rob pointed out, if that is happening we have bigger problems. I do wonder if there are increased risks of bit-rot/file corruption with encryption, compression, and data deduplication. Have there been any studies on this? Could pulling a file off a drive, which requires reversal of the auto-encryption and auto-compression in place at the system level, mean a greater risk of bits flipping? I am trying to contrast the increased "handling" and change required to get from the stored version back to the original version with the decreased "handling" it would require if what I am pulling off the storage device were exactly what I sent to be stored. I am less worried about issues related to not being able to decrypt content. The storage solutions we are contemplating would remain under enough ongoing management that these issues should be avoidable. Since ensuring that non-public records remain secure is also very important, encryption gets some points in the "pro" column. I agree that having multiple copies in different storage architectures and with different vendors would also decrease risk. I want to understand the risks related to the different storage architectures and the ever-increasing number of "automatic"
things being done to digital objects in the process of them being stored and retrieved. Are there people doing work, independent of vendor claims, to document these types of risks? Thank you, Jeanne Jeanne Kramer-Smyth IT Officer, Information Management Services II Information and Technology Solutions WBG Library & Archives of Development T 202-473-9803 E jkramersmyth at worldbankgroup.org W www.worldbank.org Twitter: spellboundblog Skype: jkramersmyth jkramersmyth A 1818 H St NW Washington, DC 20433 From: gail at trumantechnologies.com [mailto:gail at trumantechnologies.com] Sent: Thursday, March 16, 2017 3:18 PM To: Robert Spindler ; Jeanne Kramer-Smyth ; pasig-discuss at mail.asis.org Subject: RE: [Pasig-discuss] Risks of encryption & compression built into storage options? Hi all, a good topic! There is new drive technology from Seagate (and probably other manufacturers) called "Self-Encrypting Drives" (SEDs), which can be used to solve the problem of a person stealing a drive and running off with data. Most cloud services now automatically provide "server-side encryption", which means the vendor is doing the encryption for all data at rest (as you point out, Jeanne). This is required by HIPAA for all health care data, and is now considered best practice for cloud vendors due to the very real risk of hacking.
So, for archival, we need to weigh the data security provided by cloud storage services using server side encryption against the risk of the vendor managing the encryption keys. Which, IMO, underscores the importance of having multiple copies of all your archival data, with different vendors and storage architectures or media types if possible.

Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists

Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: Re: [Pasig-discuss] Risks of encryption & compression built into storage options?
From: Robert Spindler
Date: Thu, March 16, 2017 9:06 am
To: Jeanne Kramer-Smyth, pasig-discuss at mail.asis.org

At risk of starting a conversation, here are a couple of basic issues from an archival standpoint:

Encryption: Who has the keys, and what happens should a provider go out of business?

Compression: Lossy or lossless, and how does that compression act on different file formats (video/audio)? If this is frequently accessed material, it becomes more of an issue.

Short story: At a CNI meeting perhaps 15 years ago, in a session about ebooks, I asked a panel of vendors if they would give up the keys to encrypted e-books when they reached the public domain. Crickets.

Physical discs are not secure given the forensics software widely available today, but if someone can grab a physical disc, the provider has more problems than forensics.

Rob Spindler
University Archivist and Head
Archives and Special Collections
Arizona State University Libraries
Tempe AZ 85287-1006
480.965.9277
http://www.asu.edu/lib/archives
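Rob's lossy-versus-lossless question can be made concrete with a small experiment. Compression applied transparently by a storage system has to be lossless, since the storage layer does not know what file format the bytes belong to. The Python sketch below uses the standard-library `zlib` module (DEFLATE) purely as a stand-in for whatever codec a given product actually uses, and the function name `lossless_round_trip` is hypothetical. It checks that decompressed bytes hash identically to the input and reports the size ratio, which also hints at why already-compressed video and audio gain little from a second compression pass.

```python
import hashlib
import os
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def lossless_round_trip(data: bytes, level: int = 6) -> tuple[bool, float]:
    """Compress and decompress data with zlib (DEFLATE).

    Returns (identical, ratio): identical is True if the restored bytes have
    the same SHA-256 digest as the input; ratio is compressed size divided by
    original size. zlib is only a stand-in; real storage products use their
    own (lossless) codecs.
    """
    if not data:
        raise ValueError("data must be non-empty")
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    identical = sha256_hex(restored) == sha256_hex(data)
    ratio = len(compressed) / len(data)
    return identical, ratio


if __name__ == "__main__":
    # Highly repetitive text compresses well.
    text = b"Permanent digital record. " * 4000
    # Assumption: random bytes approximate already-compressed media
    # (e.g. video/audio), which a second lossless pass cannot shrink.
    media_like = os.urandom(100_000)

    for label, payload in [("text", text), ("media-like", media_like)]:
        identical, ratio = lossless_round_trip(payload)
        print(f"{label}: identical={identical}, size ratio={ratio:.3f}")
```

Random bytes are only a rough proxy for compressed media; actual space savings depend on the codec and on the collection being stored.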
From: Pasig-discuss [mailto:pasig-discuss-bounces at asis.org] On Behalf Of Jeanne Kramer-Smyth
Sent: Thursday, March 16, 2017 8:54 AM
To: pasig-discuss at mail.asis.org
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?

Is anyone aware of active research into the risks to digital preservation posed by built-in encryption and compression in both cloud and on-prem storage options? Any and all go-to sources for research and reading on these topics would be very welcome.

I am being told by the staff who source storage solutions for my organization that encryption and compression are generally included at the hardware level: content is automatically encrypted and compressed as it is written to disc, and then decrypted and decompressed as it is pulled off disc in response to a request. It is advertised as both more secure (someone stealing a physical disc could not, in theory, extract its contents) and more cost efficient (taking up less space).

I want to be sure that, as we make our choices for long-term storage of permanent digital records, we take these risks into account.

Thank you!
Jeanne

Jeanne Kramer-Smyth
IT Officer, Information Management Services II
Information and Technology Solutions
WBG Library & Archives of Development
T 202-473-9803
E jkramersmyth at worldbankgroup.org
W www.worldbank.org
Twitter: spellboundblog  Skype: jkramersmyth
A 1818 H St NW Washington, DC 20433

_____
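Whether the extra "handling" of transparent encryption and compression adds risk can be monitored without relying on vendor claims by checking fixity end to end: record a checksum before an object goes to storage and recompute it after retrieval, whatever the storage layer does in between. The Python sketch below is illustrative only. `HypotheticalTransparentStore`, `ingest`, and `verify` are made-up names; the store applies `zlib` compression on write and decompression on read as a stand-in for a real product; and encryption is left out because the Python standard library has no block cipher. The `flip_bit` helper simulates bit rot in the stored form, showing that in compressed data a single flipped bit can make the whole object fail to decompress rather than altering just one byte.

```python
import hashlib
import zlib


class HypotheticalTransparentStore:
    """Toy stand-in for a storage layer that compresses on write and
    decompresses on read. Encryption is omitted (no block cipher in the
    standard library); a real product would apply it at the same points.
    """

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blocks[key] = zlib.compress(data)

    def get(self, key: str) -> bytes:
        # Raises zlib.error if the stored bytes can no longer be decompressed.
        return zlib.decompress(self._blocks[key])

    def flip_bit(self, key: str, byte_index: int, bit: int) -> None:
        """Simulate bit rot by flipping one bit of the stored (compressed) form."""
        stored = bytearray(self._blocks[key])
        stored[byte_index] ^= 1 << bit
        self._blocks[key] = bytes(stored)


def ingest(store: HypotheticalTransparentStore, key: str, data: bytes,
           manifest: dict[str, str]) -> None:
    """Record a SHA-256 fixity value before handing the object to storage."""
    manifest[key] = hashlib.sha256(data).hexdigest()
    store.put(key, data)


def verify(store: HypotheticalTransparentStore, key: str,
           manifest: dict[str, str]) -> bool:
    """Return True only if the retrieved object matches its ingest checksum."""
    try:
        retrieved = store.get(key)
    except zlib.error:
        return False  # the stored form is too damaged to decompress at all
    return hashlib.sha256(retrieved).hexdigest() == manifest[key]


if __name__ == "__main__":
    store = HypotheticalTransparentStore()
    manifest: dict[str, str] = {}
    record = b"Board minutes, permanent record. " * 500
    ingest(store, "record-001", record, manifest)
    print("intact:", verify(store, "record-001", manifest))
    store.flip_bit("record-001", byte_index=10, bit=3)
    print("after one flipped bit:", verify(store, "record-001", manifest))
```

Detection is not repair: the checksum only reveals that a copy has changed, which is one reason to keep the multiple independent copies discussed in this thread.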
----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

--
----------------------------------------------------
Chris Wood
Storage & Data Management
Office: 408-782-2757 (Home Office)
Office: 408-276-0730 (Work Office)
Mobile: 408-218-7313 (Preferred)
Email: lw85381 at yahoo.com
----------------------------------------------------

From dshr at stanford.edu Thu Mar 23 11:38:53 2017
From: dshr at stanford.edu (David Rosenthal)
Date: Thu, 23 Mar 2017 08:38:53 -0700
Subject: [Pasig-discuss] Risks of encryption & compression built into storage options?
In-Reply-To:
References: <20170320163129.b554e26909f2beaf9f8ddbf6be9a6600.e70d8e7aef.wbe@email09.godaddy.com>
Message-ID: <92ea01ce-6f4b-58ca-8466-a607cfd42acb@stanford.edu>

I summarized my position on this discussion, and its relationship to trusted repository certification, in a blog post here:

http://blog.dshr.org/2017/03/threats-to-stored-data.html

David.

From kyle.rimkus at gmail.com Mon Mar 20 11:38:52 2017
From: kyle.rimkus at gmail.com (Kyle Rimkus)
Date: Mon, 20 Mar 2017 10:38:52 -0500
Subject: [Pasig-discuss] Call for Residents - Univ. of IL at Urbana-Champaign Library
Message-ID:

Apologies for cross-posting.

The University Library at the University of Illinois at Urbana-Champaign is launching a residency program for new professionals (http://www.library.illinois.edu/residency), with two-year visiting positions available in the areas of African American Studies, Data Visualization, Digital Humanities Pedagogy, and Digital Preservation, all slated to start August 16. Please apply, or encourage your favorite early-career librarian to do so, by April 17.

*Visiting Residency Librarian and Visiting Assistant Professor (Four Positions)*
*University Library, University of Illinois at Urbana-Champaign*

The University of Illinois at Urbana-Champaign's Library is pleased to announce a new opportunity for early-career librarians to gain professional experience and mentoring through a new library residency program. The residency program is offered to librarians within two years of receiving their degrees. The University Library seeks opportunities to help early-career librarians embark on successful careers in academic and research libraries. Through the program, the residency librarian will gain in-depth work experience in academic librarianship, will be introduced to academic library administration, and will gain experience designing, conducting, and sharing the results of a research project.
As part of a cohort of new professionals, the resident will benefit from mentoring and the opportunity to work closely with a group of individuals in the University Library. Individuals who hope to help the library advance and who are interested in developing themselves as professionals and scholars are encouraged to apply.

Time period: August 16, 2017 through August 15, 2019.

We are recruiting to fill four two-year positions with individuals who desire to build their skills and contribute to one of the following four areas of strategic importance and need to the institution: African American Studies, Data Visualization, Digital Humanities Pedagogy, and Digital Preservation. Information on the projected responsibilities of the four positions is available here: http://www.library.illinois.edu/residency.

*Environment*

The University of Illinois at Urbana-Champaign (UIUC) Library is a leader in the delivery of user services, with active programs in information, instructional, access, and scholarly services that help the Library maintain its place at the intellectual heart of the campus. The Library also holds one of the preeminent research collections in the world, encompassing more than 13 million volumes and a total of more than 23 million items. The Library is committed to maintaining the strongest collections and service programs possible, and to engaging in research, development, and scholarly practice, all of which support the University's missions of teaching, research, and public engagement. The Library employs approximately 90 faculty members and more than 300 academic professionals, staff, and graduate assistants. For more information, see: http://www.library.illinois.edu/

*QUALIFICATIONS:*

*Required:*
- ALA-accredited master's degree in library and information science, or an equivalent degree, received in 2015-2017, with the degree conferred by August 16, 2017.
- All successful applicants will have a demonstrated ability to work collegially and cooperatively with others in a team environment.
- All successful applicants will have a demonstrated ability to communicate effectively in writing, as evidenced by their cover letter.
- Familiarity with or demonstrated interest in the area(s) of librarianship relevant to the specific residency positions in which the candidate has an interest.

*Preferred:*
- African American Studies Resident:
  o Additional advanced degree in a humanities or social sciences discipline, with a focus on African American Studies;
  o Familiarity with or demonstrated interest in digital publishing and scholarly communications;
  o Teaching experience or experience conducting training;
  o Familiarity with collection development in an academic library setting.
- Data Analytics and Visualization Resident:
  o Coursework or experience in data visualization;
  o Familiarity with data visualization tools (e.g., Tableau, Splunk, R);
  o Familiarity with best practices in data visualization;
  o Coursework or experience in statistical analysis;
  o Familiarity with conducting training and teaching, and developing program materials;
  o Demonstrated ability to remain conversant with newly evolving technologies.
- Digital Humanities Pedagogy Resident:
  o Knowledge of or demonstrated experience with research methods and tools in digital humanities, especially for text analysis or digital publishing;
  o Demonstrated experience or familiarity with teaching workshops or conducting other types of training events, especially for digital tools;
  o Demonstrated experience with instructional design or development of program materials;
  o Ability to remain conversant with newly evolving technologies.
- Digital Preservation Resident:
  o Training or professional experience in digital preservation and born-digital content processing and/or data curation;
  o Knowledge of best and evolving practices for providing access to content stored in proprietary, obsolete, and threatened file formats;
  o Ability to install and evaluate computer programs;
  o Demonstrated interest in developing digital preservation procedures and policy;
  o Strong project management and research skills.

*Position Available:* The expected start date for the four Visiting Resident Librarians is August 16, 2017.

*Salary and Rank:* The salary for all four positions is $50,000. A relocation allowance will be provided to offset documented expenses. Successful candidates will join the University Library as Visiting Assistant Professors.

*Terms of Appointment:* Twelve-month appointment; 24 annual vacation days; 11 annual paid holidays; 12 annual sick-leave days (cumulative), plus an additional 13 sick-leave days (non-cumulative) available, if needed, each year; health insurance requiring a small co-payment is provided to the employee (with the option to purchase coverage for spouse and dependents); required participation in the State Universities Retirement System (SURS) (8% of annual salary is withheld and is refundable upon termination), with several options for participation in additional retirement plans; newly hired employees are covered by the Medicare portion of Social Security and are subject to its deduction.

*Campus & Community:* The University of Illinois at Urbana-Champaign is a comprehensive and major public land-grant university (Doctoral/Research University-Extensive) that is ranked among the best in the world. Chartered in 1867, it provides undergraduate and graduate education in more than 150 fields of study, conducts theoretical and applied research, and provides public service to the state and the nation.
It employs 3,000 faculty members who serve 31,000 undergraduates and 12,000 graduate and professional students; approximately 25% of faculty receive campus-wide recognition each year for excellence in teaching. More information about the campus is available at www.illinois.edu. The University is located in the twin cities of Champaign and Urbana, which have a combined population of 100,000 and are situated about 140 miles south of Chicago, 120 miles west of Indianapolis, and 170 miles northeast of St. Louis. The University and its surrounding communities offer a cultural and recreational environment ideally suited to the work of a major research institution. For more information about the community, visit http://illinois.edu/about/community/community.html or http://www.ccchamber.org/.

*To Apply:* To ensure full consideration, please create your candidate profile at https://jobs.illinois.edu and upload your letter of interest (detailing which position or positions you are interested in being considered for, and your skills and experiences in that area), curriculum vitae, and contact information (including email addresses) for three professional references. Please see this web page for more information on each position: http://www.library.illinois.edu/residency. Samples of relevant work or links to portfolios are also appreciated. Applications not submitted through this website will not be considered. For questions, please contact Library Human Resources at 217-333-8169.

*Deadline:* To ensure full consideration, applications must be received by April 17, 2017. The review of applications will continue until the positions are filled.

*The University of Illinois conducts criminal background checks on all job candidates upon acceptance of a contingent offer.*

*The University of Illinois is an Equal Opportunity, Affirmative Action employer. Minorities, women, veterans, and individuals with disabilities are encouraged to apply. For more information, visit http://go.illinois.edu/EEO. To learn more about the University's commitment to diversity, please visit http://www.inclusiveillinois.illinois.edu*

--
Kyle R. Rimkus
Preservation Librarian
University of Illinois at Urbana-Champaign