From arthurpasquinelli at gmail.com Mon Nov 13 14:25:13 2017
From: arthurpasquinelli at gmail.com (Arthur Pasquinelli)
Date: Mon, 13 Nov 2017 11:25:13 -0800
Subject: [Pasig-discuss] Dodging the Memory Hole Conference November 15-16, 2017
Message-ID: <6C3EDF7C-4012-4982-AEEE-86C4BC905CF9@gmail.com>

A quick note that the "Dodging the Memory Hole: Saving Online News" conference being held at the Internet Archive in San Francisco this week still has a few spots for attendees. https://www.rjionline.org/events/dodging-the-memory-hole-2017

This is the fifth event in the DTMH conference series focusing on preserving born-digital news content. Its name, Dodging the Memory Hole, comes from George Orwell's "1984," in which photographs and documents conflicting with "Big Brother's" changing narrative were tossed into a "memory hole" and destroyed. Featured speakers will be Brewster Kahle, founder and digital librarian of the Internet Archive, and Daniel Ellsberg, whistleblower and right-to-know advocate.

From jmorley at stanford.edu Mon Nov 13 15:43:12 2017
From: jmorley at stanford.edu (Julian M. Morley)
Date: Mon, 13 Nov 2017 20:43:12 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
Message-ID:

Hi everyone,

I've currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I've spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they'd like to share.

Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case.
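(In practice, "100% compatible S3 API" means the same client code works against every vendor, with only the endpoint swapped out. A minimal sketch of that idea; the endpoint URLs and provider names here are illustrative placeholders, not a verified service list:)

```python
# Sketch: an S3-compatible store is one you can reach by pointing a
# standard S3 client (e.g. boto3) at a different endpoint URL.
# Provider endpoints below are illustrative, not authoritative.

def s3_client_kwargs(provider: str) -> dict:
    """Return keyword arguments you would pass to an S3 client constructor."""
    endpoints = {
        "aws": None,  # the client's default endpoint
        "ibm-cos": "https://s3.us.cloud-object-storage.appdomain.cloud",
        "wasabi": "https://s3.wasabisys.com",
    }
    kwargs = {"service_name": "s3"}
    url = endpoints[provider]
    if url:
        kwargs["endpoint_url"] = url
    return kwargs

# e.g. boto3.client(**s3_client_kwargs("wasabi")) would talk to Wasabi
# with the exact same PUT/GET code used against AWS.
```

Vendors listed below as "not an S3 API" (Google, Azure, Backblaze B2 at the time) are the ones where this swap alone is not enough.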
We're also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/BlackPearl.

The vendors that I'm looking at are:

Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. We have a direct connect pipe to AWS which reduces the pain of data egress considerably.

IBM Cloud Bluemix (formerly CleverSafe). A good choice for multi-region redundancy, as they use erasure coding across regions - no "catch up" replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store).

Dell/EMC Virtustream (no cold storage option). Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units.

Iron Mountain Iron Cloud (Infrequent Access only). Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges.

Oracle Cloud (cheapest cold storage option, but not S3 API). Uses OpenStack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier.

Google Cloud Platform (not an S3 API). Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier.

Microsoft Azure (not an S3 API). Competitively priced, especially their Infrequent Access product, but again not an S3 API, and their vault product is still in beta.

Backblaze B2 (not an S3 API). Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS.

Wasabi Cloud. Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It's cheaper and faster than Glacier, both to store data and egress it, but there are obvious concerns around company longevity. Would probably make a good second target if you have a multi-vendor requirement for your data.

If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up!

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From gail at trumantechnologies.com Mon Nov 13 19:06:42 2017
From: gail at trumantechnologies.com (gail at trumantechnologies.com)
Date: Mon, 13 Nov 2017 17:06:42 -0700
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
Message-ID: <20171113170642.b554e26909f2beaf9f8ddbf6be9a6600.219166ea37.wbe@email09.godaddy.com>

An HTML attachment was scrubbed...
A non-text attachment was scrubbed...
Name: Scality Connect Data Sheet.pdf
Type: application/pdf
Size: 618990 bytes

From jmorley at stanford.edu Mon Nov 13 23:27:46 2017
From: jmorley at stanford.edu (Julian M. Morley)
Date: Tue, 14 Nov 2017 04:27:46 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <20171113170642.b554e26909f2beaf9f8ddbf6be9a6600.219166ea37.wbe@email09.godaddy.com>
Message-ID: <1895412C-8D73-4CD5-AE39-1E36463448DC@stanford.edu>

Hi Gail,

Sure - would be happy to chat with you.
I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us.

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: "gail at trumantechnologies.com"
Date: Monday, November 13, 2017 at 4:06 PM
To: Julian Morley, "pasig-discuss at mail.asis.org"
Cc: "gail at trumantechnologies.com"
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland)....

I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors, and to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds).

Check out a couple of interesting technologies: open source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter 2 are coming shortly) - and also Scality Connect for Azure Blob Storage, which translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/

I'd add Scality to your list -- see the Gartner Magic Quadrant: they're shown in the upper-right Visionary quadrant and are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc.), can tier off to public clouds, and have lots of multi-PB size customer installs. Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb

I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias?
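(The S3-to-Azure translation mentioned above amounts to rewriting each S3 request into its Azure Blob equivalent. A rough sketch for a single PutObject call; the URL shapes follow the two services' public addressing schemes, but the function and everything else here are simplified illustrations, not any product's actual code:)

```python
# Sketch of what an S3-to-Azure-Blob translation layer (the idea behind
# Zenko / Scality Connect) must do for one call: an S3 PutObject on
# https://{bucket}.s3.amazonaws.com/{key} becomes an Azure "Put Blob" on
# https://{account}.blob.core.windows.net/{container}/{blob}.

def s3_put_to_azure(account: str, bucket: str, key: str) -> dict:
    """Map an S3 PutObject request onto an Azure Put Blob request."""
    return {
        "method": "PUT",
        # S3 bucket -> Azure container, S3 key -> Azure blob name
        "url": f"https://{account}.blob.core.windows.net/{bucket}/{key}",
        # Azure's Put Blob requires the blob type header; S3 has no equivalent
        "headers": {"x-ms-blob-type": "BlockBlob"},
    }
```

Authentication is the part this sketch omits entirely: SigV4 request signing on the S3 side has to be verified and re-signed with Azure's shared-key scheme.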
Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
From: "Julian M. Morley"
Date: Mon, November 13, 2017 12:43 pm
To: "pasig-discuss at mail.asis.org"

[...]

----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at
http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

From stuart.lewis at nls.uk Tue Nov 14 04:25:50 2017
From: stuart.lewis at nls.uk (Lewis, Stuart)
Date: Tue, 14 Nov 2017 09:25:50 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <1895412C-8D73-4CD5-AE39-1E36463448DC@stanford.edu>
Message-ID: <5FC5CD8EDDAC964080B0E5B6FC6B6F770C0FF250@W2K8-MAILBOX.natlibofscot.nls.uk>

Hi Julian, Gail, all,

At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive.

The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between.

We're now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow.
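(The multi-copy approach described above - several independent copies of each object, each verifiable on its own - can be sketched in a few lines. This is a minimal illustration with in-memory stand-ins for the storage targets; `MemoryStore`, `preserve` and `verify` are invented names for this sketch, not any vendor's or repository's API:)

```python
import hashlib

class MemoryStore:
    """In-memory stand-in for one storage target (an S3 bucket, a tape pool...)."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]

def checksum(data):
    return hashlib.sha256(data).hexdigest()

def preserve(key, data, stores):
    """Write one copy of `data` to every store; return the expected checksum."""
    for store in stores:
        store.put(key, data)
    return checksum(data)

def verify(key, expected, stores):
    """Independent fixity check: read each copy back and compare checksums."""
    return [checksum(s.get(key)) == expected for s in stores]

# Three independent copies, verified individually:
stores = [MemoryStore(), MemoryStore(), MemoryStore()]
digest = preserve("ark:/12345/abc", b"digitised page image", stores)
assert verify("ark:/12345/abc", digest, stores) == [True, True, True]
```

Because the checksum is held by the repository rather than by any one store, a corrupted or silently altered copy shows up as a single `False` in the verify result and can be repaired from the surviving copies.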
I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two.)

We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete).

We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once.

Critique of this plan is most welcome! Also happy to join in any offline discussion about this.

Best wishes,

Stuart Lewis
Head of Digital
National Library of Scotland
George IV Bridge, Edinburgh EH1 1EW
Tel: +44 (0) 131 623 3704
Email: stuart.lewis at nls.uk
Website: www.nls.uk
Twitter: @stuartlewis

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley
Sent: 14 November 2017 04:28
To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

[...]

National Library of Scotland, Scottish Charity, No: SCO11086
This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.

From jfarmer at cambridgecomputer.com Tue Nov 14 10:03:16 2017
From: jfarmer at cambridgecomputer.com (Jacob Farmer)
Date: Tue, 14 Nov 2017 10:03:16 -0500
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <5FC5CD8EDDAC964080B0E5B6FC6B6F770C0FF250@W2K8-MAILBOX.natlibofscot.nls.uk>
Message-ID: <05c2d35f919f24efa4c53024be14ebce@mail.gmail.com>

Hi, Stuart. I thought I would weigh in on your plans.
I?m a data storage consultant with about 30 years in the game. I?m also the founder of Starfish Storage which makes software products for virtualizing file systems and object stores. ? I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of a stack that starts with the application and ends in the storage media. There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. ? By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. ? Oracle ? They are mysterious about what they are doing under the hood, but it does not matter. It?s a ?cloud?. They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. ? There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily, the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. ? The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. 
They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko who Gail recommended earlier. ? POSIX File System Gateway ? My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4 o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3. ? You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I?m at the SuperComputer conference this week, so replies will be a bit slow. *Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" * Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] *On Behalf Of *Lewis, Stuart *Sent:* Tuesday, November 14, 2017 4:26 AM *To:* 'Julian M. Morley' ; gail at trumantechnologies.com; pasig-discuss at mail.asis.org *Subject:* Re: [Pasig-discuss] Experiences with S3-like object store service providers? 
Hi Julian, Gail, all, At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. We?re now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we?re about to get to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HPC. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two). We?re also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don?t know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete). We?re also purposefully not going to use the object storage system?s in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. Critique of this plan is most welcome! Also happy to join in any offline discussion about this. 
Best wishes, *Stuart LewisHead of DigitalNational Library of Scotland*George IV Bridge, Edinburgh EH1 1EW *Tel:* +44 (0) 131 623 3704 *Email:* stuart.lewis at nls.uk *Website:* www.nls.uk *Twitter:* @stuartlewis [image: cid:image003.jpg at 01D0DE86.3095EC20] *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asist.org ] *On Behalf Of *Julian M. Morley *Sent:* 14 November 2017 04:28 *To:* gail at trumantechnologies.com; pasig-discuss at mail.asis.org *Subject:* Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Gail, Sure - would be happy to chat with you. I?ve got Scality in my list of contenders - didn?t mention it here because my first few use cases are explicitly ?not on campus?, but I agree it?s definitely a fit for our main on prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us. -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries *From: *"gail at trumantechnologies.com" *Date: *Monday, November 13, 2017 at 4:06 PM *To: *Julian Morley , "pasig-discuss at mail.asis.org" < pasig-discuss at mail.asis.org> *Cc: *"gail at trumantechnologies.com" *Subject: *RE: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors. And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds). 
Check out a couple of interesting technologies: Open Source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter 2 are coming shortly), and also Scality Connect for Azure Blog Storage - translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/ I'd add Scality to your list -- see the Gartner magic quadrant they're shown in the Upper Right Visionary quadrant and are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB size customer installs. Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias? Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists *Protecting the world's digital heritage for future generations* www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: [Pasig-discuss] Experiences with S3-like object store service providers? From: "Julian M. Morley" Date: Mon, November 13, 2017 12:43 pm To: "pasig-discuss at mail.asis.org" Hi everyone, I?ve currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I?ve spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they?d like to share. Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. 
We?re also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/Blackpearl. The vendors that I?m looking at are: *Amazon Web Services* (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. We have a direct connect pipe to AWS which reduces the pain of data egress considerably. *IBM Cloud Bluemix* (formerly CleverSafe) A good choice for multi-region redundancy, as they use erasure coding across regions - no ?catch up? replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store). *Dell/EMC Virtustream* (no cold storage option) Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units. *Iron Mountain Iron Cloud* (Infrequent Access only) Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges. *Oracle Cloud* (cheapest cold storage option, but not S3 API) Uses Openstack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier. *Google Cloud Platform* (not an S3 API) Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier. *Microsoft Azure *(not an S3 API) Competitively priced, especially their Infrequent Access product, but again not an S3 API and their vault product is still in beta. 
*Backblaze B2* (not an S3 API)
Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS.

*Wasabi Cloud*
Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. *It's cheaper and faster than Glacier*, both to store data and egress it, but there are obvious concerns around company longevity. Would probably make a good second target if you have a multi-vendor requirement for your data.

If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up!

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at
http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

National Library of Scotland, Scottish Charity, No: SCO11086
This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message. *Before you print please think about the ENVIRONMENT*

-------------- next part --------------
An HTML attachment was scrubbed... 
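Gail's note above about Zenko/Scality Connect "translating S3 API calls to Azure blob storage API calls" can be sketched concretely. The toy function below (bucket, key and account names are hypothetical) shows the essence of mapping an S3 `PUT Object` request onto Azure Blob Storage's `Put Blob` REST form; a real gateway also has to translate authentication signatures, multipart uploads, ACLs and error codes, all of which this sketch omits:

```python
def s3_put_to_azure(bucket: str, key: str, account: str) -> dict:
    """Map an S3 PUT of s3://bucket/key onto an Azure 'Put Blob' request.

    S3 form:    PUT https://{bucket}.s3.amazonaws.com/{key}
    Azure form: PUT https://{account}.blob.core.windows.net/{container}/{blob}
    An S3 bucket maps naturally onto an Azure container.
    """
    return {
        "method": "PUT",
        "url": f"https://{account}.blob.core.windows.net/{bucket}/{key}",
        # Azure requires the caller to declare the blob type; S3 has no
        # equivalent header, so the gateway must supply it.
        "headers": {"x-ms-blob-type": "BlockBlob"},
    }

# Hypothetical object from a preservation workflow:
req = s3_put_to_azure("preservation-masters", "bag-0001/data/0001.tif", "myarchive")
```

The hard part of such gateways is everything outside this sketch: verifying the incoming AWS signature, re-signing for Azure, and mapping S3 multipart uploads onto Azure block lists.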
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 18754 bytes
Desc: not available
URL: 

From gail at trumantechnologies.com Tue Nov 14 12:05:07 2017
From: gail at trumantechnologies.com (gail at trumantechnologies.com)
Date: Tue, 14 Nov 2017 10:05:07 -0700
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
Message-ID: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com>

An HTML attachment was scrubbed...
URL: 

From jmorley at stanford.edu Tue Nov 14 12:49:01 2017
From: jmorley at stanford.edu (Julian M. Morley)
Date: Tue, 14 Nov 2017 17:49:01 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <5FC5CD8EDDAC964080B0E5B6FC6B6F770C0FF250@W2K8-MAILBOX.natlibofscot.nls.uk>
References: <20171113170642.b554e26909f2beaf9f8ddbf6be9a6600.219166ea37.wbe@email09.godaddy.com> <1895412C-8D73-4CD5-AE39-1E36463448DC@stanford.edu> <5FC5CD8EDDAC964080B0E5B6FC6B6F770C0FF250@W2K8-MAILBOX.natlibofscot.nls.uk>
Message-ID: 

Hi Stuart,

Thanks for sharing!

I talked to the Oracle folks quite a bit last year - Oracle Cloud Storage Archive is offline tape. They don't say so/won't publicly admit it, but it's two copies of data on two tapes, with "periodic" CRC checks and recovery. However, a little birdie told me that they're still trying to figure out how to do Cloud for non-Oracle Database customers, so I'm not convinced about the long-term sustainability of the service.

We're moving to four copies - one local (online), three offsite, either at our secondary site or in the cloud. Right now the plan is for at least two of the offsite copies to be "offline" 
- Glacier and Oracle Cloud Storage Archive, with the third copy being either another cold/vault store or on an S3-IA object store.

When you say you're targeting a geo-replicated object store for your two online copies, do you mean that you're counting the geo-replication as a copy mechanism? I'd be concerned about an error on one site propagating to the other replica - in our plan, whilst we intend to have a robust storage system that does EC and/or replication, possibly even to our second site, we're only counting that as one logical copy.

>> We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once.

I came to the same conclusion. Strength in diversity!

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: "Lewis, Stuart"
Date: Tuesday, November 14, 2017 at 1:25 AM
To: Julian Morley, "gail at trumantechnologies.com", "pasig-discuss at mail.asis.org"
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, Gail, all,

At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive.

The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between.

We're now wanting to move to three copies, two online and one offline (in the cloud if possible). 
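Julian's point above - that geo-replication will faithfully propagate a corrupted object to every replica, so replicated storage only counts as one logical copy - is the argument for repository-level fixity checking against an independently stored checksum manifest. A minimal stdlib Python sketch (function and site names are hypothetical, and real repositories would stream large files rather than hold them in memory):

```python
import hashlib

def sha256_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of an object's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_copies(manifest: dict, fetchers: dict) -> dict:
    """Check every copy of every object against the manifest checksum.

    manifest: {object_id: expected_sha256} stored independently of the copies
    fetchers: {copy_name: callable(object_id) -> bytes}
    Returns {copy_name: [object_ids whose checksum did not match]}.
    """
    failures = {name: [] for name in fetchers}
    for obj_id, expected in manifest.items():
        for name, fetch in fetchers.items():
            if sha256_bytes(fetch(obj_id)) != expected:
                failures[name].append(obj_id)
    return failures

# Simulate replication propagating a corruption: both "sites" serve the same
# bad bytes, so comparing the replicas to each other would find nothing --
# only the independent manifest catches it.
good, bad = b"master copy", b"bit rot!"
manifest = {"obj-1": sha256_bytes(good)}
failures = verify_copies(
    manifest,
    {"edinburgh": lambda _id: bad, "glasgow": lambda _id: bad},
)
```

The design point is that the manifest lives in the repository layer, outside any one vendor's storage system, which is exactly why managing copies above the storage stack buys real independence.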
For the online copies we're about to go out to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP.

(** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two.)

We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete).

We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once.

Critique of this plan is most welcome! Also happy to join in any offline discussion about this.

Best wishes,

Stuart Lewis
Head of Digital
National Library of Scotland
George IV Bridge, Edinburgh EH1 1EW
Tel: +44 (0) 131 623 3704
Email: stuart.lewis at nls.uk
Website: www.nls.uk
Twitter: @stuartlewis

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley
Sent: 14 November 2017 04:28
To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Gail,

Sure - would be happy to chat with you.

I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on prem system. 
As with any commercial software, ongoing licensing costs are a potential pain point for us.

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: "gail at trumantechnologies.com"
Date: Monday, November 13, 2017 at 4:06 PM
To: Julian Morley, "pasig-discuss at mail.asis.org"
Cc: "gail at trumantechnologies.com"
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors. And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds). 
From frank.boensch at oracle.com Tue Nov 14 13:17:58 2017
From: frank.boensch at oracle.com (Frank Boensch)
Date: Tue, 14 Nov 2017 10:17:58 -0800 (PST)
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: 
References: <20171113170642.b554e26909f2beaf9f8ddbf6be9a6600.219166ea37.wbe@email09.godaddy.com> <1895412C-8D73-4CD5-AE39-1E36463448DC@stanford.edu> <5FC5CD8EDDAC964080B0E5B6FC6B6F770C0FF250@W2K8-MAILBOX.natlibofscot.nls.uk>
Message-ID: <220a2a62-1eaf-4e30-bbf0-6f6774192aed@default>

All, 
Oracle does offer Archive Storage (Oracle Archive Cloud) to non-Oracle DB customers. Oracle also offers Object Storage in the Cloud for faster access to content. Both of these can be in support of Oracle based applications or non-Oracle based applications (which would be straight cloud storage), either using an API or not.

From a diversity or availability perspective, Oracle has built their platform to support SLAs specific to the service. The Archive Cloud delivers 11 9s data durability by maintaining multiple copies of each object on different devices. Object Storage carries the same SLA.

Hope this helps,

Frank

Frank Boensch
Eastern US Sales Lead - Oracle DIVA
frank.boensch at oracle.com
646-303-5187 
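For context on "11 9s" figures like the one above: durability is usually quoted as the complement of annual object-loss probability (eleven 9s means roughly a 1-in-10^11 chance of losing a given object in a year). A deliberately crude model shows how copy counts relate to nines - it assumes independent failures and, unrealistically, no repair process; real services continuously detect and re-replicate lost copies, which is where most of the quoted durability actually comes from:

```python
import math

def toy_annual_loss(p_copy_loss: float, n_copies: int) -> float:
    """P(all n copies lost within a year), assuming each copy is lost
    independently with probability p_copy_loss and nothing is repaired.
    This is a worst-case toy model, not any vendor's published method."""
    return p_copy_loss ** n_copies

def nines(loss_prob: float) -> int:
    """Express a loss probability as 'number of nines' of durability."""
    return round(-math.log10(loss_prob))

# Illustrative figures only: a 0.01% annual loss chance per copy, 3 copies.
loss = toy_annual_loss(1e-4, 3)
```

Even this pessimistic no-repair model reaches twelve nines with three modestly reliable copies; the practical lesson is that nines multiply with independent copies, which is why Julian's "four copies, managed above the storage layer" plan is conservative in the right way.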
From sfineberg at gmail.com Tue Nov 14 10:32:17 2017
From: sfineberg at gmail.com (Sam Fineberg)
Date: Tue, 14 Nov 2017 10:32:17 -0500
Subject: [Pasig-discuss] SNIA 100 Year Archive Survey
Message-ID: <21f401d35d5d$c1a2a5c0$44e7f140$@gmail.com>

Ten years ago, a SNIA Task Force undertook a 100 Year Archive Requirements Survey with a goal to determine requirements for long-term digital retention in the data center. 
The Task Force hypothesized that the practitioner survey respondents would have experience with terabyte archive systems that would be adequate to define business and operating system requirements for petabyte-sized information repositories in the data center. Time flies while you're having fun. Now it's 2017, and the SNIA Long-Term Retention Technical Working Group (LTR TWG) and the SNIA Data Protection & Capacity Optimization Committee have teamed up to launch the 2017 SNIA Archive Survey.

Back in the "first" decade of the 21st century, practitioners struggled with logical and physical retention, but for the most part generally understood their problems. Eighty percent of organizations participating in the 2007 survey had a need to retain information over 50 years, while 68% reported a need of over 100 years. However, "long term" realistically extended only to about 2017-2022 to migrate and retain readability. After that, survey respondents felt that processes would fail and/or become too costly under an expected avalanche of information.

Fast forward to 2017: new standards, storage formats, and software are in play, and markets like cloud services offer choices which did not exist 10 years ago. Migration and retention solutions are becoming available, but these solutions are not widely used except in government agencies, libraries, and highly regulated industries. Understanding what is needed and why is a focus of SNIA's new survey.

The 2017 survey seeks to assess who needs to retain long-term information and what information needs to be retained, with appropriate policies. The focus will now be on IT best practices, not just business requirements. How is long-term information stored, secured, and preserved? Does the cloud impact long-term retention requirements?

SNIA's 2017 Archive Survey launched at the September 2017 Storage Developer Conference. We're sending out the call. Are you a member of an IT staff associated with archives? 
In Records and Information Management (RIM)? An academic? In Legal or Finance? If long-term data preservation is near and dear to your heart, you'll want to take the survey, which covers business drivers, policies, storage, practices, preservation, security, and more. Help SNIA understand how archive practices have evolved in the last 10 years, what changes have taken place in corporate practices, and what technology changes have impacted daily operations.

----
Sam Fineberg
Sam at Fineberg.net
(650) 319-5727

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From akropilot at gmail.com Tue Nov 14 15:22:14 2017
From: akropilot at gmail.com (Mike Davis)
Date: Tue, 14 Nov 2017 12:22:14 -0800
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com>
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com>
Message-ID: 

Hi Gail,

I appreciate that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, I/O, or networking, it's passed along as a discrete charge to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and the flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing at an average of roughly 10% per year since launch, for example.

But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky I/O-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of *moving* data should never be greater than 10% of the total, or the situation needs to be examined more closely. 
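The 10%-of-total guideline above is easy to sanity-check with quick arithmetic. The sketch below uses an illustrative request price of $0.005 per 1,000 PUT requests - an assumption in the ballpark of 2017 S3 list pricing, not a quoted rate - to show how badly a gateway that chunks objects into 16 KiB pieces inflates request costs for a 1 TiB object, compared with sane 128 MiB multipart parts:

```python
# Illustrative only: per-request price is an assumption, not a quoted rate.
# The point is the multiplier between the two chunk sizes, not the dollars.

KIB = 1024
TIB = 1024 ** 4

def put_request_cost(total_bytes, chunk_bytes, usd_per_1000_puts=0.005):
    """Return (number of PUT requests, request cost in USD) when an object
    is uploaded in fixed-size chunks, one PUT per chunk."""
    n_puts = -(-total_bytes // chunk_bytes)   # ceiling division
    return n_puts, n_puts / 1000 * usd_per_1000_puts

# A single 1 TiB object pushed through a gateway that chunks at 16 KiB:
n, cost = put_request_cost(1 * TIB, 16 * KIB)
# The same object uploaded as 128 MiB multipart parts:
n_sane, cost_sane = put_request_cost(1 * TIB, 128 * 1024 * KIB)
```

Under these assumptions the 16 KiB case generates 2^26 (about 67 million) PUTs - hundreds of dollars in request fees for one terabyte - while the 128 MiB case needs 8,192 PUTs, a few cents. Same data, four orders of magnitude apart, which is exactly the pathology Mike describes next.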
For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, wrote: > Thanks for chiming in Jacob! As always, great additional information. I > think it's worth emphasizing that having an open and native data format > independent of where the data lives is really what will enable > multi-cloud workflow management - along with having federated search of > system and descriptive metadata across the namespace no matter where the > data is stored (including across public and on-prem cloud storage). These > are what the newer cloud controller software products, like Zenko, Starfish, Minio > (and other software within some cloud services), can enable. > > Public cloud storage prices are racing to the bottom, but (as David > Rosenthal and others have pointed out) the "hidden" costs of pulling > the data back will often result in costs greater than a private cloud. > > Stuart - > I just read a couple of Forrester papers on Total Economic Impact (TEI) of > public clouds -- the ones I have URLs to are posted below and make a useful > read: > https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf > https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers > They talk about Dell hardware for building out on-prem clouds (ECS from > EMC and RING from Scality) and I believe you're working with HPE, but the > maths should be similar to show savings over public cloud. That said, > putting one or more copies in public cloud and managing them from one > namespace would be ideal... 
I envision use cases where multi-cloud > controller software will allow you to move data to the cloud service that > fits the data. [Even if it's for long-term archival, there are times when > preservation data services will need to be run (format migration, integrity > checks, creating access copies or derivatives of moving or still images, etc.), > or quick compute services or Hadoop will need to be spun up for other use cases.] > > This is a great topic - Julian and Stuart, all the best on your projects, > please do let this alias know what you decide to go with! > > > Gail > > > Gail Truman > Truman Technologies, LLC > Certified Digital Archives Specialist, Society of American Archivists > > Protecting the world's digital heritage for future generations > www.trumantechnologies.com > facebook/TrumanTechnologies > https://www.linkedin.com/in/gtruman > > +1 510 502 6497 > > > > > -------- Original Message -------- > Subject: RE: [Pasig-discuss] Experiences with S3-like object store > service providers? > From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am > To: "Lewis, Stuart" , "Julian M. Morley" > , gail at trumantechnologies.com, > pasig-discuss at mail.asis.org > > Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage > consultant with about 30 years in the game. I'm also the founder of > Starfish Storage, which makes software products for virtualizing file > systems and object stores. > > - I think you are correct to manage multiple copies in the > application layer. This gives you maximum control, the ability to shift > vendors, etc. Storage should always be thought of as a stack that starts with > the application and ends in the storage media. There can be multiple > layers, and the system architect should pick the right layer of abstraction > for any given set of functionality. The higher in the stack, the greater > your application awareness. > > - 
By handling replication and addressing in your application, you > should be able to switch object stores over time without much difficulty. > As such, it does not matter so much which object store you buy. You could > simply chase price. > > - Oracle - They are mysterious about what they are doing under the > hood, but it does not matter. It's a "cloud", and they are inexpensive. > Use them as your second or third copy. I know that Oracle people monitor > the news group; maybe one will offer to connect you to a product manager > who can describe the infrastructure. If not, I would be happy to connect > you to folks on the US side of the pond. > > - There is a particular vendor you did not list who might be > interesting for you: Caringo. They were in the > object storage business before S3 came to market, and thus they offer > alternative addressing to the S3 bucket paradigm. They can emulate S3 just > like everyone else, but S3 was designed by Amazon for the purpose of > selling a storage service; it is not necessarily the logical way to store > objects for a digital library. If you are going to address objects > directly from your application, they might have some unique value. I am > happy to connect you to executives in the company. > > - The other vendor worth looking at is Minio.IO. I just pointed > Julian to them the other day. They provide an object interface in storage > and could federate different cloud stores together. You might consider > them for one of your copies. I still like the idea of doing your > replication in the application. They are similar in concept to Zenko, which > Gail recommended earlier. > > - POSIX File System Gateway - My software company (Starfish > Storage) has a file system gateway under development (ready for early > adopters) that is ideal if you want a POSIX personality. We can take the > contents of the object store and present it as a POSIX file system. > > o We map files 1:1 to objects. 
Most file system gateways on the market > break files up into smaller objects, akin to blocks. > o We support Active Directory perfectly, plus SMB-2, SMB-3, NFS-3, and NFS-4. > o We also work in-band or side-band to the object store, meaning > that you can use our POSIX interface simultaneously with S3. > > - You probably also have use cases for Starfish, maybe as a > migration tool from file to object or as an end-to-end fixity solution. We > would be especially useful if you need to migrate files from your tape file > system. > o Starfish is a metadata and rules engine for file systems and object > stores. Too many concepts to put in an email! > > I hope that helps. Message me offline if you want to discuss. I'm at the > Supercomputing conference this week, so replies will be a bit slow. > > > Jacob Farmer | Chief Technology Officer | Cambridge Computer | > "Artists In Data Storage" > Phone 781-250-3210 | jfarmer at CambridgeComputer.com > | www.CambridgeComputer.com > > > From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, > Stuart > Sent: Tuesday, November 14, 2017 4:26 AM > To: 'Julian M. Morley' ; gail at trumantechnologies.com; > pasig-discuss at mail.asis.org > Subject: Re: [Pasig-discuss] Experiences with S3-like object store > service providers? > > Hi Julian, Gail, all, > > At the National Library of Scotland we are also in the middle of some > procurements to revamp our storage infrastructure for holding our digitised > content archive. > > The approach historically taken here has been to use general-purpose SANs, > with a second copy placed on offline tape. The SANs have never been built > to scale (so they fill up and we buy another), and they are general purpose, > trying their best (but often failing!) to run a mixed workload of > everything from VMs to data archive and everything in between. > > We're now wanting to move to three copies, two online and one offline (in > the cloud if possible). 
> > For the online copies we're about to go to tender for a geo-replicated > object storage system, to be hosted in our data centres in Edinburgh and > Glasgow. I suspect the likely candidates will be systems such as Dell EMC > ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. > > (** ESS rather than CleverSafe, as I think that is predicated on three > datacentres, but we only want two.) > > We're also about to try a large-scale proof of concept with the Oracle > Archive Cloud, but have an open question regarding its characteristics > compared to local offline tape. Due to the lack of transparency about what is > actually going on behind the scenes in a cloud environment, we don't know > whether this gives us the same offline protection that tape gives us (e.g. > much harder to corrupt or accidentally delete). > > We're also purposefully not going to use the object storage system's > in-built cloud connectors for replication. We feel it might be safer for > us to manage the replication to the cloud in our repository, rather than > having a single vendor's system manage all three copies at once. > > Critique of this plan is most welcome! > > Also happy to join in any offline discussion about this. > > Best wishes, > > > Stuart Lewis > Head of Digital > > National Library of Scotland > George IV Bridge, Edinburgh EH1 1EW > > Tel: +44 (0) 131 623 3704 > Email: stuart.lewis at nls.uk > Website: www.nls.uk > Twitter: @stuartlewis > > > > > From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org > ] On Behalf Of Julian M. Morley > Sent: 14 November 2017 04:28 > To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org > Subject: Re: [Pasig-discuss] Experiences with S3-like object store > service providers? > > Hi Gail, > > Sure - would be happy to chat with you. 
> > I've got Scality in my list of contenders - I didn't mention it here because > my first few use cases are explicitly "not on campus", but I agree it's > definitely a fit for our main on-prem system. As with any commercial > software, ongoing licensing costs are a potential pain point for us. > > -- > Julian M. Morley > Technology Infrastructure Manager > Digital Library Systems & Services > Stanford University Libraries > > From: "gail at trumantechnologies.com" > Date: Monday, November 13, 2017 at 4:06 PM > To: Julian Morley , "pasig-discuss at mail.asis.org" < > pasig-discuss at mail.asis.org> > Cc: "gail at trumantechnologies.com" > Subject: RE: [Pasig-discuss] Experiences with S3-like object store > service providers? > > > Hi Julian, thanks for sharing your list and comments - a very thorough list. > I'd love to chat (and I'm close by in Oakland). I've quite a lot of > experience in the cloud storage field and would suggest you also take a > look at multi-cloud connector technologies that will allow you to > standardize on S3 but write to non-S3-based public cloud vendors, and to > tier or move data among private and public clouds and do federated search > on metadata across a single namespace (across these clouds). > > Check out a couple of interesting technologies: > > Open source Zenko.io - offering S3 connect to AWS, Azure and Google (the > latter 2 are coming shortly), and also > Scality Connect for Azure Blob Storage - translates S3 API calls to Azure > Blob Storage API calls. > > See the attached datasheet and also https://www.zenko.io/ > > I'd add Scality to your list -- in the Gartner Magic Quadrant they're > shown in the upper-right Visionaries quadrant, and they're close to you in San > Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc.), can tier off to > public clouds, and have lots of multi-PB customer installs. 
The Gartner > MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb > > I'd be very interested in learning more about your use cases -- can we > connect outside of this PASIG alias? > > Gail > > > > Gail Truman > Truman Technologies, LLC > Certified Digital Archives Specialist, Society of American Archivists > > *Protecting the world's digital heritage for future generations* > www.trumantechnologies.com > facebook/TrumanTechnologies > https://www.linkedin.com/in/gtruman > > +1 510 502 6497 > > > > > -------- Original Message -------- > Subject: [Pasig-discuss] Experiences with S3-like object store service > providers? > From: "Julian M. Morley" > Date: Mon, November 13, 2017 12:43 pm > To: "pasig-discuss at mail.asis.org" > > Hi everyone, > > I've currently got at least four use cases for an S3-compatible object > store, spanning everything from traditional S3 through infrequent access > stores to cold vaults. As a result I've spent considerable time researching > options and prices, and was wondering if anyone on this list has any > similar experiences they'd like to share. > > Our use cases range from hundreds of TB through to several PB, with > different access patterns and comfort levels around redundancy and access. > For most of them a 100% compatible S3 API is a requirement, but we can bend > that a bit for the cold storage use case. We're also considering > local/on-prem object stores for one of the use cases - either rolling our > own Ceph install, or using Dell/EMC ECS or SpectraLogic > ArcticBlue/BlackPearl. > > The vendors that I'm looking at are: > > Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). > This is the baseline. We have a Direct Connect pipe to AWS which reduces > the pain of data egress considerably. > > IBM Cloud Bluemix (formerly CleverSafe) > A good choice for multi-region redundancy, as they use erasure coding > across regions - no "catch up" 
replication - providing CRR at a cheaper > price than AWS. If you only want to keep one copy of your data in the > cloud, but have it be able to survive the loss of a region, this is the > best choice (Google can also do this, but not with an S3 API or an > infrequent access store). > > Dell/EMC Virtustream (no cold storage option) > Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing > for standard object storage; their value-add is tying Virtustream into > on-prem ECS units. > > Iron Mountain Iron Cloud (Infrequent Access only) > Also uses EMC ECS hardware. Designed primarily for backup/archive > workloads (no big surprise there), but with no retrieval, egress or > PUT/GET/POST charges. > > Oracle Cloud (cheapest cold storage option, but not S3 API) > Uses OpenStack Swift. Has the cheapest cloud-tape product (Oracle Cloud > Storage Archive), but has recently increased prices to be closer to AWS > Glacier. > > Google Cloud Platform (not an S3 API) > Technically brilliant, but you have to be able to use their APIs. Their > cold storage product is online (disk, not tape), but not as cheap as > Glacier. > > Microsoft Azure (not an S3 API) > Competitively priced, especially their Infrequent Access product, but > again not an S3 API, and their vault product is still in beta. > > Backblaze B2 (not an S3 API) > Another backup/archive target, only slightly more expensive than Glacier, > but online (no retrieval time or fees) and with significantly cheaper data > egress rates than AWS. > > Wasabi Cloud > Recently launched company from the team that brought you Carbonite. > Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. *It's > cheaper and faster than Glacier*, both to store data and egress it, but > there are obvious concerns around company longevity. Would probably make a > good second target if you have a multi-vendor requirement for your data. 
> > If anyone is interested in hearing more, or has any experience with any of > these vendors, please speak up! > > -- > Julian M. Morley > Technology Infrastructure Manager > Digital Library Systems & Services > Stanford University Libraries > ------------------------------ > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at > http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss > > > National Library of Scotland, Scottish Charity, No: SCO11086 > This communication is intended for the addressee(s) only. If you are not > the addressee please inform the sender and delete the email from your > system. The statements and opinions expressed in this message are those of > the author and do not necessarily reflect those of National Library of > Scotland. This message is subject to the Data Protection Act 1998 and > Freedom of Information (Scotland) Act 2002. No liability is accepted for > any harm that may be caused to your systems or data by this message. > Before you print please think about the ENVIRONMENT > > -- Michael Davis | akropilot at gmail.com | mobile 408-464-0441 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
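Mike Davis's rule of thumb earlier in the thread - that for an archival workload the cost of *moving* data (requests plus egress) should stay under roughly 10% of the total bill - can be sketched as a quick back-of-the-envelope calculation. All prices below are illustrative placeholders, not any provider's actual rates, and the function is a simplification (one PUT per object, request cost amortised over one month):

```python
# Sketch of the "moving costs under 10%" heuristic for archival storage.
# Price constants are illustrative assumptions, not quoted vendor rates.
GB = 1024 ** 3

def monthly_cost(total_bytes, object_size_bytes, egress_bytes,
                 price_per_gb_month=0.023,   # assumed storage price, $/GB-month
                 price_per_1k_puts=0.005,    # assumed price per 1,000 PUT requests
                 price_per_gb_egress=0.09):  # assumed egress price, $/GB
    """Return (storage, requests, egress) monthly cost components in dollars."""
    storage = total_bytes / GB * price_per_gb_month
    puts = -(-total_bytes // object_size_bytes)   # ceiling division: one PUT per object
    requests = puts / 1000 * price_per_1k_puts
    egress = egress_bytes / GB * price_per_gb_egress
    return storage, requests, egress

# A 100 TB archive with 1 TB/month of egress, written as 1 GB objects versus
# inadvertently chunked into 16 KB objects by a gateway: the request bill
# explodes in the second case, blowing past the 10% "moving cost" threshold.
for size in (1 * GB, 16 * 1024):
    s, r, e = monthly_cost(100 * 1024 * GB, size, egress_bytes=1024 * GB)
    moving = (r + e) / (s + r + e)
    print(f"object size {size:>10} B: requests ${r:,.2f}, moving share {moving:.0%}")
```

With these assumed prices, 1 GB objects keep the moving share in the single digits, while 16 KB chunking pushes it above 90% - exactly the pathology Davis describes.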
Name: image001.jpg Type: image/jpeg Size: 18754 bytes Desc: not available URL: From artpasquinelli at stanford.edu Tue Nov 14 16:00:15 2017 From: artpasquinelli at stanford.edu (Arthur Pasquinelli) Date: Tue, 14 Nov 2017 21:00:15 +0000 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? In-Reply-To: References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> Message-ID: <431DEF17-1725-451F-961F-2079BB529E17@stanford.edu> Mike, I love to see you ex-Sun people joining in on PASIG discussions! I think it shows the Sun commitment to education is still living on! Since you offered, and since cloud architectures - specifically AWS - are something we in the LOCKSS community have been doing analysis on, I would love to set up a webinar for the PASIG members. Gail and I can possibly work on it with you. I saw at the recent NDSA meeting that it was difficult to analyze and compare the vendors' offerings beyond the basic fixity feature; a group has actually been working on a matrix. We can use my Zoom capability for a webinar. While I would also offer the opportunity to the other cloud vendors, I think AWS should be the first webinar. You have a unique depth of expertise, and I want to leverage it for the community. This is a conversation that has been percolating within the PASIG for a while without the direct involvement of the major cloud players, other than Oracle periodically. So, I will work with you as much as you need to get this first webinar off the ground. Thanks! -- Art Pasquinelli LOCKSS Partnership Manager Stanford University Libraries Cell: 1-650-430-2441 artpasquinelli at stanford.edu 
Hi Gail I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing for example at an average 10% (ish) per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never be greater than 10% of total or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third party gateway solution, that would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IAAS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, > wrote: Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives - this is really what will enable multi-cloud workflow management. And also having federated data search of system and descriptive metadata across the namespace no matter where the data is stored (including across public- and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish, Minio and (and other sw within some cloud services) can enable. 
Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) often the "hidden" costs of pulling the data back will usually result in costs greater than a private cloud. Stuart - I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building our on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access or derivatives of moving or still images, etc).] Spin up some quick compute services or Hadoop (for other use case). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. 
Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I?m a data storage consultant with about 30 years in the game. I?m also the founder of Starfish Storage which makes software products for virtualizing file systems and object stores. ? I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of a stack that starts with the application and ends in the storage media. There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. ? By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. ? Oracle ? They are mysterious about what they are doing under the hood, but it does not matter. It?s a ?cloud?. They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. ? There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily, the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. ? 
The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko who Gail recommended earlier. ? POSIX File System Gateway ? My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4 o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3. ? You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I?m at the SuperComputer conference this week, so replies will be a bit slow. Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart Sent: Tuesday, November 14, 2017 4:26 AM To: 'Julian M. Morley' >; gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? 
Hi Julian, Gail, all, At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. We?re now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we?re about to get to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HPC. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two). We?re also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don?t know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete). We?re also purposefully not going to use the object storage system?s in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. Critique of this plan is most welcome! Also happy to join in any offline discussion about this. 
Best wishes, Stuart Lewis Head of Digital National Library of Scotland George IV Bridge, Edinburgh EH1 1EW Tel:+44 (0) 131 623 3704 Email:stuart.lewis at nls.uk Website:www.nls.uk Twitter:@stuartlewis [cid:image003.jpg at 01D0DE86.3095EC20] From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley Sent: 14 November 2017 04:28 To:gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Gail, Sure - would be happy to chat with you. I?ve got Scality in my list of contenders - didn?t mention it here because my first few use cases are explicitly ?not on campus?, but I agree it?s definitely a fit for our main on prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us. -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: "gail at trumantechnologies.com" > Date: Monday, November 13, 2017 at 4:06 PM To: Julian Morley >, "pasig-discuss at mail.asis.org" > Cc: "gail at trumantechnologies.com" > Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors. And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds). Check out a couple of interesting technologies: Open Source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter 2 are coming shortly), and also Scality Connect for Azure Blog Storage - translates S3 API calls to Azure blob storage API calls. 
See the attached datasheet and also https://www.zenko.io/ I'd add Scality to your list -- see the Gartner magic quadrant they're shown in the Upper Right Visionary quadrant and are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB size customer installs. Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias? Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: [Pasig-discuss] Experiences with S3-like object store service providers? From: "Julian M. Morley" > Date: Mon, November 13, 2017 12:43 pm To: "pasig-discuss at mail.asis.org" > Hi everyone, I?ve currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I?ve spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they?d like to share. Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. We?re also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/Blackpearl. The vendors that I?m looking at are: Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. 
We have a direct connect pipe to AWS which reduces the pain of data egress considerably. IBM Cloud Bluemix (formerly CleverSafe) A good choice for multi-region redundancy, as they use erasure coding across regions - no 'catch up' replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store). Dell/EMC Virtustream (no cold storage option) Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units. Iron Mountain Iron Cloud (Infrequent Access only) Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges. Oracle Cloud (cheapest cold storage option, but not S3 API) Uses OpenStack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier. Google Cloud Platform (not an S3 API) Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier. Microsoft Azure (not an S3 API) Competitively priced, especially their Infrequent Access product, but again not an S3 API and their vault product is still in beta. Backblaze B2 (not an S3 API) Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS. Wasabi Cloud Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It's cheaper and faster than Glacier, both to store data and egress it, but there are obvious concerns around company longevity. 
Would probably make a good second target if you have a multi-vendor requirement for your data. If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up! -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss National Library of Scotland, Scottish Charity, No: SCO11086 This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message. Before you print please think about the ENVIRONMENT -- Michael Davis | akropilot at gmail.com | mobile 408-464-0441 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.jpg Type: image/jpeg Size: 18754 bytes Desc: image001.jpg URL: From gail at trumantechnologies.com Tue Nov 14 16:07:39 2017 From: gail at trumantechnologies.com (gail at trumantechnologies.com) Date: Tue, 14 Nov 2017 14:07:39 -0700 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? Message-ID: <20171114140739.b554e26909f2beaf9f8ddbf6be9a6600.253dc30a44.wbe@email09.godaddy.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 18754 bytes Desc: not available URL: From gail at trumantechnologies.com Tue Nov 14 16:08:41 2017 From: gail at trumantechnologies.com (gail at trumantechnologies.com) Date: Tue, 14 Nov 2017 14:08:41 -0700 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? Message-ID: <20171114140841.b554e26909f2beaf9f8ddbf6be9a6600.79c7318a2c.wbe@email09.godaddy.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 18754 bytes Desc: not available URL: From dave at dpn.org Tue Nov 14 16:33:19 2017 From: dave at dpn.org (David Pcolar) Date: Tue, 14 Nov 2017 16:33:19 -0500 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? In-Reply-To: References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> Message-ID: <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Hi All, This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed. I would like to address a common issue with the cloud storage vendors and IO/retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint. 
However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand). A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an area accessible to the EC2 instance is required. This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time. For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org > On Nov 14, 2017, at 3:22 PM, Mike Davis wrote: > > Hi Gail > > I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing at an average of roughly 10% per year since launch. > > But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never be greater than 10% of the total, or the situation needs to be examined more closely. 
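[Editor's note: the EC2-mount-and-checksum audit David describes reduces, once the bucket is presented as a file system (e.g. via a FUSE tool such as s3fs), to streaming checksums compared against a trusted manifest. A minimal stdlib sketch; the paths and the manifest format are illustrative only, not any particular repository's layout:]

```python
# Sketch of a fixity pass over a mounted bucket: stream each object's
# MD5 and compare it to a trusted manifest of expected checksums.
# Assumption: the bucket is already mounted as ordinary files.
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MB chunks so multi-GB objects need not fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def audit(manifest):
    """manifest: {path: expected_md5_hex}; returns the paths that fail fixity."""
    return [path for path, expected in manifest.items()
            if md5_of_file(path) != expected]
```

The compute fees David raises come precisely from running this loop over every object at each audit interval.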
For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count. > > Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and to solve for long-term durability. > > -Mike Davis (AWS Storage) > > > On Tue, Nov 14, 2017 at 9:05 AM, > wrote: > Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives is really what will enable multi-cloud workflow management - along with federated search of system and descriptive metadata across the namespace no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish, and Minio (and other software within some cloud services), can enable. > > Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in costs greater than a private cloud. > > Stuart - > I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: > https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf > https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers > They talk about Dell hardware for building our on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. 
[Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access copies or derivatives of moving or still images, etc.).] Spin up some quick compute services or Hadoop (for other use cases). > > This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! > > > Gail > > > Gail Truman > Truman Technologies, LLC > Certified Digital Archives Specialist, Society of American Archivists > > Protecting the world's digital heritage for future generations > www.trumantechnologies.com > facebook/TrumanTechnologies > https://www.linkedin.com/in/gtruman > > +1 510 502 6497 > > > > > -------- Original Message -------- > Subject: RE: [Pasig-discuss] Experiences with S3-like object store > service providers? > From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am > To: "Lewis, Stuart", "Julian M. Morley", gail at trumantechnologies.com, > pasig-discuss at mail.asis.org > > Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage consultant with about 30 years in the game. I'm also the founder of Starfish Storage, which makes software products for virtualizing file systems and object stores. > > - I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of as a stack that starts with the application and ends in the storage media. There can be multiple layers, and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. > > - By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. > > - Oracle - 
They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud". They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the newsgroup. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. > > - There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business since before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. > > - The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier. > > - POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. > > o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. > o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4 > o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3. > > - 
You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. > o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! > > I hope that helps. Message me offline if you want to discuss. I'm at the Supercomputing conference this week, so replies will be a bit slow. > > > Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" > Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com > > > From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart > Sent: Tuesday, November 14, 2017 4:26 AM > To: 'Julian M. Morley'; gail at trumantechnologies.com; pasig-discuss at mail.asis.org > Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? > > Hi Julian, Gail, all, > > At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. > > The approach historically taken here has been to use general-purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. > > We're now wanting to move to three copies, two online and one offline (in the cloud if possible). > > For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. 
> > (** ESS rather than CleverSafe, as I think that is predicated on three data centres, but we only want two.) > > We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete). > > We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. > > Critique of this plan is most welcome! > > Also happy to join in any offline discussion about this. > > Best wishes, > > > Stuart Lewis > Head of Digital > > National Library of Scotland > George IV Bridge, Edinburgh EH1 1EW > > Tel: +44 (0) 131 623 3704 > Email: stuart.lewis at nls.uk > Website: www.nls.uk > Twitter: @stuartlewis 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jmorley at stanford.edu Tue Nov 14 22:34:40 2017 From: jmorley at stanford.edu (Julian M. Morley) Date: Wed, 15 Nov 2017 03:34:40 +0000 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? In-Reply-To: <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Message-ID: Hi David, We haven't done it yet, but we plan to do something similar to what you describe below when we start storing SDR content in cloud providers. One open question (that I intend to ask Mike Davis about!) is how S3 stores and/or updates ETag fields for uploaded objects. I think that when content is recovered from Glacier to S3-IA, the ETag/MD5 of the file is computed when the file is written to S3-IA. This means that a 'good enough' fixity check can be done simply by recalling the data to S3-IA (relatively cheap! 
No data egress charges!) and performing a simple metadata check of the recovered object. Costs are easily projected/constrained simply by deciding what % of your total data corpus you want 'in flight' at any one time. This requires us to store checksums for all objects that we send to the cloud in a separate datastore - we'll be using something called the Trusted Checksum Repository for this - a WORM-style database that stands to the side of the SDR. I'm also assuming (again, this is a question for Mike) that S3 and other cloud providers do perform periodic scrubs of their data, and use EC to correct for any bad blocks that they find. For example, Wasabi explicitly states that they validate md5 checksums of content every 90 days. Presumably when they do that they'll update the ETag if it has changed, which again allows a metadata check to validate fixity. The same process for Glacier recovery to S3-IA works for Oracle Cloud Storage Archive to Oracle Cloud Storage - the recovered object has a freshly-generated ETag, which can then be compared against the stored checksum when the file was first uploaded. No cloud compute instance needed. Oracle has also told me (although this is not official) that they write content to two tapes, and perform occasional CRC checks / migrations of content to ensure that data on the tapes hasn't gone bad, although that's not on a fixed schedule. For GCP and other online/non-vault options, running a compute instance is probably still the best way to go. And if we want to unpack the object and checksum all the elements it's pretty much the only game in town - we plan to do that for a random sampling of our content, adjusting our throughput by varying recovery request rate and EBS disk sizes until we settle on an acceptable rate. ( There *is* a wrinkle here with ETags and multi-part uploads that I'm not getting into. It's still possible to get and store a useful MD5, you just need to do a little bit of extra legwork to get there. 
) -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: Pasig-discuss on behalf of David Pcolar Date: Tuesday, November 14, 2017 at 1:33 PM To: "pasig-discuss at mail.asis.org" Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi All, This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed. I would like to address a common issue with the cloud storage vendors and IO/retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint. However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand). A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an area accessible to the EC2 instance is required. This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time. For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org On Nov 14, 2017, at 3:22 PM, Mike Davis wrote: Hi Gail I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. 
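[Editor's note: the multipart-upload "wrinkle" Julian alludes to above: for multipart uploads, S3's ETag is, by widely observed behaviour rather than any documented contract, the MD5 of the concatenated binary MD5 digests of the parts, suffixed with the part count. Reproducing it locally therefore requires knowing the part size used at upload time. A stdlib sketch:]

```python
# Compute an S3-style multipart ETag locally (observed behaviour, not a
# documented contract): md5 over the concatenated per-part md5 digests,
# suffixed with "-<number of parts>". Single-part objects get a plain md5.
import hashlib

def multipart_etag(data: bytes, part_size: int) -> str:
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    if len(parts) <= 1:
        return hashlib.md5(data).hexdigest()  # single part: ETag is plain MD5
    combined = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(combined).hexdigest()}-{len(parts)}"
```

This is the "extra legwork": store the part size alongside the checksum, or the locally computed value cannot be matched against the ETag the service reports.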
The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing at an average of roughly 10% per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never be greater than 10% of the total, or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, wrote: Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives is really what will enable multi-cloud workflow management - along with federated search of system and descriptive metadata across the namespace no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish, and Minio (and other software within some cloud services), can enable. Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in costs greater than a private cloud. 
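[Editor's note: the storage-versus-egress tradeoff being debated here can be made concrete with a toy cost model. All rates below are hypothetical, chosen only to illustrate the shape of the argument, not any vendor's actual pricing:]

```python
# Toy model of annual cloud storage cost: cheap to hold, costly to recall.
# Rates are illustrative placeholders, not real vendor prices.
def annual_cost(tb_stored, tb_egressed_per_year,
                store_per_gb_month=0.004,   # assumed cold-storage rate, $/GB/month
                egress_per_gb=0.09):        # assumed retrieval + egress rate, $/GB
    gb_stored = tb_stored * 1024
    storage = gb_stored * store_per_gb_month * 12
    egress = tb_egressed_per_year * 1024 * egress_per_gb
    return storage + egress

# 1 PB held: a light audit sample barely moves the bill,
# while a full annual recall roughly triples it.
light = annual_cost(1024, 10)     # ~1% of the corpus pulled back per year
full = annual_cost(1024, 1024)    # entire corpus pulled back once
```

Under these assumed rates the light-touch case keeps data movement well inside Mike's 10%-of-total rule of thumb, while a full recall makes egress the dominant line item, which is the scenario the "hidden cost" warnings describe.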
Stuart - I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building our on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access copies or derivatives of moving or still images, etc.).] Spin up some quick compute services or Hadoop (for other use cases). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart", "Julian M. Morley", gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage consultant with about 30 years in the game. 
I'm also the founder of Starfish Storage, which makes software products for virtualizing file systems and object stores.
- I think you are correct to manage multiple copies in the application layer. This gives you maximum control, the ability to shift vendors, etc. Storage should always be thought of as a stack that starts with the application and ends in the storage media. There can be multiple layers, and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness.
- By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price.
- Oracle - They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud". They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the newsgroup. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond.
- There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business since before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company.
- The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface to storage and could federate different cloud stores together.
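[Editor's illustration] Jacob's first two points, keeping replication and addressing in the application layer so that any vendor can be swapped out, can be sketched in a few lines. This is a hypothetical illustration, not Starfish's (or anyone's) actual product code: the in-memory `Backend` stands in for an S3-compatible endpoint, and the `ark:` identifier is just an example key.

```python
import hashlib

class Backend:
    """Stand-in for one vendor's object store (S3, Swift, on-prem, ...)."""
    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]   # raises KeyError if this copy is gone

class ReplicatingStore:
    """The application writes every copy itself, so vendors stay swappable."""
    def __init__(self, backends):
        self.backends = backends

    def put(self, key, data):
        digest = hashlib.sha256(data).hexdigest()
        for b in self.backends:          # application-controlled replication
            b.put(key, data)
        return digest                    # keep in a manifest for fixity checks

    def get(self, key):
        for b in self.backends:          # fall through to the next copy
            try:
                return b.get(key)
            except KeyError:
                continue
        raise KeyError(key)

store = ReplicatingStore([Backend("on-prem"), Backend("cloud-a"), Backend("cloud-b")])
store.put("ark:/99999/obj1", b"archival payload")
del store.backends[0]._objects["ark:/99999/obj1"]   # simulate losing one copy
assert store.get("ark:/99999/obj1") == b"archival payload"
```

In a real deployment each `Backend` would wrap a vendor SDK or S3 client, and the repository would keep the manifest of digests so copies can be verified or rebuilt regardless of which vendor holds them.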
You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier.
- POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system.
  - We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks.
  - We support Active Directory perfectly, plus SMB-2, SMB-3, NFS-3, and NFS-4.
  - We also work in-band or side-band to the object store. That means you can use our POSIX interface simultaneously with S3.
- You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system.
  - Starfish is a metadata and rules engine for file systems and object stores.
Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I'm at the Supercomputing conference this week, so replies will be a bit slow. Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart Sent: Tuesday, November 14, 2017 4:26 AM To: 'Julian M. Morley' >; gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, Gail, all, At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive.
The approach historically taken here has been to use general-purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. We're now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two.) We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to the lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete). We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. Critique of this plan is most welcome! Also happy to join in any offline discussion about this. Best wishes, Stuart Lewis Head of Digital National Library of Scotland George IV Bridge, Edinburgh EH1 1EW Tel: +44 (0) 131 623 3704 Email: stuart.lewis at nls.uk Website: www.nls.uk Twitter: @stuartlewis From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M.
Morley Sent: 14 November 2017 04:28 To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Gail, Sure - would be happy to chat with you. I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us. -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: "gail at trumantechnologies.com" > Date: Monday, November 13, 2017 at 4:06 PM To: Julian Morley >, "pasig-discuss at mail.asis.org" > Cc: "gail at trumantechnologies.com" > Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3 but write to non-S3-based public cloud vendors, and to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds). Check out a couple of interesting technologies: open-source Zenko.io, offering S3 connect to AWS, Azure and Google (the latter two are coming shortly), and also Scality Connect for Azure Blob Storage, which translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/ I'd add Scality to your list -- see the Gartner Magic Quadrant, where they're shown in the upper-right Visionary quadrant, and they're close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc.), can tier off to public clouds, and have lots of multi-PB customer installs.
Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias? Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss National Library of Scotland, Scottish Charity, No: SCO11086 This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.
Before you print please think about the ENVIRONMENT -- Michael Davis | akropilot at gmail.com | mobile 408-464-0441 -------------- next part -------------- An HTML attachment was scrubbed... URL: From itsoffice at kcl.ac.uk Wed Nov 15 05:12:54 2017 From: itsoffice at kcl.ac.uk (kcl - itsoffice) Date: Wed, 15 Nov 2017 10:12:54 +0000 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? In-Reply-To: <20171114140739.b554e26909f2beaf9f8ddbf6be9a6600.253dc30a44.wbe@email09.godaddy.com> References: <20171114140739.b554e26909f2beaf9f8ddbf6be9a6600.253dc30a44.wbe@email09.godaddy.com> Message-ID: Hi, Please could this email be removed from this mailing list. Kind regards, Katherine From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of gail at trumantechnologies.com Sent: 14 November 2017 21:08 To: Mike Davis Cc: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Mike, great to have you join the conversation - and to see you're at AWS now. You make very good points, and certainly there are best/better practices that can be used to keep costs under control.
How about we all take you up on the offer of a PASIG presentation? I know you were at the original PASIG events back in the Sun Microsystems days.... Art just posted the info that PASIG will be in February in Mexico. Or perhaps one of the PASIG webinars? Thanks! Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
From rjeetoo at nationaltheatre.org.uk Wed Nov 15 05:31:21 2017 From: rjeetoo at nationaltheatre.org.uk (Rebecca Jeetoo) Date: Wed, 15 Nov 2017 10:31:21 +0000 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? In-Reply-To: <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Message-ID: Please could I be taken off this mailing list. This was a course I booked for a colleague, not something I am personally part of. Kind Regards, Becky Jeetoo PA to the Director of Learning National Theatre London SE1 9PX From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of David Pcolar Sent: 14 November 2017 21:33 To: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? [EXTERNAL MAIL] Hi All, This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed. I would like to address a common issue with the cloud storage vendors and IO/retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint. However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand). A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an accessible area for the EC2 instance is required.
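[Editor's note: the EC2-plus-checksum pattern David describes can be sketched as below. This is a minimal illustration, not a production auditor: the `audit` helper and the manifest layout are invented for the example, and the commented-out boto3 lines use a placeholder bucket name.]

```python
# Sketch of a streaming fixity check: recompute each object's checksum
# and compare it against a stored manifest of expected digests.
import hashlib

def sha256_of_stream(body, chunk_size=1024 * 1024):
    """Compute SHA-256 over a file-like object without loading it whole."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: body.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def audit(objects, expected):
    """Return the keys whose recomputed checksum disagrees with the manifest.

    objects:  iterable of (key, file-like body) pairs
    expected: dict mapping key -> expected hex digest
    """
    return [key for key, body in objects
            if sha256_of_stream(body) != expected[key]]

# On an EC2 instance the bodies would come from S3, e.g. (placeholder bucket):
#   import boto3
#   s3 = boto3.client("s3")
#   body = s3.get_object(Bucket="my-preservation-bucket", Key=key)["Body"]
```

Running this from EC2 in the same region avoids egress fees, but the GET requests and compute time still accrue, which is the cost issue raised next.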
This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time. For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org On Nov 14, 2017, at 3:22 PM, Mike Davis > wrote: Hi Gail, I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 capacity pricing has dropped at an average of roughly 10% per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never be greater than 10% of the total, or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, > wrote: Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open, native data format independent of where the data lives is really what will enable multi-cloud workflow management.
And also having federated data search of system and descriptive metadata across the namespace no matter where the data is stored (including across public and on-prem cloud storage). This is what the newer cloud-controller software like Zenko, Starfish and Minio (and other software within some cloud services) can enable. Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in total costs greater than a private cloud. Stuart - I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access or derivatives of moving or still images, etc).] Spin up some quick compute services or Hadoop (for other use cases). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with!
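[Editor's note: Mike's 16KB-chunking example above is easy to quantify. The sketch below uses the S3 standard PUT price published around this time ($0.005 per 1,000 requests); current prices will differ, so treat the numbers as illustrative only.]

```python
# Back-of-envelope: how object size drives S3 request fees.
# PUT_PRICE_PER_1000 is an assumed circa-2017 S3 standard rate; check
# current pricing before relying on it.
PUT_PRICE_PER_1000 = 0.005

def put_cost(total_bytes, object_bytes):
    """Requests and PUT fees to upload total_bytes in object_bytes pieces."""
    n_requests = -(-total_bytes // object_bytes)  # ceiling division
    return n_requests, n_requests / 1000 * PUT_PRICE_PER_1000

TIB = 1024 ** 4
reqs_small, fee_small = put_cost(TIB, 16 * 1024)        # 16 KB gateway chunks
reqs_large, fee_large = put_cost(TIB, 100 * 1024 ** 2)  # 100 MB multipart parts

print(f"16 KB objects: {reqs_small:,} PUTs, ${fee_small:,.2f}")
print(f"100 MB parts:  {reqs_large:,} PUTs, ${fee_large:,.2f}")
```

One terabyte in 16 KB chunks is roughly 67 million PUTs (hundreds of dollars in request fees); the same data in 100 MB parts is about ten thousand PUTs (pennies), which is why a misbehaving gateway can swamp the storage bill.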
Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage consultant with about 30 years in the game. I'm also the founder of Starfish Storage, which makes software products for virtualizing file systems and object stores. - I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of as a stack that starts with the application and ends in the storage media. There can be multiple layers, and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. - By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. - Oracle - They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud". They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. - There is a particular vendor you did not list who might be interesting for you. They are called Caringo.
They have been in the object storage business since before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. - The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier. - POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4. o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3. - You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I'm at the Supercomputing conference this week, so replies will be a bit slow.
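[Editor's note: Jacob's point about handling replication in the application layer can be sketched in a few lines. The `ReplicatedStore` class and its method names are hypothetical; the clients can be boto3 S3 clients created with per-vendor `endpoint_url` values, or anything else exposing a `put_object` method. Bucket names below are illustrative placeholders.]

```python
# Minimal application-layer replication: write each object to several
# independent S3-compatible endpoints and only report success when every
# copy lands. Because the application owns the copies, any one vendor
# can be swapped out later without touching the others.
class ReplicatedStore:
    def __init__(self, targets):
        # targets: list of (client, bucket) pairs, one per vendor.
        # With boto3: boto3.client("s3", endpoint_url="https://vendor.example")
        self.targets = targets

    def put(self, key, data):
        """Write data to every target; raise if any copy fails."""
        failures = []
        for client, bucket in self.targets:
            try:
                client.put_object(Bucket=bucket, Key=key, Body=data)
            except Exception as exc:
                failures.append((bucket, exc))
        if failures:
            raise RuntimeError(f"incomplete replication: {failures}")
```

A real repository would also record each copy's location and checksum in its own catalog, which is what makes the later vendor-swap (and fixity audit) possible.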
Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart Sent: Tuesday, November 14, 2017 4:26 AM To: 'Julian M. Morley' >; gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, Gail, all, At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. The approach historically taken here has been to use general-purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. We're now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two). We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete).
We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. Critique of this plan is most welcome! Also happy to join in any offline discussion about this. Best wishes, Stuart Lewis Head of Digital National Library of Scotland George IV Bridge, Edinburgh EH1 1EW Tel: +44 (0) 131 623 3704 Email: stuart.lewis at nls.uk Website: www.nls.uk Twitter: @stuartlewis From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley Sent: 14 November 2017 04:28 To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Gail, Sure - would be happy to chat with you. I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us. -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: "gail at trumantechnologies.com" > Date: Monday, November 13, 2017 at 4:06 PM To: Julian Morley >, "pasig-discuss at mail.asis.org" > Cc: "gail at trumantechnologies.com" > Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors.
And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds). Check out a couple of interesting technologies: Open Source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter 2 are coming shortly), and also Scality Connect for Azure Blog Storage - translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/ I'd add Scality to your list -- see the Gartner magic quadrant they're shown in the Upper Right Visionary quadrant and are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB size customer installs. Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias? Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: [Pasig-discuss] Experiences with S3-like object store service providers? From: "Julian M. Morley" > Date: Mon, November 13, 2017 12:43 pm To: "pasig-discuss at mail.asis.org" > Hi everyone, I?ve currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I?ve spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they?d like to share. Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. 
For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. We?re also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/Blackpearl. The vendors that I?m looking at are: Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. We have a direct connect pipe to AWS which reduces the pain of data egress considerably. IBM Cloud Bluemix (formerly CleverSafe) A good choice for multi-region redundancy, as they use erasure coding across regions - no ?catch up? replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store). Dell/EMC Virtustream (no cold storage option) Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units. Iron Mountain Iron Cloud (Infrequent Access only) Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges. Oracle Cloud (cheapest cold storage option, but not S3 API) Uses Openstack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier. Google Cloud Platform (not an S3 API) Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier. Microsoft Azure (not an S3 API) Competitively priced, especially their Infrequent Access product, but again not an S3 API and their vault product is still in beta. 
Backblaze B2 (not an S3 API) Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS. Wasabi Cloud Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It?s cheaper and faster than Glacier, both to store data and egress it, but there?s obvious concerns around company longevity. Would probably make a good second target if you have a multi-vendor requirement for your data. If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up! -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss National Library of Scotland, Scottish Charity, No: SCO11086 This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message. 
Before you print please think about the ENVIRONMENT ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -- Michael Davis | akropilot at gmail.com | mobile 408-464-0441 ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From lp30 at st-andrews.ac.uk Wed Nov 15 06:49:27 2017 From: lp30 at st-andrews.ac.uk (Louise Pidcock) Date: Wed, 15 Nov 2017 11:49:27 +0000 Subject: [Pasig-discuss] {Disarmed} Re: Experiences with S3-like object store service providers? In-Reply-To: References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Message-ID: Could I please come off the list for the same reason From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Rebecca Jeetoo Sent: 15 November 2017 10:31 To: David Pcolar ; pasig-discuss at mail.asis.org Subject: {Disarmed} Re: [Pasig-discuss] Experiences with S3-like object store service providers? Please could I be taken off this mailing list. This was a course I booked for a colleague, not something I am personally part of. 
Kind Regards, Becky Jeetoo PA to the Director of Learning National Theatre London SE1 9PX From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of David Pcolar Sent: 14 November 2017 21:33 To: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? [EXTERNAL MAIL] Hi All, This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed. I would like to address a common issue with the cloud storage vendors and IO/retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object standpoint'. However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NSDA Preservation level 3 (Check fixity of content at fixed intervals;Maintain logs of fixity info; supply audit on demand). A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an accessible area for the EC2 instance is required. This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time. For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org On Nov 14, 2017, at 3:22 PM, Mike Davis > wrote: Hi Gail I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. 
The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing for example at an average 10% (ish) per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never be greater than 10% of total or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third party gateway solution, that would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IAAS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, > wrote: Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives - this is really what will enable multi-cloud workflow management. And also having federated data search of system and descriptive metadata across the namespace no matter where the data is stored (including across public- and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish, Minio and (and other sw within some cloud services) can enable. Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) often the "hidden" costs of pulling the data back will usually result in costs greater than a private cloud. 
Stuart - I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building our on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access or derivatives of moving or still images, etc).] Spin up some quick compute services or Hadoop (for other use case). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! 
Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be www.trumantechnologies.com facebook/TrumanTechnologies MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I?m a data storage consultant with about 30 years in the game. I?m also the founder of Starfish Storage which makes software products for virtualizing file systems and object stores. ? I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of a stack that starts with the application and ends in the storage media. There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. ? By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. ? Oracle ? They are mysterious about what they are doing under the hood, but it does not matter. It?s a ?cloud?. They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. 
Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. ? There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily, the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. ? The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko who Gail recommended earlier. ? POSIX File System Gateway ? My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4 o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3. ? You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. 
o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I?m at the SuperComputer conference this week, so replies will be a bit slow. Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" Phone 781-250-3210 | jfarmer at CambridgeComputer.com | MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be www.CambridgeComputer.com From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart Sent: Tuesday, November 14, 2017 4:26 AM To: 'Julian M. Morley' >; gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, Gail, all, At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. We?re now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we?re about to get to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HPC. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two). 
We?re also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don?t know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete). We?re also purposefully not going to use the object storage system?s in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. Critique of this plan is most welcome! Also happy to join in any offline discussion about this. Best wishes, Stuart Lewis Head of Digital National Library of Scotland George IV Bridge, Edinburgh EH1 1EW Tel: +44 (0) 131 623 3704 Email: stuart.lewis at nls.uk Website: MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be www.nls.uk Twitter: @stuartlewis From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley Sent: 14 November 2017 04:28 To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Gail, Sure - would be happy to chat with you. I?ve got Scality in my list of contenders - didn?t mention it here because my first few use cases are explicitly ?not on campus?, but I agree it?s definitely a fit for our main on prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us. -- Julian M. 
Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: "gail at trumantechnologies.com" > Date: Monday, November 13, 2017 at 4:06 PM To: Julian Morley >, "pasig-discuss at mail.asis.org" > Cc: "gail at trumantechnologies.com" > Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3 but write to non-S3-based public cloud vendors - and to tier or move data among private and public clouds, and do federated search on metadata across a single namespace (across these clouds). Check out a couple of interesting technologies: open source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter 2 are coming shortly) - and also Scality Connect for Azure Blob Storage, which translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/ I'd add Scality to your list -- in the Gartner Magic Quadrant they're shown in the upper-right Visionaries quadrant, and they're close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB-size customer installs. The Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias?
Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: [Pasig-discuss] Experiences with S3-like object store service providers? From: "Julian M. Morley" > Date: Mon, November 13, 2017 12:43 pm To: "pasig-discuss at mail.asis.org" > ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss National Library of Scotland, Scottish Charity, No: SCO11086 This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.
Before you print please think about the ENVIRONMENT -- Michael Davis | akropilot at gmail.com | mobile 408-464-0441 ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Arya.Shirazi at hbo.com Wed Nov 15 11:12:47 2017 From: Arya.Shirazi at hbo.com (Shirazi, Arya (HBO)) Date: Wed, 15 Nov 2017 16:12:47 +0000 Subject: [Pasig-discuss] {Disarmed} Re: Experiences with S3-like object store service providers?
In-Reply-To: References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Message-ID: Please remove me from this list as well From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Louise Pidcock Sent: Wednesday, November 15, 2017 6:49 AM To: Rebecca Jeetoo ; David Pcolar ; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] {Disarmed} Re: Experiences with S3-like object store service providers? **External Email received from: "Louise Pidcock" > ** Could I please come off the list for the same reason From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Rebecca Jeetoo Sent: 15 November 2017 10:31 To: David Pcolar >; pasig-discuss at mail.asis.org Subject: {Disarmed} Re: [Pasig-discuss] Experiences with S3-like object store service providers? Please could I be taken off this mailing list. This was a course I booked for a colleague, not something I am personally part of. Kind Regards, Becky Jeetoo PA to the Director of Learning National Theatre London SE1 9PX From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of David Pcolar Sent: 14 November 2017 21:33 To: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? [EXTERNAL MAIL] Hi All, This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed. I would like to address a common issue with the cloud storage vendors and IO/retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint. However, preservation repositories are touching those objects frequently to perform fixity checks.
I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand). A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an area accessible to the EC2 instance is required. This results in I/O and compute-cycle fees that could dramatically inflate the cost of public cloud storage over time. For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org On Nov 14, 2017, at 3:22 PM, Mike Davis > wrote: Hi Gail I appreciate that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, I/O, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and the flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing, for example, at an average of 10% (ish) per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the cost of moving data should never be greater than 10% of the total, or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count.
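The chunking failure mode Mike describes is easy to quantify. The sketch below is a back-of-the-envelope model with hypothetical unit prices (placeholders, not actual AWS rates); the point is the ratio between capacity charges and per-request fees, not the dollar figures.

```python
# Illustrative model of how per-request fees can dominate archival storage
# costs when objects are too small. All prices are hypothetical placeholders.

def storage_and_put_cost(total_bytes, object_size, price_per_gb_month, price_per_1k_puts):
    """Monthly capacity cost plus one-time ingest PUT fees for a given object size."""
    gb = total_bytes / 1e9
    n_objects = total_bytes / object_size
    storage = gb * price_per_gb_month
    puts = (n_objects / 1000) * price_per_1k_puts
    return storage, puts

# 100 TB archived as 1 GB objects vs. the same data inadvertently chunked
# into 16 KB objects by a misbehaving gateway.
TB = 1e12
storage_big, puts_big = storage_and_put_cost(100 * TB, 1e9, 0.004, 0.05)
storage_small, puts_small = storage_and_put_cost(100 * TB, 16e3, 0.004, 0.05)

print(f"1 GB objects:  storage ${storage_big:,.0f}/mo, ingest PUTs ${puts_big:,.2f}")
print(f"16 KB objects: storage ${storage_small:,.0f}/mo, ingest PUTs ${puts_small:,.0f}")
```

With 1 GB objects the request fees are noise; with 16 KB objects the one-time PUT fees alone dwarf months of capacity charges, which is exactly why a wonky workload, rather than the pricing model itself, is usually the culprit.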
Happy to give you (and PASIG) a deeper dive on IAAS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, > wrote: Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that an open, native data format independent of where the data lives is really what will enable multi-cloud workflow management - along with federated data search of system and descriptive metadata across the namespace, no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software products, like Zenko, Starfish and Minio (and other software within some cloud services), can enable. Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in costs greater than a private cloud. Stuart - I just read a couple of Forrester papers on the Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data.
[Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access or derivatives of moving or still images, etc).] Spin up some quick compute services or Hadoop (for other use case). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be www.trumantechnologies.com facebook/TrumanTechnologies MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I?m a data storage consultant with about 30 years in the game. I?m also the founder of Starfish Storage which makes software products for virtualizing file systems and object stores. ? I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of a stack that starts with the application and ends in the storage media. There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. ? By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. 
As such, it does not matter so much which object store you buy. You could simply chase price. - Oracle - They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud", and they are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. - There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business since before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. - The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier. - POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4. o We also work in-band or side-band to the object store.
That means that you can use our POSIX interface simultaneously with S3. - You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I'm at the Supercomputing conference this week, so replies will be a bit slow. Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com This e-mail is intended only for the use of the addressees. Any copying, forwarding, printing or other use of this e-mail by persons other than the addressees is not authorized. This e-mail may contain information that is privileged, confidential and exempt from disclosure. If you are not the intended recipient, please notify us immediately by return e-mail (including the original message in your reply) and then delete and discard all copies of the e-mail. Thank you. HB75 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From neil.jefferies at bodleian.ox.ac.uk Wed Nov 15 11:52:29 2017 From: neil.jefferies at bodleian.ox.ac.uk (Neil Jefferies) Date: Wed, 15 Nov 2017 16:52:29 +0000 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? In-Reply-To: <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Message-ID: <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk> All, While we are on the topic of external fixity checking, I would like to query the rationale for actually doing it in isolation. This has bugged me for a bit, so I am going to unload. As far as I can see, the main reasons would be undetected corruption on storage, and tampering that doesn't hijack the chain of custody. All storage media now have built-in error detection and correction using Reed-Solomon, Hamming or similar codes, which are generally capable of dealing with small multi-bit errors. In modern environments, this gives unrecoverable read error rates of, at worst, around 1 in 10^14 bits - and generally several orders of magnitude better - which works out to around one error per 12TB read. Write errors are less frequent; they do occur, but can be detected by device firmware and retried elsewhere on the medium. These are absolute worst-case figures and result in *detectable* failure long before we even get to computing fixity. The chance of bit flips occurring in such a pattern as to defeat error-correction coding is several orders of magnitude less likely - it is similar to bit flips resulting in an unchanged MD5 hash. Interestingly, in most cases the mere act of reading data allows devices to detect and correct future errors as the storage medium becomes marginal, so there is value in doing that. Consequently, however, undetected corruption is most likely when data moves from the error-corrected environment of the medium to less robust environments.
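Neil's worst-case figure can be sanity-checked with quick arithmetic. The sketch below uses his 1-in-10^14-bits rate; the 100 TB archive size is an arbitrary illustration.

```python
# Sanity check on the unrecoverable read error (URE) figure: a rate of
# 1 error per 10^14 bits read means roughly one error per 12.5 TB read.

URE_RATE = 1e-14          # worst-case unrecoverable errors per bit read
BITS_PER_TB = 8e12        # 1 TB = 10^12 bytes = 8 * 10^12 bits

tb_per_error = 1 / (URE_RATE * BITS_PER_TB)
print(f"~1 unrecoverable (but detectable) error per {tb_per_error:.1f} TB read")

# Probability of hitting at least one URE while reading a 100 TB archive
# once at the worst-case rate - i.e. these failures are routine and must
# be handled, but they announce themselves rather than silently corrupting.
p_clean = (1 - URE_RATE) ** (100 * BITS_PER_TB)
print(f"P(at least one URE in a 100 TB pass) = {1 - p_clean:.2f}")
```

This supports the point in the text: at archive scale the *detectable* media errors are near-certain long before silent, ECC-defeating corruption becomes plausible.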
At the interconnect level, protocols such as SCSI, SATA, Ethernet and FC are all error corrected, as is the PCI-E bus itself. The most likely failure points are therefore a curator's PC or software. How many curators work on true workstation-grade systems with error-corrected RAM and error-corrected CPU caches? How well tested are your hashing implementations (MD5 had a bug not so long ago)? How about all the scripts that tie everything together? How about every tool in your preservation toolchain? How many of these fail properly when an unrecoverable media error is encountered? If we consider malicious activity then, again, we have to ask whether it is easier to attack the storage (which may require targeting several geographically dispersed and reasonably secure sites) or the curation workflow, which is localised, generally in a less secure location than a machine room, and can legitimise changes. A robust digital signature environment is the way to deal with this - and fixity hashes *can* be used to make it more efficient (sign the hash rather than the whole object). Locally computed hashes can also be very useful as a bandwidth-efficient way of comparing multiple copies of an object (rsync has done this for ages) to ensure that they are in sync. So there are reasons to compute hashes when needed, but fixity alone is not necessarily a compelling reason given the way modern systems are engineered. Neil From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of David Pcolar Sent: 14 November 2017 21:33 To: pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi All, This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed. I would like to address a common issue with the cloud storage vendors: IO/retrieval costs. 
Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint. However, preservation repositories touch those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand). A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an area accessible to the EC2 instance is required. This results in I/O and compute-cycle fees that could dramatically inflate the cost of public cloud storage over time. For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org On Nov 14, 2017, at 3:22 PM, Mike Davis > wrote: Hi Gail I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and the flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing, for example, by an average of roughly 10% per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. 
But for archival-oriented workloads, the cost of moving data should never be greater than 10% of the total; otherwise the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and on solving for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, > wrote: Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open, native data format independent of where the data lives is really what will enable multi-cloud workflow management - along with federated search of system and descriptive metadata across a single namespace no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software products, like Zenko, Starfish and Minio (and other software within some cloud services), can enable. Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in costs greater than a private cloud. Stuart - I just read a couple of Forrester papers on the Total Economic Impact (TEI) of public clouds -- the ones I have URLs for are posted below and make a useful read: https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... 
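[Picking up the fixity-audit question raised earlier in the thread: the mount-and-checksum pass David describes boils down to streaming each object through a digest and comparing against stored values. A minimal sketch - the helper names and object keys are hypothetical, not any platform's API, and the file-like sources could equally be handles under an S3 bucket mount point:]

```python
import hashlib

def stream_checksum(fileobj, algo="sha256", chunk_size=1024 * 1024):
    """Checksum a file-like object in chunks, so multi-GB objects
    never have to fit in memory."""
    h = hashlib.new(algo)
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

def audit(objects, expected):
    """Compare freshly computed digests against stored fixity values.
    `objects` maps name -> open file-like object; `expected` maps
    name -> previously recorded hex digest. Returns names that fail."""
    return [name for name, fobj in objects.items()
            if stream_checksum(fobj) != expected.get(name)]
```

[In the EC2-mount setup, `objects` would be open handles under the mounted bucket; each pass through `audit` is exactly the read traffic that incurs the I/O and compute fees being debated here.]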
I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access copies or derivatives of moving or still images, etc).] Spin up some quick compute services, or Hadoop (for other use cases). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage consultant with about 30 years in the game. I'm also the founder of Starfish Storage, which makes software products for virtualizing file systems and object stores. - I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of as a stack that starts with the application and ends in the storage media. There can be multiple layers, and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. - By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. 
You could simply chase price. - Oracle - They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud". They are very inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. - There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They were in the object storage business before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. - The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier. - POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, plus SMB-2, SMB-3, NFS-3 and NFS-4. o We also work in-band or side-band to the object store. 
That means you can use our POSIX interface simultaneously with S3. - You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I'm at the Supercomputing conference this week, so replies will be a bit slow. Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart Sent: Tuesday, November 14, 2017 4:26 AM To: 'Julian M. Morley' >; gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, Gail, all, At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. The approach historically taken here has been to use general-purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. We're now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. 
(** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two). We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to the lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. being much harder to corrupt or accidentally delete). We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. Critique of this plan is most welcome! Also happy to join in any offline discussion about this. Best wishes, Stuart Lewis Head of Digital National Library of Scotland George IV Bridge, Edinburgh EH1 1EW Tel: +44 (0) 131 623 3704 Email: stuart.lewis at nls.uk Website: www.nls.uk Twitter: @stuartlewis From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley Sent: 14 November 2017 04:28 To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Gail, Sure - would be happy to chat with you. I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us. -- Julian M. 
Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: "gail at trumantechnologies.com" > Date: Monday, November 13, 2017 at 4:06 PM To: Julian Morley >, "pasig-discuss at mail.asis.org" > Cc: "gail at trumantechnologies.com" > Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3 but write to non-S3-based public cloud vendors, to tier or move data among private and public clouds, and to do federated search on metadata across a single namespace (across these clouds). Check out a couple of interesting technologies: open source Zenko.io - offering S3 connectivity to AWS, Azure and Google (the latter two are coming shortly) - and also Scality Connect for Azure Blob Storage, which translates S3 API calls to Azure Blob storage API calls. See the attached datasheet and also https://www.zenko.io/ I'd add Scality to your list -- in the Gartner Magic Quadrant they're shown in the upper-right Visionary quadrant, and they're close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB customer installs. The Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias? 
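[Conceptually, the connectors Gail describes (Zenko, Scality Connect, Minio federation) present one S3-style front end and route each call to whichever backend holds the data. A toy in-memory sketch of that routing idea - plain dicts stand in for AWS S3, Azure Blob, or an on-prem store, and nothing here is any product's real API:]

```python
class MultiCloudRouter:
    """Toy model of an S3-style front end that routes each bucket
    to a different storage backend (dicts standing in for clouds)."""

    def __init__(self):
        self.backends = {}   # bucket name -> dict of key -> bytes

    def register(self, bucket, backend):
        self.backends[bucket] = backend

    def put_object(self, bucket, key, body):
        # A real connector would translate this into the backend's
        # native API call (e.g. an Azure Blob upload).
        self.backends[bucket][key] = body

    def get_object(self, bucket, key):
        return self.backends[bucket][key]

# One namespace, two "clouds":
router = MultiCloudRouter()
router.register("on-prem-archive", {})   # e.g. a local object store
router.register("cloud-copy", {})        # e.g. a public cloud tier

router.put_object("on-prem-archive", "item-001.tiff", b"...image bytes...")
router.put_object("cloud-copy", "item-001.tiff", b"...image bytes...")
```

[The application talks one dialect throughout; swapping a backend (or adding a third copy) is a `register` call, which is the vendor-independence argument made elsewhere in this thread.]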
Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: [Pasig-discuss] Experiences with S3-like object store service providers? From: "Julian M. Morley" > Date: Mon, November 13, 2017 12:43 pm To: "pasig-discuss at mail.asis.org" > Hi everyone, I've currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent-access stores to cold vaults. As a result I've spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they'd like to share. Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. We're also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/BlackPearl. The vendors that I'm looking at are: Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. We have a Direct Connect pipe to AWS, which reduces the pain of data egress considerably. IBM Cloud Bluemix (formerly CleverSafe) A good choice for multi-region redundancy, as they use erasure coding across regions - no "catch-up" replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent-access store). Dell/EMC Virtustream (no cold storage option) Uses EMC ECS hardware. 
Actually more expensive than AWS at retail pricing for standard object storage; their value-add is tying Virtustream into on-prem ECS units. Iron Mountain Iron Cloud (Infrequent Access only) Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges. Oracle Cloud (cheapest cold storage option, but not an S3 API) Uses OpenStack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier. Google Cloud Platform (not an S3 API) Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier. Microsoft Azure (not an S3 API) Competitively priced, especially their infrequent-access product, but again not an S3 API, and their vault product is still in beta. Backblaze B2 (not an S3 API) Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS. Wasabi Cloud A recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It's cheaper and faster than Glacier, both to store data and to egress it, but there are obvious concerns around company longevity. It would probably make a good second target if you have a multi-vendor requirement for your data. If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up! -- Julian M. 
Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From lcco at bgs.ac.uk Wed Nov 15 12:38:52 2017 From: lcco at bgs.ac.uk (Cullen Coates, Lilian S.E.) Date: Wed, 15 Nov 2017 17:38:52 +0000 Subject: [Pasig-discuss] {Disarmed} Re: Experiences with S3-like object store service providers? In-Reply-To: References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Message-ID: Please remove me from this list. From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Shirazi, Arya (HBO) Sent: 15 November 2017 16:13 To: Louise Pidcock ; Rebecca Jeetoo ; David Pcolar ; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] {Disarmed} Re: Experiences with S3-like object store service providers? 
Please remove me from this list as well From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Louise Pidcock Sent: Wednesday, November 15, 2017 6:49 AM To: Rebecca Jeetoo >; David Pcolar >; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] {Disarmed} Re: Experiences with S3-like object store service providers? **External Email received from: "Louise Pidcock" > ** Could I please come off the list for the same reason From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Rebecca Jeetoo Sent: 15 November 2017 10:31 To: David Pcolar >; pasig-discuss at mail.asis.org Subject: {Disarmed} Re: [Pasig-discuss] Experiences with S3-like object store service providers? Please could I be taken off this mailing list. This was a course I booked for a colleague, not something I am personally part of. Kind Regards, Becky Jeetoo PA to the Director of Learning National Theatre London SE1 9PX 
A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an accessible area for the EC2 instance is required. This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time. For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org On Nov 14, 2017, at 3:22 PM, Mike Davis > wrote: Hi Gail I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing for example at an average 10% (ish) per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never be greater than 10% of total or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third party gateway solution, that would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IAAS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, > wrote: Thanks for chiming in Jacob! 
As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives - this is really what will enable multi-cloud workflow management. And also having federated data search of system and descriptive metadata across the namespace no matter where the data is stored (including across public- and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish, Minio and (and other sw within some cloud services) can enable. Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) often the "hidden" costs of pulling the data back will usually result in costs greater than a private cloud. Stuart - I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building our on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access or derivatives of moving or still images, etc).] 
Spin up some quick compute services or Hadoop (for other use case). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be www.trumantechnologies.com facebook/TrumanTechnologies MailScanner has detected a possible fraud attempt from "urldefense.proofpoint.com" claiming to be https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I?m a data storage consultant with about 30 years in the game. I?m also the founder of Starfish Storage which makes software products for virtualizing file systems and object stores. ? I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of a stack that starts with the application and ends in the storage media. There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. ? By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. ? Oracle ? They are mysterious about what they are doing under the hood, but it does not matter. It?s a ?cloud?. 
They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. ? There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily, the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. ? The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko who Gail recommended earlier. ? POSIX File System Gateway ? My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4 o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3. ? You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. 
We would be especially useful if you need to migrate files from your tape file system.
o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email!

I hope that helps. Message me offline if you want to discuss. I'm at the SuperComputing conference this week, so replies will be a bit slow.

Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage"
Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart
Sent: Tuesday, November 14, 2017 4:26 AM
To: 'Julian M. Morley'; gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, Gail, all,

At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive.

The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between.

We're now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two.)

We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete).

We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once.

Critique of this plan is most welcome! Also happy to join in any offline discussion about this.

Best wishes,

Stuart Lewis
Head of Digital
National Library of Scotland
George IV Bridge, Edinburgh EH1 1EW
Tel: +44 (0) 131 623 3704
Email: stuart.lewis at nls.uk
Website: www.nls.uk
Twitter: @stuartlewis

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley
Sent: 14 November 2017 04:28
To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Gail,

Sure - would be happy to chat with you.

I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us.

--
Julian M.
Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: "gail at trumantechnologies.com"
Date: Monday, November 13, 2017 at 4:06 PM
To: Julian Morley, "pasig-discuss at mail.asis.org"
Cc: "gail at trumantechnologies.com"
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors. And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds).

Check out a couple of interesting technologies: Open Source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter two are coming shortly) - and also Scality Connect for Azure Blob Storage, which translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/

I'd add Scality to your list -- see the Gartner Magic Quadrant: they're shown in the upper-right Visionary quadrant and are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB size customer installs. Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb

I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias?

Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
From: "Julian M. Morley"
Date: Mon, November 13, 2017 12:43 pm
To: "pasig-discuss at mail.asis.org"

Hi everyone,

I've currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I've spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they'd like to share.

Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. We're also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/BlackPearl.

The vendors that I'm looking at are:

Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. We have a Direct Connect pipe to AWS which reduces the pain of data egress considerably.

IBM Cloud Bluemix (formerly CleverSafe)
A good choice for multi-region redundancy, as they use erasure coding across regions - no "catch up" replication - providing CRR at a cheaper price than AWS.
If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store).

Dell/EMC Virtustream (no cold storage option)
Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units.

Iron Mountain Iron Cloud (Infrequent Access only)
Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges.

Oracle Cloud (cheapest cold storage option, but not S3 API)
Uses OpenStack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier.

Google Cloud Platform (not an S3 API)
Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier.

Microsoft Azure (not an S3 API)
Competitively priced, especially their Infrequent Access product, but again not an S3 API, and their vault product is still in beta.

Backblaze B2 (not an S3 API)
Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS.

Wasabi Cloud
Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It's cheaper and faster than Glacier, both to store data and egress it, but there are obvious concerns around company longevity. Would probably make a good second target if you have a multi-vendor requirement for your data.

If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up!

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

________________________________
----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

National Library of Scotland, Scottish Charity, No: SCO11086
This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.
Before you print please think about the ENVIRONMENT

----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

-- Michael Davis | akropilot at gmail.com | mobile 408-464-0441

----
To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

This e-mail is intended only for the use of the addressees. Any copying, forwarding, printing or other use of this e-mail by persons other than the addressees is not authorized. This e-mail may contain information that is privileged, confidential and exempt from disclosure. If you are not the intended recipient, please notify us immediately by return e-mail (including the original message in your reply) and then delete and discard all copies of the e-mail. Thank you. HB75
________________________________
This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.
________________________________
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From raymond.clarke1 at verizon.net Wed Nov 15 13:44:55 2017
From: raymond.clarke1 at verizon.net (Raymond Clarke)
Date: Wed, 15 Nov 2017 18:44:55 +0000 (UTC)
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
Message-ID: <9E9B4D628E8CD499.40881B53-CDD4-4F94-94F7-C389389BA734@mail.outlook.com>

Neil,
That was indeed a mouthful, but I fully agree with your assessment - particularly regarding read error rates and the ability of today's storage systems to handle multi-bit errors. Well said.
Take good care,
Raymond

_____________________________
From: Neil Jefferies
Sent: Wednesday, November 15, 2017 1:31 PM
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?
To:

All,

While we are on the topic of external fixity checking I would like to query the rationale for actually doing it in isolation. This has bugged me for a bit so I am going to unload.

As far as I can see, the main reasons would be undetected corruption on storage, and tampering that doesn't hijack the chain of custody.

All storage media now have built-in error detection and correction using Reed-Solomon, Hamming or something similar, which is generally capable of dealing with small multi-bit errors.
In modern environments, this gives unrecoverable read error rates of at worst around 1 in 10^14 bits, and generally several orders of magnitude better - which is around one error in 12 TB of total reads. Write errors are less frequent; they do occur, but can be detected by device firmware and retried elsewhere on the medium. These are absolute worst case figures and result in *detectable* failure long before we even get to computing fixity. The chance of bit flips occurring in such a pattern as to defeat error correction coding is several orders of magnitude less - it is similar to bit flips resulting in an unchanged MD5 hash. Interestingly, in most cases the mere act of reading data allows devices to detect and correct future errors as the storage medium becomes marginal, so there is value in doing that.

Consequently, however, undetected corruption is most likely when data moves from the error-corrected environment of the medium to less robust environments. At an interconnect level, protocols such as SCSI, SATA, Ethernet and FC are all error corrected, as is the PCI-E bus itself. The most likely failure points are a curator's PC or software. How many curators work on true workstation-grade systems with error-corrected RAM and error-corrected CPU caches? How well tested are your hashing implementations (MD5 had a bug not so long ago)? How about all the scripts that tie everything together? How about every tool in your preservation toolchain? How many of these fail properly when an unrecoverable media error is encountered?

If we consider malicious activity then, again, we have to ask whether it is easier to attack the storage (which may require targeting several geographically dispersed and reasonably secure targets) or the curation workflow, which is localised, generally in a less secure location than a machine room, and can legitimise changes. A robust digital signature environment is the way to deal with this - and fixity hashes *can* be used to make this more efficient (sign the hash rather than the whole object).

Locally computed hashes can be very useful as a bandwidth-efficient way of comparing multiple copies of an object (rsync has done this for ages) to ensure that they are in sync.

So there are reasons to compute hashes, when needed, but fixity alone is not necessarily a compelling reason given the way modern systems are engineered.

Neil

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of David Pcolar
Sent: 14 November 2017 21:33
To: pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi All,

This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed.

I would like to address a common issue with the cloud storage vendors and I/O and retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple "recover the object" standpoint. However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand).

A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an accessible area for the EC2 instance is required. This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time.

For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture?

- Dave
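[Editor's note: the EC2-based fixity workflow described above boils down to streaming each object through a hash. A minimal sketch follows; the bucket/key names and the boto3 call in the comment are hypothetical illustrations, not part of anyone's actual pipeline. The point of the chunked approach is that a multi-GB object never has to fit in memory.]

```python
import hashlib

def stream_md5(chunks):
    """Compute an MD5 fixity value over an iterable of byte chunks,
    so arbitrarily large objects are hashed in constant memory."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# Hypothetical S3 usage (requires boto3 and real credentials):
#   import boto3
#   body = boto3.client("s3").get_object(Bucket="my-archive", Key="obj1")["Body"]
#   observed = stream_md5(body.iter_chunks(chunk_size=8 * 1024 * 1024))
#   if observed != expected_md5_from_catalog:
#       raise RuntimeError("fixity mismatch!")

if __name__ == "__main__":
    data = b"preservation" * 1000
    # Hashing in chunks must match hashing the object in one piece.
    whole = hashlib.md5(data).hexdigest()
    chunked = stream_md5(data[i:i + 256] for i in range(0, len(data), 256))
    print(chunked == whole)  # True
```

The same streaming hash also serves Neil's rsync-style use: exchange digests between sites and only transfer objects whose digests disagree.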
David Pcolar
CTO, Digital Preservation Network
dave at dpn.org

On Nov 14, 2017, at 3:22 PM, Mike Davis wrote:

Hi Gail,

I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, I/O, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 capacity pricing, for example, has dropped by an average of roughly 10% per year since launch.

But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky I/O-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the cost of moving data should never be greater than 10% of the total, or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count.

Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and how to solve for long-term durability.

-Mike Davis (AWS Storage)

On Tue, Nov 14, 2017 at 9:05 AM, wrote:

Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives is really what will enable multi-cloud workflow management - and also having federated data search of system and descriptive metadata across the namespace, no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish and Minio (and other software within some cloud services), can enable.

Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) often the "hidden" costs of pulling the data back will result in costs greater than a private cloud.

Stuart - I just read a couple of Forrester papers on the Total Economic Impact (TEI) of public clouds - the ones I have URLs for are posted below and make a useful read:
https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf
https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers
They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality), and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access copies or derivatives of moving or still images, etc), or you may want to spin up some quick compute services or Hadoop for other use cases.

This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with!

Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?
From: Jacob Farmer
Date: Tue, November 14, 2017 7:03 am
To: "Lewis, Stuart", "Julian M. Morley", gail at trumantechnologies.com, pasig-discuss at mail.asis.org

Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage consultant with about 30 years in the game. I'm also the founder of Starfish Storage, which makes software products for virtualizing file systems and object stores.

- I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of as a stack that starts with the application and ends in the storage media. There can be multiple layers, and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness.

- By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price.

- Oracle - They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud". They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond.

- There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They were in the object storage business before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company.

- The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and can federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier.

- POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system.
o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks.
o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4.
o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3.

- You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system.
o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email!

I hope that helps. Message me offline if you want to discuss. I'm at the SuperComputing conference this week, so replies will be a bit slow.

Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage"
Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart
Sent: Tuesday, November 14, 2017 4:26 AM
To: 'Julian M.
Morley'; gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, Gail, all,

At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive.

The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between.

We're now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two.)

We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete).

We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once.

Critique of this plan is most welcome! Also happy to join in any offline discussion about this.

Best wishes,

Stuart Lewis
Head of Digital
National Library of Scotland
George IV Bridge, Edinburgh EH1 1EW
Tel: +44 (0) 131 623 3704
Email: stuart.lewis at nls.uk
Website: www.nls.uk
Twitter: @stuartlewis

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley
Sent: 14 November 2017 04:28
To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Gail,

Sure - would be happy to chat with you.

I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us.

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: "gail at trumantechnologies.com"
Date: Monday, November 13, 2017 at 4:06 PM
To: Julian Morley, "pasig-discuss at mail.asis.org"
Cc: "gail at trumantechnologies.com"
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors. And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds).

Check out a couple of interesting technologies: Open Source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter two are coming shortly) - and also Scality Connect for Azure Blob Storage, which translates S3 API calls to Azure blob storage API calls.
See the attached datasheet and also https://www.zenko.io/

I'd add Scality to your list -- see the Gartner Magic Quadrant: they're shown in the upper-right Visionary quadrant and are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB size customer installs. Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb

I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias?

Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
From: "Julian M. Morley"
Date: Mon, November 13, 2017 12:43 pm
To: "pasig-discuss at mail.asis.org"

Hi everyone,

I've currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I've spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they'd like to share.

Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. We're also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/BlackPearl.

The vendors that I'm looking at are:

Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. We have a Direct Connect pipe to AWS which reduces the pain of data egress considerably.

IBM Cloud Bluemix (formerly CleverSafe)
A good choice for multi-region redundancy, as they use erasure coding across regions - no "catch up" replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store).

Dell/EMC Virtustream (no cold storage option)
Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units.

Iron Mountain Iron Cloud (Infrequent Access only)
Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges.

Oracle Cloud (cheapest cold storage option, but not S3 API)
Uses OpenStack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier.

Google Cloud Platform (not an S3 API)
Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier.

Microsoft Azure (not an S3 API)
Competitively priced, especially their Infrequent Access product, but again not an S3 API, and their vault product is still in beta.

Backblaze B2 (not an S3 API)
Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS.

Wasabi Cloud
Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It's cheaper and faster than Glacier, both to store data and egress it, but there are obvious concerns around company longevity.
Would probably make a good second target if you have a multi-vendor requirement for your data.

If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up!

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at
http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

--
Michael Davis | akropilot at gmail.com |
mobile 408-464-0441

From jmorley at stanford.edu Wed Nov 15 15:27:36 2017
From: jmorley at stanford.edu (Julian M. Morley)
Date: Wed, 15 Nov 2017 20:27:36 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
Message-ID:

From: Pasig-discuss on behalf of Neil Jefferies
Date: Wednesday, November 15, 2017 at 8:52 AM
To: "pasig-discuss at mail.asis.org"
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

> Interestingly, in most cases the mere act of reading data allows devices to detect and correct future errors as the storage medium becomes marginal, so there is value in doing that.

Yes, exactly this. It's fundamental to how erasure coding works. Being able to prove that the data you wrote to storage is unchanged is important, but the process you use to provide the proof doesn't have to be comparing MD5 file hashes - certifying a particular storage environment to either store bits correctly or alert on failure should be acceptable. I wouldn't trust a USB thumb drive for long-term storage, but I would trust a well-run Reed-Solomon based disk cluster with periodic scrubbing.
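For anyone implementing the checksum-based verification discussed in this thread, a minimal sketch of a fixity sweep: recompute each file's digest and compare it against a previously recorded manifest. The function names and the choice of SHA-256 are illustrative, not any particular repository's implementation.

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large objects never need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def fixity_sweep(manifest, root):
    """Compare files under `root` against a {relative_path: expected_digest}
    manifest recorded at ingest time. Returns a list of (path, problem) pairs;
    an empty list means the sweep passed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.exists():
            failures.append((rel_path, "missing"))
        elif sha256_file(target) != expected:
            failures.append((rel_path, "checksum mismatch"))
    return failures
```

Run on a schedule, any non-empty return is grounds for restoring from a second copy; the manifest itself should live outside the storage being audited, as described above.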
We've had some discussions internally about how to fixity check content that we send to the cloud, which have focused more on *how* to do it rather than *why* we do it. We're storing checksums (so many checksums!) external to our main preservation system so that we can prove chain of custody and eventually start signing them, but I'm really not worried about undetectable, uncorrectable bit flipping of content that we send to IaaS providers. I'm more concerned about human-induced data loss, whether it's accidental (coding errors or manual mistakes) or malicious. Fixity checking definitely has a role to play there, but is only one part of the entire audit process.

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From kyle.rimkus at gmail.com Wed Nov 15 17:59:23 2017
From: kyle.rimkus at gmail.com (Kyle Rimkus)
Date: Wed, 15 Nov 2017 16:59:23 -0600
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To:
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
Message-ID:

On the issue of fixity checking, I agree there is a great deal of misplaced paranoia around "bit-flipping" in contemporary storage technologies, but at the same time I've found regular fixity checks to be very useful in protecting against various types of human error that can be made in managing the storage itself or in scripted processes that interact with stored data.

As an example, at my university we farm our storage out to a campus unit which stores two copies locally while we push a third into Amazon Glacier. Our preservation repository software's regular fixity checks of on-campus data have helped us keep our storage providers and ourselves honest.
We have on more than one occasion discovered fixity errors that pointed to questionable server management. Regular fixity checking was what flagged these errors for us, and the storage of a third file copy off-site was what saved us.

We are also looking into pushing more of our storage services into the (most likely AWS) cloud, where greater guarantees are made against this type of problem. Maybe in time we'll come to see regular fixity checks as less critically important than we do now. Julian's comment that "certifying a particular storage environment to either store bits correctly or alert on failure should be acceptable" is interesting. I'm sure we'd all like to get away from having to run constant fixity checks in our repositories, and would like to see digital preservation management architecture evolve in this direction. For now, though, I'd wager that most of us are constrained by the fact that fixity checking remains essential to making sure our storage is doing what it claims to do.

Kyle

--
Kyle R. Rimkus
Assistant Professor
Preservation Librarian
University of Illinois at Urbana-Champaign

From neil at jefferies.org Wed Nov 15 19:15:35 2017
From: neil at jefferies.org (Neil Jefferies)
Date: Thu, 16 Nov 2017 00:15:35 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To:
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
Message-ID:

Kyle,

Can you expand... because that doesn't actually sound like a "fixity" issue (which is kind of my point). Either the wrong thing has been copied or the wrong thing has been stored, i.e. an off-storage process failure. For those cases checks should really be done at the time of whatever operation caused the issue.
Waiting for a check to pick it up downstream at some later time is somewhat risky.

My point is, these are not "fixity" checks, they are transmission checks.

Neil

From kyle.rimkus at gmail.com Wed Nov 15 20:55:14 2017
From: kyle.rimkus at gmail.com (Kyle Rimkus)
Date: Thu, 16 Nov 2017 01:55:14 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To:
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
Message-ID:

I see your point. We had an instance of a number of files failing their regular fixity check, after having passed several times over the years. When we investigated, the files were the same size on disk as they always were, but were just strings of zeroes, and unreadable. Whatever caused this could have been due to a transmission error behind the scenes during some sort of server maintenance, but from where we stood as a repository that had outsourced its storage we had no way of knowing one way or the other, and the fixity checks saved us.

--
Kyle R. Rimkus
Assistant Professor
Preservation Librarian
University of Illinois at Urbana-Champaign

From artpasquinelli at stanford.edu Wed Nov 15 20:05:09 2017
From: artpasquinelli at stanford.edu (Arthur Pasquinelli)
Date: Thu, 16 Nov 2017 01:05:09 +0000
Subject: [Pasig-discuss] Unsubscribe requests
Message-ID: <3C8371AA-10B6-4126-A93C-E821B4AC59E8@stanford.edu>

If someone wants to be dropped off the announce and/or discuss PASIG email list, just email me directly. Thanks!

--
Art Pasquinelli
LOCKSS Partnership Manager
Stanford University Libraries
Cell: 1-650-430-2441
artpasquinelli at stanford.edu

From akropilot at gmail.com Thu Nov 16 14:19:57 2017
From: akropilot at gmail.com (Mike Davis)
Date: Thu, 16 Nov 2017 11:19:57 -0800
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To:
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org>
Message-ID:

Hi All,

I'll clarify the AWS position on ETags and fixity checking, but I'm hesitant to hijack this thread into a "how S3 and Glacier work" thread. So feel free to follow up with me at mikdav at amazon.com for any sort of deeper dialogue.
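On the multipart wrinkle mentioned in this thread: for multipart uploads, the S3 ETag is widely observed to be the MD5 of the concatenated binary MD5 digests of the parts, followed by a hyphen and the part count, rather than a plain MD5 of the whole object. A sketch of a local comparison under that assumption (this is observed behavior, not a documented contract, which is why Mike cautions against treating the ETag as a fixity primitive):

```python
import hashlib

def s3_multipart_etag(data, part_size):
    """Recompute the ETag S3 is widely observed to report: a plain MD5 for
    single-part uploads, otherwise MD5 over the concatenated binary MD5s of
    each part plus '-<part count>'. You must replay the same part size that
    was used for the upload. (MD5 here is integrity bookkeeping, not security.)"""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    if len(parts) <= 1:
        return hashlib.md5(data).hexdigest()
    digest_of_digests = hashlib.md5(
        b"".join(hashlib.md5(p).digest() for p in parts)
    ).hexdigest()
    return f"{digest_of_digests}-{len(parts)}"
```

This is the "little bit of extra legwork" Julian alludes to: record the part size alongside the checksum at upload time, or the recomputed value cannot be compared.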
I'll work with Art and Gail on possibly setting up a webinar for the group.

To answer Julian: our S3 ETags are fixed at time of upload. They are never modified, and never reflect any background tasks associated with fixity checking, audit runs, media migration, or healing processes that occur over time. I do recognize the desire of the preservation community (which has a long history of bit-rot-related data loss) to achieve more transparency on those housekeeping functions. But rest assured, AWS is fully committed to supporting "100-year archive" type workloads. S3 and Glacier's 11 9's of durability says a single object will be protected across 3 facilities, against concurrent media/host/power/network failures, across flood plains, discrete power grids, and discrete networking providers.

We ask you to consider the histogram of things that cause data loss: for those losses related to *infrastructure*, we reduce the odds of data loss to zero, leaving access control, application bugs, malicious insiders, catalog loss, and workflow bugs as imposing far higher risks to data integrity. So storage fixity checks are no substitute for *end-to-end* data validation checks, audit mechanisms, and DLP (see AWS Macie).

ETag is not the best solution for fixity checks given its non-deterministic nature (it's not necessarily an MD5) and the fact that it doesn't reflect internal integrity housekeeping. To store a hash with the object, one approach would be to append it as an S3 key-value tag (along with other shadow metadata that could use 11 9's of protection). Although we wouldn't see the ROI of running your own integrity crawl every N months, you could certainly do it... Spot compute, Glacier Bulk restore, and $0 in-region transfers *should* make this a relatively cheap operation.

-Mike

On Tue, Nov 14, 2017 at 7:34 PM, Julian M.
Morley wrote:

> Hi David,
>
> We haven't done it yet, but we plan to do something similar to what you describe below when we start storing SDR content in cloud providers.
>
> One open question (that I intend to ask Mike Davis about!) is how S3 stores and/or updates ETag fields for uploaded objects. I think that when content is recovered from Glacier to S3-IA, the ETag/MD5 of the file is computed when the file is written to S3-IA. This means that a "good enough" fixity check can be done simply by recalling the data to S3-IA (relatively cheap! No data egress charges!) and performing a simple metadata check of the recovered object. Costs are easily projected/constrained simply by deciding what % of your total data corpus you want "in flight" at any one time.
>
> This requires us to store checksums for all objects that we send to the cloud in a separate datastore - we'll be using something called the Trusted Checksum Repository for this - a WORM-style database that stands to the side of the SDR.
>
> I'm also assuming (again, this is a question for Mike) that S3 and other cloud providers do perform periodic scrubs of their data, and use EC to correct for any bad blocks that they find. For example, Wasabi explicitly states that they validate MD5 checksums of content every 90 days. Presumably when they do that they'll update the ETag if it has changed, which again allows a metadata check to validate fixity.
>
> The same process for Glacier recovery to S3-IA works for Oracle Cloud Storage Archive to Oracle Cloud Storage - the recovered object has a freshly-generated ETag, which can then be compared against the stored checksum when the file was first uploaded. No cloud compute instance needed.
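The metadata-only check described above - compare an independently recorded checksum against the object's reported ETag, without egressing any bytes - can be sketched as follows. The names and the client-injection style are illustrative; `head_object` stands in for any callable with boto3's `s3.head_object` signature, and the comparison is only meaningful for single-part uploads, where the ETag is a plain MD5:

```python
def normalize_etag(header_value):
    """S3 returns ETags wrapped in double quotes; strip them for comparison."""
    return header_value.strip('"')

def etag_matches_recorded_md5(head_object, bucket, key, recorded_md5):
    """Metadata-only fixity check: HEAD the object and compare its ETag to an
    independently recorded MD5. No object bytes are retrieved. Only valid for
    single-part uploads (multipart ETags are not plain MD5s). `head_object`
    is any callable shaped like boto3's s3.head_object."""
    head = head_object(Bucket=bucket, Key=key)
    return normalize_etag(head["ETag"]) == recorded_md5

# Illustrative stand-in for a real S3 client response ("d41d8..." is MD5("")):
fake_head = lambda Bucket, Key: {"ETag": '"d41d8cd98f00b204e9800998ecf8427e"'}
assert etag_matches_recorded_md5(
    fake_head, "my-bucket", "obj", "d41d8cd98f00b204e9800998ecf8427e")
```

In production the callable would be `boto3.client("s3").head_object`; a mismatch means escalating to a full byte-level re-checksum rather than a verdict on its own.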
> Oracle has also told me (although this is not official) that they write content to two tapes, and perform occasional CRC checks / migrations of content to ensure that data on the tapes hasn't gone bad, although that's not on a fixed schedule.
>
> For GCP and other online/non-vault options, running a compute instance is probably still the best way to go. And if we want to unpack the object and checksum all the elements it's pretty much the only game in town - we plan to do that for a random sampling of our content, adjusting our throughput by varying recovery request rate and EBS disk sizes until we settle on an acceptable rate.
>
> (There *is* a wrinkle here with ETags and multi-part uploads that I'm not getting into. It's still possible to get and store a useful MD5, you just need to do a little bit of extra legwork to get there.)
>
> --
> Julian M. Morley
> Technology Infrastructure Manager
> Digital Library Systems & Services
> Stanford University Libraries
>
> From: Pasig-discuss on behalf of David Pcolar
> Date: Tuesday, November 14, 2017 at 1:33 PM
> To: "pasig-discuss at mail.asis.org"
> Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?
>
> Hi All,
>
> This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed.
>
> I would like to address a common issue with the cloud storage vendors and IO/retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint. However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA preservation level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand).
> A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an accessible area for the EC2 instance is required. This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time.
>
> For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture?
>
> - Dave
>
> David Pcolar
> CTO, Digital Preservation Network
> dave at dpn.org
>
> On Nov 14, 2017, at 3:22 PM, Mike Davis wrote:
>
> Hi Gail,
>
> I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing, for example, at an average 10% (ish) per year since launch.
>
> But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of *moving* data should never be greater than 10% of total, or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution; that would inflate the transaction count.
>
> Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and to solve for long-term durability.
> -Mike Davis (AWS Storage)
>
> On Tue, Nov 14, 2017 at 9:05 AM, wrote:
>
>> Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives - this is really what will enable multi-cloud workflow management. And also having federated data search of system and descriptive metadata across the namespace no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish, Minio (and other software within some cloud services) can enable.
>>
>> Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) often the "hidden" costs of pulling the data back will usually result in costs greater than a private cloud.
>>
>> Stuart -
>> I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read:
>> https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf
>> https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers
>> They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access copies or derivatives of moving or still images, etc.).]
>> Spin up some quick compute services or Hadoop (for another use case).
>>
>> This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with!
>>
>> Gail
>>
>> Gail Truman
>> Truman Technologies, LLC
>> Certified Digital Archives Specialist, Society of American Archivists
>>
>> Protecting the world's digital heritage for future generations
>> www.trumantechnologies.com
>> facebook/TrumanTechnologies
>> https://www.linkedin.com/in/gtruman
>>
>> +1 510 502 6497
>>
>> -------- Original Message --------
>> Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?
>> From: Jacob Farmer
>> Date: Tue, November 14, 2017 7:03 am
>> To: "Lewis, Stuart" , "Julian M. Morley" , gail at trumantechnologies.com, pasig-discuss at mail.asis.org
>>
>> Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage consultant with about 30 years in the game. I'm also the founder of Starfish Storage, which makes software products for virtualizing file systems and object stores.
>>
>> - I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of as a stack that starts with the application and ends in the storage media. There can be multiple layers, and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness.
>>
>> - By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price.
>>
>> - Oracle - They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud". They are so inexpensive.
>> Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond.
>>
>> - There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business since before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company.
>>
>> - The other vendor worth looking at is Minio.io. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier.
>>
>> - POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system.
>>
>> o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks.
>> o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4.
>> o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3.
>> You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system.
>>
>> o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email!
>>
>> I hope that helps. Message me offline if you want to discuss. I'm at the Supercomputing conference this week, so replies will be a bit slow.
>>
>> Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage"
>> Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com
>>
>> From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart
>> Sent: Tuesday, November 14, 2017 4:26 AM
>> To: 'Julian M. Morley' ; gail at trumantechnologies.com; pasig-discuss at mail.asis.org
>> Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?
>>
>> Hi Julian, Gail, all,
>>
>> At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive.
>>
>> The approach historically taken here has been to use general-purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between.
>>
>> We're now wanting to move to three copies, two online and one offline (in the cloud if possible).
>>
>> For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow.
>> I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP.
>>
>> (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two.)
>>
>> We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete).
>>
>> We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once.
>>
>> Critique of this plan is most welcome!
>>
>> Also happy to join in any offline discussion about this.
>>
>> Best wishes,
>>
>> Stuart Lewis
>> Head of Digital
>>
>> National Library of Scotland
>> George IV Bridge, Edinburgh EH1 1EW
>>
>> Tel: +44 (0) 131 623 3704
>> Email: stuart.lewis at nls.uk
>> Website: www.nls.uk
>> Twitter: @stuartlewis
>>
>> From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley
>> Sent: 14 November 2017 04:28
>> To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org
>> Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?
>>
>> Hi Gail,
>>
>> Sure - would be happy to chat with you.
>>
>> I've got Scality in my list of contenders - didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system.
As with any commercial >> software, ongoing licensing costs are a potential pain point for us. >> >> -- >> Julian M. Morley >> Technology Infrastructure Manager >> Digital Library Systems & Services >> Stanford University Libraries >> >> From: "gail at trumantechnologies.com" >> Date: Monday, November 13, 2017 at 4:06 PM >> To: Julian Morley , "pasig-discuss at mail.asis.org" < >> pasig-discuss at mail.asis.org> >> Cc: "gail at trumantechnologies.com" >> Subject: RE: [Pasig-discuss] Experiences with S3-like object store >> service providers? >> >> >> Hi Julian, thanks for sharing your list and comments. Very thorough list. >> I'd love to chat (and I'm close by in Oakland).... I've quite a lot of >> experience in the cloud storage field and would suggest you also take a >> look at multi-cloud connector technologies that will allow you to >> standardize on S3, but write to non-S3-based public cloud vendors. And to >> tier or move data among private and public clouds and do federated search >> on metadata across a single namespace (across these clouds). >> >> Check out a couple of interesting technologies: >> >> Open Source Zenko.io - offering S3 connect to AWS, >> Azure and Google (the latter 2 are coming shortly), and also >> Scality Connect for Azure Blob Storage - translates S3 API calls to Azure >> blob storage API calls. >> >> See the attached datasheet and also https://www.zenko.io/ >> >> I'd add Scality to your list -- see the Gartner Magic Quadrant: they're >> shown in the upper-right Visionary quadrant and are close to you in San >> Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to >> public clouds, and have lots of multi-PB size customer installs. Gartner >> MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb >> >> I'd be very interested in learning more about your use cases -- can we >> connect outside of this PASIG alias? 
>> >> Gail >> >> >> >> Gail Truman >> Truman Technologies, LLC >> Certified Digital Archives Specialist, Society of American Archivists >> >> *Protecting the world's digital heritage for future generations* >> www.trumantechnologies.com >> facebook/TrumanTechnologies >> https://www.linkedin.com/in/gtruman >> >> +1 510 502 6497 <(510)%20502-6497> >> >> >> >> >> -------- Original Message -------- >> Subject: [Pasig-discuss] Experiences with S3-like object store service >> providers? >> From: "Julian M. Morley" >> Date: Mon, November 13, 2017 12:43 pm >> To: "pasig-discuss at mail.asis.org" >> >> Hi everyone, >> >> I've currently got at least four use cases for an S3-compatible object >> store, spanning everything from traditional S3 through infrequent access >> stores to cold vaults. As a result I've spent considerable time researching >> options and prices, and was wondering if anyone on this list has any >> similar experiences they'd like to share. >> >> Our use cases range from hundreds of TB through to several PB, with >> different access patterns and comfort levels around redundancy and access. >> For most of them a 100% compatible S3 API is a requirement, but we can bend >> that a bit for the cold storage use case. We're also considering >> local/on-prem object stores for one of the use cases - either rolling our >> own Ceph install, or using Dell/EMC ECS or SpectraLogic >> ArcticBlue/Blackpearl. >> >> The vendors that I'm looking at are: >> >> Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). >> This is the baseline. We have a direct connect pipe to AWS which reduces >> the pain of data egress considerably. >> >> IBM Cloud Bluemix (formerly CleverSafe) >> A good choice for multi-region redundancy, as they use erasure coding >> across regions - no 'catch up' replication - providing CRR at a cheaper >> price than AWS. 
If you only want to keep one copy of your data in the >> cloud, but have it be able to survive the loss of a region, this is the >> best choice (Google can also do this, but not with an S3 API or an >> infrequent access store). >> >> Dell/EMC Virtustream (no cold storage option) >> Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing >> for standard object storage; their value add is tying Virtustream into >> on-prem ECS units. >> >> Iron Mountain Iron Cloud (Infrequent Access only) >> Also uses EMC ECS hardware. Designed primarily for backup/archive >> workloads (no big surprise there), but with no retrieval, egress or >> PUT/GET/POST charges. >> >> Oracle Cloud (cheapest cold storage option, but not S3 API) >> Uses OpenStack Swift. Has the cheapest cloud-tape product (Oracle Cloud >> Storage Archive), but has recently increased prices to be closer to AWS >> Glacier. >> >> Google Cloud Platform (not an S3 API) >> Technically brilliant, but you have to be able to use their APIs. Their >> cold storage product is online (disk, not tape), but not as cheap as >> Glacier. >> >> Microsoft Azure (not an S3 API) >> Competitively priced, especially their Infrequent Access product, but >> again not an S3 API and their vault product is still in beta. >> >> Backblaze B2 (not an S3 API) >> Another backup/archive target, only slightly more expensive than Glacier, >> but online (no retrieval time or fees) and with significantly cheaper data >> egress rates than AWS. >> >> Wasabi Cloud >> Recently launched company from the team that brought you Carbonite. >> Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. *It's >> cheaper and faster than Glacier*, both to store data and egress it, but >> there are obvious concerns around company longevity. Would probably make a >> good second target if you have a multi-vendor requirement for your data. 
>> >> If anyone is interested in hearing more, or has any experience with any >> of these vendors, please speak up! >> >> -- >> Julian M. Morley >> Technology Infrastructure Manager >> Digital Library Systems & Services >> Stanford University Libraries >> ------------------------------ >> ---- >> To subscribe, unsubscribe, or modify your subscription, please visit >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> _______ >> PASIG Webinars and conference material is at >> http://www.preservationandarchivingsig.org/index.html >> _______________________________________________ >> Pasig-discuss mailing list >> Pasig-discuss at mail.asis.org >> http://mail.asis.org/mailman/listinfo/pasig-discuss >> >> >> National Library of Scotland, Scottish Charity, No: SCO11086 >> This communication is intended for the addressee(s) only. If you are not >> the addressee please inform the sender and delete the email from your >> system. The statements and opinions expressed in this message are those of >> the author and do not necessarily reflect those of National Library of >> Scotland. This message is subject to the Data Protection Act 1998 and >> Freedom of Information (Scotland) Act 2002. No liability is accepted for >> any harm that may be caused to your systems or data by this message. 
>> Before you print please think about the ENVIRONMENT > > -- > Michael Davis | akropilot at gmail.com | mobile 408-464-0441 > <(408)%20464-0441> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmorley at stanford.edu Thu Nov 16 15:01:34 2017 From: jmorley at stanford.edu (Julian M. Morley) Date: Thu, 16 Nov 2017 20:01:34 +0000 Subject: [Pasig-discuss] Experiences with S3-like object store service providers? 
In-Reply-To: References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> Message-ID: <6D3B5B99-DD91-43D3-B4C3-C07A63DC5A94@stanford.edu> Thanks for the clarification, Mike - appreciate it. Follow-up question: if we copy an existing object (per http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html: "Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-S3 or plaintext, have ETags that are an MD5 digest of their object data."), the docs imply that the new copy would have an MD5 digest. Is this calculated by AWS when the copy operation occurs, or are you just copying the existing object's metadata? I'm thinking of what happens in the following use case: Client performs a multi-part upload using the RESTful API to S3, so we end up with an ETag that's a tree hash derived from the hashes of all the upload parts. Client then performs a copy using the RESTful API - is the ETag on the new object still a tree hash based on the original multipart upload, or does it get re-computed because presumably the back-end copy inside AWS isn't a multi-part upload? (or even if the backend process does use multi-part transfers, do the checksums get recalculated?). Yes, I'm still looking for ways to get a checksum re-generated without having to run a compute instance and EBS holding disk. -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: Mike Davis > Date: Thursday, November 16, 2017 at 11:19 AM To: Julian Morley > Cc: David Pcolar >, "pasig-discuss at mail.asis.org" > Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi All, I'll clarify the AWS position on ETags and fixity checking, but I'm hesitant to hijack this thread into a "how S3 and Glacier works" thread. 
So feel free to follow up with me at mikdav at amazon.com to connect for any sort of deeper dialogue. I'll work with Art and Gail on possibly setting up a webinar for the group. To answer Julian, our S3 ETags are fixed at time of upload. They are never modified, and never reflect any background tasks associated with fixity checking, audit runs, media migration, or healing processes that occur over time. I do recognize the desire of the preservation community (who has a long history of bit-rot-related data-loss) to achieve more transparency on those housekeeping functions. But rest assured, AWS is fully committed to supporting "100-year archive" type workloads. S3 and Glacier's 11 9's of durability means a single object will be protected across 3 facilities, against concurrent media/host/power/network failures, across flood plains, discrete power grids, and discrete networking providers. We ask you to consider the histogram of things that cause data-loss: For those losses related to infrastructure, we reduce the odds of data-loss to zero, leaving access control, application bugs, malicious insiders, catalog loss, and workflow bugs as imposing far higher risks to data-integrity. So storage fixity checks are no substitute for end-to-end data validation checks, audit mechanisms, and DLP (see AWS Macie). ETag is not the best solution for fixity checks given its non-deterministic nature (it's not necessarily an MD5) and the fact that it doesn't reflect internal integrity housekeeping. To store a hash with the object, one approach would be to append it as an S3 key-value tag (along with other shadow metadata that could use 11 9's of protection). Although we wouldn't see the ROI of running your own integrity crawl every N months, you could certainly do it... Spot compute, Glacier Bulk restore, and $0 in-region transfers *should* make this a relatively cheap operation. -Mike On Tue, Nov 14, 2017 at 7:34 PM, Julian M. 
Morley > wrote: Hi David, We haven't done it yet, but we plan to do something similar to what you describe below when we start storing SDR content in cloud providers. One open question (that I intend to ask Mike Davis about!) is how S3 stores and/or updates ETag fields for uploaded objects. I think that when content is recovered from Glacier to S3-IA, the ETag/MD5 of the file is computed when the file is written to S3-IA. This means that a 'good enough' fixity check can be done simply by recalling the data to S3-IA (relatively cheap! No data egress charges!) and performing a simple metadata check of the recovered object. Costs are easily projected/constrained simply by deciding what % of your total data corpus you want 'in flight' at any one time. This requires us to store checksums for all objects that we send to the cloud in a separate datastore - we'll be using something called the Trusted Checksum Repository for this - a WORM-style database that stands to the side of the SDR. I'm also assuming (again, this is a question for Mike) that S3 and other cloud providers do perform periodic scrubs of their data, and use EC to correct for any bad blocks that they find. For example, Wasabi explicitly states that they validate md5 checksums of content every 90 days. Presumably when they do that they'll update the ETag if it has changed, which again allows a metadata check to validate fixity. The same process for Glacier recovery to S3-IA works for Oracle Cloud Storage Archive to Oracle Cloud Storage - the recovered object has a freshly-generated ETag, which can then be compared against the checksum stored when the file was first uploaded. No cloud compute instance needed. Oracle has also told me (although this is not official) that they write content to two tapes, and perform occasional CRC checks / migrations of content to ensure that data on the tapes hasn't gone bad, although that's not on a fixed schedule. 
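The metadata-only check Julian describes - comparing a checksum held in a trusted store against the ETag reported for the recovered object - can be sketched as below. The function name is illustrative, and it assumes the provider reports a plain-MD5 ETag for the recovered copy (true for single-part uploads, but not for multipart tree-hash ETags, which this helper deliberately refuses to compare):

```python
import hashlib

def etag_matches_md5(etag: str, stored_md5: str) -> bool:
    """Compare an S3-style ETag header against a checksum from a
    trusted checksum store. Returns False for multipart-style ETags
    ("md5-N"), which are tree hashes and cannot be compared directly."""
    etag = etag.strip('"').lower()   # ETag headers come back quoted
    if "-" in etag:                  # multipart tree hash, e.g. "...-13"
        return False
    return etag == stored_md5.lower()

# Example with a known digest: MD5 of b"hello"
stored = hashlib.md5(b"hello").hexdigest()
assert etag_matches_md5('"5d41402abc4b2a76b9719d911017c592"', stored)
assert not etag_matches_md5('"3db0cc62a747be7290c04938aec5aa99-13"', stored)
```

In the workflow above, `stored_md5` would come from the WORM-style checksum repository and `etag` from a HEAD request against the recovered object.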
For GCP and other online/non-vault options, running a compute instance is probably still the best way to go. And if we want to unpack the object and checksum all the elements it's pretty much the only game in town - we plan to do that for a random sampling of our content, adjusting our throughput by varying recovery request rate and EBS disk sizes until we settle on an acceptable rate. (There *is* a wrinkle here with ETags and multi-part uploads that I'm not getting into. It's still possible to get and store a useful MD5, you just need to do a little bit of extra legwork to get there.) -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: Pasig-discuss > on behalf of David Pcolar > Date: Tuesday, November 14, 2017 at 1:33 PM To: "pasig-discuss at mail.asis.org" > Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi All, This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed. I would like to address a common issue with the cloud storage vendors and IO/retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint. However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand). A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an accessible area for the EC2 instance is required. This results in I/O and compute cycle fees that could dramatically inflate the cost of public cloud storage over time. 
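The multi-part "extra legwork" Julian alludes to above is usually handled by recomputing the multipart ETag locally. A sketch of the commonly observed S3 convention (community-documented rather than officially guaranteed): the multipart ETag is the MD5 of the concatenated binary MD5 digests of each part, suffixed with the part count, so it can be reproduced if you know the part size used at upload:

```python
import hashlib

def multipart_etag(data: bytes, part_size: int) -> str:
    """Reproduce an S3-style multipart ETag: MD5 of the concatenated
    binary MD5 digests of each part, suffixed with the part count.
    A single-part upload yields a plain MD5 instead."""
    digests = [hashlib.md5(data[i:i + part_size]).digest()
               for i in range(0, len(data), part_size)]
    if len(digests) == 1:
        return hashlib.md5(data).hexdigest()
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"

data = b"x" * (10 * 1024 * 1024)               # 10 MiB of data
etag = multipart_etag(data, 5 * 1024 * 1024)   # 5 MiB parts -> 2 parts
assert etag.endswith("-2")
```

The part size is not recorded in the ETag itself, so a fixity workflow using this approach has to store (or standardise on) the upload part size alongside the checksum.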
For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture? - Dave David Pcolar CTO, Digital Preservation Network dave at dpn.org On Nov 14, 2017, at 3:22 PM, Mike Davis > wrote: Hi Gail, I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, IO, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing, for example, at an average of 10% (ish) per year since launch. But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky IO-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never be greater than 10% of total, or the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution; that would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IAAS storage strategies, and to solve for long-term durability. -Mike Davis (AWS Storage) On Tue, Nov 14, 2017 at 9:05 AM, > wrote: Thanks for chiming in Jacob! As always, great additional information. I think it's worth emphasizing that having an open and native data format independent of where the data lives is really what will enable multi-cloud workflow management. 
And also having federated data search of system and descriptive metadata across the namespace no matter where the data is stored (including across public- and on-prem cloud storage). This is what newer cloud controller software like Zenko, Starfish, and Minio (and other software within some cloud services) can enable. Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in costs greater than a private cloud. Stuart - I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read: https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality) and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access copies or derivatives of moving or still images, etc).] Spin up some quick compute services or Hadoop (for other use cases). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! 
Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I'm a data storage consultant with about 30 years in the game. I'm also the founder of Starfish Storage, which makes software products for virtualizing file systems and object stores. - I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of as a stack that starts with the application and ends in the storage media. There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. - By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. - Oracle - They are mysterious about what they are doing under the hood, but it does not matter. It's a "cloud". They are so inexpensive. Use them as your second or third copy. I know that Oracle people monitor the news group. Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond. - There is a particular vendor you did not list who might be interesting for you. They are called Caringo. 
They have been in the object storage business since before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service. It is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company. - The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies. I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier. - POSIX File System Gateway - My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system. o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks. o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4 o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3. - You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system. o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email! I hope that helps. Message me offline if you want to discuss. I'm at the SuperComputer conference this week, so replies will be a bit slow. 
Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage" Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart Sent: Tuesday, November 14, 2017 4:26 AM To: 'Julian M. Morley' >; gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, Gail, all, At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between. We?re now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we?re about to get to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HPC. (** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two). We?re also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don?t know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete). 
We?re also purposefully not going to use the object storage system?s in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once. Critique of this plan is most welcome! Also happy to join in any offline discussion about this. Best wishes, Stuart Lewis Head of Digital National Library of Scotland George IV Bridge, Edinburgh EH1 1EW Tel: +44 (0) 131 623 3704 Email:stuart.lewis at nls.uk Website:www.nls.uk Twitter:@stuartlewis From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley Sent: 14 November 2017 04:28 To:gail at trumantechnologies.com; pasig-discuss at mail.asis.org Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Gail, Sure - would be happy to chat with you. I?ve got Scality in my list of contenders - didn?t mention it here because my first few use cases are explicitly ?not on campus?, but I agree it?s definitely a fit for our main on prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us. -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries From: "gail at trumantechnologies.com" > Date: Monday, November 13, 2017 at 4:06 PM To: Julian Morley >, "pasig-discuss at mail.asis.org" > Cc: "gail at trumantechnologies.com" > Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors. 
And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds). Check out a couple of interesting technologies: Open Source Zenko.io - offering S3 connect to AWS, Azure and Google (the latter 2 are coming shortly), and also Scality Connect for Azure Blog Storage - translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/ I'd add Scality to your list -- see the Gartner magic quadrant they're shown in the Upper Right Visionary quadrant and are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB size customer installs. Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias? Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: [Pasig-discuss] Experiences with S3-like object store service providers? From: "Julian M. Morley" > Date: Mon, November 13, 2017 12:43 pm To: "pasig-discuss at mail.asis.org" > Hi everyone, I?ve currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I?ve spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they?d like to share. Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. 
For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. We?re also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/Blackpearl. The vendors that I?m looking at are: Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier). This is the baseline. We have a direct connect pipe to AWS which reduces the pain of data egress considerably. IBM Cloud Bluemix (formerly CleverSafe) A good choice for multi-region redundancy, as they use erasure coding across regions - no ?catch up? replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store). Dell/EMC Virtustream (no cold storage option) Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units. Iron Mountain Iron Cloud (Infrequent Access only) Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges. Oracle Cloud (cheapest cold storage option, but not S3 API) Uses Openstack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier. Google Cloud Platform (not an S3 API) Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier. Microsoft Azure (not an S3 API) Competitively priced, especially their Infrequent Access product, but again not an S3 API and their vault product is still in beta. 
Backblaze B2 (not an S3 API) Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS. Wasabi Cloud Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It?s cheaper and faster than Glacier, both to store data and egress it, but there?s obvious concerns around company longevity. Would probably make a good second target if you have a multi-vendor requirement for your data. If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up! -- Julian M. Morley Technology Infrastructure Manager Digital Library Systems & Services Stanford University Libraries ________________________________ ---- To subscribe, unsubscribe, or modify your subscription, please visit http://mail.asis.org/mailman/listinfo/pasig-discuss _______ PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html _______________________________________________ Pasig-discuss mailing list Pasig-discuss at mail.asis.org http://mail.asis.org/mailman/listinfo/pasig-discuss National Library of Scotland, Scottish Charity, No: SCO11086 This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message. 
Before you print please think about the ENVIRONMENT

--
Michael Davis | akropilot at gmail.com | mobile 408-464-0441

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmorley at stanford.edu Thu Nov 16 16:50:52 2017
From: jmorley at stanford.edu (Julian M. Morley)
Date: Thu, 16 Nov 2017 21:50:52 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To: <6D3B5B99-DD91-43D3-B4C3-C07A63DC5A94@stanford.edu>
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> <6D3B5B99-DD91-43D3-B4C3-C07A63DC5A94@stanford.edu>
Message-ID:

No matter - I realized I can just test this, and did. It works!
Original multipart upload: ETag 3db0cc62a747be7290c04938aec5aa99-13
Copied it via the API within the same bucket: ETag 3f55bc756bef418d425e3251f4c3f16d

...and that ETag matches the MD5 of the original file on my local system:

magnus:Downloads jmorley$ md5 VMware-ClientIntegrationPlugin-6.0.0.mac64.dmg
MD5 (VMware-ClientIntegrationPlugin-6.0.0.mac64.dmg) = 3f55bc756bef418d425e3251f4c3f16d

OK, we can get back to talking about other cloud providers now. :-)

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: Julian Morley
Date: Thursday, November 16, 2017 at 12:01 PM
To: Mike Davis
Cc: David Pcolar, "pasig-discuss at mail.asis.org"
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Thanks for the clarification, Mike - appreciate it. Follow-up question: if we copy an existing object (per http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html: "Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-S3 or plaintext, have ETags that are an MD5 digest of their object data."), the docs imply that the new copy would have an MD5 digest. Is this calculated by AWS when the copy operation occurs, or are you just copying the existing object's metadata?

I'm thinking of what happens in the following use case: a client performs a multipart upload using the RESTful API to S3, so we end up with an ETag that's a tree hash derived from the hashes of all the upload parts. The client then performs a copy using the RESTful API - is the ETag on the new object still a tree hash based on the original multipart upload, or does it get re-computed because presumably the back-end copy inside AWS isn't a multipart upload? (Or even if the backend process does use multipart transfers, do the checksums get recalculated?)
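The tree-hash ETag under discussion (the "-13" suffix above indicates a 13-part upload) can be reproduced locally. The rule - widely observed in practice, though not a documented AWS contract - is the MD5 of the concatenated binary MD5 digests of each part, suffixed with the part count. A sketch, assuming you know the part size the uploading client used (8 MiB is an assumed common default, not a universal one):

```python
import hashlib


def multipart_etag(path, part_size=8 * 1024 * 1024):
    """Reproduce an S3 multipart-upload ETag for a local file.

    The result is md5(md5(part1) + md5(part2) + ...) + "-<part count>".
    part_size must match whatever the uploading client actually used.
    """
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).digest())
    if len(digests) == 1:
        # Single-part uploads get a plain MD5 ETag, no suffix.
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()
    return hashlib.md5(b"".join(digests)).hexdigest() + f"-{len(digests)}"
```

If the computed value matches the stored ETag for the part size you used, you have a fixity check for multipart objects without re-downloading anything beyond the object itself at upload time.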
Yes, I'm still looking for ways to get a checksum re-generated without having to run a compute instance and an EBS holding disk.

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: Mike Davis
Date: Thursday, November 16, 2017 at 11:19 AM
To: Julian Morley
Cc: David Pcolar, "pasig-discuss at mail.asis.org"
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi All,

I'll clarify the AWS position on ETags and fixity checking, but I'm hesitant to hijack this thread into a "how S3 and Glacier work" thread, so feel free to follow up with me at mikdav at amazon.com for any sort of deeper dialogue. I'll work with Art and Gail on possibly setting up a webinar for the group.

To answer Julian: our S3 ETags are fixed at time of upload. They are never modified, and never reflect any background tasks associated with fixity checking, audit runs, media migration, or healing processes that occur over time. I do recognize the desire of the preservation community (which has a long history of bit-rot-related data loss) for more transparency on those housekeeping functions. But rest assured, AWS is fully committed to supporting "100-year archive" type workloads. S3 and Glacier's 11 nines of durability means a single object will be protected across 3 facilities, against concurrent media/host/power/network failures, across flood plains, discrete power grids, and discrete networking providers.

We ask you to consider the histogram of things that cause data loss: for losses related to infrastructure, we reduce the odds of data loss to near zero, leaving access control, application bugs, malicious insiders, catalog loss, and workflow bugs as far higher risks to data integrity. So storage fixity checks are no substitute for end-to-end data validation checks, audit mechanisms, and DLP (see AWS Macie).
The ETag is not the best solution for fixity checks, given its non-deterministic nature (it's not necessarily an MD5) and the fact that it doesn't reflect internal integrity housekeeping. To store a hash with the object, one approach would be to append it as an S3 key-value tag (along with other shadow metadata that could use 11 nines of protection). Although we wouldn't see the ROI of running your own integrity crawl every N months, you could certainly do it... Spot compute, Glacier bulk restore, and $0 in-region transfers *should* make this a relatively cheap operation.

-Mike

On Tue, Nov 14, 2017 at 7:34 PM, Julian M. Morley wrote:

Hi David,

We haven't done it yet, but we plan to do something similar to what you describe below when we start storing SDR content in cloud providers. One open question (that I intend to ask Mike Davis about!) is how S3 stores and/or updates ETag fields for uploaded objects.

I think that when content is recovered from Glacier to S3-IA, the ETag/MD5 of the file is computed when the file is written to S3-IA. This means that a "good enough" fixity check can be done simply by recalling the data to S3-IA (relatively cheap! No data egress charges!) and performing a simple metadata check of the recovered object. Costs are easily projected/constrained simply by deciding what % of your total data corpus you want "in flight" at any one time.

This requires us to store checksums for all objects that we send to the cloud in a separate datastore - we'll be using something called the Trusted Checksum Repository for this - a WORM-style database that stands to the side of the SDR.

I'm also assuming (again, this is a question for Mike) that S3 and other cloud providers do perform periodic scrubs of their data, and use erasure coding to correct any bad blocks that they find. For example, Wasabi explicitly states that they validate MD5 checksums of content every 90 days.
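Mike's suggestion earlier in the thread - storing a hash alongside the object as an S3 key-value tag - could be sketched as below. S3 object tags travel as a URL-encoded query string on PutObject; boto3 is an assumed client library here, and the bucket/key names are placeholders:

```python
import hashlib
import urllib.parse


def md5_hex(data: bytes) -> str:
    """Hex MD5 of an object's bytes, computed before upload."""
    return hashlib.md5(data).hexdigest()


def tagging_for(data: bytes) -> str:
    """Build the Tagging string for PutObject (URL-encoded key=value)."""
    return urllib.parse.urlencode({"md5": md5_hex(data)})


# Hypothetical upload and read-back using the assumed boto3 client:
#
#   import boto3
#   s3 = boto3.client("s3")
#   body = open("object.bin", "rb").read()
#   s3.put_object(Bucket="my-archive", Key="object.bin",
#                 Body=body, Tagging=tagging_for(body))
#
#   # A later metadata-only check reads the tag back, no egress needed:
#   tags = s3.get_object_tagging(Bucket="my-archive", Key="object.bin")
#   stored = {t["Key"]: t["Value"] for t in tags["TagSet"]}["md5"]
```

Because the tag is supplied by the uploader, it survives whatever the ETag does or does not reflect; it is, of course, only as trustworthy as the hash computed on the client side.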
Presumably when they do that they'll update the ETag if it has changed, which again allows a metadata check to validate fixity. The same process of Glacier recovery to S3-IA works for Oracle Cloud Storage Archive to Oracle Cloud Storage - the recovered object has a freshly generated ETag, which can then be compared against the checksum stored when the file was first uploaded. No cloud compute instance needed. Oracle has also told me (although this is not official) that they write content to two tapes, and perform occasional CRC checks / migrations of content to ensure that data on the tapes hasn't gone bad, although that's not on a fixed schedule.

For GCP and other online/non-vault options, running a compute instance is probably still the best way to go. And if we want to unpack the object and checksum all the elements, it's pretty much the only game in town - we plan to do that for a random sampling of our content, adjusting our throughput by varying the recovery request rate and EBS disk sizes until we settle on an acceptable rate.

(There *is* a wrinkle here with ETags and multipart uploads that I'm not getting into. It's still possible to get and store a useful MD5, you just need to do a little bit of extra legwork to get there.)

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: Pasig-discuss on behalf of David Pcolar
Date: Tuesday, November 14, 2017 at 1:33 PM
To: "pasig-discuss at mail.asis.org"
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi All,

This is a great thread and it has surfaced a couple of new options for cloud storage. Thanks to everyone who has contributed.

I would like to address a common issue with the cloud storage vendors and I/O-retrieval costs. Since the likelihood of recovering a specific object is low, cloud storage is quite economical from a simple 'recover the object' standpoint.
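The metadata-only check described above - comparing a stored MD5 against the ETag of the recovered object - might look like the following sketch. The helper name is illustrative, and the boto3 HEAD call shown in comments is an assumption; the only S3-specific facts relied on are that the ETag header arrives wrapped in double quotes and that a multipart ETag (with a "-N" suffix) is not a plain MD5:

```python
def etag_matches(etag_header: str, stored_md5: str) -> bool:
    """Compare an S3 ETag response header against a stored MD5.

    Raises ValueError for multipart-style ETags, which cannot be
    compared directly to an MD5 of the whole object.
    """
    etag = etag_header.strip('"')
    if "-" in etag:
        raise ValueError("multipart ETag: not a plain MD5 of the object")
    return etag == stored_md5.lower()


# The header itself would come from a metadata-only request, e.g. with
# the assumed boto3 client (no data egress involved):
#
#   resp = s3.head_object(Bucket="my-archive", Key="object.bin")
#   ok = etag_matches(resp["ETag"], stored_md5)
```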
However, preservation repositories are touching those objects frequently to perform fixity checks. I am not aware of any cloud platform that will do fixity audits on demand, as detailed in NDSA Preservation Level 3 (check fixity of content at fixed intervals; maintain logs of fixity info; supply audit on demand).

A common method for providing these checks for content in S3 is to instantiate an EC2 instance, mount the S3 bucket, and run checksums on the objects. For Glacier, an additional step of staging the objects in an area accessible to the EC2 instance is required. This results in I/O and compute-cycle fees that could dramatically inflate the cost of public cloud storage over time.

For those utilizing public cloud storage for preservation, how are you addressing fixity checks and event audit capture?

- Dave

David Pcolar
CTO, Digital Preservation Network
dave at dpn.org

On Nov 14, 2017, at 3:22 PM, Mike Davis wrote:

Hi Gail,

I appreciate the fact that public cloud pricing can be complex; it's a function of the cost-following strategy. If the vendor incurs a cost, whether from media, I/O, or networking, it's passed along as discrete charges to the customer. The alternative is opaquely bundling all the costs, which reduces transparency and the flexibility to follow commodity curves downward. I believe it's publicly available data that S3 has dropped capacity pricing at an average of roughly 10% per year since launch.

But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky I/O-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the cost of moving data should never be greater than 10% of the total, or the situation needs to be examined more closely.
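The checksum sweep described above can also be done by streaming objects instead of mounting the bucket on the instance, which avoids the holding disk. A sketch - the streaming helper is generic, and the boto3 get_object call in the comments is an assumption:

```python
import hashlib


def stream_md5(fileobj, chunk_size=1024 * 1024) -> str:
    """MD5 a file-like object chunk by chunk, never holding it all in memory."""
    h = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


# With the assumed boto3 client, the streaming body returned by
# get_object can be fed straight in (bucket/key are placeholders):
#
#   body = s3.get_object(Bucket="my-archive", Key="object.bin")["Body"]
#   ok = stream_md5(body) == stored_md5
```

Run in-region, this avoids egress charges, though GET requests and instance time are still billed.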
For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count. Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and on solving for long-term durability.

-Mike Davis (AWS Storage)

On Tue, Nov 14, 2017 at 9:05 AM, > wrote:

Thanks for chiming in, Jacob! As always, great additional information. I think it's worth emphasizing that an open and native data format, independent of where the data lives, is really what will enable multi-cloud workflow management - along with federated search of system and descriptive metadata across the namespace, no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software products, like Zenko, Starfish and Minio (and other software within some cloud services), can enable.

Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in costs greater than a private cloud.

Stuart - I just read a couple of Forrester papers on the Total Economic Impact (TEI) of public clouds -- the ones I have URLs for are posted below and make a useful read:

https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf
https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers

They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality), and I believe you're working with HPE, but the maths should be similar to show savings over public cloud. That said, putting one or more copies in a public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data.
[Even if it's for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access or derivatives of moving or still images, etc).] Spin up some quick compute services or Hadoop (for other use case). This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with! Gail Gail Truman Truman Technologies, LLC Certified Digital Archives Specialist, Society of American Archivists Protecting the world's digital heritage for future generations www.trumantechnologies.com facebook/TrumanTechnologies https://www.linkedin.com/in/gtruman +1 510 502 6497 -------- Original Message -------- Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers? From: Jacob Farmer > Date: Tue, November 14, 2017 7:03 am To: "Lewis, Stuart" >, "Julian M. Morley" >, gail at trumantechnologies.com, pasig-discuss at mail.asis.org Hi, Stuart. I thought I would weigh in on your plans. I?m a data storage consultant with about 30 years in the game. I?m also the founder of Starfish Storage which makes software products for virtualizing file systems and object stores. ? I think you are correct to manage multiple copies in the application layer. This gives you maximum control, ability to shift vendors, etc. Storage should always be thought of a stack that starts with the application and ends in the storage media. There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality. The higher in the stack, the greater your application awareness. ? By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty. As such, it does not matter so much which object store you buy. You could simply chase price. ? Oracle ? They are mysterious about what they are doing under the hood, but it does not matter. 
It's a "cloud". They are so inexpensive; use them as your second or third copy. I know that Oracle people monitor the news group - maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond.

- There is a particular vendor you did not list who might be interesting for you. They are called Caringo. They have been in the object storage business since before S3 came to market, and thus they offer alternative addressing to the S3 bucket paradigm. They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service; it is not necessarily the logical way to store objects for a digital library. If you are going to address objects directly from your application, they might have some unique value. I am happy to connect you to executives in the company.

- The other vendor worth looking at is Minio.IO. I just pointed Julian to them the other day. They provide an object interface in storage and could federate different cloud stores together. You might consider them for one of your copies; I still like the idea of doing your replication in the application. They are similar in concept to Zenko, which Gail recommended earlier.

- POSIX file system gateway - my software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality. We can take the contents of the object store and present it as a POSIX file system.

  o We map files 1:1 to objects. Most file system gateways on the market break up files into smaller objects, akin to blocks.
  o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4.
  o We also work in-band or side-band to the object store. That means that you can use our POSIX interface simultaneously with S3.
- You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution. We would be especially useful if you need to migrate files from your tape file system.

  o Starfish is a metadata and rules engine for file systems and object stores. Too many concepts to put in an email!

I hope that helps. Message me offline if you want to discuss. I'm at the Supercomputing conference this week, so replies will be a bit slow.

Jacob Farmer | Chief Technology Officer | Cambridge Computer | "Artists In Data Storage"
Phone 781-250-3210 | jfarmer at CambridgeComputer.com | www.CambridgeComputer.com

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart
Sent: Tuesday, November 14, 2017 4:26 AM
To: 'Julian M. Morley'; gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, Gail, all,

At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive. The approach historically taken here has been to use general-purpose SANs, with a second copy placed on offline tape. The SANs have never been built to scale (so they fill and we buy another), and they are general-purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between.

We're now wanting to move to three copies, two online and one offline (in the cloud if possible). For the online copies we're about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow. I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP. (** ESS rather than CleverSafe, as I think that is predicated on three data centres, but we only want two.)
We're also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape. Due to the lack of transparency about what is actually going on behind the scenes in a cloud environment, we don't know whether this gives us the same offline protection that tape gives us (e.g. being much harder to corrupt or accidentally delete).

We're also purposefully not going to use the object storage system's in-built cloud connectors for replication. We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor's system manage all three copies at once.

Critique of this plan is most welcome! Also happy to join in any offline discussion about this.

Best wishes,

Stuart Lewis
Head of Digital
National Library of Scotland
George IV Bridge, Edinburgh EH1 1EW
Tel: +44 (0) 131 623 3704
Email: stuart.lewis at nls.uk
Website: www.nls.uk
Twitter: @stuartlewis

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley
Sent: 14 November 2017 04:28
To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Gail,

Sure - would be happy to chat with you. I've got Scality on my list of contenders - I didn't mention it here because my first few use cases are explicitly "not on campus", but I agree it's definitely a fit for our main on-prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us.

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: "gail at trumantechnologies.com"
Date: Monday, November 13, 2017 at 4:06 PM
To: Julian Morley, "pasig-discuss at mail.asis.org"
Cc: "gail at trumantechnologies.com"
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?
Hi Julian, thanks for sharing your list and comments - a very thorough list. I'd love to chat (and I'm close by in Oakland). I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3 but write to non-S3-based public cloud vendors, tier or move data among private and public clouds, and do federated search on metadata across a single namespace (across these clouds).

Check out a couple of interesting technologies: the open-source Zenko.io, offering S3 connect to AWS, Azure and Google (the latter 2 are coming shortly), and also Scality Connect for Azure Blob Storage, which translates S3 API calls to Azure blob storage API calls. See the attached datasheet and also https://www.zenko.io/

I'd add Scality to your list -- see the Gartner Magic Quadrant: they're shown in the upper-right Visionary quadrant and are close to you in San Francisco. They talk S3, file, NFS/SMB, REST (CDMI, etc.), can tier off to public clouds, and have lots of multi-PB-size customer installs. The Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb

I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias?

Gail

Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497

-------- Original Message --------
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
From: "Julian M. Morley"
Date: Mon, November 13, 2017 12:43 pm
To: "pasig-discuss at mail.asis.org"

Hi everyone,

I've currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From neil.jefferies at bodleian.ox.ac.uk Fri Nov 17 08:46:11 2017
From: neil.jefferies at bodleian.ox.ac.uk (Neil Jefferies)
Date: Fri, 17 Nov 2017 13:46:11 +0000
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
In-Reply-To:
References: <20171114100507.b554e26909f2beaf9f8ddbf6be9a6600.409ccf0e0f.wbe@email09.godaddy.com> <0E4FA879-5F8D-4995-A153-2130F28A25DC@dpn.org> <48E9420A4871584593FC3D435EF345AAEEF6FB09@MBX10.ad.oak.ox.ac.uk>
Message-ID: <48E9420A4871584593FC3D435EF345AAEEF750EE@MBX10.ad.oak.ox.ac.uk>

That's nasty - I see your viewpoint, though. At the higher level of abstraction at which you deal with "storage" that makes sense: since there seem to be unchecked and unauditable processes happening under the hood, that may be the only good tool you have.

From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Kyle Rimkus
Sent: 16 November 2017 01:55
To: Neil Jefferies
Cc: pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

I see your point. We had an instance of a number of files failing their regular fixity check, after having passed several times over the years. When we investigated, the files were the same size on disk as they had always been, but were just strings of zeroes, and unreadable. Whatever caused this could have been a transmission error behind the scenes during some sort of server maintenance, but from where we stood as a repository that had outsourced its storage, we had no way of knowing one way or the other, and the fixity checks saved us.

On Wed, Nov 15, 2017 at 6:15 PM Neil Jefferies wrote:

Kyle,

Can you expand... because that doesn't actually sound like a "fixity" issue (which is kind of my point). Either the wrong thing has been copied or the wrong thing has been stored, i.e. an off-storage process failure. For those cases checks should really be done at the time of whatever operation caused the issue. Waiting for a check to pick it up downstream at some later time is somewhat risky. My point is, these are not "fixity" checks, they are transmission checks.
Neil

On 2017-11-15 22:59, Kyle Rimkus wrote:
> On the issue of fixity checking, I agree there is a great deal of
> misplaced paranoia around "bit-flipping" in contemporary storage
> technologies, but at the same time I've found regular fixity checks to
> be very useful in protecting against various types of human error that
> can be made in managing the storage itself or in scripted processes
> that interact with stored data.
>
> As an example, at my university we farm our storage out to a campus
> unit which stores two copies locally while we push a third into Amazon
> Glacier. Our preservation repository software's regular fixity checks
> of on-campus data have helped us keep our storage providers and
> ourselves honest. We have on more than one occasion discovered fixity
> errors that pointed to questionable server management. Regular fixity
> checking was what flagged us to these errors, and the storage of a
> third file copy off-site was what saved us.
>
> We are also looking into pushing more of our storage services into the
> (most likely AWS) cloud, where greater guarantees are made against
> this type of problem. Maybe in time we'll come to see regular fixity
> checks as less critically important than we do now. Julian's comment
> that "certifying a particular storage environment to either store bits
> correctly or alert on failure should be acceptable" is interesting.
> I'm sure we'd all like to get away from having to run constant fixity
> checks in our repositories, and would like to see digital preservation
> management architecture evolve in this direction. For now though I'd
> wager that most of us are constrained by the fact that fixity checking
> remains essential to making sure our storage is doing what it claims
> to do.
>
> Kyle
>
> --
>
> Kyle R.
Rimkus > > Assistant Professor > > Preservation Librarian > > University of Illinois at Urbana-Champaign > ---- > To subscribe, unsubscribe, or modify your subscription, please visit > http://mail.asis.org/mailman/listinfo/pasig-discuss > _______ > PASIG Webinars and conference material is at > http://www.preservationandarchivingsig.org/index.html > _______________________________________________ > Pasig-discuss mailing list > Pasig-discuss at mail.asis.org > http://mail.asis.org/mailman/listinfo/pasig-discuss -- Kyle R. Rimkus Assistant Professor Preservation Librarian University of Illinois at Urbana-Champaign -------------- next part -------------- An HTML attachment was scrubbed... URL: From evviva.weinraub at northwestern.edu Fri Nov 17 15:41:04 2017 From: evviva.weinraub at northwestern.edu (Evviva Weinraub) Date: Fri, 17 Nov 2017 20:41:04 +0000 Subject: [Pasig-discuss] Call for Papers - Open Repositories 2018: Sustaining Open Message-ID: The 13th International Conference on Open Repositories, OR2018, will be held on June 4th-7th, 2018 in Bozeman, Montana, USA. Open Repositories 2018 is now calling for proposals around the theme of Sustaining Open. http://www.or2018.net/call-for-papers/ Research and Cultural Heritage communities have embraced the idea of Open: open communities, open source software, open data, scholarly communications, and open access publications and collections. These projects and communities require different modes of thinking and resourcing than purchasing vended products. While open may be the way forward, mitigating fatigue, finding sustainable funding, and building flexible digital repository platforms are things most of us are striving for. Submissions this year should focus on the how, why, and what it will take to make open sustainable.
While not limited to the below topics, we're focusing our attention on issues around the sustainability of: * Open source software - sustainability of software developed locally and large open source systems, legacy code * Community - reaching out to new audiences, developing a community, governance * Content - research data, digital preservation, persistent URLs, archiving * Teams/People - staff and knowledge within the community, contingency planning, training and development, and succession planning * Projects - sustainability of projects beyond the grant, maturing communities * Infrastructure/Integrations - integrations between systems, changing technical environments * Policy - national, international, local and community policy and decisions * Challenges of sustainability - funding, local, technical, community * Rights and Copyright - including Data Protection, sharing and storing of content * Reuse, standards, and reproducibility - for example: software, data, content types * New open technologies and standards Submission Process Accepted proposals in all categories will be made available through the conference's web site, and later they and associated materials will be made available in an open repository. Some conference sessions may be live streamed or recorded, then made publicly available. Interest Groups This year there are no separate interest groups for the different repository systems; instead, if your 24x7 or presentation submission is related to a specific repository system, please indicate so in your proposal. Presentations Presentation proposals are expected to be two to four pages (see below for submission templates). Successful submissions in past years have typically described work relevant to a wide audience and applicable beyond a single software system. Presentations are 30 minutes long including questions. Panels Panel proposals are expected to be two to four pages (see below for submission templates).
Successful submissions in past years have typically described work relevant to a wide audience and applicable beyond a single software system. All panels are expected to include at least some degree of diversity in viewpoints and personal background of the panelists. Panel sessions are expected to include a short presentation from each panel member followed by a discussion. Panels may take an entire session or may be combined with another submission. Panels can be 45 or 90 minutes long. Discussion Question and Answer Discussion Q&A proposals are expected to be two to four pages (see below for submission templates). This is your opportunity to suggest members of the community to join in a Q&A discussion on various proposed topics. This is meant to be a deep-dive into why a decision was made, how projects got started, where an idea came from, or anything else that you want to know more about. Imagine this as a 45-90 minute grilling at a cocktail party but on a stage in front of your peers. Q&As may take an entire session or may be combined with another submission. This session will not be video recorded. Discussion Q&A can be 45 or 90 minutes long. 24x7 Presentations 24x7 presentations are 7-minute presentations comprising no more than 24 slides. Successful 24x7 presentations have a clear focus on one or a few ideas and a narrower focus than a 25-minute presentation. Similar to Pecha Kuchas or Lightning Talks, these 24x7 presentations will be grouped into blocks based on conference themes, with each block followed by a moderated question and answer session involving the audience and all block presenters. This format will provide conference goers with a fast-paced survey of like work across many institutions. Proposals for 24x7 presentations should be one to two pages (see below for submission templates). 24x7 presentations are 7 minutes long. Posters We invite one-page proposals for posters that showcase current work (see below for submission templates).
OR2018 will feature physical posters only. Posters will be on display throughout the conference. Instructions for preparing the posters will be distributed to authors of accepted poster proposals prior to the conference. Poster submitters will be expected to give a one-minute teaser to encourage visitors to their poster during the conference. Poster presentations will be 1 minute. Developer Track: Top Tips, Cunning Code and Imaginative Innovation Each year a significant proportion of the delegates at Open Repositories are software developers who work on repository software or related services. OR2018 will feature a Developer Track that will provide a focus for showcasing work and exchanging ideas. Building on the success of previous Developer Tracks, where we encouraged live hacking and audience participation, we invite members of the technical community to share the features, systems, tools and best practices that are important to you (see below for submission templates). The 15-minute presentations can be as informal as you like, but we encourage live demonstrations, tours of code repositories, examples of cool features, and the unique viewpoints that so many members of our community possess. Proposals should be one to two pages, including a title, a brief outline of what will be shared with the community, and technologies covered. Developers are also encouraged to contribute to the other tracks. Developer Track presentations are 15 minutes including questions. Ideas Challenge OR2018 will also again include the popular Ideas Challenge. Taking part in this competition provides an opportunity to take an active role in repository innovation, in collaboration with your peers and in pursuit of prizes. The Ideas Challenge is open to all conference attendees. Further details and guidance on the Ideas Challenge will be forthcoming closer to the conference. Workshops and tutorials The first day of Open Repositories will be dedicated to workshops and tutorials.
One- to two-page proposals addressing theoretical or practical issues around digital repositories are welcomed. See below for Proposal Templates; please address the following in your proposal: * The subject of the event and what knowledge you intend to convey * Length of session (90 minutes, 3 hours or a whole day) * A brief statement on the learning outcomes from the session * The target audience for your session and how many attendees you plan to accommodate * Technology and facility requirements * Any other supplies or support required * Anything else you believe is pertinent to carrying out the session Please note, the program committee may consider submissions for other tracks and formats, as appropriate. Submission System The submission system will be available at the start of December, when a link will be added to this page. Review Process All submissions will be peer reviewed and evaluated according to the criteria outlined in the call for proposals, including quality of content, significance, originality, and thematic fit. Code of Conduct The OR2018 Code of Conduct and Anti-Harassment Policy are available at http://or2018.net/code-of-conduct/. Scholarship Programme OR2018 will again run a Scholarship Programme, which will enable us to provide support for a small number of full registered places (including the poster reception and conference dinner) for the conference in Bozeman. The programme is open to librarians, repository managers, developers and researchers in digital libraries and related fields. Applicants submitting a proposal for the conference will be given priority consideration for funding. Please note that the programme does not cover costs such as accommodation, travel and subsistence. It is anticipated that the applicant's home institution will provide financial support to supplement the OR Scholarship Award. Full details and an application form will shortly be available on the conference website.
Key Dates * 5 January 2018: Deadline for submissions * 5 January 2018: Deadline for Scholarship Programme applications * 9 February 2018: Submitters notified of acceptance to Workshops * 12 February 2018: Registration opens * 21 February 2018: Submitters notified of acceptance to other tracks * 21 February 2018: Scholarship Programme winners notified * 23 February 2018: Submitters notified of acceptance of 24x7, posters, and developer track * 20 April 2018: All presenters are encouraged to register by the close of Early Bird * 25 May 2018: Presenter registration deadline * 4-7 June 2018: OR2018 conference Program Co-Chairs Claire Knowles and Evviva Weinraub ~~~~~~~~~~~~~~~ Evviva Weinraub Associate University Librarian for Digital Strategies Northwestern University Library Northwestern University www.library.northwestern.edu evviva.weinraub at northwestern.edu Phone: 847.467.6178 -------------- next part -------------- An HTML attachment was scrubbed... URL: From awoods at duraspace.org Sun Nov 19 20:43:48 2017 From: awoods at duraspace.org (Andrew Woods) Date: Sun, 19 Nov 2017 20:43:48 -0500 Subject: [Pasig-discuss] Poll: Oxford Common Filesystem Layout In-Reply-To: References: Message-ID: Hello All, From conversations both on and off list it is clear that a community-based recommendation describing the preservation-centric layout for repository resources on the filesystem (or cloud storage) would be valuable as a baseline for repository persistence, shared tooling, preservation workflows, etc. If you would be interested in participating in the inaugural discussion of the effort being termed the "Oxford Common Filesystem Layout" towards defining such a recommendation, please indicate your availability on the following poll: https://doodle.com/poll/txhg7hkt6mvbwnyn The motivating document from the team at Oxford is attached.
Additional preparatory material for this meeting is the MOAB model from Stanford: http://journal.code4lib.org/articles/8482 Regards, Andrew Woods p.s. I will be closing this poll Friday, November 24. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Oxford Common Filesystem Layout.pdf Type: application/pdf Size: 75910 bytes Desc: not available URL: From Mathieu.Giannecchini at ymagis.com Tue Nov 21 15:39:03 2017 From: Mathieu.Giannecchini at ymagis.com (GIANNECCHINI Mathieu) Date: Tue, 21 Nov 2017 20:39:03 +0000 Subject: [Pasig-discuss] ClairMeta : opensource DCP / DCDM probe and check tool Message-ID: <83A0511B-C14A-449D-8C44-8240A580F02A@ymagis.com> Dear Pasig community, As announced during last Pasig in Oxford, I'm very pleased to announce that Eclair has just released its DCP and DCDM check and probe tool, "ClairMeta", under the MIT licence. Source code is available on GitHub: https://github.com/Ymagis/ClairMeta A PyPI package is also available (beta): https://pypi.python.org/pypi/clairmeta Feel free to try it, use it and contribute! Cheers Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From awoods at duraspace.org Fri Nov 24 22:50:34 2017 From: awoods at duraspace.org (Andrew Woods) Date: Fri, 24 Nov 2017 22:50:34 -0500 Subject: [Pasig-discuss] Poll: Oxford Common Filesystem Layout In-Reply-To: References: Message-ID: Hello All, The inaugural discussion on the "Oxford Common Filesystem Layout" will take place on: Friday, Dec 1st @4:00pm UTC (11am ET) The agenda and call-in information will be in the following Google Doc: https://docs.google.com/document/d/1ATFC0YdtpRWsHm0r5GUTY9JY5dzDwJCNvs1RcDuBayE/edit?usp=sharing The "Oxford Common Filesystem Layout" initiative is motivated by the need for a preservation-centric, common approach to filesystem (or cloud) layout for institutional repositories.
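As background to this discussion, the MOAB model cited in the preparatory reading stores each repository object as a sequence of version directories, each carrying its content alongside a digest manifest. A toy sketch of that general idea follows; the directory names and the JSON manifest format here are illustrative only, not MOAB's actual layout or anything the OCFL effort has decided:

```python
import hashlib
import json
import os


def write_version(object_root, files, version):
    """Toy sketch of a versioned object layout: each version directory holds
    its files under content/ plus a manifest of their SHA-256 digests.

    `files` maps file names to their bytes; returns the version directory path.
    """
    vdir = os.path.join(object_root, "v%04d" % version)
    os.makedirs(os.path.join(vdir, "content"))
    manifest = {}
    for name, data in files.items():
        with open(os.path.join(vdir, "content", name), "wb") as f:
            f.write(data)
        manifest[name] = hashlib.sha256(data).hexdigest()
    with open(os.path.join(vdir, "manifest.json"), "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return vdir
```

The appeal of a layout like this as a preservation baseline is that every version of an object, and the digests needed to audit it, remain recoverable from the filesystem alone, with no repository software or database required.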
The goal of this effort is to establish or identify recommendations for how IR systems should structure and store files. One of the objectives of the call will be to highlight relevant prior art, driving use cases, and active initiatives. There are five 5-minute slots in the agenda for attendees to fill in their names to ensure they have time to discuss related work. Please add your name to the agenda by close of business on Wed, Nov 29th if you would be willing to share work related to this effort. Also, please add links to relevant reading in the "Related Reading" section of the agenda. Regards, Andrew Woods p.s. Subsequent communication will take place on the pasig-discuss at mail.asis.org list. On Sun, Nov 19, 2017 at 8:43 PM, Andrew Woods wrote: > Hello All, > From conversations both on and off list it is clear that a community-based > recommendation describing the preservation-centric layout for repository > resources on the filesystem (or cloud storage) would be valuable as a > baseline for repository persistence, shared tooling, preservation > workflows, etc. > > If you would be interested in participating in the inaugural discussion of > the effort being termed the "Oxford Common Filesystem Layout" towards defining > such a recommendation, please indicate your availability on the > following poll: > https://doodle.com/poll/txhg7hkt6mvbwnyn > > The motivating document from the team at Oxford is attached. > > Additional preparatory material for this meeting is the MOAB model from > Stanford: > http://journal.code4lib.org/articles/8482 > > Regards, > Andrew Woods > p.s. I will be closing this poll Friday, November 24. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zvowell at calpoly.edu Tue Nov 28 18:22:24 2017 From: zvowell at calpoly.edu (Zach Vowell) Date: Tue, 28 Nov 2017 23:22:24 +0000 Subject: [Pasig-discuss] Call for Software Preservation Project Proposals // DUE Jan.
12 Message-ID: OVERVIEW The Fostering a Community of Practice: Software Preservation and Emulation Experts in Libraries and Archives (FCoP) [IMLS grant RE-95-17-0058-17] project is calling for project proposals that empower librarians, archivists, and curators to address the key challenges to providing long-term access to software-dependent cultural heritage. Projects undertaken by the selected proposals will advance digital preservation practice by building field-wide understanding of software preservation in a variety of organizational contexts. The activities and documentation produced by FCoP cohort members will complement parallel efforts to bring software preservation and access into mainstream digital preservation practice (addressing specific legal, metadata and technical preservation and access challenges). APPLICATION TIMELINE From November 28, 2017 to January 12th, 2018, the FCoP Project Staff for the Fostering Communities of Practice project will be accepting applications for projects during the Summer 2018-Summer 2019 term. PROJECT DESCRIPTION Project proposals should be initiated by cultural heritage organizations that are currently working to preserve and provide access to digital content. The FCoP Project Team is particularly interested in project proposals from Historically Black Colleges & Universities Library Alliance (HBCU) and Association for Specialized and Cooperative Library Agencies (ASCLA) member organizations. However, project proposals from all other cultural heritage institutions are welcome.
Participation in the FCoP cohort includes: * $5,000 financial award to be used for travel and registration costs for conferences and workshops where cohort members will present, facilitate discussion and actively solicit interest from fellow librarians, archivists, and museum conservators and curators * Community fellowship, sharing and information exchange with members of the FCoP cohort * Access to and technical support for a web-based emulation sandbox, which requires no local installation * Formal support for problem-based learning and research on the challenges to implementing software preservation and emulation in their local organization * Access to and support for communication tools for the duration of the project in order to encourage the cohort to communicate with one another outside of structured or facilitated interaction * Support from a caring and deeply invested FCoP Project Staff who want to ensure that these projects are meaningful for participating individuals and organizations * Crucial contribution to broader national and international software preservation efforts and access strategies, including the Software Preservation Network. All project proposals must include: * 2 Letters of Commitment * Statement of Interest * Identification of Project Team * Resumes/CVs from the Applicant Project Team * Project Summary MORE INFORMATION: For more details and application instructions, check out the project website: http://www.softwarepreservationnetwork.org/fcop/ Think you might want to submit a software preservation and access project proposal but have questions? We encourage your participation in the FCoP Software Preservation Open Forum!
* WHAT: FCoP Software Preservation Open Forum will include: * Brief overview of the FCoP project * 15-minute presentation on Emulation as a Service from Klaus Rechert (University of Freiburg, OpenSLX) * 15-minute presentation on software preservation workflows from Tim Walsh (Canadian Centre for Architecture) * Open question and answer session for attendees * WHEN: FCoP Software Preservation Open Forum will be held: * Tuesday, December 12th, 2017 * 9am PT/11am CT/12pm ET * Sign up for a calendar reminder here: https://goo.gl/4KTkRC * WHERE: FCoP Software Preservation Open Forum will be hosted in Zoom, see call-in instructions below: * Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/946742570 * Or Telephone: US: +1 646 558 8656 or +1 669 900 6833 * Meeting ID: 946 742 570 -------------- next part -------------- An HTML attachment was scrubbed... URL: From awoods at duraspace.org Wed Nov 29 18:55:45 2017 From: awoods at duraspace.org (Andrew Woods) Date: Wed, 29 Nov 2017 18:55:45 -0500 Subject: [Pasig-discuss] Agenda: Oxford Common Filesystem Layout Message-ID: Hello All, The agenda and call-in information for this Friday's call (2017-12-01) is available: https://docs.google.com/document/d/1ATFC0YdtpRWsHm0r5GUTY9JY5dzDwJCNvs1RcDuBayE/edit?usp=sharing Regards, Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From artpasquinelli at stanford.edu Thu Nov 30 14:05:11 2017 From: artpasquinelli at stanford.edu (Arthur Pasquinelli) Date: Thu, 30 Nov 2017 19:05:11 +0000 Subject: [Pasig-discuss] Inaugural LOCKSS Quarterly Newsletter - November 2017 Message-ID: There are some events and documents that might be of interest to the broader PASIG and preservation community, so I am sharing this inaugural newsletter beyond the LOCKSS user groups. Subject: LOCKSS Quarterly Newsletter - November 22, 2017 Welcome to the inaugural LOCKSS Quarterly Newsletter.
It is our goal to share developments, activities, and useful information about LOCKSS and preservation, upcoming events, and new content. This is a vehicle to publicize and exchange LOCKSS community news, so please actively participate and share! The Quarterly Newsletter will be kept on the www.lockss.org webpage. If you have any questions or ideas for the next newsletter, just email or call Art Pasquinelli at artpasquinelli at stanford.edu, 650-430-2441. We have seen growing momentum for LOCKSS in 2017. This includes the Mellon Grant to re-architect LOCKSS into a web services framework, new LOCKSS networks, a closer positioning of the LOCKSS program within the Stanford IT infrastructure, a number of speaking engagements, and a big uptick in partner discussions. LOCKSS has been much more actively involved in other associations and communities such as NDSA, PASIG, CNI, and the Digital Preservation Coalition (DPC). Lastly, we have instituted more communication mechanisms within the LOCKSS community - including this newsletter - and we will be revamping the LOCKSS website in the next few months. Many thanks to everyone who has been contributing on the monthly calls so far. As we go forward, please feel free to offer comments, advice, and content. I. Community News A. LOCKSS Transitions: Founders of LOCKSS retire, new leadership and grant continue innovative nature of the preservation network. The Andrew W. Mellon Foundation awards LOCKSS just over $1.2 million to upgrade its architecture; welcome news to founders Dr. David S.H. Rosenthal and Victoria Reich, who will phase out of the organization by early next year. See the Stanford article on Vicky and David retiring at http://hosted-p0.vresp.com/260487/6e9b644305/ARCHIVE B. Stanford U. LOCKSS Program to Mainstream Distributed Digital Preservation through New Project, September 7, 2017 https://library.stanford.edu/node/130509 C.
An open Google Docs folder now holds past presentation and monthly LOCKSS Zoom call content. You can see recent October-November monthly call slide decks on the LOCKSS Overview, ADPN, SAFE, and the November 15 LOCKSS presentation at Dodging the Memory Hole (DTMH) in San Francisco: https://drive.google.com/drive/folders/1ZToNMm2aLp-A_9e_fd4UVtBoka5aCo3N D. Welcome Perma.cc as a new LOCKSS Network! Perma.cc is based at Harvard and can be seen at https://perma.cc/. Perma.cc is developed and maintained by the Harvard Law School Library in conjunction with university law libraries across the country and other organizations in the "forever" business. In a sample of several legal journals, approximately 70% of all links in citations published between 1999 and 2011 no longer point to the same material. Broken links in journal articles undermine the citation-based system of legal scholarship by obscuring the evidence underlying authors' ideas. When a user creates a Perma.cc link, Perma.cc archives the referenced content and generates a link to an archived record of the page. Regardless of what may happen to the original source, the archived record will always be available through the Perma.cc link. II. Upcoming LOCKSS-related Events, Activities, and Trainings A. Save the Date for the Next LOCKSS Annual Meeting! The next annual LOCKSS meeting at Stanford is scheduled for March 29-30, 2018. The LDCX developer conference will be the Monday-Wednesday before. Details on both these events will be coming in the next several weeks. B. The first LOCKSS Quarterly Technical Zoom Call will be at 9am PT on December 6, 2017. Thib Guicherd-Callin will do a technology review of the LOCKSS web services development that he and his team are working on. He will then have an open Q&A.
https://stanford.zoom.us/j/3122819697 Meeting ID: 312 281 9697 Dial: +1 650 724 9799 (US, Canada, Caribbean Toll) or +1 833 302 1536 (US, Canada, Caribbean Toll Free) International numbers available: https://stanford.zoom.us/zoomconference?m=HnWNYPQOZx83yC_s4ngFs-84pKE29zfr C. The next LOCKSS Monthly Zoom Call will be December 13, 2017 at 9am PT. James Jacobs, Stanford's Government Information Librarian, will give an update on the Digital Federal Depository Library LOCKSS network. James will also review trends he sees in federal government documents and data preservation and his impressions of the recent Dodging the Memory Hole conference. Additionally, Kris Kazianovitz, the Stanford Government Information Librarian for International, State and Local Documents, will provide her insights on trends following the recent Best Practices Exchange (BPE) Conference in Boston (https://bpexchange.wordpress.com). https://stanford.zoom.us/j/3122819697 Meeting ID: 312 281 9697 Dial: +1 650 724 9799 (US, Canada, Caribbean Toll) or +1 833 302 1536 (US, Canada, Caribbean Toll Free) International numbers available: https://stanford.zoom.us/zoomconference?m=HnWNYPQOZx83yC_s4ngFs-84pKE29zfr D. Events Planning Calendar for LOCKSS: - LOCKSS Annual Meeting, March 29-30, 2018, Stanford (preceded by LDCX) - Open Repositories 2018, June 4-7, 2018, Bozeman, Montana - iPres, September 24-27, 2018, Boston - DLF/NDSA, October 15-18, Las Vegas - PASIG, February 14-16, 2019, Mexico City ***Note: Please inform us of your LOCKSS-related events for this quarterly calendar and to publicize on the monthly Zoom calls and website! III. Useful Content and Events A. PASIG: The Preservation and Archiving Special Interest Group had its annual meeting in September. The presentations are presently at https://pasigoxford.figshare.com/.
The Twitter notes from the meeting are at https://docs.google.com/document/d/1KbenZQTNZ_KUAYdrlH2bptk_z0lkqqjbTlSmbGfcQc4/edit#heading=h.ht36l8suiab6 The main PASIG website has all the past years' content and is at http://www.preservationandarchivingsig.org. B. Other key groups that create resources focused on Preservation include: 1) Digital Preservation Coalition - http://www.dpconline.org/ 2) National Digital Stewardship Alliance - http://ndsa.org/ 3) iPres - https://ipres-conference.org/ -- Art Pasquinelli LOCKSS Partnerships Manager Stanford University Libraries Cell: 1-650-430-2441 artpasquinelli at stanford.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From sschaefer at ucsd.edu Thu Nov 30 16:25:44 2017 From: sschaefer at ucsd.edu (Schaefer, Sibyl) Date: Thu, 30 Nov 2017 21:25:44 +0000 Subject: [Pasig-discuss] NDSA National Agenda Organizational Survey Message-ID: <2FA0CFED-22FB-43A1-A178-7BDDEEA23049@ucsd.edu> Dear colleagues, The NDSA National Agenda Working Group needs your perspective on the human, economic, and organizational challenges that memory institutions that curate digital content face in carrying out their missions over the long term. We have developed a brief survey that will inform the writing of the 2018 National Agenda for Digital Stewardship: https://www.surveymonkey.com/r/NationalAgenda We ask that all NDSA member institutions participate -- it requires less than 5 minutes to complete. The survey will close Dec 30. Thank you, The NDSA National Agenda Working Group -------------- next part -------------- An HTML attachment was scrubbed... URL: