[Pasig-discuss] Experiences with S3-like object store service providers?

Arthur Pasquinelli artpasquinelli at stanford.edu
Tue Nov 14 16:00:15 EST 2017


Mike, I love to see you ex-Sun people joining in on PASIG discussions! I think it shows the Sun commitment to Education is still living on!

Since you offered, and since Cloud architectures (specifically AWS) are something we in the LOCKSS community have been analyzing, I would love to set up a webinar for the PASIG members. Gail and I can possibly work on it with you. I saw at the recent NDSA meeting that it was difficult to analyze and compare the vendors' offerings beyond the basic fixity feature. A group has actually been working on a matrix.

We can use my Zoom capability for a webinar. While I would also offer the opportunity to the other Cloud vendors, I think AWS should be the first webinar. You have a unique depth of expertise and I want to leverage it for the community. This is a conversation that has been percolating within the PASIG for a while without the direct involvement of the major Cloud players, other than Oracle periodically. So, I will work with you as much as you need to get this first webinar off the ground. Thanks!

--
Art Pasquinelli
LOCKSS Partnership Manager
Stanford University Libraries
Cell: 1-650-430-2441
artpasquinelli at stanford.edu

From: Pasig-discuss <pasig-discuss-bounces at asist.org> on behalf of Mike Davis <akropilot at gmail.com>
Date: Tuesday, November 14, 2017 at 12:22 PM
To: "gail at trumantechnologies.com" <gail at trumantechnologies.com>
Cc: "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Gail

I appreciate that public cloud pricing can be complex; that is a function of the cost-following strategy. If the vendor incurs a cost, whether from media, I/O, or networking, it is passed along to the customer as a discrete charge. The alternative is to bundle all the costs opaquely, which reduces transparency and the flexibility to follow commodity curves downward. For example, I believe it is publicly available data that S3 capacity pricing has dropped by an average of roughly 10% per year since launch.
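
To make the compounding concrete, here is a minimal sketch of what a steady decline of about 10% per year would imply. The starting price and the time horizon are made-up placeholders, not AWS figures, and a constant decline rate is an assumption rather than a forecast.

```python
def projected_price(start_price: float, annual_decline: float, years: int) -> float:
    """Unit price after `years` of compounding decline at `annual_decline` (0.10 = 10%)."""
    return start_price * (1.0 - annual_decline) ** years


def cumulative_cost(start_price: float, annual_decline: float, years: int) -> float:
    """Total paid to hold one unit of capacity for `years`, billed at each year's price."""
    return sum(projected_price(start_price, annual_decline, y) for y in range(years))


if __name__ == "__main__":
    START_PRICE = 100.0  # hypothetical price per unit of capacity in year 0
    DECLINE = 0.10       # the "10% (ish)" figure from the message above
    for year in (0, 1, 5, 10):
        print(year, round(projected_price(START_PRICE, DECLINE, year), 2))
    print("10-year total:", round(cumulative_cost(START_PRICE, DECLINE, 10), 2))
```

Under these assumptions the decline compounds, so the total for a long retention period is noticeably lower than ten times the first-year price.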

But the idea that transaction and I/O fees dramatically inflate the cost of public cloud storage is a myth, particularly for digital asset management and archival. It is certainly possible to design a wonky I/O-heavy workload, place it on the wrong storage tier, and end up with unexpectedly high costs. But for archival-oriented workloads, the costs of moving data should never exceed 10% of the total; if they do, the situation needs to be examined more closely. For example, we might find that large objects are being inadvertently chunked into millions of 16KB objects by some third-party gateway solution, which would inflate the transaction count.
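
The chunking effect described above can be sketched with simple arithmetic. This is an illustration only: the per-request price and the monthly storage charge are hypothetical placeholders, not any vendor's published rates, and the 10% threshold is the rule of thumb from the message.

```python
def request_count(total_bytes: int, object_bytes: int) -> int:
    """PUT requests needed when data is written as objects of `object_bytes`
    (ceiling division, since the last object may be shorter)."""
    return -(-total_bytes // object_bytes)


def request_cost(requests: int, price_per_1000: float) -> float:
    """Request charges for a price quoted per 1,000 requests."""
    return requests / 1000 * price_per_1000


def transfer_share(request_charges: float, other_charges: float) -> float:
    """Fraction of the total bill that comes from request charges."""
    total = request_charges + other_charges
    return request_charges / total if total else 0.0


if __name__ == "__main__":
    TIB = 2**40
    PRICE_PER_1000_PUTS = 0.005   # hypothetical
    OTHER_MONTHLY_CHARGES = 25.0  # hypothetical storage charge for the same 1 TiB

    for label, size in (("1 GiB objects", 2**30), ("16 KiB chunks", 16 * 2**10)):
        n = request_count(TIB, size)
        share = transfer_share(request_cost(n, PRICE_PER_1000_PUTS), OTHER_MONTHLY_CHARGES)
        flag = "examine more closely" if share > 0.10 else "ok"
        print(f"{label}: {n:,} PUTs, request share {share:.1%} ({flag})")
```

The same terabyte needs 65,536 times as many requests when it is written as 16 KiB chunks instead of 1 GiB objects, which is why an unexpected request bill is a hint to check how a gateway is laying out objects.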

Happy to give you (and PASIG) a deeper dive on IaaS storage strategies, and on how to solve for long-term durability.

-Mike Davis (AWS Storage)


On Tue, Nov 14, 2017 at 9:05 AM, <gail at trumantechnologies.com> wrote:
Thanks for chiming in, Jacob!  As always, great additional information. I think it's worth emphasizing that having an open, native data format that is independent of where the data lives is really what will enable multi-cloud workflow management, along with federated search of system and descriptive metadata across the namespace no matter where the data is stored (including across public and on-prem cloud storage). These are what the newer cloud controller software, like Zenko, Starfish, and Minio (and other software within some cloud services), can enable.

Public cloud storage prices are racing to the bottom, but (as David Rosenthal and others have pointed out) the "hidden" costs of pulling the data back will often result in costs greater than those of a private cloud.

Stuart -
I just read a couple of Forrester papers on Total Economic Impact (TEI) of public clouds -- the ones I have URLs to are posted below and make a useful read:
https://www.emc.com/collateral/analyst-reports/dell-emc-ecs-forrester-tei.pdf
https://whitepapers.theregister.co.uk/paper/view/5835/the-total-economic-impact-of-scality-ring-with-dell-storage-servers
They talk about Dell hardware for building out on-prem clouds (ECS from EMC and RING from Scality), and I believe you're working with HPE, but the maths should be similar in showing savings over public cloud. That said, putting one or more copies in public cloud and managing them from one namespace would be ideal... I envision use cases where multi-cloud controller software will allow you to move data to the cloud service that fits the data. [Even for long-term archival, there are times when preservation data services will need to be run (format migration, integrity checks, creating access copies or derivatives of moving or still images, etc.).] You could also spin up some quick compute services or Hadoop (for other use cases).

This is a great topic - Julian and Stuart, all the best on your projects, please do let this alias know what you decide to go with!


Gail


Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists

Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman

+1 510 502 6497




-------- Original Message --------
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?
From: Jacob Farmer <jfarmer at cambridgecomputer.com>
Date: Tue, November 14, 2017 7:03 am
To: "Lewis, Stuart" <stuart.lewis at nls.uk>, "Julian M. Morley" <jmorley at stanford.edu>, gail at trumantechnologies.com, pasig-discuss at mail.asis.org

Hi, Stuart. I thought I would weigh in on your plans.  I’m a data storage consultant with about 30 years in the game. I’m also the founder of Starfish Storage which makes software products for virtualizing file systems and object stores.

·       I think you are correct to manage multiple copies in the application layer.  This gives you maximum control, ability to shift vendors, etc.  Storage should always be thought of as a stack that starts with the application and ends in the storage media.  There can be multiple layers and the system architect should pick the right layer of abstraction for any given set of functionality.  The higher in the stack, the greater your application awareness.

·       By handling replication and addressing in your application, you should be able to switch object stores over time without much difficulty.  As such, it does not matter so much which object store you buy.  You could simply chase price.

·       Oracle – They are mysterious about what they are doing under the hood, but it does not matter.  It’s a “cloud”.  They are so inexpensive.  Use them as your second or third copy.  I know that Oracle people monitor the news group.  Maybe one will offer to connect you to a product manager who can describe the infrastructure. If not, I would be happy to connect you to folks on the US side of the pond.

·       There is a particular vendor you did not list who might be interesting for you.  They are called Caringo.  They have been in the object storage business since before S3 came to market, and thus they offer an alternative to the S3 bucket addressing paradigm.  They can emulate S3 just like everyone else, but S3 was designed by Amazon for the purpose of selling a storage service.  It is not necessarily the logical way to store objects for a digital library.  If you are going to address objects directly from your application, they might have some unique value.  I am happy to connect you to executives in the company.

·       The other vendor worth looking at is Minio.IO.  I just pointed Julian to them the other day. They provide an object interface to storage and could federate different cloud stores together.  You might consider them for one of your copies. I still like the idea of doing your replication in the application.  They are similar in concept to Zenko, which Gail recommended earlier.

·       POSIX File System Gateway – My software company (Starfish Storage) has a file system gateway under development (ready for early adopters) that is ideal if you want a POSIX personality.  We can take the contents of the object store and present it as a POSIX file system.

o   We map files 1:1 to objects.  Most file system gateways on the market break up files into smaller objects, akin to blocks.
o   We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4
o   We also work in-band or side-band to the object store.  That means that you can use our POSIX interface simultaneously with S3.

·       You probably also have use cases for Starfish, maybe as a migration tool from file to object or as an end-to-end fixity solution.  We would be especially useful if you need to migrate files from your tape file system.
o   Starfish is a metadata and rules engine for file systems and object stores.  Too many concepts to put in an email!

I hope that helps.  Message me offline if you want to discuss.  I’m at the SuperComputer conference this week, so replies will be a bit slow.


Jacob Farmer  |  Chief Technology Officer  |  Cambridge Computer  |  "Artists In Data Storage"
Phone 781-250-3210  |  jfarmer at CambridgeComputer.com  |  www.CambridgeComputer.com


From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Lewis, Stuart
Sent: Tuesday, November 14, 2017 4:26 AM
To: 'Julian M. Morley' <jmorley at stanford.edu>; gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, Gail, all,

At the National Library of Scotland we are also in the middle of some procurements to revamp our storage infrastructure for holding our digitised content archive.

The approach historically taken here has been to use general purpose SANs, with a second copy placed on offline tape.  The SANs have never been built to scale (so they fill and we buy another), and they are general purpose, trying their best (but often failing!) to run a mixed workload of everything from VMs to data archive and everything in between.

We’re now wanting to move to three copies, two online and one offline (in the cloud if possible).

For the online copies we’re about to go to tender to buy a geo-replicated object storage system, to be hosted in our data centres in Edinburgh and Glasgow.  I suspect the likely candidates will be systems such as Dell EMC ECS, HPE+Scality, IBM ESS**, and Hitachi HCP.

(** ESS rather than CleverSafe, as I think that is predicated on three datacentres, but we only want two).

We’re also about to try a large-scale proof of concept with the Oracle Archive Cloud, but have an open question regarding its characteristics compared to local offline tape.  Due to lack of transparency about what is actually going on behind the scenes in a cloud environment, we don’t know whether this gives us the same offline protection that tape gives us (e.g. much harder to corrupt or accidentally delete).

We’re also purposefully not going to use the object storage system’s in-built cloud connectors for replication.  We feel it might be safer for us to manage the replication to the cloud in our repository, rather than having a single vendor system manage all three copies at once.
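
A minimal sketch of what repository-managed replication could look like is shown below. Everything here is hypothetical: `ObjectStore`, `InMemoryStore`, `replicate` and `audit` are illustrative names, not part of any product mentioned in this thread, and the in-memory store merely stands in for a real on-prem or cloud backend. The idea is that the repository writes each copy itself, records one SHA-256 digest, and can later check every copy against it.

```python
import hashlib
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Smallest interface the repository needs from any storage backend."""
    name: str

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for the sketch; a real one would call a vendor API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def replicate(key: str, data: bytes, stores: List[ObjectStore]) -> str:
    """Write `data` to every store, read each copy back, and verify its digest.
    Returns the digest so the repository can record it for later audits."""
    digest = sha256_hex(data)
    for store in stores:
        store.put(key, data)
        if sha256_hex(store.get(key)) != digest:
            raise IOError(f"fixity mismatch after writing {key!r} to {store.name}")
    return digest


def audit(key: str, expected_digest: str, stores: List[ObjectStore]) -> Dict[str, bool]:
    """Re-read every copy and report, per store, whether it still matches."""
    return {s.name: sha256_hex(s.get(key)) == expected_digest for s in stores}


if __name__ == "__main__":
    stores = [InMemoryStore("site-a"), InMemoryStore("site-b"), InMemoryStore("cloud")]
    recorded = replicate("item-0001/master.tif", b"example payload", stores)
    print(audit("item-0001/master.tif", recorded, stores))
```

Because the digest is held by the repository rather than by any one storage system, a fault or mistake in a single vendor's stack cannot silently affect all three copies at once.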

Critique of this plan is most welcome!

Also happy to join in any offline discussion about this.

Best wishes,


Stuart Lewis
Head of Digital

National Library of Scotland
George IV Bridge, Edinburgh EH1 1EW

Tel: +44 (0) 131 623 3704
Email: stuart.lewis at nls.uk
Website: www.nls.uk
Twitter: @stuartlewis




From: Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] On Behalf Of Julian M. Morley
Sent: 14 November 2017 04:28
To: gail at trumantechnologies.com; pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Gail,

Sure - would be happy to chat with you.

I’ve got Scality in my list of contenders - didn’t mention it here because my first few use cases are explicitly ‘not on campus’, but I agree it’s definitely a fit for our main on prem system. As with any commercial software, ongoing licensing costs are a potential pain point for us.

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries

From: "gail at trumantechnologies.com" <gail at trumantechnologies.com>
Date: Monday, November 13, 2017 at 4:06 PM
To: Julian Morley <jmorley at stanford.edu>, "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
Cc: "gail at trumantechnologies.com" <gail at trumantechnologies.com>
Subject: RE: [Pasig-discuss] Experiences with S3-like object store service providers?

Hi Julian, thanks for sharing your list and comments. Very thorough list. I'd love to chat (and I'm close by in Oakland).... I've quite a lot of experience in the cloud storage field and would suggest you also take a look at multi-cloud connector technologies that will allow you to standardize on S3, but write to non-S3-based public cloud vendors. And to tier or move data among private and public clouds and do federated search on metadata across a single namespace (across these clouds).

Check out a couple of interesting technologies:

Open Source Zenko.io - offering S3 connectivity to AWS, Azure and Google (the latter two are coming shortly), and also
Scality Connect for Azure Blob Storage - translates S3 API calls to Azure Blob Storage API calls.

See the attached datasheet and also  https://www.zenko.io/

I'd add Scality to your list -- see the Gartner Magic Quadrant, where they're shown in the Upper Right Visionary quadrant -- and they are close to you in San Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to public clouds, and have lots of multi-PB size customer installs.  Gartner MQ is here: https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb

I'd be very interested in learning more about your use cases -- can we connect outside of this PASIG alias?

Gail



Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists

Protecting the world's digital heritage for future generations
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman

+1 510 502 6497



-------- Original Message --------
Subject: [Pasig-discuss] Experiences with S3-like object store service providers?
From: "Julian M. Morley" <jmorley at stanford.edu>
Date: Mon, November 13, 2017 12:43 pm
To: "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>

Hi everyone,

I’ve currently got at least four use cases for an S3-compatible object store, spanning everything from traditional S3 through infrequent access stores to cold vaults. As a result I’ve spent considerable time researching options and prices, and was wondering if anyone on this list has any similar experiences they’d like to share.

Our use cases range from hundreds of TB through to several PB, with different access patterns and comfort levels around redundancy and access. For most of them a 100% compatible S3 API is a requirement, but we can bend that a bit for the cold storage use case. We’re also considering local/on-prem object stores for one of the use cases - either rolling our own Ceph install, or using Dell/EMC ECS or SpectraLogic ArcticBlue/Blackpearl.

The vendors that I’m looking at are:

Amazon Web Services (S3, Infrequent Access S3 and S3-to-Glacier).
This is the baseline. We have a direct connect pipe to AWS which reduces the pain of data egress considerably.

IBM Cloud Bluemix (formerly CleverSafe)
A good choice for multi-region redundancy, as they use erasure coding across regions - no ‘catch up’ replication - providing CRR at a cheaper price than AWS. If you only want to keep one copy of your data in the cloud, but have it be able to survive the loss of a region, this is the best choice (Google can also do this, but not with an S3 API or an infrequent access store).

Dell/EMC Virtustream (no cold storage option)
Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing for standard object storage; their value add is tying Virtustream into on-prem ECS units.

Iron Mountain Iron Cloud (Infrequent Access only)
Also uses EMC ECS hardware. Designed primarily for backup/archive workloads (no big surprise there), but with no retrieval, egress or PUT/GET/POST charges.

Oracle Cloud (cheapest cold storage option, but not S3 API)
Uses Openstack Swift. Has the cheapest cloud-tape product (Oracle Cloud Storage Archive), but has recently increased prices to be closer to AWS Glacier.

Google Cloud Platform (not an S3 API)
Technically brilliant, but you have to be able to use their APIs. Their cold storage product is online (disk, not tape), but not as cheap as Glacier.

Microsoft Azure (not an S3 API)
Competitively priced, especially their Infrequent Access product, but again not an S3 API and their vault product is still in beta.

Backblaze B2 (not an S3 API)
Another backup/archive target, only slightly more expensive than Glacier, but online (no retrieval time or fees) and with significantly cheaper data egress rates than AWS.

Wasabi Cloud
Recently launched company from the team that brought you Carbonite. Ridiculously cheap S3 storage, but with a 90-day per-object minimum charge. It’s cheaper and faster than Glacier, both to store data and to egress it, but there are obvious concerns around company longevity. Would probably make a good second target if you have a multi-vendor requirement for your data.
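
A per-object minimum charge like the 90-day one mentioned above matters mostly for short-lived objects. The sketch below shows the arithmetic under stated assumptions: the price per GB-month is a hypothetical placeholder, a 30-day billing month is assumed, and the exact billing rules of any real vendor may differ.

```python
def billed_days(days_stored: int, minimum_days: int = 90) -> int:
    """Days charged for an object when a minimum storage duration applies."""
    return max(days_stored, minimum_days)


def storage_charge(size_gb: float, days_stored: int, price_per_gb_month: float,
                   minimum_days: int = 90, days_per_month: int = 30) -> float:
    """Storage charge for one object, prorated by day with a minimum duration."""
    return size_gb * price_per_gb_month * billed_days(days_stored, minimum_days) / days_per_month


if __name__ == "__main__":
    PRICE = 0.004  # hypothetical price per GB-month
    for days in (10, 90, 365):
        print(days, "days stored ->", round(storage_charge(1000.0, days, PRICE), 2))
```

For archival content that is kept for years, the minimum never comes into play; it only penalizes data that is deleted or overwritten within the first 90 days.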

If anyone is interested in hearing more, or has any experience with any of these vendors, please speak up!

--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries
________________________________
----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

National Library of Scotland, Scottish Charity, No: SCO11086
This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.






--
Michael Davis  |  akropilot at gmail.com  |  mobile 408-464-0441