[Pasig-discuss] Experiences with S3-like object store service providers?
Jacob Farmer
jfarmer at cambridgecomputer.com
Tue Nov 14 10:03:16 EST 2017
Hi, Stuart. I thought I would weigh in on your plans. I’m a data storage
consultant with about 30 years in the game. I’m also the founder of
Starfish Storage, which makes software products for virtualizing file
systems and object stores.
· I think you are correct to manage multiple copies in the
application layer. This gives you maximum control, ability to shift
vendors, etc. Storage should always be thought of as a stack that starts with
the application and ends in the storage media. There can be multiple
layers and the system architect should pick the right layer of abstraction
for any given set of functionality. The higher in the stack, the greater
your application awareness.
· By handling replication and addressing in your application, you
should be able to switch object stores over time without much difficulty.
As such, it does not matter so much which object store you buy. You could
simply chase price.
· Oracle – They are mysterious about what they are doing under the
hood, but it does not matter. It’s a “cloud”. They are so inexpensive.
Use them as your second or third copy. I know that Oracle people monitor
the news group. Maybe one will offer to connect you to a product manager
who can describe the infrastructure. If not, I would be happy to connect
you to folks on the US side of the pond.
· There is a particular vendor you did not list who might be
interesting for you. They are called Caringo. They have been in the
object storage business since before S3 came to market, and thus they offer
alternative addressing to the S3 bucket paradigm. They can emulate S3 just
like everyone else, but S3 was designed by Amazon for the purpose of
selling a storage service. It is not necessarily the logical way to store
objects for a digital library. If you are going to address objects
directly from your application, they might have some unique value. I am
happy to connect you to executives in the company.
· The other vendor worth looking at is Minio.IO. I just pointed
Julian to them the other day. They provide an object interface in storage
and could federate different cloud stores together. You might consider
them for one of your copies. I still like the idea of doing your
replication in the application. They are similar in concept to Zenko, which
Gail recommended earlier.
· POSIX File System Gateway – My software company (Starfish Storage)
has a file system gateway under development (ready for early adopters) that
is ideal if you want a POSIX personality. We can take the contents of the
object store and present it as a POSIX file system.
o We map files 1:1 to objects. Most file system gateways on the market
break up files into smaller objects, akin to blocks.
o We support Active Directory perfectly, SMB-2, SMB-3, NFS-3, NFS-4
o We also work in-band or side-band to the object store. That means that
you can use our POSIX interface simultaneously with S3.
· You probably also have use cases for Starfish, maybe as a migration
tool from file to object or as an end-to-end fixity solution. We would be
especially useful if you need to migrate files from your tape file system.
o Starfish is a metadata and rules engine for file systems and object
stores. Too many concepts to put in an email!
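A minimal sketch of the application-layer approach described above: the application writes each object to every store itself, records a checksum, and can later audit each copy for fixity. The names ObjectStore, InMemoryStore and Replicator are hypothetical and exist only for illustration; a real deployment would wrap each vendor's SDK behind the same two methods.

```python
import hashlib
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface that each backend would implement."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for illustration; not a real vendor client."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the object is missing


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Replicator:
    """Writes every object to all configured stores and tracks checksums."""

    def __init__(self, stores: Dict[str, ObjectStore]) -> None:
        self.stores = dict(stores)
        self.checksums: Dict[str, str] = {}

    def put(self, key: str, data: bytes) -> str:
        digest = sha256_hex(data)
        for name, store in self.stores.items():
            store.put(key, data)
            # Read back immediately so a bad write is caught at ingest time.
            if sha256_hex(store.get(key)) != digest:
                raise IOError(f"checksum mismatch after write to {name}")
        self.checksums[key] = digest
        return digest

    def audit(self, key: str) -> Dict[str, bool]:
        """Return, per store, whether its copy still matches the checksum."""
        expected = self.checksums[key]
        results: Dict[str, bool] = {}
        for name, store in self.stores.items():
            try:
                results[name] = sha256_hex(store.get(key)) == expected
            except KeyError:
                results[name] = False
        return results
```

Because the application only depends on put and get, switching vendors in this sketch amounts to adding or removing an entry in the stores dictionary and re-copying the objects.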
I hope that helps. Message me offline if you want to discuss. I’m at the
SuperComputer conference this week, so replies will be a bit slow.
*Jacob Farmer | Chief Technology Officer | Cambridge Computer |
"Artists In Data Storage" *
Phone 781-250-3210 | jfarmer at CambridgeComputer.com |
www.CambridgeComputer.com <http://www.cambridgecomputer.com/>
*From:* Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] *On Behalf
Of *Lewis, Stuart
*Sent:* Tuesday, November 14, 2017 4:26 AM
*To:* 'Julian M. Morley' <jmorley at stanford.edu>; gail at trumantechnologies.com;
pasig-discuss at mail.asis.org
*Subject:* Re: [Pasig-discuss] Experiences with S3-like object store
service providers?
Hi Julian, Gail, all,
At the National Library of Scotland we are also in the middle of some
procurements to revamp our storage infrastructure for holding our digitised
content archive.
The approach historically taken here has been to use general purpose SANs,
with a second copy placed on offline tape. The SANs have never been built
to scale (so they fill and we buy another), and they are general purpose,
trying their best (but often failing!) to run a mixed workload of
everything from VMs to data archive and everything in between.
We’re now wanting to move to three copies, two online and one offline (in
the cloud if possible).
For the online copies we’re about to go to tender to buy a geo-replicated
object storage system, to be hosted in our data centres in Edinburgh and
Glasgow. I suspect the likely candidates will be systems such as Dell EMC
ECS, HPE+Scality, IBM ESS**, and Hitachi HCP.
(** ESS rather than CleverSafe, as I think that is predicated on three
datacentres, but we only want two).
We’re also about to try a large-scale proof of concept with the Oracle
Archive Cloud, but have an open question regarding its characteristics
compared to local offline tape. Due to lack of transparency about what is
actually going on behind the scenes in a cloud environment, we don’t know
whether this gives us the same offline protection that tape gives us (e.g.
much harder to corrupt or accidentally delete).
We’re also purposefully not going to use the object storage system’s
in-built cloud connectors for replication. We feel it might be safer for
us to manage the replication to the cloud in our repository, rather than
having a single vendor system manage all three copies at once.
Critique of this plan is most welcome!
Also happy to join in any offline discussion about this.
Best wishes,
*Stuart Lewis*
*Head of Digital*
*National Library of Scotland*
George IV Bridge, Edinburgh EH1 1EW
*Tel:* +44 (0) 131 623 3704
*Email:* stuart.lewis at nls.uk
*Website:* www.nls.uk
*Twitter:* @stuartlewis <http://twitter.com/stuartlewis>
*From:* Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] *On Behalf
Of *Julian M. Morley
*Sent:* 14 November 2017 04:28
*To:* gail at trumantechnologies.com; pasig-discuss at mail.asis.org
*Subject:* Re: [Pasig-discuss] Experiences with S3-like object store
service providers?
Hi Gail,
Sure - would be happy to chat with you.
I’ve got Scality in my list of contenders - didn’t mention it here because
my first few use cases are explicitly ‘not on campus’, but I agree it’s
definitely a fit for our main on prem system. As with any commercial
software, ongoing licensing costs are a potential pain point for us.
--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries
*From: *"gail at trumantechnologies.com" <gail at trumantechnologies.com>
*Date: *Monday, November 13, 2017 at 4:06 PM
*To: *Julian Morley <jmorley at stanford.edu>,
"pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
*Cc: *"gail at trumantechnologies.com" <gail at trumantechnologies.com>
*Subject: *RE: [Pasig-discuss] Experiences with S3-like object store
service providers?
Hi Julian, thanks for sharing your list and comments. Very thorough list.
I'd love to chat (and I'm close by in Oakland). I've quite a lot of
experience in the cloud storage field and would suggest you also take a
look at multi-cloud connector technologies that will allow you to
standardize on S3 but write to non-S3-based public cloud vendors. They also
let you tier or move data among private and public clouds and do federated
search on metadata across a single namespace spanning these clouds.
Check out a couple of interesting technologies:
Open Source Zenko.io - offering S3 connect to AWS, Azure and Google (the
latter 2 are coming shortly), and also
Scality Connect for Azure Blob Storage - translates S3 API calls to Azure
blob storage API calls.
See the attached datasheet and also https://www.zenko.io/
I'd add Scality to your list. In the Gartner Magic Quadrant they're
shown in the Upper Right Visionary quadrant, and they are close to you in San
Francisco. They talk S3, File, NFS/SMB, REST (CDMI etc), can tier off to
public clouds, and have lots of multi-PB size customer installs. The Gartner
MQ is here:
https://www.gartner.com/doc/reprints?id=1-4IE870C&ct=171017&st=sb
I'd be very interested in learning more about your use cases -- can we
connect outside of this PASIG alias?
Gail
Gail Truman
Truman Technologies, LLC
Certified Digital Archives Specialist, Society of American Archivists
*Protecting the world's digital heritage for future generations*
www.trumantechnologies.com
facebook/TrumanTechnologies
https://www.linkedin.com/in/gtruman
+1 510 502 6497
-------- Original Message --------
Subject: [Pasig-discuss] Experiences with S3-like object store service
providers?
From: "Julian M. Morley" <jmorley at stanford.edu>
Date: Mon, November 13, 2017 12:43 pm
To: "pasig-discuss at mail.asis.org" <pasig-discuss at mail.asis.org>
Hi everyone,
I’ve currently got at least four use cases for an S3-compatible object
store, spanning everything from traditional S3 through infrequent access
stores to cold vaults. As a result I’ve spent considerable time researching
options and prices, and was wondering if anyone on this list has any
similar experiences they’d like to share.
Our use cases range from hundreds of TB through to several PB, with
different access patterns and comfort levels around redundancy and access.
For most of them a 100% compatible S3 API is a requirement, but we can bend
that a bit for the cold storage use case. We’re also considering
local/on-prem object stores for one of the use cases - either rolling our
own Ceph install, or using Dell/EMC ECS or SpectraLogic
ArcticBlue/Blackpearl.
The vendors that I’m looking at are:
*Amazon Web Services* (S3, Infrequent Access S3 and S3-to-Glacier).
This is the baseline. We have a direct connect pipe to AWS which reduces
the pain of data egress considerably.
*IBM Cloud Bluemix* (formerly CleverSafe)
A good choice for multi-region redundancy, as they use erasure coding
across regions - no ‘catch up’ replication - providing CRR at a cheaper
price than AWS. If you only want to keep one copy of your data in the
cloud, but have it be able to survive the loss of a region, this is the
best choice (Google can also do this, but not with an S3 API or an
infrequent access store).
*Dell/EMC Virtustream* (no cold storage option)
Uses EMC ECS hardware. Actually more expensive than AWS at retail pricing
for standard object storage; their value add is tying Virtustream into
on-prem ECS units.
*Iron Mountain Iron Cloud* (Infrequent Access only)
Also uses EMC ECS hardware. Designed primarily for backup/archive workloads
(no big surprise there), but with no retrieval, egress or PUT/GET/POST
charges.
*Oracle Cloud* (cheapest cold storage option, but not S3 API)
Uses Openstack Swift. Has the cheapest cloud-tape product (Oracle Cloud
Storage Archive), but has recently increased prices to be closer to AWS
Glacier.
*Google Cloud Platform* (not an S3 API)
Technically brilliant, but you have to be able to use their APIs. Their
cold storage product is online (disk, not tape), but not as cheap as
Glacier.
*Microsoft Azure *(not an S3 API)
Competitively priced, especially their Infrequent Access product, but again
not an S3 API and their vault product is still in beta.
*Backblaze B2* (not an S3 API)
Another backup/archive target, only slightly more expensive than Glacier,
but online (no retrieval time or fees) and with significantly cheaper data
egress rates than AWS.
*Wasabi Cloud*
Recently launched company from the team that brought you Carbonite.
Ridiculously cheap S3 storage, but with a 90-day per-object minimum
charge. *It’s cheaper and faster than Glacier*, both to store data and to
egress it, but there are obvious concerns around company longevity. Would
probably make a good second target if you have a multi-vendor requirement
for your data.
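The cross-region erasure coding mentioned for IBM above can be illustrated with the simplest possible scheme: two data fragments plus one XOR parity fragment, one fragment per region. This is only a toy sketch. The function names are hypothetical, and production systems use more general codes with more fragments. It does show why a single logical copy can survive the loss of any one region at roughly 1.5x raw storage, rather than the 2x of keeping two full replicas.

```python
from typing import List, Optional


def xor_bytes(left: bytes, right: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(left, right))


def split_with_parity(data: bytes) -> List[bytes]:
    """Split data into two halves plus one XOR parity fragment.

    Each of the three fragments would be stored in a different region.
    """
    if len(data) % 2:
        data += b"\x00"  # pad to even length; caller keeps the true length
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, xor_bytes(first, second)]


def reconstruct(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing.

    `fragments` is [first, second, parity], with None for a lost region.
    `length` is the original, unpadded length of the data.
    """
    if sum(f is None for f in fragments) > 1:
        raise ValueError("cannot recover from the loss of two fragments")
    first, second, parity = fragments
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return (first + second)[:length]
```

Reads in this sketch need only two of the three regions, so there is no replica that has to "catch up" after a write.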
If anyone is interested in hearing more, or has any experience with any of
these vendors, please speak up!
--
Julian M. Morley
Technology Infrastructure Manager
Digital Library Systems & Services
Stanford University Libraries