[Pasig-discuss] Preservation Storage suppliers
Tom Hutchinson
thutchi1 at swarthmore.edu
Fri Dec 22 11:53:38 EST 2017
Hi, Aslam —
Preservation is a broad term. The idea is how to keep information usable
over an extended period. Most of preservation is not technical; it is
organizational. At this time, the technical aspects involve two main areas.
One is ensuring your data is in a format that you can actually read (e.g. a
Zip disk of Lotus 1-2-3 spreadsheets may be in perfect condition but is
useless without the hardware to read the media and the software to
understand the files).
The other is ensuring that good data does not become corrupted. Files can
become corrupted ("bit rot"). If you don't notice when it happens, your
backups will simply contain the bad files too.
A backup alone is not preservation because it doesn't ensure that the
actual information stays valid and usable.
The standard way of "noticing" files have gone bad is to calculate
checksums (aka hashes, or in preservation parlance, "fixity checks"). To
see if a file has been corrupted, recalculate the checksum and compare it
with the old checksum. One problem with this approach is that you won't be
able to tell if the file itself has been corrupted or if the checksum was
corrupted (or both). The standard way around this is to keep multiple
copies of the checksums.
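As a rough illustration of the approach above, here is a minimal Python
sketch. It assumes SHA-256 as the checksum and that the copies of the stored
checksum are passed in as a plain list of hex strings; the function names
and status strings are made up for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, stored_checksums):
    """Compare a fresh checksum against several stored copies of the old one.

    stored_checksums is a list of hex digests, ideally kept in different
    places (hypothetical layout, chosen for this sketch).
    """
    if not stored_checksums:
        raise ValueError("need at least one stored checksum")
    current = sha256_of(path)
    matches = sum(1 for stored in stored_checksums if stored == current)
    if matches == len(stored_checksums):
        return "ok"
    if matches * 2 > len(stored_checksums):
        # Most copies agree with the file, so a checksum copy went bad.
        return "checksum copy damaged"
    if len(set(stored_checksums)) == 1:
        # The copies agree with each other but not with the file.
        return "file damaged"
    return "inconclusive"
```

With three copies of each checksum, one bad copy no longer leaves you
guessing whether the file or the checksum is at fault.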
You can improve on checksums by using error-correcting codes, which can
repair some damage (e.g. the tool par2), and cryptographic signatures, which
show that a file has not been altered since signing (e.g. the tool gpg).
That said, checksums and backups alone will get you pretty far.
Using checksums assumes that the files you started with were "good" - not
corrupted and also properly made in the first place. Another popular check
done is to see if your files are formatted properly. See if that .zip is
actually a valid zip file. Is it readable? Does it conform to spec?
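For the zip example, a basic check can be sketched with Python's standard
zipfile module. This is only a sketch: it tests whether the archive can be
opened and whether each member's CRC matches, which is weaker than full
conformance to the spec. The function name zip_problem is made up for this
example.

```python
import zipfile


def zip_problem(path):
    """Return None if the archive looks valid, else a short description."""
    # Cheap structural test: is there a zip end-of-central-directory record?
    if not zipfile.is_zipfile(path):
        return "not a zip file"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and checks its CRC; it returns
            # the name of the first bad member, or None if all are fine.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as exc:
        return f"unreadable archive: {exc}"
    if bad_member is not None:
        return f"CRC mismatch in member: {bad_member}"
    return None
```

Dedicated format validation tools go further than this, but even a check
this simple catches files that are zip in name only.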
"Preservation storage" is the idea of building in preservation abilities
into the storage itself. My personal view is that it is a flawed approach.
I encourage keeping preservation as a layer separate from storage. Don't
throw your SAN in the trash just yet ;-) Storage should be a dumb
commodity. Local storage, off-the-shelf cloud providers, off-site tape, etc.
are all perfectly fine. Pick solutions that will limit and detect errors,
e.g. ZFS, but don't rely on them. Assume any particular file, including
stored checksums, will become corrupted.
Your preservation layer is where you perform preservation activities. There
you run your checksums. There you also keep track of when you last ran
them.
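Keeping track of the last run can be very simple. A minimal sketch, assuming
you record a Unix timestamp per file each time its checksum is verified (the
dict layout and the function name are made up for this example):

```python
import time

SECONDS_PER_DAY = 86400


def overdue(last_checked, max_age_days, now=None):
    """List files whose last fixity check is older than max_age_days.

    last_checked maps a file path to the Unix timestamp of its most
    recent successful check (hypothetical layout for this sketch).
    """
    now = time.time() if now is None else now
    limit = max_age_days * SECONDS_PER_DAY
    return sorted(
        path for path, checked_at in last_checked.items()
        if now - checked_at > limit
    )
```

Running this on a schedule tells you which files the next round of checksums
should cover first.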
For many institutions, a DIY approach to preservation can make sense.
Vendor solutions are also available. Some will sell you a preservation
platform, either as a product you buy or a service they provide. You plug
in your storage, their software runs on top.
Many vendors will sell you both technical preservation and storage
together, "preservation storage". These can be convenient one-stop shops.
These tend toward the costly side and lack flexibility. However, they can
be rather robust and usable. Many customers are quite happy with them. When
considering a preservation storage solution, I'm a fan of the last slide in
the 2017 "acid-free" presentation previously mentioned. Beware lock-in.
Best of luck in your search. Please report back. Happy New Year.
Tom
On Wed, Dec 20, 2017 at 9:48 AM, Antonio Guillermo Martinez (LIBNOVA) <
a.guillermo at libnova.com> wrote:
> Dear Michelle,
>
>
>
> Thank you for your interest. You have two main options when dealing with
> this problem and using the products we offer.
>
>
>
> The first one is to use LIBSAFE (software for creating the preservation
> repository; you can use the software we provide, but other equivalent
> solutions on the market, such as Preservica or Archivematica, would
> probably also do the job) to create the digital preservation repository,
> and to use LIBDATA appliances to hold one or more of the copies. This way,
> the LIBSAFE software is in charge of making several copies, hashing files
> within the defined period (for fixity checks), managing versions, storing
> metadata, giving you a search interface, validating/characterizing file
> formats, etc.
>
>
>
> In LIBSAFE you simply go to Storage Groups and create them. Inside every
> group you create one or more mountpoints:
>
> [image: cid:image002.jpg at 01D379A8.4ABAF770]
>
>
>
>
>
> And you can indicate where LIBSAFE should make the copies of the ingested
> objects:
>
> [image: cid:image006.jpg at 01D379A8.4ABAF770]
>
>
>
> This way, you get a fully managed system. You don’t need to care about
> where an object is stored (you can always find out, but you don’t need
> to). The repository software creates an abstraction layer that gives you
> access to your objects (you can search by metadata instead of browsing a
> filesystem, every action leaves a trace, etc.) and takes care of all the
> complexity, like periodic hashing, storage healing, migration, etc.
>
>
>
> For instance, every time you ingest one object, if you have configured
> LIBSAFE to make three copies: one in your datacenter A, another in your
> datacenter B and another in the Amazon S3 cloud, LIBSAFE will make them by
> itself and then verify them from time to time.
>
>
>
> If you look at the underlying storage, you just see a list of folders (one
> folder = one preserved object) and, inside, the same files you ingested,
> without any renaming or structure change, plus some XML files with the
> object’s metadata, fixity information (hashes), etc.
>
>
>
> [image: cid:image007.png at 01D379A8.4ABAF770]
>
>
>
> *Note that, when using LIBSAFE to create your repository, you are not
> forced to use LIBDATA for storage. It can use almost any storage sold
> nowadays (CIFS and/or cloud). Using LIBDATA for one of your copies is just
> convenient, efficient and affordable.*
>
>
>
> The other option is to use just the LIBDATA storage, without any
> repository software. You lose a lot of functionality, but you can certainly
> do it. My personal opinion is that, if you have a decent-sized collection
> and you need to buy hardware, it is not worth leaving out the repository
> software; you may as well get the full package. The total cost of the
> system, including the repository software, is not a lot more, and you get
> much more protection and benefit (you get the base for a true preservation
> system). But, hey, we sell the software, so what would you expect! :)
>
>
>
> The LIBDATA appliances have enough functionality to replicate themselves
> using rsync, following digital preservation-oriented best practices (the
> option not to replicate deletions and/or overwrites, or to replicate them
> a few days later; a warning if the replication is affecting a lot of
> files; fixity tracking; etc.).
>
>
>
> (note the “DarkSync IP”, Sync Protect options, etc. in this LIBDATA
> manager interface)
>
>
>
> And disk management/replacement is fully managed:
>
>
>
>
>
>
>
> In your case, you are going to get much better advice from other members
> of this list (you have the best of the best here) than from me, but, as a
> start, I would recommend sticking to the NDSA Levels of Preservation and
> aiming for level 4, if you can afford it. This way, you are not talking to
> your IT department about abstract ideas. You are pointing to a widely
> accepted community framework for understanding preservation, and, from
> there, you can push your IT department to keep several copies, in several
> places, without anyone in your organization being able to delete them
> all, etc. I also remember a very good presentation the DPC gave at the
> last PASIG (sorry, I can’t remember if it was by Sharon or by William)
> with very interesting thoughts, guidelines and recommendations that can
> help you make the case.
>
>
>
> If you have interest, I can ask for credentials for you to access a
> LIBSAFE repository software and/or access to a LIBDATA appliance to play
> with them. I’m also attaching some LIBSAFE information.
>
>
>
> Best regards, AG.
>
> ---
>
> Antonio Guillermo Martínez Largo
>
> libnova – Technology changes. Information prevails.
>
> www.libnova.com
>
>
>
> EMEA & LATAM: Paseo de la Castellana, 153 – Madrid [t] +34 91 449 08 94
>
> USA & CANADA: 14 NE First Ave (2nd Floor) - Miami, Florida 33132, USA
> [t]: +1 855-542-6682
>
>
>
> *From:* Lindlar, Michelle [mailto:Michelle.Lindlar at tib.eu]
> *Sent:* Wednesday, December 20, 2017 12:18 PM
> *To:* Antonio Guillermo Martinez (LIBNOVA) <a.guillermo at libnova.com>;
> pasig-discuss at mail.asis.org
> *Subject:* AW: [Pasig-discuss] Preservation Storage suppliers
>
>
>
> Dear Antonio,
>
> all,
>
>
>
> I have a specific question about LIBDATA. I understand it to be HW with a
> SW storage management layer. Does it support multiple independent copies as
> well as geographically spread storage clusters?
> Coming from a digital preservation unit which is not embedded into the
> organization’s IT department, those are the two issues I find hardest to
> explain to IT as a digital preservation need.
>
> Looking through your brochure I didn’t see pointers towards those issues
> (might be me just not having seen them), so I was wondering what the
> thoughts are on this.
>
>
>
> Cheers,
> ML
>
>
>
> Michelle Lindlar
> Technische Informationsbibliothek (TIB)
> German National Library of Science and Technology
> Digital Preservation
> Welfengarten 1 B // 30167 Hannover, Germany
> T +49 511 762-19826
> michelle.lindlar at tib.eu
> www.tib.eu
>
>
>
>
>
>
>
> *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] *On Behalf
> Of *Antonio Guillermo Martinez (LIBNOVA)
> *Sent:* Monday, December 18, 2017 19:32
> *To:* pasig-discuss at mail.asis.org
> *Subject:* Re: [Pasig-discuss] Preservation Storage suppliers
>
>
>
> Dear Aslam,
>
>
>
> We currently provide a digital preservation-specific operating system
> (LIBDATAos), running on high-density, low-cost, highly robust hardware.
>
>
>
> The main benefits we see for our customers are:
>
>
>
> - Preservation-specific features (see the attached brochure for
>   details), including:
>   - Hashing can be done directly by the storage appliance using an
>     API (20x faster hashing than the traditional server-based approach).
>   - WORM-protected areas: system-level protection against modification
>     or deletion, subject to some rules.
>   - DarkSync mode, with advanced file replication that does not
>     propagate deletions and overwrites directly, and the ability to
>     generate system-wide manifests of storage contents.
> - Highly secure, with embedded bit rot protection and a standard ZFS
>   filesystem (extended Z1, Z2 or Z3 data protection of every 12 disks).
>   Storage mountable using standard CIFS (including SMB v3) and NFS, among
>   other protocols.
> - High density, with 2PB of raw capacity in a 42U rack, or 216TB for
>   each 4U appliance.
> - Low cost, with very low cost of acquisition and operation.
> - NBD support in nearly every State.
> - Very easy to manage. Everything is done using a management web
>   interface.
>
>
>
> When you use the LIBDATA storage combined with the LIBSAFE Integrated
> Digital Preservation Repository software, you also get a lot of extended
> benefits.
>
>
>
> Best regards, AG.
>
>
>
>
> *From:* Pasig-discuss [mailto:pasig-discuss-bounces at asist.org] *On Behalf
> Of *Aslam Ghumra (IT Services, Facilities Management)
> *Sent:* Monday, December 18, 2017 1:19 PM
> *To:*
> *Subject:* [Pasig-discuss] Preservation Storage suppliers
>
>
>
> Hi All,
>
>
>
> Since it’s quiet here, I’ve been able to review the conference notes from
> PASIG17, and there was a presentation from someone (probably from the US)
> regarding preservation storage.
>
>
>
> She implied in her talk that she had already had several chats with some
> suppliers but was collating more.
>
>
>
> We have a storage solution, but I really want to understand what is meant
> by “Preservation Storage” and how it differs from normal archival storage,
> or from data that is tiered off to an archive.
>
>
>
> There is a set of criteria, “Preservation Storage Criteria, Version 2”,
> which I’ve been reviewing via the presentation “Acid-free AIPS: Digital
> preservation storage criteria” by Sibyl Schaefer.
>
>
>
> Any pointers to suppliers, whether you use them or not, would be
> gratefully received.
>
>
>
> Aslam Ghumra
>
> Research Data Management
>
> ____________________________
>
> IT Services
>
> Elms Road Data Centre
> Building G5
>
> Edgbaston
>
> Birmingham B15 2TT
>
> T: 0121 414 5877
>
> F: 0121 414 3952
>
> Skype : JanitorX
>
> Twitter : @aslamghumra @uob_rescomp
>
> in *: *https://uk.linkedin.com/in/aslam-ghumra-13907993
>
> http://intranet.bham.ac.uk/bear
>
>
>