[Pasig-discuss] WORM (Write Once Read Many) AIPs

Tim.Gollins at nrscotland.gov.uk Tim.Gollins at nrscotland.gov.uk
Fri May 12 08:18:10 EDT 2017


Hi Neil

Brilliant - Most helpful and thought provoking. The fact that Fedora has the idea of a versioning Object store is particularly interesting. 

I think there are a couple of distinctions between Minimal Ingest and Sheer Curation but (from a quick glance at Google articles) they appear very closely related. I think APT uses something like Asynchronous Message Driven Workers. 

Very many thanks indeed, especially for such a swift and comprehensive response.

Tim

Tim Gollins | Head of Digital Archiving and Director of the NRS Digital Preservation Programme
National Records of Scotland | West Register House | Edinburgh EH2 4DF
+ 44 (0)131 535 1431 / + 44 (0)7974 922614 | tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk

Preserving the past | Recording the present | Informing the future
Follow us on Twitter: @NatRecordsScot | http://twitter.com/NatRecordsScot


-----Original Message-----
From: Neil Jefferies [mailto:neil at jefferies.org] 
Sent: 12 May 2017 13:06
To: Gollins T (Tim)
Cc: pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs

Tim,

If we store AIPs unpackaged, as a collection of files in a folder, then 
object updates could just be a new folder with symlinks to the unchanged 
parts and the updated parts in place in the folder. The object 
"location" would be a parent folder for all these version folders - for 
example, a pairtree (or triple-tree for faster scanning/rebuilds) based 
on object UUID. Version folders would be named according to date or 
version number (date might make Memento-compliant access simpler). 
Creating a new version clones the current version (including links) with 
a new name and then replaces the updated parts in situ. The final act is 
to update a "current" symlink in the object. Any update failure will mean 
"current" is not updated and the partial clone can be discarded.
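
As a rough illustration of the clone/replace/relink sequence described 
above, here is a minimal Python sketch. It is not taken from DataBank or 
Fedora; the function name create_version, the flat (no subfolder) version 
layout and the ".current.tmp" helper link are all assumptions made for 
the example.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping


def create_version(object_dir: Path, version_name: str,
                   updates: Mapping[str, bytes]) -> Path:
    """Clone the current version as symlinks, replace the updated parts,
    then repoint the 'current' symlink. Hypothetical helper; assumes each
    version folder is flat (files only, no subfolders)."""
    object_dir = object_dir.resolve()
    current = object_dir / "current"
    new_dir = object_dir / version_name
    new_dir.mkdir(parents=True)  # raises if this version already exists
    try:
        if current.is_symlink():
            for entry in current.resolve().iterdir():
                # Link to the real file, not to another link, so link
                # chains do not grow with every version.
                real = entry.resolve()
                rel = os.path.relpath(real, new_dir)
                (new_dir / entry.name).symlink_to(rel)
        for name, data in updates.items():
            target = new_dir / name
            if target.is_symlink():
                target.unlink()  # remove the link; never write through it
            target.write_bytes(data)
    except Exception:
        # Discard the partial clone; 'current' has not been touched.
        shutil.rmtree(new_dir)
        raise
    # Final act: build a temporary link and rename it over 'current',
    # which replaces the old link in a single step on POSIX filesystems.
    tmp_link = object_dir / ".current.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(version_name)
    os.replace(tmp_link, current)
    return new_dir
```

Calling create_version(obj, "v2", {"meta.xml": b"..."}) after an earlier 
"v1" leaves v1 untouched, gives v2 a fresh meta.xml plus links to the 
unchanged v1 files, and only then moves "current" to v2.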

This assumes most updates are metadata and that a diff won't save much 
compared to a complete new XML file or whatever. I am also assuming that 
metadata won't be wrapped either (so you can forget METS) so that 
different types are stored in the most suitable format and are accessed 
only when required. The problems with round-tripping packaged AIPs for 
updates, rather than diff-ing, are repeated by METS wrapping.

This may be a virtual folder/filesystem presentation, and underneath an 
HSM would retrieve files from wherever when they are actually accessed. 
HSM policy in something like SAM-QFS/Versity/Cray TAS can ensure folders 
are kept intact when moved to other storage (we could even dereference 
symlinks when dealing with tape).
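
To make the "dereference symlinks" idea concrete, here is a small Python 
sketch that copies one version folder to a staging area with every link 
replaced by the file it points to, so the copy is self-contained. The 
function name materialise_version and the staging location are 
assumptions for illustration; a real HSM would do this through its own 
policy engine rather than a script.

```python
import shutil
from pathlib import Path


def materialise_version(version_dir: Path, staging_dir: Path) -> Path:
    """Copy a version folder, following symlinks so that the copy holds
    real files only. staging_dir must not exist yet (copytree creates it)."""
    # symlinks=False tells copytree to copy the content that each link
    # points to instead of recreating the link itself.
    return Path(shutil.copytree(version_dir, staging_dir, symlinks=False))
```

The trade-off is that unchanged files are duplicated in each materialised 
version, in exchange for a folder that can be restored without any other 
version being present.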

This can be done with a POSIX filesystem and not much code - Ben 
O'Steen started something along these lines here: 
https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-what-does-it-do%3F

Fedora also has a versioning object store that could support this kind 
of model but also adds a fair bit of complexity to be 
Linked Data Platform compliant.

In my parlance I would probably equate "Minimal Ingest" with "Sheer 
Curation" and APT with Asynchronous Message Driven Workers.

Neil


On 2017-05-12 12:33, Tim.Gollins at nrscotland.gov.uk wrote:
> Dear PASIG
> 
> I have been thinking recently about the challenge of managing
> "physical" AIPs on offline or near line storage and how to optimise
> or simplify the use of managed storage media in a tape based (robotic)
> Hierarchical Storage Management (HSM) system. By "physical" AIPs I
> mean that the actual structure of the AIP written to the storage
> system is sufficiently self-describing that even if the management or
> other elements of a DP system were to be lost to a disaster then the
> entire collection could be fully re-instated reliably from the stored
> AIPs alone.
> 
> I have also been thinking about the huge benefits of adopting the
> concepts of "Minimal Ingest" (MI) and "Autonomous Preservation Tools"
> (APT) in a new Digital Archive solution.
> 
> One of the potential effects of the MI and APT concepts is that over
> time it is clear that while (of course) the original bit streams will
> never need to be updated, the metadata packaged in the AIP will need
> to change relatively often (through the life of the AIP). This is of
> course in addition to any new renderings of the bit streams produced
> for preservation purposes (manifestations as termed in some systems).
> 
> If to update the AIP the process involves the AIP being "loaded" and
> "Modified" and "Stored" again as a whole then this will result in
> significant "churn" of the offline or near line media (i.e. tapes) in
> a HSM - which I would like to avoid. I think it would be really great
> if the AIP representation could accommodate the concept of an "update
> IP" (perhaps UIP?) where the UIP contains a "delta" of the original
> AIP - the full AIP then being interpreted as the original as modified
> by a series of deltas. This would then effectively result in AIPs (and
> UIPs) becoming WORM objects with clear benefits that I perceive in
> managing their reliable and safe storage.
> 
> I am not sufficiently familiar with the detail of all the different
> AIP models or implementations, so I was wondering if anyone in the team
> would be able to comment on whether they know of any AIP models,
> specifications or implementations that would support such a use case.
> 
> I have just posted a version of this question to the E-ARK LinkedIn
> group so my apologies to those who see it twice.
> 
> Many thanks
> 
> Tim

