[Pasig-discuss] WORM (Write Once Read Many) AIPs

BURNHILL Peter peter.burnhill at ed.ac.uk
Fri May 12 08:25:21 EDT 2017


Yes, I appreciated that too.

Peter


 Peter Burnhill

University of Edinburgh



Mobile: +44 (0) 774 0763 119

PS: I am writing 'on the go', so please excuse the brevity.

On 12 May 2017, at 1:23 pm, Tim.Gollins at nrscotland.gov.uk wrote:

Hi Neil

Brilliant - most helpful and thought-provoking. The fact that Fedora has the idea of a versioning object store is particularly interesting.

I think there are a couple of distinctions between Minimal Ingest and Sheer Curation, but (from a quick glance at Google articles) they appear very closely related. I think APT uses something like Asynchronous Message Driven Workers.

Very many thanks indeed, especially for such a swift and comprehensive response.

Tim

Tim Gollins | Head of Digital Archiving and Director of the NRS Digital Preservation Programme
National Records of Scotland | West Register House | Edinburgh EH2 4DF
+ 44 (0)131 535 1431 / + 44 (0)7974 922614 | tim.gollins at nrscotland.gov.uk | www.nrscotland.gov.uk

Preserving the past | Recording the present | Informing the future
Follow us on Twitter: @NatRecordsScot | http://twitter.com/NatRecordsScot


-----Original Message-----
From: Neil Jefferies [mailto:neil at jefferies.org]
Sent: 12 May 2017 13:06
To: Gollins T (Tim)
Cc: pasig-discuss at mail.asis.org
Subject: Re: [Pasig-discuss] WORM (Write Once Read Many) AIPs

Tim,

If we store AIPs unpackaged, as a collection of files in a folder, then
an object update could just be a new folder containing symlinks to the
unchanged parts, with the updated parts in place in the folder. The
object "location" would be a parent folder for all these version
folders - for example, a pairtree (or triple-tree for faster
scanning/rebuilds) based on object UUID. Version folders would be named
according to date or version number (date might make Memento-compliant
access simpler). Creating a new version clones the current version
(including links) under a new name and then replaces the updated parts
in situ. The final act is to update a "current" symlink in the object.
Any update failure will mean that "current" is not updated and the
partial clone can be discarded.

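The update sequence described above could be sketched roughly as follows.
This is only an illustration, not code from DataBank or any other system:
the function name create_version, the "current.tmp" scratch link and the
assumption that a version folder is flat (no subfolders) are all
hypothetical choices made for the example.

```python
import os
import shutil
from pathlib import Path


def create_version(object_dir: Path, new_name: str, updates: dict[str, bytes]) -> Path:
    """Clone the current version as symlinks, write the updated files, repoint 'current'."""
    object_dir = object_dir.resolve()
    current = object_dir / "current"
    new_dir = object_dir / new_name
    new_dir.mkdir()  # fails if the version already exists, so versions stay write-once
    try:
        if current.is_symlink():
            old_dir = object_dir / os.readlink(current)
            for entry in old_dir.iterdir():
                # Resolve first so new links point at the real file, not a chain of links.
                target = entry.resolve()
                (new_dir / entry.name).symlink_to(os.path.relpath(target, new_dir))
        for name, data in updates.items():
            path = new_dir / name
            if path.is_symlink():
                path.unlink()  # drop the link so an older version is never written through
            path.write_bytes(data)
    except OSError:
        shutil.rmtree(new_dir)  # discard the partial clone; 'current' is untouched
        raise
    # Final act: repoint 'current' by renaming a scratch link over it.
    tmp = object_dir / "current.tmp"  # hypothetical scratch name
    if tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(new_name)
    os.replace(tmp, current)
    return new_dir
```

Calling create_version(obj, "v1", {...}) and then
create_version(obj, "v2", {"meta.xml": b"..."}) leaves v1 untouched, with
v2 holding a real meta.xml plus links back to the unchanged files in v1.
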
This assumes most updates are metadata and that a diff won't save much
compared to a complete new XML file or whatever. I am also assuming that
metadata won't be wrapped either (so you can forget METS), so that
different types are stored in the most suitable format and are accessed
only when required. The problems of round-tripping packaged AIPs for
updates, rather than diffing, are repeated by METS wrapping.

This may be a virtual folder/filesystem presentation, with an HSM
underneath retrieving files from wherever they are held when they are
actually accessed. HSM policy in something like SAM-QFS/Versity/Cray TAS
can ensure folders are kept intact when moved to other storage (we could
even dereference symlinks when dealing with tape).
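
As a small illustration of the dereferencing idea, a version folder could
be copied to a staging area with every link replaced by the file it points
to, so that the copy sent to tape is self-contained. The function name
materialise_version and the idea of a staging folder are assumptions made
for this sketch; a real HSM would apply its own policy mechanism.

```python
import shutil
from pathlib import Path


def materialise_version(version_dir: Path, staging_dir: Path) -> Path:
    """Copy one version folder, replacing each symlink with the file it points to."""
    # symlinks=False (the default) makes copytree follow links and copy the contents.
    shutil.copytree(version_dir, staging_dir, symlinks=False)
    return staging_dir
```

The staging folder must not already exist, since shutil.copytree creates it.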

This can be done with a POSIX filesystem and not much code - Ben
O'Steen started something along these lines here:
https://github.com/dataflow/RDFDatabank/wiki/What-is-DataBank-and-what-does-it-do%3F

Fedora also has a versioning object store that could support this kind
of model, but it adds a fair bit of complexity in order to be
Linked Data Platform compliant.

In my parlance I would probably equate "Minimal Ingest" with "Sheer
Curation" and APT with Asynchronous Message Driven Workers.

Neil


On 2017-05-12 12:33, Tim.Gollins at nrscotland.gov.uk wrote:
Dear PASIG

I have been thinking recently about the challenge of managing
"physical" AIPs on offline or near-line storage and how to optimise
or simplify the use of managed storage media in a tape based (robotic)
Hierarchical Storage Management (HSM) system. By "physical" AIPs I
mean that the actual structure of the AIP written to the storage
system is sufficiently self-describing that even if the management or
other elements of a DP system were to be lost to a disaster then the
entire collection could be fully re-instated reliably from the stored
AIPs alone.

I have also been thinking about the huge benefits of adopting the
concepts of "Minimal Ingest" (MI) and "Autonomous Preservation Tools"
(APT) in a new Digital Archive solution.

One of the potential effects of the MI and APT concepts is that, while
(of course) the original bit streams will never need to be updated, the
metadata packaged in the AIP will need to change relatively often
through the life of the AIP. This is of course in addition to any new
renderings of the bit streams produced for preservation purposes
(manifestations, as termed in some systems).

If updating the AIP involves the AIP being "loaded", "modified" and
"stored" again as a whole, then this will result in significant "churn"
of the offline or near-line media (i.e. tapes) in an HSM, which I would
like to avoid. I think it would be really great if the AIP
representation could accommodate the concept of an "update IP" (perhaps
UIP?), where the UIP contains a "delta" of the original AIP - the full
AIP then being interpreted as the original as modified by a series of
deltas. This would effectively result in AIPs (and UIPs) becoming WORM
objects, with clear benefits that I perceive in managing their reliable
and safe storage.
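
To make the "original as modified by a series of deltas" reading concrete,
here is a minimal sketch. It is not drawn from any existing AIP
specification: the UpdateIP class, its sequence/put/delete fields and the
representation of an AIP as a mapping from file name to content are all
hypothetical, chosen only to show that nothing already written needs to be
rewritten.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UpdateIP:
    """Hypothetical write-once delta: files to add or replace, and files to retire."""
    sequence: int
    put: dict[str, bytes] = field(default_factory=dict)
    delete: frozenset[str] = frozenset()


def resolve_aip(original: dict[str, bytes], uips: list[UpdateIP]) -> dict[str, bytes]:
    """Interpret the full AIP as the original modified by the deltas, in sequence order."""
    state = dict(original)  # work on a copy; the stored original is never modified
    for uip in sorted(uips, key=lambda u: u.sequence):
        for name in uip.delete:
            state.pop(name, None)
        state.update(uip.put)
    return state
```

In this sketch a metadata change is a new UpdateIP written alongside the
original, and the current state is derived at read time by replaying the
deltas in order.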

I am not sufficiently familiar with the detail of all the different
AIP models or implementations, so I was wondering if anyone in the team
would be able to comment on whether they know of any AIP models,
specifications or implementations that would support such a use case.

I have just posted a version of this question to the E-ARK LinkedIn
group, so my apologies to those who see it twice.

Many thanks

Tim





----
To subscribe, unsubscribe, or modify your subscription, please visit
http://mail.asis.org/mailman/listinfo/pasig-discuss
_______
PASIG Webinars and conference material is at
http://www.preservationandarchivingsig.org/index.html
_______________________________________________
Pasig-discuss mailing list
Pasig-discuss at mail.asis.org
http://mail.asis.org/mailman/listinfo/pasig-discuss

