[Pasig-discuss] Fwd: TRY IT: Fedora 4 Alpha 3 with Extensive Performance Benchmarking and Improvements

Carol Minton Morris cmmorris at fedora-commons.org
Thu Jan 2 10:31:39 EST 2014


*FOR IMMEDIATE RELEASE*

January 2, 2014
Contact: Andrew Woods <awoods at duraspace.org>
Read it online: http://bit.ly/JwimVB

*GIVE IT A TRY: Fedora 4 Alpha 3 with Extensive Performance Benchmarking,
Improvements, Documentation*

*Winchester, MA*  The Fedora 4 team is proud to announce the third Alpha
release of Fedora 4. In the continuing effort to provide early access to
the quickly growing Fedora 4 feature set, this Alpha release is one of
several leading up to the feature-complete Fedora 4 Beta release. The
Fedora 4 development team and supporting institutions made a strong
commitment and pushed to produce the Fedora 4 Alpha 3 Release–'The Holiday
Release'–on Dec. 28.

The list of features, performance benchmarking and improvements, and
associated documentation is extensive.

Release Notes:
https://wiki.duraspace.org/display/FF/Fedora+4.0+Alpha+3+Release+Notes

The Holiday Release is a public-facing alpha release with a 'one-click
run' download and an associated 'Quick Start Guide', intended to get as
much of the Fedora community as possible putting eyes on the current
Fedora 4.

Quick Start Guide:
https://wiki.duraspace.org/display/FF/Quick+Start

*Give it a try*, and Happy New Year!

*Complete list of features:*

*Authorization*

The initial pattern for Fedora 4 Authorization is that a given user request
will have already been Authenticated before entering the Fedora 4
application. Authenticated user requests are expected to contain an
identity and zero or more additional attributes, such as groups. These
combined user attributes (in addition to other attributes which may be
mapped to the requesting user) along with the requesting action are
compared against configurable rules to determine if the user has the
privilege to perform the action on the resource.

Administrators can associate "read", "write", and "admin" roles with user
principals on repository object and datastream nodes, as well as on
hierarchies of nodes using:

• The restricted Access Roles REST API [4] or
• The input form [5] of the Fedora 4 HTML UI

Once the access rules have been defined on repository resources, the Basic
Roles-based Policy Enforcement Point [6] (if enabled) will restrict
requests as described above and in further detail on the wiki [7].

The Fedora 4 authorization feature ensures that:

• Restricted child nodes of a requested node are not visible in API
responses
• Deletion of a node will recursively delete its child nodes, unless the
requesting user does not have sufficient privileges to delete one or more
of the children, in which case the entire deletion operation will fail
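
As a rough illustration only, and not the Fedora 4 implementation, the
all-or-nothing deletion rule above can be sketched in Python. The Node
class, the helper names, and the assumption that the "write" and "admin"
roles permit deletion are all hypothetical; the wiki pages referenced in
this section describe the actual behavior.

```python
from dataclasses import dataclass, field

# Assumption for illustration: these roles are the ones that permit deletion.
DELETE_ROLES = {"write", "admin"}


@dataclass
class Node:
    """Hypothetical stand-in for a repository node with per-principal roles."""
    name: str
    roles: dict = field(default_factory=dict)     # principal -> set of roles
    children: list = field(default_factory=list)


def can_delete(node, principals):
    """True if any of the requester's principals holds a deleting role on node."""
    return any(node.roles.get(p, set()) & DELETE_ROLES for p in principals)


def deletable_subtree(node, principals):
    """Check the whole subtree up front so the deletion is all-or-nothing."""
    if not can_delete(node, principals):
        return False
    return all(deletable_subtree(child, principals) for child in node.children)


def delete(parent, node, principals):
    """Remove node from parent only if every descendant may also be deleted."""
    if not deletable_subtree(node, principals):
        raise PermissionError(f"cannot delete {node.name}: restricted node in subtree")
    parent.children.remove(node)
```

Checking the full subtree before removing anything is what makes the
operation fail as a whole rather than leaving a partially deleted
hierarchy.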

More details on the design and implementation of the authorization feature
can be found on the following wiki page [8] and its sub-pages.

*Batch Operations*

This release enhances the previous batch operations capability to support a
more standardized approach to performing the following actions batched as a
single request:

• Retrieve multiple binary resources in a single request
• Create multiple resources in a single request
• Modify multiple resources in a single request
• Delete multiple resources in a single request

In addition to batching multiple actions of the same type,
create/modify/delete actions can also be mixed in a single request.

Examples and feature documentation can be found on the wiki [9], along with
the REST API documentation [10].

*Content Modeling*

One aspect of Fedora 4 content modeling is the ability to define custom
repository node-types including the node's composition (i.e. property types
and multiplicity). In addition to the existing "compact node definition"
[11] (CND) file, this release adds the ability to define node-types at
runtime via the Fedora 4 REST-API [12]. This now allows repository managers
to configure repository node-types programmatically after the application
has been installed.

An example set of configurations [13] has been constructed that represents
an initial set of Fedora 3 content models translated into Fedora 4
node-types.

*Large Files*

One of the long-standing requirements of Fedora is support for the
management and serving of large files. The native "projection" or
"federation" capability offered by Fedora 4's underlying JCR implementation
(ModeShape [14]) allows for content on the filesystem, in a database,
web-accessible, etc., to be connected to and exposed through the
repository. The results of testing this capability over multi-gigabyte
files showed performance bottlenecks.

One of the advantages of leveraging the open-source ModeShape under Fedora 4
is that we are able to push improvements upstream to that project.
Modifications and enhancements to ModeShape's FileSystemConnector [15] from
the Fedora 4 team have been incorporated into ModeShape 3.6.0. The
contributed updates to the ModeShape codebase provide the option to either
postpone the most time-intensive "federation" action (i.e. unique internal
identifier generation based on content checksum) until the content is
requested or to use a faster, surrogate internal identifier in the case
where performance would otherwise be unacceptable.
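
The two options can be pictured with a small Python sketch. This is not
the ModeShape code: the ProjectedFile class, the use of SHA-1, and the
choice of path, size, and modification time as the basis for the
surrogate identifier are assumptions made purely for illustration.

```python
import hashlib
import os


class ProjectedFile:
    """Hypothetical wrapper for a file exposed through a projection."""

    def __init__(self, path, use_surrogate=False):
        self.path = path
        self.use_surrogate = use_surrogate
        self._identifier = None  # nothing is computed when the file is connected

    def _checksum_id(self):
        # Reads the whole file, which is the expensive step for very large content.
        digest = hashlib.sha1()
        with open(self.path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def _surrogate_id(self):
        # Cheap stand-in built from metadata only (an assumption of this sketch).
        stat = os.stat(self.path)
        seed = f"{os.path.abspath(self.path)}:{stat.st_size}:{stat.st_mtime_ns}"
        return hashlib.sha1(seed.encode("utf-8")).hexdigest()

    @property
    def identifier(self):
        # Computed on first request, then cached for later requests.
        if self._identifier is None:
            if self.use_surrogate:
                self._identifier = self._surrogate_id()
            else:
                self._identifier = self._checksum_id()
        return self._identifier
```

In this sketch the checksum cost is paid only when the identifier is first
requested, and the surrogate path avoids reading the content at all.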

See the "Performance Benchmark - Large files" section below for details of
the limits and performance of large file support.

Additional details of the "large files" approach can be found on the wiki
[16].

*Search*

Fedora 4 is designed to support two search services:

• External search (i.e. standalone Solr populated by repository event
listener)
• Administrative search (i.e. advanced legacy field-search)

*External Search*

External search went through a significant round of refactoring this
release in order to address performance issues discovered in the
application profiling effort as well as to establish a flexible pattern for
transforming resource properties into indexable fields. In a pattern
similar to that employed by the external triplestore feature, external
search relies on repository event messages to trigger index updates. These
messages have been refactored to contain only minimal, essential event and
resource information, which eliminates the overhead previously imposed by
the eventing machinery when it made additional lookups back into the
repository.

The external search component needs a configurable way to identify the
resources to be indexed and a definition of the transformations that map
resource properties to indexable fields. The basic approach is as follows:

1. Set the property on a resource that flags it for indexing
2. Optionally, set the property on a resource that references the
properties mapping transformation
3. Optionally, create a new resource that contains the actual LDPath [17]
transformation referenced in the previous step
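
The steps above can be sketched in Python as an event handler feeding an
index. Everything here is hypothetical: the property names "indexable" and
"transform", the event fields, and the use of a plain dictionary in place
of an LDPath program and of a real Solr index. The actual property names
and transformation syntax are documented on the wiki.

```python
# Hypothetical property names; the real ones are documented on the wiki.
INDEXABLE_FLAG = "indexable"
TRANSFORM_REF = "transform"

# Fallback mapping of index field -> resource property (illustrative only).
DEFAULT_TRANSFORM = {"id": "path", "title": "title"}


def index_fields(resource, transforms):
    """Map resource properties to index fields using the referenced transform."""
    mapping = transforms.get(resource.get(TRANSFORM_REF), DEFAULT_TRANSFORM)
    return {fld: resource[prop] for fld, prop in mapping.items() if prop in resource}


def handle_event(event, repository, transforms, index):
    """Apply one repository event message to the external index."""
    path = event["path"]
    if event["type"] == "removed":
        index.pop(path, None)
        return
    resource = repository[path]
    if resource.get(INDEXABLE_FLAG):
        index[path] = index_fields(resource, transforms)
    else:
        index.pop(path, None)  # flag not set: make sure no stale entry remains
```

Only resources carrying the flag reach the index, and a resource without a
transformation reference falls back to the default mapping.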

More details of the external search feature and its configuration can be
found on the wiki [18].

*Administrative Search*

This release establishes the administrative search service. If a
user-facing, full-featured search service is required of your repository,
the external search is ideal. However, if a repository administrator-facing
search is needed in support of queries over resource properties, then the
new administrative search may suffice. Administrative search exposes both a
text search over resource properties as well as a SPARQL endpoint over
repository subjects.

For more details on the administrative search and its usage, see the wiki
[19].

*Simplified Deployment*

One of the goals of Fedora 4 is to simplify the application deployment as
well as the wiring of optional components and their subsequent
configuration. Although there is a significant amount of work remaining
towards this goal, one early step in this direction is the ability to
deploy Fedora 4 by just dropping the web-application archive (WAR) file
into a servlet container without the need for any additional configuration.
Leveraging this simple deployment capability, this release produced a
"One-Click Run" download [20] which literally enables the user to click on
the download to start up a local Fedora 4 repository.

A brief introduction to navigating the Fedora 4 web interface is documented
on the wiki [21].

Additionally, in support of DevOps users, ongoing effort is dedicated
to making the deployment and configuration of Fedora 4 as straightforward
and reproducible as possible. To eliminate confusion about which system
properties should be set for configuring Fedora 4 persistence locations, a
single system property (fcrepo.home) allows a Fedora 4 installation to
specify the base directory under which all other application data will be
written.

Details of the deployment and configuration of Fedora 4 are described in the
wiki [22].

*Storage Durability*

The fundamental principles of Fedora have always included a commitment to a
non-proprietary, transparent, persistence format. Within the Fedora 4
architecture, there are several available approaches to defining the
backend persistence store.

The two backend stores that have primarily been used so far in development
are the filesystem and LevelDB [23] implementations. In both cases, Fedora
4 persists the binary content in a tree of directories and the resource
properties as binary JSON.

Details of the format of the JSON and nested fields are described in the
wiki [24].

*Versioning*

This release introduces the first implementation of the versioning [25]
capability within Fedora 4.

Versions can be created for a specific repository resource via the REST API
[26], with the option to associate a label with the version.

Additionally, auto-versioning can be enabled by setting a property on a
resource that indicates the activation of auto-versioning [27]. This
property can either be set at runtime by the repository user, or more
globally as a default property defined in the Fedora 4 node-type
definitions [28].

Resource versions are returned via the REST API and HTML interfaces.
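
The difference between explicit and automatic versioning can be pictured
with a short Python sketch. The VersionedResource class is hypothetical and
in-memory only, and the assumption that auto-versioning takes a snapshot
after each update is made for illustration; it does not describe the
Fedora 4 REST API or its storage of versions.

```python
class VersionedResource:
    """Hypothetical in-memory resource with optionally labelled versions."""

    def __init__(self, properties=None, auto_version=False):
        self.properties = dict(properties or {})
        self.auto_version = auto_version
        self.versions = []  # (label, snapshot) tuples, oldest first

    def create_version(self, label=None):
        """Snapshot the current properties, optionally under a label."""
        self.versions.append((label, dict(self.properties)))

    def update(self, **changes):
        """Modify properties; with auto-versioning on, snapshot after each change."""
        self.properties.update(changes)
        if self.auto_version:
            self.create_version()
```

With auto_version left off, versions exist only where create_version is
called explicitly; with it on, every update leaves a snapshot behind.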

*Performance*

It often goes without saying that Fedora 4 must be performant under a range
of use cases and scenarios. A very specific theme of this release was
ensuring those assumptions hold true and, in the cases where they do not,
surfacing and addressing the reasons. The following performance-related
topics received attention this release.

*Profiling*

Profiling was employed as an initial means of inspecting the hotspots
within the codebase. In general, it was determined that the greatest
sources of slowdown relate to:

• Extraneous creation of JCR sessions
• JCR node lookups
• Synchronous internal index updates

*Benchmarking*

In parallel to the profiling work, significant effort was put towards
painting a clear picture of the current performance status of Fedora 4
across a variety of hardware, configurations, and scenarios.

Tests were performed with consistent and documented setups across test
servers at the following institutions:

• FIZ Karlsruhe
• Stanford University
• University of California, San Diego
• University of North Carolina, Chapel Hill
• University of Wisconsin
• Yale University

Each test is defined by the combination of the following four variables [29]:

• Platform Profile - the hardware and networking used to conduct the tests
• Repository Profile - the Fedora-specific configuration options
• Setup Profile - the data loaded into the repository as a baseline before
testing
• Workflow Profile - the specific tests performed, what tools were used,
and what was measured

Of particular interest are the results [30] of ingest/read/update/delete
workflows with a Fedora 4 single-node installation.

*Performance Benchmark - Authorization*

An additional set of benchmarks was collected to determine the effect of
authorization on performance [31].

As expected, there is a performance penalty with authorization enabled;
however, these tests tend to indicate the impact to be less than 10% across
the ingest/read/update/delete functions.

*Performance Benchmark - Fedora 3 vs. 4*

Defining the goals for acceptable performance levels for a repository is an
ambiguous task. There are many variables that come into play, and
generating test cases that simulate production scenarios is not always
effective. That said, one concrete measure of performance is the relative
behavior of Fedora 4 in comparison to Fedora 3.

Significant work remains in this comparison [32], but some initial numbers
are favorable for Fedora 4's ingest capability.

*Performance Benchmark - Large files*

In terms of performance related to large files, this release tested the
limits and performance of:

• Ingest and retrieval via the Fedora 4 REST API
• Retrieval via Fedora 4 filesystem projection

In both cases, content as large as 1 TB was successfully tested, with
documented [33] throughput.

*Documentation*

As we move closer to a Beta release of Fedora 4, it is vital that there
exist developer and administrator documentation for the application. An
initial structuring of this documentation can be found on the wiki [34].

The following sections contain user-facing documentation:

• Administrator Guide [35]
• Developers Guide [36]
• Feature Tour [37]
• Features [38]
• Glossary [39]

*Acknowledgements*

This release is due to the commitment of the Fedora sponsors [40] and the
effort of the following Fedora community developers:

• Benjamin Armintor - Columbia University
• Nigel Banks - Discovery Garden
• Frank Asseg - FIZ Karlsruhe
• Ye Cao - Max Planck Digital Library
• Chris Beer - Stanford University
• Esme Cowles - University of California, San Diego
• Greg Jansen - University of North Carolina, Chapel Hill
• Michael Durbin - University of Virginia
• Scott Prater - University of Wisconsin
• Osman Din - Yale University
• Eric James - Yale University

*References*

[1] https://github.com/futures/fcrepo4/releases/tag/fcrepo-4.0.0-alpha-3
[2]
https://github.com/futures/fcrepo4/releases/download/fcrepo-4.0.0-alpha-3/fcrepo-webapp-4.0.0-alpha-3-jetty-console.war
[3]
https://github.com/futures/fcrepo4/releases/download/fcrepo-4.0.0-alpha-3/fcrepo-webapp-4.0.0-alpha-3.war
[4] https://wiki.duraspace.org/display/FF/Access+Roles+Module
[5]
https://wiki.duraspace.org/display/FF/Feature+Tour+-+Action+-+Access+Roles
[6] https://wiki.duraspace.org/display/FF/Basic+Role-based+PEP
[7]
https://wiki.duraspace.org/display/FF/Design+Guide+-+Policy+Enforcement+Points
[8] https://wiki.duraspace.org/display/FF/Authorization
[9] https://wiki.duraspace.org/display/FF/Batch+Operations
[10] https://wiki.duraspace.org/display/FF/REST+API+-+Batch+Operations
[11]
https://docs.jboss.org/author/display/MODE/Compact+Node+Type+(CND)+files
[12] https://wiki.duraspace.org/display/FF/REST+API+-+Node+Types
[13] https://github.com/futures/fcrepo-content-model-examples
[14] https://docs.jboss.org/author/display/MODE/Federation
[15]
https://docs.jboss.org/author/display/MODE/File+system+connector
[16] https://wiki.duraspace.org/display/FF/Federation
[17] http://wiki.apache.org/marmotta/LDPath
[18] https://wiki.duraspace.org/display/FF/External+Search
[19] https://wiki.duraspace.org/display/FF/Admin+Search
[20]
https://github.com/futures/fcrepo4/releases/download/fcrepo-4.0.0-alpha-3/fcrepo-webapp-4.0.0-alpha-3-jetty-console.war
[21] https://wiki.duraspace.org/display/FF/Feature+Tour
[22]
https://wiki.duraspace.org/display/FF/Deploying+Fedora+4
[23] https://code.google.com/p/leveldb/
[24] https://wiki.duraspace.org/display/FF/ModeShape+Artifacts+Layout
[25] https://wiki.duraspace.org/display/FF/Versioning
[26] https://wiki.duraspace.org/display/FF/REST+API+-+Versioning
[27]
https://wiki.duraspace.org/display/FF/How+to+set+repository-wide+auto-versioning
[28]
https://github.com/futures/fcrepo4/blob/fcrepo-4.0.0-alpha-3/fcrepo-kernel/src/main/resources/fedora-node-types.cnd
[29] https://wiki.duraspace.org/display/FF/Performance+Testing
[30] https://wiki.duraspace.org/display/FF/Single-Node+Test+Results
[31]
https://wiki.duraspace.org/display/FF/AuthZ+-+No+AuthZ+Fedora+4+Comparison+Performance+Testing
[32]
https://wiki.duraspace.org/display/FF/Single-Node+Test+Results#Single-NodeTestResults-Fedora3/4Comparison
[33] https://wiki.duraspace.org/display/FF/Large+File+Ingest+and+Retrieval
[34] https://wiki.duraspace.org/display/FF/Documentation
[35] https://wiki.duraspace.org/display/FF/Administrator+Guide
[36] https://wiki.duraspace.org/display/FF/Developers+Guide
[37] https://wiki.duraspace.org/display/FF/Feature+Tour
[38] https://wiki.duraspace.org/display/FF/Features
[39] https://wiki.duraspace.org/display/FF/Glossary
[40] https://wiki.duraspace.org/display/FF/Fedora+Sponsors

-- 
Carol Minton Morris
DuraSpace
Director of Marketing and Communications
cmmorris at DuraSpace.org
Skype: carolmintonmorris
607 592-3135
Twitter at DuraSpace <http://twitter.com/duraspace>
Twitter at DuraCloud <http://twitter.com/duracloud>
http://DuraSpace.org