Warning: These documents are under active
development and subject to change (version 2.1.0-beta).
The latest release documents are at:
https://purl.dataone.org/architecture
These notes were initiated by DV with responses by RW and MJ around 17 December, 2010.
Need a couple of methods internal to the CN to allow efficient updating of mutable content, which includes SystemMetadata and the Node Registry.
Updating system metadata is required so that CNs can keep track of content as it is replicated between nodes.
Update of registry is required on a fairly frequent basis for tracking node status and synchronization and other operations.
bool updateSystemMetadata(Identifier pid, newSysMetaDocument doc)
or perhaps a more granular approach:
String updateSystemMetadataProperty(Identifier pid, propertyPath path, String newValue)
pid = the identifier of the object for which sys meta is to be updated
path = simple xpath statement identifying element or attribute to update
newValue = the new value to set (assuming everything is a string)
returns previous value
We could generalize this a bit:
stringList updateMutableObject(Identifier pid, UpdateStatementList updates)
UpdateStatementList = list of
path
new value
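The generalized call above could be sketched as follows. This is a minimal illustration only, assuming the mutable object is held as a nested dictionary and that a path is a simple slash-separated list of keys rather than full XPath. The name `update_mutable_object`, the in-memory `store`, and the sample element names and values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory store: pid -> nested dict standing in for a
# system metadata document. The contents are illustrative only.
store: Dict[str, dict] = {
    "doc.1": {
        "rightsHolder": "uid=joe",
        "replicationPolicy": {"numberReplicas": "2"},
    },
}


def update_mutable_object(pid: str, updates: List[Tuple[str, str]]) -> List[str]:
    """Apply (path, new_value) pairs in order and return the previous values."""
    doc = store[pid]
    previous = []
    for path, new_value in updates:
        *parents, leaf = path.split("/")
        node = doc
        for key in parents:
            node = node[key]          # raises KeyError if the path does not exist
        previous.append(node[leaf])   # remember the old value before overwriting
        node[leaf] = new_value
    return previous


old_values = update_mutable_object(
    "doc.1",
    [("rightsHolder", "uid=jane"), ("replicationPolicy/numberReplicas", "3")],
)
```

Returning the list of previous values mirrors the single-property variant, which returns the previous value of the one property it changes.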
Robert’s Notes:
We decided not to use XPath as a means for editing Metadata.
The REST interface must provide a way for Member Nodes and clients to submit changes to the SystemMetadata.
For now we will just implement a separate update REST call on Metacat such that a Metacat-controlled document can be updated without its PID being changed. However, this will need to change in the future. Possible other solutions involve Java RMI or Java JMX. (Due to the inherently restrictive nature of Java RMI, maybe that will be the preferred method.)
We need to consider the System Metadata schema as a packaging XML document. The System Metadata type will have three groups: an Immutable group, a Provenance group and a Mutable group. The root element has several ‘general’ elements that are considered immutable because Member Nodes or clients may not alter them. These elements may be represented as the SystemMetadata header complex type should we wish to explicitly nest and typify them.
A new nesting element is being proposed to typify and group provenance information. The provenance complex type will contain two sub-elements, origin and derivation. The origin complex type element will contain two children, submitter and originMemberNode, that are immutable. The derivation complex type element contains mutable elements. Note that the derivation type can be used as a root element itself.
All of the mutable elements are types that can be themselves serialized as DataONE messages (or they are represented as root elements). They can be submitted by MemberNodes or clients.
The Immutable grouping:
- identifier element
- dateUploaded element
- dateSysMetadataModified element (may be modified by CN internal system processing)
- objectFormat element
- size element
- checksum element
- replica complex type (may be modified by CN internal system processing)
The Provenance grouping & complex type (contains both mutable and immutable types):
origin complex type (Immutable)
- submitter element
- originMemberNode element
derivation complex type (Mutable)
- rightsHolder element
- authoritativeMemberNode element
- obsoletes element
- obsoletedBy element
- derivedFrom element
- describes element
- describedBy element
The Mutable grouping (may be modified by CNs, MNs or Clients):
- replication policy complex type
- access control complex type
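The split of the provenance group into an immutable origin and a mutable derivation could be modelled roughly as below. This is a sketch only: the class names follow the element names in the lists above, but the use of Python dataclasses, the choice of list versus single-valued fields, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List, Optional


@dataclass(frozen=True)
class Origin:
    """Immutable half of the provenance group."""
    submitter: str
    originMemberNode: str


@dataclass
class Derivation:
    """Mutable half of the provenance group; can be submitted on its own."""
    rightsHolder: Optional[str] = None
    authoritativeMemberNode: Optional[str] = None
    obsoletes: List[str] = field(default_factory=list)
    obsoletedBy: List[str] = field(default_factory=list)
    derivedFrom: List[str] = field(default_factory=list)
    describes: List[str] = field(default_factory=list)
    describedBy: List[str] = field(default_factory=list)


@dataclass
class Provenance:
    origin: Origin
    derivation: Derivation = field(default_factory=Derivation)


prov = Provenance(origin=Origin(submitter="uid=joe", originMemberNode="mn-1"))
prov.derivation.describes.append("doc.2")   # allowed: derivation is mutable

origin_rejected = False
try:
    prov.origin.submitter = "uid=jane"      # rejected: origin is frozen
except FrozenInstanceError:
    origin_rejected = True
```

Keeping Derivation as a freestanding type matches the note that it can be used as a root element itself, so a client can submit only the part it is allowed to change.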
REST API:
PUT /meta/pid (token, pid, derivation | replicationPolicy | accessControl) -> Types.Identifier
CRUD API (methods overloaded or named separately? Assuming named separately):
internal only:
CN_crud.updateSystemMetadata(token, pid, systemMetadata) -> boolean
externally available through REST API:
CN_crud.updateSystemMetadataProvenance(token, pid, derivation) -> boolean
CN_crud.updateSystemMetadataReplication(token, pid, replicationPolicy) -> boolean
CN_crud.updateSystemMetadataAccessControl(token, pid, accessControl) -> boolean
I think we need a slightly different set of groups of system metadata elements, which I outline below, and which then lead to a different set of services. Each group should be defined in its own freestanding ComplexType. I don’t think we should have a general service for modifying system metadata that is externally accessible, but rather should have a separate method for each system metadata group so that access to the services can be controlled independently.
The ObjectStatus group (mutable by CNs only):
- dateSysMetadataModified element (may be modified by cn internal system processing)
- replica complex type (may be modified by CN internal system processing)
The Policy group (mutable with changePermission [maybe ‘changePolicy’] permission by CNs, MNs, Clients):
AccessPolicy complex type
- rightsHolder element (NB: this needs to be added to AP schema)
- authoritativeMemberNode element (NB: this needs to be added to AP schema)
ReplicationPolicy complex type
The Provenance group (mutable by Clients/MNs):
- obsoletes element (requires ‘write’ permission, set at time of update() call)
- obsoletedBy element (requires ‘write’ permission, set at time of update() call)
- derivedFrom element (requires ‘write’ permission, set at time of update() or create() call)*
- describes element (requires ‘write’ permission, set at time of update() or create() call)*
- describedBy element (requires ‘write’ permission, set at time of update() or create() call)*
* There is a case to be made that these 3 elements should be settable by any client, but the problem comes in determining who can change values that someone else wrote (i.e., Joe says X describes Y, can Jane then say ‘Y describes X’ or other contrary statements, and who can delete Joe’s assertion?)
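The per-group rules above could be captured in a small lookup table so that each service checks its own group independently. The following is a rough sketch under stated assumptions: the table `GROUP_RULES`, the function `may_update`, and the caller labels "CN", "MN" and "Client" are hypothetical names, and the open question of who may override another party's assertion is not addressed.

```python
from typing import Dict, Optional, Set

# Who may modify each group, and which permission (if any) is required.
# Taken from the group descriptions above; None means no object-level
# permission is involved because only CNs may write the group.
GROUP_RULES: Dict[str, dict] = {
    "ObjectStatus": {"callers": {"CN"}, "permission": None},
    "Policy": {"callers": {"CN", "MN", "Client"}, "permission": "changePermission"},
    "Provenance": {"callers": {"MN", "Client"}, "permission": "write"},
}


def may_update(group: str, caller_kind: str, granted: Set[str]) -> bool:
    """Return True if this kind of caller, holding `granted` permissions, may update `group`."""
    rule = GROUP_RULES[group]
    if caller_kind not in rule["callers"]:
        return False
    required: Optional[str] = rule["permission"]
    return required is None or required in granted


cn_status = may_update("ObjectStatus", "CN", set())
mn_status = may_update("ObjectStatus", "MN", {"write"})
client_policy = may_update("Policy", "Client", {"changePermission"})
client_provenance = may_update("Provenance", "Client", set())
```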
The overall SystemMetadata object would contain elements for all of these groups, so that getSystemMetadata() returns all of this information. Updating system metadata would be done with the following calls:
CN_crud.updateSystemMetadata(token, pid, systemMetadata) -> Types.Identifier
REST API: PUT /meta/pid (token, pid, SystemMetadata) -> Types.Identifier
CN_crud.setAccessPolicy(token, pid, AccessPolicy) -> boolean
REST API: PUT /access/pid (body containing token, AccessPolicy) -> boolean
CN_crud.setReplicationPolicy(token, pid, ReplicationPolicy) -> boolean
REST API: PUT /replication/pid (body containing token, ReplicationPolicy) -> boolean
CN_crud.setProvenance(token, pid, Provenance) -> boolean
REST API: PUT /provenance/pid (body containing token, Provenance) -> boolean
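The mapping from the PUT resources above to the CN_crud methods could be expressed as a simple routing table. This is an illustrative sketch, not the actual service wiring: `ROUTES` and `resolve_put` are hypothetical names, and the sketch assumes the PID is whatever follows the first path segment (in practice it would likely need URL decoding).

```python
from typing import Dict, Tuple

# PUT resource name -> CN_crud method name, as listed in the calls above.
ROUTES: Dict[str, str] = {
    "meta": "updateSystemMetadata",
    "access": "setAccessPolicy",
    "replication": "setReplicationPolicy",
    "provenance": "setProvenance",
}


def resolve_put(path: str) -> Tuple[str, str]:
    """Split a PUT path into (CN_crud method name, pid)."""
    resource, _, pid = path.lstrip("/").partition("/")
    if resource not in ROUTES or not pid:
        raise ValueError("no PUT handler for path: " + path)
    return ROUTES[resource], pid


method, pid = resolve_put("/access/doc.1")
```

Because each group has its own resource, access to each service can be restricted separately, which is the motivation given for not exposing one general update call.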
Robert’s response: I don’t think we have a boolean return type. Everything we return needs to be serializable.
I was attempting to keep
- submitter element
- originMemberNode element
as part of the provenance group. It is information more appropriate to the provenance of an object than the identification of the object. Hence, I broke provenance into two parts, one mutable, the other not.
Otherwise, it reads well.