Warning: These documents are under active
development and subject to change (version 2.1.0-beta).
The latest release documents are at:
https://purl.dataone.org/architecture
The service interfaces described here are exposed through the Member Node REST interface to support interactions with Coordinating Nodes and DataONE clients.
The following table provides a list of API methods exposed by Member Nodes.
Tier: | The tier in which a method is grouped. |
---|---|
Version: | The API version(s) in which the method is available. The lowest version number indicates when the method was added. A version number in parentheses indicates the method is available in that version and is unchanged from the previous version. If more than one version number is present, the method signature or functionality has changed between API versions, e.g. “1.0, 2.0” indicates that the method was first introduced in Version 1.0 and was modified in Version 2.0. |
REST: | The HTTP method and path relative to the Base URL. Parameters specified in the URL are indicated by braces. Note that parameters included in a path MUST be properly path encoded, and parameters included as key-value pairs MUST also be properly encoded. |
Function: | The function name, associated with an API grouping. |
Parameters: | Indicates the parameters used when calling the method (sent in the message payload) and the return type. |
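The encoding requirement in the REST entry above can be sketched with the Python standard library. This is a minimal illustration, not part of the DataONE specification: the helper names `object_url` and `log_url` are hypothetical, and the `/v1` prefix is taken from the curl examples later in this document.

```python
from urllib.parse import quote, urlencode


def object_url(base_url: str, pid: str) -> str:
    """Build a GET /object/{id} URL with the identifier path-encoded."""
    # safe="" makes quote() percent-encode "/" and ":" inside the identifier,
    # so an identifier containing slashes stays a single path segment.
    return f"{base_url.rstrip('/')}/v1/object/{quote(pid, safe='')}"


def log_url(base_url: str, **params) -> str:
    """Build a GET /log URL, encoding optional key-value parameters."""
    # Parameters left as None are omitted, mirroring the optional [..] parts
    # of the REST pattern.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/v1/log"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    node = "https://demo2.test.dataone.org/knb/d1/mn"
    print(object_url(node, "hdl:10255/dryad.218/mets.xml"))
    print(log_url(node, start=0, count=3))
```

Encoding the identifier as one opaque path segment matters for identifiers such as `hdl:10255/dryad.218/mets.xml`, which would otherwise be read as several path components.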
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 1 | 1.0 | GET /monitor/ping | MNCore.ping() | () -> null |
Tier 1 | 1.0, 2.0 | GET /log?[fromDate={fromDate}][&toDate={toDate}][&event={event}][&idFilter={idFilter}][&start={start}][&count={count}] | MNCore.getLogRecords() | (session, [fromDate], [toDate], [event], [idFilter], [start=0], [count=1000]) -> Types.Log |
Tier 1 | 1.0 | GET / and GET /node | MNCore.getCapabilities() | () -> Types.Node |
Tier 1 | 1.0 | GET /object/{id} | MNRead.get() | (session, id) -> Types.OctetStream |
Tier 1 | 1.0 | GET /meta/{id} | MNRead.getSystemMetadata() | (session, id) -> Types.SystemMetadata |
Tier 1 | 1.0 | HEAD /object/{id} | MNRead.describe() | (session, id) -> Types.DescribeResponse |
Tier 1 | 1.0 | GET /checksum/{pid}[?checksumAlgorithm={checksumAlgorithm}] | MNRead.getChecksum() | (session, pid, [checksumAlgorithm]) -> Types.Checksum |
Tier 1 | 1.0 | GET /object[?fromDate={fromDate}&toDate={toDate}&identifier={identifier}&formatId={formatId}&replicaStatus={replicaStatus}&start={start}&count={count}] | MNRead.listObjects() | (session, [fromDate], [toDate], [formatId], [identifier], [replicaStatus], [start=0], [count=1000]) -> Types.ObjectList |
Tier 1 | | POST /error | MNRead.synchronizationFailed() | (session, message) -> Types.Boolean |
Tier 1 | 1.0 | POST /dirtySystemMetadata | MNRead.systemMetadataChanged() | (session, id, serialVersion, dateSysMetaLastModified) -> boolean |
Tier 1 | 1.0 | GET /replica/{pid} | MNRead.getReplica() | (session, pid) -> Types.OctetStream |
Tier 2 | 1.0 | GET /isAuthorized/{id}?action={action} | MNAuthorization.isAuthorized() | (session, id, action) -> boolean |
Tier 3 | 1.0 | POST /object | MNStorage.create() | (session, pid, object, sysmeta) -> Types.Identifier |
Tier 3 | 1.0 | PUT /object/{pid} | MNStorage.update() | (session, pid, object, newPid, sysmeta) -> Types.Identifier |
Tier 3 | 1.0 | POST /generate | MNStorage.generateIdentifier() | (session, scheme, [fragment]) -> Types.Identifier |
Tier 3 | 1.0 | DELETE /object/{id} | MNStorage.delete() | (session, id) -> Types.Identifier |
Tier 3 | 1.0 | PUT /archive/{id} | MNStorage.archive() | (session, id) -> Types.Identifier |
Tier 1 | 2.0 | PUT /meta | MNStorage.updateSystemMetadata() | (session, pid, sysmeta) -> boolean |
Tier 4 | 1.0 | POST /replicate | MNReplication.replicate() | (session, sysmeta, sourceNode) -> boolean |
Tier 1 | 1.1 | GET /query/{queryEngine}/{query} | MNQuery.query() | (session, queryEngine, query) -> Types.OctetStream |
Tier 1 | 1.1 | GET /query/{queryType} | MNQuery.getQueryEngineDescription() | (session, queryEngine) -> Types.QueryEngineDescription |
Tier 1 | 1.1 | GET /query | MNQuery.listQueryEngines() | (session) -> Types.QueryEngineList |
Tier 1 | 1.2 | GET /views/{theme}/{pid} | MNView.view() | (session, theme, id) -> Types.OctetStream |
Tier 1 | 1.2 | GET /views | MNView.listViews() | (session) -> Types.OptionList |
Tier 1 | 1.2 | GET /packages/{packageType}/{pid} | MNPackage.getPackage() | (session, packageType, id) -> Types.OctetStream |
The MN_core API provides mechanisms for a Member Node to report on the level of service compliance and to specify replication policies. The capabilities information is used in the Member Node registration process by the Coordinating Nodes.
The state of health API provides mechanisms for the monitoring infrastructure to report on the current state of the DataONE infrastructure and for the Coordinating Nodes to track the current operating state of the Member Node.
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 1 | 1.0 | GET /monitor/ping | ping() | () -> null |
Tier 1 | 1.0, 2.0 | GET /log?[fromDate={fromDate}][&toDate={toDate}][&event={event}][&idFilter={idFilter}][&start={start}][&count={count}] | getLogRecords() | (session, [fromDate], [toDate], [event], [idFilter], [start=0], [count=1000]) -> Types.Log |
Tier 1 | 1.0 | GET / and GET /node | getCapabilities() | () -> Types.Node |
MNCore.ping() → null
Low level “are you alive” operation. A valid ping response is indicated by a HTTP status of 200. A timestamp indicating the current system time (UTC) on the node MUST be returned in the HTTP Date header.
The Member Node should perform some minimal internal functionality testing before answering. However, ping checks will be frequent (every few minutes) so the internal functionality test should not be high impact.
Any status response other than 200 indicates that the node is offline for DataONE operations.
Note that the timestamp returned in the Date header should follow the semantics as described in the HTTP specifications, http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.18
The response body will be ignored by the caller except in the case of an error, in which case the response body should contain the appropriate DataONE exception.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Returns: | Null body or Exception. The body of the message may be ignored by the caller. The HTTP header Date MUST be set in the response. |
Return type: | null |
Raises: |
|
Response
The response should be a valid HTTP response with a blank or arbitrary body. Only the HTTP header information is considered by the requestor. A successful response MUST have a HTTP status code of 200. In case of an error condition, the appropriate HTTP status code MUST be set, and an exception or error information MAY be returned in the response body.
Example
Example of a ping request and response for a Member Node (Coordinating Nodes implement the same functionality). Lines prefixed with “>” indicate outgoing information; lines prefixed with “<” show content returned from the server. Lines associated with SSL connection initiation and close are not shown here. Note that the actual response headers may vary: the only required header fields are the first status line and a Date entry. However, in order to fully support clients that may cache the response, it is recommended that the Expires and Cache-Control headers are returned.
export NODE="https://demo2.test.dataone.org/knb/d1/mn"
curl -k -v "$NODE/v1/monitor/ping"
> GET /knb/d1/mn/v1/monitor/ping HTTP/1.1
> User-Agent: curl/7.21.6 (x86_64-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
> Host: demo2.test.dataone.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Tue, 06 Mar 2012 14:19:59 GMT
< Server: Apache/2.2.14 (Ubuntu)
< Content-Length: 0
< Content-Type: text/plain
<
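A client-side check of the ping rules above (status 200 and a Date header that parses as an HTTP date) might look like the following sketch. The function names are hypothetical, and the headers are assumed to arrive as a plain dictionary keyed by `Date`; a real HTTP library may expose them differently.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def ping_ok(status: int, headers: dict) -> bool:
    """Return True only for HTTP 200 with a parseable Date header."""
    if status != 200:
        # Any status other than 200 means the node is offline for DataONE.
        return False
    date_value = headers.get("Date")
    if date_value is None:
        return False
    try:
        parsedate_to_datetime(date_value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return False
    return True


def clock_skew_seconds(headers: dict, now: datetime) -> float:
    """Seconds by which the local clock is ahead of the node's Date header."""
    node_time = parsedate_to_datetime(headers["Date"])
    return (now - node_time).total_seconds()


if __name__ == "__main__":
    response_headers = {"Date": "Tue, 06 Mar 2012 14:19:59 GMT"}
    print(ping_ok(200, response_headers))
    print(clock_skew_seconds(response_headers, datetime.now(timezone.utc)))
```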
MNCore.getLogRecords(session[, fromDate][, toDate][, event][, idFilter][, start=0][, count=1000]) → Log
Retrieve log information from the Member Node for the specified slice parameters. Log entries will only return PIDs.
This method is used primarily by the log aggregator to generate aggregate statistics for nodes, objects, and the methods of access.
The response MUST contain only records for which the requestor has permission to read.
Note that date time precision is limited to one millisecond. If no timezone information is provided UTC will be assumed.
Access control for this method MUST be configured to allow calling by Coordinating Nodes and MAY be configured to allow more general access.
v2.0: The event parameter has changed from v1_0.Types.Event to a plain string.
v2.0: The structure of v2_0.Types.Log has changed.
Version: | 1.0, 2.0 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | |
Return type: | |
Raises: |
|
Example
Example of retrieving 3 log records from a Member Node. The xml command is provided by xmlstarlet and is used to format the output.
export NODE="https://demo2.test.dataone.org/knb/d1/mn"
curl -k -s "$NODE/v1/log?start=0&count=3" | xml fo
<?xml version="1.0" encoding="UTF-8"?>
<d1:log xmlns:d1="http://ns.dataone.org/service/types/v1" count="3" start="0" total="1273">
  <logEntry>
    <entryId>1</entryId>
    <identifier>MNodeTierTests.201260152556757.</identifier>
    <ipAddress>129.24.0.17</ipAddress>
    <userAgent>null</userAgent>
    <subject>CN=testSubmitter,DC=dataone,DC=org</subject>
    <event>create</event>
    <dateLogged>2012-02-29T23:25:58.104+00:00</dateLogged>
    <nodeIdentifier>urn:node:DEMO2</nodeIdentifier>
  </logEntry>
  <logEntry>
    <entryId>2</entryId>
    <identifier>TierTesting:testObject:RightsHolder_Person.4</identifier>
    <ipAddress>129.24.0.17</ipAddress>
    <userAgent>null</userAgent>
    <subject>CN=testSubmitter,DC=dataone,DC=org</subject>
    <event>create</event>
    <dateLogged>2012-02-29T23:26:38.828+00:00</dateLogged>
    <nodeIdentifier>urn:node:DEMO2</nodeIdentifier>
  </logEntry>
  <logEntry>
    <entryId>3</entryId>
    <identifier>TierTesting:testObject:RightsHolder_Group.4</identifier>
    <ipAddress>129.24.0.17</ipAddress>
    <userAgent>null</userAgent>
    <subject>CN=testSubmitter,DC=dataone,DC=org</subject>
    <event>create</event>
    <dateLogged>2012-02-29T23:27:40.255+00:00</dateLogged>
    <nodeIdentifier>urn:node:DEMO2</nodeIdentifier>
  </logEntry>
</d1:log>
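Since the log aggregator uses these records to build statistics, a consumer might tally events per type from a v1 log document as sketched below. The function name `summarize_log` is hypothetical, and the sample is an abbreviated copy of the response above (the v2.0 `Log` structure differs and is not covered here).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abbreviated v1 log response, trimmed from the example in this document.
LOG_XML = """<d1:log xmlns:d1="http://ns.dataone.org/service/types/v1" count="2" start="0" total="1273">
  <logEntry>
    <entryId>1</entryId>
    <identifier>MNodeTierTests.201260152556757.</identifier>
    <event>create</event>
  </logEntry>
  <logEntry>
    <entryId>2</entryId>
    <identifier>TierTesting:testObject:RightsHolder_Person.4</identifier>
    <event>create</event>
  </logEntry>
</d1:log>"""


def summarize_log(log_xml: str):
    """Return (total records on the node, Counter of events in this slice)."""
    root = ET.fromstring(log_xml)
    # Only the root element is namespace-qualified in the example response;
    # the logEntry children are unqualified, so plain tag names match.
    events = Counter(entry.findtext("event") for entry in root.findall("logEntry"))
    return int(root.get("total")), events


if __name__ == "__main__":
    total, events = summarize_log(LOG_XML)
    print(total, dict(events))
```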
MNCore.getCapabilities() → Node
Returns a document describing the capabilities of the Member Node.
The response at the Member Node base URL is for convenience only. Clients of Member Nodes SHOULD use the /node URL to retrieve the node capabilities document.
Version: | 1.0 |
---|---|
REST URL: |
|
Returns: | The technical capabilities of the Member Node |
Return type: | |
Raises: |
|
Example
export NODE="https://demo2.test.dataone.org/knb/d1/mn"
curl -k -s "$NODE/v1/node" | xml fo
<?xml version="1.0" encoding="UTF-8"?>
<d1:node xmlns:d1="http://ns.dataone.org/service/types/v1" replicate="true" synchronize="true" type="mn" state="up">
  <identifier>urn:node:DEMO2</identifier>
  <name>DEMO2 Metacat Node</name>
  <description>A DataONE member node implemented in Metacat.</description>
  <baseURL>https://demo2.test.dataone.org:443/knb/d1/mn</baseURL>
  <services>
    <service name="MNRead" version="v1" available="true"/>
    <service name="MNCore" version="v1" available="true"/>
    <service name="MNAuthorization" version="v1" available="true"/>
    <service name="MNStorage" version="v1" available="true"/>
    <service name="MNReplication" version="v1" available="true"/>
  </services>
  <synchronization>
    <schedule hour="*" mday="*" min="0/3" mon="*" sec="10" wday="?" year="*"/>
    <lastHarvested>2012-03-06T14:57:39.851+00:00</lastHarvested>
    <lastCompleteHarvest>2012-03-06T14:57:39.851+00:00</lastCompleteHarvest>
  </synchronization>
  <ping success="true"/>
  <subject>CN=urn:node:DEMO2, DC=dataone, DC=org</subject>
  <contactSubject>CN=METACAT1, DC=dataone, DC=org</contactSubject>
</d1:node>
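A client could use the capabilities document to decide which service groups a node offers before calling them. The sketch below assumes the v1 document layout shown in the example; `available_services` is a hypothetical helper, and the sample marks MNStorage as unavailable purely for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed node document; available="false" on MNStorage is a made-up variation
# of the example above, used only to show the filter working.
NODE_XML = """<d1:node xmlns:d1="http://ns.dataone.org/service/types/v1" replicate="true" synchronize="true" type="mn" state="up">
  <identifier>urn:node:DEMO2</identifier>
  <services>
    <service name="MNRead" version="v1" available="true"/>
    <service name="MNCore" version="v1" available="true"/>
    <service name="MNStorage" version="v1" available="false"/>
  </services>
</d1:node>"""


def available_services(node_xml: str) -> set:
    """Return the (name, version) pairs the node reports as available."""
    root = ET.fromstring(node_xml)
    return {
        (service.get("name"), service.get("version"))
        for service in root.iter("service")
        if service.get("available") == "true"
    }


if __name__ == "__main__":
    print(sorted(available_services(NODE_XML)))
```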
The MNRead API implements methods that enable object management operations on a Member Node.
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 1 | 1.0 | GET /object/{id} | get() | (session, id) -> Types.OctetStream |
Tier 1 | 1.0 | GET /meta/{id} | getSystemMetadata() | (session, id) -> Types.SystemMetadata |
Tier 1 | 1.0 | HEAD /object/{id} | describe() | (session, id) -> Types.DescribeResponse |
Tier 1 | 1.0 | GET /checksum/{pid}[?checksumAlgorithm={checksumAlgorithm}] | getChecksum() | (session, pid, [checksumAlgorithm]) -> Types.Checksum |
Tier 1 | 1.0 | GET /object[?fromDate={fromDate}&toDate={toDate}&identifier={identifier}&formatId={formatId}&replicaStatus={replicaStatus}&start={start}&count={count}] | listObjects() | (session, [fromDate], [toDate], [formatId], [identifier], [replicaStatus], [start=0], [count=1000]) -> Types.ObjectList |
Tier 1 | | POST /error | synchronizationFailed() | (session, message) -> Types.Boolean |
Tier 1 | 1.0 | POST /dirtySystemMetadata | systemMetadataChanged() | (session, id, serialVersion, dateSysMetaLastModified) -> boolean |
Tier 1 | 1.0 | GET /replica/{pid} | getReplica() | (session, pid) -> Types.OctetStream |
MNRead.get(session, id) → OctetStream
Retrieve an object identified by id from the node. Supports both PIDs and SIDs. When a SID is given, the object for the head PID is returned.
The response MUST contain the bytes of the indicated object, and the checksum of the bytes retrieved SHOULD match the SystemMetadata.checksum recorded in the Types.SystemMetadata when calling with a PID.
If the object does not exist on the node servicing the request, then Exceptions.NotFound must be raised even if the object exists on another node in the DataONE system.
Also implemented by Coordinating Nodes as CNRead.get().
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | Bytes of the specified object. |
Return type: | |
Raises: |
|
Examples
(GET) Retrieve the object with identifier “XYZ332”:
export NODE="https://demo2.test.dataone.org/knb/d1/mn"
curl -k "$NODE/v1/object/XYZ332"
... data ...
(GET) Attempt to retrieve a non-existent object (and show headers in response):
export NODE="https://demo2.test.dataone.org/knb/d1/mn"
curl -D - "$NODE/v1/object/DOESNTEXIST"
HTTP/1.1 404 Not Found
Date: Tue, 06 Mar 2012 15:25:35 GMT
Server: Apache/2.2.14 (Ubuntu)
Content-Length: 196
Vary: Accept-Encoding
Content-Type: text/xml

<?xml version="1.0" encoding="UTF-8"?>
<error detailCode="1800" errorCode="404" name="NotFound">
  <description>No system metadata could be found for given PID: DOESNTEXIST</description>
</error>
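The expectation that retrieved bytes match the checksum recorded in system metadata can be verified on the client side. The following is a sketch under stated assumptions: `matches_checksum` is a hypothetical helper, and the mapping covers only the two algorithm names that appear in this document's examples (MD5 and SHA-1).

```python
import hashlib

# Assumed mapping from the algorithm labels seen in this document's examples
# to hashlib names; other algorithms would need to be added explicitly.
_HASHLIB_NAMES = {"MD5": "md5", "SHA-1": "sha1"}


def matches_checksum(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Compare a digest of the retrieved bytes with the recorded checksum."""
    digest = hashlib.new(_HASHLIB_NAMES[algorithm], data).hexdigest()
    # Hex digests are compared case-insensitively.
    return digest == expected_hex.strip().lower()


if __name__ == "__main__":
    body = b"example object bytes"
    recorded = hashlib.md5(body).hexdigest()
    print(matches_checksum(body, "MD5", recorded))
```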
MNRead.getSystemMetadata(session, id) → SystemMetadata
Describes the object identified by id by returning the associated system metadata object.
If the object does not exist on the node servicing the request, then Exceptions.NotFound MUST be raised even if the object exists on another node in the DataONE system.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | System metadata object describing the object. |
Return type: | |
Raises: |
|
Examples
(GET) Retrieve system metadata from a Member Node for object “XYZ332” which happens to be science metadata (an EML document) that has been obsoleted by a new version with identifier “XYZ333”:
curl http://m1.dataone.org/mn/v1/meta/XYZ332
<?xml version="1.0" encoding="UTF-8"?>
<d1:systemMetadata xmlns:d1="http://ns.dataone.org/service/types/v1">
  <serialVersion>1</serialVersion>
  <identifier>XYZ332</identifier>
  <formatId>eml://ecoinformatics.org/eml-2.1.0</formatId>
  <size>20875</size>
  <checksum algorithm="MD5">e7451c1775461b13987d7539319ee41f</checksum>
  <submitter>uid=mbauer,o=NCEAS,dc=ecoinformatics,dc=org</submitter>
  <rightsHolder>uid=mbauer,o=NCEAS,dc=ecoinformatics,dc=org</rightsHolder>
  <accessPolicy>
    <allow>
      <subject>uid=jdoe,o=NCEAS,dc=ecoinformatics,dc=org</subject>
      <permission>read</permission>
      <permission>write</permission>
      <permission>changePermission</permission>
    </allow>
    <allow>
      <subject>public</subject>
      <permission>read</permission>
    </allow>
    <allow>
      <subject>uid=nceasadmin,o=NCEAS,dc=ecoinformatics,dc=org</subject>
      <permission>read</permission>
      <permission>write</permission>
      <permission>changePermission</permission>
    </allow>
  </accessPolicy>
  <replicationPolicy replicationAllowed="false"/>
  <obsoletes>XYZ331</obsoletes>
  <obsoletedBy>XYZ333</obsoletedBy>
  <archived>true</archived>
  <dateUploaded>2008-04-01T23:00:00.000+00:00</dateUploaded>
  <dateSysMetadataModified>2012-06-26T03:51:25.058+00:00</dateSysMetadataModified>
  <originMemberNode>urn:node:TEST</originMemberNode>
  <authoritativeMemberNode>urn:node:TEST</authoritativeMemberNode>
</d1:systemMetadata>
(GET) Attempt to retrieve system metadata for an object that does not exist:
curl http://cn.dataone.org/cn/v1/meta/SomeObjectID
<?xml version="1.0" encoding="UTF-8"?>
<error detailCode="1800" errorCode="404" name="NotFound">
  <description>No system metadata could be found for given PID: SomeObjectID</description>
</error>
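Error bodies like the one above share a small, regular shape, so a client can turn them into a structured value. This is a minimal sketch: the `DataONEError` class and `parse_error` function are hypothetical names, and only the attributes visible in the example (`name`, `errorCode`, `detailCode`) and the `description` element are read.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class DataONEError:
    name: str
    error_code: int
    detail_code: str
    description: str


def parse_error(body: str) -> DataONEError:
    """Parse a serialized exception as shown in the examples above."""
    root = ET.fromstring(body)
    return DataONEError(
        name=root.get("name"),
        error_code=int(root.get("errorCode")),
        detail_code=root.get("detailCode"),
        description=(root.findtext("description") or "").strip(),
    )


if __name__ == "__main__":
    sample = (
        '<error detailCode="1800" errorCode="404" name="NotFound">'
        "<description>No system metadata could be found for given PID: SomeObjectID</description>"
        "</error>"
    )
    print(parse_error(sample))
```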
MNRead.describe(session, id) → DescribeResponse
This method provides a lighter weight mechanism than MNRead.getSystemMetadata() for a client to determine basic properties of the referenced object. The response should indicate properties that are typically returned in a HTTP HEAD request: the date last modified, the size of the object, and the type of the object (the SystemMetadata.formatId).
The principal indicated by session must have read privileges on the object, otherwise Exceptions.NotAuthorized is raised.
If the object does not exist on the node servicing the request, then Exceptions.NotFound must be raised even if the object exists on another node in the DataONE system.
Note that this method is likely to be called frequently and so efficiency should be taken into consideration during implementation.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | A set of values providing a basic description of the object. |
Return type: | |
Raises: |
|
Examples
(HEAD) Retrieve information about the object with identifier “ABC123”:
curl -I http://mn1.dataone.org/mn/v1/object/ABC123
HTTP/1.1 200 OK
Last-Modified: Wed, 16 Dec 2009 13:58:34 GMT
Content-Length: 10400
Content-Type: application/octet-stream
DataONE-ObjectFormat: eml://ecoinformatics.org/eml-2.0.1
DataONE-Checksum: SHA-1,2e01e17467891f7c933dbaa00e1459d23db3fe4f
DataONE-SerialVersion: 1234
(HEAD) An error response to a describe() request for object “IDONTEXIST”:
curl -I http://mn1.dataone.org/mn/v1/object/IDONTEXIST
HTTP/1.1 404 Not Found
Last-Modified: Wed, 16 Dec 2009 13:58:34 GMT
Content-Length: 1182
Content-Type: text/xml
DataONE-Exception-Name: NotFound
DataONE-Exception-DetailCode: 1380
DataONE-Exception-Description: The specified object does not exist on this node.
DataONE-Exception-PID: IDONTEXIST
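The header values in the successful describe() example can be collected into a small record. The sketch below assumes the header names and the `algorithm,value` checksum layout exactly as shown in that example; `parse_describe` is a hypothetical helper, and headers are assumed to be available as a plain dictionary.

```python
def parse_describe(headers: dict) -> dict:
    """Extract the basic object properties from describe() response headers."""
    # DataONE-Checksum is shown as "<algorithm>,<hex digest>" in the example.
    algorithm, _, value = headers["DataONE-Checksum"].partition(",")
    return {
        "format_id": headers["DataONE-ObjectFormat"],
        "size": int(headers["Content-Length"]),
        "last_modified": headers["Last-Modified"],
        "checksum_algorithm": algorithm.strip(),
        "checksum": value.strip(),
        "serial_version": int(headers["DataONE-SerialVersion"]),
    }


if __name__ == "__main__":
    example = {
        "Last-Modified": "Wed, 16 Dec 2009 13:58:34 GMT",
        "Content-Length": "10400",
        "Content-Type": "application/octet-stream",
        "DataONE-ObjectFormat": "eml://ecoinformatics.org/eml-2.0.1",
        "DataONE-Checksum": "SHA-1,2e01e17467891f7c933dbaa00e1459d23db3fe4f",
        "DataONE-SerialVersion": "1234",
    }
    print(parse_describe(example))
```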
MNRead.getChecksum(session, pid[, checksumAlgorithm]) → Checksum
Returns a Types.Checksum for the specified object using an accepted hashing algorithm. The result is used to determine if two instances referenced by a PID are identical, hence it is necessary that MNs can ensure that the returned checksum is valid for the referenced object either by computing it on the fly or by using a cached value that is certain to be correct.
Version: | 1.0 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | The checksum value originally computed for the specified object. |
Return type: | |
Raises: |
|
MNRead.listObjects(session[, fromDate][, toDate][, formatId][, identifier][, replicaStatus][, start=0][, count=1000]) → ObjectList
Retrieve the list of objects present on the MN that match the calling parameters. This method is required to support the process of Member Node synchronization. At a minimum, this method MUST be able to return a list of objects that match:
fromDate < SystemMetadata.dateSysMetadataModified
but is expected to also support date range (by also specifying toDate), and should also support slicing of the matching set of records by indicating the starting index of the response (where 0 is the index of the first item) and the count of elements to be returned.
Note that date time precision is limited to one millisecond. If no timezone information is provided, UTC will be assumed.
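The millisecond-precision and UTC-by-default rules above suggest normalizing timestamps before sending them as fromDate or toDate values. A sketch follows, with the hypothetical helper `to_d1_timestamp`; the output layout is chosen to match the timestamps shown in this document's examples (for instance `2012-02-29T23:25:58.104+00:00`).

```python
from datetime import datetime, timedelta, timezone


def to_d1_timestamp(value: datetime) -> str:
    """Format a datetime in UTC with millisecond precision."""
    if value.tzinfo is None:
        # No timezone information: treat the value as UTC, as the text states.
        value = value.replace(tzinfo=timezone.utc)
    # timespec="milliseconds" truncates (does not round) finer precision.
    return value.astimezone(timezone.utc).isoformat(timespec="milliseconds")


if __name__ == "__main__":
    print(to_d1_timestamp(datetime(2012, 2, 29, 23, 25, 58, 104999)))
    local = datetime(2012, 3, 1, 1, 25, 58, tzinfo=timezone(timedelta(hours=2)))
    print(to_d1_timestamp(local))
```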
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | The list of PIDs that match the query criteria. If none match, an empty list is returned. |
Return type: | |
Raises: |
|
Example
Retrieve an object list from a member node, and pipe the response through an xml formatter for easier viewing:
curl "https://gmn-dev.test.dataone.org/mn/v1/object?count=5" | xml fo
<?xml version="1.0"?>
<ns1:objectList xmlns:ns1="http://ns.dataone.org/service/types/v1" count="5" start="0" total="12">
<objectInfo>
<identifier>AnserMatrix.htm</identifier>
<formatId>eml://ecoinformatics.org/eml-2.0.0</formatId>
<checksum algorithm="MD5">0e25cf59d7bd4d57154cc83e0aa32b34</checksum>
<dateSysMetadataModified>1970-05-27T06:12:49</dateSysMetadataModified>
<size>11048</size>
</objectInfo>
...
<objectInfo>
<identifier>hdl:10255/dryad.218/mets.xml</identifier>
<formatId>eml://ecoinformatics.org/eml-2.0.0</formatId>
<checksum algorithm="MD5">65c4e0a9c4ccf37c1e3ecaaa2541e9d5</checksum>
<dateSysMetadataModified>1987-01-14T07:09:09</dateSysMetadataModified>
<size>2796</size>
</objectInfo>
</ns1:objectList>
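Slicing with start and count lets a client walk the full list in pages, using the `total` attribute of the first response to know when to stop. The generator below is a sketch of that arithmetic only; `page_slices` is a hypothetical name and no requests are issued.

```python
def page_slices(total: int, page_size: int = 1000, start: int = 0):
    """Yield (start, count) pairs that together cover `total` records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    while start < total:
        # The final page may be shorter than page_size.
        yield start, min(page_size, total - start)
        start += page_size


if __name__ == "__main__":
    # total="12" and count=5 are taken from the example response above.
    for first, count in page_slices(12, page_size=5):
        print(f"/v1/object?start={first}&count={count}")
```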
MNRead.synchronizationFailed(session, message) → Boolean
This is a callback method used by a CN to indicate to a MN that it cannot complete synchronization of the science metadata identified by pid. When called, the MN should take steps to record the problem description and notify an administrator or the data owner of the issue.
A successful response is indicated by a HTTP status of 200. An unsuccessful call is indicated by a returned exception and associated HTTP status code.
Access control for this method MUST be configured to allow calling by Coordinating Nodes and MAY be configured to allow more general access.
Version: | |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | A successful response is indicated by a HTTP 200 status. An unsuccessful call is indicated by returning the appropriate exception. |
Return type: | |
Raises: |
|
MNRead.systemMetadataChanged(session, id, serialVersion, dateSysMetaLastModified) → boolean
Notifies the Member Node that the authoritative copy of system metadata on the Coordinating Nodes has changed.
The Member Node SHOULD schedule an update to its information about the affected object by retrieving an authoritative copy from a Coordinating Node.
Note that date time precision is limited to one millisecond.
Access control for this method MUST be configured to allow calling by Coordinating Nodes.
Version: | 1.0 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | True if notification was received OK, otherwise an error is returned. |
Return type: | boolean |
Raises: |
|
MNRead.getReplica(session, pid) → OctetStream
Called by a target Member Node to fulfill the replication request originated by a Coordinating Node calling MNReplication.replicate(). This is a request to make a replica copy of the object, and differs from a call to GET /object in that it should be logged as a replication event rather than a read event on that object.
If the object being retrieved has restricted access, then a Tier 2 or higher Member Node MUST make a call to CNReplication.isNodeAuthorized() to verify that the Subject of the caller is authorized to retrieve the content.
A successful operation is indicated by a HTTP status of 200 on the response.
Failure of the operation MUST be indicated by returning an appropriate exception.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | Bytes of the specified object. |
Return type: | |
Raises: |
|
The MNQuery API is an optional API that may be implemented by Member Nodes that intend to support querying the local repository. The actual form of the query is undefined, and it is expected that a small set of well known query engine types will be supported.
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 1 | 1.1 | GET /query/{queryEngine}/{query} | query() | (session, queryEngine, query) -> Types.OctetStream |
Tier 1 | 1.1 | GET /query/{queryType} | getQueryEngineDescription() | (session, queryEngine) -> Types.QueryEngineDescription |
Tier 1 | 1.1 | GET /query | listQueryEngines() | (session) -> Types.QueryEngineList |
MNQuery.query(session, queryEngine, query) → OctetStream
Submit a query against the specified queryEngine and return the response as formatted by the queryEngine.
The MNQuery.query() operation may be implemented by more than one type of search engine and the queryEngine parameter indicates which search engine is targeted. The value and form of query is determined by the specific query engine.
For example, the SOLR search engine will accept many of the standard parameters of SOLR, including field restrictions and faceting.
This method is optional for Member Nodes, but if implemented, both getQueryEngineDescription and listQueryEngines must also be implemented.
Version: | 1.1 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | The structure of the response is determined by the chosen search engine and parameters provided to it. |
Return type: | |
Raises: |
|
MNQuery.getQueryEngineDescription(session, queryEngine) → QueryEngineDescription
Provides metadata about the query service of the specified queryEngine. The metadata provides a brief description of the query engine, its version, its schema version, and an optional list of fields supported by the query engine.
Version: | 1.1 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | A list of fields that are supported by the search index and additional metadata. |
Return type: |
|
Raises: |
|
MNQuery.listQueryEngines(session) → QueryEngineList
Returns a list of query engines, i.e. supported values for the queryEngine parameter of the getQueryEngineDescription and query operations.
The list of search engines available may be influenced by the authentication status of the request.
Version: | 1.1 |
---|---|
REST URL: |
|
Parameters: | session ( |
Returns: | A list of names of queryEngines available to the user identified by session. |
Return type: |
|
Raises: |
|
The MNView API is an optional API that may be implemented by Member Nodes that intend to support providing rendered views of content on their repository. Each repository can implement multiple themed views of its content, each accessed using the name of the theme and the identifier of the content to be viewed. Unlike the MNRead service, which returns the exact bytes of content, the MNView service provides a rendered view of the content which can transform the content into different formats. The most common use of the view service will likely be to provide a rendered HTML landing page at a well-known URL that can be used to provide a human-readable view of metadata and data. Other potential uses include providing alternative formats for metadata and data. Each Member Node that implements the MNView service must implement at least one theme named ‘default’ which provides the default view of all content. Other themes can be provided for use by various clients.
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 1 | 1.2 | GET /views/{theme}/{pid} | view() | (session, theme, id) -> Types.OctetStream |
Tier 1 | 1.2 | GET /views | listViews() | (session) -> Types.OptionList |
MNView.view(session, theme, id) → OctetStream
Provides a formatted view of an object (science metadata, data, resource, or other) using the given named theme.
If this service is implemented, the MNView.view() operation must implement at least one {theme} named ‘default’ to provide a standard (possibly minimalistic) view of the content in HTML format.
If the {theme} parameter is not recognized, the service must render the object using the default theme rather than throwing an error. Note that the return type of Types.OctetStream requires that the consuming client has a priori knowledge of the theme being returned (like HTML). Response headers must include the correct mime-type of the view being returned.
This method is optional for Member Nodes, but if implemented, MNView.listViews must also be implemented.
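On the server side, the fallback rule described above (an unrecognized theme renders with the default theme instead of raising an error) reduces to a small lookup. This sketch uses a hypothetical `resolve_theme` helper and a made-up theme name, and does not describe any particular Member Node implementation.

```python
def resolve_theme(requested: str, available: set) -> str:
    """Pick the theme to render with, falling back to 'default'."""
    if "default" not in available:
        # A node implementing MNView must provide a theme named 'default'.
        raise ValueError("the 'default' theme is required")
    return requested if requested in available else "default"


if __name__ == "__main__":
    themes = {"default", "compact"}  # "compact" is an invented example theme
    print(resolve_theme("compact", themes))
    print(resolve_theme("no-such-theme", themes))
```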
Version: | 1.2 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | Any return type is allowed, including application/octet-stream, but the format of the response should be specialized by the requested theme. |
Return type: | |
Raises: |
|
MNView.listViews(session) → OptionList
Provides a list of usable themes for rendering content in a view, including a required ‘default’ theme. The list of themes is provided as an OptionList, where the option key should be used as the theme name in calls to MNView.view, and the description provides a human readable description of what will be returned for that theme.
This method is optional for Member Nodes, but if implemented, MNView.view must also be implemented.
Version: | 1.2 |
---|---|
REST URL: |
|
Parameters: | session ( |
Returns: | A list of available themes that can be used with the MNView.view service. |
Return type: |
|
Raises: |
|
The MNPackage API is an optional API that may be implemented by Member Nodes that intend to support downloading all of the contents of a data package in a single API call. Without this service, a client application must individually retrieve each of the metadata and data components of a package as they are listed in the ORE document that describes the package. Using the MNPackage service, a caller can instead request a serialized form of all of the data in a package, which is returned in the format requested. All implementations must support the BagIt format specification, but may also support additional well-defined packaging standards and specifications.
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 1 | 1.2 | GET /packages/{packageType}/{pid} | getPackage() | (session, packageType, id) -> Types.OctetStream |
MNPackage.getPackage(session, packageType, id) → OctetStream
Provides all of the content of a DataONE data package as defined by an OAI-ORE document in DataONE, in one of several possible package serialization formats. The serialized package will contain all of the data described in the ORE aggregation. The default implementation will include packages in the BagIt format. The packageType formats must be specified using the associated ObjectFormat formatId for that package serialization format.
The {id} parameter must be the identifier of an ORE package object. If it is the identifier of one of the science metadata documents or data files contained within the package, the Member Node should throw an InvalidRequest exception. Identifiers may be either PIDs or SIDs.
This method is optional for Member Nodes.
Version: | 1.2 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | Any return type is allowed, including application/octet-stream, but the format of the response should be specialized by the requested packageType. |
Return type: | |
Raises: |
|
Provides mechanisms for Member Nodes to verify access to resources for users (subjects). See the document Identity Management and Authenticated Session Management for more details on some authentication options.
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 2 | 1.0 | GET /isAuthorized/{id}?action={action} | isAuthorized() | (session, id, action) -> boolean |
MNAuthorization.isAuthorized(session, id, action) → boolean
Test if the user identified by the provided session has authorization for operation on the specified object.
A successful operation is indicated by a return HTTP status of 200.
Failure is indicated by an exception such as NotAuthorized being returned.
The body of the response is arbitrary and SHOULD be ignored by the caller.
If the action is not authorized, then a NotAuthorized exception MUST be raised.
Note
Should perhaps add convenience methods for “canRead()” and “canWrite()” to verify that a user is able to read / write an object.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | True if the operation is allowed |
Return type: | boolean |
Raises: |
|
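Since the outcome of isAuthorized is carried by the HTTP status rather than the response body, a client only needs to build the request URL and inspect the status code. The sketch below illustrates this with the standard library; the base URL is a placeholder, and `is_authorized_url` and `interpret_status` are hypothetical helper names introduced for illustration only.

```python
from urllib.parse import quote, urlencode


def is_authorized_url(base_url: str, pid: str, action: str) -> str:
    """Build the isAuthorized URL with an encoded path and query string."""
    return "{}/isAuthorized/{}?{}".format(
        base_url.rstrip("/"),
        quote(pid, safe=""),            # identifier goes in the path
        urlencode({"action": action}),  # action goes in the query string
    )


def interpret_status(status_code: int) -> bool:
    """Return True only for HTTP 200; the response body is ignored."""
    return status_code == 200


# Hypothetical base URL; "read" is one of the permissions used in
# the access policy examples in this document.
url = is_authorized_url("https://mn.example.org/mn/v1", "XYZ33256", "read")
print(url)
print(interpret_status(200))
```

A real client would typically translate a non-200 response into the exception described in the response, such as NotAuthorized, instead of returning False.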
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 3 | 1.0 | POST /object | create() | (session, pid, object, sysmeta) -> Types.Identifier |
Tier 3 | 1.0 | PUT /object/{pid} | update() | (session, pid, object, newPid, sysmeta) -> Types.Identifier |
Tier 3 | 1.0 | POST /generate | generateIdentifier() | (session, scheme, [fragment]) -> Types.Identifier |
Tier 3 | 1.0 | DELETE /object/{id} | delete() | (session, id) -> Types.Identifier |
Tier 3 | 1.0 | PUT /archive/{id} | archive() | (session, id) -> Types.Identifier |
Tier 1 | 2.0 | PUT /meta | updateSystemMetadata() | (session, pid, sysmeta) -> boolean |
MNStorage.create(session, pid, object, sysmeta) → Identifier
Called by a client to add a new object to the Member Node.
The pid must either not exist in the DataONE system or have been previously reserved using CNCore.reserveIdentifier(). A new, unique Types.SystemMetadata.seriesId may be included.
The caller MUST have authorization to write or create content on the Member Node.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | The identifier that was used to insert the document into the system. |
Return type: | |
Raises: |
|
Examples
The outgoing request body must be encoded as MIME multipart/form-data with the system metadata portion and the object as file attachments.
(POST) Create a new object with a given identifier (XYZ33256):
curl -E /tmp/x509up_u502 \
-F "pid=XYZ33256" \
-F "object=@sciencemetadata.xml" \
-F "sysmeta=@sysmeta.xml" \
https://m1.dataone.org/mn/v1/object
HTTP/1.1 200 Success
Content-Type:
Date: Wed, 16 Dec 2009 13:58:34 GMT
Content-Length: 355
XYZ33256
The system metadata included with the create call must contain values for the elements required to be set by clients (see System Metadata). The system metadata document can be crafted by hand or preferably with a tool such as generate_sysmeta.py which is available in the d1_instance_generator Python package. See documentation included with that package for more information on its operation.
For example, the system metadata document for the example above was generated using the sequence of commands:
<<log on to cilogon.org and download my certificate>>
MYSUBJECT=`python my_subject.py /tmp/x509up_u502`
echo $MYSUBJECT
CN=Dave Vieglais T799,O=Google,C=US,DC=cilogon,DC=org
python generate_sysmeta.py -f sciencemetadata.xml \
-i "XYZ33256" \
-s "$MYSUBJECT" \
-t "eml://ecoinformatics.org/eml-2.0.1" \
> sysmeta.xml
The generated system metadata document contains default information that indicates:
CN=Dave Vieglais T799,O=Google,C=US,DC=cilogon,DC=org
The generated system metadata document is presented below:
<?xml version='1.0' encoding='UTF-8'?>
<ns1:systemMetadata xmlns:ns1="http://ns.dataone.org/service/types/v1">
<identifier>XYZ33256</identifier>
<formatId>eml://ecoinformatics.org/eml-2.0.1</formatId>
<size>22936</size>
<checksum algorithm="MD5">2ec0084d1e11e0d5c9a46ba6a230aa85</checksum>
<submitter>CN=Dave Vieglais T799,O=Google,C=US,DC=cilogon,DC=org</submitter>
<rightsHolder>CN=Dave Vieglais T799,O=Google,C=US,DC=cilogon,DC=org</rightsHolder>
<accessPolicy>
<allow>
<subject>public</subject>
<permission>read</permission>
</allow>
<allow>
<subject>CN=Dave Vieglais T799,O=Google,C=US,DC=cilogon,DC=org</subject>
<permission>changePermission</permission>
</allow>
</accessPolicy>
<replicationPolicy replicationAllowed="true"/>
<dateUploaded>2012-02-20T20:39:19.664495</dateUploaded>
<dateSysMetadataModified>2012-02-20T20:39:19.70598</dateSysMetadataModified>
</ns1:systemMetadata>
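The size and checksum elements in the system metadata must match the bytes of the object being uploaded. When crafting a document by hand, both values can be computed with the Python standard library. The following is a minimal sketch; `size_and_md5` is a hypothetical helper and is not part of the d1_instance_generator package.

```python
import hashlib
import io


def size_and_md5(stream, chunk_size=65536):
    """Return (size in bytes, MD5 hex digest) for a binary stream."""
    digest = hashlib.md5()
    size = 0
    # Read in chunks so that large objects are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return size, digest.hexdigest()


# An in-memory stream stands in for open("sciencemetadata.xml", "rb").
size, checksum = size_and_md5(io.BytesIO(b"hello"))
print(size, checksum)
```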
MNStorage.update(session, pid, object, newPid, sysmeta) → Identifier
This method is called by clients to update objects on Member Nodes.
Updates an existing object by creating a new object identified by newPid on the Member Node, which explicitly obsoletes the object identified by pid through appropriate changes to the SystemMetadata of pid and newPid.
The Member Node sets Types.SystemMetadata.obsoletedBy on the object being obsoleted to the pid of the new object. It then updates Types.SystemMetadata.dateSysMetadataModified on both the new and old objects. The modified system metadata entries then become available in MNRead.listObjects(). This ensures that a Coordinating Node will pick up the changes when filtering on Types.SystemMetadata.dateSysMetadataModified.
The update operation MUST fail with Exceptions.InvalidRequest on objects that have the Types.SystemMetadata.archived property set to true.
A new, unique Types.SystemMetadata.seriesId may be included when beginning a series, or a series may be extended if the newPid obsoletes the existing pid.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | The identifier of the document that is replacing the original, which should be the same as newPid. |
Return type: | |
Raises: |
|
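The system metadata bookkeeping performed during an update can be sketched with a small in-memory model. This is an illustration under stated assumptions, not a Member Node implementation: `SysMeta`, `InvalidRequest`, and `apply_update` are hypothetical stand-ins, the store is a plain dictionary, and setting `obsoletes` on the new object is assumed to be one of the "appropriate changes" to the new object's system metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


class InvalidRequest(Exception):
    """Stand-in for Exceptions.InvalidRequest."""


@dataclass
class SysMeta:
    identifier: str
    archived: bool = False
    obsoletes: Optional[str] = None
    obsoleted_by: Optional[str] = None
    date_sys_metadata_modified: Optional[datetime] = None


def apply_update(store: Dict[str, SysMeta], pid: str, new_pid: str) -> str:
    """Record that new_pid obsoletes pid and return new_pid."""
    old = store[pid]
    if old.archived:
        # Archived objects cannot be updated.
        raise InvalidRequest("object {} is archived".format(pid))
    now = datetime.now(timezone.utc)
    # Link the two objects and touch the modification date on both, so
    # that a harvester filtering on that date sees both changes.
    old.obsoleted_by = new_pid
    old.date_sys_metadata_modified = now
    store[new_pid] = SysMeta(
        identifier=new_pid, obsoletes=pid, date_sys_metadata_modified=now
    )
    return new_pid


store = {"XYZ33256": SysMeta("XYZ33256")}
result = apply_update(store, "XYZ33256", "XYZ33257")
print(result, store["XYZ33256"].obsoleted_by)
```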
MNStorage.generateIdentifier(session, scheme[, fragment]) → Identifier
Given a scheme and optional fragment, generates a unique identifier with that scheme and fragment. May be used for generating either PIDs or SIDs.
The message body is encoded as MIME multipart/form-data.
Version: | 1.0 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | The identifier that was generated |
Return type: | |
Raises: |
|
Todo
Need to provide a list of recommended identifier schemes.
MNStorage.delete(session, id) → Identifier
Deletes an object managed by DataONE from the Member Node. Member Nodes MUST check that the caller (typically a Coordinating Node) is authorized to perform this function.
The delete operation will be used primarily by Coordinating Nodes to help manage the number of replicas of an object that are present in the entire system.
The operation removes the object from further interaction with DataONE services. The implementation may delete the object bytes, and in general should do so since a delete operation may be in response to a problem with the object (e.g. it contains malicious content, is inappropriate, or is the subject of a legal request).
If the object does not exist on the node servicing the request, then an Exceptions.NotFound exception is raised. The message body of the exception SHOULD contain a hint as to the location of the CNRead.resolve() method.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | The identifier of the object that was deleted. |
Return type: | |
Raises: |
|
MNStorage.archive(session, id) → Identifier
Hides an object managed by DataONE from search operations, effectively preventing its discovery during normal operations.
The operation does not delete the object bytes, but instead sets the Types.SystemMetadata.archived flag to True. This ensures that the object can still be resolved (and hence remains valid for existing citations and cross references), though it will not appear in searches.
Objects that are archived cannot be updated through the MNStorage.update() operation.
Archived objects cannot be un-archived. This behavior may change in future versions of the DataONE API.
Member Nodes MUST check that the caller is authorized to perform this function.
If the object does not exist on the node servicing the request, then an Exceptions.NotFound exception is raised. The message body of the exception SHOULD contain a hint as to the location of the CNRead.resolve() method.
Version: | 1.0 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | The identifier of the object that was archived. |
Return type: | |
Raises: |
|
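The difference between archiving and deleting can be illustrated with a small in-memory model: archiving only sets a flag, so the object stays retrievable by identifier while dropping out of search results. This is a sketch under assumptions, with `NotFound`, `archive`, `get`, and `search` as hypothetical stand-ins and a plain dictionary in place of real storage.

```python
class NotFound(Exception):
    """Stand-in for Exceptions.NotFound."""


def archive(store, pid):
    """Set the archived flag on an object and return its identifier."""
    if pid not in store:
        raise NotFound(pid)
    store[pid]["archived"] = True  # the object bytes are left untouched
    return pid


def get(store, pid):
    """Retrieve an object by identifier, archived or not."""
    if pid not in store:
        raise NotFound(pid)
    return store[pid]


def search(store):
    """List identifiers that are discoverable, skipping archived objects."""
    return sorted(pid for pid, entry in store.items() if not entry["archived"])


store = {
    "XYZ33256": {"archived": False, "data": b"first"},
    "XYZ33257": {"archived": False, "data": b"second"},
}
archived_pid = archive(store, "XYZ33256")
print(search(store))
print(get(store, "XYZ33256")["data"])
```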
MNStorage.updateSystemMetadata(session, pid, sysmeta) → boolean
Provides a mechanism for updating system metadata for any object held on the Member Node where that Member Node is the authoritative Member Node. A Coordinating Node can also call this method on a non-authoritative Member Node. However, this is not a normal operation; it is reserved for the special case in which the authoritative Member Node no longer exists. A Coordinating Node that calls this method on a non-authoritative Member Node during normal operation can cause unexpected consequences.
This method is typically used by the authoritative Member Node or the rights holder(s) to ensure system metadata quality.
Version: | 2.0 |
---|---|
REST URL: |
|
Parameters: |
|
Returns: | True if the update was successful. |
Return type: | boolean |
Raises: |
|
The Replication API provides methods to support CN-directed replication of content between MNs.
Tier | Version | REST | Function | Parameters |
---|---|---|---|---|
Tier 4 | 1.0 | POST /replicate | replicate() | (session, sysmeta, sourceNode) -> boolean |
MNReplication.replicate(session, sysmeta, sourceNode) → boolean
Called by a Coordinating Node to request that the Member Node create a copy of the specified object by retrieving it from another Member Node and storing it locally so that it can be made accessible to the DataONE system.
A successful operation is indicated by an HTTP status of 200 on the response.
Failure of the operation MUST be indicated by returning an appropriate exception.
Access control for this method MUST be configured to allow calling by Coordinating Nodes.
Version: | 1.0 |
---|---|
Use Cases: | |
REST URL: |
|
Parameters: |
|
Returns: | True if everything works OK, otherwise an error is returned. |
Return type: | boolean |
Raises: |
|
Response
The response should be a valid HTTP response with a blank or arbitrary body. Only the HTTP header information is considered by the requestor. A successful response must have an HTTP status code of 200. In case of an error condition, the appropriate HTTP status code must be set, and an exception or error information may be returned in the response.
The outgoing request body must be encoded as MIME multipart/form-data with the system metadata portion as a file attachment and the sourceNode parameter as a form field.
curl -v -X POST "https://localhost:8000/mn/v1/replicate" \
-H "Content-type: multipart/form-data" \
-F "sysmeta=@systemmetadata.xml" \
-F "sourceNode=urn:node:MN_B"
* About to connect() to localhost port 8000 (#0)
* Trying ::1... Connection refused
* Trying fe80::1... Connection refused
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8000 (#0)
> POST /mn/v1/replicate HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
> Host: localhost:8000
> Accept: */*
> Content-Length: 1021
> Expect: 100-continue
> Content-type: multipart/form-data; boundary=----------------------------88ffdd8070e9
>
* Done waiting for 100-continue
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Fri, 14 Jan 2011 22:01:13 GMT
< Server: WSGIServer/0.1 Python/2.6.1
< Content-Type: text/xml
<
<
* Closing connection #0
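For clients that do not use curl, the same multipart/form-data body can be assembled by hand. The sketch below builds the body and headers for a replicate request using only the Python standard library, with sysmeta as a file part and sourceNode as a plain form field. `encode_replicate_request` is a hypothetical helper, and the file name, part content type, and sample XML are illustrative assumptions.

```python
import uuid


def encode_replicate_request(sysmeta_xml: bytes, source_node: str, boundary=None):
    """Return (body, headers) for a multipart/form-data replicate request."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        # System metadata is sent as a file attachment.
        delimiter,
        b'Content-Disposition: form-data; name="sysmeta"; '
        b'filename="systemmetadata.xml"',
        b"Content-Type: text/xml",
        b"",
        sysmeta_xml,
        # sourceNode is sent as an ordinary form field.
        delimiter,
        b'Content-Disposition: form-data; name="sourceNode"',
        b"",
        source_node.encode("utf-8"),
        # The closing delimiter carries a trailing "--".
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(lines)
    headers = {
        "Content-Type": "multipart/form-data; boundary=" + boundary,
        "Content-Length": str(len(body)),
    }
    return body, headers


body, headers = encode_replicate_request(
    b"<systemMetadata/>",  # placeholder, not a valid system metadata document
    "urn:node:MN_B",
    boundary="exampleboundary",
)
print(headers["Content-Type"])
```

The resulting body and headers can be passed to any HTTP client that accepts raw bytes. A status of 200 on the response indicates success, and the response body can be ignored.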