Warning: These documents are under active
development and subject to change (version 2.1.0-beta).
The latest release documents are at:
https://purl.dataone.org/architecture
This document describes how the Member Node APIs and Coordinating Node APIs are implemented using a Representational State Transfer (REST) approach over HTTP.
Key points on REST interactions in DataONE:
PIDs identifying individual items of a collection.
Exceptions.InvalidRequest exception.
Types.Checksum type. It is a String (the checksum) with an attribute (the algorithm). Complex types are serialized to UTF-8 encoded XML structures that are defined by the DataONE Types Schema.

Collections exposed by Member Nodes and Coordinating Nodes include:
| Collection | Description |
|---|---|
| /object | The set of objects available for retrieval from the node. |
| /meta | Metadata about objects available for retrieval from the node. |
| /formats | Object formats registered on the node. |
| /log | Log records held on the node. |
| /reserve | Identifiers that have been reserved for future use. |
| /accounts | Principal and ownership related functionality. |
| /sessions | Authenticated session management functions. |
| /node | Service and status information for all nodes on the system. |
| /monitor | Node health monitoring. |
| /replicate | Member Node to Member Node replication functionality. |
The format of the response (except for responses from MNRead.get() or CNRead.get()) is determined by the Accept: header provided in the request.
Version 1.0 of the DataONE services supports only XML serialization, and this format MUST be supported by all services and clients interacting with the DataONE system.
All request and response documents MUST be encoded using the UTF-8 character set.
If the service is not able to provide a response in the specified format, then the node should return an error code of Exceptions.NotImplemented, with the HTTP error code set to 406.
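The Accept handling described above can be sketched roughly as follows. This is a simplified illustration, not DataONE code: the function name `select_status` and the set of XML media types are assumptions, and quality values (`q=`) in the Accept header are ignored.

```python
# Media types the node is assumed to be able to produce (XML only).
XML_TYPES = {"text/xml", "application/xml"}
# Wildcards that an XML response can also satisfy.
WILDCARDS = {"*/*", "text/*", "application/*"}


def select_status(accept_header: str) -> int:
    """Return 200 if an XML response can satisfy the Accept header, else 406."""
    if not accept_header.strip():
        return 200  # no stated preference, so XML is acceptable
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.8" and normalize case.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type in XML_TYPES or media_type in WILDCARDS:
            return 200
    # Nothing requested can be served: Exceptions.NotImplemented, HTTP 406.
    return 406
```

For example, `select_status("text/xml")` yields 200, while `select_status("application/json")` yields 406.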
Many of the URL patterns described here accept parameters in the URL and as components of a MIME multipart-mixed message body.
Unless otherwise indicated, all parameter names and values should be considered case sensitive.
Note
The default configuration of web servers such as Apache introduces some ambiguity in the interpretation of URLs that include slashes and other reserved characters that are used, for example, as path separators. The document Apache Configuration for DataONE Services describes appropriate configuration details for the Apache web server.
Session information (formerly referred to as a token) is obtained from the client side authentication certificate held by the SSL processing library of the HTTPS service handling the request. Hence, even though a session parameter may be present in the method signature, the session information itself is transported as part of the HTTPS handshaking process and is not present in the body or header section of the HTTP request.
Some parameters are passed as part of the REST service URL path (e.g. /get/{pid}). Such values MUST be encoded according to the rules of RFC3986 with the additional restriction that the space character MUST be encoded as “%20” rather than “+”. Examples of DataONE REST URLs for retrieving an object (i.e. the get() operation):
PID: 10.1000/182
URL: https://mn.example.com/mn/v1/object/10.1000%2F182
PID: http://example.com/data/mydata?row=24
URL: https://mn.example.com/mn/v1/object/http:%2F%2Fexample.com%2Fdata%2Fmydata%3Frow=24
PID: Is_féidir_liom_ithe_gloine
URL: https://mn.example.com/mn/v1/object/Is_f%C3%A9idir_liom_ithe_gloine
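The encodings in the examples above can be reproduced with Python's standard library. The following is a minimal sketch: the helper name `object_url` is hypothetical, and the set of characters left unencoded (`PATH_SAFE`) is an assumption chosen to match the examples, not a rule stated by this document.

```python
from urllib.parse import quote

# Base URL taken from the examples above.
BASE_URL = "https://mn.example.com/mn/v1"

# Characters left unencoded in a path segment (assumption): the RFC3986
# sub-delims plus ":" and "@", minus "+", so that "+" is always sent as %2B.
PATH_SAFE = "!$&'()*,;=:@"


def object_url(pid: str, base_url: str = BASE_URL) -> str:
    """Build the get() URL for a PID, percent-encoding it as one path segment."""
    # quote() encodes the PID as UTF-8, writes spaces as %20 (never "+"),
    # and encodes "/" and "?" because they are not listed in PATH_SAFE.
    return f"{base_url}/object/{quote(pid, safe=PATH_SAFE)}"


for pid in ("10.1000/182",
            "http://example.com/data/mydata?row=24",
            "Is_féidir_liom_ithe_gloine"):
    print(object_url(pid))
```

Note that `urllib.parse.quote_plus` is not suitable here because it encodes the space character as "+".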
Parameters passed as key-value pairs in the URL query string MUST be appropriately encoded for transmission as part of the URL according to RFC3986 rules for the URL query component. In addition, the space character MUST be encoded as “%20” rather than the alternative “+” character.
Where a boolean parameter value is being specified as the value portion of a key-value pair appearing in a URL, the strings “true” and “false” MUST be used to indicate logical true and logical false respectively.
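A query string following the two rules above might be built as in this sketch. The helper name `build_query` and the parameter names in the usage example are illustrative assumptions, not part of this specification.

```python
from urllib.parse import quote, urlencode


def build_query(params: dict) -> str:
    """Encode key-value pairs for a URL query string."""
    prepared = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Booleans must be the lowercase strings "true" and "false";
            # Python's str(True) would give "True".
            value = "true" if value else "false"
        prepared[key] = str(value)
    # quote_via=quote writes spaces as %20; the default (quote_plus) uses "+".
    return urlencode(prepared, quote_via=quote)


# Parameter names below are placeholders for illustration.
print(build_query({"formatId": "my format/1", "replicaStatus": False}))
```

With the inputs shown, the result is `formatId=my%20format%2F1&replicaStatus=false`.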
Date values in URLs should be formatted as:
yyyy-MM-dd[Thh:mm:ss.S[+ZZ:zz]]
Where:
yyyy = Four digit year
MM = Two digit month, 01 = January
dd = Two digit day of month, 01 = first day
hh = Hour of day, 00 - 23
mm = Minute of hour, 00 - 59
ss = Second of minute, 00 - 59
S = Milliseconds
ZZ = Hours of timezone offset
zz = Minutes of timezone offset
If the timezone values are not present then the date time is interpreted to be in GMT.
If the time portion of the date time is not present, then the time is assumed to be 00:00:00.0, i.e. the first millisecond of the specified date.
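The date rules above, including the two defaults, can be sketched in Python as follows. The function names are hypothetical, and two points are assumptions rather than requirements of this document: milliseconds are written with three digits, and parsing relies on `datetime.fromisoformat`, which on older Python versions accepts only three or six fractional digits.

```python
from datetime import datetime, timezone


def format_d1_date(dt: datetime) -> str:
    """Format a datetime as yyyy-MM-ddThh:mm:ss.SSS+ZZ:zz, normalized to GMT."""
    if dt.tzinfo is None:
        # No timezone given: interpret the value as GMT.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds")


def parse_d1_date(text: str) -> datetime:
    """Parse a date value, applying the defaults for missing time and timezone."""
    dt = datetime.fromisoformat(text)  # a bare date parses as 00:00:00
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # missing offset means GMT
    return dt


print(format_d1_date(datetime(2010, 6, 1, 12, 30, 45, 123000)))
print(parse_d1_date("2010-06-01"))
```

When such a value is placed in a query string, the "+" of the timezone offset is commonly percent-encoded as "%2B", since many servers decode a literal "+" as a space.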
Requests sent using the HTTP POST or PUT verbs MUST use MIME multipart-mixed encoding of the message body as described in RFC2046. In most cases and unless otherwise indicated, all parameters for PUT and POST requests except the authorization session will be sent in the message body (as opposed to URL parameters).
Example of an HTTP POST request to the MN create() method using curl:
curl -X POST \
--cert /tmp/x509up_u502 \
-H "Charset: utf-8" \
-H "Content-Type: multipart/mixed; boundary=----------6B3C785C-6290-11DF-A355-A6ECDED72085_$" \
-H "Accept: text/xml" \
-H "User-Agent: pyd1/1.0 +http://dataone.org/" \
-F pid=10Dappend1.txt \
-F object=@content.bin \
-F sysmeta=@systemmetadata.abc \
"https://demo1.test.dataone.org/knb/d1/mn/v1/object"
Example serialized body of an HTTP POST request to the MN create() method (excluding session information):
TODO
RFC2046 does not explicitly prevent the presence of a message body in an HTTP DELETE request; however, support for transmission of the request payload may vary by technology. DELETE requests requiring a request payload MUST have accompanying integration tests that exercise the technologies involved.