<%=title%> Developer Tools
The <%=title%> supports the DataONE REST API, and Java, Python, and R libraries for easily creating client tools.
DataONE REST API
A REST API for accessing and contributing data.
The <%=title%> supports the DataONE REST API for automating the process of uploading, downloading, and searching for data on the <%=title%> using scripted languages such as shell, R, Matlab, and Python, among others. This guide is a brief synopsis of the DataONE API, which is more comprehensively documented in the DataONE Architecture Documentation (also see the development version of the architecture guide for future releases). This API allows any software tool that supports the DataONE API (such as the rDataONE R package) to also be able to seamlessly interact with <%=title%> data. While DataONE maintains the full technical documentation on the API, here is a brief overview for commonly accessed services on the <%=title%>.
Summary
DataONE distinguishes three classes of objects that it will store and manage: data objects, science metadata objects, and resource map documents. Each of these is uniquely identifiable by its persistent identifier (PID), and each has associated SystemMetadata that describes the object type, size, access rules, and so on.
- Data objects are treated as opaque blobs, and are retrievable via the get method given a persistent identifier (PID). Data objects can be represented in any format, but the repository encourages the use of non-proprietary, open formats such as CSV and netCDF.
- Science metadata objects are metadata documents such as EML, FGDC, ISO19115, and so forth that provide metadata describing some data object(s). These are represented in XML according to their respective schema.
- Resource Map objects are OAI-ORE documents that describe the aggregations of data and metadata into data packages. Individual data and metadata files can be uploaded to the repository, but to indicate that a set of files is part of an aggregated data package, you must provide an OAI-ORE resource map linking the objects. In addition to aggregation, Resource Maps can describe the origin of objects by asserting provenance relationships. These relationships will be displayed on the <%=title%>.
All API access is over HTTPS, and accessed via the <%=apiBaseUrl%>/ endpoint. The relative path prefix /v2/ indicates that we are currently using version 2 of the DataONE API.
The examples below show calls to the production <%=title%> data repository REST endpoint (<%=apiBaseUrl%>), but users should not create test data in the production environment. Instead, please use a test Metacat server to explore the API and create test data (e.g., <%=apiBaseUrl%>).
Quick Reference
URL | Method | Example |
---|---|---|
/object/<pid> | GET | Get an Object |
/object | POST | Create an Object |
/object/<pid> | PUT | Update an Object |
/archive/<pid> | PUT | Archive an Object |
/meta/<pid> | GET | Get System Metadata for an Object |
/meta/<pid> | PUT | Update System Metadata for an Object |
/generate | POST | Generate an Identifier |
/query/solr/<query> | GET | Search the metadata index |
/object | GET | List objects |
/object/<pid> | DELETE | Delete an Object |
Request Format
- GET, HEAD, and DELETE requests only pass parameters as part of the URL. The parameter values must be converted to UTF-8 and appropriately escaped for incorporating into the URL.
- Message bodies (e.g. for POST and PUT requests) are encoded using MIME Multipart, mixed (RFC2046). All information for creating the new object or resource is transmitted in the message body, which is encoded as a MIME multipart/mixed message. We use two types of content in MIME multipart/mixed messages: parameters and files. Parameters are to be used for all simple types (such as a String value). Files are to be used for all complex types (such as an XML structure) and for octet streams.
Response Format
Version 1.0 of the DataONE services supports only XML serialization, and this format MUST be used when communicating with the <%=title%>. Request and response documents MUST also be encoded using UTF-8.
Authentication
Two mechanisms are supported for authentication:
- Authentication Tokens passed in the HTTP "Authorization" header
- Client-side SSL certificates (deprecated)
Using Authentication Tokens. In this preferred approach, users sign in to the <%=MetacatUI.appModel.get("repositoryName")%>, and copy an authentication token from their profile page which can then be included in HTTPS requests in the HTTP header "Authorization:" to establish their identity.
The Authentication Token is a long base64-encoded string of characters that encodes the user's credentials
in a signed and validatable JWT. Each language or
tool will have its own mechanism for setting HTTP headers. For example, for curl an authenticated request
can be made using the '-H' command line option to set the header, such as:
$ export TOKEN='eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJodHRwOlwvXC9vcmNp...'
$ curl -H "Authorization: Bearer $TOKEN" <%=apiBaseUrl%>/object
The token expires in 18 hours, and so a new token will need to be retrieved periodically.
The authentication token should be carefully protected just as you would an account password, as it gives the holder full rights to the account. Do not save tokens in code, and don't check them into version control systems, or otherwise make them available to other people.
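Because the token is a JWT and expires, a script can check how much lifetime remains before it starts a long upload. The following is a minimal POSIX shell sketch, not part of any DataONE tool: the helper names `jwt_payload` and `jwt_seconds_left` are hypothetical, and it assumes the token payload carries the standard JWT `exp` claim (expiry in seconds since the epoch). It only reads the token; it does not verify the signature.

```shell
# Hypothetical helpers; assume the JWT payload contains a numeric "exp" claim.

# Print the decoded JSON payload (second dot-separated segment) of a JWT.
jwt_payload() (
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits the '=' padding that base64 decoders expect; restore it.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 --decode
)

# Print the seconds remaining until the token expires (negative if expired).
# Returns non-zero if no "exp" claim is found.
jwt_seconds_left() (
  exp=$(jwt_payload "$1" |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  [ -n "$exp" ] || return 1
  echo $(( $exp - $(date +%s) ))
)
```

A script could then refuse to start when, for example, `[ "$(jwt_seconds_left "$TOKEN")" -lt 600 ]` is true, and prompt the user to copy a fresh token from their profile page.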
Using client-side certificates over SSL.
Deprecated. Client-side SSL certificates are deprecated because certain clients and browsers do not fully support them. Use JWT authentication tokens as described above instead.
Users can log into CILogon to download a client certificate, which can then be included in requests as part of the SSL session with the host. The Subject of the provided certificate will be used by the <%=title%> to determine all access control decisions for accessing, creating, updating, archiving, and deleting objects. If a client-side certificate is not provided, the user will be considered an anonymous public user and will only be able to access public content.
Each language or submission tool will have different mechanisms for setting the client certificate in the SSL session. For example, for curl the certificate filename is passed in on the command line: curl -X POST --cert /tmp/x509up_u502 ...
The version of curl shipped by Apple on Mac OS X 10.9 and later is broken in that it does not support providing PEM certificates via the command line. Instead, it uses certificates registered in the system keychain, as described on the curl mailing list. Thus, calls to the <%=title%> that require a certificate will fail with the standard Mac curl. This can be fixed by replacing it with the MacPorts version of curl, or by using a certificate converted to PKCS#12 format.
Authorization
Authorizing access for each object in the system is determined by the policies in its associated SystemMetadata document, which contains fields for rightsHolder, authoritativeMemberNode, and accessPolicy.
Here’s an example SystemMetadata document showing these fields on a test server:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<d1:systemMetadata xmlns:d1v1="http://ns.dataone.org/service/types/v1" xmlns:d1="http://ns.dataone.org/service/types/v2.0">
  <serialVersion>0</serialVersion>
  <identifier>02281a16-763e-4be0-8d2e-cd75da8c83a5</identifier>
  <formatId>eml://ecoinformatics.org/eml-2.1.1</formatId>
  <size>1197</size>
  <checksum algorithm="MD5">f1bb96fe0239e72e993e172d8c410bd6</checksum>
  <submitter>https://orcid.org/0000-0002-7136-9046</submitter>
  <rightsHolder>https://orcid.org/0000-0002-7136-9046</rightsHolder>
  <accessPolicy>
    <allow>
      <subject>public</subject>
      <permission>read</permission>
    </allow>
  </accessPolicy>
  <archived>false</archived>
  <dateUploaded>2019-07-05T18:00:42.337+00:00</dateUploaded>
  <dateSysMetadataModified>2019-07-05T18:00:42.337+00:00</dateSysMetadataModified>
  <originMemberNode>urn:node:mnTestKNB</originMemberNode>
  <authoritativeMemberNode>urn:node:mnTestKNB</authoritativeMemberNode>
</d1:systemMetadata>
Each user in the DataONE network can be identified by a Subject string which links to that user's authenticated identity, such as an ORCID identifier. These Subject strings can then be used to grant access to objects within the system.
Note how the object is owned by the ORCID in the rightsHolder field; that user can always do anything to the object. Additional rules are specified in accessPolicy. For each user, you can grant read, write, and changePermission permissions. Note that the special user public has been granted read, which makes the document globally readable by all users, even if they are not logged in. If you want to allow another account to make changes, you could add an allow rule to grant their ORCID the write permission in the SystemMetadata document, and then update that using the updateSystemMetadata API.
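Adding an allow rule amounts to inserting one more `<allow>` element inside `<accessPolicy>`. The sketch below shows one way to do that from the shell with awk. The function name `add_allow_rule` is hypothetical, and it assumes the closing `</accessPolicy>` tag sits on its own line, as in the example document above, and that the subject contains no characters that need XML escaping.

```shell
# Hypothetical helper: print a copy of a SystemMetadata file with one extra
# <allow> rule inserted just before the closing </accessPolicy> tag.
# Assumes </accessPolicy> is on a line of its own.
# Usage: add_allow_rule FILE SUBJECT PERMISSION
add_allow_rule() {
  awk -v s="$2" -v p="$3" '
    /<\/accessPolicy>/ {
      print "<allow>"
      print "<subject>" s "</subject>"
      print "<permission>" p "</permission>"
      print "</allow>"
    }
    { print }
  ' "$1"
}
```

For example, `add_allow_rule sysmeta.xml "$COLLABORATOR_ORCID" write > sysmeta-new.xml` (file and variable names are placeholders) produces a document that can then be submitted with updateSystemMetadata.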
Method Details and Examples
Get an Object
Each object on DataONE has a persistent identifier (PID), which can be used to get the bytes of the object. Note that PIDs must be escaped using URL escaping conventions if they contain characters that are normally reserved in URLs. For example, a DOI such as doi:10.5063/F1HT2M7Q is a PID which would need to be escaped to doi:10.5063%2FF1HT2M7Q when used in a URL.
ENDPOINT="<%=apiBaseUrl%>"
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/object/doi:10.5063%2FF1HT2M7Q"
If no credentials are provided in the request, then the results will only include publicly accessible content. To view private content, be sure to include an authentication token in the Authorization header as described above, or a valid X.509 certificate (e.g., in curl, use the --cert argument to provide the path to a certificate that was previously downloaded from CILogon).
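The PID escaping described at the start of this section can be scripted rather than done by hand. This is a minimal POSIX shell sketch; the function name `urlencode_pid` is hypothetical, it is intended for ASCII identifiers such as DOIs and UUID URNs, and it deliberately leaves the colon unescaped to match the example above.

```shell
# Hypothetical helper: percent-encode a PID for use as one URL path segment.
# Letters, digits, and - . _ ~ : are kept; every other byte becomes %XX.
urlencode_pid() (
  LC_ALL=C
  pid=$1
  out=
  while [ -n "$pid" ]; do
    rest=${pid#?}          # everything after the first character
    c=${pid%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_:-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    pid=$rest
  done
  printf '%s\n' "$out"
)
```

It could be used inline in a request, for example `curl -H "Accept: text/xml" "${ENDPOINT}/object/$(urlencode_pid 'doi:10.5063/F1HT2M7Q')"`.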
Create an Object
An object can be inserted into the repository using the create API call, which involves POSTing the object to the object collection. Required parameters include the pid to be used for the object, the bytes of the object itself, and an XML SystemMetadata (sysmeta) document describing core metadata properties about the object, including who owns it, its format, etc.
export TOKEN='eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJodHRwOlwvXC9vcmNp...'
curl -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Charset: utf-8" \
-H "Content-Type: multipart/mixed; boundary=----------4A2D135C-52CC-017FC-B269-B711ED211576_$" \
-H "Accept: text/xml" \
-F pid=urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a \
-F object=@mydatafile.csv \
-F sysmeta=@mysystemmetadata.xml \
"${ENDPOINT}/object"
Update an Object
An object can be updated in the repository using the update API call, which involves PUTing the object to the object collection. Required parameters include the newPid to be used for the object, the bytes of the object itself, and an XML SystemMetadata (sysmeta) document describing core metadata properties about the object, including who owns it, its format, etc. Note that this operation occurs against the original object by including its pid in the REST URL.
curl -X PUT \
-H "Authorization: Bearer $TOKEN" \
-H "Charset: utf-8" \
-H "Content-Type: multipart/mixed; boundary=----------4A2D135C-52CC-017FC-B269-B711ED211576_$" \
-H "Accept: text/xml" \
-F newPid=urn:uuid:21865616-8b0d-11e3-a31f-00334b2a1a0a \
-F object=@mydatafile.csv \
-F sysmeta=@mysystemmetadata.xml \
"${ENDPOINT}/object/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
Archive an Object
An object can be archived, which moves it out of the search path so it won't be discovered, but it remains accessible to users who know the pid of the object so that citations remain viable. To archive an object, call the archive service using an HTTP PUT with the pid in the service endpoint.
curl -X PUT \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: text/xml" \
"${ENDPOINT}/archive/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
Get System Metadata for an Object
Use getSystemMetadata to access the SystemMetadata for an object, which represents critical information about each object on the repository, including its identifier, its type, access control policies, replication policies, and other details like size and checksum.
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/meta/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
Update System Metadata for an Object
Use updateSystemMetadata to update the SystemMetadata for an object, which represents critical information about each object on the repository, including its identifier, its type, access control policies, replication policies, and other details like size and checksum.
curl -X PUT \
-H "Authorization: Bearer $TOKEN" \
-H "Charset: utf-8" \
-H "Content-Type: multipart/mixed; boundary=----------4A2D135C-52CC-017FC-B269-B711ED211576_$" \
-H "Accept: text/xml" \
-F pid=urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a \
-F sysmeta=@mysystemmetadata.xml \
"${ENDPOINT}/meta/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
Generate an Identifier
Creating an object on the repository requires submitting it with a globally unique identifier, which can be generated by calling the generateIdentifier service. This service can be used to generate identifiers that are UUIDs, DOIs, and that potentially follow other syntax conventions. The scheme parameter controls which type of identifier should be generated. Generally, the use of UUIDs is encouraged for fine-grained identification of individual files within a data package, and the use of DOIs for the identifier for the metadata record for an overall data package.
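A script that chains generateIdentifier into create needs to pull the new identifier out of the XML response. The following POSIX shell sketch shows one way; the function names are hypothetical, and it assumes the response body is a single `identifier` element (possibly namespace-prefixed) whose text is the new PID. A real XML parser would be more robust than sed.

```shell
# Hypothetical helper: read an XML response on stdin and print the text
# content of its identifier element (assumed to fit on one line).
extract_identifier() {
  sed -n 's/.*<[^>]*identifier[^>]*>\([^<]*\)<\/.*/\1/p' | head -n 1
}

# Hypothetical wrapper: request a new identifier and print it.
# Expects ENDPOINT and TOKEN to be set; the scheme defaults to UUID.
generate_pid() {
  curl -s -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -F scheme="${1:-UUID}" \
    "${ENDPOINT}/generate" | extract_identifier
}
```

With these defined, `PID="$(generate_pid UUID)"` would capture the new identifier for use as the pid parameter of a later create call.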
curl -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: text/xml" \
-F scheme=UUID \
"${ENDPOINT}/generate"
Search the metadata index
To search across all of the metadata in the repository, use the query service to configure a SOLR query. The full SOLR syntax is supported, providing the means to create complex logical query conditions, and to customize the metadata fields returned. Query results can be returned in xml and json formats. Paging through results is supported using the rows and start parameters. To search only the most recent version of the metadata, include the -obsoletedBy:* constraint in the SOLR query. Note that all SOLR queries must be properly URL-escaped and SOLR-escaped to be processed correctly (e.g., spaces in the SOLR query need to be escaped with a '+' or '%20', and colons in a SOLR query value need to be preceded by a backslash). In addition, to run these commands from curl, shell escapes will also need to be added as appropriate (e.g., by quoting strings).
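Assembling the query URL by hand is error-prone once paging is involved, so a small helper can build it from its parts. This POSIX shell sketch uses the hypothetical function names `solr_query_url` and `solr_page_urls`; it only replaces spaces with '+' and does not perform any other URL or SOLR escaping, which remains the caller's responsibility.

```shell
# Hypothetical helper: build a query-service URL from its parts.
# Spaces in the query are replaced with '+'; nothing else is escaped.
# Usage: solr_query_url ENDPOINT QUERY FIELDS ROWS START
solr_query_url() (
  q=$(printf '%s' "$2" | tr ' ' '+')
  printf '%s/query/solr/q=%s&fl=%s&rows=%s&start=%s&wt=xml\n' \
    "$1" "$q" "$3" "$4" "$5"
)

# Print the URLs for the first PAGES pages of ROWS results each.
# Usage: solr_page_urls ENDPOINT QUERY FIELDS ROWS PAGES
solr_page_urls() (
  page=0
  while [ "$page" -lt "$5" ]; do
    solr_query_url "$1" "$2" "$3" "$4" $(( page * $4 ))
    page=$(( page + 1 ))
  done
)
```

Each printed URL can be passed to curl in double quotes, for example `curl -H "Accept: text/xml" "$(solr_query_url "$ENDPOINT" 'title:soil AND -obsoletedBy:*' 'identifier,title,origin' 30 0)"`.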
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/query/solr/q=title:soil+AND+-obsoletedBy:*&fl=identifier,title,origin&rows=30&start=0&wt=xml"
The searchable SOLR fields that can be used to compose queries are accessible from the query service as well by accessing the endpoint without any query constraints.
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/query/solr"
Example: To retrieve the download/view counts of a particular object in the <%=title%>, set OBJECT_PID to the object's URL- and SOLR-escaped identifier and use this Solr query:
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/query/solr/q=id:${OBJECT_PID}&fl=read_count_i"
List Objects
The listObjects service provides a sequential list of objects on a node, and is minimally filterable. The query service generally contains more information and is preferred, but the object list can be useful to see recent activity on the repository.
curl -X GET -H "Accept: text/xml" "${ENDPOINT}/object?start=0&count=100"
Delete an Object
Delete is an administrative service that cannot be called by regular users. Contact an administrator for appropriate credentials. The delete service is provided to fully remove content from the repository, particularly when that content violates a law or ethical standard. When removing content for scientific reasons, archive is the proper method as it preserves citable links while still hiding content from search.
curl -X DELETE \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: text/xml" \
"${ENDPOINT}/object/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
DataONE Java Client Library
A helper library for calling the REST API using Java.
DataONE Python Client Library
A helper library for calling the REST API using Python.
DataONE R Package
An R package providing classes and methods for calling the API within R.
DataONE MATLAB library
A MATLAB package providing classes and methods for calling the API within Matlab.