Developer Tools
The KNB supports the DataONE REST API, and Java, Python, and R libraries for easily creating client tools.
DataONE Developer API
A REST API for accessing and contributing data.
The KNB supports the DataONE REST API for automating uploads, downloads, and searches of KNB data from scripting languages such as shell, R, Matlab, and Python. This guide is a brief synopsis of the DataONE API, which is documented more comprehensively in the DataONE Architecture Documentation (see also the development version of the architecture guide for future releases). Any software tool that supports the DataONE API (such as the rDataONE R package) can also interact seamlessly with KNB data. While DataONE maintains the full technical documentation on the API, the overview below covers the services most commonly used on the KNB.
Summary
DataONE distinguishes three classes of objects that it will store and manage: data objects, science metadata objects, and resource map documents. Each of these is uniquely identifiable by its persistent identifier (PID), and each has associated SystemMetadata that describes the object type, size, access rules, etc.
Data objects are treated as opaque blobs, and are retrievable via the get method given a persistent identifier (PID). Data objects can be represented in any format, but the repository encourages the use of non-proprietary, open formats such as CSV and netCDF that make good archival formats.
Science metadata objects are metadata documents such as EML, FGDC, ISO19115, and so forth that provide metadata describing some data object(s). These are represented in XML according to their respective schema.
Resource map objects are OAI-ORE documents that describe the aggregations of data and metadata into data packages. Individual data and metadata files can be uploaded to the repository, but to indicate that a set of files is part of an aggregated data package, you must provide an OAI-ORE resource map linking the objects.
All API access is over HTTPS, via the https://knb.ecoinformatics.org/knb/d1/mn/v1/ endpoint. The relative path prefix /v1/ indicates that we are currently using version 1 of the DataONE API. This will be updated with future API version releases.
The examples below show calls to the production KNB data repository REST endpoint (https://knb.ecoinformatics.org/knb/d1/mn/v1), but users should not create test data on the production KNB. Instead, please use a test Metacat server to explore the API and create test data (e.g., https://dev.nceas.ucsb.edu/knb/d1/mn/v1).
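One way to keep test uploads away from production is to check the target before any write request. The following is a minimal sketch, not part of the KNB tooling: the helper name check_write_target is hypothetical, and the two URLs are the ones quoted above.

```shell
# Production and test endpoints, as quoted in the text above.
PROD_ENDPOINT="https://knb.ecoinformatics.org/knb/d1/mn/v1"
TEST_ENDPOINT="https://dev.nceas.ucsb.edu/knb/d1/mn/v1"

# check_write_target (hypothetical helper): succeed only when the given
# endpoint is not the production repository.
check_write_target() {
  target="${1%/}"   # drop one trailing slash, if present
  if [ "$target" = "$PROD_ENDPOINT" ]; then
    echo "Refusing to write test data to production: $target" >&2
    return 1
  fi
  return 0
}

# Example: guard a write request (the curl call is left as a comment).
# check_write_target "$TEST_ENDPOINT" && curl -X POST ... "${TEST_ENDPOINT}/object"
```

A script that creates or updates objects could call the check once at startup and exit if it fails.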
Quick Reference
URL | Method | Example |
---|---|---|
/object/<pid> | GET | Get an Object |
/object | POST | Create an Object |
/object/<pid> | PUT | Update an Object |
/archive/<pid> | PUT | Archive an Object |
/meta/<pid> | GET | Get System Metadata for an Object |
/generate | POST | Generate an Identifier |
/query/solr/<query> | GET | Search the metadata index |
/object | GET | List objects |
/object/<pid> | DELETE | Delete an Object |
Request Format
- GET, HEAD, and DELETE requests only pass parameters as part of the URL. The parameter values must be converted to UTF-8 and appropriately escaped for incorporating into the URL.
- Message bodies (e.g., for POST and PUT requests) are encoded as MIME multipart/mixed messages (RFC 2046), and all information for creating the new object or resource is transmitted in the message body. Two types of content are used in these messages: parameters and files. Parameters are used for all simple types (such as a String value). Files are used for all complex types (such as an XML structure) and for octet streams.
Response Format
Version 1.0 of the DataONE services supports only XML serialization, and this format MUST be used when communicating with the KNB. Request and response documents MUST also be encoded using UTF-8.
Authentication and Authorization
Authentication is handled using SSL with client-side certificates (in X.509 format). Users can log into CILogon to download a certificate, which can then be included in requests. The Subject of the provided certificate will be used by the KNB to make all access control decisions for accessing, creating, updating, archiving, and deleting objects. If a client-side certificate is not provided, the user will be considered an anonymous public user and will only be able to access public content.
Each language or submission tool has its own mechanism for setting the client certificate in the SSL session. For example, with curl the certificate filename is passed on the command line: curl -X POST --cert /tmp/x509up_u502 ...
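A script that should work both with and without a certificate can make the --cert option conditional. This is a sketch under assumptions: the helper usable_cert and the KNB_CERT environment variable are hypothetical names introduced for illustration, and the default path is the one from the example above.

```shell
# usable_cert (hypothetical helper): print the certificate path only if the
# file exists and is readable; print nothing otherwise.
usable_cert() {
  if [ -n "${1:-}" ] && [ -r "$1" ]; then
    printf '%s' "$1"
  fi
}

# KNB_CERT is an assumed variable name; fall back to the path used above.
CERT="$(usable_cert "${KNB_CERT:-/tmp/x509up_u502}")"

# ${CERT:+...} expands to the --cert option only when CERT is non-empty, so
# the same command serves authenticated and anonymous requests:
# curl ${CERT:+--cert "$CERT"} -H "Accept: text/xml" "${ENDPOINT}/object/<pid>"
```

When no readable certificate is found, the request is sent without --cert and is treated as coming from the anonymous public user.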
The version of curl shipped by Apple on Mac OS X 10.9 and later is broken and does not support providing PEM certificates via the command line. Instead, it uses certificates registered in the system keychain, as described on the curl mailing list. As a result, calls to the KNB that require a certificate will fail with the standard Mac curl; this can be fixed by replacing it with the MacPorts version of curl, or by using a certificate converted to PKCS#12 format. A workaround for these issues is being explored, as the behavior differs in Mavericks and Yosemite.
Get an Object
Each object on DataONE has a persistent identifier (PID), which can be used to get the bytes of the object. Note that PIDs must be escaped using URL escaping conventions if they contain characters that are normally reserved in URLs. For example, a DOI such as doi:10.5063/F1HT2M7Q is a PID that would need to be escaped to doi:10.5063%2FF1HT2M7Q when used in a URL.
ENDPOINT="https://knb.ecoinformatics.org/knb/d1/mn/v1"
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/object/doi:10.5063%2FF1HT2M7Q"
If a certificate is not provided in the request, then the results will only include publicly accessible content. To view private content, be sure to include a valid X.509 certificate in the request (e.g., in curl, use the --cert argument to provide the path to a certificate that was previously downloaded from CILogon).
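The PID escaping used in the request above can be scripted. The sketch below is deliberately limited: the helper escape_pid is a hypothetical name, and it percent-encodes only a handful of characters (percent sign, space, slash, question mark, hash, ampersand, plus). It leaves colons untouched, matching the escaped DOI shown earlier. PIDs containing other reserved or non-ASCII characters would need a complete URL encoder.

```shell
# escape_pid (hypothetical helper): percent-encode a small set of characters
# that commonly appear in PIDs. The percent sign is handled first so that
# the escapes added by later rules are not encoded a second time.
escape_pid() {
  printf '%s' "$1" | sed \
    -e 's|%|%25|g' \
    -e 's| |%20|g' \
    -e 's|/|%2F|g' \
    -e 's|?|%3F|g' \
    -e 's|#|%23|g' \
    -e 's|&|%26|g' \
    -e 's|+|%2B|g'
}

# Example (request left as a comment):
# curl -X GET -H "Accept: text/xml" \
#   "${ENDPOINT}/object/$(escape_pid "doi:10.5063/F1HT2M7Q")"
```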
Create an Object
An object can be inserted into the repository using the create API call, which involves POSTing the object to the object collection. Required parameters include the pid to be used for the object, the bytes of the object itself, and an XML SystemMetadata (sysmeta) document describing core metadata properties about the object, including who owns it, its format, etc.
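Because SystemMetadata records details such as an object's size and checksum, it can help to compute those values from the file before writing the sysmeta document. The helpers below are a sketch: the function names are hypothetical, and MD5 is an assumption chosen for illustration, so use whichever checksum algorithm your sysmeta document declares.

```shell
# object_size (hypothetical helper): print the size of a file in bytes.
object_size() {
  wc -c < "$1" | tr -d ' '
}

# object_md5 (hypothetical helper): print the MD5 checksum of a file as
# lowercase hex. Uses md5sum where available (most Linux systems) and
# falls back to the BSD/macOS md5 command.
object_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum < "$1" | cut -d ' ' -f 1
  else
    md5 -q "$1"
  fi
}

# Example:
# echo "size=$(object_size mydatafile.csv) md5=$(object_md5 mydatafile.csv)"
```

With matching values recorded in the sysmeta file, the object can be submitted as in the request that follows.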
curl -X POST \
--cert /tmp/x509up_u501 \
-H "Charset: utf-8" \
-H "Content-Type: multipart/mixed; boundary=----------4A2D135C-52CC-017FC-B269-B711ED211576_$" \
-H "Accept: text/xml" \
-F pid=urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a \
-F object=@mydatafile.csv \
-F sysmeta=@mysystemmetadata.xml \
"${ENDPOINT}/object"
Update an Object
An object can be updated in the repository using the update API call, which involves PUTing the object to the object collection. Required parameters include the newPid to be used for the object, the bytes of the object itself, and an XML SystemMetadata (sysmeta) document describing core metadata properties about the object, including who owns it, its format, etc. Note that this operation occurs against the original object by including its pid in the REST URL.
curl -X PUT \
--cert /tmp/x509up_u501 \
-H "Charset: utf-8" \
-H "Content-Type: multipart/mixed; boundary=----------4A2D135C-52CC-017FC-B269-B711ED211576_$" \
-H "Accept: text/xml" \
-F newPid=urn:uuid:21865616-8b0d-11e3-a31f-00334b2a1a0a \
-F object=@mydatafile.csv \
-F sysmeta=@mysystemmetadata.xml \
"${ENDPOINT}/object/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
Archive an Object
An object can be archived, which moves it out of the search path so it won't be discovered. It remains accessible to users who know the pid of the object, so citations remain viable. To archive an object, call the archive service using an HTTP PUT with the pid in the service endpoint.
curl -X PUT \
--cert /tmp/x509up_u501 \
-H "Accept: text/xml" \
"${ENDPOINT}/archive/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
Get System Metadata for an Object
Use the getSystemMetadata service to access the SystemMetadata for an object, which represents critical information about each object on the repository, including its identifier, type, access control policies, replication policies, and other details such as size and checksum.
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/meta/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
Generate an Identifier
Creating an object on the repository requires submitting it with a globally unique identifier, which can be generated by calling the generateIdentifier service. This service can be used to generate identifiers that are UUIDs, DOIs, or that potentially follow other syntax conventions. The scheme parameter controls which type of identifier should be generated. Generally, the use of UUIDs is encouraged for fine-grained identification of individual files within a data package, and the use of DOIs for the identifier for the metadata record for an overall data package.
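Since responses are XML, a script usually needs to pull the new identifier out of the response before using it as a pid. The filter below is a sketch that assumes the response contains a single element whose tag name includes the word identifier. That response shape is an assumption, not something stated in this guide, and extract_identifier is a hypothetical helper name. A proper XML parser is the more reliable choice for anything beyond quick scripting.

```shell
# extract_identifier (hypothetical helper): read an XML response on stdin
# and print the text content of the first element whose opening tag
# contains the word "identifier".
extract_identifier() {
  sed -n 's|.*<[^>/]*identifier[^>]*>\([^<]*\)</.*|\1|p' | head -n 1
}

# Example, applied to the output of the generate request shown below
# (left as a comment; -s silences curl's progress output):
# NEW_PID="$(curl -s -X POST --cert /tmp/x509up_u501 -H "Accept: text/xml" \
#   -F scheme=UUID "${ENDPOINT}/generate" | extract_identifier)"
```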
curl -X POST \
--cert /tmp/x509up_u501 \
-H "Accept: text/xml" \
-F scheme=UUID \
"${ENDPOINT}/generate"
Search the metadata index
To search across all of the metadata in the repository, use the query service to run a SOLR query. The full SOLR syntax is supported, providing the means to create complex logical query conditions and to customize the metadata fields returned. Query results can be returned in xml and json formats. Paging through results is supported using the rows and start parameters. To search only the most recent version of the metadata, include the -obsoletedBy:* constraint in the SOLR query. Note that all SOLR queries must be properly URL-escaped and SOLR-escaped to be processed correctly (e.g., spaces in the SOLR query need to be escaped with a '+' or '%20', and colons in a SOLR query value need to be preceded by a backslash). In addition, to run these commands from curl, shell escapes will also need to be added as appropriate (e.g., by quoting strings).
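Paging with rows and start amounts to requesting successive windows of the result set. The sketch below prints one query URL per page. The helper page_urls is a hypothetical name, and the total result count is supplied by the caller (SOLR reports it as numFound in its responses). The query string is expected to be URL-escaped already.

```shell
# page_urls (hypothetical helper): print one query URL per page.
#   $1 = endpoint
#   $2 = URL-escaped SOLR parameters, without rows and start
#   $3 = total number of results to cover
#   $4 = rows per page (must be greater than zero)
page_urls() {
  [ "$4" -gt 0 ] || return 1
  page_start=0
  while [ "$page_start" -lt "$3" ]; do
    printf '%s/query/solr/%s&rows=%s&start=%s\n' "$1" "$2" "$4" "$page_start"
    page_start=$((page_start + $4))
  done
}

# Example: fetch 70 results in pages of 30 (requests left as a comment).
# page_urls "$ENDPOINT" "q=title:soil+AND+-obsoletedBy:*&wt=xml" 70 30 |
#   while read -r url; do curl -s -H "Accept: text/xml" "$url"; done
```

For 70 results and 30 rows per page, this yields three URLs with start values of 0, 30, and 60.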
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/query/solr/q=title:soil+AND+-obsoletedBy:*&fl=identifier,title,origin&rows=30&start=0&wt=xml"
curl -X GET \
-H "Accept: text/xml" \
"${ENDPOINT}/query/solr"
List Objects
The listObjects service provides a sequential list of objects on a node, and is minimally filterable. The query service generally contains more information and is preferred, but the object list can be useful to see recent activity on the repository.
curl -X GET -H "Accept: text/xml" "${ENDPOINT}/object?start=0&count=100"
Delete an Object
Delete is an administrative service that cannot be called by users. Contact an administrator for appropriate credentials. The delete service is provided to fully remove content from the repository, particularly when that content violates a law or ethical standard. When removing content for scientific reasons, archive is the proper method, as it preserves citable links while still hiding content from search.
curl -X DELETE \
--cert /tmp/x509up_u501 \
-H "Accept: text/xml" \
"${ENDPOINT}/object/urn:uuid:56eafcec-8b0a-11e3-a5e8-00334b2a1a0a"
DataONE Java Client Library
A helper library for calling the REST API using Java.
DataONE Python Client Library
A helper library for calling the REST API using Python.
DataONE R Package
An R package providing classes and methods for calling the API within R.