Warning: These documents are under active
development and subject to change (version 2.1.0-beta).
The latest release documents are at:
https://purl.dataone.org/architecture
The process of confirming whether a user has privileges to access a resource or use a service is called authorization. Authentication, on the other hand, is the process of determining whether or not a user is who they say they are. Both are required of a security architecture to ensure that the right people have the right access to resources and services.
Authorization is achieved by associating usernames (Subjects) and permissions with the resources and services being secured. Typically, this is done using access control lists (ACLs). When a request is made, the identity of the user is looked up in the ACL, and the appropriate action is taken based on the user’s permissions. DataONE uses the Subjects contained in a resource’s SystemMetadata, as well as the Subjects in the Authoritative Member Node’s Node document, as the ACL for the resource when making authorization decisions. The latter is used primarily for administrative actions and to secure services.
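The ACL lookup described above can be sketched in a few lines of Python. This is an illustration only, not the DataONE implementation: the `Acl` type, the `is_authorized` function, the permission names, and the example Subjects are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical ACL shape: Subject string -> set of permissions granted to it.
Acl = Dict[str, Set[str]]


def is_authorized(acl: Acl, subject: str, permission: str) -> bool:
    """Return True if `subject` is listed in `acl` with `permission`."""
    return permission in acl.get(subject, set())


# Example ACL with made-up Subjects and permission names.
example_acl: Acl = {
    "CN=Alice,DC=example,DC=org": {"read", "write"},
    "CN=Bob,DC=example,DC=org": {"read"},
}
```

A Subject that does not appear in the ACL is simply denied, which matches the usual default-deny reading of an access control list.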
In authentication, the user provides their username along with other information that gives assurances that they are who they say they are. Typical computer logon accounts are examples of authentication, where the password serves as the information used to assure a user’s identity. Username-password systems used over the internet need to be more complicated, because even the username and password have to be secured before they are sent to the remote server. That is, the user needs to authenticate the remote server and encrypt their confidential information before sending it. X.509 has emerged as the de facto standard for doing this, and is what DataONE uses for authentication.
X.509 is a public key infrastructure standard that provides a way to trust newly encountered entities through a strict chain-of-trust system. Trusted third parties known as Certificate Authorities (CAs) issue certificates to entities, which those entities can send to end-users and use for encrypted communication. Through the chain of trust, if the issuing CA (whose identity is contained in the certificate sent to the end-user) is trusted by the end-user, then the end-user trusts the entity sending the certificate. Major internet browsers come pre-packaged with a set of CA certificates from well-established and reputable CAs. Certificates signed by one of these CAs can be referred to as “commercially-signed” certificates.
For example, VeriSign and Thawte are two well-known CAs. Imagine a bank purchases a certificate from VeriSign to use in online transactions with customers. When customers connect to the bank’s website, their browser receives the bank’s certificate and traces the signing chain, finding VeriSign as the signer. If the browser finds the VeriSign certificate in its local list of trusted CAs, then it trusts that the certificate it just received is the bank’s, and can authenticate the connection. Otherwise, authentication fails, and the web page is not loaded. (At this point, some browsers warn the user that they do not trust the signer of the certificate, and ask whether the signer should be added to the list of trusted CAs.)
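The chain-tracing step in this example can be illustrated with a toy model. The sketch below only follows issuer names up the chain until it reaches a trusted CA. Real X.509 validation also verifies cryptographic signatures, validity periods, and revocation, none of which is modelled here. The `Cert` class and `is_trusted` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Cert:
    subject: str  # who the certificate identifies
    issuer: str   # subject of the certificate that signed this one


def is_trusted(cert: Cert, known: Dict[str, Cert], trusted_cas: Set[str],
               max_depth: int = 10) -> bool:
    """Walk the issuer chain; succeed once an issuer is a trusted CA.

    Toy model only: no signatures or expiry dates are checked.
    """
    current = cert
    for _ in range(max_depth):
        if current.issuer in trusted_cas:
            return True
        parent = known.get(current.issuer)
        # Stop on an unknown issuer or on an untrusted self-signed certificate.
        if parent is None or parent.subject == current.subject:
            return False
        current = parent
    return False


# The bank's certificate is signed directly by a CA in the local trusted list.
bank = Cert(subject="bank.example.com", issuer="VeriSign")
bank_is_trusted = is_trusted(bank, known={}, trusted_cas={"VeriSign", "Thawte"})
```

With an empty trusted list, or with a signer the client has never heard of, the same call returns `False`, which corresponds to the failed authentication described above.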
It’s possible for organizations to create their own signing authority and issue certificates signed by it. These types of certificates are generally only useful for situations where trust can be established in other ways, in other words, where the client and the server know each other. Prime examples of this are certificates used by corporations for internal applications, where system administrators can install the certificate on behalf of users. DataONE uses this type of certificate to authenticate requests between Nodes in its network.
In the bank example above, the end-user provides a username and password to authenticate themselves, while the web server authenticates itself to the end-user using a certificate. This approach doesn’t work in the distributed DataONE environment, where servers communicate with other servers as well as with end-users. Instead, DataONE relies on both end-users and servers (the MNs and CNs) to use X.509 certificates to authenticate themselves, and relies on CILogon to provide certificates to end-users.
The use of CILogon has two main advantages for end-users. First, they can use existing accounts to obtain certificates, so they don’t need to create and remember another username and password combination. Second, once they have downloaded the certificate, it will secure connections with all DataONE nodes throughout the day, and can be used by multiple DataONE applications. This technique is known as single sign-on.
CILogon certificates issued for DataONE also have a third feature: they include additional DataONE Subjects mapped to the certificate’s Subject through DataONE’s identity management service, the DataONE Portal. In a nutshell, a DataONE identity is the set of user accounts and groups that a person maintains.
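One way to picture how the additional Subjects could be used is to treat a request as carrying a set of Subjects rather than a single one, and to authorize the request if any of them is permitted. The sketch below assumes exactly that. The function names, the ACL shape, and the example Subjects are hypothetical and are not taken from the DataONE API.

```python
from typing import Dict, Iterable, Set


def session_subjects(cert_subject: str, mapped_subjects: Iterable[str]) -> Set[str]:
    """Combine the certificate's own Subject with the Subjects mapped to it."""
    return {cert_subject, *mapped_subjects}


def any_authorized(subjects: Set[str], acl: Dict[str, Set[str]],
                   permission: str) -> bool:
    """Return True if at least one Subject in the set holds `permission`."""
    return any(permission in acl.get(s, set()) for s in subjects)


# Made-up example: a group the person belongs to has write access,
# while the certificate's own Subject is not listed in the ACL at all.
subjects = session_subjects(
    "CN=Alice A123,O=Example University",
    ["CN=alice,DC=example,DC=org", "CN=ecology-lab,DC=example,DC=org"],
)
acl = {"CN=ecology-lab,DC=example,DC=org": {"read", "write"}}
can_write = any_authorized(subjects, acl, "write")
```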
For more information on CILogon see their FAQ.
The DataONE landing page for CILogon is here.
Member Nodes cannot use CILogon certificates to make calls to other DataONE nodes (as they are short-lived). Instead, they use long-lived X.509 certificates issued by DataONE when they register their node with the DataONE network. Note that this DataONE-signed certificate is only used for initiating requests, and is not used when responding to requests. In other words, it is used only when the Member Node is acting as a client making requests. In this situation, the connection manager it uses for the request will receive a commercially-signed certificate from the other DataONE Node during the request handshake, and so no special trust needs to be set up.
Note that the behavior of the “other DataONE Node” described above is the same behavior the Member Node needs when responding to DataONE service API requests: it presents a commercially-signed certificate of its own. This certificate is known as the Member Node’s server certificate.
In short, Member Nodes (and Coordinating Nodes) act both as clients and as servers. In its client role, a Member Node uses its DataONE-issued and DataONE-signed certificate, and needs to trust only commercially-signed certificates. In its server role, it needs to accept CILogon-issued, commercially-signed certificates as well as DataONE-signed certificates from requesters, and respond with a commercially-signed certificate of its own.
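The two roles can be sketched with Python’s standard `ssl` module. This is a minimal illustration under stated assumptions, not a DataONE reference configuration: the function names and file-path parameters are hypothetical, and the use of `CERT_OPTIONAL` on the server side (so that requests without a client certificate are still accepted) is an assumption rather than something stated in the text.

```python
import ssl
from typing import Optional


def client_context(client_cert: Optional[str] = None,
                   client_key: Optional[str] = None) -> ssl.SSLContext:
    """Client role: trust the system CA store, present a client certificate."""
    # The default context verifies the server against the system trust store,
    # which is where commercially-signed CA certificates normally live.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if client_cert:
        # Hypothetical path to the long-lived, DataONE-signed certificate.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


def server_context(server_cert: Optional[str] = None,
                   server_key: Optional[str] = None,
                   client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server role: present a server certificate, verify client certificates."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Assumption: client certificates are verified when sent but not required.
    ctx.verify_mode = ssl.CERT_OPTIONAL
    if server_cert:
        # Hypothetical path to the commercially-signed server certificate.
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if client_ca_file:
        # Hypothetical bundle holding the CAs that sign accepted client certs.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

Calling either function without arguments builds a context with no certificate loaded, which is enough to inspect the verification settings. A real deployment would pass the paths to its own certificate, key, and CA bundle files.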
The following table shows the certificates used for making requests...
Client / Requester | requests using | request cert. type |
---|---|---|
End-user | CILogon-signed cert. | short-lived, commercial |
Coordinating Node | DataONE-signed cert. | long-lived, non-commercial |
Member Node | DataONE-signed cert. | long-lived, non-commercial |
... and the certificates given in response.
Server | responds with |
---|---|
Coordinating Node | commercially-signed cert |
Member Node | commercially-signed cert |
Client applications use client connection managers to set up the SSL connection over which certificates are exchanged, and most connection managers come configured to trust largely the same set of CAs. However, the overlap is not complete, so Member Nodes should take extra care to test that their server certificate is trusted by all major browsers, Java virtual machines (JVMs), and OS-specific trust stores, so that their data is as widely accessible as possible.
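As one small part of such testing, a script can attempt a TLS handshake against a node using the trust store of the machine it runs on. The sketch below uses only the Python standard library. The function name `check_server_trust` and its return values are hypothetical, and a result of `"trusted"` says nothing about browsers or JVMs, which keep their own trust stores and need to be tested separately.

```python
import socket
import ssl


def check_server_trust(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake with host:port.

    Returns "trusted" if the handshake succeeds against the local default
    trust store, "untrusted" if certificate verification fails, and
    "error" for any other connection or TLS problem.
    """
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError:
        # Raised when the chain or the hostname cannot be verified.
        return "untrusted"
    except OSError:
        # Covers refused connections, timeouts, DNS failures, other TLS errors.
        return "error"
```

A Member Node operator could call `check_server_trust("mn.example.org")`, where the hostname is a placeholder, from several machines with different operating systems to compare results.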