Authorization and Authentication in DataONE =========================================== Authorization vs. Authentication: A Primer ------------------------------------------- The process of confirming whether a user has privileges to access a resource or use a service is called *authorization*. *Authentication,* on the other hand, is the process of determining whether or not a user is who they say they are. Both are required of a security architecture to ensure that the right people have the right access to resources and services. Authorization is achieved through the association of usernames (Subjects) and permissions with the resources and services being secured. Typically, this is done using access control lists (ACL). When a request is made, the identity of the user is looked up in the ACL, and the appropriate action is taken based on the user's permissions. DataONE uses Subjects contained in a resource's SystemMetadata, as well as Subjects in the Authoritative Member Node's Node document as the ACL for the resource when making authorization decisions. The latter is used primarily for administrative actions and to secure services. In authentication, the user provides their username along with other information that gives assurances that they are who they say they are. Typical computer logon accounts are examples of authentication, where the password serves as the information used to assure a user's identity. Username-password systems over the internet need to be a bit more complicated than that, in that even the username and password have to be secured before sending them to the remote server. That is, the user needs to authenticate the remote server and encrypt her confidential information before sending it. X.509 has emerged as the de-facto standard used to do this, and is what DataONE uses for authentication. X.509 Authentication -------------------- X.509 is a public infrastructure that provides for a way to trust newly-encountered entities through a strict chain-of-trust system. It works though a public key infrastructure where trusted third parties known as Certificate Authorities (CA) issue certificates to entities that they can send to end-users and use for encrypted communication. Through chain-of-trust, if the issuing CA (who's identity is contained in the certificate sent to the end-user) is trusted by the end-user, then the end-user trusts the entity sending them the certificate. Major internet browsers come pre-packaged with a set of CA certificates from well-established and reputable CAs. Certificates signed by one of these CAs can be referred to as "commercially-signed" certificates. For example, VeriSign and Thawte are two well-known CAs. Imagine a bank purchases a certificate from VeriSign to use in online transactions with customers. When customers connect to the bank's web-site, their browser receives the bank's certificate, and traces the signing chain, finding VeriSign as the signer. If it finds the VeriSign certificate in its local trusted CA list, then it trusts that the certificate it just received is the bank's, and can authenticate the connection. Otherwise, authentication fails, and the web page is not loaded. (At this point, some browsers appeal to the user that it doesn't trust the signer of the certificate, and asks the user if they should, by adding the signer to their list of trusted CAs.) Self-signed Certificates ------------------------ It's possible for organizations to create their own signing authority, and use those. These types of certificates are generally only useful for situations where trust can be established in other ways - in other words, where the client and the server know each other. Prime examples of this are certificates used by corporations for internal applications, where system administrators can install the certificate on behalf of users. DataONE uses this type of certificate to authenticate requests between Nodes in its network. DataONE Authentication ---------------------- In the above example, the end-user provides a username and password to authenticate themselves, while the web-server authenticates itself to the end-user using a certificate. This approach doesn't work in the distributed DataONE environment, where servers communicate with other servers, as well as end-users. Instead, DataONE relies on both end-users and servers (the MNs and CNs) to use these X.509 certificates to authenticate themselves, and relies on CILogon to provide certificates to end-users. The use of CILogon has two main advantages for end-users. First, they can use existing accounts to obtain certificates, so don't need to create and remember another username and password combination. Second, once they have downloaded the certificate, it will secure connections with all DataONE nodes throughout the day, and can be used by multiple DataONE applications. This technique is known as single-sign-on. CILogon certificates issued for DataONE also have a third feature: they include additional DataONE Subjects mapped to the certificate's Subject through DataONE's identity management service, the DataONE Portal. In a nutshell, a DataONE identity is the set of user accounts and groups that a person maintains. For more information on CILogon see their FAQ_. The DataONE landing page for CILogon is here_. .. _FAQ: http://www.cilogon.org/faq .. _here: https://cilogon.org/?skin=DataONE Member Node Certificates ------------------------ Member Nodes cannot not use CILogon certificates to make calls to other DataONE nodes (as they are short-lived), but rather they use long-lived X.509 certificates issued by DataONE when they register their node with the DataONE network. Note that this DataONE-signed certificate is only used for initiating requests, and is not used when responding to requests. In other words, it is used only when the Member Node is as acting as a client making requests. In this situation, the connection manager it uses for the request will receive a commercially-signed certificate from the other DataONE Node during the request handshake, and so no special trust needs to be set up. Note that the behavior of the "other DataONE Node" from above is the same behavior the Member Node needs when responding to DataONE service API requests. This certificate is known as the Member Node's *server* certificate. In short, Member Nodes (and Coordinating Nodes) acts both as a client and as a server. In its client role, the Member Node uses its DataONE issued and signed certificate, and needs to trust only commercially signed certificates. In its server role, it needs to accept CILogon-issued-commercially-signed certificates as well as DataONE signed certificates from requesters, and respond with a commercially-signed certificate of its own. Trust Relationships -------------------- Below illustrates the certificates used for making requests... =================== ======================== =========================== Client / Requester requests using request cert. type =================== ======================== =========================== End-user CILogon-signed cert. short-lived, commercial ------------------- ------------------------ --------------------------- Coordinating Node DataONE-signed cert. long-lived, non-commercial ------------------- ------------------------ --------------------------- Member Node DataONE-signed cert. long-lived, non-commercial =================== ======================== =========================== ... and the certificates given in response. =================== ========================== Server responds with =================== ========================== Coordinating Node commercially-signed cert ------------------- -------------------------- Member Node commercially-signed cert =================== ========================== Regarding Commercially-Signed Certificates ------------------------------------------- Client applications use client connection managers to set up the SSL connection that will exchange certificates, and most connection managers come configured with mostly the same set of CAs that they trust. However, the overlap is not complete, so Member Nodes should take extra care to test that their server certificate is widely trusted by all major browsers, (Java) JVMs, and OS-specific trust-stores, so that their data is most widely accessible.