
Replication Overview


DataONE provides replication services to satisfy data and metadata preservation needs and to enable fault tolerance and load balancing for data and metadata access. Tier 4 Member Nodes within the federation are set up to house replicas of content, and provide this service to other Member Nodes based on policy agreements. Replication is handled on a per-object basis within DataONE: the RightsHolder and/or the Authoritative Member Node controls the ReplicationPolicy for each object, which determines whether the object will be replicated. In addition, each Member Node decides whether it will accept replicas in general (by implementing the Tier 4 MNReplication API and setting replicate=true in its node registration), and can decline any individual request to replicate an object. Coordinating Nodes monitor the Types.ReplicationPolicy for each object in DataONE and ensure that the appropriate replication target nodes house an accurate replica of the object. Each replica of an object is recorded by the Coordinating Nodes, so a consumer wishing to retrieve the object can call CNRead.resolve() to list the replicas and MNRead.get() to retrieve any one of them from the network.
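
The short script below is a minimal sketch, not the official DataONE client, showing how a consumer might locate and fetch one replica. It assumes the v2 REST mappings of CNRead.resolve() (GET {cn}/resolve/{pid}) and MNRead.get() (GET {mn}/object/{pid}); the Coordinating Node base URL and the identifier are placeholders, and the requests library is used for brevity.

# Minimal sketch, not the official DataONE client: locate an object's replicas
# and fetch the bytes from one of them. The CN base URL and identifier are
# placeholders; the REST paths are assumed from the v2 REST API mappings of
# CNRead.resolve() and MNRead.get().
import urllib.parse
import xml.etree.ElementTree as ET

import requests

CN_BASE = "https://cn.dataone.org/cn/v2"      # placeholder Coordinating Node base URL
PID = "urn:uuid:example-object-identifier"    # placeholder object identifier

# CNRead.resolve(): ask the CN which Member Nodes currently hold replicas.
# resolve() answers with an ObjectLocationList (it may also issue a 303
# redirect to a preferred copy, which we deliberately do not follow here).
resolve_url = f"{CN_BASE}/resolve/{urllib.parse.quote(PID, safe='')}"
resp = requests.get(resolve_url, allow_redirects=False)
locations = ET.fromstring(resp.content)

# Each objectLocation entry carries the node identifier and a URL that maps to
# MNRead.get() on that replica-holding node; any replica will do.
for loc in locations.findall(".//{*}objectLocation"):
    node_id = loc.findtext("{*}nodeIdentifier")
    get_url = loc.findtext("{*}url")
    obj = requests.get(get_url)
    if obj.ok:
        print(f"retrieved {len(obj.content)} bytes from {node_id}")
        break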

Summary of the Replication Process

To fulfill the Types.ReplicationPolicy for each object, the CN schedules the object to be replicated to one of the Tier 4 MNs that are willing to host replicas. Replication is an asynchronous, multi-step process, so that objects that take more than a few seconds to copy over the network do not block other work. The process runs as follows:

1. The CN calls MNReplication.replicate() on the target MN, requesting that the MN replicate a particular object. The MN responds with an HTTP 200 if it is willing and able to attempt the replication and house the object, and the CN marks the replica request as REQUESTED. See Types.ReplicationStatus for the definition of the status values.

2. The target MN calls MNRead.getReplica() on the source MN to request the bytes of the object.

3. The target MN calls CNReplication.setReplicationStatus() to report whether the transfer COMPLETED or FAILED.

At this point the replication is finished. If the replication failed, the CN requests that the object be replicated elsewhere. If it succeeded, the CN periodically verifies the checksum of the replica held by the MN to confirm its validity.
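
The following sketch illustrates only the control flow and status transitions described above; the MemberNode class, node identifiers, and in-memory registry are toy stand-ins, not the DataONE REST protocol or any official client library.

# Toy, in-memory sketch of the replication control flow and status
# transitions only; everything here is an invented stand-in.
from enum import Enum


class ReplicationStatus(Enum):                # mirrors Types.ReplicationStatus values
    REQUESTED = "requested"
    COMPLETED = "completed"
    FAILED = "failed"


class MemberNode:
    """Stand-in for a Tier 4 MN holding object bytes keyed by identifier."""

    def __init__(self, node_id, willing=True):
        self.node_id = node_id
        self.willing = willing                # whether this node accepts replica requests
        self.store = {}

    def replicate(self, pid, source):         # stands in for MNReplication.replicate()
        return self.willing                   # True plays the role of an HTTP 200 response

    def get_replica(self, pid):               # stands in for MNRead.getReplica()
        return self.store[pid]


def run_replication_task(registry, pid, source, target):
    """Drive one replication task and record its status, as the CN would."""
    # 1) The CN asks the target MN to replicate the object; acceptance is
    #    recorded as REQUESTED.
    if not target.replicate(pid, source):
        registry[(pid, target.node_id)] = ReplicationStatus.FAILED
        return
    registry[(pid, target.node_id)] = ReplicationStatus.REQUESTED

    # 2) The target pulls the bytes from the source MN, then 3) the outcome is
    #    recorded (standing in for CNReplication.setReplicationStatus()).
    try:
        target.store[pid] = source.get_replica(pid)
        registry[(pid, target.node_id)] = ReplicationStatus.COMPLETED
    except KeyError:
        # A FAILED task leads the CN to schedule the object on another target.
        registry[(pid, target.node_id)] = ReplicationStatus.FAILED


# Example run: one object copied from a source MN to a willing target MN.
source_mn = MemberNode("urn:node:SOURCE")
target_mn = MemberNode("urn:node:TARGET")
source_mn.store["pid-1"] = b"object bytes"
registry = {}
run_replication_task(registry, "pid-1", source_mn, target_mn)
print(registry[("pid-1", "urn:node:TARGET")])  # ReplicationStatus.COMPLETED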

Object Replication Policy

The Types.ReplicationPolicy for an object defines whether replication should be attempted for the object and, if so, how many replicas should be maintained. It also permits specification of preferred and blocked nodes as potential replication targets.

If a ReplicationPolicy is provided in the System Metadata for an object, then that policy is followed precisely by the Coordinating Nodes when managing replication. In the absence of a defined ReplicationPolicy, DataONE will by default attempt to maintain two replicas of the object, provided the object's size is below a threshold that allows transfer over the network in a reasonable time. As network transfer capabilities among DataONE nodes improve, this threshold will be raised.
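
For example, the following ReplicationPolicy allows replication, asks that two replicas be maintained, prefers the KNB and PISCO Member Nodes as replication targets, and blocks replication to SOMEBADNODE: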

<replicationPolicy replicationAllowed="true" numberReplicas="2">
    <preferredMemberNode>urn:node:KNB</preferredMemberNode>
    <preferredMemberNode>urn:node:PISCO</preferredMemberNode>
    <blockedMemberNode>urn:node:SOMEBADNODE</blockedMemberNode>
</replicationPolicy>

Node Replication Policy

Nodes that wish to serve as replication targets, and thereby store replicas of data from around the network, set Types.Node.replicate to 'true' in their Types.Node description when registering the node. In addition, these nodes must support the Tier 4 MNReplication API so that the Coordinating Nodes can perform all necessary replication operations. Nodes can express constraints on object size, total replication space available, source nodes, and object format types by providing a Types.NodeReplicationPolicy as part of their Types.Node description. A node may choose to accept replicas only from certain peer nodes, may impose per-object or total allocated size limits, or may want to focus on being a replication target for domain-specific object formats.
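
For example, the following NodeReplicationPolicy accepts objects up to 500 MiB each, allocates 1 TiB of storage for replicas, and accepts content only from the KNB, ESA, and SANPARKS nodes and only in the listed metadata and data formats: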

<nodeReplicationPolicy>
    <maxObjectSize>524288000</maxObjectSize>
    <spaceAllocated>1099511627776</spaceAllocated>
    <allowedNode>urn:node:KNB</allowedNode>
    <allowedNode>urn:node:ESA</allowedNode>
    <allowedNode>urn:node:SANPARKS</allowedNode>
    <allowedObjectFormat>FGDC-STD-001.1-1999</allowedObjectFormat>
    <allowedObjectFormat>eml://ecoinformatics.org/eml-2.1.1</allowedObjectFormat>
    <allowedObjectFormat>text/csv</allowedObjectFormat>
</nodeReplicationPolicy>

Types.NodeReplicationPolicy.maxObjectSize indicates the maximum allowable size, in bytes, of an object to be replicated. Types.NodeReplicationPolicy.spaceAllocated sets an upper limit on the space used for replica storage on the node; once spaceAllocated has been reached, the Coordinating Nodes will no longer request that additional replicas be stored on that node. Types.NodeReplicationPolicy.allowedNode lists the source nodes whose content this node will accept for replication; if it is absent, content from any node is accepted. Types.NodeReplicationPolicy.allowedObjectFormat lists the object formats that may be replicated to this node; if it is absent, any object format may be replicated.
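
To make these field semantics concrete, here is a small sketch that evaluates whether a candidate replica would satisfy a NodeReplicationPolicy. It is not Coordinating Node code (see the note below: the CN does not yet enforce this policy); the policy document is a trimmed copy of the example above, and the candidate object values are invented for illustration.

# Sketch of the documented NodeReplicationPolicy semantics only -- per the note
# below, the CN does not yet enforce this policy. The policy XML is a trimmed
# copy of the example above; the candidate values at the bottom are invented.
import xml.etree.ElementTree as ET

POLICY_XML = """
<nodeReplicationPolicy>
    <maxObjectSize>524288000</maxObjectSize>
    <spaceAllocated>1099511627776</spaceAllocated>
    <allowedNode>urn:node:KNB</allowedNode>
    <allowedNode>urn:node:ESA</allowedNode>
    <allowedObjectFormat>text/csv</allowedObjectFormat>
</nodeReplicationPolicy>
"""


def accepts(policy_xml, source_node, format_id, size_bytes, space_used):
    """Return True if a replica of the described object fits the policy."""
    policy = ET.fromstring(policy_xml)
    max_size = policy.findtext("maxObjectSize")
    allocated = policy.findtext("spaceAllocated")
    allowed_nodes = [e.text for e in policy.findall("allowedNode")]
    allowed_formats = [e.text for e in policy.findall("allowedObjectFormat")]

    if max_size is not None and size_bytes > int(max_size):
        return False          # object exceeds maxObjectSize
    if allocated is not None and space_used + size_bytes > int(allocated):
        return False          # spaceAllocated would be exceeded; stop placing replicas here
    if allowed_nodes and source_node not in allowed_nodes:
        return False          # an absent allowedNode list would mean "any source node"
    if allowed_formats and format_id not in allowed_formats:
        return False          # an absent allowedObjectFormat list would mean "any format"
    return True


print(accepts(POLICY_XML, "urn:node:KNB", "text/csv", 10_000_000, 0))    # True
print(accepts(POLICY_XML, "urn:node:PISCO", "text/csv", 10_000_000, 0))  # False: source not allowed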

Note

Types.NodeReplicationPolicy is not currently implemented on the CN, and so is ignored when deciding which MN should be used for replication. In a future release, the CN scheduler will use the Types.NodeReplicationPolicy to limit the objects scheduled for replication to an MN; for now the information is not used.