Overview of Coordinating Node Software

TODO: Describe software, installation and upgrade at a high level

Overview of Upgrading a Coordinating Node

During an upgrade procedure we have several goals to accomplish.

High-level goals:

  1. Update all Coordinating Nodes to the same software release level
  2. Keep the production CN environment running and responding to requests at all times
  3. Ensure consistent data responses to end users

Details of Goals

  1. Do not have different versions of products running on CNs communicating with one another

    1. Restrict incompatible data structures (schema changes) from being accessed in an environment
    2. Note that, at present, we try not to remove existing data structures: (i) DataONE may add new data structures; (ii) DataONE may modify existing ones; (iii) we must support previous revisions of DataONE data structures
    3. Restrict access to incompatible software stacks (e.g. HZ 1.x -> 2.x)
    4. Password changes in service deployments, etc. (broken comms)?
  2. Always have at least one CN up and running (no down time)

    1. Always have read services up. When we say ‘No down time’, we mean that MemberNodes and Clients will still be able to make at least minimal use of Coordinating Node services.
    2. To avoid interfering with MemberNodes, at a minimum we should be able to call ‘reserveIdentifier’ as a write function.
    3. ONEMercury is always up: read access with authorization (‘GET’ calls)
    4. Access to the CNCore API (with the exceptions of CNCore.create(), CNCore.setObsoletedBy(), CNCore.delete(), CNCore.archive() and CNCore.registerSystemMetadata())
    5. Access to CNRead API
  3. Do not allow a situation in which a user experiences data retrieval inconsistency

    1. A user should not see a different UI if two CNs are running and round-robin (RR) DNS switches between them
    2. If a user discovers a PID on a CN, then it should not ‘disappear’ for hours because of an upgrade process
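
The access rules under goal 2 can be read as a simple allow/deny policy. The following is a minimal sketch of that policy, not DataONE code: the function name `is_allowed_during_upgrade` and the constant `BLOCKED_CNCORE_METHODS` are hypothetical, and the default-deny rule for APIs not named in the goals is an assumption. It also assumes that reserveIdentifier is exposed through CNCore, so it stays available because it is not in the blocked set.

```python
# Hypothetical sketch; names are illustrative and not part of the CN software.

# CNCore methods listed in goal 2 as unavailable while a CN is being upgraded.
BLOCKED_CNCORE_METHODS = frozenset({
    "create",
    "setObsoletedBy",
    "delete",
    "archive",
    "registerSystemMetadata",
})


def is_allowed_during_upgrade(api: str, method: str) -> bool:
    """Return True if the call may be served while the CN is being upgraded."""
    if api == "CNRead":
        # All of CNRead stays available.
        return True
    if api == "CNCore":
        # CNCore stays available except for the blocked write methods.
        return method not in BLOCKED_CNCORE_METHODS
    # Assumption: APIs not named in the goals are denied by default.
    return False
```

A request filter in front of the CN services could consult such a function and reject blocked calls with a clear error, so that MemberNodes and Clients can tell a temporary upgrade restriction from a failure.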

Issues of Upgrading a Read-Only Coordinating Node Stack

Running the CN REST service in read-only mode while still allowing the reserveIdentifier operation may violate goal 3. If LDAP needs upgrading in a manner that causes incompatibility between different versions of the CN, then the CNs will need to be isolated from one another until all upgrades are complete. Thus a production CN that is exposed to the DataONE community while other CNs are upgraded may receive reservations that the newly upgraded CNs will not have when they go live and take the place of the previous production CN.

It seems impractical to state that LDAP will always be backwards compatible and able to maintain replication during upgrades, so that all CNs have all written data at the time of a switchover. We may wish to consider keeping a journalling system of posted reservations (independent of LDAP) on a public-facing CN during upgrades. This would create a replayable log of reserveIdentifier actions and so ensure a consistent user access experience.
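
One possible shape for such a journal is an append-only file of reservation records that is replayed against the upgraded CNs before switchover. The sketch below is only an illustration under stated assumptions: the class `ReservationJournal`, the JSON-lines file format, and the `subject`/`pid` fields are all hypothetical, and the `apply` callback stands in for whatever call actually registers a reservation on an upgraded CN.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterator


class ReservationJournal:
    """Append-only log of reserveIdentifier actions, one JSON object per line."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def record(self, subject: str, pid: str) -> None:
        """Append one reservation; existing entries are never rewritten."""
        entry = {"subject": subject, "pid": pid}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def entries(self) -> Iterator[Dict[str, str]]:
        """Yield recorded reservations in the order they were written."""
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)

    def replay(self, apply: Callable[[str, str], None]) -> int:
        """Call apply(subject, pid) for each entry; return the number replayed."""
        count = 0
        for entry in self.entries():
            apply(entry["subject"], entry["pid"])
            count += 1
        return count
```

Because a replay might be interrupted and restarted, the `apply` callback would ideally tolerate a reservation that already exists on the target CN rather than treating it as an error.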