Overview of Coordinating Node Software
======================================
TODO: Describe software, installation and upgrade at a high level
Overview of Upgrading a Coordinating Node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An upgrade procedure has several goals.

High-level goals:

1) Update all Coordinating Nodes (CNs) to the same software release level
2) Keep the production CN environment running and responding to requests at all times
3) Ensure consistent data responses to end users
Details of goals:

1) Do not have different versions of products running on CNs
   that communicate with one another

   a) Prevent incompatible data structures (schema changes) from
      being accessed in an environment
   b) Note that we currently try not to remove existing data structures

      i) DataONE may add new data structures
      ii) DataONE may modify existing ones
      iii) We must support previous revisions of DataONE data structures

   c) Restrict access to incompatible software stacks (e.g. HZ 1.x --> 2.x)
   d) Password changes in service deployments, etc. (broken comms)?
2) Always have at least one CN up and running (no downtime)

   a) Always have read services up. By 'no downtime' we mean that
      Member Nodes and clients can still make at least minimal use
      of Coordinating Node services.
   b) To avoid interfering with Member Nodes, at a minimum we should be
      able to call 'reserveIdentifier' as a write function.
   c) ONEMercury always up: read access with authorization ('GET' calls)
   d) Access to the CNCore API (with the exception of CNCore.create(),
      CNCore.setObsoletedBy(), CNCore.delete(),
      CNCore.archive() and CNCore.registerSystemMetadata())
   e) Access to the CNRead API
3) Do not allow a situation in which a user experiences inconsistent
   data retrieval

   a) A user should not see a different UI when two CNs are running
      and round-robin (RR) DNS switches between them
   b) If a user discovers a PID on a CN, it should not 'disappear'
      for hours because of an upgrade process
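
The availability rules in goal 2 can be read as an allow-list check applied to each incoming request while an upgrade is in progress. The following is only a minimal sketch of that idea: the function name, the string-based API and method arguments, and the choice to reject any API not mentioned in the goals are assumptions made for illustration, not part of the CN software.

```python
# Hypothetical sketch: decide whether a CN API call stays available
# while other CNs are being upgraded. The blocked CNCore methods are
# the exceptions listed under goal 2; everything else here is assumed.
BLOCKED_CNCORE_METHODS = frozenset({
    "create",
    "setObsoletedBy",
    "delete",
    "archive",
    "registerSystemMetadata",
})


def is_allowed_during_upgrade(api: str, method: str) -> bool:
    """Return True if the call should remain available during an upgrade."""
    if api == "CNRead":
        # Read services are always kept up.
        return True
    if api == "CNCore":
        # CNCore stays available except for the listed write operations;
        # reserveIdentifier is therefore still permitted.
        return method not in BLOCKED_CNCORE_METHODS
    # Assumption: APIs not named in the goals are rejected during an upgrade.
    return False
```

Under these assumptions, ``is_allowed_during_upgrade("CNCore", "reserveIdentifier")`` is true and ``is_allowed_during_upgrade("CNCore", "create")`` is false.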
Issues of Upgrading a Read-Only Coordinating Node Stack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running the CN REST service in a read-only mode that still permits the
reserveIdentifier operation may violate goal 3. If LDAP needs to be
upgraded in a manner that causes incompatibility between different
versions of the CN, then the CNs will need to be isolated from one
another until all upgrades are complete. A production CN that remains
exposed to the DataONE community while the other CNs are upgraded
may therefore receive reservations that the newly upgraded CNs will
not have when they go live and take the place of the previous
production CN. It seems impractical to require that LDAP always be
backwards compatible and able to maintain replication during upgrades
so that all CNs have all written data at the time of a switchover.
We may wish to consider keeping a journalling system of posted
reservations (independent of LDAP) on a public-facing CN during
upgrades, which would create a replayable log of reserveIdentifier
actions in order to ensure a consistent user access experience.
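
One way to picture such a journal is an append-only file of reservation records that is replayed against the upgraded CNs after the switchover. The sketch below assumes a JSON-lines file and a caller-supplied ``reserve`` callback; the class name, the record fields and the file format are all hypothetical and are not taken from any existing CN component.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterator


class ReservationJournal:
    """Hypothetical append-only log of reserveIdentifier actions."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def record(self, pid: str, subject: str) -> None:
        """Append one reservation to the journal (one JSON object per line)."""
        entry = {"pid": pid, "subject": subject, "ts": time.time()}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def entries(self) -> Iterator[dict]:
        """Yield journal entries in the order they were recorded."""
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)

    def replay(self, reserve: Callable[[str, str], None]) -> int:
        """Call reserve(pid, subject) for every entry; return the count."""
        count = 0
        for entry in self.entries():
            reserve(entry["pid"], entry["subject"])
            count += 1
        return count
```

In this sketch the public-facing CN would call ``record()`` for each accepted reservation, and ``replay()`` would be run against the upgraded CNs once they go live. The callback would probably need to tolerate reservations that already exist, so that a replay can be repeated safely.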