Rough Draft of CN Audit SystemMetadata Design

Activity Diagrams

Figure 1. Audit Job Controller

../_images/01_activity.png

The Audit Job Controller is the start of the CN Audit SystemMetadata process. The CN Audit SystemMetadata process communicates between the different CN members of a cluster via Hazelcast topics. These topics are configured as the process begins. All the nodes that should be part of the cluster are gathered from the configuration file and saved in a data structure for ease of access and further description.

The nodes in the Hazelcast configuration are IPs. These IPs will be cross-referenced against the DataONE configuration in order to determine their NodeIDs. The DataONE configuration can be found online or in /etc/dataone.
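The cross-reference step can be sketched as a simple lookup. This is a minimal illustration only: the class name NodeIdResolver, the method resolve, and the idea that the DataONE configuration has already been parsed into an IP-to-NodeID map are all assumptions, not part of the design.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves Hazelcast member IPs to DataONE NodeIDs.
final class NodeIdResolver {

    /**
     * hazelcastIps: member IPs as listed in the Hazelcast configuration.
     * ipToNodeId:   assumed to be parsed beforehand from the DataONE configuration.
     * Returns the IP -> NodeID mapping in the order the IPs were configured.
     */
    static Map<String, String> resolve(List<String> hazelcastIps, Map<String, String> ipToNodeId) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String ip : hazelcastIps) {
            String nodeId = ipToNodeId.get(ip);
            if (nodeId == null) {
                // A member with no known NodeID cannot take part in the audit.
                throw new IllegalStateException("No NodeID configured for Hazelcast member " + ip);
            }
            resolved.put(ip, nodeId);
        }
        return resolved;
    }
}
```

Failing fast on an unknown IP is a design choice of this sketch; the source does not say how an unmatched member should be handled.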

/etc/dataone/d1DebConfig.xml

or

or

The active node of the cluster will need to be determined. The active node will be responsible for merging all documents and for keeping track of the progress of the passive nodes. The active node is determined by a property named cn.audit.activenode, set in a CNAudit.properties file. If the value in node.properties equals that of cn.audit.activenode, then the CN is considered the active node.
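A minimal sketch of the active-node check follows. The property cn.audit.activenode comes from the text above; the key read from node.properties is not named in this draft, so cn.nodeId below is a placeholder assumption, as are the class and method names.

```java
import java.util.Properties;

// Hypothetical helper: decides whether this CN is the active node.
final class ActiveNodeCheck {
    static final String ACTIVE_NODE_KEY = "cn.audit.activenode";
    // Placeholder key: the draft does not name the entry in node.properties.
    static final String LOCAL_NODE_KEY = "cn.nodeId";

    /** Returns true only when both values are present and equal after trimming. */
    static boolean isActiveNode(Properties nodeProps, Properties auditProps) {
        String local = nodeProps.getProperty(LOCAL_NODE_KEY);
        String active = auditProps.getProperty(ACTIVE_NODE_KEY);
        if (local == null || active == null) {
            return false;
        }
        return local.trim().equals(active.trim());
    }
}
```

In a real deployment both Properties objects would be loaded from their files with Properties.load; that step is omitted here.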

The Audit Job Controller will initialize the static state of the program. Since this is a multithreaded application, the state of a single running instance will need to be synchronized in order to avoid race conditions. Also, some of the static state will be persisted at the filesystem level; those files will need to be created and/or truncated.

There are several listeners and singletons that will need to be instantiated before the rest of the system is running. One or more of the singletons will manage a persistence layer for Java data objects. Listeners will be used to process commands from the AuditJob.

Any node that is passive will exit the Controller; only the active node proceeds beyond this point.

The Audit Job Controller will start a loop that checks whether all the nodes found in the Hazelcast configuration are active in the Hazelcast cluster. Once it is determined that all the nodes are active, the Audit Job will be started. Since the Controller for the active node runs in a loop, there will be a check to ascertain whether the audit job is actively running.
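The readiness check inside that loop amounts to a set comparison. The sketch below assumes the configured members and the currently live members are both available as sets of IP strings; how the live set is obtained from Hazelcast is deliberately left out, and the class and method names are illustrative.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares configured cluster members with live ones.
final class ClusterReadiness {

    /** Returns the configured members that are not currently live, sorted. */
    static Set<String> missingMembers(Set<String> configured, Set<String> live) {
        Set<String> missing = new TreeSet<>(configured);
        missing.removeAll(live);
        return missing;
    }

    /** The Audit Job may start only when no configured member is missing. */
    static boolean allActive(Set<String> configured, Set<String> live) {
        return missingMembers(configured, live).isEmpty();
    }
}
```

Returning the missing members, rather than only a boolean, makes it easy for the Controller to log which node it is still waiting for.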

The Audit Job will end either because all pids have been processed, because of a node failure, or because of an Audit Job failure.

If the Audit Job ended from all pids having been processed then the passive nodes will be sent a signal to end processing. The Audit Job Controller will then evict all systemMetadata from the Storage Cluster.

If the CN Audit process still has pids to process, then the reason for the Audit Job returning must be determined. If the Audit Job node ended with a Passive Node failure, then the failure reason must be checked and an appropriate state should be determined for continued processing.

If the Audit Job node ended from an Active Node failure, then it should be determined whether the error is recoverable. If it is recoverable without a reset, then the Audit Job may be continued from where it left off. If a reset is needed, then the failure reason should be checked and an appropriate state should be determined for continued processing.

If all the nodes are active, then all the nodes are sent the appropriate reset signal.
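One way to read the branching above is as a small decision function. The enum names and the boolean flag are assumptions made for this sketch, and the "inspect failure" outcome stands in for the state determination that the draft leaves unspecified.

```java
// Hypothetical decision table for what the Controller does after the Audit Job returns.
final class AuditJobOutcome {
    enum EndReason { ALL_PIDS_PROCESSED, PASSIVE_NODE_FAILURE, ACTIVE_NODE_FAILURE }
    enum NextStep { SIGNAL_END_AND_EVICT, RESUME_WHERE_LEFT_OFF, INSPECT_FAILURE_AND_RESET }

    static NextStep decide(EndReason reason, boolean recoverableWithoutReset) {
        switch (reason) {
            case ALL_PIDS_PROCESSED:
                // Passive nodes are told to stop, then systemMetadata is evicted.
                return NextStep.SIGNAL_END_AND_EVICT;
            case ACTIVE_NODE_FAILURE:
                return recoverableWithoutReset
                        ? NextStep.RESUME_WHERE_LEFT_OFF
                        : NextStep.INSPECT_FAILURE_AND_RESET;
            case PASSIVE_NODE_FAILURE:
            default:
                // The failure reason is checked before a state is chosen.
                return NextStep.INSPECT_FAILURE_AND_RESET;
        }
    }
}
```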

Figure 2. Audit Job

../_images/02_activity.png

The Audit Job will only run on the active node. The passive nodes and the active node in the cluster will react to the messages sent by the active node as it runs the Audit Job.

The first task of the Audit Job is to send a signal to all the nodes to harvest pids, SerialVersions, lastSystemMetadataModificationDate and deleted indicators from Metacat. The Audit Job will first receive from each node the number of pids to expect from it. Then the Audit Job waits until all data have been processed from each of the nodes (including itself).
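The "expect a count, then wait for that many" bookkeeping could look like the sketch below. HarvestTracker and its methods are hypothetical names, and the transport that delivers counts and pids (Hazelcast topics in this design) is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker: records how many pids each node announced and how many have arrived.
final class HarvestTracker {
    private final Map<String, Integer> expected = new HashMap<>();
    private final Map<String, Integer> received = new HashMap<>();

    /** Called when a node reports how many pids it is going to send. */
    synchronized void expect(String nodeId, int pidCount) {
        expected.put(nodeId, pidCount);
        received.putIfAbsent(nodeId, 0);
    }

    /** Called once for every pid entry received from a node. */
    synchronized void recordPid(String nodeId) {
        received.merge(nodeId, 1, Integer::sum);
    }

    /** True once every node that announced a count has delivered at least that many pids. */
    synchronized boolean isComplete() {
        if (expected.isEmpty()) {
            return false; // nothing announced yet, so nothing can be complete
        }
        for (Map.Entry<String, Integer> e : expected.entrySet()) {
            if (received.getOrDefault(e.getKey(), 0) < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

The methods are synchronized because listener threads and the Audit Job thread would touch the same counters.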

Once the Audit Job has a list of all pids with their corresponding state, it will loop through all the pids and process them. It compares the SerialVersion, lastSystemMetadataModificationDate and deleted indicator of the working pid across the nodes. It will determine if any of the instances have the deleted indicator set. If the deleted indicator is set, then a message must be sent to delete the object.

If the deleted indicator is not set, then the comparison investigates the serial version. If all the SerialVersions are the same, then the lastSystemMetadataModificationDate is compared. If any node’s serial version or lastSystemMetadataModificationDate differs from another node’s, then further processing must take place. Otherwise, a message is sent to all the nodes that processing of the working pid is complete.
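The per-pid comparison can be sketched as a function over the state each node reported. The types below are illustrative assumptions: the serial version is held as a long and the modification date as epoch milliseconds purely to keep the example short.

```java
import java.util.List;

// Hypothetical evaluation of one working pid across all nodes.
final class PidAudit {
    enum Outcome { DELETE, COMPLETE, NEEDS_MERGE }

    static final class NodePidState {
        final long serialVersion;
        final long lastModifiedMillis;
        final boolean deleted;

        NodePidState(long serialVersion, long lastModifiedMillis, boolean deleted) {
            this.serialVersion = serialVersion;
            this.lastModifiedMillis = lastModifiedMillis;
            this.deleted = deleted;
        }
    }

    static Outcome evaluate(List<NodePidState> states) {
        if (states.isEmpty()) {
            throw new IllegalArgumentException("no node reported this pid");
        }
        // A deleted indicator on any instance takes precedence.
        for (NodePidState s : states) {
            if (s.deleted) {
                return Outcome.DELETE;
            }
        }
        // Any difference in serial version or modification date requires a merge.
        NodePidState first = states.get(0);
        for (NodePidState s : states) {
            if (s.serialVersion != first.serialVersion
                    || s.lastModifiedMillis != first.lastModifiedMillis) {
                return Outcome.NEEDS_MERGE;
            }
        }
        return Outcome.COMPLETE;
    }
}
```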

If there is a discrepancy in either the serialVersion or lastSystemMetadataModificationDate, then SystemMetadata must be collected by the active node from all the nodes in the cluster. The AuditJob will send a Retrieve SystemMetadata signal. The active node will wait until all the nodes in the cluster have responded. The version of the SystemMetadata with the highest serial version, or, if the serial versions are the same, the latest lastSystemMetadataModificationDate, will be considered the ascendant version. Data from the other revisions will be merged into the ascendant version.

Once all the nodes have responded then the active node will merge the systemMetadata.

Once the systemMetadata is merged, the active node will send an update systemMetadata request. The update systemMetadata request will begin a transaction on all the listening nodes. The AuditJob will place the systemMetadata to be updated on a Hazelcast structure to be read by all the nodes. The AuditJob will wait until all the nodes have acknowledged that the systemMetadata has been updated.

After the AuditJob has confirmed that all the nodes have updated the SystemMetadata, the Audit Job will send a commit message. This commit message will complete the transaction that began with the update. The Audit Job will wait until all the listening nodes have responded with an acknowledgement that the transaction was committed successfully.
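The update-then-commit exchange behaves like a two-phase acknowledgement count. The sketch below tracks only which nodes have acknowledged each phase; the class name, phase names, and the choice to ignore duplicate or unknown acknowledgements are assumptions, and no message sending is modeled.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker for the update/commit acknowledgements of one pid.
final class UpdateTransaction {
    enum Phase { AWAITING_UPDATE_ACKS, AWAITING_COMMIT_ACKS, COMMITTED }

    private final Set<String> nodes;
    private final Set<String> pending;
    private Phase phase = Phase.AWAITING_UPDATE_ACKS;

    UpdateTransaction(Set<String> nodeIds) {
        if (nodeIds.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new HashSet<>(nodeIds);
        this.pending = new HashSet<>(nodeIds);
    }

    /** Records an acknowledgement and returns the phase the transaction is now in. */
    synchronized Phase acknowledge(String nodeId) {
        pending.remove(nodeId); // duplicates and unknown nodes have no effect
        if (pending.isEmpty() && phase == Phase.AWAITING_UPDATE_ACKS) {
            // All nodes updated: this is where the commit message would be sent.
            phase = Phase.AWAITING_COMMIT_ACKS;
            pending.addAll(nodes);
        } else if (pending.isEmpty() && phase == Phase.AWAITING_COMMIT_ACKS) {
            phase = Phase.COMMITTED;
        }
        return phase;
    }
}
```

Once acknowledge returns COMMITTED, the working pid can be moved to the completed list.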

The AuditJob can then move the WorkingPid to the list of pids that have been completed. The Audit Job will then pull the next pid off the active pid list and continue with the processing loop. Otherwise, if all the pids have been processed, the AuditJob will end.

Figure 2.1 Merge SystemMetadata

../_images/02-01_activity.png

SystemMetadata Merge Rule

The highest serial version establishes the ascendant revision of the SystemMetadata. Only if the serial versions are the same across the SystemMetadata instances across the cluster will the most recent dateSystemMetadataModified be used to determine the ascendant revision.

First, determine if the serialVersions are equal. If they are not, then the SystemMetadata record with the highest serial version becomes the ascendant revision of the SystemMetadata. If the serial versions are equal, then find the SystemMetadata record with the most recent dateSystemMetadataModified.
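The selection rule maps naturally onto a two-key comparator: serial version first, modification date as the tie-breaker. The Record class below is a stand-in for the real SystemMetadata type, and the long serial version is a simplification made for this sketch.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

// Hypothetical selector for the ascendant revision among the collected records.
final class AscendantSelector {

    static final class Record {
        final String nodeId;
        final long serialVersion;
        final Date dateSystemMetadataModified;

        Record(String nodeId, long serialVersion, Date dateSystemMetadataModified) {
            this.nodeId = nodeId;
            this.serialVersion = serialVersion;
            this.dateSystemMetadataModified = dateSystemMetadataModified;
        }
    }

    // Highest serial version wins; the date only matters when serial versions tie.
    static final Comparator<Record> ORDER =
            Comparator.comparingLong((Record r) -> r.serialVersion)
                      .thenComparing((Record r) -> r.dateSystemMetadataModified);

    /** Returns the ascendant record; throws NoSuchElementException on an empty list. */
    static Record ascendant(List<Record> records) {
        return Collections.max(records, ORDER);
    }
}
```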

Determine if any record has the archive flag set. If any record does have the archive flag set, then the ascendant revision must have its archive flag set.

Determine if any record has the obsoletes or obsoletedBy field set. If any record does have the obsoletes or obsoletedBy field set, then copy the value from either the obsoletes or obsoletedBy field to the ascendant revision.

Determine if any record has replication policy information set. On the ascendant revision, set replication allowed to true if it is true on any revision instance, and set the number of replicas to the highest number found on any revision instance. Merge the preferred member node lists and the blocked member node lists, so long as there are no conflicts between the preferred and blocked lists. If there are conflicts, use the ascendant revision’s lists. The preferred list must maintain its original order.
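The replication policy rule can be sketched as follows. The Policy class is a stand-in for the real replication policy type, and one interpretation is assumed: "maintain original order" is taken to mean the ascendant revision's preferred entries come first, in their existing order, with entries from other revisions appended after them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical merge of replication policies into the ascendant revision.
final class ReplicationPolicyMerge {

    static final class Policy {
        final boolean replicationAllowed;
        final int numberReplicas;
        final List<String> preferred;
        final List<String> blocked;

        Policy(boolean replicationAllowed, int numberReplicas,
               List<String> preferred, List<String> blocked) {
            this.replicationAllowed = replicationAllowed;
            this.numberReplicas = numberReplicas;
            this.preferred = new ArrayList<>(preferred);
            this.blocked = new ArrayList<>(blocked);
        }
    }

    static Policy merge(Policy ascendant, List<Policy> others) {
        boolean allowed = ascendant.replicationAllowed;
        int replicas = ascendant.numberReplicas;
        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> preferred = new LinkedHashSet<>(ascendant.preferred);
        Set<String> blocked = new LinkedHashSet<>(ascendant.blocked);
        for (Policy p : others) {
            allowed = allowed || p.replicationAllowed;
            replicas = Math.max(replicas, p.numberReplicas);
            preferred.addAll(p.preferred);
            blocked.addAll(p.blocked);
        }
        // A node appearing in both merged lists is a conflict.
        Set<String> conflicts = new HashSet<>(preferred);
        conflicts.retainAll(blocked);
        if (!conflicts.isEmpty()) {
            // On conflict, fall back to the ascendant revision's own lists.
            return new Policy(allowed, replicas, ascendant.preferred, ascendant.blocked);
        }
        return new Policy(allowed, replicas, new ArrayList<>(preferred), new ArrayList<>(blocked));
    }
}
```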

Figure 2.2 Audit Job Listener

../_images/02-02_activity.png

Figure 2.2.1 Audit Job Listener Harvest List

../_images/02-02-01_activity.png

Figure 2.2.1-a Audit Job Listener Process Temp Harvest List

../_images/02-02-01-a_activity.png

Figure 2.2.2 Audit Job Listener Get Pid, SerialVersion and Date

../_images/02-02-02_activity.png

Figure 2.2.3 Audit Job Listener Get SystemMetadata Record

../_images/02-02-03_activity.png

Figure 2.2.4 Process Update to SystemMetadata

../_images/02-02-04_activity.png

Figure 3 CN Audit Package Structures

../_images/01_class.png

Figure 3.1 CN Audit Control Package Structure

../_images/02_class.png

Figure 3.2 CN Audit Event Package Structure

../_images/03_class.png

Figure 3.3 CN Audit Strategy Package Structure

../_images/04_class.png

Figure 3.4 CN Audit Data Package Structure

../_images/05_class.png

Figure 3.4a CN Audit Hazelcast Data Package Structure

../_images/05_01_class.png

Figure 3.4b CN Audit Persistent Data Package Structure

../_images/05_02_class.png

Figure 3.5 CN Audit SQL Package Structure

../_images/06_class.png