filter
public boolean filter(org.dataone.service.types.v2.SystemMetadata sysmeta)
Algorithm:
Objects that are too old are filtered out first. For the remaining objects:
1. Fetch the Solr index document for the pid.
2. If there is no Solr index document for the pid:
2.1 If archive=true in the system metadata, return true (filter this pid out).
2.2 If archive=false, return false (keep the index task).
3. If there is a Solr index document, compare the modification dates of the Solr index and the system metadata:
3.1 If sysmeta > Solr, return false (keep the index task).
3.2 If sysmeta < Solr, return true (filter it out).
3.3 If sysmeta = Solr, compare the replica info:
3.3.1 If serialVersion is available in Solr, compare the serial versions:
3.3.1.1 If Solr = sysmeta, return true (filter it out), since the replica info has not changed.
3.3.1.2 If Solr < sysmeta, return false (keep the index task), since Solr has a smaller (older) serial version.
3.3.1.3 If Solr > sysmeta, return true (filter it out), since Solr has a bigger (newer) serial version.
3.3.2 If serialVersion is not available in Solr, compare the replica lists:
3.3.2.1 If the replica info has not changed, return true (filter it out).
3.3.2.2 If the replica info has changed, return false (keep the index task).
If any exception occurs, the method returns false (keeps the index task) for safety.
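The decision steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class IndexTaskFilterSketch and the SysmetaView and SolrView holders are hypothetical stand-ins for the real system metadata and Solr document, the replica info is reduced to a set of node identifiers, and the initial age check and the Solr lookup (step 1) are left out because their details are not given here.

```java
import java.time.Instant;
import java.util.Set;

/** Hypothetical sketch of the documented decision logic; not the real class. */
final class IndexTaskFilterSketch {

    /** Hypothetical holder for the system metadata fields the decision uses. */
    static final class SysmetaView {
        final boolean archived;
        final Instant modified;
        final long serialVersion;
        final Set<String> replicas;

        SysmetaView(boolean archived, Instant modified, long serialVersion, Set<String> replicas) {
            this.archived = archived;
            this.modified = modified;
            this.serialVersion = serialVersion;
            this.replicas = replicas;
        }
    }

    /** Hypothetical holder for the Solr fields; serialVersion is null when not indexed. */
    static final class SolrView {
        final Instant modified;
        final Long serialVersion;
        final Set<String> replicas;

        SolrView(Instant modified, Long serialVersion, Set<String> replicas) {
            this.modified = modified;
            this.serialVersion = serialVersion;
            this.replicas = replicas;
        }
    }

    /**
     * Returns true to filter the index task out and false to keep it.
     * A null solr argument stands for "no Solr index document for the pid".
     */
    static boolean shouldFilter(SysmetaView sysmeta, SolrView solr) {
        try {
            if (solr == null) {
                // Steps 2.1 and 2.2: only archived objects are filtered out.
                return sysmeta.archived;
            }
            int cmp = sysmeta.modified.compareTo(solr.modified);
            if (cmp > 0) {
                return false; // Step 3.1: sysmeta is newer, keep the task.
            }
            if (cmp < 0) {
                return true; // Step 3.2: Solr is newer, filter it out.
            }
            if (solr.serialVersion != null) {
                // Step 3.3.1: filter out unless Solr has the older serial version.
                return solr.serialVersion >= sysmeta.serialVersion;
            }
            // Step 3.3.2: filter out only if the replica info is unchanged.
            return solr.replicas.equals(sysmeta.replicas);
        } catch (RuntimeException e) {
            // Any failure keeps the task, mirroring the "return false for safety" rule.
            return false;
        }
    }
}
```

With this sketch, an object that has no Solr document and archive=false yields false (the task is kept), while equal modification dates with an equal serial version yield true (the task is filtered out).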
- Parameters:
sysmeta
-
- Returns:
- true if the object does not need to be indexed (filter it out); false if the index task should be kept