ECSO Improvements, 2016
=======================

Goal: Organize the work on ECSO and develop a plan for the next-phase evaluation run.

- When is the next P/R run? *TBD, once tasks have time estimates.*
- What changes to the ontology need to be done by then? *See below; responsible party in* **BOLD**.
- What changes to the algorithm need to be done by then? *TBD*, **Jim**

ECSO tasks
--------------

1. Problems exposed by the reasoner are resolved. *Short-term fix done, committed.*
    1. OBOE rule on DerivedUnit
        1. hasUnit exactly 1 (BaseUnit or PrefixedUnit)
        2. Short-term fix: remove this rule, because we are not performing conversions. **BEN** (done)
        3. Long-term fix: repair; needs expert advice. **JIM**
    2. Object properties on Primary Production Carbon Flux and Fixed Carbon Pool, e.g., RO_0001000 (derives from)
        1. Described: see notes, doc titled “ECSO History, Notes”
        2. Short-term fix: remove rules on these two classes. **MARGARET** (done)
        3. Long-term fix: we will want these rules corrected for searches; needs expert advice. **JIM**
2. Autogenerated classes:
    1. For carbon-flux-related classes, adapt or adopt Entity, MeasurementType, Characteristic into the main ontology.
        1. Add minimum acceptable annotations (rdfs:label, definition, definition_Source, Definition_Contributor (an ORCID)). Other desirable annotations: example_Of_Usage. **MARGARET, SARA**
        2. Subclassing: move to appropriate trees. **SARA, MARGARET, MARK**
3. Existing carbon-cycling-related classes
    1. Review candidates; add or subclass per notes. **SARA, MARGARET, MARK**
    2. Characteristic, entity, and standard for all. **MARGARET, MARK**
4. MsTMIP classes
    1. Confirm that all of Sophie’s new info has been incorporated. **MARGARET**
5. PATO qualities and OBOE characteristics
    1. Equate where appropriate. **JIM, MARGARET** (done, Ben)
6. Synonyms/alternate labels
    1. File of synonyms suggested by Wikipedia. **JIM**
    2. Confirm and incorporate. **BRYCE, MARGARET**

Annotation tasks
--------------

1. Reannotate test corpus F as needed, with new classes. **SARA, BRYCE, MARGARET**
    1. Describe the annotation process. **MARGARET**
    2. Dependencies: additions to ECSO (classes, synonyms, axioms)

Algorithm tasks
---------------

1. Create a script that uses the EML files to extract annotations. **Jim** Partly complete; need to reproduce what Ben was sending to the ESOR service.
2. Integrate a Named Entity Recognition (NER) algorithm into ESOR as a preprocessing step to the entity linking service. **Zhen** (done except debugging and evaluation)
3. Create an ontological representation of the dataset descriptions in EML to use as the input to ontology matching algorithms. **Booma and Jim** (done)
4. Investigate other suggested improvements from Jin Zheng. **Zhen**
5. Integrate and evaluate the following ontology matching algorithms: **Booma, Sabita, and Jim** (complete except for evaluation)
    1. AgreementMaker
    2. Custom cosine similarity and edit distance comparison matching
6. Improve the ontology generation capability around units. **Bryce and Jim**

Status at 6/15/2016
----------------

What we got done last week:

1. Create a script that uses the EML files to extract annotations. (COMPLETE)
2. Integrate and evaluate the following ontology matching algorithms: **Booma and Jim**
    1. AgreementMaker Lite
    2. Similarity Matcher (complete except for a weighting function to combine all of the different scoring strategies)
3. Found an NER that works well to integrate with Linkipedia.

Our goals for this coming week:

1. Finish writing the weighting function for SimilarityMatcher.
2. Write evaluation scripts for AgreementMaker Lite and SimilarityMatcher (and run them).
3. Integrate the named entity recogniser (NER) into the annotate web service on Linkipedia.
4. Improve the ontology generation capability around units. **Bryce and Jim**
5. Come up with a proposed RDF annotation format for ESOR to return.
    1. The repeatable attribute identifier should be something like `datatable_1/attribute_1`.

Present: Jim, Ben, Bryce, Booma, Sabita, Zhen

Existing roadmap: https://github.com/DataONEorg/sem-prov-ontologies/blob/master/observation/ECSO_tasks_20160601.md

Status at 6/22/2016
----------------

What we got done last week:

1. Finished writing the weighting function for SimilarityMatcher.

Our goals for this coming week:

1. Come up with a proposed RDF annotation format for ESOR to return.
    1. The repeatable attribute identifier should be something like `datatable_1/attribute_1`.
2. Write evaluation scripts for AgreementMaker Lite and SimilarityMatcher (and run them). (Sabita: I could start looking into this.)
3. Integrate the named entity recogniser (NER) into the annotate web service on Linkipedia (and evaluate using dataone_entity_linking_eml.py). (almost complete)
4. Improve the ontology generation capability around units. **Bryce and Jim**
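The repeatable attribute identifier proposed for the ESOR annotation format (`datatable_1/attribute_1`) can be illustrated with a minimal sketch. It assumes EML 2.x documents, where attributes sit under `dataset/dataTable/attributeList/attribute/attributeName` and child elements of the `eml:eml` root are unqualified. The function name `attribute_ids`, the 1-based numbering in document order, and the toy attribute names are hypothetical choices, not the agreed ESOR format.

```python
import xml.etree.ElementTree as ET


def attribute_ids(eml_xml: str) -> dict:
    """Map hypothetical 'datatable_N/attribute_M' ids to EML attribute names.

    Assumption: EML 2.x layout dataset/dataTable/attributeList/attribute,
    with unqualified child elements, so plain tag names match.
    Numbering is 1-based and follows document order (also an assumption).
    """
    root = ET.fromstring(eml_xml)
    ids = {}
    for t, table in enumerate(root.iter("dataTable"), start=1):
        attributes = table.findall("./attributeList/attribute")
        for a, attr in enumerate(attributes, start=1):
            name = attr.findtext("attributeName", default="").strip()
            ids[f"datatable_{t}/attribute_{a}"] = name
    return ids


# Toy EML fragment (hypothetical attribute names, not from the test corpus).
sample = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <dataTable>
      <attributeList>
        <attribute><attributeName>site</attributeName></attribute>
        <attribute><attributeName>npp</attributeName></attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml:eml>"""

print(attribute_ids(sample))
# {'datatable_1/attribute_1': 'site', 'datatable_1/attribute_2': 'npp'}
```

Because the index is positional, the identifier stays stable as long as table and attribute order in the EML file does not change; a real format might prefer an `id` attribute where the EML provides one.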
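The notes do not spell out how SimilarityMatcher's weighting function combines its scoring strategies. As a minimal sketch only, assuming the two strategies named in the algorithm tasks (cosine similarity and edit distance) are combined by a weighted average, the following shows one way to do it. The function names, whitespace tokenization, lowercasing, and the 0.5/0.5 default weights are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of lowercase bag-of-words vectors of two labels."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def combined_score(a: str, b: str, w_cos: float = 0.5, w_edit: float = 0.5) -> float:
    """Hypothetical weighting: weighted average of the two scores (weights must be positive)."""
    return (w_cos * cosine_similarity(a, b) + w_edit * edit_similarity(a, b)) / (w_cos + w_edit)


print(round(combined_score("carbon flux", "carbon fluxes"), 3))  # about 0.673
```

A weighted average keeps the combined score in [0, 1], so a single threshold can be tuned during evaluation; learned weights would be a natural next step once evaluation scripts exist.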