Metadata format registration
==============================

**Status: initial draft**

Overview
~~~~~~~~

While DataONE's architecture is designed to accommodate any metadata format that Member Nodes use, each new metadata format requires some development before DataONE's discovery mechanisms can handle documents in that format. This takes effort from both the Content Curator (usually a Member Node administrator) and DataONE developers. More significantly, a patch-level release of the Coordinating Node (CN) software stack must be performed so that content in the new format can be synchronized, indexed, and ultimately discovered.

Building, testing, and deploying the necessary items to the CNs introduces a lag between when the new format is published and when content using it can be successfully created. Content Curators adopting a new format, or a new version of an existing format, therefore need to account for that lag in their own planning.

Registering a new metadata format involves creating and testing the following items:

1. a **published schema or DTD** (done by the Content Curator)
2. an **indexing parser** (a DataONE developer responsibility)
3. an **XSLT template** (built by either party, depending on time and ability) // TODO: verify who's responsible

Once all three are available and tested, the format can be fully registered in DataONE as a new object format. When this work is part of a new Member Node deployment, plan for it to happen early, because final testing of the node requires that all objects use a registered format.

Metadata Format Registration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Whether or not it is part of a Member Node deployment, registering a metadata format follows the same steps.

Content Curators:

1. Develop and test the schema or DTD. It must pass standard schema validation tests, which numerous online testing services provide (search for "online XML schema validation").
2. Publish the schema so that the namespace and schemaLocation declared in metadata documents point to an immutable copy of the schema that will continue to resolve consistently indefinitely.
3. Contact DataONE via support@dataone.org, attaching example metadata documents or providing a link to a test instance of the Member Node that contains them.

DataONE developers:

4. Test the schema format against the examples, iterating with the Content Curator on any bug fixes.
5. Write an indexing parser and/or XSLT template.
6. Test the indexing parser and XSLT template in the DEV environment.
7. Review the test results with the Content Curator (show search results and metadata visualizations).
8. Deploy the indexing parser, XSLT templates, and new object format record to additional environments (STAGE and/or production). Currently, the XSLT template is handed off to the ONEMercury maintainers.
9. Notify the Content Curator when the work is done. The Content Curator can then start submitting metadata objects that use the new format.

// TODO: who names the object format (gives the identifier?)

As part of Member Node deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Deployment-phase testing of a Member Node requires every metadata format used by the prospective Member Node to be registered, so that the processes under test (synchronization, indexing, ONEMercury presentation) can be run. Because DataONE needs to build, test, and deploy items to the Coordinating Nodes, format registration should ideally start during the implementation/development phase of the Member Node onboarding process.

Specifically, the schema must be published and tested, and the object format registered in the target testing environment, before the Member Node itself can be tested. Without these, synchronization will fail, and the indexing and ONEMercury tests cannot be run.
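A mismatch between the namespace or schemaLocation in example documents and the published schema (step 2) is an easy mistake to make, and it is worth catching before testing begins. The following is a minimal sketch, using only the Python standard library, that a Content Curator might run over example documents before contacting DataONE: it checks that the root element uses the expected namespace and that ``xsi:schemaLocation`` maps that namespace to the published schema URL. It does not validate the document against the schema (the standard library has no XSD validator), and it does not cover DTD-based formats or ``xsi:noNamespaceSchemaLocation``. The function name and the ``example.org`` namespace and URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace that defines the xsi:schemaLocation attribute.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical values standing in for a real published schema (step 2).
EXPECTED_NS = "https://example.org/schemas/mymeta/1.0"
EXPECTED_LOCATION = "https://example.org/schemas/mymeta/1.0/mymeta.xsd"

# Hypothetical example metadata document that references the schema correctly.
EXAMPLE_DOC = f"""<mm:record xmlns:mm="{EXPECTED_NS}"
    xmlns:xsi="{XSI_NS}"
    xsi:schemaLocation="{EXPECTED_NS} {EXPECTED_LOCATION}"/>"""


def schema_reference_problems(xml_text, expected_ns, expected_location):
    """Return a list of problems with a document's schema reference.

    An empty list means the root element is in ``expected_ns`` and
    xsi:schemaLocation maps that namespace to ``expected_location``.
    """
    problems = []
    root = ET.fromstring(xml_text)

    # ElementTree reports a namespaced tag as "{namespace}localname".
    root_ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if root_ns != expected_ns:
        problems.append(f"root namespace is {root_ns!r}, expected {expected_ns!r}")

    # xsi:schemaLocation is a whitespace-separated list of
    # (namespace, location) pairs.
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    if len(tokens) % 2:
        problems.append("xsi:schemaLocation has an odd number of tokens")
    pairs = dict(zip(tokens[0::2], tokens[1::2]))

    location = pairs.get(expected_ns)
    if location is None:
        problems.append(f"xsi:schemaLocation has no entry for {expected_ns!r}")
    elif location != expected_location:
        problems.append(f"schemaLocation is {location!r}, expected {expected_location!r}")
    return problems


if __name__ == "__main__":
    # Prints [] because EXAMPLE_DOC references the expected schema.
    print(schema_reference_problems(EXAMPLE_DOC, EXPECTED_NS, EXPECTED_LOCATION))
```

Returning a list of problems rather than raising on the first one lets a curator fix every issue in a document in a single pass.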
Typically, the indexing parser and XSLT template are first deployed to the Coordinating Nodes of the DEV environment for testing by DataONE developers and then, if successful, deployed to the STAGE environment in preparation for registering the prospective Member Node there. Member Node implementers should work out the specific timing and target environments with their primary DataONE contact to optimize their development cycles.

Notes
~~~~~

The information extracted from metadata into the search index is described at http://mule1.dataone.org/ArchitectureDocs-current/design/SearchMetadata.html#values-extracted-from-science-metadata

Current effort estimate:

- 2 days of development, 2 days of testing (sandbox, staging), 1 day for the release, and 1 day for the ONEMercury upgrade (6 days in total).
- New versions of existing formats require less development and can be tested more quickly.
- What is the process for registering a data format?

Remaining issue
~~~~~~~~~~~~~~~

Because failed objects are difficult to re-synchronize, a Member Node depends on DataONE to register the data format before it can even start entering data onto its node. This seems like a backwards dependency that puts DataONE resources on the critical path of external projects.

Q. Is there a more graceful way to handle this situation?
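Whatever the answer to that question, a Member Node can at least confirm that a format is registered in the target environment before making objects that use it available for synchronization. The sketch below is a hypothetical illustration in Python (standard library only), not a DataONE tool and not a resolution of the question above. It reads a saved copy of a CN object format list (the CN REST API exposes this list, assumed here to be at https://cn.dataone.org/cn/v2/formats; the sample's exact shape is assumed and abbreviated) and splits staged objects into those whose metadata ``formatId`` is registered and those to hold back. The object identifiers and the ``mymeta`` format ID are made up.

```python
import xml.etree.ElementTree as ET

# Abbreviated, assumed shape of an ObjectFormatList saved from a CN;
# real responses carry more fields and many more formats.
SAMPLE_FORMAT_LIST = """<d1:objectFormatList
    xmlns:d1="http://ns.dataone.org/service/types/v2.0" count="2" start="0" total="2">
  <objectFormat>
    <formatId>eml://ecoinformatics.org/eml-2.1.1</formatId>
    <formatName>Ecological Metadata Language, version 2.1.1</formatName>
    <formatType>METADATA</formatType>
  </objectFormat>
  <objectFormat>
    <formatId>text/csv</formatId>
    <formatName>Comma-separated values</formatName>
    <formatType>DATA</formatType>
  </objectFormat>
</d1:objectFormatList>"""

# Hypothetical objects a Member Node has staged but not yet exposed.
PENDING = [
    ("urn:example:obj-1", "eml://ecoinformatics.org/eml-2.1.1"),
    ("urn:example:obj-2", "https://example.org/schemas/mymeta/1.0"),
]


def local_name(tag):
    """Strip an ElementTree "{namespace}" prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def registered_metadata_formats(format_list_xml):
    """Return the formatIds whose formatType is METADATA."""
    root = ET.fromstring(format_list_xml)
    ids = set()
    for fmt in root.iter():
        if local_name(fmt.tag) != "objectFormat":
            continue
        # Map child element names (namespace stripped) to their text.
        fields = {local_name(child.tag): (child.text or "").strip() for child in fmt}
        if fields.get("formatType") == "METADATA" and fields.get("formatId"):
            ids.add(fields["formatId"])
    return ids


def split_by_registration(pending, registered):
    """Split (identifier, formatId) pairs into (ready, held_back) lists."""
    ready, held = [], []
    for pid, format_id in pending:
        (ready if format_id in registered else held).append((pid, format_id))
    return ready, held


if __name__ == "__main__":
    registered = registered_metadata_formats(SAMPLE_FORMAT_LIST)
    ready, held = split_by_registration(PENDING, registered)
    print("ready:", ready)
    print("held back:", held)
```

Matching on local element names keeps the sketch independent of whichever API version's namespace the saved list uses.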