Converting EML documents from v2.0.0/1 to v2.1.0 2.1. About the EML conversion stylesheet An XSL stylesheet, 'eml201to210.xsl' is provided to convert valid EML 2.0-series documents to EML 2.1.0. The intent of this reference to not to provide comprehensive instructions for transforming XML documents since adequate information is available elsewhere. A few specific examples will be given here, along with recommended practices and discussion of issues likely to occur when transforming EML 2.0 to 2.1. The reader is referred to XSLT references available at W3C schools (http://www.w3schools.com/xsl/), Wikipedia (http://www.wikipedia.org/), W3C (http://www.w3.org/Style/XSL/) and various books for more information about using XSLT to transform XML documents. The stylesheet performs these basic tasks: 1. Updates namespaces to eml-2.1.0 and stmml-1.1 2. Encloses XML markup within sections in tags 3. Renames elements and 4. Moves access trees: 1. if an access tree was a child of a dataset, citation, software or protocol, it will be moved to the eml-level. 2. if an access tree is found in an section which references a distribution node via a element, it will be moved to the corresponding distribution node. 5. Replaces the content of the "packageId" attribute (see the root element, ) when a parameter named "package-id" is passed to it. Without the parameter, the existing packageId content is retained. 2.2. Converting EML instances using the XSL stylesheet Software: 1. You will first need to identify or obtain a program (or programming language module) capable of transforming XML using an XSL stylesheet. Suitable programs are available for all computer platforms, and modules are included in or are available for most programming and scripting languages. For example, Linux distributions generally include a command line utility called "xsltproc". 2. Download the EML Utilities module which includes the transformation stylesheet "eml201to210.xsl" from http://knb.ecoinformatics.org/software/eml/. Examples: 1. Here are examples illustrating how to transform an EML document from the command line using xsltproc. In the first example, the redirect operator is used to save the output to a file. An explicit output argument can also be used instead of a redirect. xsltproc eml201to210.xsl eml201_doc.xml > eml210_doc.xml xsltproc -output eml210_doc.xml path/to/stylesheet/eml201to210.xsl eml201_doc.xml To include a new value for the packageId attribute, the stylesheet will accept a parameter called 'package-id' from the calling program: xsltproc -stringparam package-id knb-lter-sbc.1006.4 eml201to210.xsl eml201_doc.xml > eml210_doc.xml More complete documentation for the xsltproc program is available at http://xmlsoft.org/XSLT/xsltproc2.html, or with "man xsltproc" at your system prompt. 2. Methods in the Apache Xalan Java library (http://xml.apache.org/xalan-j/commandline.html) can also be used to run the transform from a Linux, Macintosh or Windows command line if a Java JRE (Java runtime environment) and the Xalan classes are installed. Use the ‘-IN’ parameter to specify the source document, ‘-XSL’ parameter to specify the stylesheet, and ‘-OUT’ parameter to specify the output filename, as follows: java org.apache.xalan.xslt.Process -IN eml201_doc.xml -XSL eml201to210.xsl -OUT eml210_doc.xml 3. Windows users can also use Microsoft MSXML and the msxsl.exe command line utility, which is freely available from sources such as CNET and included with SynchroSoft’s oXygen XML editor. msxsl eml201_doc.xml eml201to210.xsl -o eml210_doc.xml 2.3. Validity of new EML 2.1.0 documents Because of the flexibility allowed in EML, the stylesheet may encounter EML 2.0.1 structures that cannot be transformed as stated above or that may result in invalid EML 2.1.0 after processing. The stylesheet is able to detect some of these conditions (below). In these cases, the stylesheet creates the new EML 2.1.0 document, but also issues an alert about the potential problem. This may mean that the EML 2.0.1 document must be a) examined manually, edited and reprocessed, b) transformed by your own custom method, or c) the stylesheet's output used as a template to create valid EML 2.1.0. 1. An tree is present in , but an adjacent element did not reference a node. The tree will be left in , the rest of the document transformed, and a message printed to notify the user. 2. Two (or more) trees are present in , and reference the same element via their sibling tags. If the "orderBy" attributes on all elements match, then the trees will be merged (with rules listed in document order) and moved to the appropriate element. If the "orderBy" attributes are not identical, the stylesheet will not attempt to resolve the issue. The trees will all remain in (while the rest of the document transformed) and a message will be printed to notify the user. 3. Empty content appears in an element which is typed as "NonEmptyString" in EML2.1.0. The document is transformed but the user warned that it will not be valid EML 2.1.0 due to the missing content. In addition to those noted above, there are possible EML constructs which the stylesheet cannot detect, but which might result in invalid EML 2.1.0. For example, by design sections are parsed laxly, and so it is possible for their content in EML-2.0.0/1 to contain trees that are not valid EML. Also, the content of several elements has been more tightly constrained in EML 2.1.0 (e.g., latitude and longitude), but only empty content is reasonably detected by the transformation stylesheet. For these reasons, document authors are advised to check the validity of their new EML 2.1 using the EML parser after transformation. EML instance documents can be validated these ways: 1. With the online EML Parser. 2. Using the Parser that comes with EML. To execute it, change into the 'lib' directory of the EML release and run the 'runEMLParser' script passing your EML instance file as a parameter. The script performs two actions: it validates the document against the EML 2.1 schema, and it checks the validity of references and id attributes (as of EML 2.1.0). 2.4. Notes regarding EML and Metacat 1. If you are planning to contribute your EML 2.1.0 document to a Metacat repository, note that the Metacat servlet checks incoming documents for validity as part of the insertion process. 2. Metacat submitters should be aware that some EML instances have content in the 'packageId' attribute which is intended to match Metacat's docid. In this case, it is recommended that you make use of the stylesheet's ' package-id' parameter to increment the revision number.