Information for EML 2.1.0 Document Authors
Several modifications to the EML schema made in version 2.1.0 will require changes to how
EML documents are structured, and these changes are highlighted here. EML authors should also
refer to the affected sections in the normative schema documents for complete usage
information and examples. Existing EML 2.0-series documents can be converted to EML 2.1.0
using the XSL stylesheet that accompanies this release, as described in section 2 below.
The EML 2.1.0 release addresses several errors with respect to W3C specifications
for XML schema (http://www.w3.org/TR/xml). Although the changes are small,
they are incompatible with EML 2.0.0 and 2.0.1 schemas, which necessitated advancing the
version number to "2.1". The STMML schema was also found to be invalid with respect to the XML
Schema language, and the most reasonable fix for this bug is likewise incompatible with its earlier
versions. EML users should note that the STMML schema error was not
related to elements used directly by EML (i.e., <unitList> or <unitType>). However,
EML imports all of STMML, and
authors of EML documents may have made use of other parts of that schema. Therefore,
it was decided to advance the namespace used for STMML-related imports to "stmml-1.1",
in keeping with the EML version naming pattern. The STMML authors have been contacted,
and they are interested in our development and use of STMML.
Other features and enhancements were added to this release that represent significant
improvements. The XML data type requirements for several elements were changed, in some
cases to constrain their content, and in other cases to increase flexibility. The names of two
elements were changed to make them consistent throughout EML. In the literature schema two
elements became optional so that EML could accommodate in-press publications where
the volume and page range are not yet known. Support for two new optional elements was
also added: a <contact> tree can now be used in the literature module, and a descriptive
element, <onlineDescription>, can be used in distribution trees.
For the most part, EML 2.1.0 does not introduce major new features, or require a
shift in use or implementation. There was a deliberate decision to balance the impact on
instance document authors with necessary schema maintenance, and to prepare the schema
for the next phase of planned improvements and features. Some of the changes to EML 2.1.0
are invisible to document authors; see the 'Readme' that accompanies the distribution for a
complete list of the bugs addressed, and for information of interest to developers.
Changes and New Features in EML 2.1.0
EML Schema validity
EML allows authors to place any XML markup in
<additionalMetadata> sections at the end of the document. The content
model for <additionalMetadata> includes an optional <describes>
element so that references to EML nodes can be included as necessary.
In EML 2.0 this element was placed alongside the additional XML content;
however, this construct is not allowed in XML Schema, and the error was
not reported by XML parsers available at the time EML 2.0 was released.
In EML 2.1.0, the error has been corrected by adding a required child element
to the <additionalMetadata> section to contain the
"<xs:any>" XML content.
<additionalMetadata> sections must include the child
<metadata> to contain the additional XML markup. The optional
<describes> element may still be included to reference a particular node
of the document. Multiple <describes> elements can be included if needed.
Examples of documents written against 2.1.0 and 2.0.1 are below. Also see the
additionalMetadata normative documentation.
In EML 2.0.1, an additionalMetadata section looked like this:
<additionalMetadata>
  <describes>123</describes>
  ...
</additionalMetadata>
In EML 2.1.0, the markup must be enclosed within <metadata> tags:
<additionalMetadata>
  <describes>123</describes>
  <metadata>
    ...
  </metadata>
</additionalMetadata>
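Because the change is mechanical, it can be sketched in a few lines of Python. This is an illustration only, not the conversion stylesheet that accompanies the release: the function name wrap_additional_metadata and the sample element <myNote> are hypothetical, and the sketch assumes the section holds only child elements (loose text placed directly inside <additionalMetadata> is not moved).

```python
import xml.etree.ElementTree as ET


def wrap_additional_metadata(section):
    """Move every child except <describes> into a new <metadata> child."""
    if section.find("metadata") is not None:
        return section  # already in the EML 2.1.0 form
    payload = [child for child in list(section) if child.tag != "describes"]
    metadata = ET.Element("metadata")
    for child in payload:
        section.remove(child)
        metadata.append(child)
    # <describes> elements stay in place; <metadata> follows them.
    section.append(metadata)
    return section


# Hypothetical EML 2.0.1-style section used only for this demonstration.
old = ET.fromstring(
    "<additionalMetadata><describes>123</describes>"
    "<myNote>site visit pending</myNote></additionalMetadata>"
)
new = wrap_additional_metadata(old)
print(ET.tostring(new, encoding="unicode"))
```

Calling the function a second time leaves the section unchanged, so it is safe to run on documents that are already in the 2.1.0 form.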
STMML Schema validity
EML makes use of the Scientific Technical and Medical Markup Language schema
(STMML, stmml.xsd) for describing units, and the STMML schema was also found to
be invalid. The error was not related to
elements used directly by EML (i.e., <unitList> or
<unitType>); however, some authors may have used other parts of stmml.xsd
in their documents. The required schema changes were not compatible with STMML-1.0, and
the EML development group is working with the STMML developers on this
issue. Since EML now imports a version of STMML that is not identical to that available
from its authors, it was decided to advance the namespace used by EML 2.1.0 for stmml-related
files to "stmml-1.1". To import stmml.xsd into one of your EML 2.1.0 documents use the XML
namespace declaration for STMML in the code below:
<eml:eml ...
    xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1"
    ...>
  ...
</eml:eml>
Location of Access Control Trees
In EML 2.0.1 an <access> tree could be included in each top-level module (i.e., dataset,
citation, software, or protocol) to control access to the entire metadata document. Additionally, to
control access to individual entities,
some authors put <access> trees in <additionalMetadata> sections and used
<describes> elements to reference their <distribution> nodes.
Authors may have inferred that access
control could be applied to any node with this practice. However, node-level access
control is problematic to implement, and in practice only access trees that reference
distribution nodes are recognized (as was stated in EML 2.0.1 documentation).
A better solution is to locate <access> nodes above or near the node to
which the access rules should be applied. This feature has been added to
EML 2.1.0.
In EML 2.1.0, access trees can be placed in two locations. To control the entire
metadata document (i.e., "document-level access"), an <access> tree should
be placed as a child of the root element (see the EML image). If a
metadata author wishes to override the document-level control for a specific entity, an
additional access tree may be placed as the last child of a <distribution>
element within the <physical> tree of that entity
(see the Physical Distribution Type image).
The structure of the access module itself has not changed (see the access
module documentation).
Example 1. To control access to all the metadata and by default to the data, use an
<access> element at the top level:
<eml:eml ...>
  <access ...>
    ...
  </access>
  <dataset>
    ...
  </dataset>
</eml:eml>
Example 2. Access rules can still be specified for any data entity by placing an
access tree under that entity's physical/distribution element. The following example
illustrates how a dataTable's access tree can be used to override permissions
set at the document level.
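If no access is specified in distribution, then the document-level access rules are
applied.
<eml:eml ...>
  <access ...> ... </access>
  <dataset>
    <dataTable>
      ...
      <physical>
        <distribution>
          ...
          <access ...> ... </access>
        </distribution>
      </physical>
    </dataTable>
  </dataset>
</eml:eml>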
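The override rule can be expressed as a small lookup: use the access tree under the entity's physical/distribution element if there is one, otherwise fall back to the document-level tree. The Python sketch below illustrates that rule only. The function names effective_access and principals are hypothetical, the sample document is stripped down and omits most required EML content, and the authSystem, principal, and id values are made up. How a given repository actually evaluates the rules is not shown here.

```python
import xml.etree.ElementTree as ET

# Stripped-down sample for illustration only; not a schema-valid EML document.
SAMPLE = """\
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.0"
         packageId="example.1.1" system="example">
  <access authSystem="example" order="allowFirst">
    <allow><principal>public</principal><permission>read</permission></allow>
  </access>
  <dataset>
    <dataTable id="open">
      <physical><distribution/></physical>
    </dataTable>
    <dataTable id="restricted">
      <physical>
        <distribution>
          <access authSystem="example" order="allowFirst">
            <allow><principal>uid=owner</principal><permission>all</permission></allow>
          </access>
        </distribution>
      </physical>
    </dataTable>
  </dataset>
</eml:eml>
"""


def effective_access(root, entity):
    """Return the entity's own access tree if present, else the document-level one."""
    own = entity.find("physical/distribution/access")
    return own if own is not None else root.find("access")


def principals(access):
    """List the principals named anywhere in an access tree."""
    return [p.text for p in access.iter("principal")]


root = ET.fromstring(SAMPLE)
for table in root.iter("dataTable"):
    print(table.get("id"), principals(effective_access(root, table)))
```

The explicit "is not None" comparison matters: an ElementTree element with no children is falsy, so a plain truth test would treat an empty <access> element as missing.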
Typing of <gRing> corrected
The content of the <gRing> element was retyped to make these nodes more usable. This
element is generally analogous to the FGDC component for ring. It should now contain
a string of longitude and latitude values for the vertex coordinates (in decimal degrees): each
vertex is a comma-delimited longitude,latitude pair, and vertices are separated by spaces, as in
the example below. For more information, see the normative documents for gRing in the
coverage module.
<gRing>-119.453,35.0 -125,37.5555 -122,40 -119.453,35.0</gRing>
...
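A consumer of this string might split it into coordinate pairs as in the Python sketch below. The layout it assumes (whitespace between vertices, a comma between longitude and latitude) is read off the example above rather than taken from the normative documentation, and the function name parse_gring is hypothetical.

```python
def parse_gring(text):
    """Split a gRing string into a list of (longitude, latitude) tuples."""
    vertices = []
    for pair in text.split():
        lon_str, lat_str = pair.split(",")  # raises ValueError on a malformed pair
        vertices.append((float(lon_str), float(lat_str)))
    return vertices


ring = parse_gring("-119.453,35.0 -125,37.5555 -122,40 -119.453,35.0")
print(ring)
print("closed ring:", ring[0] == ring[-1])
```

In the sample string the first and last vertices are identical, which is what the final line checks.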
Entity Attributes: <bounds> minimum and maximum are of type xs:float
In EML 2.0.1, <bounds> elements were typed as
xs:decimal and did not support scientific notation. The base data type
was changed to 'xs:float' in EML 2.1.0 to accommodate both decimal and scientific
notation while maintaining backward compatibility. Authors should keep
in mind that there are still advantages to using decimal numbers for bounds,
because the decimal data type maintains precision during storage while
the floating point type does not. An alternative type, "precisionDecimal"
(corresponding to the IEEE type "floating-point decimal"), may be available
in the next version of XML Schema (i.e., v1.1, a working draft as of late 2008).
It combines features of both the decimal and float types in that it supports
the values and notation of a float, but is treated as decimal in arithmetic
and storage. The typing of this element may be changed to this new
type in a future release of EML. For more information, see the normative
documentation for
NumericDomainType.
In EML 2.1.0, bounds can be written as:
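...
<numericDomain>
  <numberType>real</numberType>
  <bounds>
    <minimum ...>0</minimum>
    <maximum ...>1.234E15</maximum>
  </bounds>
</numericDomain>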
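The precision trade-off mentioned above can be seen with a short Python sketch. xs:float is a 32-bit single-precision type, so the sketch rounds each parsed value to 32 bits with the struct module. Python's decimal.Decimal stands in here for an exact decimal type; unlike xs:decimal it happens to accept exponent notation, so it is used only to show exact decimal arithmetic. The function name as_single_precision is hypothetical.

```python
import struct
from decimal import Decimal


def as_single_precision(lexical):
    """Parse a number and round it to the nearest 32-bit IEEE 754 value."""
    return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]


# Scientific notation is accepted, but the stored value is only approximate.
upper = as_single_precision("1.234E15")
print("1.234E15 stored exactly:", upper == 1.234e15)
print("0.1 stored exactly:", as_single_precision("0.1") == 0.1)

# Decimal arithmetic keeps the digits that were written.
print("decimal sum exact:", Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

For bounds such as 0 or 120 the difference is irrelevant, since small integers are represented exactly in both types.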
Geographic Coverage: <altitudeUnits> use Standard Units of LengthType
In EML 2.0.0 and 2.0.1, altitude units were typed as xs:string,
and EML authors were instructed to include a vertical datum along
with the unit. In EML 2.1.0 this has been revised. Altitude units are now
restricted to lengths in Standard Units (e.g., meter or foot), and the
datum should be included as part of the textual <geographicDescription> element. Document
authors should note that including any content in the
<altitudeUnits> element other than a length unit, such as the datum,
is not valid in EML 2.1.0. For a list of allowable units, see
the normative description for <altitudeUnits>.
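...
<boundingAltitudes>
  <altitudeMinimum>0</altitudeMinimum>
  <altitudeMaximum>120</altitudeMaximum>
  <altitudeUnits>meter</altitudeUnits>
</boundingAltitudes>
...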
Geographic Coverage: Latitude and Longitude are type xs:decimal, with appropriate ranges
In EML 2.0.1, latitude and longitude values in <geographicCoverage>
elements were typed as xs:string. In EML 2.1.0 these values are restricted to
decimal numbers with realistic ranges (-90 to 90, and -180 to 180, respectively).
Fractions of a degree in minutes and seconds should be converted to decimal
format, and strings denoting direction or hemisphere (e.g., 'S' or 'south') are not
allowed. South latitudes and west longitudes must be indicated by a minus
sign (-) in front of the coordinate, as in the example below. These constraints
are consistent with the intended use of this field, which is to support mapping
the general geographic coverage of EML resources. Authors should keep in
mind that very specific descriptions of spatial data can be accommodated by
EML modules dedicated to that purpose. More information on bounding coordinates can
be found in the normative technical
documents.
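<boundingCoordinates>
  <westBoundingCoordinate>-120.2534</westBoundingCoordinate>
  <eastBoundingCoordinate>-119.7550</eastBoundingCoordinate>
  <northBoundingCoordinate>34.2231</northBoundingCoordinate>
  <southBoundingCoordinate>34.1231</southBoundingCoordinate>
  ...
</boundingCoordinates>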
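Authors holding coordinates in degrees, minutes, and seconds with a hemisphere letter need to convert them before they will validate. The Python sketch below shows one way to do the arithmetic and apply the sign and range rules described above. The function name to_decimal_degrees and its parameters are hypothetical, and the inputs were chosen to reproduce two of the coordinates in the example.

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    hemi = hemisphere.strip().upper()
    if hemi not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # Latitudes are limited to 90 degrees, longitudes to 180 degrees.
    limit = 90 if hemi in ("N", "S") else 180
    if value > limit:
        raise ValueError("coordinate is outside the allowed range")
    # South latitudes and west longitudes carry a minus sign.
    return -value if hemi in ("S", "W") else value


west = to_decimal_degrees(120, 15, 12.24, "W")
south_edge = to_decimal_degrees(34, 7, 23.16, "N")
print(round(west, 4), round(south_edge, 4))
```

The results are rounded for display only; rounding to four decimal places of a degree is a presentation choice, not a requirement stated in the schema documentation.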
Element content must be non-empty
In EML 2.0.1, elements of the string data type were allowed to be
empty or contain only whitespace. This feature
was occasionally exploited as a work-around to force incomplete documents
to validate in XML editors and the Metacat harvester, but this practice may
cause problems in document parsing or for EML tools such as Kepler.
In EML 2.1.0, string content is now typed as "NonEmptyString" and string
elements are required to have minimal non-whitespace content. The following
content would therefore have been allowed in EML 2.0.1:
...
or
...
In EML 2.1.0, empty (or whitespace-only) content is not allowed. Actual content must be
provided:
approx. temperature
...
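Before validating a converted document, it can help to locate the elements that this rule will reject. The Python sketch below lists childless elements whose text is empty or whitespace-only. It does not consult the schema, so it reports every such element whether or not the schema types it as NonEmptyString; the function name blank_leaf_elements and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def blank_leaf_elements(root):
    """Return the tags of childless elements with empty or whitespace-only text."""
    return [
        element.tag
        for element in root.iter()
        if len(element) == 0 and not (element.text or "").strip()
    ]


# Hypothetical fragment: one filled element, one whitespace-only, one empty.
sample = ET.fromstring(
    "<attribute>"
    "<attributeName>temp</attributeName>"
    "<attributeLabel>   </attributeLabel>"
    "<attributeDefinition></attributeDefinition>"
    "</attribute>"
)
print(blank_leaf_elements(sample))
```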
An offline resource has a minimum of one element required (<mediumName>)
In EML 2.0.1, an author could describe an offline data resource but include no
information about the resource's distribution. In EML 2.1.0, minimal content (one
element) is now required.
In EML 2.0.1, the distribution tree for an offline resource could have ended with
no content:
<distribution>
  <offline></offline>
</distribution>
...
In EML 2.1.0, the element <mediumName> is required:
<distribution>
  <offline>
    <mediumName>Atlas of Lake Erie Shorelines</mediumName>
  </offline>
</distribution>
...
Methods elements are standardized to <methods>
In EML 2.0.1, both "<method>" and "<methods>" elements were included
in the schema, which caused confusion for some authors. In EML
2.1.0, instances of the MethodsType have been standardized to
"methods".
A path that ended in <method> in EML 2.0.1 is now properly constructed with <methods> in EML 2.1.0.
Elements for date-time have been standardized to <dateTime>
In EML 2.0.1, both "<datetime>" and
"<dateTime>" elements were included, which caused confusion for some authors. In EML
2.1.0, these instances have been standardized to "dateTime".
A path that ended in <datetime> in EML 2.0.1 is now properly constructed with <dateTime> in EML 2.1.0.
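The two renames (<method> to <methods>, and <datetime> to <dateTime>) amount to a tag substitution, sketched below in Python. This is a blunt illustration rather than the logic of the conversion stylesheet: it renames every element with a matching name wherever it occurs, including inside <additionalMetadata>, which a real conversion may want to leave alone. The function name rename_elements is hypothetical, and the sample fragment is simplified and not schema-valid.

```python
import xml.etree.ElementTree as ET

# Old spelling -> EML 2.1.0 spelling.
RENAMES = {"method": "methods", "datetime": "dateTime"}


def rename_elements(root):
    """Rename matching elements in place and return how many were changed."""
    changed = 0
    for element in root.iter():
        if element.tag in RENAMES:
            element.tag = RENAMES[element.tag]
            changed += 1
    return changed


# Simplified, hypothetical fragment using the EML 2.0.1 spellings.
table = ET.fromstring(
    "<dataTable>"
    "<method><methodStep/></method>"
    "<attribute><measurementScale>"
    "<datetime><formatString>YYYY-MM-DD</formatString></datetime>"
    "</measurementScale></attribute>"
    "</dataTable>"
)
print(rename_elements(table), "elements renamed")
```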
For journal articles, the elements <volume> and <pageRange> are now optional
Two elements describing journal articles in the literature schema (eml-literature.xsd),
<volume> and <pageRange>, are now optional to permit
articles-in-press to be described in EML.
A Citation may have an optional <contact> tree
Also in eml-literature.xsd, an optional <contact> tree has been added
to permit a contact to be designated for a publication. For example, a contact
could be provided for reprint requests.
New optional element (<onlineDescription>) for a description of an online resource
A new element, <onlineDescription>, was added to provide a brief description of
the content of an <online> element's child. This optional element is available for both
resource-level and physical-level distribution nodes, and is typed as a
NonEmptyString. One possible use for the description is to provide optional content for
the HTML anchor element that accompanies a URL.
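The anchor use case might look like the Python sketch below, which builds an HTML link from an <online> element and falls back to the URL itself when no description is present. The function name online_to_anchor, the sample URL, and the description text are hypothetical, and the fallback behaviour is an assumption of this sketch rather than something EML prescribes.

```python
import xml.etree.ElementTree as ET
from html import escape


def online_to_anchor(online):
    """Render an <online> element as an HTML anchor, using the description as link text."""
    url = (online.findtext("url") or "").strip()
    label = (online.findtext("onlineDescription") or "").strip() or url
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))


# Hypothetical <online> element for demonstration.
online = ET.fromstring(
    "<online>"
    "<onlineDescription>Daily stream temperature (CSV)</onlineDescription>"
    "<url>http://example.org/data/stream.csv</url>"
    "</online>"
)
print(online_to_anchor(online))
```

Both the URL and the link text are escaped, because either may contain characters such as & that are not safe to place in HTML unchanged.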
Converting EML documents from v2.0.0/1 to v2.1.0
About the EML conversion stylesheet
An XSL stylesheet is provided with the EML Utilities
to convert valid EML 2.0-series documents to EML 2.1.0 (see
http://knb.ecoinformatics.org/software/eml/).
The stylesheet performs the following basic tasks to create a
template EML 2.1.0 document (for more information, see the Utilities documentation):
- Updates namespaces to eml-2.1.0 and stmml-1.1
- Encloses XML markup within <additionalMetadata> sections in <metadata> tags
- Renames elements whose spelling has changed (<method> and <datetime>)
- Copies access trees from <additionalMetadata> to other parts of the document (for common constructs)
- Optionally replaces the content of the "packageId" attribute on the root element, <eml:eml>, using a parameter
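To make the first and last of these tasks concrete, the Python sketch below retargets the root element's namespace and optionally replaces its packageId. It is not the stylesheet itself and covers much less: the xsi:schemaLocation attribute and the STMML namespace are not touched. The function name retarget_root and the packageId values are hypothetical.

```python
import xml.etree.ElementTree as ET

OLD_NS = "eml://ecoinformatics.org/eml-2.0.1"
NEW_NS = "eml://ecoinformatics.org/eml-2.1.0"


def retarget_root(root, package_id=None):
    """Move the root element to the EML 2.1.0 namespace; optionally set packageId."""
    if root.tag == "{%s}eml" % OLD_NS:
        root.tag = "{%s}eml" % NEW_NS
    if package_id is not None:
        root.set("packageId", package_id)
    return root


# Minimal, hypothetical EML 2.0.1 root for demonstration.
doc = ET.fromstring(
    '<eml:eml xmlns:eml="%s" packageId="example.1.1" system="example">'
    "<dataset/></eml:eml>" % OLD_NS
)
retarget_root(doc, package_id="example.1.2")

# Ask ElementTree to serialize the new namespace with the usual "eml" prefix.
ET.register_namespace("eml", NEW_NS)
print(ET.tostring(doc, encoding="unicode"))
```

Only the root element needs retargeting in this sketch because the child elements in the sample, such as <dataset>, carry no namespace of their own.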
Validity of new EML 2.1.0 documents
Because of the flexibility allowed in EML, the stylesheet may encounter EML 2.0.1 structures that
cannot be transformed or that may result in invalid EML 2.1.0 after processing.
For example, by design <additionalMetadata> sections are parsed laxly, and
so it is possible for their content in EML-2.0.0/1 to contain <access> trees
which are invalid. Also, the content of several elements has been more tightly
constrained in EML 2.1.0 (e.g., latitude and longitude), and data types are not
detectable by a stylesheet. Document authors are advised to check the validity of
their new EML 2.1.0 after transformation. EML instance documents
can be validated in these ways:
- With the online EML Parser. The online parser will validate all versions of EML.
- Using the Parser that comes with EML. To execute it, change into the 'lib'
  directory of the EML release and run the 'runEMLParser' script, passing your EML
  instance file as a parameter. The script performs two actions: it checks the validity of references
  and id attributes, and it validates the document against the EML 2.1 schema.
  The EML parser included with the distribution is capable of checking only EML 2.1.0 documents, and
  cannot be used to validate earlier versions (e.g., EML 2.0.1).
If you are planning to contribute your EML 2.1.0 document to a Metacat repository, note
that the Metacat servlet checks all versions of incoming EML for validity as part of the insertion process.