Module Documentation: eml-physical

The eml-physical module - Physical file format

The eml-physical module describes the external and internal physical characteristics of a data object, as well as the information required for its distribution. External physical characteristics include the filename, size, compression, encoding methods, and authentication of a file or byte stream. Internal physical characteristics describe the format of the data object. Named binary or otherwise proprietary formats can be cited (e.g., Microsoft Access 2000), and text formats can be precisely described (e.g., ASCII text delimited with commas). For text formats, the module also includes the information needed to parse the data object and extract the entity and its attributes. Distribution information describes how to retrieve the data object, either online (e.g., a URL or other connection information) or offline (e.g., a data object residing on an archival tape).

The eml-physical module, like other modules, may be "referenced" via the <references> tag. This allows a physical description to be written once and then referenced from other locations within the EML document via its ID.

Module details

Recommended Usage: Any data object described by EML needs this information so that the entities and attributes that reside within the data object can be extracted.
Stand-alone: yes
Imports: eml-documentation, eml-literature, eml-resource, eml-access
Imported By:

Element Definitions:

physical 

This element has no default value.
Content of this field: Description of this field:
Type: PhysicalType
The content model for physical is a CHOICE between "references" and all of the elements that describe the internal and external characteristics and the distribution of a data object (e.g., objectName, dataFormat, distribution). A physical element can contain a reference to a physical element defined elsewhere. Using a reference means that the referenced physical is identical, not just in name but in its complete description.
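
As a rough illustration of how a processor might follow such a reference, the sketch below resolves a "references" child by id. It assumes a simplified, namespace-free fragment; the wrapper element dataset, the id value phys.1, and the helper resolve_physical are hypothetical and are not defined by this module.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment used only for illustration.
DOC = """
<dataset>
  <physical id="phys.1">
    <objectName>rainfall-sev-2002-10.txt</objectName>
  </physical>
  <physical>
    <references>phys.1</references>
  </physical>
</dataset>
"""

def resolve_physical(root, physical):
    """Return the physical element that `physical` refers to, or itself."""
    ref = physical.find("references")
    if ref is None:
        return physical  # a full description, not a reference
    target_id = ref.text.strip()
    for candidate in root.iter("physical"):
        if candidate.get("id") == target_id:
            return candidate
    raise KeyError(f"no physical element with id {target_id!r}")

root = ET.fromstring(DOC)
for phys in root.findall("physical"):
    print(resolve_physical(root, phys).findtext("objectName"))
```

Both physical elements resolve to the same description, which is the sense in which a referenced physical is identical to its target.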

objectName 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The name of the data object. This is possibly distinct from the entity name in that one physical object can contain multiple entities, even though that is not a recommended practice. The objectName is often the filename of a file in a file system or of a file that is accessible on the network.
Example(s):
rainfall-sev-2002-10.txt

size 

This element has no default value.
Content of this field: Description of this field:
Attributes (use, default value):
unit (optional, default: byte)
This element contains information about the physical size of the entity, expressed in bytes unless the unit attribute specifies a different unit.
Example(s):
134

authentication 

This element has no default value.
Content of this field: Description of this field:
Attributes (use, default value):
method (optional)
This element describes authentication procedures or techniques, typically by giving a checksum value for the object. The method used to compute the authentication value (e.g., MD5) is listed in the method attribute.
Example(s):
f5b2177ea03aea73de12da81f896fe40
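
A minimal sketch of how the authentication value might be checked after retrieval, assuming the method attribute names an algorithm that Python's hashlib supports (such as MD5). The helper name matches_checksum is hypothetical.

```python
import hashlib

def matches_checksum(data: bytes, expected_hex: str, method: str = "MD5") -> bool:
    """Compare the digest of `data` with the value from the authentication element."""
    digest = hashlib.new(method.lower(), data).hexdigest()
    return digest == expected_hex.strip().lower()

payload = b"hello"
print(matches_checksum(payload, "5d41402abc4b2a76b9719d911017c592"))  # True
```

Checksum methods that hashlib does not provide would need their own implementation.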

compressionMethod 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element lists a compression method used to compress the object, such as zip, compress, etc. Compression and encoding methods must be listed in the order in which they were applied, so that decompression and decoding can occur in the reverse order of the listing. For example, if a file is compressed using zip and then encoded using MIME base64, the compression method would be listed first and the encoding method second.
Example(s):
zip
gzip
compress

encodingMethod 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element lists an encoding method used to encode the object, such as base64, BinHex, etc. Compression and encoding methods must be listed in the order in which they were applied, so that decompression and decoding can occur in the reverse order of the listing. For example, if a file is compressed using zip and then encoded using MIME base64, the compression method would be listed first and the encoding method second.
Example(s):
base64
uuencode
binhex
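
The ordering rule for compression and encoding methods can be sketched as follows. This is an illustration only: it covers just gzip and base64, and the lookup table UNDO and the helper recover are hypothetical names.

```python
import base64
import gzip

# Map method names (as they might be listed in the metadata) to the step
# that undoes them. Only two methods are covered in this sketch.
UNDO = {
    "gzip": gzip.decompress,
    "base64": base64.b64decode,
}

def recover(data: bytes, methods_in_listed_order: list[str]) -> bytes:
    """Undo the listed methods, starting with the one applied last."""
    for name in reversed(methods_in_listed_order):
        data = UNDO[name](data)
    return data

original = b"DATE,PLOT\n2002-01-15,hfr5\n"
stored = base64.b64encode(gzip.compress(original))  # compressed first, then encoded
print(recover(stored, ["gzip", "base64"]) == original)  # True
```

Because the methods are listed in the order applied, walking the list backwards decodes base64 before decompressing.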

characterEncoding 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element contains the name of the character encoding. This is typically ASCII or UTF-8, or one of the other common encodings.
Example(s):
UTF-8

dataFormat 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A choice of (
  textFormat (required)
  OR
  externallyDefinedFormat (required)
  OR
  binaryRasterFormat (required)
)
This element is the parent, which is a CHOICE between three possible formats that describe the internal physical characteristics of the data object. Using this information, the user should be able to parse the physical object to extract the entity and its attributes. Note that this is the format of the physical object itself.

textFormat 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  numHeaderLines (optional)
  numFooterLines (optional)
  recordDelimiter (optional, unbounded)
  physicalLineDelimiter (optional, unbounded)
  numPhysicalLinesPerRecord (optional)
  maxRecordLength (optional)
  attributeOrientation (required)
  A choice of (
    simpleDelimited (required)
    OR
    complex (required)
  )
)
Description of a text formatted object. The description includes detailed parsing instructions for extracting attributes from the bytestream for simple delimited file formats (e.g., CSV), fixed format files that use fixed columns for attribute locations, and mixtures of the two. It also supports records that span multiple lines.

numHeaderLines 

This element has no default value.
Content of this field: Description of this field:
Type: xs:int
Number of header lines preceding data. Lines are determined by the physicalLineDelimiter, or if it is absent, by the recordDelimiter. This value indicates the number of header lines that should be skipped before starting to parse the data.
Example(s):
4

numFooterLines 

This element has no default value.
Content of this field: Description of this field:
Type: xs:int
Number of footer lines following data. Lines are determined by the physicalLineDelimiter, or if it is absent, by the recordDelimiter. This value indicates the number of footer lines that should be skipped after parsing the data. If this value is omitted, parsers should assume the data continues to the end of the data stream.
Example(s):
4
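
A small sketch of how numHeaderLines and numFooterLines might be applied together. The helper data_lines is hypothetical, and the sketch assumes that a delimiter at the very end of the stream terminates the last line rather than starting an empty one.

```python
def data_lines(text: str, num_header: int = 0, num_footer: int = 0,
               line_delimiter: str = "\n") -> list[str]:
    """Return only the lines that hold data, skipping header and footer lines."""
    lines = text.split(line_delimiter)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing delimiter does not start another line
    end = len(lines) - num_footer
    return lines[num_header:end]

text = "title\nunits\n1,2\n3,4\nend of file\n"
print(data_lines(text, num_header=2, num_footer=1))  # ['1,2', '3,4']
```

With num_footer left at zero, the data is taken to run to the end of the stream, matching the behaviour described for an omitted numFooterLines.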

recordDelimiter 

This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies the record delimiter character when the format is text. The record delimiter is usually a linefeed (\n) on UNIX, a carriage return (\r) on classic Mac OS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line ending characters; for example, on UNIX it would be two linefeed characters (\n\n). As record delimiters are often non-printing characters, one can use the special values "\n" to represent a linefeed (ASCII 0x0a) and "\r" to represent a carriage return (ASCII 0x0d). Alternatively, one can use the hex value to represent a character (e.g., 0x0a).
Example(s):
\r\n
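
One possible way to turn the delimiter notation into actual characters is sketched below. It assumes that hex values are written as 0x followed by exactly two hex digits and that only the escapes \n, \r, and \t occur; the helper decode_delimiter is a hypothetical name.

```python
import re

# One token at a time: a hex value such as 0x0a, or an escape such as \n.
_TOKEN = re.compile(r"0x([0-9A-Fa-f]{2})|\\([nrt])")
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}

def decode_delimiter(text: str) -> str:
    """Turn the notation stored in the metadata into the actual character(s)."""
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))  # hex value
        return _ESCAPES[match.group(2)]          # backslash escape
    return _TOKEN.sub(replace, text)

print(repr(decode_delimiter(r"\r\n")))  # '\r\n'
print(repr(decode_delimiter("0x0a")))   # '\n'
```

Printable delimiters such as a comma pass through unchanged.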

physicalLineDelimiter 

This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies the physical line delimiter character when the format is text. The line delimiter is usually a linefeed (\n) on UNIX, a carriage return (\r) on classic Mac OS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line ending characters; for example, on UNIX it would be two linefeed characters (\n\n). As line delimiters are often non-printing characters, one can use the special values "\n" to represent a linefeed (ASCII 0x0a) and "\r" to represent a carriage return (ASCII 0x0d). Alternatively, one can use the hex value to represent a character (e.g., 0x0a). If this value is not provided, processors should assume that the physical line delimiter is the same as the record delimiter.
Example(s):
\r\n

numPhysicalLinesPerRecord 

This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedInt
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, it is necessary to know the number of lines per record in order to correctly read them. If this value is not provided, processors should assume that records are wholly contained on one physical line. If the value is greater than 1, then processors should examine the lineNumber field for each attribute to determine which line of the record contains the information.
Example(s):
3
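
A short sketch of grouping physical lines into logical records. The helper group_records is hypothetical, and treating lineNumber as counting from 1 is an assumption made for this illustration.

```python
def group_records(lines: list[str], lines_per_record: int = 1) -> list[list[str]]:
    """Group consecutive physical lines into logical records."""
    if len(lines) % lines_per_record != 0:
        raise ValueError("line count is not a multiple of lines_per_record")
    return [lines[i:i + lines_per_record]
            for i in range(0, len(lines), lines_per_record)]

lines = ["2002-01-15", "hfr5", "acer rubrum", "2002-01-15", "hfr5", "acer xxxx"]
records = group_records(lines, lines_per_record=3)
line_number = 3  # assumed to count from 1
print([record[line_number - 1] for record in records])  # ['acer rubrum', 'acer xxxx']
```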

maxRecordLength 

This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
The maximum number of characters in any record in the physical file. For delimited files, the record length varies and this is not particularly useful. However, for fixed format files that do not contain record delimiters, this field is critical to tell processors when one record stops and another begins.
Example(s):
597
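
For a fixed format stream without record delimiters, record boundaries can be found by length alone. The sketch below assumes every record has exactly the stated length; split_fixed_records is a hypothetical helper.

```python
def split_fixed_records(stream: str, record_length: int) -> list[str]:
    """Cut an undelimited stream into records of `record_length` characters."""
    return [stream[i:i + record_length]
            for i in range(0, len(stream), record_length)]

print(split_fixed_records("AAAA1111BBBB2222", 8))  # ['AAAA1111', 'BBBB2222']
```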

attributeOrientation 

This element has no default value.
Content of this field: Description of this field:
Specifies whether the attributes described in the physical stream are found in columns or rows. The valid values are column or row. If set to 'column', then the attributes are in columns. If set to 'row', then the attributes are in rows. Row orientation is rare, but some systems such as SPlus and R utilize it. For example, some data with column orientation:

DATE        PLOT  SPECIES
2002-01-15  hfr5  acer rubrum
2002-01-15  hfr5  acer xxxx

The same data in a rowMajor table:

DATE     2002-01-15
PLOT     hfr5
SPECIES  acer rubrum  acer xxxx
Example(s):
column
row

Derived from: xs:string (by xs:restriction)

Allowed values:

  • column
  • row
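
A sketch of how row-oriented data might be turned into records. It assumes each row has already been split into the attribute name followed by one value per record; records_from_rows is a hypothetical helper.

```python
def records_from_rows(rows: list[list[str]]) -> list[dict[str, str]]:
    """Each input row holds one attribute: its name followed by its values."""
    names = [row[0] for row in rows]
    columns = [row[1:] for row in rows]
    # zip(*columns) regroups the values so that each tuple is one record.
    return [dict(zip(names, values)) for values in zip(*columns)]

row_oriented = [
    ["DATE", "2002-01-15", "2002-01-15"],
    ["PLOT", "hfr5", "hfr5"],
    ["SPECIES", "acer rubrum", "acer xxxx"],
]
for record in records_from_rows(row_oriented):
    print(record)
```

With column orientation no such step is needed, because each line already holds one record.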

simpleDelimited 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  fieldDelimiter (required, unbounded)
  collapseDelimiters (optional)
  quoteCharacter (optional, unbounded)
  literalCharacter (optional, unbounded)
)
A simple delimited format that uses one of a series of delimiters to indicate the ends of fields in the data stream. More complex formats such as fixed format or mixed delimited and fixed formats can be described using the "complex" element.

fieldDelimiter 

This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies a character used in the object to indicate the ending column for an attribute. The delimiter character itself is not part of the attribute value, but rather is present in the column following the last character of the value. Typical delimiter characters include commas, tabs, spaces, and semicolons. The only time the fieldDelimiter character is not interpreted as a delimiter is if it is contained in a quoted string (see quoteCharacter) or is immediately preceded by a literalCharacter. Non-printable delimiter characters can be provided as their hex values, and a tab character can be provided as the ASCII string "\t". Processors should assume that the field starts in the column following the previous field if the previous field was fixed, or in the column following the delimiter from the previous field if the previous field was delimited.
Example(s):
,
\t
0x09
0x20

collapseDelimiters 

This element has no default value.
Content of this field: Description of this field:
The collapseDelimiters element specifies whether sequential delimiters should be treated as a single delimiter or as multiple delimiters. For example, when a space delimiter is used, there may be several repeated spaces that should be treated as a single delimiter, but not always. The valid values are yes or no. If set to yes, consecutive delimiters are collapsed to one. If set to no or absent, consecutive delimiters are treated as separate delimiters; this is the default behaviour.
Example(s):
yes
no

Derived from: xs:string (by xs:restriction)

Allowed values:

  • yes
  • no
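
The effect of collapseDelimiters can be sketched as follows. The helper split_fields is hypothetical and ignores quoting and literal characters.

```python
import re

def split_fields(line: str, delimiter: str, collapse: bool = False) -> list[str]:
    """Split a record on `delimiter`, optionally collapsing repeated delimiters."""
    if collapse:
        # One or more consecutive delimiters count as a single separator.
        return re.split(f"(?:{re.escape(delimiter)})+", line)
    return line.split(delimiter)

print(split_fields("a  b c", " "))                 # ['a', '', 'b', 'c']
print(split_fields("a  b c", " ", collapse=True))  # ['a', 'b', 'c']
```

Without collapsing, the doubled space yields an empty field between a and b.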

quoteCharacter 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element specifies a character to be used in the object for quoting values so that field delimiters can be used within the value. This basically allows delimiter "escaping". The quoteCharacter is typically a " or '. When a processor encounters a quote character, it should not interpret any following characters as a delimiter until a matching quote character has been encountered (i.e., quotes come in pairs). It is an error to not provide a closing quote before the record ends. Non-printable quote characters can be provided as their hex values.
Example(s):
"
'

literalCharacter 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element specifies a character to be used for escaping special character values so that they are treated as literal values. This allows "escaping" for special characters like quotes, commas, and spaces when they are intended to be used in an attribute value rather than being intended as a delimiter. The literalCharacter is typically a \.
Example(s):
\
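
A minimal sketch of how fieldDelimiter, quoteCharacter, and literalCharacter could interact when splitting one record. It assumes a single one-character value for each, and it drops the quote characters from the resulting values, which is a choice made for this illustration; parse_record is a hypothetical helper.

```python
def parse_record(line: str, delimiter: str = ",", quote: str = '"',
                 literal: str = "\\") -> list[str]:
    """Split one record, honouring quoted strings and literal (escape) characters."""
    fields, current = [], []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == literal and i + 1 < len(line):
            current.append(line[i + 1])  # the next character is taken literally
            i += 2
            continue
        if ch == quote:
            in_quotes = not in_quotes    # quotes come in pairs
        elif ch == delimiter and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("record ended before the closing quote")
    fields.append("".join(current))
    return fields

print(parse_record('hfr5,"acer, red",a\\,b'))  # ['hfr5', 'acer, red', 'a,b']
```

The comma inside the quoted string and the comma preceded by the literal character are both kept as data rather than treated as delimiters.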

complex 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A choice of (
  textFixed (required)
  OR
  textDelimited (required)
)
A complex text format that can describe delimited fields, fixed width fields, and mixtures of the two. This supports multiline records (where one record is distributed across multiple physical lines). When using the complex format, the number of textFixed and textDelimited elements should exactly equal the number of attributes that have been described for the entity, and the order of the textFixed and textDelimited elements should correspond to the order of the attributes as described in the entity. Thus, for a delimited file with fourteen attributes, one should provide exactly fourteen textDelimited elements.

textFixed 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  fieldWidth (required)
  lineNumber (optional)
  fieldStartColumn (optional)
)
Describes the physical format of data sequences that use a fixed number of characters in a specified position in the stream to locate attribute values. This method is common in sensor-derived data and in legacy database systems. To parse it, one must know the number of characters for each attribute and the starting column and line to begin reading the value.

fieldWidth 

This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
Fixed-width fields have a set length, so the end of the field can always be determined by adding the fieldWidth to the starting column number.
Example(s):
7

lineNumber 

This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, the relative location of a data field must be indicated by both relative row and column number. The lineNumber should never be greater than the number of physical lines per record.
Example(s):
3

fieldStartColumn 

This element has no default value.
Content of this field: Description of this field:
Type: xs:long
Fixed-width fields have a set length, so the end of the field can always be determined by adding the fieldWidth to the starting column number. If the starting column is not provided, processors should assume that the field starts in the column following the previous field if the previous field was fixed, or in the column following the delimiter from the previous field if the previous field was delimited.
Example(s):
58
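
The fieldWidth and fieldStartColumn rules can be sketched for a line made up only of fixed fields. The sketch assumes that column numbers count from 1, which is an assumption of this illustration; parse_fixed is a hypothetical helper.

```python
from typing import Optional

def parse_fixed(line: str, fields: list[tuple[int, Optional[int]]]) -> list[str]:
    """`fields` holds (fieldWidth, fieldStartColumn or None) for each attribute."""
    values = []
    next_start = 0  # zero-based index of the column after the previous field
    for width, start_column in fields:
        # Without an explicit start column, continue right after the previous field.
        start = next_start if start_column is None else start_column - 1
        end = start + width  # the end follows from start plus fieldWidth
        values.append(line[start:end])
        next_start = end
    return values

line = "2002-01-15hfr5   42"
print(parse_fixed(line, [(10, 1), (4, None), (5, None)]))
# ['2002-01-15', 'hfr5', '   42']
```

Padding is returned as part of the value here; trimming it is left to the caller.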

textDelimited 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  fieldDelimiter (required)
  collapseDelimiters (optional)
  lineNumber (optional)
  quoteCharacter (optional, unbounded)
  literalCharacter (optional, unbounded)
)
Describes the physical format of data sequences that use delimiters in the stream to locate attribute values. This method is common in data exported from spreadsheets and database systems. To parse it, one must know the character that indicates the end of each attribute and the line to begin reading the value.

fieldDelimiter 

This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies a character used in the object to indicate the ending column for an attribute. The delimiter character itself is not part of the attribute value, but rather is present in the column following the last character of the value. Typical delimiter characters include commas, tabs, spaces, and semicolons. The only time the fieldDelimiter character is not interpreted as a delimiter is if it is contained in a quoted string (see quoteCharacter) or is immediately preceded by a literalCharacter. Non-printable delimiter characters can be provided as their hex values, and a tab character can be provided as the ASCII string "\t". Processors should assume that the field starts in the column following the previous field if the previous field was fixed, or in the column following the delimiter from the previous field if the previous field was delimited.
Example(s):
,
\t
0x09
0x20

collapseDelimiters 

This element has no default value.
Content of this field: Description of this field:
The collapseDelimiters element specifies whether sequential delimiters should be treated as a single delimiter or as multiple delimiters. For example, when a space delimiter is used, there may be several repeated spaces that should be treated as a single delimiter, but not always. The valid values are yes or no. If set to yes, consecutive delimiters are collapsed to one. If set to no or absent, consecutive delimiters are treated as separate delimiters; this is the default behaviour.
Example(s):
yes
no

Derived from: xs:string (by xs:restriction)

Allowed values:

  • yes
  • no

lineNumber 

This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, the relative location of a data field must be indicated by both relative row and column number. The lineNumber should never be greater than the number of physical lines per record. When parsing the first field on a physical line as a delimited field, processors should assume that the field data starts in the first column. Otherwise, follow the rules indicated under fieldDelimiter.
Example(s):
3

quoteCharacter 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element specifies a character to be used in the object for quoting values so that field delimiters can be used within the value. This basically allows delimiter "escaping". The quoteCharacter is typically a " or '. When a processor encounters a quote character, it should not interpret any following characters as a delimiter until a matching quote character has been encountered (i.e., quotes come in pairs). It is an error to not provide a closing quote before the record ends. Non-printable quote characters can be provided as their hex values.
Example(s):
"
'

literalCharacter 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element specifies a character to be used for escaping special character values so that they are treated as literal values. This allows "escaping" for special characters like quotes, commas, and spaces when they are intended to be used in an attribute value rather than being intended as a delimiter. The literalCharacter is typically a \.
Example(s):
\

externallyDefinedFormat 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  formatName (required)
  formatVersion (optional)
  citation (optional)
)
Information about a non-text or proprietary formatted object. The description names the format explicitly, but assumes a processor implicitly knows how to parse that format to extract the data. A format version can be included. This is mainly used for proprietary formats, including binary files like Microsoft Excel and text formats like ESRI's ArcInfo export format. This is not a recommended way to permanently archive data because the software to parse the format is unlikely to be available over extended periods, but is included to allow for commonly used physical formats.

formatName 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
Name of the format of the data object
Example(s):
Microsoft Excel

formatVersion 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
Version of the format of the data object
Example(s):
2000 (9.0.2720)

citation 

This element has no default value.
Content of this field: Description of this field:
Type: cit:CitationType
Citation providing more detail about the physical format, including parsing information or information about the software required for reading the object.

binaryRasterFormat 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  rowColumnOrientation (required)
  multiBand (optional)
  nbits (required)
  byteorder (required)
  skipbytes (optional)
  bandrowbytes (optional)
  totalrowbytes (optional)
  bandgapbytes (optional)
)
The binaryRasterFormat element is a container for various parameters used to describe the contents of binary raster image files. In this case, it is based on a white paper on the ESRI site that describes the header information used for BIP and BIL files ("Extendable Image Formats for ArcView GIS 3.1 and 3.2").

rowColumnOrientation 

This element has no default value.
Content of this field: Description of this field:
Specifies whether the data should be read across rows or down columns. The valid values are column or row. If set to 'column', then the data are read down columns. If set to 'row', then the data are read across rows.
Example(s):
column
row

Derived from: xs:string (by xs:restriction)

Allowed values:

  • column
  • row

multiBand 

This element has no default value.
Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  nbands (required)
  layout (required)
)
Information needed to properly interpret a multiband image.

nbands 

This element has no default value.
Content of this field: Description of this field:
Type: xs:int
The number of spectral bands in the image. Must be greater than 1.
Example(s):
2

layout 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The organization of the bands in the image file. Acceptable values are:
bil - Band interleaved by line.
bip - Band interleaved by pixel.
bsq - Band sequential.
Example(s):
bil
bip
bsq
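
To make the three layouts concrete, the sketch below computes where one sample sits in the image data, counted in samples from the start of the data. It assumes zero-based row, column, and band numbers, no header, and no padding or gap bytes; the helper sample_index and the parameters nrows and ncols are hypothetical and are not elements of this module.

```python
def sample_index(layout: str, row: int, col: int, band: int,
                 nrows: int, ncols: int, nbands: int) -> int:
    """Position of one sample, counted in samples from the start of the image data."""
    if layout == "bsq":   # band sequential: all of band 0, then all of band 1, ...
        return (band * nrows + row) * ncols + col
    if layout == "bil":   # interleaved by line: row 0 of every band, then row 1, ...
        return (row * nbands + band) * ncols + col
    if layout == "bip":   # interleaved by pixel: every band of pixel 0, then pixel 1, ...
        return (row * ncols + col) * nbands + band
    raise ValueError(f"unknown layout {layout!r}")

# A 2 x 3 image with 2 bands: where is row 0, column 1, band 1?
for layout in ("bsq", "bil", "bip"):
    print(layout, sample_index(layout, 0, 1, 1, nrows=2, ncols=3, nbands=2))
# bsq 7
# bil 4
# bip 3
```

Multiplying the index by the number of bytes per sample gives a byte offset under the same assumptions.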

nbits 

This element has no default value.
Content of this field: Description of this field:
Type: xs:int
The number of bits per pixel per band. Acceptable values are typically 1, 4, 8, 16, and 32. The default value is eight bits per pixel per band. For a true color image with three bands (R, G, B) stored using eight bits for each pixel in each band, nbits equals eight and nbands equals three, for a total of twenty-four bits per pixel.
Example(s):
8

byteorder 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The byte order in which values are stored. The byte order is important for sixteen-bit and higher images, which have two or more bytes per pixel. Acceptable values are little-endian (common on Intel systems like PCs) and big-endian (common on Motorola platforms).
Example(s):
little-endian
big-endian
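
The effect of byteorder on multi-byte samples can be sketched as follows. The sketch assumes unsigned samples whose size is a whole number of bytes; read_samples is a hypothetical helper.

```python
def read_samples(raw: bytes, nbits: int, byteorder: str) -> list[int]:
    """Decode unsigned samples of `nbits` bits each from raw image bytes."""
    if nbits % 8 != 0:
        raise ValueError("this sketch handles whole-byte samples only")
    order = {"little-endian": "little", "big-endian": "big"}[byteorder]
    size = nbits // 8
    return [int.from_bytes(raw[i:i + size], order)
            for i in range(0, len(raw), size)]

raw = b"\x01\x00\x00\x02"
print(read_samples(raw, 16, "little-endian"))  # [1, 512]
print(read_samples(raw, 16, "big-endian"))     # [256, 2]
```

The same four bytes decode to different values depending on the byte order, which is why the element is required.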

skipbytes 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The number of bytes of data in the image file to skip in order to reach the start of the image data. This keyword allows you to bypass any existing image header information in the file. The default value is zero bytes.
Example(s):
0

bandrowbytes 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The number of bytes per band per row. This must be an integer. This keyword is used only with BIL files when there are extra bits at the end of each band within a row that must be skipped.
Example(s):
3

totalrowbytes 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The total number of bytes of data per row. Use totalrowbytes when there are extra trailing bits at the end of each row.
Example(s):
8

bandgapbytes 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The number of bytes between bands in a BSQ format image. The default is zero.
Example(s):
1

distribution 

This element has no default value.
Content of this field: Description of this field:
Type: PhysicalDistributionType
This element provides information on how the resource is distributed. Connections to online systems can be described as URLs or as a list of connection parameters. Please see the Type definition for complete information.

online 

This element has no default value.
Content of this field: Description of this field:
Type: PhysicalOnlineType
Information for a resource that is distributed online. Please see the Type definition for complete information.

offline 

This element has no default value.
Content of this field: Description of this field:
Type: res:OfflineType
Information for a resource that is distributed offline. Please see the Type definition for complete information.

inline 

This element has no default value.
Content of this field: Description of this field:
Type: res:InlineType
Information for a resource that is distributed inline, i.e., along with the metadata. Please see the Type definition for complete information.

access 

This element has no default value.
Content of this field: Description of this field:
Type: acc:AccessType
When this element occurs in a distribution module, it controls access only to the resource being described by the same distribution parent. Please see the Type definition for complete information on constructing an access tree.

onlineDescription 

This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The onlineDescription element can hold a brief description of the content of the online element's online|offline|inline child. This description element could supply content for an html anchor tag.

url 

This element has no default value.
Content of this field: Description of this field:
Type: res:UrlType
The URL of the resource that is available online. Please see the Type definition for complete information.

connection 

This element has no default value.
Content of this field: Description of this field:
Type: res:ConnectionType
A connection to a resource that is available online. Please see the Type definition for complete information.

Attribute Definitions:

unit

Use: optional

Default value: byte

This attribute gives the unit of measurement for the size of the entity, and is by default a byte.
Example(s):
byte

method

Type: xs:string

Use: optional

This attribute names the method used to calculate an authentication checksum that can be used to validate a bytestream. Typical checksum methods include MD5 and CRC.
Example(s):
MD5

id

Type: res:IDType

Use: optional

system

Type: res:SystemType

Use: optional

scope

Type: res:ScopeType

Use: optional

Default value: document

id

Type: res:IDType

Use: optional

system

Type: res:SystemType

Use: optional

scope

Type: res:ScopeType

Use: optional

Default value: document

Complex Type Definitions:

PhysicalType 

Content of this field: Description of this field:
Elements (use, how many):
A choice of (
  A sequence of (
    objectName (required)
    size (optional)
    authentication (optional, unbounded)
    A choice of (
      compressionMethod (required)
      OR
      encodingMethod (required)
    )
    characterEncoding (optional)
    dataFormat (required)
    distribution (optional, unbounded)
  )
  OR
  res:ReferencesGroup
)
Attributes (use, default value):
id (optional)
system (optional)
scope (optional, default: document)

The eml-physical module describes the physical characteristics of a data object and the information required for its distribution. External physical characteristics include the filename, size, compression, encoding methods, and authentication of a file or byte stream. Internal physical characteristics describe the format of the data object. Proprietary formats can be cited (e.g., Microsoft Access 2000), or text formats can be precisely described (e.g., ASCII text delimited with commas). The module includes the information needed to parse the text data object to extract the entity and its attributes. Distribution information describes how to retrieve the data object, either as online (a URL or connection definition), offline (e.g., a data object residing on an archival tape), or inline (i.e., the data are included with the metadata).

Like many other EML elements, a PhysicalType can contain a reference to another physical element defined elsewhere in the document instead of a description of the resource. Using a reference means that the referenced physical is identical, not just in name but in its complete description.

PhysicalDistributionType 

Content of this field: Description of this field:
Elements (use, how many):
A choice of (
  A sequence of (
    A choice of (
      online (required)
      OR
      offline (required)
      OR
      inline (required)
    )
    access (optional)
  )
  OR
  res:ReferencesGroup
)
Attributes (use, default value):
id (optional)
system (optional)
scope (optional, default: document)

The PhysicalDistributionType contains the information required for retrieving the resource.

It differs from the res:DistributionType: generally, the PhysicalDistributionType is intended for download, whereas the type at the resource level is intended primarily for information.

The phys:PhysicalDistributionType includes an optional access tree which can be used to override access rules applied at the resource level. Access for the document's included entities can then be managed individually.

Also see individual sub elements for more information.

PhysicalOnlineType 

Content of this field: Description of this field:
Elements (use, how many):
A sequence of (
  onlineDescription (optional)
  A choice of (
    url (required)
    OR
    connection (required)
  )
)

Distribution information for accessing the resource online, represented either as a URL or as the series of named parameters needed to connect. The URL field can contain a simple web address or an entire query string. The connection element allows the components of a complex protocol to be described individually.

The PhysicalOnlineType differs from the res:OnlineType in that this type only allows a connectionDefinition to appear as the child of a connection. In other words, in a PhysicalOnlineType, the connectionDefinition cannot be abstracted, and must be included as part of an actual connection.

Simple Type Definitions:

Group Definitions:

Web Contact: jones@nceas.ucsb.edu