User Guide to the Data Manager Library API

Overview

The purpose of the Data Manager Library is to provide an Application Programming Interface (API) through which a calling application can access an EML (or other metadata) document, parse its contents, download its related data entities, store those data entities as tables in a relational database, and query those tables using SQL-like constructs.

Installation and Configuration

Minimum Requirements

  1. One of the following relational database management systems: HSQL, Oracle, or PostgreSQL.
  2. A recent Java SDK; j2sdk1.4.2 or later is required.

Installation

Download and uncompress the file datamanager-1.0.0.zip (Windows) or datamanager-1.0.0.tar.gz (Linux or Unix). The Java application that uses this library (referred to throughout this document as the Calling Application) should include the datamanager.jar file in its Java classpath.

Using the Data Manager Library API

The Calling Application interacts with the Data Manager Library through its API. The API exposes to the Calling Application a set of public methods in the following Java class:

org.ecoinformatics.datamanager.DataManager

The capabilities provided to the Calling Application and their related methods in the DataManager class are summarized in the table below:

Use Case | Capability | DataManager Method Name
1 | Parse a metadata document to obtain information about the entities and attributes in the data package | parseMetadata()
2 | Download data from the remote source to a local data store | downloadData()
3 | Load data into a relational database table; supported relational database management systems are HSQL, Oracle, and PostgreSQL | loadDataToDB()
4 | Query the data from the relational database | selectData()


Use cases corresponding to these four capabilities are detailed in useCases.pdf.

Requirements of the Calling Application

The Calling Application is required to provide class implementations of the following interfaces:

  1. DatabaseConnectionPoolInterface
  2. DataStorageInterface
  3. EcogridEndPointInterface

Sample Calling Application

A sample Calling Application is provided with the distribution in package org.ecoinformatics.datamanager.sample. It is configured to load database connectivity properties and other properties at run-time from the file datamanager.properties, which must be accessible on the classpath. The user may edit this file to match local database settings.
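
As an illustration of loading a properties file from the classpath at run-time, the following is a minimal sketch using only the standard java.util.Properties class. It is not the sample application's actual code: the class name PropertiesSketch and the property key dbURL are assumptions made for this example, and the real keys in datamanager.properties may differ. It also uses newer Java syntax than j2sdk1.4.2 supports.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesSketch {

    /** Loads a properties file from the classpath; returns empty Properties if it is not found. */
    static Properties loadFromClasspath(String resourceName) {
        Properties props = new Properties();
        try (InputStream in =
                 PropertiesSketch.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Parses properties from a string, which is handy for trying the sketch without a file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = loadFromClasspath("datamanager.properties");
        // "dbURL" is a hypothetical key used only for illustration.
        System.out.println("dbURL = " + props.getProperty("dbURL", "(not set)"));
    }
}
```

If the file is missing from the classpath, this sketch falls back to an empty property set rather than failing, so the caller can decide how to handle absent settings.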

The sample Calling Application consists of three classes, described in the table below:

Class | Purpose | Implements Interface
SampleCallingApp | Main program for the sample Calling Application. Executes a number of small tests to demonstrate the use cases supported by the Data Manager Library API. | DatabaseConnectionPoolInterface
SampleDataStorage | Demonstrates implementation of DataStorageInterface | DataStorageInterface
EcogridEndPoint | Demonstrates implementation of EcogridEndPointInterface | EcogridEndPointInterface

To run the sample Calling Application, change directory to the top-level of the Data Manager Library distribution (the directory that contains the datamanager.jar file). For example:


cd C:\datamanager-1.0.0

Next, run the following command:

java -cp "datamanager.jar" org.ecoinformatics.datamanager.sample.SampleCallingApp

If it executes successfully, the output of the sample Calling Application will look similar to the following:


Finished testParseMetadata(), success = true

Constructing DownloadHandler for URL: http://gce-lter.marsci.uga.edu/lter/asp/db/send_file.asp?name=metacat-user&email=none&affiliation=LNO&notify=0&accession=INS-GCEM-0011&filename=INS-GCEM-0011_1_3.TXT

the identifier is ============ tao2075037663
the identifier is ============ tao2075037663
Finished testDownloadData(), success = true

[Ljava.lang.String;@16cd7d5
[Ljava.lang.String;@cdedfd
Attribute Name: Site
DB Field Name : Site
dbDataType    : TEXT

Attribute Name: Year
DB Field Name : Year
dbDataType    : TEXT

Attribute Name: Month
DB Field Name : Month
dbDataType    : TEXT

Attribute Name: Day
DB Field Name : Day
dbDataType    : TEXT

Attribute Name: Transect
DB Field Name : Transect
dbDataType    : TEXT

Attribute Name: Species_Code
DB Field Name : Species_Code
dbDataType    : TEXT

Attribute Name: Count
DB Field Name : Count
dbDataType    : INTEGER

Constructing DownloadHandler for URL: http://gce-lter.marsci.uga.edu/lter/asp/db/send_file.asp?name=metacat-user&email=none&affiliation=LNO&notify=0&accession=INS-GCEM-0011&filename=INS-GCEM-0011_1_3.TXT
Finished testLoadDataToDB(), success = true

Query SQL = 'SELECT INS_GCEM_0011_1_3_TXT.Count FROM INS_GCEM_0011_1_3_TXT  where INS_GCEM_0011_1_3_TXT.Count > 1;'
Printing all records with 'count' value greater than 1
resultSet[1], count =  2
resultSet[2], count =  3
resultSet[3], count =  2
resultSet[4], count =  2
resultSet[5], count =  2
resultSet[6], count =  4
resultSet[7], count =  2
resultSet[8], count =  3
resultSet[9], count =  5
resultSet[10], count =  8
resultSet[11], count =  5
resultSet[12], count =  8
resultSet[13], count =  5
resultSet[14], count =  9
resultSet[15], count =  7
Finished testSelectData(), success = true

tableName: INS_GCEM_0011_1_3_TXT
  fieldNames[0]: site
  fieldNames[1]: year
  fieldNames[2]: month
  fieldNames[3]: day
  fieldNames[4]: transect
  fieldNames[5]: species_code
  fieldNames[6]: count
Finished testEnumerationMethods(), success = true

Finished all tests, success = true

Finished dropping tables.

Future Enhancements

Additional capabilities planned for the Data Manager Library will include:

Use Case | Capability | DataManager Method Name
5 | Set an upper limit on the size of the database. The Data Manager Library will monitor the size of the database, and if the upper limit (as set by the Calling Application) is exceeded, old data tables will be dropped as needed to free up space in the database. | setDatabaseSize()
6 | Set a life-span priority on individual data tables. This relates to the previous capability (Use Case 5), in that the Calling Application may single out individual data tables as high priority to indicate that they should not be dropped when the database exceeds the specified size limit. | setTableExpirationPolicy()
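
The interaction between Use Cases 5 and 6 can be illustrated with a small sketch of one possible drop policy. It is not the library's implementation, which is still only planned: the class EvictionSketch, the type TableInfo, and the method tablesToDrop are hypothetical names, and dropping the oldest tables first is an assumption about what "old data tables will be dropped as needed" could mean.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EvictionSketch {

    /** Hypothetical description of one data table: its size and whether it is high priority. */
    static final class TableInfo {
        final long sizeBytes;
        final boolean highPriority;

        TableInfo(long sizeBytes, boolean highPriority) {
            this.sizeBytes = sizeBytes;
            this.highPriority = highPriority;
        }
    }

    /**
     * Returns the names of tables to drop so that the total size fits within limitBytes.
     * The map's iteration order is assumed to be oldest table first.
     * High-priority tables are never selected, so the limit may remain exceeded.
     */
    static List<String> tablesToDrop(LinkedHashMap<String, TableInfo> tables, long limitBytes) {
        long total = 0;
        for (TableInfo info : tables.values()) {
            total += info.sizeBytes;
        }
        List<String> toDrop = new ArrayList<>();
        for (Map.Entry<String, TableInfo> entry : tables.entrySet()) {
            if (total <= limitBytes) {
                break;      // enough space has been freed
            }
            if (entry.getValue().highPriority) {
                continue;   // protected table: skip it
            }
            toDrop.add(entry.getKey());
            total -= entry.getValue().sizeBytes;
        }
        return toDrop;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, TableInfo> tables = new LinkedHashMap<>();
        tables.put("TABLE_A", new TableInfo(400, false)); // oldest
        tables.put("TABLE_B", new TableInfo(300, true));  // high priority
        tables.put("TABLE_C", new TableInfo(200, false));
        tables.put("TABLE_D", new TableInfo(100, false)); // newest
        System.out.println(tablesToDrop(tables, 500));    // prints [TABLE_A, TABLE_C]
    }
}
```

In this example the tables total 1000 bytes against a limit of 500. Dropping TABLE_A leaves 600, TABLE_B is skipped because it is high priority, and dropping TABLE_C brings the total to 400, so TABLE_D is kept.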