<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>14. OAI Protocol for Metadata Harvesting — Metacat 2.19.0 documentation</title> <link rel="stylesheet" href="_static/bootstrap.min.css" type="text/css" /> <link rel="stylesheet" href="_static/font-awesome/css/font-awesome.min.css" type="text/css" /> <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="_static/metacatui.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: './', VERSION: '2.19.0', COLLAPSE_MODINDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="_static/jquery.js"></script> <script type="text/javascript" src="_static/underscore.js"></script> <script type="text/javascript" src="_static/doctools.js"></script> <link rel="index" title="Index" href="genindex.html" /> <link rel="search" title="Search" href="search.html" /> <link rel="top" title="Metacat 2.19.0 documentation" href="index.html" /> <link rel="prev" title="13. Harvester and Harvest List Editor" href="harvester.html" /> <link rel="next" title="15. 
Event Logging" href="event-logging.html" /> </head> <body> <div id="metacatDocs"> <div class="banner"> <a href="index.html"><img class="logo" src="_static/metacat-logo-white.png" /></a> <a href="index.html"><h1 class="title">Metacat: Metadata and Data Management Server</h1></a> <img class="logo-right" src="_static/nceas-logo-white.png" /> </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right"> <span id="searchbox" style="display: none;"> <form class="search" action="search.html" method="get"> <input type="text" name="q" size="18" /> <input type="submit" value="Go" class="icon-search"/> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </span> </li> <script type="text/javascript">$('#searchbox').show(0);</script> <li class="right"> <a href="genindex.html" title="General Index" accesskey="I">index</a> </li> <li class="right"> <a href="event-logging.html" title="15. Event Logging" accesskey="N">next</a> </li> <li class="right"> <a href="harvester.html" title="13. Harvester and Harvest List Editor" accesskey="P">previous</a> </li> <li class="breadcrumb first"><a href="index.html">Metacat 2.19.0 documentation</a> »</li> </ul> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="oai-protocol-for-metadata-harvesting"> <h1>14. OAI Protocol for Metadata Harvesting<a class="headerlink" href="#oai-protocol-for-metadata-harvesting" title="Permalink to this headline">¶</a></h1> <p>The Open Archives Initiative Protocol for Metadata Harvesting (<a class="reference external" href="http://www.openarchives.org/pmh/">OAI-PMH</a>) was first developed in the late 1990’s as a standard for harvesting metadata from distributed metadata/data repositories. 
The current version of the OAI-PMH standard is 2.0 as of June 2002, with minor updates in December 2008.</p> <p>The OAI-PMH standard uses the Hypertext Transfer Protocol (HTTP) as a transport layer and specifies six query methods (called verbs) that must be supported by an OAI-PMH compliant data provider (also referred to as a repository). These methods are:</p> <ol class="arabic simple"> <li><code class="docutils literal"><span class="pre">GetRecord</span></code> - retrieves zero or one complete metadata record from a repository;</li> <li><code class="docutils literal"><span class="pre">Identify</span></code> - retrieves information about a repository;</li> <li><code class="docutils literal"><span class="pre">ListIdentifiers</span></code> - retrieves zero or more metadata record “headers” (not the complete metadata record) from a repository;</li> <li><code class="docutils literal"><span class="pre">ListMetadataFormats</span></code> - retrieves a list of available metadata record formats supported by a repository;</li> <li><code class="docutils literal"><span class="pre">ListRecords</span></code> - retrieves zero or more complete metadata records from a repository; and</li> <li><code class="docutils literal"><span class="pre">ListSets</span></code> - retrieves the set structure from a repository.</li> </ol> <p>An OAI-PMH compliant data provider must accept requests made with either the HTTP GET or the HTTP POST method. Responses from the data provider must be returned as an XML-encoded (version 1.0) stream. The data provider must also handle errors and return the correct error code to the harvester. Detailed specifications and examples of all six verbs may be viewed in Section 4 of the <a class="reference external" href="http://www.openarchives.org/OAI/openarchivesprotocol.html">OAI-PMH standards document</a>.</p> <div class="section" id="eml-and-dublin-core"> <h2>14.1.
EML and Dublin Core<a class="headerlink" href="#eml-and-dublin-core" title="Permalink to this headline">¶</a></h2> <p>The OAI-PMH requires that unqualified Dublin Core metadata be supported as a minimum. Although EML generally provides more fine-grained metadata than Dublin Core, the two metadata standards do share many of the same (or similar) content elements. Transformations from EML to Dublin Core performed by Metacat OAI-PMH produce <em>simple</em> or <em>unqualified</em> Dublin Core, which is associated with the reserved metadataPrefix symbol <code class="docutils literal"><span class="pre">oai_dc</span></code> in the OAI-PMH.</p> <p>The following table summarizes the element mappings of the EML to Dublin Core crosswalk performed by Metacat OAI-PMH, including notes specific to each element mapping.</p> <table border="1" class="docutils"> <colgroup> <col width="20%" /> <col width="7%" /> <col width="74%" /> </colgroup> <thead valign="bottom"> <tr class="row-odd"><th class="head">EML Element</th> <th class="head">DC Element</th> <th class="head">Notes</th> </tr> </thead> <tbody valign="top"> <tr class="row-even"><td>Title</td> <td>title</td> <td> </td> </tr> <tr class="row-odd"><td>Creator</td> <td>creator</td> <td>Use only the creator’s name (givenName and surName elements); could be an organization name</td> </tr> <tr class="row-even"><td>keyword</td> <td>subject</td> <td>One subject element per keyword element</td> </tr> <tr class="row-odd"><td>abstract</td> <td>description</td> <td>Must extract text formatting tags</td> </tr> <tr class="row-even"><td>publisher</td> <td>publisher</td> <td>Use only the publisher’s name (givenName and surName elements); could be an organization name</td> </tr> <tr class="row-odd"><td>associatedParty</td> <td>contributor</td> <td>Use only the party’s name (givenName and surName); could be an organization name</td> </tr> <tr class="row-even"><td>pubDate</td> <td>date</td> <td>One-to-one mapping</td> </tr> <tr 
class="row-odd"><td>dataset, citation, protocol, software</td> <td>type</td> <td>Type value is determined by the type of EML document rather than by a specific field value</td> </tr> <tr class="row-even"><td>physical</td> <td>format</td> <td>Use a MIME type as the Format value? For example, if EML has a &lt;textFormat&gt; element within &lt;physical&gt;, then use ‘text/plain’ as the Format value?</td> </tr> <tr class="row-odd"><td><ol class="first last arabic simple"> <li>packageId;</li> <li>URL to the EML document</li> </ol> </td> <td>identifier</td> <td>packageId can be used as the value of one identifier element; a second identifier element can hold a URL to the EML document</td> </tr> <tr class="row-even"><td>dataSource</td> <td>source</td> <td>Use the document URL of the referenced data source?</td> </tr> <tr class="row-odd"><td>Citation</td> <td>relation</td> <td>Use the document URL of the referenced citation?</td> </tr> <tr class="row-even"><td>geographicCoverage</td> <td>coverage</td> <td>Add separate coverage elements for geographic description and geographic bounding coordinates. For bounding coordinates, use minimal labeling, for example: 81.505000 W, 81.495000 W, 31.170000 N, 31.163000 N</td> </tr> <tr class="row-odd"><td>taxonomicCoverage</td> <td>coverage</td> <td>Use only genus/species binomials; place each binomial in a separate coverage element</td> </tr> <tr class="row-even"><td>temporalCoverage</td> <td>coverage</td> <td>Include begin date and end date when available. For example: 1915-01-01 to 2004-12-31</td> </tr> <tr class="row-odd"><td>intellectualRights</td> <td>rights</td> <td>Must extract text formatting tags</td> </tr> </tbody> </table> <p>Metacat OAI-PMH includes a set of XSLT stylesheets used for converting specific versions of EML to their Dublin Core equivalents.</p> </div> <div class="section" id="metacat-oai-pmh-service-interfaces"> <h2>14.2.
Metacat OAI-PMH Service Interfaces<a class="headerlink" href="#metacat-oai-pmh-service-interfaces" title="Permalink to this headline">¶</a></h2> <p>Metacat includes support for two OAI-PMH service interfaces: a data provider (or repository) service interface and a harvester service interface.</p> <div class="section" id="data-provider"> <h3>14.2.1. Data Provider<a class="headerlink" href="#data-provider" title="Permalink to this headline">¶</a></h3> <p>The Metacat OAI-PMH Data Provider service interface supports all six OAI-PMH methods (GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, and ListSets) as defined in the OAI-PMH Version 2 Specification through a standard HTTP URL that accepts both HTTP GET and HTTP POST requests.</p> <p>The Metacat OAI-PMH Data Provider service is built on the Online Computer Library Center (OCLC) OAICat open source software, with customizations added to facilitate integration with Metacat.</p> <p>Users of the Metacat OAI-PMH Data Provider should be aware of the following issues:</p> <ul class="simple"> <li>‘Deleted’ Status - OAI-PMH repositories can optionally flag records with a ‘deleted’ status, indicating that a record in the metadata format specified by the metadataPrefix is no longer available. Since Metacat does not provide a mechanism for retrieving a list of deleted documents, the ‘deleted’ status is not supported in this implementation of the OAI-PMH Data Provider. This represents a possible future enhancement.</li> <li>Sets - OAI-PMH repositories can optionally support set hierarchies. Since it has not been determined how set hierarchies should be structured in Metacat, this implementation of the OAI-PMH repository does not support set hierarchies.
This represents a possible future enhancement.</li> <li>Datestamp Granularity - When expressing datestamps for repository documents, OAI-PMH allows two levels of granularity: day granularity and seconds granularity. Since the Metacat database stores the value of its <code class="docutils literal"><span class="pre">xml_documents.date_updated</span></code> field in day granularity, day granularity is the level supported by the Metacat OAI-PMH Data Provider.</li> </ul> </div> <div class="section" id="metacat-oai-pmh-harvester"> <h3>14.2.2. Metacat OAI-PMH Harvester<a class="headerlink" href="#metacat-oai-pmh-harvester" title="Permalink to this headline">¶</a></h3> <p>The Metacat OAI-PMH Harvester service interface uses OAI-PMH methods to request metadata or related information from an OAI-PMH-compliant data provider using a standard HTTP URL in either an HTTP GET or an HTTP POST request.</p> <p>The Metacat OAI-PMH Harvester client is built on OCLC’s OAIHarvester2 open source code, with customizations as needed to support integration with Metacat.</p> <p>Users of the Metacat OAI-PMH Harvester should be aware of the following issues:</p> <ul class="simple"> <li>Handling of ‘Deleted’ status - The Metacat OAI-PMH Harvester program checks whether a ‘deleted’ status is flagged for a harvested document, and if it is, the document is correspondingly deleted from the Metacat repository.</li> <li>Datestamp Granularity - When expressing datestamps for repository documents, OAI-PMH allows two levels of granularity: day granularity and seconds granularity. Since the Metacat database stores the value of its <code class="docutils literal"><span class="pre">xml_documents.date_updated</span></code> field in day granularity, that is also the level supported by both the Metacat OAI-PMH Data Provider and the Metacat OAI-PMH Harvester.
This has implications when the Metacat OAI-PMH Harvester (MOH) interacts with data providers such as the Dryad repository, which stores its documents with seconds granularity. For example, consider the following sequence of events:<ol class="arabic"> <li>On January 1, 2010, MOH harvests a document from the Dryad repository with datestamp ‘2010-01-01T10:00:00Z’, and stores its local copy with datestamp ‘2010-01-01’.</li> <li>Later that same day, the Dryad repository updates the document to a newer revision, with a new datestamp such as ‘2010-01-01T20:00:00Z’.</li> <li>On the following day, MOH runs another harvest. It determines that it has a local copy of the document with datestamp ‘2010-01-01’ and does not re-harvest the document, even though its local copy is not the latest revision.</li> </ol> </li> </ul> </div> </div> <div class="section" id="configuring-and-running-metacat-oai-pmh"> <h2>14.3. Configuring and Running Metacat OAI-PMH<a class="headerlink" href="#configuring-and-running-metacat-oai-pmh" title="Permalink to this headline">¶</a></h2> <div class="section" id="metacat-oai-pmh-data-provider-servlet"> <h3>14.3.1. Metacat OAI-PMH Data Provider Servlet<a class="headerlink" href="#metacat-oai-pmh-data-provider-servlet" title="Permalink to this headline">¶</a></h3> <p>To configure and enable the Data Provider servlet:</p> <ol class="arabic"> <li><p class="first">Stop Tomcat and edit the Metacat properties (<code class="docutils literal"><span class="pre">metacat.properties</span></code>) file in the Metacat context directory inside the Tomcat application directory.
The Metacat context directory is the name of the application (usually <code class="docutils literal"><span class="pre">knb</span></code>):</p> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="o"><</span><span class="n">tomcat_app_dir</span><span class="o">>/<</span><span class="n">context_dir</span><span class="o">>/</span><span class="n">WEB</span><span class="o">-</span><span class="n">INF</span><span class="o">/</span><span class="n">metacat</span><span class="o">.</span><span class="n">properties</span> </pre></div> </div> </li> <li><p class="first">Change the following properties appropriately:</p> <div class="highlight-default"><div class="highlight"><pre><span></span>``oaipmh.repositoryIdentifier`` - A string that identifies this repository ``Identify.adminEmail`` - The email address of the repository administrator </pre></div> </div> </li> <li><p class="first">Edit the deployment descriptor (<code class="docutils literal"><span class="pre">web.xml</span></code>) file, also in the WEB-INF directory. 
Uncomment the servlet and servlet-mapping entries for the DataProvider servlet by removing the surrounding “&lt;!--” and “--&gt;” strings:</p> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="o"><</span><span class="n">servlet</span><span class="o">></span> <span class="o"><</span><span class="n">servlet</span><span class="o">-</span><span class="n">name</span><span class="o">></span><span class="n">DataProvider</span><span class="o"></</span><span class="n">servlet</span><span class="o">-</span><span class="n">name</span><span class="o">></span> <span class="o"><</span><span class="n">description</span><span class="o">></span><span class="n">Processes</span> <span class="n">OAI</span> <span class="n">verbs</span> <span class="k">for</span> <span class="n">Metacat</span> <span class="n">OAI</span><span class="o">-</span><span class="n">PMH</span> <span class="n">Data</span> <span class="n">Provider</span> <span class="p">(</span><span class="n">MODP</span><span class="p">)</span><span class="o"></</span><span class="n">description</span><span class="o">></span> <span class="o"><</span><span class="n">servlet</span><span class="o">-</span><span class="n">class</span><span class="o">></span><span class="n">edu</span><span class="o">.</span><span class="n">ucsb</span><span class="o">.</span><span class="n">nceas</span><span class="o">.</span><span class="n">metacat</span><span class="o">.</span><span class="n">oaipmh</span><span class="o">.</span><span class="n">provider</span><span class="o">.</span><span class="n">server</span><span class="o">.</span><span class="n">OAIHandler</span><span class="o"></</span><span class="n">servlet</span><span class="o">-</span><span class="n">class</span><span class="o">></span> <span class="o"><</span><span class="n">load</span><span class="o">-</span><span class="n">on</span><span class="o">-</span><span class="n">startup</span><span class="o">></span><span class="mi">4</span><span
class="o"></</span><span class="n">load</span><span class="o">-</span><span class="n">on</span><span class="o">-</span><span class="n">startup</span><span class="o">></span> <span class="o"></</span><span class="n">servlet</span><span class="o">></span> <span class="o"><</span><span class="n">servlet</span><span class="o">-</span><span class="n">mapping</span><span class="o">></span> <span class="o"><</span><span class="n">servlet</span><span class="o">-</span><span class="n">name</span><span class="o">></span><span class="n">DataProvider</span><span class="o"></</span><span class="n">servlet</span><span class="o">-</span><span class="n">name</span><span class="o">></span> <span class="o"><</span><span class="n">url</span><span class="o">-</span><span class="n">pattern</span><span class="o">>/</span><span class="n">dataProvider</span><span class="o"></</span><span class="n">url</span><span class="o">-</span><span class="n">pattern</span><span class="o">></span> <span class="o"></</span><span class="n">servlet</span><span class="o">-</span><span class="n">mapping</span><span class="o">></span> </pre></div> </div> </li> <li><p class="first">Save the <code class="docutils literal"><span class="pre">metacat.properties</span></code> and <code class="docutils literal"><span class="pre">web.xml</span></code> files and start Tomcat.</p> </li> </ol> <p>The following table describes the complete set of <code class="docutils literal"><span class="pre">metacat.properties</span></code> settings that are used by the DataProvider servlet.</p> <table border="1" class="docutils"> <colgroup> <col width="15%" /> <col width="29%" /> <col width="56%" /> </colgroup> <thead valign="bottom"> <tr class="row-odd"><th class="head">Property Name</th> <th class="head">Sample Value</th> <th class="head">Description</th> </tr> </thead> <tbody valign="top"> <tr class="row-even"><td>oaipmh.maxListSize</td> <td>5</td> <td>Maximum number of records returned by each call to the ListIdentifiers and 
ListRecords verbs.</td> </tr> <tr class="row-odd"><td>oaipmh.repositoryIdentifier</td> <td>metacat.lternet.edu</td> <td>An identifier string for the repository.</td> </tr> <tr class="row-even"><td>AbstractCatalog.oaiCatalogClassName</td> <td>edu.ucsb.nceas.metacat.oaipmh.provider.server.catalog.MetacatCatalog</td> <td>The Java class that implements the AbstractCatalog interface. This class determines which records exist in the repository and their datestamps.</td> </tr> <tr class="row-odd"><td>AbstractCatalog.recordFactoryClassName</td> <td>edu.ucsb.nceas.metacat.oaipmh.provider.server.catalog.MetacatRecordFactory</td> <td>The Java class that extends the RecordFactory class. This class creates OAI-PMH metadata records.</td> </tr> <tr class="row-even"><td>AbstractCatalog.secondsToLive</td> <td>3600</td> <td>The lifetime, in seconds, of the resumptionToken.</td> </tr> <tr class="row-odd"><td>AbstractCatalog.granularity</td> <td>YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ</td> <td>Granularity of datestamps.
Either “day granularity” or “seconds granularity” values can be used.</td> </tr> <tr class="row-even"><td>Identify.repositoryName</td> <td>Metacat OAI-PMH Data Provider</td> <td>A name for the repository.</td> </tr> <tr class="row-odd"><td>Identify.earliestDatestamp</td> <td>2000-01-01T00:00:00Z</td> <td>Earliest datestamp supported by this repository.</td> </tr> <tr class="row-even"><td>Identify.deletedRecord</td> <td>yes or no</td> <td>Use “yes” if the repository indicates the status of deleted records; use “no” if it doesn’t.</td> </tr> <tr class="row-odd"><td>Identify.adminEmail</td> <td><a class="reference external" href="mailto:tech_support%40someplace.org">mailto:tech_support<span>@</span>someplace<span>.</span>org</a></td> <td>Email address of the repository administrator.</td> </tr> <tr class="row-even"><td>Crosswalks.oai_dc</td> <td>edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml2oai_dc</td> <td>Java class that controls the EML 2.x.y to oai_dc (Dublin Core) crosswalk.</td> </tr> <tr class="row-odd"><td>Crosswalks.eml2.0.0</td> <td>edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml200</td> <td>Java class that furnishes EML 2.0.0 metadata.</td> </tr> <tr class="row-even"><td>Crosswalks.eml2.0.1</td> <td>edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml201</td> <td>Java class that furnishes EML 2.0.1 metadata.</td> </tr> <tr class="row-odd"><td>Crosswalks.eml2.1.0</td> <td>edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml210</td> <td>Java class that furnishes EML 2.1.0 metadata.</td> </tr> </tbody> </table> <div class="section" id="sample-urls"> <h4>14.3.1.1.
Sample URLs<a class="headerlink" href="#sample-urls" title="Permalink to this headline">¶</a></h4> <p>Sample URLs that demonstrate use of the Metacat OAI-PMH Data Provider follow:</p> <table border="1" class="docutils"> <colgroup> <col width="10%" /> <col width="28%" /> <col width="62%" /> </colgroup> <thead valign="bottom"> <tr class="row-odd"><th class="head">OAI-PMH Verb</th> <th class="head">Description</th> <th class="head">URL</th> </tr> </thead> <tbody valign="top"> <tr class="row-even"><td>GetRecord</td> <td>Get an EML 2.0.1 record using its LSID identifier</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=GetRecord&amp;metadataPrefix=eml-2.0.1&amp;identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26</td> </tr> <tr class="row-odd"><td>GetRecord</td> <td>Get an oai_dc (Dublin Core) record using its LSID identifier</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26</td> </tr> <tr class="row-even"><td>Identify</td> <td>Identify this data provider</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=Identify</td> </tr> <tr class="row-odd"><td>ListIdentifiers</td> <td>List all EML 2.1.0 identifiers in the repository</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=ListIdentifiers&amp;metadataPrefix=eml-2.1.0</td> </tr> <tr class="row-even"><td>ListIdentifiers</td> <td>List all oai_dc (Dublin Core) identifiers in the repository between a range of dates</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=ListIdentifiers&amp;metadataPrefix=oai_dc&amp;from=2006-01-01&amp;until=2010-01-01</td> </tr> <tr class="row-odd"><td>ListMetadataFormats</td> <td>List metadata formats supported by this repository</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=ListMetadataFormats</td> </tr> <tr class="row-even"><td>ListRecords</td> <td>List all EML 2.0.0 records in the repository</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=ListRecords&amp;metadataPrefix=eml-2.0.0</td> </tr> <tr class="row-odd"><td>ListRecords</td> <td>List all oai_dc (Dublin Core) records in the repository</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=ListRecords&amp;metadataPrefix=oai_dc</td> </tr> <tr class="row-even"><td>ListSets</td> <td>List sets supported by this repository</td> <td>http://&lt;your_context_url&gt;/dataProvider?verb=ListSets</td> </tr> </tbody> </table> </div> </div> <div class="section" id="id1"> <h3>14.3.2. Metacat OAI-PMH Harvester<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h3> <p>The Metacat OAI-PMH Harvester (MOH) is executed as a command-line program:</p> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">sh</span> <span class="n">runHarvester</span><span class="o">.</span><span class="n">sh</span> <span class="o">-</span><span class="n">dn</span> <span class="o"><</span><span class="n">distinguishedName</span><span class="o">></span> \ <span class="o">-</span><span class="n">password</span> <span class="o"><</span><span class="n">password</span><span class="o">></span> \ <span class="o">-</span><span class="n">metadataPrefix</span> <span class="o"><</span><span class="n">prefix</span><span class="o">></span> \ <span class="p">[</span><span class="o">-</span><span class="kn">from</span> <span class="o"><</span><span class="n">fromDate</span><span class="o">></span><span class="p">]</span> \ <span class="p">[</span><span class="o">-</span><span class="n">until</span> <span class="o"><</span><span class="n">untilDate</span><span class="o">></span><span
class="p">]</span> \ <span class="p">[</span><span class="o">-</span><span class="n">setSpec</span> <span class="o"><</span><span class="n">setName</span><span class="o">></span><span class="p">]</span> \ <span class="o"><</span><span class="n">baseURL</span><span class="o">></span> </pre></div> </div> <p>The following example illustrates how the Metacat OAI-PMH Harvester is run from the command line:</p> <ol class="arabic"> <li><p class="first">Open a system command window or terminal window.</p> </li> <li><p class="first">Set the METACAT_HOME environment variable to the value of the Metacat installation directory. Some examples follow:</p> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">export</span> <span class="n">METACAT_HOME</span><span class="o">=/</span><span class="n">home</span><span class="o">/</span><span class="n">somePath</span><span class="o">/</span><span class="n">metacat</span> </pre></div> </div> </li> <li><p class="first">cd to the following directory:</p> <div class="highlight-default"><div class="highlight"><pre><span></span>cd $METACAT_HOME/lib/oaipmh </pre></div> </div> </li> <li><p class="first">Run the appropriate Metacat OAI-PMH Harvester shell script, as determined by the operating system:</p> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">sh</span> <span class="n">runHarvester</span><span class="o">.</span><span class="n">sh</span> \ <span class="o">-</span><span class="n">dn</span> <span class="n">uid</span><span class="o">=</span><span class="n">jdoe</span><span class="p">,</span><span class="n">o</span><span class="o">=</span><span class="n">myorg</span><span class="p">,</span><span class="n">dc</span><span class="o">=</span><span class="n">ecoinformatics</span><span class="p">,</span><span class="n">dc</span><span class="o">=</span><span class="n">org</span> \ <span class="o">-</span><span class="n">password</span> <span class="n">some_password</span> \ 
<span class="o">-</span><span class="n">metadataPrefix</span> <span class="n">oai_dc</span> \ <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">baseurl</span><span class="o">.</span><span class="n">repository</span><span class="o">.</span><span class="n">org</span><span class="o">/</span><span class="n">metacat</span><span class="o">/</span><span class="n">dataProvider</span> </pre></div> </div> </li> </ol> <p>Command line options and parameters are described in the following table:</p> <table border="1" class="docutils"> <colgroup> <col width="16%" /> <col width="30%" /> <col width="54%" /> </colgroup> <thead valign="bottom"> <tr class="row-odd"><th class="head">Command Option or Parameter</th> <th class="head">Example</th> <th class="head">Description</th> </tr> </thead> <tbody valign="top"> <tr class="row-even"><td>-dn</td> <td><code class="docutils literal"><span class="pre">-dn</span> <span class="pre">uid=dryad,o=LTER,dc=ecoinformatics,dc=org</span></code></td> <td>Full distinguished name of the LDAP account used when harvesting documents into Metacat. (Required)</td> </tr> <tr class="row-odd"><td>-password</td> <td><code class="docutils literal"><span class="pre">-password</span> <span class="pre">some_password</span></code></td> <td>Password of the LDAP account used when harvesting documents into Metacat. (Required)</td> </tr> <tr class="row-even"><td>-metadataPrefix</td> <td><code class="docutils literal"><span class="pre">-metadataPrefix</span> <span class="pre">oai_dc</span></code></td> <td>The type of documents being harvested from the remote repository. (Required)</td> </tr> <tr class="row-odd"><td>-from</td> <td><code class="docutils literal"><span class="pre">-from</span> <span class="pre">2000-01-01</span></code></td> <td>The lower limit of the datestamp for harvested documents. 
(Optional)</td> </tr> <tr class="row-even"><td>-until</td> <td><code class="docutils literal"><span class="pre">-until</span> <span class="pre">2010-12-31</span></code></td> <td>The upper limit of the datestamp for harvested documents. (Optional)</td> </tr> <tr class="row-odd"><td>-setSpec</td> <td><code class="docutils literal"><span class="pre">-setSpec</span> <span class="pre">someSet</span></code></td> <td>Harvest documents belonging to this set. (Optional)</td> </tr> <tr class="row-even"><td>baseURL</td> <td><code class="docutils literal"><span class="pre">http://baseurl.repository.org/metacat/dataProvider</span></code></td> <td>Base URL of the remote repository. (Required)</td> </tr> </tbody> </table> </div> </div> <div class="section" id="oai-pmh-error-codes"> <h2>14.4. OAI-PMH Error Codes<a class="headerlink" href="#oai-pmh-error-codes" title="Permalink to this headline">¶</a></h2> <table border="1" class="docutils"> <colgroup> <col width="20%" /> <col width="63%" /> <col width="17%" /> </colgroup> <thead valign="bottom"> <tr class="row-odd"><th class="head">Error Code</th> <th class="head">Description</th> <th class="head">Applicable Verbs</th> </tr> </thead> <tbody valign="top"> <tr class="row-even"><td>badArgument</td> <td>The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax.</td> <td>all verbs</td> </tr> <tr class="row-odd"><td>badResumptionToken</td> <td>The value of the resumptionToken argument is invalid or expired.</td> <td>ListIdentifiers ListRecords ListSets</td> </tr> <tr class="row-even"><td>badVerb</td> <td>Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated.</td> <td>N/A</td> </tr> <tr class="row-odd"><td>cannotDisseminateFormat</td> <td>The metadata format identified by the value given for the metadataPrefix argument is not supported by the item or by the repository.</td> <td>GetRecord ListIdentifiers ListRecords</td> </tr> <tr
class="row-even"><td>idDoesNotExist</td> <td>The value of the identifier argument is unknown or illegal in this repository.</td> <td>GetRecord ListMetadataFormats</td> </tr> <tr class="row-odd"><td>noRecordsMatch</td> <td>The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list.</td> <td>ListIdentifiers ListRecords</td> </tr> <tr class="row-even"><td>noMetadataFormats</td> <td>There are no metadata formats available for the specified item.</td> <td>ListMetadataFormats</td> </tr> <tr class="row-odd"><td>noSetHierarchy</td> <td>The repository does not support sets.</td> <td>ListSets ListIdentifiers ListRecords</td> </tr> </tbody> </table> </div> </div> </div> </div> </div> <div class="clearer"></div> </div> <div class="footer"> <div class="small-print"> © Copyright 2012, Regents of the University of California. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.6.7. </div> </div> </div> </body> </html>