€cdocutils.nodes
document
q)q}q(U nametypesq}qX querying dataoneqNsUsubstitution_defsq}qUparse_messagesq ]q
Ucurrent_sourceqNU
decorationqNUautofootnote_startq
KUnameidsq}qhUquerying-dataoneqsUchildrenq]qcdocutils.nodes
section
q)q}q(U rawsourceqU UparentqhUsourceqXl /var/lib/jenkins/jobs/API_Documentation_trunk/workspace/api-documentation/source/design/querying_content.txtqUtagnameqUsectionqU
attributesq}q(Udupnamesq]Uclassesq]Ubackrefsq ]Uidsq!]q"haUnamesq#]q$hauUlineq%KUdocumentq&hh]q'(cdocutils.nodes
title
q()q)}q*(hX Querying DataONEq+hhhhhUtitleq,h}q-(h]h]h ]h!]h#]uh%Kh&hh]q.cdocutils.nodes
Text
q/X Querying DataONEq0…q1}q2(hh+hh)ubaubcdocutils.nodes
target
q3)q4}q5(hU hhhNhUtargetq6h}q7(h!]h ]h]h]h#]Urefidq8Uindex-0q9uh%Nh&hh]ubcsphinx.ext.todo
todo_node
q:)q;}q<(hX| - Attribute mapping to the list prepared previously
- Attribute mapping to sysmeta docs
- SOLR examples, specific to Mercuryq=hhhhUexpect_referenced_by_nameq>}hU todo_nodeq?h}q@(h]h]qAUadmonition-todoqBah ]h!]qCh9ah#]uh%Kh&hUexpect_referenced_by_idqD}qEh9h4sh]qF(h()qG}qH(hX TodoqIh}qJ(h]h]h ]h!]h#]uhh;h]qKh/X TodoqL…qM}qN(hU hhGubahh,ubcdocutils.nodes
bullet_list
qO)qP}qQ(hU h}qR(UbulletqSX -h!]h ]h]h]h#]uhh;h]qT(cdocutils.nodes
list_item
qU)qV}qW(hX1 Attribute mapping to the list prepared previouslyqXh}qY(h]h]h ]h!]h#]uhhPh]qZcdocutils.nodes
paragraph
q[)q\}q](hhXhhVhhhU paragraphq^h}q_(h]h]h ]h!]h#]uh%Kh]q`h/X1 Attribute mapping to the list prepared previouslyqa…qb}qc(hhXhh\ubaubahU list_itemqdubhU)qe}qf(hX! Attribute mapping to sysmeta docsqgh}qh(h]h]h ]h!]h#]uhhPh]qih[)qj}qk(hhghhehhhh^h}ql(h]h]h ]h!]h#]uh%Kh]qmh/X! Attribute mapping to sysmeta docsqn…qo}qp(hhghhjubaubahhdubhU)qq}qr(hX" SOLR examples, specific to Mercuryqsh}qt(h]h]h ]h!]h#]uhhPh]quh[)qv}qw(hhshhqhhhh^h}qx(h]h]h ]h!]h#]uh%Kh]qyh/X" SOLR examples, specific to Mercuryqz…q{}q|(hhshhvubaubahhdubehUbullet_listq}ubeubh[)q~}q(hXC This document has been DEPRECATED: Please see :doc:`SearchMetadata`q€hhhhhh^h}q(h]h]h ]h!]h#]uh%Kh&hh]q‚(h/X. This document has been DEPRECATED: Please see qƒ…q„}q…(hX. This document has been DEPRECATED: Please see hh~ubcsphinx.addnodes
pending_xref
q†)q‡}qˆ(hX :doc:`SearchMetadata`q‰hh~hhhUpending_xrefqŠh}q‹(UreftypeX docqŒUrefwarnqˆU reftargetqŽX SearchMetadataU refdomainU h!]h ]Urefexplicit‰h]h]h#]UrefdocqX design/querying_contentquh%Kh]q‘cdocutils.nodes
inline
q’)q“}q”(hh‰h}q•(h]h]q–(Uxrefq—hŒeh ]h!]h#]uhh‡h]q˜h/X SearchMetadataq™…qš}q›(hU hh“ubahUinlineqœubaubeubcdocutils.nodes
comment
q)qž}qŸ(hXk" Content here is preserved for notes until the search API is completed.
Synopsis
--------
This document provides an outline for approaches to querying content available
in DataONE through the ``/object/`` collection exposed by the CNs and MNs
(i.e. :func:`MN_replication.listObjects` and :func:`CN_query.search`
methods). The same approach can be applied to the ``/log/`` collection exposed
by the CNs and MNs (i.e. the :func:`CN_query.getLogRecords` and
:func:`MN_crud.getLogRecords` methods).
There are three types of query that can be readily supported by CNs
(name-value pairs, Metacat path query, and Mercury SOLR query), and at least
one by MNs (name-value pairs). There may also be additional query types
specified in the future (e.g. CQL, SPARQL).
Overview
--------
The basic model is that a query applied against a collection acts as a filter,
restricting the results to only those objects whose properties match the
supplied query expression. The default, or unfiltered, view of the collection
shows all objects (that the user is authorized to access). The query does not
shape the result, i.e. it does not indicate which fields are returned or the
structure of the response.
There seem to be two basic types of query that need to be supported. One is
querying against fairly distinct and controlled object attributes that are,
for the most part, defined by the DataONE system ("system queries"). The other
is for queries that apply to the content of objects that are contributed to
DataONE ("content queries"). In this case, the content, structure, and even
representation are essentially uncontrolled, and so may vary considerably
across the universe of objects that are managed by DataONE.
A long-term goal would be to support a query syntax that is expressive enough
to enable precise discovery of content but also simple enough that at least
common queries can be expressed in a URL.
There are three types of query expression that can be supported easily with
the initial version of the DataONE cyber-infrastructure:
1) Simple name-value pairs combined together with a single logical operator
(e.g. AND).
2) The Path Query syntax / structure that is used by Metacat. This is a
potentially very expressive query that is encoded in an XML structure, and so
can be unwieldy for passing in a URL (POST is typically used) or generation by
hand.
3) The SOLR / Lucene query syntax that is supported by Mercury. Fairly
sophisticated queries can be expressed, but there is no mechanism for querying
against structure (e.g. matching the value of a term that is a child of some
other element). SOLR queries are designed to be transmitted in URLs and are
reasonably simple to create by hand.
The different types of query are described in more detail below.
Since it is feasible that MNs and CNs could support multiple query types, it
is desirable that the client provide a hint about the type of query being
transmitted through a URL parameter such as "``qt``" (query type), with::
qt=nvp --> Name, value pairs
qt=path --> Metacat path query
qt=solr --> SOLR query syntax (used by Mercury)
Simple NV Pairs
---------------
The basic approach here is to use name/value pairs (NVPs) in the URL to
construct a query, with names typically mapping to an attribute + comparison
operator (with comparison operator indicated as a suffix to the attribute),
and values being the value to compare against entries in the database.
Multiple NVPs are combined together with either the logical AND operator or
the logical OR operator. The types of queries that can be expressed are quite
limited, though they can be sufficient for restricting results to a portion of a
data set modeled as a flat table.
The primary goal of this query syntax is to enable simple implementation of
range restrictions for collections available on MNs.
An example of how a simple query might express "objects of type data that have
been modified since 6AM on the first of January, 2010 UTC"::
../object/?qt=nvp&oclass=data&lastModified_gt=20100101T060000+00
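The example URL above could be assembled on the client side roughly as follows. This is a minimal sketch only: the helper name ``build_nvp_query`` is hypothetical and not part of any DataONE API. Note that a literal ``+`` in a query string is normally decoded as a space, so the sketch percent-encodes the timezone offset as ``%2B``.

```python
from urllib.parse import urlencode


def build_nvp_query(base_url, **conditions):
    """Append an NVP query string to base_url (hypothetical helper).

    Each keyword is an attribute name with an optional operator suffix,
    e.g. lastModified_gt. The query type hint qt=nvp is always sent first.
    """
    params = [("qt", "nvp")] + list(conditions.items())
    # urlencode() percent-encodes reserved characters, so '+' becomes '%2B'.
    return base_url + "?" + urlencode(params)


url = build_nvp_query(
    "../object/",
    oclass="data",
    lastModified_gt="20100101T060000+00",
)
print(url)
```

Keyword arguments keep their call order, so the parameters appear in the same order as in the hand-written example.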
Suggestions for comparison operator suffixes:
======= ===========================
Suffix  Comparison Operator
======= ===========================
None    Equals (==) (default)
_eq     Equals (==)
_ne     Not equal (!=)
_lt     Less than (<)
_le     Less than or equals (<=)
_gt     Greater than (>)
_ge     Greater than or equals (>=)
======= ===========================
The presence of one or more wildcard characters in the value for an
equivalence operator would invoke the equivalent of a substring search. For
example::
../object/?qt=nvp&oclass=d*
could be mapped to the SQL WHERE clause::
WHERE oclass LIKE 'd%'
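One way a service might perform that mapping is sketched below. The function names are hypothetical, and the sketch assumes that ``*`` is the only wildcard character and that the column name has already been checked against a list of permitted attributes. Values are passed as bound parameters rather than interpolated into the SQL text.

```python
def wildcard_to_like(value):
    """Translate '*' wildcards into SQL LIKE '%' patterns.

    Characters that are special to LIKE ('%', '_') and the escape
    character itself are escaped first so they match literally.
    """
    escaped = value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%")


def equality_clause(column, value):
    """Return (sql_fragment, parameters) for an equality-style comparison.

    Assumption: 'column' comes from a trusted whitelist of attribute names.
    """
    if "*" in value:
        return f"{column} LIKE ? ESCAPE '\\'", [wildcard_to_like(value)]
    return f"{column} = ?", [value]


print(equality_clause("oclass", "d*"))
print(equality_clause("oclass", "data"))
```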
The general grammar of the query can be expressed as:
.. productionlist::
NVPQuery : { `nvpair` }
nvpair : `name` + "=" + `value`
name : string [+ `operator`]
operator : "_eq" | "_ne" | "_lt" | "_le" | "_gt" | "_ge"
value : string
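A receiving service could apply this grammar along the following lines. This is an illustrative sketch, not a DataONE implementation: the names ``OPERATORS``, ``split_name``, and ``parse_nvp_query`` are hypothetical, and the ``qt`` hint is simply skipped.

```python
from urllib.parse import parse_qsl

# Operator suffixes from the table above, mapped to comparison symbols.
OPERATORS = {
    "_eq": "=",
    "_ne": "!=",
    "_lt": "<",
    "_le": "<=",
    "_gt": ">",
    "_ge": ">=",
}


def split_name(name):
    """Split 'attribute_op' into (attribute, symbol); the default is equality."""
    for suffix, symbol in OPERATORS.items():
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], symbol
    return name, "="


def parse_nvp_query(query_string):
    """Return a list of (attribute, symbol, value) terms from an NVP query."""
    terms = []
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name == "qt":  # query type hint, not a filter term
            continue
        attribute, symbol = split_name(name)
        terms.append((attribute, symbol, value))
    return terms


print(parse_nvp_query("qt=nvp&oclass=data&lastModified_gt=20100101T060000%2B00"))
```

One consequence of the suffix convention is that an attribute whose own name ends in one of the suffixes (for example a hypothetical ``size_lt``) cannot be distinguished from an attribute plus operator, so attribute names would need to avoid those endings.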
An alternative approach is to use enumerated triples, so for the same query as
above (with ``a`` referring to "attribute name", ``c`` to "comparison
operator", and ``v`` to "value")::
../object/?qt=nvp&a0=oclass&c0=eq&v0=data&
a1=lastModified&c1=gt&v1=20100101T060000+00
This approach has the advantage of allowing simple logical operators to be specified, e.g.::
&lop0_1=AND
which would indicate that the logical operator between the first and second
query elements is "AND". This gets messy quickly, though, once precedence
rules have to be considered.
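Decoding the enumerated form amounts to grouping parameters by their numeric index. A minimal sketch follows; the function name ``parse_triples`` is hypothetical, treating a missing ``c<i>`` as ``eq`` is an assumption that mirrors the default in the suffix table, and ``lop`` parameters are ignored.

```python
import re
from urllib.parse import parse_qsl


def parse_triples(query_string):
    """Group a<i>/c<i>/v<i> parameters into (attribute, comparison, value)."""
    slots = {}
    for name, value in parse_qsl(query_string):
        match = re.fullmatch(r"([acv])(\d+)", name)
        if match:
            kind, index = match.group(1), int(match.group(2))
            slots.setdefault(index, {})[kind] = value
    triples = []
    for index in sorted(slots):
        slot = slots[index]
        if not {"a", "v"} <= slot.keys():
            raise ValueError(f"incomplete triple at index {index}")
        # Assumption: a missing comparison operator means equality.
        triples.append((slot["a"], slot.get("c", "eq"), slot["v"]))
    return triples


query = (
    "qt=nvp&a0=oclass&c0=eq&v0=data"
    "&a1=lastModified&c1=gt&v1=20100101T060000%2B00"
)
print(parse_triples(query))
```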
Metacat Path Query
------------------
.. TODO::
- Rewrite this section to use the EarthGrid query syntax, which is more
readable and expresses the same concepts as the pathquery
Metacat is an XML database, and so must support mechanisms for querying not
just the attribute name, but also its location relative to other elements of
the document (similar to XPath). The path query also indicates the elements
that will be returned in the response. An `example path query`_::
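    <pathquery version="1.0">
      <meta_file_id>unspecified</meta_file_id>
      <querytitle>unspecified</querytitle>
      <returnfield>dataset/title</returnfield>
      <returnfield>keyword</returnfield>
      <returnfield>originator/individualName/surName</returnfield>
      <returndoctype>eml://ecoinformatics.org/eml-2.0.1</returndoctype>
      <returndoctype>eml://ecoinformatics.org/eml-2.0.0</returndoctype>
      <querygroup operator="UNION">
        <queryterm casesensitive="false" searchmode="contains">
          <value>Plant</value>
          <pathexpr>dataset/title</pathexpr>
        </queryterm>
        <queryterm casesensitive="false" searchmode="contains">
          <value>plant</value>
          <pathexpr>keyword</pathexpr>
        </queryterm>
      </querygroup>
    </pathquery>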
This query says, roughly: return the field values ``dataset/title``,
``keyword``, and ``originator/individualName/surName`` from documents where
the string "plant" appears in the ``keyword`` attribute or the string "Plant"
appears in the ``dataset/title`` attribute. The comparisons are performed
without consideration of case.
Since path queries are expressed as XML documents, they can get quite large
and so can be unwieldy when sending over an HTTP GET request. However, the
types of queries that can be created can be quite precise and expressive, so
these should be supported by the CN services, which shouldn't involve much
more than passing the query through to the Metacat instance operating as the
document store on the CN.
.. _example path query: https://code.ecoinformatics.org/code/metacat/trunk/docs/user/metacatquery.html
SOLR Query Syntax
-----------------
- http://wiki.apache.org/solr/SolrQuerySyntax
- http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
Query Attributes
----------------
- Best if query attributes were consistent across all the query types
- Distinction between searches against system metadata and science metadata
(though some overlap of attributes)
- Log searches can probably be pretty simple - just slicing by time
- MNs and CNs should support introspection that lists the supported query
types and the supported query attributes
Misc Notes
Google visualization api query language: http://code.google.com/apis/visualization/documentation/querylanguage.html
SRU/SRW and CQL: http://www.loc.gov/standards/sru/
OpenSearch: http://www.opensearch.org/Home
XPath: http://www.w3.org/TR/xpath and XQuery: http://www.w3.org/TR/xquery/
(appropriate for querying against a general XML model)
SPARQL (assuming you can express content in an RDF model):
http://www.w3.org/TR/rdf-sparql-query/
TAPIR:
http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2008-02-07.html
MetaCat (EarthGRID):
https://code.ecoinformatics.org/code/metacat/trunk/docs/user/metacatquery.htmlhhhhhUcommentq h}q¡(U xml:spaceq¢Upreserveq£h!]h ]h]h]h#]uh%Köh&hh]q¤h/Xk" Content here is preserved for notes until the search API is completed.
https://code.ecoinformatics.org/code/metacat/trunk/docs/user/metacatquery.htmlq¥…q¦}q§(hU hhžubaubeubahU Utransformerq¨NU
footnote_refsq©}qªUrefnamesq«}q¬Usymbol_footnotesq]q®Uautofootnote_refsq¯]q°Usymbol_footnote_refsq±]q²U citationsq³]q´h&hUcurrent_lineqµNUtransform_messagesq¶]q·cdocutils.nodes
system_message
q¸)q¹}qº(hU h}q»(h]UlevelKh!]h ]Usourcehh]h#]UlineKUtypeUINFOq¼uh]q½h[)q¾}q¿(hU h}qÀ(h]h]h ]h!]h#]uhh¹h]qÁh/X- Hyperlink target "index-0" is not referenced.qÂ…qÃ}qÄ(hU hh¾ubahh^ubahUsystem_messageqÅubaUreporterqÆNUid_startqÇKU
autofootnotesqÈ]qÉU
citation_refsqÊ}qËUindirect_targetsqÌ]qÍUsettingsqÎ(cdocutils.frontend
Values
qÏoqÐ}qÑ(Ufootnote_backlinksqÒKUrecord_dependenciesqÓNUrfc_base_urlqÔUhttps://tools.ietf.org/html/qÕU tracebackqÖˆUpep_referencesq×NUstrip_commentsqØNU
toc_backlinksqÙUentryqÚU
language_codeqÛUenqÜU datestampqÝNUreport_levelqÞKU_destinationqßNU
halt_levelqàKU
strip_classesqáNh,NUerror_encoding_error_handlerqâUbackslashreplaceqãUdebugqäNUembed_stylesheetqå‰Uoutput_encoding_error_handlerqæUstrictqçU
sectnum_xformqèKUdump_transformsqéNU
docinfo_xformqêKUwarning_streamqëNUpep_file_url_templateqìUpep-%04dqíUexit_status_levelqîKUconfigqïNUstrict_visitorqðNUcloak_email_addressesqñˆUtrim_footnote_reference_spaceqò‰UenvqóNUdump_pseudo_xmlqôNUexpose_internalsqõNUsectsubtitle_xformqö‰Usource_linkq÷NUrfc_referencesqøNUoutput_encodingqùUutf-8qúU
source_urlqûNUinput_encodingqüU utf-8-sigqýU_disable_configqþNU id_prefixqÿU U tab_widthr KUerror_encodingr UUTF-8r U_sourcer hUgettext_compactr ˆU generatorr NUdump_internalsr NUsmart_quotesr ‰Upep_base_urlr U https://www.python.org/dev/peps/r Usyntax_highlightr
Ulongr Uinput_encoding_error_handlerr hçUauto_id_prefixr
Uidr Udoctitle_xformr ‰Ustrip_elements_with_classesr NU
_config_filesr ]r Ufile_insertion_enabledr ˆUraw_enabledr KU
dump_settingsr NubUsymbol_footnote_startr K Uidsr }r (hhh9h;uUsubstitution_namesr }r hh&h}r (h]h!]h ]Usourcehh]h#]uU footnotesr ]r Urefidsr }r h9]r h4asub.