.. _UC02:

Use Case 02 - List PIDs By Search
----------------------------------

.. index:: Use Case 02, List, Search, Query

.. contents:: Contents
   :local:

Goal
~~~~

Get list of PIDs from metadata search (anonymous and authenticated).

Summary
~~~~~~~

A user performs a search against the DataONE system and receives a list of
object identifiers (PIDs) that match the search criteria.

The list of PIDs is filtered such that only objects for which the user has
read permission will be returned.

Content discovery in DataONE is achieved primarily through the service
interfaces provided by the Coordinating Nodes. Other systems may index content
available in DataONE (:doc:`UC34<34_uc>`), though the operation of those
systems is out of scope for DataONE operations except that the exposed APIs
enable such functionality.

Actors
~~~~~~

- Client performing search operation

- Coordinating Node

.. uml::

   @startuml images/02_uc.png
   actor User
   usecase "12. Authentication" as authen
   actor "Coordinating Node" as CN
   usecase "13. Authorization" as author
   usecase "02. Search Metadata" as SEARCH
   User -- SEARCH
   CN -- SEARCH
   SEARCH ..> author: <<includes>>
   SEARCH ..> authen: <<includes>>
   @enduml

**Figure 1.** Actors and dependencies for Use Case 02.

Preconditions
~~~~~~~~~~~~~

- Client has authenticated at the desired level (e.g. the client may not have
  authenticated, so access might be anonymous).

Triggers
~~~~~~~~

- A search is performed against the DataONE system.

Post Conditions
~~~~~~~~~~~~~~~

- The client has either a list of PIDs (:class:`Types.objectList`) that match
  the supplied query and that the client has permission to read, or an error
  condition.

- The log is updated with information about the request.

Implementation
~~~~~~~~~~~~~~

.. uml::

   @startuml images/02_seq.png
   participant "Client" as app_client << Application >>
   participant "Query API" as c_query << Coordinating Node >>
   participant "Authentication API" as c_authenticate << Coordinating Node >>
   participant "Read API" as c_crud << Coordinating Node >>
   app_client -> c_query: search(session, query)
   activate c_query
   c_query -> c_query: search -> objectList
   note right of c_query
     The query response is a list of PIDs. Each ID needs to be
     checked for read access by the authenticated user.
   end note
   loop for pid in objectList
     c_query -> c_authenticate: isAuthorized(session, pid, OP_GET)
     c_query <-- c_authenticate: T or F
   end
   c_query --> c_crud: log
   app_client <-- c_query: objectList
   deactivate c_query
   @enduml

**Figure 2.** Interaction diagram for Use Case 02. The process for determining
READ access is for illustration purposes only. Actual implementation may vary
(e.g. by augmenting the query used for searching).

Examples
~~~~~~~~

Search is implemented by the Coordinating Nodes and optionally by Member Nodes.

Two discovery endpoints are provided by Coordinating Nodes: query and search.
The search endpoint provides a response that is more constrained than the
query endpoint, with only an ObjectList structure being returned. It is
recommended that general searches be performed against the query endpoint.

The following examples assume a Coordinating Node base URL is set in the
${NODE} variable, for example:

.. code-block:: bash

   export NODE="https://cn.dataone.org/cn"

.. Note::

   For more example queries and a detailed description of the various fields,
   please visit :doc:`/design/SearchMetadata`.

.. Note::

   The actual response XML may be more compressed than the examples below
   show. For easier viewing, pipe the response through the xmlstarlet_ command
   line tool using the format ("fo") option. For example:

   .. code-block:: bash

      curl ${NODE}/v1/query | xml fo

Discover Available Query Engines
................................
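To discover the query engines (search indexes) supported on the node:

.. code-block:: xml

   $ curl ${NODE}/v1/query
   <ns2:queryEngineList xmlns:ns2="http://ns.dataone.org/service/types/v1.1">
     <queryEngine>solr</queryEngine>
     <queryEngine>logsolr</queryEngine>
   </ns2:queryEngineList>

The response shows two query engines available: ``solr`` and ``logsolr``. The
``solr`` query engine provides access to content (data, metadata, resource
maps) that has been indexed by the Coordinating Nodes. The ``logsolr``
endpoint provides access to log records that have been aggregated by the
Coordinating Nodes.

List Search Fields Offered
..........................

To determine the search fields provided by a query engine, append the value of
a ``queryEngine`` element to the URL:

.. code-block:: xml

   $ curl ${NODE}/v1/query/solr
   <ns2:queryEngineDescription xmlns:ns2="http://ns.dataone.org/service/types/v1.1">
     <queryEngineVersion>3.4.0.2011.09.20.17.19.53</queryEngineVersion>
     <querySchemaVersion>1.1</querySchemaVersion>
     <name>solr</name>
     <additionalInfo>https://releases.dataone.org/online/api-documentation-v1.2.0/</additionalInfo>
     <queryField>
       <name>abstract</name>
       <description>The full text of the abstract as provided in the science metadata document.</description>
       <type>text</type>
       <searchable>true</searchable>
       <returnable>true</returnable>
       <sortable>true</sortable>
       <multivalued>false</multivalued>
     </queryField>
     <queryField>
       <name>attribute</name>
       <description>Multi-valued field containing the text from attributeName, attributeLabel, attributeDescription, attributeUnit fields into a single searchable text field.</description>
       <type>text</type>
       <searchable>true</searchable>
       <returnable>true</returnable>
       <sortable>true</sortable>
       <multivalued>true</multivalued>
     </queryField>
     <queryField>
       <name>attributeDescription</name>
       <description>Multi-valued field containing the attribute descriptive text.</description>
       <type>text</type>
       <searchable>true</searchable>
       <returnable>true</returnable>
       <sortable>true</sortable>
       <multivalued>true</multivalued>
     </queryField>
     <queryField>
       <name>attributeLabel</name>
       <description>Multi-valued field containing secondary attribute name information.</description>
       <type>string</type>
       <searchable>true</searchable>
       <returnable>true</returnable>
       <sortable>true</sortable>
       <multivalued>true</multivalued>
     </queryField>
     ...
   </ns2:queryEngineDescription>

Full Text Search
................

The solr endpoint supports standard solr_ query syntax and constructs. To
search all text for the string "water", the query "text:water" could be used.
Expressed as a command line request:

.. code-block:: xml

   $ curl "${NODE}/v1/query/solr/?q=text:water"
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">5</int>
       <lst name="params">
         <str name="q">text:water</str>
         <str name="rows">1</str>
       </lst>
     </lst>
     <result name="response" numFound="139455" start="0">
     ...

The ``numFound`` attribute of the ``result`` element indicates there were
139455 matches. The response is the standard solr XML response (JSON may be
returned by adding ``&wt=json`` to the URL), with ``<doc>`` elements holding
the actual response records.

Limiting Returned Fields
........................

The default solr response returns all fields of the doc records, which can be
quite verbose.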
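To limit the response, the standard solr syntax is again used, with the ``fl``
parameter. For example, to return only the record identifier (PID) and the
date the system metadata was last modified:

.. code-block:: bash

   $ curl "${NODE}/v1/query/solr/?q=text:water&fl=id,dateModified"

.. code-block:: xml

   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">4</int>
       <lst name="params">
         <str name="fl">id,dateModified</str>
         <str name="q">text:water</str>
         <str name="rows">5</str>
       </lst>
     </lst>
     <result name="response" numFound="..." start="0">
       <doc>
         <date name="dateModified">2015-03-20T23:18:10.507Z</date>
         <str name="id">https://pasta.lternet.edu/package/metadata/eml/knb-lter-gce/249/34</str>
       </doc>
       <doc>
         <date name="dateModified">2012-06-26T13:50:33.75Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.23</str>
       </doc>
       <doc>
         <date name="dateModified">2012-06-26T13:51:00.556Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.16</str>
       </doc>
       <doc>
         <date name="dateModified">2012-06-26T13:50:21.131Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.19</str>
       </doc>
       ...

Paging Response Records
.......................

The solr ``rows`` parameter limits the number of records that are returned in
a response, and the ``start`` parameter indicates the 0-based offset of the
first record from the start of the set of matching results. For example, the
second page of records with five results per page would use ``start=5`` and
``rows=5``:

.. code-block:: bash

   $ curl "${NODE}/v1/query/solr/?q=text:water&fl=id,dateModified&start=5&rows=5"

.. code-block:: xml

   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">3</int>
       <lst name="params">
         <str name="fl">id,dateModified</str>
         <str name="start">5</str>
         <str name="q">text:water</str>
         <str name="rows">5</str>
       </lst>
     </lst>
     <result name="response" numFound="..." start="5">
       <doc>
         <date name="dateModified">2012-06-26T13:51:00.556Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.16</str>
       </doc>
       <doc>
         <date name="dateModified">2012-06-26T13:50:21.131Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.19</str>
       </doc>
       <doc>
         <date name="dateModified">2012-06-26T13:49:54.779Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.21</str>
       </doc>
       <doc>
         <date name="dateModified">2012-06-26T13:49:54.409Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.22</str>
       </doc>
       <doc>
         <date name="dateModified">2012-06-26T17:09:59.721Z</date>
         <str name="id">doi:10.6073/AA/knb-lter-gce.249.17</str>
       </doc>
     </result>
   </response>

.. _xmlstarlet: http://xmlstar.sourceforge.net/

.. _solr: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html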
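
The relationship between a page number and the ``start`` and ``rows``
parameters described under Paging Response Records can be sketched as a small
shell helper. This is only an illustration: ``d1_page_url`` is a hypothetical
function name and is not part of any DataONE tooling, the default ``NODE``
value is the example base URL used earlier, and the query string is assumed to
be URL-safe already (no encoding is performed).

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (not part of DataONE): build a paged solr query URL.
   # Falls back to the example Coordinating Node base URL if NODE is unset.
   NODE="${NODE:-https://cn.dataone.org/cn}"

   # d1_page_url QUERY PAGE ROWS
   #   QUERY  solr query string, assumed URL-safe (e.g. text:water)
   #   PAGE   1-based page number
   #   ROWS   number of records per page
   d1_page_url() {
     local query="$1" page="$2" rows="$3"
     # start is the 0-based offset of the first record on the requested page
     local start=$(( (page - 1) * rows ))
     printf '%s/v1/query/solr/?q=%s&fl=id,dateModified&start=%d&rows=%d\n' \
       "${NODE}" "${query}" "${start}" "${rows}"
   }

   # Second page with five results per page gives start=5 and rows=5
   d1_page_url "text:water" 2 5

The resulting URL can be passed to ``curl`` in the same way as the examples
above, for instance ``curl "$(d1_page_url text:water 2 5)" | xml fo``.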