Building EcoGrid Query System
Version 1.0

Bing Zhu
San Diego Supercomputer Center

Jing Tao
National Center for Ecological Analysis and Synthesis

June 11, 2003

Table of Contents
1. EcoGrid Data Model Analysis
2. EcoGrid Query Tokenizer
3. Query Engines

1. EcoGrid Data Model Analysis

Within the current EcoGrid, two major data management systems are in use: Metacat [1], developed at the National Center for Ecological Analysis and Synthesis, and SRB [2], developed at the San Diego Supercomputer Center. SRB has been deployed in scientific research projects such as the Long Term Ecological Research project, which uses SRB to store image files and user-defined metadata [3].

SRB stores data as files in a virtual hierarchy of collections and sub-collections. Each file has its own system metadata, such as owner, creation date, and file size. The SRB system can therefore be viewed as a virtual XML document that has collections and files as sub-nodes. The system metadata serve as attributes of either a collection or a file. An extra attribute, object type, is needed in SRB to indicate whether a node is a collection, a file, or a metadata item. SRB also allows users to add their own metadata for each file. Each metadata item becomes a sub-node under a file or a collection.

Metacat is a network-enabled metadata and data storage framework. It allows users to store, query, and retrieve XML documents with arbitrary schemas. In Metacat, an XML file (metadata) is treated as a document and is stored in a relational database system. The full set of documents can be thought of as one collection. When a user searches Metacat, it finds all documents in the collection that match the search criteria and returns their document IDs to the user.
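To make the virtual-XML view of SRB concrete, the sketch below builds a small tree in which collections and files are nodes, system metadata become attributes, and user-defined metadata become sub-nodes. The element and attribute names (`collection`, `file`, `metadata`, `objectType`, and so on), the sample values, and the helper functions are illustrative assumptions, not SRB's actual schema or API.

```python
import xml.etree.ElementTree as ET

# All element and attribute names here are hypothetical; they only mirror
# the "virtual XML document" view of SRB described in the text.

def add_collection(parent, name, owner):
    """Add a sub-collection node; system metadata are stored as attributes."""
    return ET.SubElement(parent, "collection", {
        "name": name, "owner": owner, "objectType": "collection"})

def add_file(parent, name, owner, create_date, size):
    """Add a file node with its system metadata as attributes."""
    return ET.SubElement(parent, "file", {
        "name": name, "owner": owner, "createDate": create_date,
        "size": str(size), "objectType": "file"})

def add_metadata(parent, attribute, value, unit=None):
    """User-defined metadata (attribute-value-[unit]) become sub-nodes."""
    attrs = {"objectType": "metadata", "attribute": attribute, "value": value}
    if unit is not None:
        attrs["unit"] = unit
    return ET.SubElement(parent, "metadata", attrs)

# Build a tiny virtual document with made-up sample values.
root = ET.Element("collection", {
    "name": "home", "owner": "admin", "objectType": "collection"})
lter = add_collection(root, "lter", "alice")
img = add_file(lter, "plot01.jpg", "alice", "2003-06-11", 204800)
add_metadata(img, "elevation", "1200", unit="m")

# Search the virtual document: files that carry an "elevation" metadata node.
hits = [f for f in root.iter("file")
        if f.find("metadata[@attribute='elevation']") is not None]
print([f.get("name") for f in hits])
```

Because every node carries an object type, one path-style query can address collections, files, and metadata uniformly, which is what makes an XML-oriented query language a plausible fit for SRB as well as Metacat.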
Based on the above analysis of both SRB and Metacat, the EcoGrid data model is suitable for implementing either the XML-based query suggested by Matt Jones or XQuery, which is proposed and standardized by the W3C [4].

Metacat already has an implementation of the XML-based query for EML data models. The following is an example composed by Matt Jones.

[XML query document titled "Soils metadata query" with search values %soil%, %soil%, %Jones%, and %Vieglais%]

Metacat has some Java code that can be revised to translate the above XML-Query document into a SQL query, which is then submitted to an RDBMS such as Oracle. Since SRB's MCAT also runs on top of an RDBMS, SRB should be able to adopt this implementation easily. This approach provides a foundation for our EcoGrid search system, as described in detail later.

This raises an immediate question: what is the purpose of using XQuery? One apparent reason is that it is not easy for users to express a query as an XML document like the one above. XQuery, on the other hand, is a de facto standard query language that resembles SQL, so software developers and any users with SQL knowledge can easily build their searches with it.

Because EcoGrid deals with heterogeneous data sources, our software stack needs a common layer for building a query system. This layer should be designed and implemented so that it integrates easily with either of the data management systems. Using XQuery as the common search language in SEEK EcoGrid serves this purpose: XQuery acts as a language-neutral implementation layer above heterogeneous systems such as Metacat and SRB. The EcoGrid Tokenizer is designed to translate XQuery input into an XML-Query document that can be consumed by both the Metacat and SRB search engines. Figure 1 illustrates the overall search flow within the current EcoGrid.
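The search flow described above can be sketched as a single tokenizing step followed by a fan-out to each back-end engine. This is only an illustrative sketch: `run_search`, the stub tokenizer, and the two stub engines are hypothetical names with canned results, standing in for the real Tokenizer and the Metacat and SRB query engines.

```python
from typing import Callable, Dict, List
from xml.sax.saxutils import escape

# Hypothetical types: a tokenizer turns XQuery text into an XML-Query
# document (a plain string here); an engine turns that document into results.
Tokenizer = Callable[[str], str]
Engine = Callable[[str], List[str]]

def run_search(xquery: str, tokenize: Tokenizer,
               engines: Dict[str, Engine]) -> Dict[str, List[str]]:
    """Tokenize once, then send the same XML-Query document to every engine."""
    xml_query = tokenize(xquery)
    return {name: engine(xml_query) for name, engine in engines.items()}

# Stubs standing in for the real components (assumptions for illustration).
def stub_tokenize(xquery: str) -> str:
    # A real tokenizer would build Term/Relation objects; this only wraps text.
    return "<query>" + escape(xquery.strip()) + "</query>"

def stub_metacat_engine(xml_query: str) -> List[str]:
    return ["doc.1.1"]  # canned value: Metacat returns document identifiers

def stub_srb_engine(xml_query: str) -> List[str]:
    return ["/home/lter/plot01.jpg"]  # canned value: collection path and file name

results = run_search(
    "for $e in input() return $e",
    stub_tokenize,
    {"metacat": stub_metacat_engine, "srb": stub_srb_engine},
)
print(results)
```

Tokenizing once and reusing the resulting document keeps the front end independent of how many data systems sit behind it.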
2. EcoGrid Query Tokenizer

The EcoGrid Query Tokenizer is one of the two key components of our overall implementation; the other is the query engine. The Tokenizer translates XQueries into XML query documents such as the example shown in EcoGrid Data Model Analysis. When the Tokenizer receives an XQuery from a user or an application, it decomposes the input and creates one or more objects, each of which holds a list of items such as action mode, value, and parameters. The following is an example of an XQuery search on EcoGrid for Metacat.

    declare namespace eml="eml://ecoinformatics.org/eml-2.0.0"
    for $e in input()
    where $e//dataset/creator/surName="Johnson" and $e//dataset/pubDate="09-23-2002"
    return $e

When the EcoGrid Query Tokenizer receives the above query, it decomposes the query into a top-level query object that has Term child objects and an attribute, Relation, that indicates the database operation. In Figure 2, Term1 and Term2 have an Intersect (AND) relationship; a Union relationship would correspond to OR.

For more complicated cases, an object can be embedded in another object, so the query data structure can represent subqueries.

Figure 3 Objects Embedded in Another Object

After all objects are constructed, the Tokenizer serializes them into an XML Query document, which is then sent to both the Metacat and SRB search engines.

In building objects for the underlying systems, the Tokenizer needs to query the EcoGrid registry to dynamically find the search nodes named in the XML-Query document or XQuery statement. For example, it has to find the search nodes in the following XQuery example.
    declare namespace xlink = "http://www.w3.org/1999/xlink"
    {
      for $hr in input()//@xlink:href
      return { $hr }
    }

Once the objects are constructed, they are serialized into an XML-Query document, which is then submitted to both the Metacat and SRB search engines.

3. Query Engines

The Metacat team and the SRB team are each responsible for building a query engine that translates the XML-Query document produced by the Tokenizer into native queries for their system. The search results are simply a list of document names from Metacat and a list of file names with collection paths from SRB.

Searches in SRB are performed mainly on system metadata and user-defined metadata. Since SRB allows users to attach any metadata to a file in the form attribute-value-[unit], the SRB query engine will translate the SQL statements into a local search format for SRB APIs (similar to Smeta), which an SRB server in turn translates into a SQL statement and submits to the RDBMS in which the SRB MCAT tables are installed.

Metacat already has an implementation for a similar XML-Query document, which needs only slight revision. Figure 5 shows an example of the input and output of the Metacat query engine in processing an XQuery statement.

References

[1] Metacat: the flexible XML database, http://knb.ecoinformatics.org/software/metacat.
[2] Storage Resource Broker, http://www.sdsc.edu/DICE/SRB.
[3] San Diego Supercomputer Center Releases New Version of Widely Used Data Management Software, http://www.sdsc.edu/Press/03/031003_srb3.html.
[4] XML Query, http://www.w3.org/XML/Query.

Figure 1. Query procedure in EcoGrid
[Diagram labels: Query Front-end; XQuery; Tokenizer; EcoGrid Registry; Query for Metacat; Query for SRB; Metacat Query engine; SRB Query engine; Metacat data source; SRB Data source]

Figure 2 Construct an Object in Tokenizer
[Diagram labels: Object1, Relation: Intersect; Term1, Param: dataset/creator/surname, Value: Johnson, Mode: equals; Term2, Param: dataset/pubDate, Value: 09-23-2002, Mode: equals]

[Further diagram labels: Object1; Object2; Object3, Relation: Union]

Figure 5 Tokenizer and Metacat Search Engine
[Diagram labels: Client; XQuery (the surName and pubDate query from Section 2); Tokenizer; XML-Query Document; Metacat Native Query Engine (Parsing XML and Constructing SQL); SQL; Metacat Data Source; Result Set]