<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>16. Enabling Web Searches: Sitemaps — Metacat 2.19.0 documentation</title> <link rel="stylesheet" href="_static/bootstrap.min.css" type="text/css" /> <link rel="stylesheet" href="_static/font-awesome/css/font-awesome.min.css" type="text/css" /> <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="_static/metacatui.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: './', VERSION: '2.19.0', COLLAPSE_MODINDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="_static/jquery.js"></script> <script type="text/javascript" src="_static/underscore.js"></script> <script type="text/javascript" src="_static/doctools.js"></script> <link rel="index" title="Index" href="genindex.html" /> <link rel="search" title="Search" href="search.html" /> <link rel="top" title="Metacat 2.19.0 documentation" href="index.html" /> <link rel="prev" title="15. Event Logging" href="event-logging.html" /> <link rel="next" title="17. 
Appendix: Metacat Properties" href="metacat-properties.html" /> </head> <body> <div id="metacatDocs"> <div class="banner"> <a href="index.html"><img class="logo" src="_static/metacat-logo-white.png" /></a> <a href="index.html"><h1 class="title">Metacat: Metadata and Data Management Server</h1></a> <img class="logo-right" src="_static/nceas-logo-white.png" /> </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right"> <span id="searchbox" style="display: none;"> <form class="search" action="search.html" method="get"> <input type="text" name="q" size="18" /> <input type="submit" value="Go" class="icon-search"/> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </span> </li> <script type="text/javascript">$('#searchbox').show(0);</script> <li class="right"> <a href="genindex.html" title="General Index" accesskey="I">index</a> </li> <li class="right"> <a href="metacat-properties.html" title="17. Appendix: Metacat Properties" accesskey="N">next</a> </li> <li class="right"> <a href="event-logging.html" title="15. Event Logging" accesskey="P">previous</a> </li> <li class="breadcrumb first"><a href="index.html">Metacat 2.19.0 documentation</a> »</li> </ul> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="enabling-web-searches-sitemaps"> <h1>16. Enabling Web Searches: Sitemaps<a class="headerlink" href="#enabling-web-searches-sitemaps" title="Permalink to this headline">¶</a></h1> <p>Sitemaps are XML files that tell search engines - such as Google, which is discussed in this section - which URLs on your websites are available for crawling. Currently, the only way for a search engine to crawl and index Metacat so that individual metadata entries are available via Web searches is with a sitemap. 
Metacat automatically creates sitemaps for all documents in the repository that meet these criteria:</p> <ul class="simple"> <li>Is publicly readable</li> <li>Is metadata</li> <li>Is the newest version in a version chain</li> <li>Is not archived</li> </ul> <p>However, you must register the sitemaps with the search engine before they take effect.</p> <div class="section" id="configuration"> <h2>16.1. Configuration<a class="headerlink" href="#configuration" title="Permalink to this headline">¶</a></h2> <p>Metacat’s sitemap functionality is controlled by four properties in metacat.properties.</p> <ul class="simple"> <li><code class="docutils literal"><span class="pre">sitemap.enabled</span></code>: Controls whether sitemaps are automatically generated while Metacat is running. Defaults to true.</li> <li><code class="docutils literal"><span class="pre">sitemap.interval</span></code>: Controls the interval, in milliseconds, between rebuilds of the sitemap index and sitemap files.</li> <li><code class="docutils literal"><span class="pre">sitemap.location.base</span></code>: Controls the URL pattern used in the <code class="docutils literal"><span class="pre">sitemap_index.xml</span></code> file. You can use either a full URL (e.g., <code class="docutils literal"><span class="pre">https://example.com/some_path</span></code>) or a URL relative to your server (e.g., <code class="docutils literal"><span class="pre">/some_path</span></code>). This is different from the <code class="docutils literal"><span class="pre">sitemap.entry.base</span></code> property (described directly below).</li> <li><code class="docutils literal"><span class="pre">sitemap.entry.base</span></code>: Controls the URL pattern used for the entries in the individual sitemap files (e.g., <code class="docutils literal"><span class="pre">sitemap1.xml</span></code>). 
You can use either a full URL (e.g., <code class="docutils literal"><span class="pre">https://example.com/some_path</span></code>) or a URL relative to your server (e.g., <code class="docutils literal"><span class="pre">/some_path</span></code>).</li> </ul> </div> <div class="section" id="creating-a-sitemap"> <h2>16.2. Creating a Sitemap<a class="headerlink" href="#creating-a-sitemap" title="Permalink to this headline">¶</a></h2> <p>Metacat automatically generates a sitemap file for all public documents in the repository on a daily basis. The sitemap file(s) must be available via the Web on your server, and must be registered with Google before they take effect. For information on the sitemap protocol, refer to the Google page on using the sitemap protocol. You can view Metacat’s sitemap files at:</p> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">&lt;</span><span class="n">your_web_context</span><span class="o">&gt;/</span><span class="n">sitemaps</span> </pre></div> </div> <p>The directory contains an index file:</p> <blockquote> <div>sitemap_index.xml</div></blockquote> <p>and one or more sitemap XML files named:</p> <blockquote> <div>sitemap&lt;X&gt;.xml</div></blockquote> <p>where <code class="docutils literal"><span class="pre">&lt;X&gt;</span></code> is a number (e.g., 1 or 2) that increments with each sitemap file. 
Because Metacat limits the number of sitemap entries in each sitemap file to 50,000, the servlet creates an additional sitemap file for each group of 50,000 entries.</p> <p>Verify that your sitemap files are available to the Web by browsing to:</p> <div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">&lt;</span><span class="n">your_web_context</span><span class="o">&gt;/</span><span class="n">sitemaps</span><span class="o">/</span><span class="n">sitemap</span><span class="o">&lt;</span><span class="n">X</span><span class="o">&gt;.</span><span class="n">xml</span> <span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span><span class="p">,</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">example</span><span class="o">.</span><span class="n">org</span><span class="o">/</span><span class="n">metacat</span><span class="o">/</span><span class="n">sitemaps</span><span class="o">/</span><span class="n">sitemap1</span><span class="o">.</span><span class="n">xml</span><span class="p">)</span> </pre></div> </div> </div> <div class="section" id="serving-your-sitemaps"> <h2>16.3. Serving Your Sitemaps<a class="headerlink" href="#serving-your-sitemaps" title="Permalink to this headline">¶</a></h2> <p>In most scenarios, you’ll want to take extra steps to make sure your sitemaps are served correctly so they’re available and indexable by Google. 
Because Metacat places sitemap XML files in <code class="docutils literal"><span class="pre">&lt;your_web_context&gt;/sitemaps</span></code>, you’ll need to configure your web server to serve these files.</p> <p>As an example, the following sample configuration for the Apache 2 web server uses <cite>mod_rewrite</cite> to redirect clients that request your sitemaps at the top level of your website to their location under the Metacat deployment context:</p> <p>(Note: Ensure <cite>mod_rewrite</cite> is enabled.)</p> <div class="highlight-text"><div class="highlight"><pre><span></span>RewriteRule ^/(sitemap.+) /metacat/sitemaps/$1 [R=303]
</pre></div> </div> <p>You should also ensure your <code class="docutils literal"><span class="pre">robots.txt</span></code> file correctly points to the location of the <code class="docutils literal"><span class="pre">sitemap_index.xml</span></code> file. For example, for example.org:</p> <p><code class="docutils literal"><span class="pre">robots.txt</span></code>:</p> <div class="highlight-text"><div class="highlight"><pre><span></span>User-agent: *
Allow: /
sitemap: https://example.org/sitemap_index.xml
</pre></div> </div> </div> <div class="section" id="registering-a-sitemap"> <h2>16.4. Registering a Sitemap<a class="headerlink" href="#registering-a-sitemap" title="Permalink to this headline">¶</a></h2> <p>Before Google begins indexing the public documents in your Metacat repository, you must register the sitemaps. To register your sitemaps and ensure that they are up to date:</p> <ol class="arabic simple"> <li>Register for a Google Webmaster Tools account, and add your Metacat site to the Dashboard.</li> <li>From your Google Webmaster Tools site account, register your sitemaps. See the Google help site for more information about how to register sitemaps. 
Note: Register the full URL of each sitemap file, including the <a class="reference external" href="http://">http://</a> (or <a class="reference external" href="https://">https://</a>) prefix.</li> </ol> <p>Once the sitemaps are registered, Google will begin to index the public documents in your Metacat repository.</p> <p>NOTE: As you add more publicly accessible data to Metacat, you will need to periodically revisit the Google Webmaster Tools utility to refresh your sitemap registration.</p> </div> </div> </div> </div> </div> <div class="clearer"></div> </div> <div class="footer"> <div class="small-print"> © Copyright 2012, Regents of the University of California. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.6.7. </div> </div> </div> </body> </html>