Apache Solr UIMA Metadata Extraction Library
Introduction
------------
This module can be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizers/filters
configured inside schema.xml for use during the analysis phase.
The purpose of UIMAUpdateRequestProcessor is to add automatically generated fields to the Solr index on the fly.
Such fields could be language, concepts, keywords, sentences, named entities, etc.
UIMA-based tokenizers/filters can be used either in plain Lucene or as index/query analyzers defined
inside the schema.xml of a Solr core, creating or filtering tokens based on specific UIMA annotations.
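As a rough sketch of the analyzer usage, a field type in schema.xml could tokenize text on UIMA sentence annotations.
The field type name (uima_sentences) and the descriptor path below are assumptions chosen for illustration; the
descriptor must be an AE descriptor available on the classpath, and the tokenizer factory attributes should be
checked against the solr-uima version in use:

  <!-- hypothetical field type: one token per UIMA sentence annotation -->
  <fieldType name="uima_sentences" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- descriptorPath: classpath location of the AE descriptor (assumed path) -->
      <!-- tokenType: the UIMA annotation type that is turned into tokens -->
      <tokenizer class="solr.UIMAAnnotationsTokenizerFactory"
                 descriptorPath="/uima/AggregateSentenceAE.xml"
                 tokenType="org.apache.uima.SentenceAnnotation"/>
    </analyzer>
  </fieldType>
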
Getting Started
---------------
To start using the Solr UIMA Metadata Extraction Library, go through the following configuration steps:
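1. copy the generated solr-uima jar and its libs (under contrib/uima/lib) into a Solr libraries directory,
   or set <lib/> tags in solrconfig.xml to point to those jar files, for example
   (the paths are relative to the core's instance directory and depend on your installation layout):

   <lib dir="../../contrib/uima/lib" />
   <lib dir="../../contrib/uima/lucene-libs" />
   <lib dir="../../dist/" regex="solr-uima-\d.*\.jar" />
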
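2. modify your schema.xml, adding the fields that should hold the metadata and specifying proper values for the type, indexed, stored and multiValued options;
   for example, to match the field mappings used in step 3 you could specify the following
   (the field types are illustrative and must exist in your schema):

  <field name="language" type="string" indexed="true" stored="true" required="false"/>
  <field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
  <field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false"/>
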
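3. modify your solrconfig.xml adding the following snippet:

  <updateRequestProcessorChain name="uima">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        <lst name="runtimeParameters">
          <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
        </lst>
        <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
        <!-- Set to true to continue indexing even if text processing fails. Default is false. -->
        <bool name="ignoreErrors">true</bool>
        <lst name="analyzeFields">
          <bool name="merge">false</bool>
          <arr name="fields">
            <str>text</str>
          </arr>
        </lst>
        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
            <lst name="mapping">
              <str name="feature">text</str>
              <str name="field">concept</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
            <lst name="mapping">
              <str name="feature">language</str>
              <str name="field">language</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.SentenceAnnotation</str>
            <lst name="mapping">
              <str name="feature">coveredText</str>
              <str name="field">sentence</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
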
   where VALID_ALCHEMYAPI_KEY is your AlchemyAPI Access Key. You need to register for an AlchemyAPI Access
   Key to use the AlchemyAPI services: http://www.alchemyapi.com/api/register.html
   where VALID_OPENCALAIS_KEY is your Calais Service Key. You need to register for a Calais Service
   Key to use the Calais services: http://www.opencalais.com/apikey

   analysisEngine must point to an AE descriptor located at the specified path in the classpath
   analyzeFields must list the input fields that need to be analyzed by UIMA;
   if merge=true, their content is merged and analyzed only once
   the field mappings describe which features of which types should go into which field
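4. in your solrconfig.xml replace the existing default update request handler (<requestHandler name="/update" ...>),
   or create a new update request handler, so that it uses the uima chain defined in step 3
   (the handler class and the name of the chain parameter may differ between Solr versions):

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">uima</str>
    </lst>
  </requestHandler>
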
Once you're done with the configuration, you can index documents, which will be automatically enriched with the specified fields.
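
As an illustration, a document could then be posted to the update handler in the standard Solr XML update format.
This is only a sketch: it assumes the schema's uniqueKey field is named id and that text is the field listed under
analyzeFields; the document id and the text itself are made up for the example:

  <add>
    <doc>
      <!-- id: assumed uniqueKey field of the schema -->
      <field name="id">doc-1</field>
      <!-- text: the input field that UIMA analyzes -->
      <field name="text">Apache Solr is an open source search platform. It is built on Apache Lucene.</field>
    </doc>
  </add>

With the configuration above, the mapped fields (language, concept, sentence) are filled in by the update chain,
so the document itself does not need to supply them.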