Introduction

This tutorial shows how to use Apache Jackrabbit to enable full-text search across the supported content types.

We look at the following topics:

  • Jackrabbit in the brix-demo sample project
  • browsing the Jackrabbit repository
  • some simple search examples

Jackrabbit in brix-demo

Brix is configured out of the box to index and search content using Apache Jackrabbit, and the brix-demo project that ships with Brix demonstrates this. As the project follows Maven 2 conventions, the file used to configure the repository, ‘repository.xml’, can be found in ‘src/main/resources/brix/demo’.

The repository configuration in this file is outside the scope of this tutorial; there is nothing non-standard about this instance of ‘repository.xml’. If you would like to learn more about it, see the Apache Jackrabbit documentation.

The part of this file that we are concerned with is:

        <!--
            Search index and the file system it uses.
            class: FQN of class implementing the QueryHandler interface
        -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,
              org.apache.jackrabbit.extractor.MsExcelTextExtractor,
              org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
              org.apache.jackrabbit.extractor.PdfTextExtractor,
              org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
              org.apache.jackrabbit.extractor.RTFTextExtractor,
              brix.jcr.jackrabbit.HtmlTextExtractor,
              org.apache.jackrabbit.extractor.XMLTextExtractor"/>
            <param name="extractorPoolSize" value="2"/>
            <param name="supportHighlighting" value="true"/>
        </SearchIndex>

This allows us to search a given workspace across content of the following types:

  • MS Word
  • MS Excel
  • MS PowerPoint
  • PDF
  • OpenOffice
  • RTF
  • HTML
  • XML

This list can be extended by creating new custom extractors.
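As a rough illustration of what a custom extractor might look like, the sketch below handles CSV files by replacing the separators with spaces so that each cell is indexed as its own term. The ‘TextExtractor’ interface declared here is a local stand-in, modelled on the Jackrabbit 1.x interface of the same name, so that the example compiles without Jackrabbit on the classpath. The ‘CsvTextExtractor’ class and its handling of ‘text/csv’ are hypothetical and not part of Brix or Jackrabbit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Local stand-in so the sketch compiles on its own. In a real project you
// would implement org.apache.jackrabbit.extractor.TextExtractor instead.
interface TextExtractor {
    String[] getContentTypes();

    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

// Hypothetical extractor: turns CSV content into plain, space-separated text.
class CsvTextExtractor implements TextExtractor {

    // MIME types this extractor claims to handle.
    public String[] getContentTypes() {
        return new String[] { "text/csv" };
    }

    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        // Fall back to UTF-8 when the repository does not supply an encoding.
        Charset charset = (encoding != null) ? Charset.forName(encoding) : StandardCharsets.UTF_8;
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(stream, charset)) {
            char[] buffer = new char[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        }
        // Replace the usual separators with spaces so cells become separate terms.
        return new StringReader(text.toString().replace(',', ' ').replace(';', ' '));
    }
}
```

An extractor like this would be made available to the index by adding its fully qualified class name to the comma-separated ‘textFilterClasses’ value shown above.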

As documents are added to the repository or updated, they are indexed (or re-indexed) after being filtered through the extractors made available to the ‘SearchIndex’ element.

Once artifacts have been indexed, we can either browse them or search for them.

Browsing the Jackrabbit Repository

Browsing the repository tree could be considered a simple form of searching, for example when you need to list all documents located at a given path. To provide an example of browsing the repository, we are going to update the site structure of the brix-demo webapp.

Create a folder called ‘documents’ in the admin panel at the same level as the ‘images’ folder. Let's imagine that your web application needs to store a series of documents under different categories in this new folder. Create two new folders inside it, one called ‘heroes’ and one called ‘villains’. The result should look as below.

Now categorise your documents by placing them in the appropriate folder.

In order to implement a simple repository browsing example you can create a new tile. A good place to start is by looking at the Guest Book Tile Tutorial.

Our tile will be very similar to the guest book tile: it will have a form and a submit button, but in place of the two text inputs you might want to use a drop-down component to enable selection of either ‘heroes’ or ‘villains’ for browsing.

In an ‘onBrowse’ method, called when the submit button is pressed, we need to get a handle on a JCR session.

import javax.jcr.Node;
import javax.jcr.NodeIterator;

import brix.jcr.api.JcrNode;
import brix.jcr.api.JcrSession;
import brix.plugin.site.SitePlugin;
import brix.workspace.Workspace;
import brix.workspace.WorkspaceManager;

...
...

JcrNode tile = (JcrNode) getDefaultModelObject();
JcrSession session = tile.getSession();
// Now you can work with the session object to browse the tree at any given path.
try
{
  // Assumes getSiteRootPath() returns an absolute path, so the folder is resolved
  // through the session; Node.getNode() only accepts relative paths.
  String folderPath = SitePlugin.get().getSiteRootPath() + "/documents/" + yourDropDown.getModelObject();
  JcrNode myFolderNode = (JcrNode) session.getItem(folderPath);
  NodeIterator children = myFolderNode.getNodes();
  while (children.hasNext())
  {
    Node child = children.nextNode();
    // Do whatever you want with the results here
    logger.info("node name : " + child.getName() + " , node path : " + child.getPath());
  }
}
catch (Exception ex)
{
  // Pass the exception itself so that the stack trace is logged as well
  logger.error("Exception occurred : " + ex.getMessage(), ex);
}
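The browsing code builds its repository path directly from the value selected in the drop-down. As a minimal sketch, assuming you want to restrict browsing to the two folders created earlier, a small helper could validate the category before the path is assembled. ‘DocumentPaths’ and ‘folderPath’ are hypothetical names and not part of Brix; the ‘/site’ root used in the usage example is a placeholder for whatever ‘SitePlugin.get().getSiteRootPath()’ returns.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Brix: builds the path of a category folder
// under 'documents' and rejects anything that is not a known category.
final class DocumentPaths {

    // The two category folders created in the admin panel.
    static final List<String> CATEGORIES = Arrays.asList("heroes", "villains");

    private DocumentPaths() {
    }

    static String folderPath(String siteRootPath, String category) {
        if (category == null || !CATEGORIES.contains(category)) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        // Avoid a double slash if the site root already ends with one.
        String root = siteRootPath.endsWith("/")
                ? siteRootPath.substring(0, siteRootPath.length() - 1)
                : siteRootPath;
        return root + "/documents/" + category;
    }

    public static void main(String[] args) {
        // "/site" is a placeholder site root used only for illustration.
        System.out.println(folderPath("/site", "heroes"));
    }
}
```

In the tile, the result of ‘folderPath’ would take the place of the concatenated path string, so that an unexpected drop-down value fails early with a clear message instead of producing a lookup on an arbitrary path.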

Plug the tile into a page, make your selection in the drop-down, press submit, and you should see some debug output in the terminal.
