Full-Text Search with XAP 12.1

Many applications today need to run text searches against data stored in a database. Well-known tools for this include ElasticSearch, Solr, and Hibernate Search.

However, these tools require additional integration and operational effort, or they depend on a particular technology. If you need to run full-text search queries on data stored in XAP without any additional integration, you can use the new XAP Full-Text Search feature, released in version 12.1.

XAP Full-Text Search Features

Our newest XAP version includes the following Full-Text Search features:

  • Schema-free index capability
  • Flexible customization of tokenization, stop words, and stemming
  • Indexing nested properties
  • Combining text search and standard predicates
  • Supported queries:
    • Keyword matching
    • Search for phrase
    • Wildcard matching
    • Proximity matching
    • Range searching
    • Boosting a term
    • Regular expressions
    • Fuzzy search

XAP Full-Text Search Architecture

To provide full-text search capabilities, XAP uses Lucene under the hood. Each space partition has its own Lucene index, held on the same node. When a user writes an indexed document to the space, the document is not flushed to the Lucene index immediately. It is flushed only when a search is executed or when the buffer of uncommitted changes overflows (see the lucene.max-uncommitted-changes configuration property in the XAP Full-Text Search documentation).
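The lazy flush described above can be pictured with a small toy model. This is only a sketch in plain Java, not XAP or Lucene code: the class BufferedIndexSketch, its fields, and the exact threshold rule (flush once the buffer exceeds the limit) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of a lazily committed index; not XAP or Lucene code. */
final class BufferedIndexSketch {
    // Hypothetical stand-in for the lucene.max-uncommitted-changes property.
    private final int maxUncommittedChanges;
    private final List<String> uncommitted = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private int flushCount;

    BufferedIndexSketch(int maxUncommittedChanges) {
        this.maxUncommittedChanges = maxUncommittedChanges;
    }

    /** Buffers the document; flushes only once the buffer overflows. */
    void write(String document) {
        uncommitted.add(document);
        // Assumption: "overflow" means exceeding the limit, not reaching it.
        if (uncommitted.size() > maxUncommittedChanges) {
            flush();
        }
    }

    /** Flushes pending writes first, so the search sees every document. */
    List<String> search(String term) {
        flush();
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String document : committed) {
            if (document.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(document);
            }
        }
        return hits;
    }

    private void flush() {
        if (uncommitted.isEmpty()) {
            return; // nothing pending, so this is not counted as a flush
        }
        committed.addAll(uncommitted);
        uncommitted.clear();
        flushCount++;
    }

    int pendingCount() { return uncommitted.size(); }

    int flushCount() { return flushCount; }
}
```

With a limit of 2, the first two writes stay in the buffer, the third write triggers a flush, and any later search flushes whatever is still pending before it runs.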

If the partition has a backup, the backup is synchronized as usual, and the backup’s Lucene index is synchronized too. When the backup becomes primary, its Lucene index is ready for searching.

You can use Full-Text Search whether or not a particular field is indexed. If the field has no index, XAP performs a full scan of the space and, for each object, creates an in-memory Lucene index to check whether the field’s value matches the condition.
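To make the difference between the two paths concrete, here is a minimal sketch in plain Java. It does not use XAP or Lucene: the class IndexedVsScanSketch, its simplistic tokenizer, and the integer ids are all hypothetical. The indexed path answers a query with one lookup in a structure maintained at write time, while the unindexed path has to analyze every stored object at query time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy comparison of an indexed lookup and a full scan; not XAP code. */
final class IndexedVsScanSketch {
    private final List<String> bodies = new ArrayList<>();
    private final Map<String, Set<Integer>> invertedIndex = new HashMap<>();

    /** Stores the text and updates the index during the write. */
    int write(String body) {
        int id = bodies.size();
        bodies.add(body);
        for (String token : tokenize(body)) {
            invertedIndex.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Indexed path: a single map lookup. */
    Set<Integer> searchIndexed(String term) {
        return invertedIndex.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Unindexed path: tokenize every stored object at query time. */
    Set<Integer> searchByFullScan(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < bodies.size(); id++) {
            if (tokenize(bodies.get(id)).contains(needle)) {
                hits.add(id);
            }
        }
        return hits;
    }

    /** Simplistic tokenizer: lower-case, split on non-alphanumerics. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

Both methods return the same ids; the scan simply repeats the analysis work for every object on every query, which is why indexing the fields you search most often pays off.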

XAP Full text Search architecture

XAP Full-Text Search Example

First, to use the full-text search capabilities, you need to include the xap-full-text-search module in your app:

<dependency>
    <groupId>org.gigaspaces</groupId>
    <artifactId>xap-full-text-search</artifactId>
    <version>12.1.0</version>
</dependency>

Creating the Model

To get started with full-text search, we will run queries against Reddit comments. The space model is defined as follows:

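import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.openspaces.textsearch.SpaceTextAnalyzer;
import org.openspaces.textsearch.SpaceTextIndex;

@SpaceClass
public class Comment {
    private String id;
    private String body;
    private Author author;
    private Boolean archived;
    private Integer score;

    @SpaceId
    public String getId() {
        return id;
    }

    @SpaceTextIndex // (1) index the comment body
    public String getBody() {
        return body;
    }

    @SpaceTextIndex(path = "name") // (2) index the nested "name" property
    @SpaceTextAnalyzer(analyzer = KeywordAnalyzer.class, path = "name") // (3) analyzer for the nested property
    public Author getAuthor() {
        return author;
    }

    // remaining getters and setters go here
}

And the Author model: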

@SpaceClass
public class Author {
    private String name;

    // getters and setters go here
}

Once the model is created, you can run full-text search queries against it without adding any annotations or creating indexes. To speed up query execution, however, you can mark a field as indexed with the @SpaceTextIndex annotation (1). During a write operation, the value of the field is added to the Lucene index.

To index the “name” field of a nested object, annotate the property with @SpaceTextIndex(path = "name") (2).

By default, all fields are analyzed by Lucene’s built-in org.apache.lucene.analysis.standard.StandardAnalyzer. To use a different implementation, mark the field with the @SpaceTextAnalyzer annotation (3) and provide the analyzer class. For nested fields, you have to specify the “path” attribute in the annotation, just as you do for indexing.

If none of the built-in analyzers meets your requirements, you can implement your own analyzer; make sure it is on the application classpath.
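To see why the choice of analyzer matters, consider how the two analyzers used in the model treat the same value. The sketch below imitates them in plain Java. It is only an approximation and does not call Lucene: the class AnalyzerSketch and its methods are hypothetical, and the real StandardAnalyzer applies Unicode word-break rules (and, depending on the Lucene version, stop-word removal) that are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Plain-Java imitation of two Lucene analyzers; not the real implementations. */
final class AnalyzerSketch {

    /** Roughly what StandardAnalyzer does: split into words and lower-case them. */
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** What KeywordAnalyzer does: keep the whole value as one unchanged token. */
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }
}
```

For the value "Jane Doe", the first method yields the tokens "jane" and "doe", so a search for either word matches. The second yields the single token "Jane Doe", so only the exact value matches, which is usually what you want for identifiers such as an author name.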

Writing Data to the Space

POJO Document

You don’t need to change existing code to insert documents that support Full-Text Search. Here is an example of writing a POJO document to the space:

GigaSpace space = …

space.write(pojo);

Schema-Free Document

If you don’t need to index a field or specify an analyzer, you don’t need to change your existing code base:

SpaceTypeDescriptor typeDescriptor = new SpaceTypeDescriptorBuilder("Comment")
        .idProperty("id")
        // only one property can be the id; "body" is declared as a regular property
        .addFixedProperty("body", String.class)
        .create();
space.getTypeManager().registerTypeDescriptor(typeDescriptor);

To add Full-Text Search information to the SpaceTypeDescriptor, use the addQueryExtensionInfo() method:

SpaceTypeDescriptor typeDescriptor = new SpaceTypeDescriptorBuilder("Comment")
        .idProperty("id")
        // only one property can be the id; "body" is declared as a regular property
        .addFixedProperty("body", String.class)
        .addQueryExtensionInfo("body", LuceneTextSearchQueryExtensionProvider.index()) // (1)
        .addQueryExtensionInfo("author.name", LuceneTextSearchQueryExtensionProvider.index()) // (2)
        .addQueryExtensionInfo("author.name", LuceneTextSearchQueryExtensionProvider.analyzer(KeywordAnalyzer.class)) // (3)
        .create();
space.getTypeManager().registerTypeDescriptor(typeDescriptor);

(1) To index the “body” field, pass LuceneTextSearchQueryExtensionProvider.index() to the addQueryExtensionInfo method.

(2) To refer to a nested field, separate the field names with a dot.

(3) To assign a specific analyzer to a field, pass LuceneTextSearchQueryExtensionProvider.analyzer() to the addQueryExtensionInfo method.

Searching

Finally, assuming we’ve written some Comment objects to the space, we can query them. Since the standard SQL comparison operators (=, >, etc.) have no notion of text search, we’ve extended SQL in XAP to support Full-Text Search. Use the “text:match” operator to instruct the SQL parser to use full-text search.

For example, the following query finds all comments that contain the word “java”:

SQLQuery<Comment> query = new SQLQuery<>(Comment.class, "body text:match ?");
query.setParameter(1, "java");
Comment[] comments = space.readMultiple(query);

You can combine full-text search with standard SQL conditions in one query:

SQLQuery<Comment> query = new SQLQuery<>(Comment.class, "archived = ? AND body text:match ?");
query.setParameter(1, true);
query.setParameter(2, "java");
Comment[] comments = space.readMultiple(query);

To build more complex queries, you can use:

  • Wildcard query
    query.setParameter(1, "pro*");
    Finds all documents containing words that start with “pro”.
  • OR/AND query
    query.setParameter(1, "java || scala");
    Finds all documents containing either “java” or “scala”.
    Note: AND works the same way; use && instead.
  • NOT query
    query.setParameter(1, "java !xml");
    Finds all documents that contain “java” but not “xml”.
  • Fuzzy search query
    query.setParameter(1, "roam~");
    Finds all documents containing words similar to “roam”, for example “foam” and “roams”.

Note: For the full list of query types, refer to the Lucene Query Parser Syntax documentation.
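The fuzzy operator is based on edit distance, that is, the number of single-character changes needed to turn one word into another. The following plain-Java sketch shows why “foam” and “roams” count as similar to “roam”. It is a simplified illustration rather than Lucene’s implementation: the class FuzzySketch is hypothetical, it computes the classic Levenshtein distance only, and the limit of 2 edits is assumed here to mirror Lucene’s default for the ~ operator.

```java
/** Simplified edit-distance matching; not Lucene's fuzzy query implementation. */
final class FuzzySketch {
    // Assumption: mirrors the default maximum number of edits for "term~".
    static final int DEFAULT_MAX_EDITS = 2;

    /** Levenshtein distance: insertions, deletions, and substitutions. */
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + substitution);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    /** True when the candidate is within the allowed number of edits. */
    static boolean isFuzzyMatch(String term, String candidate) {
        return editDistance(term, candidate) <= DEFAULT_MAX_EDITS;
    }
}
```

Here “foam” differs from “roam” by one substitution and “roams” by one insertion, so both match, while an unrelated word such as “java” needs more than two edits and does not.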

Example Application

Prerequisites:

  1. Clone the Git repository:
    git clone https://github.com/GigaSpaces-ProfessionalServices/full-text-search-demo
  2. Go to the project’s root folder:
    cd full-text-search-demo
  3. Build the project:
    mvn clean package
  4. In a separate terminal, start gs-ui:
    <XAP_HOME>/bin/gs-ui.sh
  5. In a separate terminal, start gs-agent:
    <XAP_HOME>/bin/gs-agent.sh
  6. In the XAP UI, make sure that the gsa, 2 gsc, lus, and gsm are running

    XAP Full text Search Example Application

  7. Deploy the space:
    <XAP_HOME>/bin/gs.sh deploy-space -cluster total_members=2,1 comments
  8. Make sure the spaces are deployed

    XAP Full text Search architecture

  9. Run feeder:
    java -jar feeder/target/feeder.jar jini://*/*/comments xap-12.1.0 100000 RC_2015-01-short-100k
    Note: If you want to use all Reddit dataset follow
  10. Deploy the web app:
    <XAP_HOME>/bin/gs.sh deploy web/target/web.war
  11. Open http://localhost:8090/web
  12. Explore the demo web app:

    XAP Full text Search demo web app

  13. On the left side, you can choose one of the pre-defined queries. On the right side, you can play with custom queries.

Where Can I Download XAP 12.1?

Want to know more about XAP 12.1? Visit our XAP 12.1 page to learn more.

Sponsored by GigaSpaces
