Full-Text Search with XAP 12.1

Many applications need to run text searches against the data stored in their databases. Well-known tools such as ElasticSearch, Solr, or Hibernate Search can be used for this purpose.

However, these tools require additional integration and operational effort, or they depend on a particular technology. If you need to execute full-text search queries on data stored in XAP without any additional integration, you can use the new XAP Full-Text Search feature, released in version 12.1.

XAP Full-Text Search Features

Our newest XAP version includes the following Full-Text Search features:

- Schema-free index capability
- Flexible customization of tokenization, stop words, and stemming
- Indexing of nested properties
- Combining text search with standard predicates
- Supported queries:
  - Keyword matching
  - Phrase search
  - Wildcard matching
  - Proximity matching
  - Range searching
  - Boosting a term
  - Regular expressions
  - Fuzzy search

XAP Full-Text Search Architecture

To provide full-text search capabilities, XAP uses Lucene under the hood. Each space partition has its own Lucene index, held on the same node. When a user writes an indexed document to the space, the document is not flushed to the Lucene index immediately. It is flushed only when a search is executed or when the buffer of uncommitted changes overflows (see the configuration property lucene.max-uncommitted-changes in the XAP Full-Text Search documentation).
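To make the flush policy concrete, here is a minimal toy model of it. It is not XAP or Lucene code: the class name, the plain lists standing in for the index, and the constructor parameter that plays the role of lucene.max-uncommitted-changes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: NOT XAP or Lucene code. Plain lists stand in for the index.
final class BufferedIndexSketch {
    // Hypothetical stand-in for the lucene.max-uncommitted-changes property.
    private final int maxUncommittedChanges;
    private final List<String> pending = new ArrayList<>();   // written, not yet searchable
    private final List<String> committed = new ArrayList<>(); // visible to searches
    private int flushCount = 0;

    BufferedIndexSketch(int maxUncommittedChanges) {
        this.maxUncommittedChanges = maxUncommittedChanges;
    }

    // A write only buffers the document; it flushes when the buffer overflows.
    void write(String document) {
        pending.add(document);
        if (pending.size() > maxUncommittedChanges) {
            flush();
        }
    }

    // A search flushes first, so it always sees every earlier write.
    List<String> search(String word) {
        flush();
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String doc : committed) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        committed.addAll(pending);
        pending.clear();
        flushCount++;
    }

    int flushCount() { return flushCount; }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        BufferedIndexSketch index = new BufferedIndexSketch(2);
        index.write("I love Java");
        index.write("Scala is nice");
        System.out.println("pending before search: " + index.pendingCount());
        System.out.println("hits for java: " + index.search("java"));
        System.out.println("pending after search: " + index.pendingCount());
    }
}
```

The point of the sketch is the trade-off: writes stay cheap because they only append to a buffer, while the first search after a burst of writes pays for the flush.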

If the partition has a backup, the backup is synchronized as usual, and the backup Lucene index is synchronized too. When the backup becomes primary, its Lucene index is ready for searching.

When using Full-Text Search, you can choose whether or not a particular field is indexed. If the field has no index, the query performs a full scan of the space and, for each object, creates an in-memory Lucene index to check whether the field's value meets the condition.
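The difference between the two paths can be sketched with plain Java collections. This is only an illustration of the idea, not how XAP or Lucene is implemented: the class, the method names, and the simple tokenizer are hypothetical. Both paths return the same hits; they differ in when the tokenization work is done.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy comparison of an indexed lookup and a full scan; NOT XAP or Lucene code.
final class IndexedVsScanSketch {

    // Simplistic tokenizer: lowercase, split on anything that is not a letter or digit.
    static Set<String> tokens(String text) {
        Set<String> result = new TreeSet<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    // Indexed path: tokenization happens once, at write time.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String token : tokens(docs.get(i))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(i);
            }
        }
        return index;
    }

    static Set<Integer> searchIndexed(Map<String, Set<Integer>> index, String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Unindexed path: every document is tokenized again for every query.
    static Set<Integer> searchByScan(List<String> docs, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        for (int i = 0; i < docs.size(); i++) {
            if (tokens(docs.get(i)).contains(needle)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Java is fun", "Scala rocks", "I like java");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println("indexed: " + searchIndexed(index, "java"));
        System.out.println("scan:    " + searchByScan(docs, "java"));
    }
}
```

The scan costs grow with the number of objects on every query, which is why indexing is worth considering for fields that are searched often.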

XAP Full-Text Search Example

First, to use the full-text search capabilities, you need to include the xap-full-text-search module in your application:

<dependency>
    <groupId>org.gigaspaces</groupId>
    <artifactId>xap-full-text-search</artifactId>
    <version>12.1.0</version>
</dependency>


Creating the Model

To get started with full-text search, we will run queries against Reddit comments. The space model is defined as follows:

@SpaceClass
public class Comment {

    private String id;
    private String body;
    private Author author;
    private Boolean archived;
    private Integer score;

    @SpaceId
    public String getId() {
        return id;
    }

    @SpaceTextIndex // (1)
    public String getBody() {
        return body;
    }

    @SpaceTextIndex(path = "name") // (2)
    @SpaceTextAnalyzer(analyzer = KeywordAnalyzer.class, path = "name") // (3)
    public Author getAuthor() {
        return author;
    }

    // getters and setters go here
}


And the Author model:

@SpaceClass
public class Author {

    private String name;

    // getters and setters go here
}


Once the model is created, you can run full-text search queries against it without specifying any additional annotations or creating indexes for the data. To speed up query execution, however, you can use the @SpaceTextIndex annotation (1) to mark a field as indexed. During a write operation, the value of the field is added to the Lucene index.

To specify that the nested object’s “name” field is indexed, annotate the property with @SpaceTextIndex(path = "name") (2).

By default, all fields are analyzed by Lucene's built-in org.apache.lucene.analysis.standard.StandardAnalyzer. To use a different implementation, mark the field with the @SpaceTextAnalyzer annotation (3) and provide the analyzer class. For nested fields, you have to specify the “path” attribute in the annotation, just as for indexing.

If none of the built-in analyzers meets your requirements, you can implement your own analyzer; make sure it is on the application classpath.
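To see why the choice of analyzer matters for a field such as the author's name, the following sketch approximates what the two analyzers used above do to a value. It does not call Lucene: the class and both methods are hypothetical, and the "standard-like" tokenizer is a rough simplification that ignores the Unicode segmentation rules and any stop-word handling of the real StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Rough approximation of two analyzers' behavior; NOT Lucene code.
final class AnalyzerSketch {

    // Standard-like analysis: split into words and lowercase each one.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Keyword-like analysis: the whole value is kept as one unchanged token.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        String authorName = "John Smith";
        System.out.println("standard-like: " + standardLike(authorName));
        System.out.println("keyword-like:  " + keywordLike(authorName));
    }
}
```

With word-level tokens, a search for one word of the name can match; with a single keyword token, only the exact full value matches, which is usually what you want for identifiers such as user names.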

Writing Data to the Space

POJO Document

To insert documents that support Full-Text Search, you don’t need to change your existing code. Here is an example of writing a POJO document to the space:

GigaSpace space = …
space.write(pojo);

Schema-Free Document

If you don’t need to index a field or specify an analyzer, you don’t need to change your existing code base:

SpaceTypeDescriptor typeDescriptor = new SpaceTypeDescriptorBuilder("Comment")
    .idProperty("id")
    .addFixedProperty("body", String.class)
    .create();

space.getTypeManager().registerTypeDescriptor(typeDescriptor);


To add Full-Text Search information to the SpaceTypeDescriptor, use the addQueryExtensionInfo() method:

SpaceTypeDescriptor typeDescriptor = new SpaceTypeDescriptorBuilder("Comment")
    .idProperty("id")
    .addFixedProperty("body", String.class)
    .addQueryExtensionInfo("body", LuceneTextSearchQueryExtensionProvider.index()) // (1)
    .addQueryExtensionInfo("author.name", LuceneTextSearchQueryExtensionProvider.index()) // (2)
    .addQueryExtensionInfo("author.name", LuceneTextSearchQueryExtensionProvider.analyzer(KeywordAnalyzer.class)) // (3)
    .create();

space.getTypeManager().registerTypeDescriptor(typeDescriptor);


(1) To specify that the “body” field is indexed, use LuceneTextSearchQueryExtensionProvider.index() in conjunction with the addQueryExtensionInfo() method.

(2) To specify a nested field, separate the field names with a dot.

(3) To specify that a field has a specific analyzer, use LuceneTextSearchQueryExtensionProvider.analyzer() in conjunction with the addQueryExtensionInfo() method.

Searching

Finally, assuming we’ve written some comment objects to the space, we can query them. Since the standard SQL comparisons (=, >, etc.) have no notion of text search operations, we’ve extended SQL in XAP to support Full-Text Search. Use the text:match operator to instruct the SQL parser to use full-text search.

For example, the following query finds all comments that contain the word “java”:

SQLQuery<Comment> query = new SQLQuery<>(Comment.class, "body text:match ?");
query.setParameter(1, "java");
Comment[] comments = space.readMultiple(query);


You can combine full-text search with standard SQL conditions in one query:

SQLQuery<Comment> query = new SQLQuery<>(Comment.class, "archived = ? AND body text:match ?");
query.setParameter(1, true);
query.setParameter(2, "java");
Comment[] comments = space.readMultiple(query);


To create more complex queries, you can use:

Wildcard query
query.setParameter(1, "pro*");
Finds all documents containing words that start with “pro”.
OR/AND query
query.setParameter(1, "java || scala");
Finds all documents containing either the word “java” or the word “scala”.
Note: AND works the same way; use && instead of ||.
NOT query
query.setParameter(1, "java !xml");
Finds all documents that contain the word “java” but not the word “xml”.
Fuzzy search query
query.setParameter(1, "roam~");
Finds all documents containing words similar to “roam”, for example “foam” and “roams”.
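To show what “similar” means in the fuzzy example, here is a small sketch that computes the edit distance between two words. It is an illustration only: the class name is hypothetical, and it uses plain Levenshtein distance, whereas Lucene's own fuzzy matching is automaton-based and, as far as we know, also treats a transposition of two adjacent characters as a single edit.

```java
// Plain Levenshtein distance; an illustration, NOT the Lucene implementation.
final class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions, or
    // substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println("roam -> foam:  " + levenshtein("roam", "foam"));
        System.out.println("roam -> roams: " + levenshtein("roam", "roams"));
        System.out.println("roam -> java:  " + levenshtein("roam", "java"));
    }
}
```

Both “foam” and “roams” are one edit away from “roam”, which is why the fuzzy query above matches them, while an unrelated word such as “java” is too far away.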

Note: For the full list of query types, refer to the Lucene Query Parser Syntax documentation.
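Because the parameter is interpreted using the Lucene query syntax, characters such as *, ~, ! or || in user-supplied text act as operators. If you want to search for the text literally, one option is to escape those characters before setting the parameter. The sketch below is a standalone helper written for illustration; the class and method names are hypothetical, and the list of special characters is our reading of the Lucene Query Parser Syntax documentation. Lucene itself ships a similar helper (QueryParser.escape), which you may prefer if it is available on your classpath.

```java
// Standalone escaping helper; an illustrative sketch, NOT part of the XAP API.
final class QueryEscapeSketch {

    // Characters that carry meaning in the Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every special character with a backslash so it is matched literally.
    static String escape(String input) {
        StringBuilder escaped = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("c++ (java)"));
        System.out.println(escape("what?"));
    }
}
```

The escaped string would then be passed as usual, for example query.setParameter(1, QueryEscapeSketch.escape(userInput)). Note that this sketch does not handle the uppercase keywords AND, OR, and NOT, which the Lucene syntax also treats as operators.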

Example Application

Prerequisites:

Maven installed
Git installed
Java installed
XAP 12.1 distribution installed (see Installation instructions)

Clone the git repository:
git clone https://github.com/GigaSpaces-ProfessionalServices/full-text-search-demo
Go to the project’s root folder:
cd full-text-search-demo
Build the project:
mvn clean package
In a separate terminal, start gs-ui:
<XAP_HOME>/bin/gs-ui.sh
In a separate terminal, start gs-agent:
<XAP_HOME>/bin/gs-agent.sh
In the XAP UI, make sure that the GSA, two GSCs, the LUS, and the GSM are running.
Deploy the space:
<XAP_HOME>/bin/gs.sh deploy-space -cluster total_members=2,1 comments
Make sure the spaces are deployed.
Run the feeder:
java -jar feeder/target/feeder.jar jini://*/*/comments xap-12.1.0 100000 RC_2015-01-short-100k
Note: If you want to use all Reddit dataset follow
Deploy the web app:
<XAP_HOME>/bin/gs.sh deploy web/target/web.war
Open http://localhost:8090/web
Explore the demo web app:
On the left side, you can choose one of the predefined queries. On the right side, you can experiment with custom queries.

Where Can I Download XAP 12.1?

XAP 12.1 for download

Want to know more about XAP 12.1? Visit our XAP 12.1 page to learn more.

Sponsored by GigaSpaces
