Many applications today need to run text searches against data stored in databases. Well-known tools such as Elasticsearch, Solr, or Hibernate Search serve this purpose.
However, these tools require additional integration and operational effort, or they depend on a particular technology. If you need to execute full-text search queries on data stored in XAP without any additional integration, you can use the new XAP Full-Text Search feature, released in version 12.1.
XAP Full-Text Search Features
Our newest XAP version includes the following Full-Text Search features:
- Schema-free indexing
- Flexible customization of tokenization, stop words, and stemming
- Indexing nested properties
- Combining text search and standard predicates
- Supported queries:
  - Keyword matching
  - Phrase search
  - Wildcard matching
  - Proximity matching
  - Range searching
  - Boosting a term
  - Regular expressions
  - Fuzzy search
XAP Full-Text Search Architecture
To provide full-text search capabilities, XAP uses Lucene under the hood. Each space partition has its own Lucene index, held on the same node. When a user writes an indexed document to the space, the document is not flushed to the Lucene index immediately. It is flushed only when a search is executed or when the buffer of uncommitted changes overflows (see the configuration property lucene.max-uncommitted-changes in the XAP Full Text Search documentation).
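The deferred flush can be pictured with a small model. This is only a sketch in plain Java, not XAP or Lucene code: the class BufferedIndexSketch and its methods are hypothetical, and the exact overflow threshold (flushing once the buffer holds more than the configured maximum) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of one partition's text index; not XAP or Lucene code.
class BufferedIndexSketch {
    // Plays the role of lucene.max-uncommitted-changes (threshold semantics assumed).
    private final int maxUncommittedChanges;
    private final List<String> uncommitted = new ArrayList<>(); // written, not yet searchable
    private final List<String> committed = new ArrayList<>();   // flushed, visible to searches
    private int flushCount = 0;

    BufferedIndexSketch(int maxUncommittedChanges) {
        this.maxUncommittedChanges = maxUncommittedChanges;
    }

    // A write only buffers the document; a flush happens when the buffer overflows.
    void write(String document) {
        uncommitted.add(document);
        if (uncommitted.size() > maxUncommittedChanges) {
            flush();
        }
    }

    // A search flushes pending changes first, so it sees every written document.
    List<String> search(String word) {
        if (!uncommitted.isEmpty()) {
            flush();
        }
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String doc : committed) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    private void flush() {
        committed.addAll(uncommitted);
        uncommitted.clear();
        flushCount++;
    }

    int flushCount() { return flushCount; }
    int pendingCount() { return uncommitted.size(); }

    public static void main(String[] args) {
        BufferedIndexSketch index = new BufferedIndexSketch(2);
        index.write("I like java");
        index.write("scala is fine");
        System.out.println("flushes after two writes: " + index.flushCount());
        System.out.println("hits: " + index.search("java"));
        System.out.println("flushes after the search: " + index.flushCount());
    }
}
```

The trade-off in this model is the usual one: batching writes keeps them cheap, while the first search after a burst of writes pays for the flush.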
If the partition has a backup, the backup is synchronized as usual, and the backup's Lucene index is synchronized too. When a backup becomes primary, its Lucene index is ready for searching.
With Full-Text Search, you can choose for each field whether it is indexed. If a field has no index, a query on it performs a full scan of the space and, for each object, builds a temporary in-memory Lucene index to check whether the field's value meets the condition.
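The cost difference between the two paths can be sketched in plain Java. The class IndexVsScanSketch is hypothetical and deliberately simplified: the indexed path builds an inverted index once, at write time, while the non-indexed path re-tokenizes every document at query time (standing in for the per-object in-memory index XAP builds). Both paths return the same result; only the amount of work per query differs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not XAP or Lucene code.
class IndexVsScanSketch {

    // Simplified tokenization: lowercase, split on non-word characters.
    static String[] tokens(String doc) {
        return doc.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Indexed path: word -> ids of the documents containing it, built once.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : tokens(docs.get(id))) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Non-indexed path: every document is analyzed again for each query.
    static Set<Integer> scan(List<String> docs, String word) {
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < docs.size(); id++) {
            if (Arrays.asList(tokens(docs.get(id))).contains(word)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Java is great", "I prefer Scala", "java and scala");
        System.out.println(buildIndex(docs).get("java")); // one map lookup per query
        System.out.println(scan(docs, "java"));           // one pass over all documents per query
    }
}
```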
XAP Full-Text Search Example
First, to use the full-text search capabilities, you need to include the xap-full-text-search module in your app:
<dependency>
    <groupId>org.gigaspaces</groupId>
    <artifactId>xap-full-text-search</artifactId>
    <version>12.1.0</version>
</dependency>
Creating the Model
To get started with full-text search, we will run queries against Reddit comments. The space model is defined as follows:
@SpaceClass
public class Comment {
    private String id;
    private String body;
    private Author author;
    private Boolean archived;
    private Integer score;

    @SpaceId
    public String getId() {
        return id;
    }

    @SpaceTextIndex // (1)
    public String getBody() {
        return body;
    }

    @SpaceTextIndex(path = "name") // (2)
    @SpaceTextAnalyzer(path = "name", analyzer = KeywordAnalyzer.class) // (3)
    public Author getAuthor() {
        return author;
    }

    // getters and setters go here
}
And Author model:
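A minimal sketch of what such a model could look like, assuming it carries only the name property that the index path "name" refers to. The class body, the constructors, and the choice to make it Serializable are assumptions, not the demo's actual source:

```java
import java.io.Serializable;

// Hypothetical reconstruction: only the "name" property is implied by the text.
class Author implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;

    // No-argument constructor, as is usual for POJOs stored in a space.
    public Author() {
    }

    public Author(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public static void main(String[] args) {
        System.out.println(new Author("alice").getName());
    }
}
```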
Once the model is created, you can run full-text search queries against it without adding any annotations or creating indexes for the data. To speed up query execution, however, you can use the @SpaceTextIndex annotation (1) to mark a field as indexed. During a write operation, the value of the field is then added to the Lucene index.
To index the "name" field of a nested object, annotate the property with @SpaceTextIndex(path = "name") (2).
By default, all fields are analyzed by Lucene's built-in org.apache.lucene.analysis.standard.StandardAnalyzer. To use a different implementation, annotate the field with @SpaceTextAnalyzer (3) and provide the analyzer class. For nested fields, you have to specify "path" in the annotation, just as for indexing.
If none of the built-in analyzers meets your requirements, you can implement your own analyzer; make sure it is on the application classpath.
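Why pick a keyword analyzer for the author name? The difference can be sketched in plain Java. The two methods below are hypothetical stand-ins that only approximate the behavior: a standard-style analysis splits the value into lowercased tokens (the real StandardAnalyzer does more, for instance Unicode-aware segmentation), whereas a keyword-style analysis keeps the whole value as one unmodified token.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for two analysis strategies; not the Lucene classes themselves.
class AnalyzerSketch {

    // Approximation of a standard analysis: split on non-alphanumerics, lowercase each token.
    static List<String> standardLike(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Keyword analysis: the entire value is a single token, left untouched.
    static List<String> keywordLike(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(standardLike("John Smith")); // [john, smith]
        System.out.println(keywordLike("John Smith"));  // [John Smith]
    }
}
```

With the keyword-style tokens, a search for "John Smith" matches only that exact name, while the standard-style tokens would also match any value containing "john" or "smith".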
Writing Data to the Space
POJO Document
You don't need to change existing code to write documents that support Full-Text Search. Here is an example of writing a POJO document to the space:
GigaSpace space = …
space.write(pojo);
Schema-Free Document
If you need neither an index on a field nor a specific analyzer, you don't need to change the existing code base:
SpaceTypeDescriptor typeDescriptor = new SpaceTypeDescriptorBuilder("Comment")
        .idProperty("id")
        // "body" is a regular property; only one property can be the id
        .addFixedProperty("body", String.class)
        .create();
space.getTypeManager().registerTypeDescriptor(typeDescriptor);
To add Full-Text Search information to the SpaceTypeDescriptor, use the addQueryExtensionInfo() method:
SpaceTypeDescriptor typeDescriptor = new SpaceTypeDescriptorBuilder("Comment")
        .idProperty("id")
        // "body" is a regular property; only one property can be the id
        .addFixedProperty("body", String.class)
        .addQueryExtensionInfo("body", LuceneTextSearchQueryExtensionProvider.index()) // (1)
        .addQueryExtensionInfo("author.name", LuceneTextSearchQueryExtensionProvider.index()) // (2)
        .addQueryExtensionInfo("author.name", LuceneTextSearchQueryExtensionProvider.analyzer(KeywordAnalyzer.class)) // (3)
        .create();
space.getTypeManager().registerTypeDescriptor(typeDescriptor);
(1) To mark the "body" field as indexed, pass LuceneTextSearchQueryExtensionProvider.index() to the addQueryExtensionInfo method.
(2) To address a nested field, separate the field names with a dot.
(3) To assign a specific analyzer to a field, pass LuceneTextSearchQueryExtensionProvider.analyzer() to the addQueryExtensionInfo method.
Searching
Finally, assuming we've written some comment objects to the space, we can query them. Since the standard SQL comparisons (=, >, etc.) have no notion of text search operations, we've extended SQL in XAP to support Full-Text Search. Use the "text:match" operator to instruct the SQL parser to use the full-text search.
For example, the following query finds all comments that contain the word "java":
SQLQuery<Comment> query = new SQLQuery<>(Comment.class, "body text:match ?");
query.setParameter(1, "java");
Comment[] comments = space.readMultiple(query);
You can combine full-text search with standard SQL predicates in one query:
SQLQuery<Comment> query = new SQLQuery<>(Comment.class, "archived = ? AND body text:match ?");
query.setParameter(1, true);
query.setParameter(2, "java");
Comment[] comments = space.readMultiple(query);
To create more complicated queries you can use:
- Wildcard query
query.setParameter(1, "pro*");
It will find all the documents with words starting with "pro".
- OR/AND query
query.setParameter(1, "java || scala");
It will find all the documents containing either "java" or "scala".
Note: The same works for AND; use && instead.
- NOT query
query.setParameter(1, "java !xml");
It will find all the documents that contain "java" but not "xml".
- Fuzzy search query
query.setParameter(1, "roam~");
It will find all the documents with words similar to "roam", for example "foam" and "roams".
Note: For the full list of queries, refer to the Lucene Query Parser Syntax documentation.
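To see why "foam" and "roams" count as similar to "roam", it helps to look at edit distance. Lucene's fuzzy matching is based on the number of single-character edits between terms; the sketch below uses the classic Levenshtein distance as a simplified stand-in (the class EditDistanceSketch is hypothetical, and the matching rules of the Lucene version bundled with XAP may differ in detail).

```java
// Hypothetical illustration of term similarity; not the Lucene implementation.
class EditDistanceSketch {

    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions, or substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("roam", "foam"));  // 1: substitute r -> f
        System.out.println(distance("roam", "roams")); // 1: insert s
    }
}
```

Both example words are a single edit away from "roam", which is why a fuzzy query returns them.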
Example Application
Prerequisites:
- Maven installed
- Git installed
- Java installed
- XAP 12.1 distribution installed (see Installation instructions)
Steps:
- Clone the git repository:
git clone https://github.com/GigaSpaces-ProfessionalServices/full-text-search-demo
- Go to the project's root folder:
cd full-text-search-demo
- Build the project:
mvn clean package
- In a separate terminal, start gs-ui:
<XAP_HOME>/bin/gs-ui.sh
- In a separate terminal, start gs-agent:
<XAP_HOME>/bin/gs-agent.sh
- Make sure in the XAP UI that the gsa, 2 gsc, lus and gsm are running
- Deploy the space:
<XAP_HOME>/bin/gs.sh deploy-space -cluster total_members=2,1 comments
- Make sure the spaces are deployed
- Run the feeder:
java -jar feeder/target/feeder.jar jini://*/*/comments xap-12.1.0 100000 RC_2015-01-short-100k
Note: If you want to use all Reddit dataset follow
- Deploy the web app:
<XAP_HOME>/bin/gs.sh deploy web/target/web.war
- Open http://localhost:8090/web
- Observe the demo web app: on the left side, you can choose one of the pre-defined queries; on the right side, you can play with custom queries.
Where Can I Download XAP 12.1?
Want to know more about XAP 12.1? Visit our XAP 12.1 page to learn more.
Sponsored by GigaSpaces