On RediSearch 2.0: Q&A with Pieter Cailliau
Q1. You have introduced RediSearch 2.0. What is it?
RediSearch is a real-time secondary index with a full-text search engine for Redis. In RediSearch 2.0 we changed the architecture to achieve two main goals. First, we wanted to improve the developer experience. Creating an index is now easier than before: you define the schema of the index on top of your existing data and you can start searching. There is no need to move data around and no need to restart Redis. Second, we wanted RediSearch to inherit almost all Redis Enterprise capabilities (https://redislabs.com/redis-enterprise-software/overview/), of which Active-Active is the most important.
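As a rough illustration of defining a schema on top of existing data, the sketch below builds the raw Redis commands as plain argument lists. The index name `idx:movies`, the `movie:` key prefix, the field names, and the sample document are all made up for this example, and `ft_create_args` is a hypothetical helper, not part of any client library.

```python
# Hypothetical helper: builds the arguments of an FT.CREATE command that
# indexes every hash whose key starts with the given prefix.
def ft_create_args(index, prefix, schema):
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for field, field_type in schema:
        args += [field, field_type]
    return args

# The document is written with the plain Redis HSET command (assumed sample data).
hset_cmd = ["HSET", "movie:1", "title", "Star Wars", "year", "1977"]

# The schema is declared over hashes matching the prefix (assumed names).
create_cmd = ft_create_args(
    "idx:movies", "movie:", [("title", "TEXT"), ("year", "NUMERIC")]
)

# A full-text term combined with a numeric range filter on the indexed fields.
search_cmd = ["FT.SEARCH", "idx:movies", "star @year:[1970 1980]"]

for cmd in (hset_cmd, create_cmd, search_cmd):
    print(cmd)
```

With a client such as redis-py, each list could be sent to a server that has RediSearch 2.0 loaded using `execute_command(*cmd)`; the sketch itself only constructs the commands and does not need a running server.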
Q2. What are the benefits of having a real-time secondary index with full-text search capabilities for Redis?
The core of Redis does not have a secondary index, so searching for all hashes with a field that matches a certain prefix implies a full scan of all the data, which is highly inefficient. RediSearch has an inverted index for text fields, and it also indexes numeric, geospatial, and tag fields (tags are similar to text, but without the enhanced full-text search capabilities). Having a secondary index in Redis opens up many new use cases where Redis can be used as a primary database.
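The difference between a full scan and an index lookup can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not how RediSearch is implemented: the keys and fields are invented, the data lives in a dictionary rather than in Redis, and the lookup is an exact match rather than a prefix match to keep the example short.

```python
from collections import defaultdict

# Toy data set standing in for Redis hashes (keys and fields are made up).
hashes = {
    "user:1": {"name": "alice", "city": "ghent"},
    "user:2": {"name": "bob", "city": "london"},
    "user:3": {"name": "carol", "city": "ghent"},
}

def full_scan(field, value):
    """Without an index: every hash is inspected on every query."""
    return sorted(k for k, h in hashes.items() if h.get(field) == value)

# Secondary (inverted) index: maps (field, value) to the set of matching keys.
index = defaultdict(set)
for key, h in hashes.items():
    for field, value in h.items():
        index[(field, value)].add(key)

def indexed_lookup(field, value):
    """With an index: a single dictionary lookup finds the matching keys."""
    return sorted(index.get((field, value), ()))

print(full_scan("city", "ghent"))
print(indexed_lookup("city", "ghent"))
```

Both calls return the same keys. The scan touches every hash on each query, whereas the index pays that cost once at build time and must then be kept up to date as the data changes.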
Q3. RediSearch 2.0 supports Redis Labs’ Active-Active geo-distribution. What are the benefits?
Active-Active means that you get local read and write latencies for your documents, with seamless conflict resolution, in a geo-distributed database with several replicas. In each replica, RediSearch follows the data in a strongly eventually consistent manner. Active-Active technology significantly increases the availability of your database while keeping the performance of Redis.
Q4. To assess RediSearch 2.0’s ingestion performance, you extended your full-text search benchmark (FTSB: https://github.com/RediSearch/ftsb) suite with the publicly available NYC Taxi dataset: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page. Can you summarize the main results of this benchmark?
We have run several benchmarks for RediSearch in the past. For example, we compared it to Elasticsearch and found RediSearch to be 4x faster (https://redislabs.com/blog/search-benchmarking-redisearch-vs-elasticsearch/). In the benchmark for RediSearch 2.0 we mainly focused on ingestion performance, comparing the new version to the previous version of RediSearch. In the previous architecture, users had to write their data through commands introduced by RediSearch. In the new architecture they just use the core commands for Hashes (https://redis.io/topics/data-types). We wanted to prove that this new architecture would have no negative impact on ingestion performance. It turns out you can expect a speedup of 2.4x compared to v1.6 of RediSearch. The new architecture did not affect read performance, for which you can find more benchmarks here: https://redislabs.com/blog/redisearch-1-6-boosts-performance-up-to-64/
Q5. What’s next for RediSearch 2.0?
Now that we have this new architecture, which can follow other data structures in Redis, we plan to extend it. We are thinking about adding support for indexing messages in streams, and also about adding support for RedisJSON. RedisJSON adds a nested data structure to Redis and implements the JSON standard. Under the hood it stores the documents as a binary tree for fast access to sub-elements. Lastly, we want to keep our momentum in improving the developer experience. We’ll add several tools to help you optimize your read queries.
Principal Product Manager, Redis Labs.
Pieter leads the Product Management team at Redis Labs that focuses on capabilities for developers. He holds an MSc in Computer Science from Ghent University, where he wrote a distinguished thesis on time-based graph models. Prior to joining Redis Labs, Pieter worked with graph databases and was a Software Engineer at TomTom, the world’s leader in location and navigation software.
Sponsored by Redis Labs