On LeanXcale database. Interview with Patrick Valduriez and Ricardo Jimenez-Peris
“LeanXcale is the first startup that instead of going to market with a single innovation or know-how, is going to market with 10 disruptive innovations that are making it really differential for many different workloads and extremely competitive on different use cases.” — Patrick Valduriez.
I have interviewed Patrick Valduriez and Ricardo Jimenez-Peris. Patrick is a well know database researcher, and since 2019, he is the scientific advisor of LeanXcale. Ricardo is the CEO and Founder of LeanXcale. We talked about NewSQL, Hybrid Transaction and Analytics Processing (HTAP), and LeanXcale, a start up that offers an innovative HTAP database.
Q1. There is a class of new NewSQL databases in the market, called Hybrid Transaction and Analytics Processing (HTAP) – a term created by Gartner Inc. What is special about such systems?
Patrick Valduriez: NewSQL is a recent class of DBMS that seeks to combine the scalability of NoSQL systems with the strong consistency and usability of RDBMSs. An important class of NewSQL is Hybrid Transaction and Analytics Processing (HTAP) whose objective is to perform real-time analysis on operational data, thus avoiding the traditional separation between operational database and data warehouse and the complexity of dealing with ETLs.
Q2. HTAP functionality is offered by several database companies. How does LeanXcale compare with respect to other HTAP systems?
Ricardo Jimenez-Peris: HTAP covers a large spectrum that has three dimensions. One dimension is the scalability of the OLTP part. There is where we excel. We scale out linearly to hundreds of nodes. The second dimension is the ability to scale out OLAP. This is well known technology from the last two decades. Some systems are mostly centralized, but those that are distributed should be able to handle reasonably well the OLAP part. The third dimension is the efficiency on the OLAP part. There is where we are still working to improve the optimizer, so the expectation is that we will become pretty competitive in the next 18 months. Patrick’s expertise in distributed query processing will be key. I would like also to note that, for recurrent aggregation analytical queries, we are really unbeatable thanks to a new invention that enables us to update in real-time these aggregations, so these aggregation analytical queries becomes costless since they just need to read a single row from the relevant aggregation table.
Q3. Patrick, you wrote in a blog that “LeanXcale has a disruptive technology that can make a big difference on the DBMS market”. Can you please explain what is special about LeanXcale?
Patrick Valduriez: I believe that LeanXcale is at the forefront of the HTAP movement, with a disruptive technology that provides ultra-scalable transactions (see Q4), key-value capabilities (see Q5), and polyglot capabilities. On one hand, we support polyglot queries that allow integrating data coming from different data stores, such as HDFS, NoSQL and SQL systems. On the other hand, we already support SQL and key-value functionality on the same database, and soon we will support JSON documents in a seamless manner, so we are becoming a polystore.
LeanXcale is the first startup that instead of going to market with a single innovation or know-how, is going to market with 10 disruptive innovations that are making it really differential for many different workloads and extremely competitive on different use cases.
Q4. What are the basic principles you have used to design and implement LeanXcale as a distributed database that allows scaling transactions from 1 node to thousands?
Ricardo Jimenez-Peris: LeanXcale solves the traditional transaction management bottleneck with a new invention that lies in a distributed processing of the ACID properties, where each ACID property is scaled out independently but in a composable manner. LeanXcale’s architecture is based on three layers that scale out independently, 1) KiVi, the storage layer that is a relational key-value data store, 2) the distributed transactional manager that provides ultra-scalable transactions, and 3) the distributed query engine that enables to scale out both OLTP and OLAP workloads. KiVi counts with 8 disruptive innovations that provide dynamic elasticity, online aggregations, push down of all algebraic operators but join, active-active replication, simultaneous efficiency for both ingesting data and range queries, efficient execution in NUMA architectures, costless multiversioning, hybrid row-columnar storage, vectorial acceleration, and so on.
Q5. The LeanXcale database offers a so-called dual interface, key-value and SQL. How does it work and what is it useful for?
Ricardo Jimenez-Peris: (how does it work): The storage layer, it is a proprietary relational key-value data store, called KiVi, which we have developed. Unlike traditional key-value data stores, KiVi is not schemaless, but relational. Thus, KiVi tables have a relational schema, but can also have a part that is schemaless. The relational part enabled us to enrich KiVi with predicate filtering, aggregation, grouping, and sorting. As a result, we can push down all algebraic operators below a join to KiVi and execute them in parallel, thus saving the movement of a very large fraction of rows between the storage layer and they query engine layer. Furthermore, KiVi has a direct API that allows doing everything that SQL can do but join, but without the cost of SQL. In particular, it can ingest data as efficiently as the most efficient key-value data stores, but the data is stored in relational tables in a fully ACID way and the data is accessible through SQL. This enables to highly reduce the footprint of the database in terms of hardware resources for workloads where data ingestion represents a high fraction.
Patrick Valduriez: (what is it useful for): As for RDBMSs, the SQL interface allows rapid application development and remains the preferred interface for BI and analytics tools. The key-value interface is complementary and allows the developer to have better control of the integration of application code and database access, for higher performance. This interface also allows easy migration from other key-value stores.
Q6. You write that LeanXcale could be used in different ways. Can you please elaborate on that?
Ricardo Jimenez-Peris: LeanXcale can be used in many different ways: as an operational database (thanks to transaction scalability), as a data warehouse (thanks to our distributed OLAP query engine), as a real-time analytics platform (due to our HTAP capability), as an ACID key-value data store (using KiVi and our ultra-scalable transactional management), as a time series database (thanks to our high ingestion capabilities), as an integration polyglot query engine (based on our polyglot capabilities), as an operational data lake (combining our scalability in volume of a data lake with operational capabilities at any scale), as a fast data store (using KiVi as standalone), as an IoT database (deploying KiVi in IoT devices), and edge database (deploying KiVi on IoT devices and the edge and full LeanXcale database on the cloud with georeplication).
Thanks to all our innovations and our efficiency and flexible architecture, we can compete in many different scenarios.
Q7. The newly defined SQL++ language allows adding a JSON data type in SQL. N1QL for Analytics is the first commercial implementation of SQL++ (**). Do you plan to support SQL++ as well?
Ricardo Jimenez-Peris and Patrick Valduriez: Yes, but within SQL, as we don’t think any language will replace SQL in the near future. Over the last 30 years, there have been many claims that new languages would (“soon”) replace SQL, e.g., object query languages such as OQL in the 1990s or XML query languages such as XQuery in the 2000s. But this did not happen for three main reasons. First, SQL’s data abstraction (table) is ubiquitous and simple. Second, the language is easy to learn, powerful and has been adopted by legions of developers. And it is a (relatively) standard language, which makes it a good interface for tool vendors. This being said, the JSON data model is important to manage documents and SQL++ is a very nice SQL-like language for JSON. In LeanXcale, we plan to support a JSON data type in SQL columns and have a seamless integration of SQL++ within SQL, with the best of both (relational and document) worlds. Basically, each row can be relational or JSON and SQL statements can include SQL++ statements.
Q8. What are the typical use cases for LeanXcale? and what are the most interesting verticals for you?
Ricardo Jimenez-Peris: Too many. Basically, all data intensive use cases. We are ideal for the new technological verticals such as traveltech, adtech, IoT, smart-*, online multi-player games, eCommerce, …. But we are also very good and cost-effective for traditional use cases such as Banking/Finance, Telco, Retail, Insurance, Transportation, Logistics, etc.
Q9. Patrick, as Scientific Advisor of LeanXcale, what is your role? What are you working at present?
Patrick Valduriez: My role is as a sort of consulting chief architect for the company, providing advice on architectural and design choices as well as implementation techniques. I will also do what I like most, i.e., teach the engineers the principles of distributed database systems, do technology watch, write white papers and blog posts on HTAP-related topics, and do presentations at various venues. We are currently working on query optimization, based on the Calcite open source software, where we need to improve the optimizer cost model and search space, in particular, to support bushy trees in parallel query execution plans. Another topic is to add the JSON data type in SQL in order to combine the best of relational DBMS and document NoSQL DBMS.
Q10. What is the role that Glenn Osaka is having as an advisor for LeanXcale?
Ricardo Jimenez-Peris: Ricardo: Glenn is an amazing guy and successful Silicon Valley entrepreneur (CEO of Reactivity, sold to Cisco). He was advisor of Peter Thiel at Confinity, who later merged his company with Elon Musk’s X.com to create PayPal, and continued to be advisor there till it was sold to eBay.
He is guiding us in the strategy to become a global company. A company doing B2B to enterprises has as main challenge to overcome the slowness of enterprise sales, and through his advice we have built a strategy to overcome this slowness.
Q11. You plan to work with Ikea. Can you please tell us more?
Ricardo Jimenez-Peris: Ikea has isolated ERPs per store. Thus, the main issue is that when a customer wants to buy an item at a store and there is not enough stock, this isolation prevents them from selling using stock from other stores. Similarly, orders for new stock are not optimized since they are made based on the local store view. We are providing them with a centralized database that keeps the stock across all stores and solving the two problems. We are also working with them in a proximity marketing solution to offer customers coupon-based discounts as they go through the store.
Qx Anything else you wish to add?
Patrick Valduriez: Well, the adventure just got started and it is already a lot of fun. It is a great opportunity for me, and probably the right time, to go deeper in applying the principles of distributed and parallel databases on real-world problems. The timing is perfect as the new (fourth) edition of the book “Principles of Distributed Database Systems“, which I co-authored with Professor Tamer Özsu, is in production at Springer. As a short preview, note that there is a section on LeanXcale’s ultra-scalable transaction management approach in the transaction chapter and another section on LeanXcale’s architecture in the NoSQL/NewSQL chapter.
Ricardo Jimenez-Peris: Ricardo: It is a really exciting moment now that we are going to market. We managed to build an amazing team able to make the product strong and go to market with it. We believe to be the most innovative startup in the database arena and our objective is to become the next global database company. Still a lot of work and exciting challenges ahead. Now we are working on our database cloud managed service that will be delivered in Amazon, hopefully, by the end of the year.
Dr. Patrick Valduriez is a senior scientist at Inria in France. He has been a scientist at Microelectronics and Computer Technology Corp. in Austin (Texas) in the 1980s and a professor at University Pierre et Marie Curie (UPMC) in Paris in the early 2000s. He has also been consulting for major companies in USA (HP Labs, Lucent Bell Labs, NERA, LECG, Microsoft), Europe (ESA, Eurocontrol, Ask, Shell) and France (Bull, Capgemini, Matra, Murex, Orsys, Schlumberger, Sodifrance, Teamlog). Since 2019, he is the scientific advisor of the LeanXcale startup.
He is currently the head of the Zenith team (between Inria and University of Montpellier, LIRMM) that focuses on data science, in particular data management in large-scale distributed and parallel systems and scientific data management. He has authored and co-authored many technical papers and several textbooks, among which “Principles of Distributed Database Systems” with Professor Tamer Özsu. He currently serves as associate editor of several journals, including the VLDB Journal, Distributed and Parallel Databases, and Internet and Databases. He has served as PC chair of major conferences such as SIGMOD and VLDB. He was the general chair of SIGMOD04, EDBT08 and VLDB09.
He received prestigious awards and prizes. He obtained several best paper awards, including VLDB00. He was the recipient of the 1993 IBM scientific prize in Computer Science in France and the 2014 Innovation Award from Inria – French Academy of Science – Dassault Systems. He is an ACM Fellow.
Dr. Ricardo Jimenez was professor and researcher at Technical University of Madrid (Universidad Politécnica de Madrid – UPM) and abandoned his academic career to bring to the market an ultra-scalable database. At UPM, he already sold technology to European enterprises such as Ericsson, Telefonica, and Bull. He has been member of the advisory Committee on Cloud Computing for the European Commission.
He is co-inventor of the two patents already granted in US and Europe and of 8 new patent applications that are being prepared. He is co-author of the book “Replicated Databases” and more than 100 research papers and articles.
He has been invited to present LeanXcale technology in the headquarters of many tech companies in Silicon Valley such as Facebook, Twitter, Salesforce, Heroku, Greenplum (now Pivotal), HP, Microsoft, etc.
He has coordinated (as overall coordinator or technical coordinator) over 10 European projects. One of them, LeanBigData, was awarded with the “Best European project” award by the Madrid Research Council (Madri+d).
– LeanXcale was awarded with the “Best SME” award by the Innovation Radar of the European Commission in Nov. 2017 recognizing it as the most innovative European startup. LeanXcale has been identified as one of the innovator startups in the NewSQL arena by Bloor market analyst, and has been identified as one of the companies in the HTAP arena by 451 Research market analyst.
Follow us on Twitter: @odbmsorg