On data management. Q&A with Shireesh Thota
“The increasing convergence on relational and non-relational, operational and analytical, and across data warehouses and data lakes is a key trend that will accelerate faster.”
Q1. You have recently joined SingleStore as their senior vice president of engineering. What are your current projects?
I run engineering for SingleStore, including execution of product strategy, innovation and organizational growth. My key focus is on a trifecta of challenges (Distributed SQL, Real-Time Analytics and Developer Experience), all wrapped in a cloud-first approach. We have made a lot of headway in how we think of these pillars, and I am responsible for setting the engineering vision to deliver them successfully. Some of the key projects that we are deeply involved in include high availability and disaster recovery, seamless elasticity, cross-partition consistency semantics, JSON-type expansion, OLAP query surface area, ease-of-use APIs, and cloud-infrastructure innovation. It’s an exciting time at SingleStore!
Q2. Talking about engineering innovation, what are the main lessons you have learned in more than 15 years at Microsoft?
Microsoft is a giant machine with enormous resources. I have had the good fortune to participate in building and managing several types of databases and services, all of which add valuable perspectives. The single most critical lesson is about how to pace yourself when building a database engine and a scalable service. Most big innovations and key breakthroughs in this area take a good amount of time. The trick is to break the work down so that we deliver incrementally while keeping an eye on the big prize. The confluence of modern data stack trends, query algorithms, hardware trends, and cloud innovations makes this area incredibly exciting and dynamic. We need fortitude and constant learning to build the next great database!
Q3. Why do you believe in the increasing convergence across relational and non-relational databases?
Relational algebra and the corresponding calculus, expressed through SQL, have stood the test of time and will continue to see innovation for years to come. The theoretical foundations ensure this. With the advent of Web2, there was an enormous strain on scaling the single-node view. We built NoSQL systems by trading off capabilities such as ACID transactions, functionality-rich query processing and strict schema adherence to achieve scalability and availability guarantees within the constraints of the CAP theorem.
The breakthroughs in distributed systems, hardware trends and evolving data stacks have shown us that we can bring the best of both together. The industry is innovating on various fronts to tackle this: advancing JSON types within relational systems, managing logical clock semantics for cross-partition transactions (including TrueTime and hybrid logical clocks, or HLC) and offering tunable consistency for geo-replication. Aurora, SQL Hyperscale and AlloyDB decouple storage by centralizing the write partition and guaranteeing greater relational compatibility, but they don’t scale fully. SingleStoreDB, YugabyteDB and CockroachDB have shared-nothing architectures to scale horizontally with the right relational surface-area trade-offs. Customer demands will further push us all to build all of the goodness of NoSQL systems on top of relational foundations.
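To make the logical-clock idea concrete, here is a minimal sketch of a hybrid logical clock in Python. It follows the general shape of the published HLC algorithm; the class name, method names and injectable physical clock are all hypothetical, and it is not taken from SingleStore or any other product mentioned here.

```python
import time
from typing import Callable, Tuple

# A timestamp is (l, c): l tracks the largest physical time seen so far,
# c is a counter that breaks ties when l does not advance.
Timestamp = Tuple[int, int]


def _wall_clock_ms() -> int:
    """Default physical clock: wall time in milliseconds."""
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """Illustrative hybrid logical clock for a single node (hypothetical API)."""

    def __init__(self, physical_clock: Callable[[], int] = _wall_clock_ms):
        self._now = physical_clock
        self.l = 0
        self.c = 0

    def tick(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        pt = self._now()
        if pt > self.l:
            # Physical time moved forward: adopt it and reset the counter.
            self.l, self.c = pt, 0
        else:
            # Physical time stalled or went backwards: bump the counter.
            self.c += 1
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message."""
        pt = self._now()
        remote_l, remote_c = remote
        new_l = max(self.l, remote_l, pt)
        if new_l == self.l and new_l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == remote_l:
            self.c = remote_c + 1
        else:
            # The local physical clock is strictly ahead of both.
            self.c = 0
        self.l = new_l
        return (self.l, self.c)
```

Because timestamps compare as ordinary tuples, a node whose physical clock lags behind the sender still assigns the received event a timestamp greater than the one it was sent with, which is the property cross-partition ordering relies on.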
Q4. What are the most exciting things stemming from this new era of the internet that could revolutionize the database industry?
The advances in network reliability and latencies are exciting for the database industry. A lot of distributed system challenges have to deal with node-to-node communication; as the network gets more robust, we can tackle horizontal scale intra-region and inter-region (even across AZs). The pressure to evolve to the needs of growing killer applications such as real-time analytics is immense. Many such lateral applications need immediate insights on fresh data while keeping the latencies of operational, transactional workloads. There are a lot of exciting innovations to tackle these needs, and they will revolutionize the way we perceive the so-called modern data stack. We will need far more unification across relational and non-relational (with distributed SQL), and transactional and analytical (with HTAP), systems.
Q5. What is your advice for engineers and developers for building data-intensive applications?
I would ask them to take a fresh look at the current state of their architecture and their needs. Many modern data applications face a variety of challenges, ranging from speed of ingestion and volume of data to query complexity under low-latency, high-concurrency requirements. To assess this, SingleStore has put together a handy tool.
The choice of database will heavily determine the success of your application. One ought to pick something that can scale horizontally for the volume of big data and provide a seamless capability to ingest it quickly, likely in parallel across various partitions. For low latency on highly complex queries, the choice of storage formats and the query engine capabilities are critical. One should be sure to simplify the data management stack to avoid unwarranted complexity, stale data and error-prone pipelines. Most data-intensive applications demand multi-model support and cloud-native capabilities, too.
With the right choice of database, applications can then focus on the business constraints, which are typically availability-, latency- and throughput-driven. With a unified stack, observability and governance can be set up a lot more easily. Finally, constant evaluation of the data intensity would help adjust the technical stack accurately.
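As a rough illustration of ingesting in parallel across partitions, the Python sketch below hashes each row's key to a partition and loads the resulting batches concurrently. The function names, the `id` key field and the in-memory lists standing in for partitions are all assumptions made for illustration; a real database would perform the routing and the writes itself.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Dict[str, object]


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32, not Python's hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def ingest(
    rows: List[Row], num_partitions: int, key_field: str = "id"
) -> Tuple[List[List[Row]], List[int]]:
    """Route rows to partitions by key, then load each partition's batch in parallel."""
    # Step 1: group rows into one batch per partition.
    batches: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        batches[partition_for(str(row[key_field]), num_partitions)].append(row)

    # Step 2: in-memory lists stand in for the real partitions (an assumption).
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]

    def load(p: int) -> int:
        # Each worker writes only to its own partition, so no locking is needed.
        partitions[p].extend(batches[p])
        return len(batches[p])

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        counts = list(pool.map(load, range(num_partitions)))
    return partitions, counts
```

Because the hash is stable, every row with the same key lands on the same partition, so each worker can load its batch independently of the others.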
Q6. In your opinion, what are the key trends in data management?
The modern data stack, as widely accepted, is filled with too many highly specialized components and needs an extreme amount of data movement. The general flow is to have an operational database as the source of truth, then an extract-load phase that moves data to various options for storage and query (data warehouses, data lakes and specialized query engines), and then a transformation step before the data is delivered to user-facing dashboards and broader analysis tooling. There are many variations of this flow, but the gist is a series of phases with multiple boxes and arrows.
Our customers are recognizing this complexity and looking for unified stacks that eliminate this extreme creep. Solutions that unify data integration, the database service (distributed and HTAP) and user-facing services are gradually trending upwards, given their immense benefits for customer-facing challenges. The increasing convergence on relational and non-relational, operational and analytical, and across data warehouses and data lakes is a key trend that will accelerate faster. Customers want to focus on integrated application development with the appropriate data management solution instead of tiresome data engineering pipelines for every specific challenge.
Qx. Anything else you wish to add?
The evolution of databases has been constant for the past several decades. We live in a golden era of DBMS technologies, with a dizzying array of options available to meet the ever-growing demands of applications. We at SingleStore are pursuing the vision to be the general-purpose database that unifies and simplifies the various phases of the modern data stack. I anticipate a growing list of companies trying to follow us on this path, and we welcome that increased momentum in this direction rather than the pursuit of extreme specialization.
…………………………………………
Shireesh Thota is SVP of Engineering at SingleStore, responsible for product design and development. Shireesh has experience with large-scale, big data, scale-out, relational and schema-agnostic distributed systems. He previously ran engineering efforts at Azure Cosmos DB and PostgreSQL Hyperscale (Citus) services at Microsoft. Shireesh has a B.S. in computer science from B.I.T.S Pilani, India, and an M.S. in computer science from the University of Connecticut.
Sponsored by SingleStore.