On Unifying Transactions and Analytics. Q&A with Shireesh Thota
“The so-called modern data stack is broken, and we need to commit to simplify it urgently.”
Q1. You recently announced the amplification of your technology stack, allowing customers to unify transactions and analytics in the same engine for real-time data experiences. What does it mean in practice?
We built SingleStoreDB on the foundational principle of unifying transactions and analytics to enable businesses to deliver real-time experiences for a wide variety of applications.
Our recent announcement significantly amplifies that mission. We have given customers the ability to build enterprise-scale applications with capabilities such as Workspaces, CMEK, and advances in authentication and authorization. Workspaces let our customers deploy multiple isolated compute environments on the same database on demand, providing a highly cost-effective way to decouple storage from multiple compute workloads.
We have expanded the developer experience so that our customers can build advanced scenarios and modern applications much more seamlessly. Features such as the Data API and built-in vector functions come in quite handy. Our new feature, Code Engine, enables Wasm modules to run deeply integrated in the database runtime; it is a paradigm shift for building highly complex compute solutions without having to move data back and forth from the database.
Finally, we invested in enhancing our analytical capabilities to help our customers push further on data science, BI and other logical extensions of the analytical end of the unified data spectrum. Flexible parallelism helps customers run massive queries with higher performance. Our Power BI connector is now Microsoft certified. We also have a dbt connector to help data engineers build data flow pipelines on the data stored in SingleStoreDB.
Q2. What is a modern SaaS application?
A modern SaaS application is one that empowers businesses with deep, real-time insights while providing demonstrably high value to the customer. Modern applications rely on minimizing the time-to-insight: the time from when a piece of transactional data is generated to when it can drive insights or business decisions. These applications have to handle the dynamic nature of data, where the user and the system interact far more than in the old models of static service delivery. The deluge of volume, variety and velocity of big data is deeply embedded in modern SaaS applications, which scale implicitly with user interaction (queries) and can handle more complex queries. Both the customer-facing experience and the internal computation need to happen with extremely low latency for an application to qualify as a modern SaaS application.
Q3. Why is it important for companies to reevaluate their data strategies?
Companies need to reevaluate their data strategies given the nature of the experiences that customers expect and the resulting pressure on old architectures. Any strategy that doesn’t account for the deluge of big data and the need to deliver real-time experiences is broken. Unfortunately, many companies are still stuck on the old data strategy: separate silos for operational and analytical solutions, complex data-science workflows that replicate massive amounts of data through error-prone pipelines, and inefficient compute-storage relationships. The so-called modern data stack is broken, and we need to commit to simplify it urgently. At the heart of this is understanding when to specialize and when not to. Unified transactional and analytical platforms have compounding benefits: reducing unnecessary data movement, avoiding complex replication flows and, most importantly, decreasing query latency.
Q4. Workspaces: What benefits do customers have in working on isolated workspaces for different types of workloads?
Workspaces help customers provision isolated compute environments that can be attached on demand to an existing database. This gives customers the flexibility to cleanly separate different applications or compute spaces in a cost-effective manner. One could now run a point-of-sale application, customer segmentation and BI reports in different workspaces, for independent time windows, on the exact same database, without the workloads affecting one another's performance.
This is a good example of our commitment to seamless elasticity. In essence, Workspaces provide a) auto-scale capabilities, since workspaces can be attached and detached completely on demand, and b) separation of one compute environment from another, which avoids noisy-neighbor problems without resorting to the highly costly alternative of replicating the data across multiple databases.
Q5. Universal Language Support with Wasm: How does it help organizations?
Wasm was invented to enable high-performance applications on the web, but given its safe, portable and low-level binary code format, it has evolved to specify interfaces that let such applications run in various environments.
We are excited to announce Code Engine, powered by Wasm, which supports running Wasm modules natively within our database engine. Programs written in C, C++ or Rust can be compiled into these modules, which can be viewed as advanced UDFs that run complex business logic closer to the data. With this, organizations can now run capabilities such as sentiment analysis or fuzzy text matching efficiently, without moving data in and out of the system. Support for more languages, including Python, will be coming soon.
Moving data between a stateless mid-tier and the stateful data layer in order to run compute-intensive work outside of the database is quite costly, and it adds latency that stands in the way of building real-time experiences. By enabling Wasm module integrations into the DB runtime, we are laying the groundwork for a much more efficient paradigm where complex computation is done natively without any data movement. This is the next logical advance beyond SQL-based stored procedures and UDFs.
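To make the fuzzy text matching example concrete, here is a minimal sketch in Rust of the kind of self-contained logic that could be compiled into a Wasm module. The function names and the choice of Levenshtein edit distance are assumptions made for illustration; the steps for building the module and registering it with Code Engine are not covered in this interview and are deliberately left out.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode
/// scalar values. Hypothetical helper, not part of any SingleStoreDB API.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the prefix of `a` processed so far
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // Turning i + 1 characters of `a` into an empty string costs i + 1 deletions.
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// True when the two strings are within `max_distance` edits of each other.
/// The threshold is an assumption; a real matcher would tune it per use case.
pub fn fuzzy_match(a: &str, b: &str, max_distance: usize) -> bool {
    levenshtein(a, b) <= max_distance
}
```

Because functions like these are pure and depend only on their arguments, they are natural candidates for running next to the data as a UDF-style module instead of in a mid-tier service.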
Q6. How do these new capabilities of security, scale, and real-time analytics relate to each other?
All of these capabilities are fundamentally related to our core mission and interoperate with each other in helping build modern real-time analytical applications. Modern SaaS applications need enterprise-grade security and scale to succeed in the marketplace and meet customer demands. Real-time analytics applications are inherently large scale and need cost-effective approaches to deal with big-data constraints. Given that they deal with a variety of data (both analytical and transactional) and that we aim to unify large swathes of data in one place, customers inevitably expect tier-1 grade security capabilities such as encryption at rest and integration with popular identity providers. Scale and performance via features such as Workspaces and flexible parallelism, and a path to integrate with visualization tooling such as Power BI, are all connected. It’s the inherent nature of modern applications. Incidentally, our customer engagements have established this quite clearly.
Q7. You also offer so-called dbt connectors. What are these?
We built an adapter that can be used to connect to any SingleStoreDB database and build data transformation pipelines using dbt. dbt provides a development environment for creating transformation workflows on data that is already in SingleStoreDB: transformations are written as SELECT statements, which dbt turns into tables and views.
This adapter enables analytics engineers to easily build, test, document and deploy data pipelines to perform robust enrichment/transformations using SQL. Analytics engineers can now manage real-time data transformations through SQL views to power low-latency applications, which isn’t possible with traditional analytical databases and data warehouses that don’t provide operational and analytical capabilities in the same engine.
Q8. Flexible parallelism lets customers use all the cores on their machines to process a single query. How does it work in practice?
Flexible parallelism enables a single query to exploit all cores, thereby increasing the performance of complex queries significantly. We have seen some of our customers adopt this and gain a speedup of more than 4x.
SingleStoreDB uses a partitioned storage model. Every database has a fixed number of partitions, defined when you create the database. Historically, SingleStoreDB has done parallel query processing with one thread per partition. This works great when the number of partitions is appropriate for the number of cores, such as one core for every one or two partitions. With prior releases, when you scaled up, you couldn’t use the extra cores to run one query faster. You could only benefit from them by running more queries concurrently. Flexible parallelism changes that by internally adding a sub-partition id to each columnstore table row. Query execution then decides, on the fly, which set of sub-partitions each thread scans.
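The idea of tagging rows with a sub-partition id and handing each thread a set of sub-partitions can be illustrated with a small Rust sketch. This is not SingleStoreDB's implementation: the round-robin assignment, the in-memory row layout of `(sub_partition_id, value)` pairs and both function names are assumptions chosen only to show why sub-partitions let one query use more threads than there are partitions.

```rust
use std::thread;

/// Spread sub-partition ids 0..sub_partitions across `threads` workers
/// round-robin. Hypothetical policy; a real engine decides this at run time.
fn assign_sub_partitions(sub_partitions: usize, threads: usize) -> Vec<Vec<usize>> {
    assert!(threads > 0, "need at least one worker thread");
    let mut plan = vec![Vec::new(); threads];
    for id in 0..sub_partitions {
        plan[id % threads].push(id);
    }
    plan
}

/// Sum the values of all rows, with each thread scanning only the rows
/// whose sub-partition id is in its assigned set.
fn parallel_sum(rows: &[(usize, i64)], sub_partitions: usize, threads: usize) -> i64 {
    let plan = assign_sub_partitions(sub_partitions, threads);
    thread::scope(|s| {
        // One scoped thread per worker; each borrows the shared rows read-only.
        let handles: Vec<_> = plan
            .iter()
            .map(|ids| {
                s.spawn(move || {
                    rows.iter()
                        .filter(|(sp, _)| ids.contains(sp))
                        .map(|(_, v)| *v)
                        .sum::<i64>()
                })
            })
            .collect();
        // Combine the partial sums produced by the workers.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<i64>()
    })
}
```

In this sketch, changing `threads` changes how many sub-partitions each worker scans without touching how the rows are stored, which mirrors the point that the partition count is fixed at database creation while the degree of parallelism for a query is not.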
Qx. Anything else you wish to add?
We believe that these features lend significant credence to our mission of building a real-time analytics platform on top of our distributed relational technology. Along with this compelling value of real-time analytics, we are furthering our focus on developer experience, distributed SQL capabilities and cloud evolution. It’s our firm belief that we provide the most comprehensive, performant and cost-effective database service. Take note of our recent benchmark study by GigaOM, which establishes this claim conclusively. Please find the report here.
Shireesh Thota is SVP of Engineering at SingleStore, responsible for product design and development. Shireesh has experience with large-scale, big data, scale-out, and relational and schema-agnostic distributed systems. He previously ran engineering efforts at Azure Cosmos DB and PostgreSQL Hyperscale (Citus) services at Microsoft. Shireesh has a B.S. in computer science from B.I.T.S Pilani, India, and an M.S. in computer science from the University of Connecticut.
Sponsored by SingleStore.