On HeatWave, MySQL database and GenAI. Q&A with Nitin Kunal

by Roberto Zicari · Published January 29, 2025 · Updated January 29, 2025

Q1. You are one of the founding engineers of HeatWave. Can you tell us what your current projects are?

I am indeed one of the founding engineers of the HeatWave project, having written its first header file and designed and developed one of its most advanced query optimizers. Currently, I lead the HeatWave server team, responsible for delivering core features of the engine. Some of the exciting projects we’re working on include developing cloud-aware query processing algorithms and applying machine learning to manage databases and optimize data processing.

Q2. Talking about cloud-scale data processing systems and the application of machine learning in databases, what are the main challenges?

In the cloud era, performance per dollar and ease of use are paramount. Modern users and applications demand data processing systems that can elastically scale to thousands of cores, operate autonomously, and self-heal. This shift introduces new computer science challenges, requiring us to reimagine algorithms and data management processes to function seamlessly across multiple clouds at an unprecedented scale.

While machine learning (ML) in databases is widely discussed, very few systems have truly unlocked its potential to improve query performance and enhance user experience without unintended side effects.

We approached this challenge from first principles—identifying usability and performance problems that traditional methods, such as advanced statistics, failed to solve effectively. Gradually, we progressed towards more sophisticated analytical models and naturally evolved into ML-based approaches as the complexity of solutions increased.

Over time, we refined this process and gained clarity on where ML can be practically applied within databases to make a real impact. Today, we are successfully leveraging ML across multiple areas, including query optimization, data management, and cluster sizing, delivering tangible improvements in performance and user experience.

Q3. What did the development team for the HeatWave server and HeatWave data warehouse produce recently?

Here are some of our latest achievements, showcasing our relentless commitment to product improvement and innovation.

Q4. How do MySQL Database and HeatWave relate to each other?

HeatWave is an innovative, cloud-native data processing engine built from the ground up to scale across thousands of cloud cores. Initially launched as an ambitious research project, HeatWave introduced the concept of a secondary storage engine, enabling seamless integration with the MySQL database. Despite being a new engine, HeatWave retains MySQL as its frontend, ensuring that users and applications continue to interact with MySQL as they always have.

HeatWave significantly extends the capabilities of MySQL by supporting OLAP, machine learning (ML) processing, Lakehouse processing, and now Generative AI. This makes MySQL more versatile, addressing a wide range of data processing needs in the cloud with the simplicity of one integrated cloud service.

Q5. To support HeatWave GenAI with unstructured data, HeatWave has introduced a new VECTOR data type and a set of distance and vector utility functions in 9.0. What are these new features useful for?

Enterprises face unprecedented challenges in analyzing the explosion of unstructured data like PDFs and HTML. Traditional keyword-based search often fails to deliver relevant results due to its inability to understand context. For example, searching for “esteemed scientists” may miss “Nobel Prize-winning physicists.”

Generative AI techniques like similarity and semantic search overcome this limitation by leveraging machine learning to understand data context. Data, whether text, video, or audio, is transformed into vector embeddings: numerical representations that capture the context of the data and its relationships to other data. Vector distance functions then quantify the contextual similarity between items. Vector stores, combined with these distance functions, enable more precise and insightful search results.

Starting with version 9.0, HeatWave natively supports vectors as a first-class data type and introduces highly performant vector distance functions to deliver advanced, high-speed semantic search capabilities.
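To make the idea of contextual similarity concrete, here is a minimal sketch in plain Python of ranking documents by cosine distance between embeddings. It does not use HeatWave or its SQL functions. The three-dimensional vectors are invented for illustration, and a real system would obtain embeddings from an embedding model.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; a smaller value means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy embeddings, made up for this example (not produced by any model).
documents = {
    "Nobel Prize-winning physicists": [0.9, 0.8, 0.1],
    "Quarterly sales report": [0.1, 0.2, 0.9],
    "Award-winning chemists": [0.8, 0.9, 0.2],
}

# Hypothetical embedding standing in for the query "esteemed scientists".
query_embedding = [0.85, 0.85, 0.1]

def nearest(query, docs, k=2):
    """Return the names of the k documents closest to the query vector."""
    ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
    return ranked[:k]

top = nearest(query_embedding, documents)
print(top)
```

With these toy vectors, the query lands closest to the physicists document even though the two phrases share no keywords, which is the behavior keyword search cannot provide.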

Q6. HeatWave has been enhanced to support concurrent query processing. Why? What applications can benefit from such an extension?

The HeatWave query processing engine can now run multiple queries simultaneously, resulting in lower response times for short queries and higher overall throughput for the workload. All types of applications will benefit from this feature: more work gets done in the same unit of time, which improves the cost effectiveness of HeatWave.

The idea is to pick multiple queries from the queue and start executing those that have complementary resource requirements, or those that are short running and would otherwise have waited a long time for their turn. A custom scheduling algorithm has been specially designed and implemented to scale query processing in HeatWave.
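The general idea of co-scheduling complementary queries can be sketched as follows. This is not HeatWave's scheduling algorithm, which is not described in the interview. It is a simple greedy illustration, and the `Query` fields, the resource fractions, and the `pick_batch` helper are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Query:
    name: str
    cpu: float          # assumed fraction of cluster CPU the query needs
    memory: float       # assumed fraction of cluster memory the query needs
    est_seconds: float  # assumed runtime estimate

def pick_batch(queue, cpu_capacity=1.0, mem_capacity=1.0):
    """Greedily choose queries that can run together within capacity.

    The head of the queue is always admitted so long queries do not starve.
    The remaining queries are considered shortest first and admitted only
    if their CPU and memory demands still fit.
    """
    if not queue:
        return []
    batch = [queue[0]]
    cpu_used, mem_used = queue[0].cpu, queue[0].memory
    for q in sorted(queue[1:], key=lambda q: q.est_seconds):
        if cpu_used + q.cpu <= cpu_capacity and mem_used + q.memory <= mem_capacity:
            batch.append(q)
            cpu_used += q.cpu
            mem_used += q.memory
    return batch

queue = [
    Query("big_join", cpu=0.3, memory=0.7, est_seconds=120),  # memory heavy
    Query("big_sort", cpu=0.4, memory=0.6, est_seconds=90),   # memory heavy
    Query("cpu_scan", cpu=0.6, memory=0.2, est_seconds=30),   # CPU heavy
    Query("lookup", cpu=0.05, memory=0.05, est_seconds=1),    # short query
]

batch_names = [q.name for q in pick_batch(queue)]
print(batch_names)
```

In this toy queue, the memory-heavy join runs alongside the CPU-heavy scan and the short lookup, while the second memory-heavy query waits for the next round. The short lookup no longer sits behind two long queries.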

Q7. Specifically, do you have any benchmark results to show how this enhancement improves throughput for workloads where queries complement each other in terms of resource utilization?

All standard query processing benchmarks show improved throughput and resource utilization. For example, queries from the TPC-H benchmark when submitted together will show higher throughput. The short running queries from this benchmark will have lower latency, as they won’t necessarily wait for large queries issued before them to complete.

Q8. When users submit a query to a HeatWave instance, the MySQL query optimizer decides if the query should be offloaded to the HeatWave cluster for accelerated execution. Why? What are the benefits? What if the HeatWave cluster is in a separate node in a distributed architecture with respect to the MySQL query optimizer?

Let us first understand the query processing architecture of HeatWave. The system consists of two query processing engines, specialized for different purposes. The MySQL server, along with the InnoDB storage engine, is efficient for OLTP and for point queries, while the HeatWave query processing engine is the fastest engine built from scratch for OLAP and machine learning processing. These two engines are natively integrated by the makers of MySQL Database to provide the unique experience of a single system that can do OLTP, OLAP, and ML processing, along with Generative AI. HeatWave massively partitions data across a cluster of nodes, which can be operated in parallel. As data size and performance requirements grow, the HeatWave cluster can be scaled out without any downtime.

Anytime a query is sent to HeatWave, it first lands in the MySQL server, which uses its optimizer to decide whether the query is better suited to the MySQL (InnoDB) engine or the HeatWave engine. Based on that decision, the query is sent to the right engine. A reporting-type query or a complex join query is expected to run orders of magnitude faster on HeatWave, thus changing the game for application developers. They can now process data on MySQL at unprecedented speed.

Q9. In 9.0, HeatWave introduces machine learning–based query offload. Please explain this concept and possibly give us an example of use.

In earlier versions, query offload decisions were rule-based. For instance, if a query’s estimated cost was below a certain threshold, it was executed on the MySQL server; otherwise, it was offloaded to the HeatWave engine for acceleration. However, this approach was approximate and could be error prone.
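The rule-based decision described here can be sketched in a few lines. The threshold value, the cost units, and the function name are assumptions made for illustration, and the actual rules used by earlier versions are not given in the interview.

```python
# Hypothetical cost threshold; the real value and units are not stated.
COST_THRESHOLD = 100_000.0

def choose_engine_rule_based(estimated_cost, threshold=COST_THRESHOLD):
    """Route cheap queries to MySQL and expensive ones to HeatWave."""
    return "MySQL" if estimated_cost < threshold else "HeatWave"

print(choose_engine_rule_based(250.0))        # a point lookup stays on MySQL
print(choose_engine_rule_based(5_000_000.0))  # a large join is offloaded
```

A single cutoff on one engine's cost estimate says nothing about how the other engine would handle the same query, which is why a decision of this kind can only be approximate.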

Ideally, the query cost should be evaluated for both engines to determine the best execution plan. However, comparing costs between MySQL and HeatWave is challenging because the two engines use completely different query processing models and cost estimation methods.

To address this, HeatWave 9.0 introduces a lightweight machine learning (ML) model trained on millions of queries and workloads. This ML model accurately predicts the optimal engine for query execution, ensuring intelligent and efficient offload decisions. This innovation resolves the complexity of query cost comparison once and for all, significantly improving performance and resource utilization.

This feature is a very good example of a practical application of machine learning to database optimization.

Qx Anything else you wish to add?

The first version of HeatWave was released in 2020, and since then, we have continuously innovated by introducing features like HeatWave Autopilot, HeatWave Lakehouse, HeatWave AutoML, and now HeatWave GenAI. We are committed to delivering cutting-edge technology to our users ahead of the competition, helping them stay ahead of the curve.

These innovations not only improve performance and enable more capabilities at a lower cost but also unlock new possibilities that were previously unattainable. The synergy between these advanced features transforms the way our users operate. For instance, with HeatWave, users can perform anomaly detection by analyzing their logs and then explain the results in natural language using GenAI. This accelerates application development and empowers users to build innovative solutions that would not have been possible without HeatWave.

Resources

HeatWave is available now and enables customers to build generative AI apps at no additional cost without moving data. It allows developers to build apps with AI and it allows customers to bring differentiated solutions to market at better speed. You can now try HeatWave for free in the OCI Free Tier.

…………………………………………………..

Nitin Kunal, Senior Director of Software Development, Oracle.

Nitin is one of the founding engineers of MySQL HeatWave, and he is currently leading the HeatWave server development team at Oracle. His expertise lies in the realm of cloud-scale data processing systems and the application of machine learning in databases. He has been granted 9 US patents for the HeatWave project.
Prior to joining Oracle, Nitin successfully developed and delivered a high-performance file system and storage engine tailored for advanced enterprise-grade SSD based intelligent storage devices.