On Databases for Real-Time Analytics on Streaming Data. Q&A with Karunakar Kotha and Andres Bolans

Q1.  Traditional relational databases like SQL Server were designed for transactional workloads, yet organizations increasingly need real-time analytics on streaming data. You’ve worked extensively with both SQL Server and Fabric’s Real-Time Intelligence databases. Beyond the Microsoft ecosystem, how should developers and architects think about the fundamental architectural trade-offs between row-oriented transactional databases and columnar time-series optimized systems?

What are the signs that an organization has outgrown the “batch ETL into a data warehouse” model, and when does real-time intelligence become a necessity rather than a luxury?

Developers and architects should view row‑oriented transactional databases and columnar time‑series systems as complementary technologies rather than competitors. Microsoft SQL Server is optimized for OLTP workloads that require frequent updates, strong consistency, and well‑defined schemas, while time‑series platforms such as Microsoft Fabric Real-Time Intelligence are designed for high‑throughput, append‑only data ingestion, efficient compression, and large‑scale time‑based analytics.

The optimal choice depends on the workload: transactional applications run best on SQL Server, whereas logs, metrics, IoT signals, and other streaming datasets are better served by systems built for time and scale. These platforms also provide native capabilities—such as ingestion‑time partitioning, retention policies, and materialized aggregations—that simplify operations.
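As a hedged illustration of what those native capabilities can look like in a KQL-based engine, the sketch below defines a streaming telemetry table, a retention policy, and a materialized aggregation. The table, column, and view names are hypothetical, and each management command would be run on its own:

// Hypothetical streaming telemetry table; names are illustrative only.
.create table SensorReadings (DeviceId: string, Timestamp: datetime, Temperature: real)

// Let the engine expire raw readings automatically after 30 days.
.alter table SensorReadings policy retention softdelete = 30d

// Maintain an hourly pre-aggregation so dashboards never rescan raw rows.
.create materialized-view HourlyTemps on table SensorReadings
{
    SensorReadings
    | summarize AvgTemp = avg(Temperature), MaxTemp = max(Temperature)
        by DeviceId, bin(Timestamp, 1h)
}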

Organizations typically outgrow batch ETL when dashboards slow down, queries scan billions of rows, ingestion pipelines lag, or when data loses value quickly without real‑time insights. At that stage, real‑time intelligence becomes essential for scenarios like anomaly detection, operational monitoring, and instant decision‑making.

The key is to use each technology where it excels: SQL Server for transactional workloads, and Microsoft Fabric RTI for high‑velocity streaming and analytical workloads that require speed and scale.

These links can help expand on the use cases: 

What is Azure Data Explorer?

Microsoft Fabric decision guide

Q2. You’ve helped teams transition from SQL to Kusto Query Language (KQL). This touches on a broader industry question: as we see the proliferation of specialized query languages (SQL, KQL, PromQL for metrics, various graph query languages), how should developers approach learning and choosing between these paradigms?

What makes KQL particularly suited for time-series and streaming analytics that SQL fundamentally struggles with? And how do you advise teams on balancing the benefits of specialized languages against the costs of fragmenting their skill sets?

Different query languages exist because different kinds of data need different ways of working with them. SQL is great for relational data and transactional workloads, but it isn’t built for high-volume time-series or streaming data. That’s where KQL comes in. KQL is designed to handle huge amounts of logs and telemetry quickly. It makes filtering, aggregating, and doing time-based analysis fast and simple, without the heavy joins that SQL struggles with in these situations.
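For instance, a typical time-series question reads naturally in KQL. The table and column names below (Logs, Timestamp, Level, Service) are hypothetical, chosen only to show the idioms:

Logs
| where Timestamp > ago(1h)                                      // only the last hour
| where Level == "Error"
| summarize ErrorCount = count() by Service, bin(Timestamp, 5m)  // 5-minute buckets
| render timechart                                               // error rate over time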

Our recommendation to teams is to focus on the core concepts first: things like filtering, grouping, and aggregation. These ideas stay the same no matter which language you use. Then choose the language that fits the data: KQL for telemetry and logs, SQL for structured relational data.
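As a small illustration of how the concepts carry over, the same hypothetical question can be expressed in either language; the SQL version is shown as a comment, and the table and columns are invented:

// SQL: SELECT Region, COUNT(*) AS Orders FROM Sales WHERE Amount > 100 GROUP BY Region
Sales
| where Amount > 100                       // filtering
| summarize Orders = count() by Region     // grouping and aggregation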

To avoid people feeling like they need completely different skill sets, we encourage shared documentation and cross-training. With tools like Copilot, teams can easily pick up KQL and even convert SQL queries into KQL when needed. This keeps everyone aligned and reduces the learning curve.

KQL offers many enjoyable and interactive ways to learn, such as the Kusto Detective Agency, where participants can win prizes like Funkos.

Q3.  The shift from dedicated database servers to consumption-based, serverless architectures is happening across all major cloud platforms—not just Microsoft Fabric, but also AWS, Google Cloud, and specialized vendors. From your escalation engineering experience dealing with performance issues and cost overruns, what are the hidden pitfalls of serverless real-time analytics platforms that developers don’t anticipate?

How do you help organizations model and predict costs when query patterns are unpredictable, and what architectural patterns have you seen that keep performance high while preventing runaway cloud bills?

Microsoft Eventhouse provides the flexibility to choose between serverless and provisioned capacity models, enabling customers to balance performance and cost according to their specific workload needs.

For development or intermittent workloads where cost efficiency is the priority, the serverless model is an excellent fit. For production scenarios that require consistent, predictable performance even after extended idle periods, the provisioned model ensures dedicated resources and reliability.

Eventhouse also allows customers to configure an “always-on” mode by specifying a minimum scale level so the engine never scales below that threshold. The maximum scale is determined by the capacity assigned to the workspace, ensuring the Eventhouse cannot exceed the limits of the allocated capacity.

When I talk to customers, I usually recommend the following:

1. Start with visibility. If your query patterns are hard to predict, you need good observability. Track how often queries run, how much data they scan, and how many run at the same time. Use this information to estimate costs, set baselines, and create alerts when something looks unusual.

2. Design for efficiency. Architect your system so each query scans less data. Techniques like data partitioning, pre-aggregations, and materialized views can significantly reduce workload.

3. Govern how queries are used. Limit heavy ad-hoc queries and use caching when the same queries run repeatedly. These steps help keep performance strong while avoiding unexpected bills; the short KQL sketch after this list illustrates the first and third points.
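A minimal KQL sketch of those two recommendations, assuming an Eventhouse/KQL database and running one command at a time; all names and thresholds below are illustrative:

// Visibility: summarize recent query activity to spot heavy users and unusual patterns.
.show queries
| where StartedOn > ago(1d)
| summarize QueryCount = count(), TotalDuration = sum(Duration) by User
| top 10 by TotalDuration

// Caching repeated queries: allow results up to 5 minutes old to be served from the
// query results cache instead of being recomputed (MyDashboardTable is hypothetical).
set query_results_cache_max_age = time(5m);
MyDashboardTable
| summarize count() by bin(Timestamp, 1h)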
Microsoft provides guidance on Eventhouse cost drivers and how cost per GB is calculated in Fabric Real-Time Intelligence. Please refer to the links below and use the Azure Pricing Calculator for estimation.

Eventhouse overview

Cost breakdown of Eventhouse

Q4.  Most organizations can’t simply flip a switch from SQL Server to a new real-time intelligence platform—they need hybrid approaches and gradual migration paths. What patterns have you seen work well for running both traditional databases and real-time analytics systems in parallel?

How do you maintain consistency, handle data synchronization, and manage the complexity of having multiple sources of truth? This seems like a challenge that extends beyond Microsoft’s ecosystem to anyone dealing with legacy systems and modern streaming architectures.

Most organizations cannot transition away from SQL Server overnight, nor is that the goal. SQL Server and Microsoft Fabric Real-Time Intelligence (RTI) each excel in different areas, and both play essential roles in a modern data architecture. The objective is not to replace one with the other, but to use each where it provides the greatest value. Many customers succeed with a hybrid approach that combines the strengths of both systems.

A common pattern is dual‑write or Change Data Capture (CDC). In this model, SQL Server continues to manage core transactional workloads, while CDC streams data changes into Microsoft Fabric RTI for real‑time analytics. With SQL Server 2025, this becomes even more powerful through Change Event Streaming (CES), which streams SQL Server changes directly into real‑time platforms. This delivers instant insights without disrupting existing applications and preserves SQL Server’s role in mission‑critical operations.
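As a rough, hypothetical sketch of the receiving side of such a pattern in an Eventhouse (the actual payload shape produced by CDC or CES will differ, and every name below is invented): change events land in a staging table, and an update policy reshapes them into an analytics-friendly table.

// Staging table fed by the change stream (e.g., via an eventstream); payload kept as dynamic.
.create table OrdersStaging (ChangePayload: dynamic, IngestedAt: datetime)

// Analytics-friendly target table.
.create table Orders (OrderId: string, Status: string, Amount: real, ModifiedAt: datetime)

// Transformation applied to each batch of staged changes.
.create function TransformOrderChanges() {
    OrdersStaging
    | project OrderId = tostring(ChangePayload.OrderId),
              Status = tostring(ChangePayload.Status),
              Amount = toreal(ChangePayload.Amount),
              ModifiedAt = todatetime(ChangePayload.ModifiedAt)
}

// Update policy: whenever data lands in OrdersStaging, project it into Orders automatically.
.alter table Orders policy update @'[{"IsEnabled": true, "Source": "OrdersStaging", "Query": "TransformOrderChanges()", "IsTransactional": false}]'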

Another major advantage of the newly announced SQL database in Fabric is its native integration with OneLake. Because all data resides in a unified storage layer, customers can query transactional data from analytical engines and vice versa without any data movement. This reinforces the idea that organizations do not need to choose between systems; instead, they can combine them and leverage each where it performs best.

Q5. Time-series data—from IoT sensors to application logs to financial tick data—is exploding in volume, and many organizations are struggling to architect systems that can handle both ingestion velocity and query performance. You’ve seen how different systems handle these workloads through your work with Azure Synapse and Fabric RTI.

What are the most common architectural mistakes you see organizations make when building time-series analytics systems, regardless of the specific technology they choose? And what fundamental principles should guide developers when they’re evaluating whether their current database can scale, or whether they need to consider purpose-built time-series or real-time analytics platforms?

One of the most common mistakes in time-series analytics is treating it the same as traditional relational data. Many teams naturally gravitate toward transactional databases because that’s what they are familiar with. However, modern workloads demand purpose-built engines that dramatically simplify operations and deliver far better performance. A frequent issue occurs when high‑velocity sensor or log data is stored in row‑oriented databases without proper time‑based partitioning or indexing, resulting in slow queries and inefficient scans. Another recurring challenge is pushing both ingestion and analytics into the same system, which causes bottlenecks and unpredictable latency.

The core design principle for time‑series workloads is to architect for time and scale. These scenarios require fast ingestion, efficient compression, and time‑based indexing. When dashboards begin to lag or queries start scanning billions of rows, it is a clear signal that the underlying system is no longer scaling with the workload.

At that stage, organizations should consider purpose-built platforms such as Microsoft Fabric Real-Time Intelligence (RTI). RTI uses columnar storage and streaming ingestion to handle data velocity while enabling fast analytical queries. It also provides incremental aggregation and materialized views, helping reduce compute consumption while maintaining high performance.

In practice: partition by time, separate hot and cold data, and avoid forcing streaming workloads through transactional engines not designed for them.
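A hedged example of what those knobs look like in KQL where the defaults need tuning; the Telemetry table, column names, and window sizes are illustrative, not recommendations:

// Keep the most recent two weeks on fast (hot) storage for interactive queries...
.alter table Telemetry policy caching hot = 14d

// ...retain older data on cheaper cold storage, and expire it after a year.
.alter table Telemetry policy retention softdelete = 365d

// Interactive dashboards can then confine themselves to the hot window.
Telemetry
| where Timestamp > ago(14d)
| summarize AvgValue = avg(Value) by bin(Timestamp, 1h)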

With Microsoft Fabric RTI, Eventhouse automatically manages ingestion‑time partitioning, indexing, data expiration, cleanup, and the transition of data from hot to cold storage. This automation significantly reduces operational overhead, eliminating many of the manual tasks that are traditionally both time-consuming and error‑prone.

……………………………………………..

Karunakar Kotha, Sr. Azure Customer Engineer, Microsoft.

Karunakar Kotha is a technical leader and evangelist with over 18 years of experience and deep expertise across database technologies. He has delivered scalable, cloud-ready database solutions for a wide range of clients and led major modernization initiatives that improved performance and reduced costs. He is passionate about data and AI technologies and committed to delivering solutions that exceed business expectations. Outside of work, he enjoys playing cricket, blogging, camping, technical mentorship, and running marathons.

Connect with Karunakar on LinkedIn here

Andres Bolans, Microsoft Escalation Engineer.

Andres has worked with databases for nearly 20 years, from SQL Server 2000 to Azure SQL, Synapse Analytics, Data Explorer (Kusto), and now Microsoft Fabric.

He is passionate about driving product enhancements, mentoring technical teams, and empowering customers to reach their full potential.

You can connect with Andres on LinkedIn.
