On InfluxDB 3.0: The Future of Time Series Analytics. Q&A with Gunnar Aasen
Q1. InfluxData just announced InfluxDB 3.0, a new product suite built around InfluxDB IOx. Tell us about this announcement.
We just announced the latest version of our platform, InfluxDB 3.0, which will serve as the foundation for all of InfluxData’s products in the future. InfluxDB 3.0 brings the new storage engine and capabilities of our open source InfluxDB IOx project into our main suite of database products.
InfluxDB 3.0 is currently available in both of our cloud products. InfluxDB Cloud Serverless is our fully managed, multi-tenant database with usage-based pricing. InfluxDB Cloud Dedicated is now generally available and provides a fully managed, scalable, and performant single-tenant InfluxDB cluster.
We also announced that InfluxDB 3.0 will be the core of a couple of new products launching later this year:
- InfluxDB 3.0 Clustered, which is the evolution of our InfluxDB Enterprise product. InfluxDB 3.0 Clustered is a self-managed InfluxDB cluster for deployment on-premises or in a private cloud.
- InfluxDB 3.0 Edge, which is a single-node time series database for local edge deployments.
The InfluxDB 3.0 product suite delivers significant gains in high-volume ingestion, better compression, lower storage costs, and real-time SQL queries.
Q2. What are the key features of InfluxDB 3.0?
InfluxDB 3.0 delivers on our vision to build a single datastore to handle the trifecta of operational monitoring: metrics, events, and tracing data. The architecture of InfluxDB 3.0 provides the freedom of unlimited cardinality in your schema. This enables InfluxDB to address new use cases for observability, real-time analytics, IoT (including IIoT), and other time series problems.
Key features of InfluxDB 3.0 include:
- Industry-leading performance. Scale to the highest sampling frequency or deploy your global fleet of sensors. InfluxDB will handle your workload by independently scaling ingest and query.
- SQL language support. Leverage familiar SQL syntax to explore, join, and transform your time series data. Instantly plug into business intelligence tools with SQL language support.
- Low-cost storage. Best-in-category compression and the use of object storage reduce data storage costs, enabling ten times more storage without sacrificing performance.
- Unlimited cardinality. Eliminates a major barrier to scaling time series workloads in InfluxDB and other TSDBs. InfluxDB 3.0 can handle data with ephemeral time series, like tracing.
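Cardinality here means the number of unique series. The sketch below is a simplified, hypothetical model (not InfluxDB code) of why ephemeral identifiers such as trace IDs drive cardinality up. It assumes, loosely following InfluxDB's definition, that a series is identified by its measurement name plus its tag set, and it simply counts distinct series keys; the function names are illustrative only.

```python
# Hypothetical illustration: a series is keyed by measurement + tag set.
# Putting a per-request ID (e.g. a trace ID) in a tag creates a new series
# for almost every point, which is what "ephemeral time series" means here.


def series_key(measurement: str, tags: dict[str, str]) -> str:
    """Build a canonical series key: measurement plus tags sorted by key."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}" if tag_part else measurement


def series_cardinality(points: list[tuple[str, dict[str, str]]]) -> int:
    """Count distinct series keys in a batch of (measurement, tags) points."""
    return len({series_key(m, tags) for m, tags in points})


# Metrics-style data: a small, fixed set of hosts -> low cardinality.
metrics = [("cpu", {"host": f"server{i % 3}"}) for i in range(1000)]

# Tracing-style data: every span carries a unique trace_id tag, so the
# number of series grows with traffic instead of staying bounded.
spans = [("spans", {"service": "api", "trace_id": f"t{i}"}) for i in range(1000)]

print(series_cardinality(metrics))  # 3
print(series_cardinality(spans))    # 1000
```

Engines that maintain an index entry per series pay for every one of those keys, which is why removing cardinality as a constraint matters for tracing workloads.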
Q3. What are the underlying technologies the InfluxData team used to build InfluxDB 3.0?
Building a new storage engine is a huge effort. When we started the work that became InfluxDB 3.0 several years ago, we surveyed the landscape of open source projects for working with data, and the Apache Arrow project caught our eye.
Apache Arrow is a columnar, in-memory data format and an ecosystem of projects using the format. The main appeal of Apache Arrow is that data buffered in memory can be shared with other processes with no serialization and minimal copying. Being a columnar format, Arrow is optimized for analytical processing. These attributes are a strong fit for time series data, which is often consumed via aggregations.
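To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. It does not use the Arrow format itself; it only assumes the general idea that a columnar layout keeps each field's values together in one typed, contiguous buffer (here an `array` of doubles, loosely analogous to an Arrow Float64 array's data buffer), so an aggregation over one field scans only that buffer. The sample records are made up for illustration.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"time": 1, "host": "a", "temp": 20.5},
    {"time": 2, "host": "b", "temp": 21.0},
    {"time": 3, "host": "a", "temp": 22.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list (column) per field."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


columns = to_columns(rows)

# Store the numeric column as a typed, contiguous buffer of doubles.
temp = array("d", columns["temp"])

row_mean = sum(r["temp"] for r in rows) / len(rows)  # visits every record
col_mean = sum(temp) / len(temp)                      # scans a single buffer
print(row_mean, col_mean)
```

Both means are identical; the difference is how much unrelated data each approach has to walk past, which is what makes columnar layouts attractive for aggregation-heavy time series queries.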
The Apache Arrow project nurtures several other projects, which build tools for working with Arrow data. Two of the projects we use heavily are Apache Arrow DataFusion and Apache Arrow Flight SQL. DataFusion is a SQL query engine that operates on Arrow buffers. Flight SQL is a client-server protocol for executing SQL queries with Arrow data. We built InfluxDB 3.0 on top of these and other Apache Arrow projects, which gives us a stable and interoperable foundation.
InfluxData engineers make frequent and significant contributions to the upstream Apache Arrow projects as we develop InfluxDB 3.0. Because Apache Arrow is forming the core of a new set of large-scale analytics tools, InfluxDB 3.0 benefits immensely from its inherent interoperability with this next-generation toolset.
Q4. How does the architecture of InfluxDB 3.0 make it well-suited for real-time analytics?
The InfluxDB 3.0 architecture is a huge evolutionary jump over the original InfluxDB versions in some important ways, with flexibility around scaling performance being a key differentiator. The new architecture separates out the ingest, query, and storage into independent paths, while providing real-time query results. This enables, for example, more resources to be put into the query tier to handle a spike during a window of anticipated high query load while leaving the ingest path unaffected.
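The following is a deliberately simplified, single-process model of separate ingest, storage, and query paths. All class and method names are hypothetical and are not InfluxDB internals; in a real deployment these would be independent services that can be scaled separately. The point it illustrates is how a query path can still return real-time results: it reads both the files already persisted to object storage and the data still buffered on the ingest side.

```python
# Hypothetical model: names are illustrative, not InfluxDB's actual components.


class ObjectStore:
    """Stand-in for durable object storage (e.g. S3): immutable files by name."""

    def __init__(self):
        self.files: dict[str, list[tuple[int, float]]] = {}

    def put(self, name: str, rows: list[tuple[int, float]]) -> None:
        self.files[name] = list(rows)


class Ingester:
    """Buffers recent writes in memory and periodically persists them."""

    def __init__(self, store: ObjectStore):
        self.store = store
        self.buffer: list[tuple[int, float]] = []
        self.file_count = 0

    def write(self, time: int, value: float) -> None:
        self.buffer.append((time, value))

    def persist(self) -> None:
        if self.buffer:
            self.store.put(f"file-{self.file_count}", self.buffer)
            self.file_count += 1
            self.buffer = []


class Querier:
    """Reads persisted files plus the ingester's unpersisted buffer, so
    results include data written moments ago."""

    def __init__(self, store: ObjectStore, ingester: Ingester):
        self.store = store
        self.ingester = ingester

    def query_range(self, start: int, end: int) -> list[tuple[int, float]]:
        rows = [r for f in self.store.files.values() for r in f]
        rows += self.ingester.buffer
        return sorted(r for r in rows if start <= r[0] < end)


store = ObjectStore()
ingester = Ingester(store)
querier = Querier(store, ingester)

ingester.write(1, 10.0)
ingester.write(2, 11.0)
ingester.persist()       # these two points now live in the object store
ingester.write(3, 12.0)  # this point is still only in the ingester's buffer

print(querier.query_range(0, 10))  # [(1, 10.0), (2, 11.0), (3, 12.0)]
```

Because the querier holds no data of its own, adding more querier instances during a query spike would not touch the ingest path, which is the kind of independent scaling described above.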
InfluxDB 3.0 shifts to storing data files in an object store, like AWS S3. Compared to previous versions of InfluxDB, which required fast and expensive SSDs, costs with an object store can be close to an order of magnitude lower. This enables longer retention of high-fidelity data.
Another impactful architecture change on the storage side is the use of Parquet files for data storage. Apache Parquet is another popular Apache project that defines a common columnar on-disk data format. It is a mature technology common in data lake software, and it will eventually allow data lake tools to fetch InfluxDB 3.0 Parquet files directly for processing jobs.
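Part of why columnar files compress well is that each column can be encoded on its own. Parquet supports encodings such as dictionary, run-length, and delta encoding; the sketch below shows only the idea behind each one in plain Python and does not reproduce Parquet's actual on-disk byte layout. The sample host names and timestamps are made up for illustration.

```python
# Simplified illustration of columnar encodings (not Parquet's real format).


def dictionary_encode(values):
    """Replace each value with a small integer index into a dictionary."""
    dictionary, indices, lookup = [], [], {}
    for v in values:
        if v not in lookup:
            lookup[v] = len(dictionary)
            dictionary.append(v)
        indices.append(lookup[v])
    return dictionary, indices


def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def delta_encode(values):
    """Keep the first value, then store differences between neighbors."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


# A tag column sorted by series, and regularly spaced timestamps (seconds).
hosts = ["server1"] * 4 + ["server2"] * 4
times = [1_700_000_000 + 10 * i for i in range(8)]

dictionary, indices = dictionary_encode(hosts)
print(dictionary)                                  # ['server1', 'server2']
print(run_length_encode(indices))                  # [[0, 4], [1, 4]]
print(run_length_encode(delta_encode(times)[1:]))  # [[10, 7]]
```

Eight repeated strings shrink to a two-entry dictionary plus two runs, and eight large timestamps shrink to one base value plus a single run of identical deltas, which is typical of regularly sampled time series.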
Q5. Tell us about the performance gains in InfluxDB 3.0. What does this mean for developers using the database?
We developed InfluxDB 3.0 as a columnar store written in Rust, a cutting-edge programming language designed for speed, efficiency, reliability, and memory safety. For many analytical queries, the columnar format allows the use of SIMD instructions on modern CPUs, which can improve the performance of analytical processing by orders of magnitude. In many cases, InfluxDB 3.0 returns results in seconds (or less) for queries that would take minutes or hours on row-oriented relational databases and even other NoSQL databases.
Q6. How does InfluxDB 3.0 fit into the broader data analytics ecosystem?
We use the Apache Arrow DataFusion query engine to implement native SQL support. The addition of SQL, the lingua franca of data analysts, greatly expands the number of users who can quickly experience the speed of InfluxDB. For original InfluxDB users, our SQL-like language for time series data, InfluxQL, is also available in InfluxDB 3.0 as a native port on top of DataFusion.
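As a rough illustration of the kind of time-windowed SQL an analyst might write, the sketch below uses Python's built-in `sqlite3` module purely as a stand-in SQL engine. This is an assumption for the sake of a runnable example: InfluxDB 3.0 executes SQL with DataFusion, whose dialect differs (DataFusion, for instance, provides a `date_bin` function for time bucketing, whereas this sketch floors epoch seconds with integer arithmetic). The `home` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a time series store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home (time INTEGER, room TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO home VALUES (?, ?, ?)",
    [
        (0, "kitchen", 21.0),
        (30, "kitchen", 23.0),
        (60, "kitchen", 22.0),
        (0, "living", 20.0),
        (90, "living", 21.0),
    ],
)

# Average temperature per room per 60-second window. Integer division
# floors each epoch-second timestamp to the start of its window.
query = """
    SELECT room, (time / 60) * 60 AS window_start, AVG(temp) AS avg_temp
    FROM home
    GROUP BY room, window_start
    ORDER BY room, window_start
"""
for row in conn.execute(query):
    print(row)
```

The same GROUP BY pattern, with the engine's own time-bucketing function substituted in, is what makes SQL a familiar entry point for time series exploration.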
Through the Flight SQL client-server protocol, InfluxDB 3.0 can connect directly to business intelligence and machine learning tools. With Flight SQL, data is passed around in Arrow format. This allows analysts and data scientists to efficiently fetch large query result sets from InfluxDB and operate directly on the data with Arrow clients, with no serialization to intermediate formats.
Whether users are running analytics for preventive maintenance or evaluating financial forecasts, developers will be able to use SQL with popular tools such as Grafana, Apache Superset, and Jupyter Notebooks to visualize their data. Soon, InfluxDB will support pretty much any SQL-based tool through a JDBC driver for Flight SQL.
Q7. What’s on the horizon for InfluxData?
We have more releases on the calendar for 2023. To date, our InfluxDB 3.0 releases have focused on fully managed cloud offerings, InfluxDB Cloud Serverless and InfluxDB Cloud Dedicated. Upcoming releases will target self-managed deployments. InfluxDB 3.0 Clustered will be the next evolution of our InfluxDB Enterprise product. We are also planning a single-instance InfluxDB 3.0 Edge product. Stay tuned for more on that later this year.
Of course, we’ll continue to make enhancements to the broader InfluxDB 3.0 platform and our managed cloud products. The next incremental improvement for InfluxDB Cloud Dedicated will be availability in Azure regions.
Q8. Anything else you want to share?
We’re very excited to get InfluxDB 3.0 into the hands of our users. It’s a massive leap forward for InfluxDB. While other time series databases have been catching up to InfluxDB’s performance and ease of use over the past few years, this new technology affirms InfluxData’s leadership in the time series database market.
Looking forward, we are thrilled to see growing adoption of InfluxDB 3.0 for real-time analytics workloads, especially among users with large data volumes who are tired of long delays to query new data, poor handling of upserts, and operational complexity.
Gunnar Aasen is a Senior Product Manager at InfluxData. Gunnar was an early employee and the first support engineer at InfluxData. He now enjoys applying his deep technical expertise toward building developer-oriented products. He is based in Berkeley, California.
Sponsored by InfluxData.