On Vertica in Eon Mode. Interview with Ben Vandiver
“I would argue that the definition of ‘small’ keeps getting bigger as hardware improves and more economical storage options abound. As data volumes get bigger and bigger, organizations are looking to graduate out of the ‘small’ arena and start to leverage big data for truly transformational projects.” –Ben Vandiver
I have interviewed Ben Vandiver, CTO at Vertica. The main topics of the interview are the Vertica database, the cloud, and Vertica’s new cloud architecture, Eon Mode.
Q1. Can you start by giving us some background on your role and history at Vertica?
Ben Vandiver: My bio covers a bit of this, but I’ve been at Vertica from version 2.0 to our newly released 9.1. Along the way I’ve seen Vertica transform from a database that could barely run SQL and delete records, to an enterprise grade analytics platform. I built a number of the core features of the database as a developer. Some of my side-projects turned into interesting features: Flex tables is Vertica’s schema-on-read mechanism and Key/Value allows fast, scalable single node queries. I started the Eon mode project 2 ½ years ago to enable Vertica to take advantage of variable workloads and shared storage, both on-premises and in the cloud. Upon promotion to CTO, I continue to remain engaged with development as a core architect, but I also look after product strategy, information flow within the Vertica organization, and technical customer engagement.
Q2. Is the assumption that “One size does not fit all” (aka Michael Stonebraker) still valid for the new generation of databases?
Ben Vandiver: Mike’s statement of “One size does not fit all” still holds and if anything, the proliferation of new tools demonstrates how relevant that statement still is today. Each tool is designed for a specific purpose and an effective data analytics stack combines a collection of best-in-class tools to address an organization’s data needs.
For “small” problems, a single flexible tool can often address these needs. But what exactly is “small” in today’s world?
I would argue that the definition of “small” keeps getting bigger as hardware improves and more economical storage options abound. As data volumes get bigger and bigger, organizations are looking to graduate out of the “small” arena and start to leverage big data for truly transformational projects. These organizations would benefit from developing a data stack that incorporates the right tools – BI, ETL, data warehousing, etc. – for the right jobs, and choosing solutions that favour a more open, ecosystem-friendly architecture.
This belief is evident in Vertica’s own product strategy, where our focus is to build the most performant analytical database on the market, free from underlying infrastructure and open to a wide range of ecosystem integrations.
Q3. Vertica, like many databases, started off on-premises and has moved to the cloud. What has that journey looked like?
Ben Vandiver: Our pure software, hardware agnostic approach has enabled Vertica to be deployed in a wide variety of configurations, from embedded devices to multiple cloud platforms. Historically, most of Vertica’s deployments have been on-premises, but we’ve been building AMIs for running Vertica in the Amazon cloud since 2008. More recently, we have built integrations for S3 read/write and cloud monitoring.
In our 9.0 release last year, we extended our SQL-on-Hadoop offering to support Amazon S3 data in ORC or Parquet format, enabling customers to run highly-performant analytical queries against their Hadoop data lakes on S3.
And of course, with our latest 9.1 release, the general availability of Eon Mode represents a transformational leap in our cloud journey.
With Eon Mode, Vertica is moving from simply integrating with cloud services to introducing a core architecture optimized specifically for the cloud, so customers can capitalize on the economics of compute and storage separation.
Q4. Vertica just released a completely new cloud architecture, Eon Mode. Can you describe what that is and how it works?
Ben Vandiver: Eon Mode is a new architecture that places the data on reliable, cost-effective shared storage, while matching Vertica Enterprise Mode’s performance on existing workloads and supporting entirely new use cases. While the design reuses Vertica’s core optimizer and execution engine, the metadata, storage, and fault tolerance mechanisms are re-architected to enable and take advantage of shared storage. A sharding mechanism distributes load over the nodes while retaining the capability of running node-local table joins.
A caching layer provides full Vertica performance on in-cache data and transparent query on non-cached data with mildly degraded performance.
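As a rough illustration of the caching behaviour described above, here is a minimal read-through cache sketch. It is not Vertica's implementation: the class name `ReadThroughCache`, the `fetch` callback standing in for shared storage, and the least-recently-used eviction policy are all assumptions made for the example. It only shows the general idea that cached files are served locally, while non-cached files are still readable through a slower fetch from shared storage.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical node-local cache in front of shared storage (LRU is an assumption)."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int) -> None:
        self._fetch = fetch              # stand-in for a read from shared storage
        self._capacity = capacity_bytes  # local cache budget in bytes
        self._files: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, name: str) -> bytes:
        """Return file contents, from the cache when possible."""
        if name in self._files:
            self._files.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self._files[name]
        # Cache miss: the query still succeeds, just via the slower path.
        self.misses += 1
        data = self._fetch(name)
        self._admit(name, data)
        return data

    def _admit(self, name: str, data: bytes) -> None:
        """Store a fetched file, evicting least recently used files if needed."""
        if len(data) > self._capacity:
            return  # larger than the whole cache: serve it without caching
        while self._used + len(data) > self._capacity:
            _, evicted = self._files.popitem(last=False)
            self._used -= len(evicted)
        self._files[name] = data
        self._used += len(data)
```

In this sketch the working set stays in the cache as long as it fits within `capacity_bytes`; reads outside the working set fall through to `fetch` and may push older files out.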
Eon Mode initially supports running on Amazon EC2 compute and S3 storage, but includes an internal API layer that we have built to support our roadmap vision for other shared storage platforms such as Microsoft Azure, Google Cloud, or HDFS.
Eon Mode demonstrates strong performance, superior scalability, and robust operational behavior.
With these improvements, Vertica delivers on the promise of cloud economics, by allowing customers to provision only the compute and storage resources needed – from month to month, day to day, or hour to hour – while supporting efficient elasticity. For organizations that have more dynamic workloads, this separation of compute and storage architecture represents a significant opportunity for cloud savings and operational efficiency.
Q5. What are the similarities and differences between Vertica Enterprise Mode and Vertica Eon Mode?
Ben Vandiver: Eon Mode and Enterprise Mode have both significant similarities and differences.
Both are accessible from the same RPM – the choice of mode is determined at the time of database deployment. Both use the same cost-based distributed optimizer and data flow execution engine. The same SQL functions that run on Enterprise Mode will also run on Eon Mode, along with Vertica’s extensions for geospatial, in-database machine learning, schema-on-read, user-defined functions, time series analytics, and so on.
The fundamental difference, however, is that Enterprise Mode deployments must provision storage capacity for the entire dataset, whereas Eon Mode deployments are recommended to have cache for the working set. Additionally, Eon Mode has a lightweight re-subscribe and cache warming step, which speeds recovery for down nodes. Eon Mode can rapidly scale out elastically for performance improvements, which is the key to aligning resources to variable workloads and optimizing for cloud economics.
Many analytics platforms offered by cloud providers are not incentivized to optimize infrastructure costs.
Q6. How does Vertica distribute query processing across the cluster in Eon Mode and implement load balancing?
Ben Vandiver: Eon Mode combines a core Vertica concept, Projections, with a new sharding mechanism to distribute processing load across the cluster.
A Projection describes the physical storage for a table, stipulating columns, compression, sorting, and a set of columns to hash to determine how the data is laid out on the cluster. Eon introduces another layer of indirection, where nodes subscribe to and serve data for a collection of shards. During query processing, Vertica assembles a node to serve each shard, selecting from available subscribers. For an elastically scaled out cluster, each query will run on just some of the nodes of the cluster. The administrator can designate sub-clusters of nodes for workload isolation: clients connected to a sub-cluster run queries only on nodes in the sub-cluster.
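The shard and subscription idea in this answer can be sketched in a few lines. This is an illustrative toy under stated assumptions, not Vertica's algorithm: the function names `shard_for_key` and `assign_nodes`, the MD5-based hash, the random choice among subscribers, and the node names are all hypothetical. It shows rows hashing to shards, nodes subscribing to shards, and a query picking one subscriber per shard, optionally restricted to a sub-cluster.

```python
import hashlib
import random
from typing import Dict, Optional, Set


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a segmentation key to a shard by hashing (the hash choice is illustrative)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def assign_nodes(
    subscriptions: Dict[str, Set[int]],
    shard_count: int,
    subcluster: Optional[Set[str]] = None,
    rng: Optional[random.Random] = None,
) -> Dict[int, str]:
    """Pick one subscribing node to serve each shard for a single query.

    subscriptions maps a node name to the set of shards it subscribes to.
    If subcluster is given, only nodes in that set are eligible.
    """
    rng = rng or random.Random()
    plan: Dict[int, str] = {}
    for shard in range(shard_count):
        candidates = sorted(
            node
            for node, shards in subscriptions.items()
            if shard in shards and (subcluster is None or node in subcluster)
        )
        if not candidates:
            raise RuntimeError(f"no eligible subscriber for shard {shard}")
        plan[shard] = rng.choice(candidates)
    return plan
```

With four shards and six nodes, a plan built this way uses at most four nodes, which matches the point that a query on an elastically scaled-out cluster runs on only some of the nodes. Passing a `subcluster` confines the plan to that group, provided its nodes together subscribe to every shard.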
Q7. What do you see as the primary benefits of separating compute and storage?
Ben Vandiver: Since storage capacity is decoupled from compute instances, an Eon Mode cluster can cost-effectively store a lot more data than an Enterprise Mode deployment. The resource costs associated with maintaining large amounts of historical data are minimized with Eon Mode, discouraging the use of two different tools (such as a data lake and a query engine) for current and historical queries.
The operational cost is also minimized since node failures are less impactful and easier to recover from.
On the flip side, running many compute instances against a small shared data set provides strong scale-out performance for interactive workloads. Elasticity allows movement between the two extremes to align resource consumption with dynamic needs. And finally, the operational simplicity of Eon Mode can be impactful to the health and sanity of the database administrators.
Q8. What types of engineering challenges had to be overcome to create and launch this new architecture?
Ben Vandiver: Eon Mode is an application of core database concepts to a cloud environment. Even though much of the core optimizer and execution engine functionality remains untouched, large portions of the operational core of the database are different in Eon Mode. While Vertica’s storage usage maps well to an object store like S3, determining when a file can be safely deleted was an interesting challenge. We also migrated a significant amount of our test infrastructure to AWS.
Finally, Vertica is a mature database, having been around for over 10 years – Eon Mode doesn’t have the luxury of launching as a 0.1 release full of bugs. This is why Eon Mode has been in Beta, both private and public, for the last year.
Q9. It’s still early days for Eon Mode’s general availability, but do you have any initial customer feedback or performance benchmarks?
Ben Vandiver: Although Eon Mode just became generally available, it’s been in Beta for the last year and a number of our Beta customers have had significant success with this new architecture. For instance, one large gaming customer of ours subjected a much smaller Eon Mode deployment to their full production load, and realized 30% faster load rates without any tuning. Some of their queries ran 3-6x faster, even when spilling out of the cache. Operationally, the company’s node recovery was 6-8x faster and new nodes could be added in under 30 minutes. Eon Mode not only improved this customer’s query performance; its dynamic AWS service consumption also resulted in dramatic cost savings.
Q10. What should we expect from Vertica in the future with respect to cloud and Eon Mode product development?
Ben Vandiver: We are working on expanding Eon Mode functionality in a variety of dimensions. By distributing work for a shard among a collection of nodes, Eon Mode can get more “crunch” from adding nodes, thus improving elasticity. Operationally, we are working on better support for sub-clusters, no-downtime upgrade, auto-scaling, and backup snapshots for operator error. As mentioned previously, deployment options like Azure cloud, Google cloud, HDFS, and other on-premises technologies are on our roadmap. Our initial 9.1 Eon Mode release is just the beginning. I’m excited at what the future holds for Vertica and the innovations we continue to bring to market in support of our customers.
I spent many years at MIT, picking up a bachelor’s, master’s, and PhD (My thesis was on Byzantine Fault Tolerance of Databases). I have a passion for teaching, having spent several years teaching computer science.
From classes of 25 to 400, I enjoy finding clear ways to explain technical concepts, untangle student confusion, and have fun in the process. The database group at MIT, located down the hall from my office, developed Vertica’s founding C-Store paper.
I joined Vertica as a software engineer in August 2008. Over the years, I worked on many areas of the product including transactions, locking, WOS, backup/restore, distributed query, execution engine, resource pools, networking, administrative tooling, metadata management, and so on. If I can’t answer a technical question myself, I can usually point at the engineer who can. Several years ago I made the transition to management, running the Distributed Infrastructure, Execution Engine, and Security teams. I believe in an inclusive engineering culture where everyone shares knowledge and works on fun and interesting problems together – I sponsor our Hackathons, Crack-a-thon, Tech Talks, and WAR Rooms.
More recently, I’ve been running the Eon project, which aims to support a cloud-ready design for Vertica running on shared storage. While engineering is where I spend most of my time, I occasionally fly out to meet customers, notably a number of bigger ones in the Bay Area. I was promoted to Vertica CTO in May 2017.
– For more information on Vertica in Eon Mode, read the technical paper: Eon Mode: Bringing the Vertica Columnar Database to the Cloud.
– To learn more about Vertica’s cloud capabilities visit www.vertica.com/clouds
– On RDBMS, NoSQL and NewSQL databases. Interview with John Ryan ODBMS Industry Watch, 2018-03-09
– On Vertica and the new combined Micro Focus company. Interview with Colin Mahony ODBMS Industry Watch, 2017-10-25