On the Cloud and Vertica. Q&A with David Sprogis
Q1. When do you recommend it is appropriate for an organization to shift all of their data to the cloud or to take instead a hybrid approach?
Great question! First, I think it is important to understand that hybrid is not one architectural pattern but many. For example, we see hybrid architectures supporting (1) disaster recovery in the cloud with production on premises, (2) production on premises with seasonal overflow supported in the cloud, and (3) development environments in the cloud with production deployments on premises. A fourth, favorite example is training machine learning models in the cloud for production deployment on premises. One customer trains their machine learning (ML) model every night, adding new data from the last 24 hours and dropping the oldest data to create a sliding window of training data. Training takes only 4 hours, so it would have been wasteful to purchase equipment and leave it idle for the other 20 hours of the day. Cloud offers the compute they need when they need it.
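The sliding window in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window length in days and the names `nightly_windows` and `training_utilization` are hypothetical and do not come from the customer's actual pipeline.

```python
from collections import deque


def nightly_windows(daily_batches, window_days):
    """Yield the training set for each night once the window is full.

    Appending to a deque with maxlen drops the oldest batch automatically,
    which is the "add the last 24 hours, drop the oldest data" step.
    window_days is an assumed parameter; the interview does not state it.
    """
    window = deque(maxlen=window_days)
    for batch in daily_batches:
        window.append(batch)
        if len(window) == window_days:
            yield list(window)


def training_utilization(training_hours, hours_per_day=24):
    """Fraction of the day that dedicated training hardware would be busy."""
    return training_hours / hours_per_day


if __name__ == "__main__":
    for training_set in nightly_windows(["mon", "tue", "wed", "thu"], window_days=3):
        print(training_set)
    print(f"busy fraction: {training_utilization(4):.2f}")
```

With 4 hours of training per day, dedicated hardware would be busy for one sixth of the day, which is the idle-capacity argument made above.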
“Compute when you need it” is the first of three value propositions offered by cloud infrastructure. Cloud is ideal for variable or intermittent workloads, enabling users to scale compute up when demand is high and scale compute down to save money when demand is low.
Opex over Capex is probably the second most important consideration for cloud usage. Start-ups, for example, want to conserve capital and pay for infrastructure as they consume it. Cloud is a great starting point for many businesses that want to focus on their core offering and not get distracted by data center build-out.
Scale is probably the third value proposition for cloud. Unless a user’s compute requirements are large enough, the costs of building out a data center and hiring staff to manage it probably exceed the potential savings in actual compute cost. Cloud providers simplify and automate operations, which reduces the complexity for a start-up or small company.
And therein might be a fourth consideration, the value of time. Vertica users can start running in the cloud within 15 to 20 minutes on any of the top three public clouds. How long would it take to order, install, and configure your own equipment on premises? What is the opportunity cost of waiting?
Our users have other considerations, data sovereignty and transfer cost among them. Bringing it all together, Vertica recognizes that cloud and on-premises deployments each have their pros and cons. Frequently, users will go with a hybrid approach. This is why the freedom to run Vertica on premises, in the cloud, or both has been and will continue to be a cornerstone of our strategy.
Q2. Why is Vertica offering two deployment modes for getting analytics up and running in the cloud?
I am glad you asked because this question comes up frequently. And, of course, the follow-on question is, “Which mode should I run?” I’ll start by answering your question.
When introducing a radical new architecture that challenges tenets of the existing architecture, it is important to isolate new functionality completely. Sure, both modes share the same code base, but Enterprise mode is a shared-nothing architecture whereas Eon mode is a communal-storage architecture. Both modes share the same query planning and execution engine, the same advanced analytics and machine learning, but down in the nuts and bolts of where data is stored and how it is managed, there are very significant differences. Thus, we have two modes.
Taking a step back, it is important to note why we introduced the new mode, Eon. Eon was introduced specifically to address variable workloads in the cloud. Users want to be able to scale their compute up and down with demand. Enterprise’s shared-nothing architecture takes a long time to scale because the data must be re-segmented. Eon scales quickly because responsibility for the durable copy is moved to communal storage. Nodes simply subscribe to a segment (“shard”), which does not require re-segmenting data.
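To make the contrast concrete, here is a deliberately simplified Python sketch. It is not Vertica's actual segmentation or subscription logic: the hash-modulo placement, the round-robin subscription, and every function name are assumptions chosen only to show why changing the node count moves rows in one model and merely reassigns shards in the other.

```python
import hashlib


def stable_hash(key):
    """Deterministic hash so the result is the same on every run."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


def segment_by_node(keys, node_count):
    """Shared-nothing toy model: a row lives on the node its hash maps to."""
    return {k: stable_hash(k) % node_count for k in keys}


def rows_moved(keys, old_nodes, new_nodes):
    """Count rows whose owning node changes when the cluster is resized."""
    before = segment_by_node(keys, old_nodes)
    after = segment_by_node(keys, new_nodes)
    return sum(1 for k in keys if before[k] != after[k])


def shard_of(key, shard_count):
    """Communal-storage toy model: a row maps to a shard, never to a node."""
    return stable_hash(key) % shard_count


def subscribe(shard_count, node_count):
    """Assign each fixed shard to a node (round-robin is an assumption)."""
    return {shard: shard % node_count for shard in range(shard_count)}


if __name__ == "__main__":
    keys = [f"row{i}" for i in range(1000)]
    print("rows re-segmented going from 3 to 4 nodes:", rows_moved(keys, 3, 4))
    print("subscriptions with 3 nodes:", subscribe(12, 3))
    print("subscriptions with 4 nodes:", subscribe(12, 4))
```

In the second model, `shard_of` never takes the node count as input, so resizing the cluster changes only the small subscription map while the data in communal storage stays where it is.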
To our surprise, we learned that there is huge demand for Eon on premises. This is because enterprise data centers are adopting aspects of cloud data architecture to improve operational efficiencies, most notably, separation of compute and storage pursuant to a “data lake” strategy. Vertica is responding to this demand by prioritizing efforts to bring Eon on premises.
Q3. When should a company choose to run Vertica in Eon Mode vs Enterprise Mode?
Well, variable workload is one great reason to run Eon mode. If a user’s demand fluctuates through the day or through the week, Eon mode enables the user to scale compute in response to changing demand. But it’s important to note that Eon takes advantage of the notion of “hot” data and “cold” data. Eon mode is not uniformly performant across the entire database – only the most frequently queried “hot” data, which Eon mode caches, provides the high level of performance Vertica users expect. If a use case requires uniformly high performance across the entire database, Enterprise mode is probably the better choice.
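The hot and cold distinction can be illustrated with a small cache sketch in Python. The least-recently-used eviction policy, the `HotDataCache` class, and the dictionary standing in for communal storage are all assumptions made for illustration; the interview does not say how Eon mode decides what stays cached.

```python
from collections import OrderedDict


class HotDataCache:
    """Toy read-through cache in front of slower shared storage."""

    def __init__(self, capacity, communal_storage):
        self.capacity = capacity
        self.communal_storage = communal_storage  # dict standing in for shared storage
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            # Hot data: served locally and marked as most recently used.
            self.cache.move_to_end(key)
            self.hits += 1
            return self.cache[key]
        # Cold data: fetched from shared storage, which is the slow path.
        self.misses += 1
        value = self.communal_storage[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry (assumed policy).
            self.cache.popitem(last=False)
        return value


if __name__ == "__main__":
    storage = {f"block{i}": i for i in range(5)}
    cache = HotDataCache(capacity=2, communal_storage=storage)
    for key in ["block0", "block1", "block0", "block2"]:
        cache.read(key)
    print("hits:", cache.hits, "misses:", cache.misses, "cached:", list(cache.cache))
```

A workload that keeps returning to the same small set of blocks sees mostly hits, while a workload that scans everything keeps paying the miss cost, which is why the second kind of use case is steered toward Enterprise mode above.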
We have learned a lot from running Vertica in both modes, and we are beginning to see a future in which they coexist. Our long-range vision is to merge Eon and Enterprise into a single mode of operation so that users do not have to decide which mode to run, or how to get their data from one mode to the other.
Q4. What are the main differences and similarities between Vertica in Enterprise Mode and Vertica in Eon Mode?
The differences are operational. Eon mode can be scaled quickly in response to workload demand, can be shut down completely then revived when needed again, and can separate workloads to support different SLAs, different types of workload, or different business units for accounting and chargeback. To achieve this operational flexibility, Eon depends on caching for performance, which means that Eon is less performant when reading data outside of the cache.
Where large volumes of data are needed in low cost shared storage, but smaller amounts of the most relevant data are the focus of the use case, Vertica in Eon Mode is a great solution.
Where access to the entire database is required and the highest levels of performance are expected for all queries and all data, Vertica in Enterprise mode would deliver the greatest performance at a better value. In addition, due to the maturity and sophistication proven across many thousands of scenarios and use cases, Enterprise mode has more optimizations and control options.
To the user or the application, there is no apparent difference between Eon mode and Enterprise mode – same queries, comparable performance, same advanced analytics and machine learning capabilities.
Q5. Vertica in Eon mode separates the computational processes from the storage layer of the database: What are the benefits?
There are a number of benefits to separating compute from storage. We have covered a few already. Worth emphasizing is the value of the “data lake” which breaks the barriers between otherwise siloed data so that data can be joined across business units for greater insights.
Sure, Vertica can be the “data lake” and meet most, if not all, of an enterprise’s needs, but Vertica also plays well in an environment with other tools acting on the same data. Data center operations may decide that Spark is the best tool for pre-processing and ETL. There may be a need for an OLTP database as well as Vertica for OLAP uses. And for the occasional ad hoc queries, a data center might go with a query engine on a Hadoop cluster rather than keeping the data in Vertica. However, I will point out that Vertica does offer formidable query engine capabilities on ORC and Parquet, which we call “external tables”.
Q6. How does Vertica address the need to rapidly scale clusters to variable workloads?
There are two important points here: (1) scaling for concurrency vs scaling for performance and (2) auto-scaling vs scripted scaling.
Regarding the first point, for the last year Eon mode has scaled for concurrency. Said plainly, scaling an Eon mode cluster enabled it to process more queries at the same time, but an individual query would not run faster. This was a departure from what Vertica was known for, namely scaling for performance. That is about to change. With the coming release of Vertica 9.2.1 we are featuring a new capability in Eon mode, “query crunching”. “Query crunching” may be a humorous name but the results are serious performance improvements. When an Eon mode cluster is scaled with this feature turned on, additional nodes will collaborate to improve the performance of the individual query. This gives users the choice of scaling for concurrency or scaling for performance.
Regarding the second point, I am frequently asked about auto-scaling for Eon mode. Specifically, users want Vertica to scale up automatically as their demand increases and scale down automatically when demand decreases. Auto-scaling is a great idea and would be practical in a world of instant elasticity. While elasticity is rapid, it’s not instant, and the latency creates a bunch of complicating issues that make auto-scaling impractical for most situations. For this reason, Vertica recommends scripted scaling based on historic workload patterns. Scripting is a proven and practical way to ensure that compute capacity is available when you need it and that your infrastructure costs are predictable.
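As a rough illustration of scripted scaling, the Python sketch below derives an hourly node schedule from historical query counts. It is not a Vertica tool or API: the per-node capacity figure, the minimum node count, and the function names `build_schedule` and `scaling_actions` are hypothetical, and a real script would hand the resulting actions to whatever resize mechanism the deployment uses.

```python
import math


def build_schedule(history, queries_per_node, min_nodes=3):
    """Map each hour of the day to a node count sized for the historical peak.

    history is a list of days, each a list of 24 hourly query counts.
    queries_per_node and min_nodes are assumed sizing inputs.
    """
    schedule = {}
    for hour in range(24):
        peak = max(day[hour] for day in history)
        schedule[hour] = max(min_nodes, math.ceil(peak / queries_per_node))
    return schedule


def scaling_actions(schedule):
    """Return (hour, node_count) pairs only where the node count changes."""
    actions = []
    previous = schedule[23]  # the schedule wraps around midnight
    for hour in range(24):
        if schedule[hour] != previous:
            actions.append((hour, schedule[hour]))
        previous = schedule[hour]
    return actions


if __name__ == "__main__":
    quiet_day = [10] * 24
    busy_day = [10] * 8 + [100] * 10 + [10] * 6
    schedule = build_schedule([quiet_day, busy_day], queries_per_node=20)
    for hour, nodes in scaling_actions(schedule):
        print(f"{hour:02d}:00 -> scale to {nodes} nodes")
```

Because the schedule is computed ahead of time from known patterns, the node count at any hour is known in advance, which is where the predictable infrastructure cost comes from.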
Saying that GoodData is a favorite partner is like identifying one of your children as your favorite (but in the case of GoodData, it’s true!) Vertica is a very powerful platform. GoodData brings the power of Vertica to the customer for better, faster insights. They have been a great partner to work with, helping us to improve aspects of Eon mode while we help them to deliver fully managed insights at massive scale to their customers.
Q8. Anything else you wish to add?
Colin Mahony, Vertica’s GM, is known for saying, “Our next release will be our best release,” and as owner of the Eon roadmap, I share Colin’s enthusiasm. Eon is a journey as much as it is a mode of operation. It is a journey of operational flexibility, bringing high performance at scale to the changing needs of the data center, in the cloud, on premises and hybrid. Our highest priority is to include Eon mode in our mantra for Enterprise, “Freedom to run anywhere”. And in the not-too-distant future, operating modes will merge, thereby allowing users to decide how they want to store data as it is loaded, rather than having to decide when the database is created.
David Sprogis is Principal Product Manager for Vertica’s cloud strategy and Eon mode architecture. His experience includes software development, data warehousing, analytics and data visualization across a number of verticals spanning three decades.