On Kubernetes, Hybrid and Multi-cloud. Interview with Jonathan Ellis
“Container and orchestration technologies have made a quantum leap in manageability for microservice architectures. Kubernetes is the clear winner in this space. It’s taken a little longer, but recently Kubernetes has turned a corner in its maturity and readiness to handle stateful workloads, so you’re going to see 2020 be the year of Kubernetes adoption in the database space in particular.” – Jonathan Ellis
I have interviewed Jonathan Ellis, Co-Founder and CTO at DataStax. We talked about Kubernetes, Hybrid and Multi-cloud. In addition, Jonathan tells us his 2020 predictions and thoughts around migrating from relational to NoSQL.
Happy and Healthy New Year! RVZ
Q1. Hybrid cloud vs. multi-cloud: What’s the difference?
Jonathan Ellis: Both hybrid and multi-cloud involve spreading your data across more than one kind of infrastructure. As most people use the terms, the difference is that hybrid cloud involves a mix of public cloud services and self-managed data center resources, while multi-cloud involves using multiple public cloud services together, like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
Importantly, multi-cloud is more than using multiple regions within one cloud provider’s infrastructure. Multiple regions can provide resiliency and distribution of your data (although outages with a large enough blast radius can still affect multiple regions, like Azure’s global DNS outage earlier this year), but you’re still limited to the features of a single provider rather than a true multi-cloud environment.
Q2. What is your advice: When is it better to use on-prem, or hybrid, or multi-cloud?
Jonathan Ellis: There are three main areas to consider when evaluating the infrastructure options for an application. The best approach will depend on what you want to optimize for.
The first thing to consider is agility: cloud services offer significant advantages in how quickly you can spin infrastructure up and down, allowing you to concentrate on creating value on the software and data side. But the flip side of this agility is our second factor, which is cost. The agility and convenience of cloud infrastructure come with a price premium that you pay over time, particularly for “higher level” services than raw compute and storage.
The third factor is control. If you want full control over the hardware or network or security environment that your data lives in, then you will probably want to manage that on-premises.
A hybrid cloud strategy can let you take advantage of the agility of the cloud where speed is the most important factor, while optimizing for cost or for control where those are more critical. This approach is popular for DataStax customers in the financial services sector, for instance. They like the flexibility of cloud, but they also want to retain control over their on-premises data center environment. We have partnered with VMware on delivering the best experience for public/private cloud deployments here.
DataStax builds on Apache Cassandra™ technology to provide fine-grained control over data distribution in hybrid cloud deployments. DataStax Enterprise (DSE) adds performance, security and operational management tools to help enterprises improve time-to-market and TCO.
Q3. IT departments are facing an uphill battle of managing hybrid, multi-cloud environments. Why does building scalable modern applications in the cloud remain a challenge?
Jonathan Ellis: Customers of modern, cloud-native applications expect quick response times and 100% availability, no matter where they are in the world. This means your data layer needs the ability to scale both in a single location and across datacenters. Relational databases and other systems built on master/slave architectures can’t deliver this combination of features. That’s what Cassandra was created for.
Cloud vendors have started trying to tackle these market requirements, but by definition their products are single-cloud only. DSE not only provides a data layer that can run anywhere, but it can actually run on a single cluster that spans machines on-premises and in the cloud, or across multiple public clouds.
Q4. Securing a multi-cloud strategy can be difficult due to a lack of visibility across hosts. What is your take on this?
Jonathan Ellis: Security for a multi-cloud architecture is more complex than security for a single cloud and has unique challenges. Security is required at multiple levels in the cloud and often involves compliance with regulatory standards. While security vendors are trying to solve this problem across clouds, the current tooling is limited and the feature sets vary, so the ability to have a cohesive view of the underlying IaaS across clouds is not optimal. This implies a need for IT teams to have skill sets for each cloud in their architecture, while relying on the AWS-, GCP- or Azure-specific security, monitoring, alerting and analytics services to provide visibility. (As applications and databases move to managed Kubernetes platforms like GKE, EKS and AKS, some of the burden of host-level security shifts to the cloud providers, who manage and secure these instances at different levels.)
These challenges are not stopping companies from moving forward with a multi-cloud strategy, driven by the advantages of avoiding vendor lock-in and improved efficiency from a common data layer across their infrastructure, as well as by non-technical factors such as acquisitions.
DataStax provides capabilities that enable companies to improve their security posture and help with the security challenges. At the data security level, DSE Advanced Security allows companies to minimize risk, achieve granular access control, and help with regulatory compliance. It does this with functionality like unified authentication, end-to-end encryption, and enhanced data auditing. We are also developing a next-generation, cloud-based monitoring tool that will have a unified view across all of your Cassandra deployments in the cloud and will be able to provide visibility into the underlying instances running the cluster. Finally, DataStax managed services offerings like Apollo (see below) will also provide some relief to this problem.
Q5. You recently announced early access to the DataStax Change Data Capture (CDC) Connector for Apache Kafka®. What are the benefits of bridging Apache Kafka with Apache Cassandra?
Jonathan Ellis: Event streaming is a great approach for applications where you want to take actions in real time. Apache Kafka was developed by the technology team at LinkedIn to manage streaming data and events for these scenarios.
Cassandra is the perfect fit for event streaming data because it was built for the same high ingest rates that are common for streaming platforms such as Kafka. DataStax makes it easier to bring these two technologies together so that you can do all of your real-time streaming operations in Kafka and then serve your application APIs with a highly available, globally distributed database. This defines a future-proof architecture that handles any needs that microservices and associated applications throw at it.
It’s important to recognise what Kafka does really well in streaming, and what Cassandra does well in data management. Bringing these two projects together allows you to do things that you can’t do with either by itself.
Q6. DataStax recently announced a production partnership with VMware in support of their VMware vSAN to include hybrid and multi-cloud configurations. Can you please elaborate on this?
Jonathan Ellis: We have worked with VMware for years on how to support hybrid cloud environments, and this partnership is the result. VMware and DataStax have a lot of customers in common, and for a lot of those customers, the smoothest path to cloud is to use VMware to provide a common substrate across their on-premises and cloud deployments. Partnering with VMware allows DataStax to provide improved performance and operational experience for these enterprises.
Q7. What are your 2020 predictions and thoughts around migrating from relational to NoSQL?
Jonathan Ellis: Container and orchestration technologies have made a quantum leap in manageability for microservice architectures. Kubernetes is the clear winner in this space. It’s taken a little longer, but recently Kubernetes has turned a corner in its maturity and readiness to handle stateful workloads, so you’re going to see 2020 be the year of Kubernetes adoption in the database space in particular. (Kubernetes support for DSE is available on our Labs site.)
In terms of moving from relational to NoSQL, there’s still a gap that exists in terms of awareness and understanding around how best to build and run applications that can really take advantage of what Cassandra can offer. Our work in DataStax Academy for Cassandra training will continue in 2020, educating people on how to best make use of Cassandra and get started with their newest applications. This investment in education and skills development is essential to helping the Cassandra community develop, alongside the drivers and other contributions we make on the code side.
Q8. What is the road ahead for Apache Cassandra?
Jonathan Ellis: I was speaking to the director of applications at a French bank recently, and he said that while he thought the skill level for developers had gone up massively overall, he also thought that skills specifically around databases and data design have remained fairly static, if not declined, over time. To address this skills gap, and to take advantage of cloud-based agility, we’ve created the Apollo database (now in open beta) as a cloud-native service based on Cassandra. This makes the operational complexities of managing a distributed system a complete non-problem.
Our goal is to continue supporting Cassandra as the leading platform for delivering modern applications across hybrid and multi-cloud environments. For companies that want to run at scale, it’s the only choice that can deliver availability and performance together in the cloud.
Jonathan is a co-founder of DataStax. Before DataStax, Jonathan was Project Chair of Apache Cassandra for six years, where he built the Cassandra project and community into an open-source success. Previously, Jonathan built an object storage system based on Reed-Solomon encoding for data backup provider Mozy that scaled to petabytes of data and gigabits per second throughput.
– DataStax Enterprise (DSE)
– Apollo database
– The Global AI Index 2019, ODBMS.org DEC. 17, 2019
– Look ahead to 2020 in-memory computing, ODBMS.org DEC. 27, 2019