On Managing Cloud Native Data on Kubernetes. Q&A with Patrick McFadin
Q1. You wrote a new book called “Managing Cloud Native Data on Kubernetes”. What is this book about?
It’s meant to be a comprehensive overview of running all aspects of data infrastructure inside Kubernetes. This not only covers databases but also includes streaming and analytics. As an industry, we’ve been learning how to deploy and run large-scale projects for quite a while now. As more organizations look to consolidate the entire application stack under Kubernetes, there are things to learn about doing the old things in new ways. Stateful workloads in Kubernetes are now becoming the norm, and my co-author Jeff Carpenter and I are trying to share the current state of the art.
One feature of the book that we are particularly excited about is subject matter expert sidebars. We have interviewed people inside the Kubernetes and data community to get useful points of view on specific topics. As a community, we learn by sharing together. This is meant to embody that ideal and create a diversity of thought about an incredibly massive topic.
Q2. What are containerized applications? And what are they useful for?
In the simplest terms, it means building containers of your application components instead of relying on traditional software installations to deploy software. When we talk about containerized applications there is much more to the story, however. It implies that there are also strict version controls, monitoring, and automation.
Organizations that rely on containerized applications are finding a lot more efficiency in resource costs, which makes it an easy sell. The real impact has been in increased velocity, however. Several new surveys have shown that companies that embrace data on Kubernetes have almost twice as much velocity when bringing new applications into production. Repeatability and efficiency have a long-term effect and it is now measurable.
Q3. Why has Kubernetes gained broad acceptance in almost all enterprises?
A couple of main reasons. First, it is the right technology for where automated infrastructure has been heading. DevOps is a practice that has been evolving for almost 15 years. With the introduction of containers, there was a need to orchestrate more within a common control plane. Kubernetes is the technology for continuing the progress of cloud native applications and will be the home of innovation for many more years.
Second: the communities that have grown around Kubernetes. The momentum that thriving communities lend to decision-making in modern IT infrastructure is incredibly important. It ensures there is a continuous supply of problem-solving and helps to grow the number of people qualified to do the job. The cloud native and Kubernetes communities are some of the most vibrant right now. Enterprises run the risk of looking like laggards if they don’t adopt Kubernetes somewhere in their infrastructure.
Q4. Is Kubernetes the new Operating System?
It certainly seems like it with the analogies we can create. Given the entire application stack, you can now deploy all of it as one declarative configuration. It’s similar to installing a package in Linux where the details are all behind an apt or yum command. I don’t think we are quite there yet but I can see that as a likely future.
Today, I prefer to think of using Kubernetes for creating virtual data centers. We have used virtual machines to declare the underlying components such as CPU, memory, and disk. Our applications run inside and consume the resources. The entire machine can then be moved, copied, and deployed with repeatable success. Today’s cloud native applications consist of compute, network, and storage typically rented from cloud providers. We can now use those to compose our entire application stacks with a high degree of portability. That part is very similar to how operating systems make things easier.
Q5. What are the tips you can offer for effective Cloud-Native Data Management?
There are a few principles that anyone going this route should consider.
- Always try to leverage compute, network, and storage as a commodity
- Maintain separation of your control and data planes
- Make observability easy and default. When something is deployed, it gets monitored.
- The default state for your entire stack should be secure. Use TLS and firewalls everywhere.
- Embrace the power of declarative configuration. Avoid imperative one-offs when possible.
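The difference between declarative and imperative configuration can be sketched in a few lines. The example below is an illustration only, not code from the book and not how Kubernetes is implemented: the `Spec` type, the `reconcile` function, and the workload names and image tags are all hypothetical. It shows the general idea that you describe the desired state, and a loop works out which actions close the gap with what is actually running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical description of one workload: how many replicas of which image."""
    replicas: int
    image: str


def reconcile(desired: dict[str, Spec], observed: dict[str, Spec]) -> list[str]:
    """Return the actions needed to move the observed state to the desired state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(f"create {name}")
        elif observed[name] != spec:
            actions.append(f"update {name}")
    for name in sorted(observed):
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


# Illustrative names and image tags only.
desired = {"cassandra": Spec(3, "cassandra:4.1"), "kafka": Spec(3, "kafka:3.4")}
observed = {"cassandra": Spec(2, "cassandra:4.1"), "redis": Spec(1, "redis:7")}

print(reconcile(desired, observed))
# ['update cassandra', 'create kafka', 'delete redis']
```

Running the same reconciliation against a state that already matches returns no actions, which is what makes the declarative approach repeatable where a series of imperative one-off commands is not.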
Q6. What are the right elements for designing and implementing data management functions for cloud native applications?
We try to make this a theme throughout the book: organizations should embrace the qualities of a cloud native application throughout the stack.
- Scaling – The ability to add more capacity is a critical element of today’s applications. You don’t know how much you’ll need until you need it. How fast can you respond to increased load?
- Elasticity – Cost is the final arbiter for many of the decisions we make in infrastructure. After you respond to a high-scale event, how fast can you release those resources?
- Self-healing – The old way of infrastructure meant being on pager duty and responding to problems at all hours. At the scale required for modern cloud applications, we can’t possibly keep up unless the systems have ways of routing around issues.
- Observability – Cloud native applications are by definition distributed. From request to persistence, we need insights into every step to ensure systems are working their best. To get there, we need useful metrics from every component.
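Scaling and elasticity are two directions of the same calculation. As a rough sketch, the function below follows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, which multiplies the current replica count by the ratio of the observed metric to the target metric and rounds up. The function name, the replica bounds, and the sample numbers are assumptions made for illustration, and the tolerance band is a simplified stand-in for the autoscaler's real behavior.

```python
import math


def desired_replicas(
    current: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Scale the replica count by the ratio of observed load to target load."""
    ratio = current_metric / target_metric
    # Skip scaling when the load is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current
    # Round up, then keep the result within the allowed bounds.
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


print(desired_replicas(3, current_metric=90, target_metric=60))  # scale up: 5
print(desired_replicas(5, current_metric=20, target_metric=60))  # release resources: 2
print(desired_replicas(3, current_metric=62, target_metric=60))  # within tolerance: 3
```

The same rule that adds capacity under load also hands it back when the load drops, which is where the cost savings come from.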
Q7. What are the prerequisites for Creating and Managing Cloud Native Services in Kubernetes?
Readers should be familiar with traditional data infrastructure since that is the basis for the discussion. No experience with Kubernetes is required; it is something you can learn in parallel while you read the book. I also recommend having an environment where you can try things out.
Qx. Anything else you wish to add?
For anyone wanting to be part of the larger community of cloud native data management, the Data on Kubernetes community is really taking off and is a great independent resource. You can find it at dok.community.
“Managing Cloud Native Data on Kubernetes” by Jeff Carpenter and Patrick McFadin. Released January 2023. Publisher: O’Reilly Media, Inc.
Patrick McFadin joined the internet-all-the-things wave in the 1990s and continued after the dot com crash, architecting highly scalable web platforms. While building open source infrastructure using Cassandra, he eventually landed at DataStax where he is now vice president of Developer Relations, and co-author of the upcoming O’Reilly book “Managing Cloud Native Data on Kubernetes”.
Sponsored by DataStax