On Kubernetes. Interview with Eric Tune
“Perhaps less obvious is how role definitions in an organization change as scale increases. Once rare tasks that were just a small part of one team’s responsibilities become so common that they are a full-time job for someone. At that point, one either needs to create automation for the task, or a new team needs to be assembled (or hired) to perform that task full time. ” — Eric Tune
I have interviewed Eric Tune, Senior Staff Engineer at Google. We talked about Kubernetes. Eric has been a Kubernetes contributor since 1.0.
Q1. What are the main technical challenges in implementing massive-scale environments?
Eric Tune: Whether working at small or massive scale, the high-level technical goals don’t change: security, developer velocity, efficiency in use of compute resources, supportability of production environments, and so on.
As scale increases, there are some fairly obvious discontinuities, like moving from an application that fits on a single machine to one that spans multiple machines, and from a single data center or zone to multiple regions. Quite a bit has been written about this. Microservices in particular can be a good fit because they scale well to more machines and more regions.
Perhaps less obvious is how role definitions in an organization change as scale increases. Once rare tasks that were just a small part of one team’s responsibilities become so common that they are a full-time job for someone. At that point, one either needs to create automation for the task, or a new team needs to be assembled (or hired) to perform that task full time. Sometimes, it is obvious how to do this. But, when this repeats many times, one can end up with a confusing mess of automation and tickets, dragging down development velocity and confounding attempts to analyze security and debug systemic failure.
So, a key challenge is finding the right separation of responsibilities so that multiple pieces of automation and multiple human teams collaborate well. Doing so requires not only a broad view of an organization’s current processes and responsibilities around development, operations, and security, but also an understanding of which assumptions behind them are no longer valid.
Kubernetes can help here by providing automation for exactly the types of tasks that become toilsome as scale increases. Several of its creators have lived through organic growth to massive scale. Kubernetes is built from that experience, with awareness of the new roles that are needed at massive scale.
Q2. What is Kubernetes and why is it important?
Eric Tune: First, Kubernetes is one of the most popular ways to deploy applications in containers. Containers make the act of maintaining the machine & operating system a largely separate process from installing and maintaining an application instance – no more worrying about shared library or system utility version differences.
Second, it provides an abstraction over IaaS: VMs, VM images, VM types, load balancers, block storage, auto-scalers, etc. Kubernetes runs on numerous clouds, on-premises, and on a laptop. Many complex applications, such as those consisting of many microservices, can be deployed onto any Kubernetes cluster regardless of the underlying infrastructure. For an organization that may want to modernize its applications now, and move to cloud later, targeting Kubernetes means it won’t need to re-architect when it is ready to move.
Third, Kubernetes supports infrastructure-as-code (IaC). You can define complex applications, including storage, networking, and application identity, in a common configuration language, called the Kubernetes Resource Model. Unlike other IaC systems, which mostly support a “single-user” model, Kubernetes is designed for multiple users. It supports controlled delegation of responsibility from an ops team to a dev team.
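To make the infrastructure-as-code point concrete, here is a minimal sketch of what a declarative resource can look like. It is an illustration, not something Eric Tune described: the helper `make_deployment`, the application name `hello-web`, and the image `example.com/hello-web:1.0` are all hypothetical. The manifest is built as a plain Python dict and serialized to JSON.

```python
import json


def make_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest as a plain dict (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Desired state: how many copies of the application should run.
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Illustrative values only; neither the name nor the image comes from the interview.
manifest = make_deployment("hello-web", "example.com/hello-web:1.0", 3)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. Because the file describes desired state rather than steps to run, it can be reviewed and versioned like any other code.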
Fourth, it provides an opinionated way to build distributed system control planes, and to extend the APIs and infrastructure-as-code type system. This allows solution vendors and in-house infrastructure teams to build custom solutions that feel like they are first-class parts of Kubernetes.
Q3. Who should be using Kubernetes?
Eric Tune: If your organization runs Linux-based microservices and has explored container technology, then you are ready to try Kubernetes.
Q4. You have been a Kubernetes contributor since 1.0 (4 years). What did you work on specifically?
Eric Tune: During the first year, I worked on whatever needed to be done, including security (namespaces, service accounts, authentication and authorization, resource quota), performance, documentation, testing, API review and code review.
In those first years, people were mostly running stateless microservices on Kubernetes. In the second year, I worked to broaden the set of applications that can run on Kubernetes. I worked on the Job and CronJob APIs of Kubernetes, which support basic batch computation, and the StatefulSet API, which supports databases and other stateful applications. Additionally, I worked with the Helm project on Charts (easy-to-install applications for Kubernetes), and with the Spark open source community to get it running on Kubernetes.
Starting in 2017, Kubernetes interest was growing so quickly that the project maintainers could not accept even a fraction of the new features that were proposed. The answer was to make Kubernetes extensible so that new features could be built “out of the core.” I worked to define the extensibility story for Kubernetes, particularly for Custom Resource Definitions (CRDs) and Webhooks. The extensibility features of Kubernetes have enabled other large projects, such as Istio and Knative, to integrate with Kubernetes with lower overhead for the Kubernetes project maintainers.
Currently, I lead teams which work on both Open Source Kubernetes and Google Cloud.
Q5. What are the main challenges of migrating several microservices to Kubernetes?
Eric Tune: Here are three challenges I see when migrating several microservices to Kubernetes, and how I recommend handling them:
- Remove Ordering Dependencies: Say microservice C depends on microservices A and B to function normally. When migrating to declarative configuration and Kubernetes, the startup order for microservices can become variable, where previously it was ordered (e.g. by a script). This can cause unexpected behaviors. For example, microservice C might log errors at a high rate or crash if A is not ready yet. A first reaction is sometimes “how can I guarantee ordering of microservice startup?” My advice is not to impose order, but to change problematic behavior. For example, C could be changed to return some response for a request even when A and B are unreachable. This is not really a Kubernetes-specific requirement – it is a good practice for microservices, as it allows for graceful recovery from failures and for autoscaling.
- Don’t Persist Peer Network Identity: Some microservices permanently record the IP addresses of their peers at startup time, and then don’t expect them to ever change. That’s not a great match for the Kubernetes approach to networking. Instead, resolve peer addresses using their domain names and re-resolve after disconnection.
- Plan ahead for Running in Parallel: When migrating a complex set of microservices to Kubernetes, it’s typical to run the entire old environment and the new (Kubernetes) environment in parallel. Make sure you have load replay and response diffing tools to evaluate a dual environment setup.
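The three recommendations above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not code from the interview: `PeerClient`, `handle_request`, and `diff_responses` are hypothetical names, and DNS and the network are replaced by in-memory fakes so the example runs anywhere. In a real service the resolver would be a DNS lookup and the sender a real network call.

```python
from typing import Callable, Optional


class PeerClient:
    """Remembers a peer by name, never permanently by IP (hypothetical class)."""

    def __init__(self, hostname: str,
                 resolve: Callable[[str], str],
                 send: Callable[[str, str], str]) -> None:
        self.hostname = hostname
        self._resolve = resolve   # hostname -> current address
        self._send = send         # (address, request) -> response, may raise OSError
        self._address: Optional[str] = None

    def call(self, request: str, retries: int = 1) -> Optional[str]:
        for _ in range(retries + 1):
            try:
                if self._address is None:
                    self._address = self._resolve(self.hostname)
                return self._send(self._address, request)
            except OSError:
                # Forget the stale address so the next attempt re-resolves the name.
                self._address = None
        return None  # peer unreachable after all attempts


def handle_request(request: str, peer_a: PeerClient) -> dict:
    """Microservice C: still answers, in degraded form, when A is unreachable."""
    detail = peer_a.call(request)
    if detail is None:
        return {"status": "degraded", "detail": None}
    return {"status": "ok", "detail": detail}


def diff_responses(old: dict, new: dict) -> dict:
    """Fields whose values differ between old- and new-environment responses."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# In-memory stand-ins for DNS and the network (assumptions for the demo).
name = "service-a.default.svc.cluster.local"
dns = {name: "10.0.0.1"}
live = {"10.0.0.1"}


def fake_resolve(hostname: str) -> str:
    return dns[hostname]


def fake_send(address: str, request: str) -> str:
    if address not in live:
        raise ConnectionError(f"{address} unreachable")
    return f"{request} served by {address}"


client = PeerClient(name, fake_resolve, fake_send)
first = handle_request("GET /item", client)

# A is rescheduled and comes back with a new IP; the name stays the same.
dns[name] = "10.0.0.2"
live.clear()
live.add("10.0.0.2")
second = handle_request("GET /item", client)

# A goes away entirely; C degrades instead of crashing.
live.clear()
third = handle_request("GET /item", client)

print(first, second, third, diff_responses(first, second), sep="\n")
```

The client caches an address only until the first failure, so a rescheduled peer is found again through its name. The diff helper compares just two responses; a real migration would replay recorded load against both environments and compare the results in bulk.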
Q6. How can Kubernetes scale without increasing the ops team?
Eric Tune: Kubernetes is built to respond to many types of application and infrastructure failures automatically – for example slow memory leaks in an application, or kernel panics in a virtual machine. Previously this kind of problem may have required immediate attention. With Kubernetes as the first line of defense, ops can wait for more data before taking action. This in turn supports faster rollouts, as you don’t need to soak as long if you know that slow memory leaks will be handled automatically, and you can fix by rolling forward rather than back.
Some ops teams also face multiple deployment environments, including multi-cloud, hybrid, or varying hardware in on-premises datacenters. Kubernetes hides some differences between these, reducing the number of configuration variations that are needed.
A pattern I have seen is role specialization within ops teams, which can bring efficiencies. Some members specialize in operating the Kubernetes cluster itself, what I call a “Cluster Operations” role, while others specialize in operating a set of applications (microservices). The clean separation between infrastructure and application – in particular the use of Kubernetes configuration files as a contract between the two groups – supports this separation of duties.
Finally, if you are able to choose a hosted version of Kubernetes such as Google Kubernetes Engine (GKE), then the hosted service takes on much of the Cluster Operations role. (Note: I work on GKE.)
Q7. On-premises, hybrid, or public cloud infrastructure: which do you think is better for running Kubernetes?
Eric Tune: Usually factors unrelated to Kubernetes will determine if an application needs to run on-premises, such as data sovereignty, latency concerns or an existing hardware investment. Often some applications need to be on-premises and some can move to public cloud. In this case you have a hybrid Kubernetes deployment, with one or more clusters on-premises, and one or more clusters on public cloud. For application operators and developers, the same tools can be used in all the clusters. Applications in different clusters can be configured to communicate with each other, or to be separate, as security needs dictate. Each cluster is a separate failure domain. One does not typically have a single cluster which spans on-premises and public cloud.
Q8. Kubernetes is open source. How can developers contribute?
Eric Tune: We have 150+ associated repositories that are all looking for developers (and other roles) to contribute. If you want to help but aren’t sure what you want to work on, then start with the Community ReadMe, and come to the community meetings or watch a rerun. If you think you already know what area of Kubernetes you are interested in, then start with our contributors guide, and attend the relevant Special Interest Group (SIG) meeting.
Dr. Eric Tune is a Senior Staff Engineer at Google. He leads dozens of engineers working on Kubernetes and GKE. He has been a Kubernetes contributor since 1.0. Previously at Google he worked on the Borg container orchestration system, drove company-wide compute efficiency improvements, created the Google-wide Profiling system, and helped expand the size of Google’s search index. Prior to Google, he was active in computer architecture research. He holds computer engineering degrees (PhD, MS, BS) from UCSD.