On Telegraf Operator. Q&A with Russ Savage

Telegraf can collect metrics from pretty much any system or environment. At present there are over 200 Telegraf input plugins and that number just keeps increasing. That’s a huge advantage right there because it’s rare to have an ecosystem based on a single technology, like Kubernetes. – Russ Savage

Q1. What is Telegraf Operator?

Well, before we talk about the Telegraf Operator, we need to distinguish it from Telegraf. Telegraf is an open-source agent that you can configure to ingest metrics from all kinds of sources, such as technology stacks, sensors, and systems.

The Telegraf Operator is an application that manages Telegraf instances in a Kubernetes cluster. When a new Kubernetes pod appears, the operator reads the pod annotations, looking for specific information about how to configure the Telegraf input. When the operator finds a pod with valid annotations, it attaches a Telegraf instance to that pod as a sidecar. The output from that Telegraf instance is defined centrally, so the user managing the pod configuration only has to worry about the configuration needed to monitor the application.
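
As a rough illustration of that flow, here is a minimal Python sketch that models a pod as a plain dictionary. It is not the operator's actual code: the annotation prefix `telegraf.influxdata.com/` is an assumption to verify against the operator's documentation, and the function names and the sidecar container definition are hypothetical.

```python
from copy import deepcopy

# Assumed annotation prefix; check the operator's documentation for the exact keys.
ANNOTATION_PREFIX = "telegraf.influxdata.com/"


def telegraf_annotations(pod):
    """Return the pod's Telegraf-related annotations with the prefix stripped."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    return {
        key[len(ANNOTATION_PREFIX):]: value
        for key, value in annotations.items()
        if key.startswith(ANNOTATION_PREFIX)
    }


def inject_sidecar(pod):
    """Return a copy of the pod with a Telegraf sidecar added, if annotated.

    Pods without Telegraf annotations are returned unchanged.
    """
    settings = telegraf_annotations(pod)
    if not settings:
        return pod
    patched = deepcopy(pod)
    containers = patched.setdefault("spec", {}).setdefault("containers", [])
    # Hypothetical sidecar definition; the name and image are illustrative.
    containers.append({"name": "telegraf", "image": "telegraf"})
    return patched


pod = {
    "metadata": {
        "name": "my-app",
        "annotations": {"telegraf.influxdata.com/inputs": "[[inputs.redis]]"},
    },
    "spec": {"containers": [{"name": "app", "image": "my-app:1.0"}]},
}
patched = inject_sidecar(pod)
print([c["name"] for c in patched["spec"]["containers"]])
```

The person deploying the pod only writes the annotation; the extra container is added for them, and a pod with no Telegraf annotations passes through untouched.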

Q2. What’s a sidecar deployment and why should developers use it?

A sidecar deployment is when an instance of Telegraf is added to a Kubernetes pod right next to the application container and collects data from that application. Telegraf allows you to use any one of the 200+ input plugins to collect metrics from your application.

This is different from a DaemonSet deployment, which is commonly used to monitor the nodes of the Kubernetes cluster. The sidecar method only collects metrics from the application within the pod.

The sidecar approach is useful because it lets you define custom metrics for Telegraf to collect, and collecting those metrics doesn’t impact the monitoring framework shared by other workloads.

The pods in your Kubernetes cluster can generate a significant amount of data. Sometimes, being able to directly push data from each pod to a central location can be more efficient than having a single service scrape that data. Also, you don’t have to worry about exposing the right ports for your pod since everything happens locally. Finally, for applications that do not expose metrics for scraping, adding a Telegraf sidecar can be a simple, code-free way to get metrics out of the system.

Q3. Is the sidecar deployment what makes Telegraf Operator different from other monitoring solutions?

Oh, no. Not at all! Telegraf and the Telegraf Operator are much more than their deployment methods.

Sidecar deployments are popular because they provide the most flexibility when designing and building out your monitoring architecture. However, that flexibility comes with some complexity. The goal of the Telegraf Operator is to make configuring sidecars as simple as deploying pods in Kubernetes. Ideally, the person deploying the pod doesn’t need to know anything about where the metrics are going. That configuration can be managed centrally for all Telegraf instances in the cluster. Users also don’t need to configure a Telegraf container for each pod they manage. Again, that can be injected automatically later with no impact on their other containers.
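
To make that split of responsibilities concrete, the following Python sketch combines a pod's own input section with an output section that is managed in one place. It is a simplified illustration under stated assumptions: `OUTPUT_CLASSES`, `render_sidecar_config`, and the InfluxDB URL are hypothetical, and a real output section would also need credentials and a destination bucket.

```python
# Hypothetical, centrally managed output settings keyed by class name.
# The URL is a placeholder; a real output section would also need credentials.
OUTPUT_CLASSES = {
    "default": '[[outputs.influxdb_v2]]\n  urls = ["http://influxdb.example:8086"]',
}


def render_sidecar_config(pod_inputs, class_name="default"):
    """Combine a pod's own input section with the shared output section."""
    if class_name not in OUTPUT_CLASSES:
        raise KeyError(f"unknown output class: {class_name}")
    return pod_inputs.strip() + "\n\n" + OUTPUT_CLASSES[class_name] + "\n"


# The pod owner supplies only the input side.
config = render_sidecar_config('[[inputs.redis]]\n  servers = ["tcp://localhost:6379"]')
print(config)
```

Changing where metrics are sent then means editing one shared entry rather than every pod definition.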

From an architectural perspective, Telegraf is very similar to other metric collection solutions. For example, Prometheus uses exporters to scrape data at the pod level and sends those metrics to a Prometheus server where users can query that data. A Telegraf instance performs the same scraping function as a Prometheus exporter, and InfluxDB, a time series database, serves the same function as the Prometheus server. 

There are a number of things that differentiate Telegraf. One major difference is that Telegraf is environment agnostic. Something like Prometheus is designed to work in a Kubernetes environment; if you want to collect data from external or legacy systems or technologies, you need to build custom exporters for each one to expose those endpoints.

Telegraf can collect metrics from pretty much any system or environment. At present there are over 200 Telegraf input plugins and that number just keeps increasing. That’s a huge advantage right there because it’s rare to have an ecosystem based on a single technology, like Kubernetes. So, while Telegraf can collect metrics within a Kubernetes environment, just like Prometheus, it can also collect metrics from non-Kubernetes environments with very little configuration.

Q4. So why should developers choose Telegraf and Telegraf Operator over another solution?

At InfluxData it comes down to flexibility. Telegraf is designed to be flexible. If a developer works in Kubernetes and only Kubernetes, and already has monitoring in place, like Prometheus, then the incentive to change things up is pretty low.

But as a company that’s planted firmly in the open-source world, it’s very rare (if ever!) that we see something like that in production. The moment a developer thinks that they want to expand metric collection beyond the scope of a Kubernetes environment, it’s worth looking at Telegraf.

If a developer is starting from scratch, they may want the flexibility that Telegraf offers. However, if a developer already has another solution in place, they can use Telegraf to either extend their current capabilities, or to replace an element of their current solution without the need to scrap everything and start over.

To bring this back to Telegraf Operator, once a developer figures out what they want to collect and how they plan to do so, Telegraf Operator automatically installs Telegraf instances to ingest that data. Not only does Telegraf make metric collection more flexible, but Telegraf Operator scales that flexibility automatically.

………………………………………………..

Russ Savage, Product Manager, InfluxData

Russ Savage is a Product Manager at InfluxData where he focuses on enabling DevOps for teams using InfluxDB and the TICK Stack. He has a background in computer engineering and has been focused on various aspects of enterprise data for the past 10 years. Russ has previously worked at Cask Data, Elastic, Box, and Amazon. When Russ is not working at InfluxData, he can be seen speeding down the slopes on a pair of skis.

Sponsored by InfluxData.
