Kubernetes Logging and Monitoring – a simple guide

Last Updated on February 14, 2023 by cscontents

Introduction

Kubernetes monitoring means monitoring the Kubernetes cluster and the pods running in it. Monitoring any system or application largely comes down to checking its logs & metrics. Similarly, Kubernetes monitoring means keeping an eye on the various logs & metrics generated by the Kubernetes cluster and its pods.

To monitor those logs & metrics we need tools which fetch them to a centralized place. Such a tool can be self-hosted or cloud-hosted, and there are many options available in the market (e.g., the ELK stack, Dynatrace, Splunk, etc.).

In this article we will try to understand Kubernetes logs and metrics and how to monitor them.

If you want to know about monitoring in general – what monitoring is and why we need it – please head over to the article below.

What is “Monitoring” in DevOps? Why do we need to Monitor App/DB servers, Transactions etc.?

Why do we need to monitor a Kubernetes cluster & its nodes and pods?

Below are some important reasons –

  • Kubernetes pods are ephemeral in nature, and when a pod dies its logs are wiped off with it. So we need to set up a tool which collects those logs, so that we can find the reason behind any issue by checking them.
  • It’s very cumbersome and time-consuming to check Kubernetes logs by running the ‘kubectl logs’ command from the CLI (see the example commands after this list).
  • Not only logs – it is also very time-consuming to check the various metrics (e.g., CPU usage, memory usage, etc.) of a Kubernetes cluster from the CLI, yet checking these metrics is crucial. You won’t know how much resource a specific application pod is consuming, or the health status of the cluster, unless you manually run CLI commands to check them.
  • From a business point of view, if you don’t check the logs & metrics in time when there is an issue, your business or your client’s business will be impacted.
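
For illustration, below are the kinds of manual CLI checks described above – workable for a single pod, but impractical across a whole cluster. The pod name and namespace are hypothetical placeholders, and the ‘kubectl top’ commands require the metrics-server add-on to be installed.

    # Logs of a single pod (lost once the pod is deleted)
    kubectl logs my-app-pod -n my-namespace

    # Logs of the previous (crashed) instance of the pod's container
    kubectl logs my-app-pod -n my-namespace --previous

    # Node-level CPU & memory usage
    kubectl top nodes

    # Pod-level CPU & memory usage in one namespace
    kubectl top pods -n my-namespace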

4 levels/layers of logs/metrics in Kubernetes monitoring

If we look at the Kubernetes architecture, we will see there are 4 layers/levels –

  • Cluster level – It is the topmost level. Here our concern is the overall cluster health status and how the cluster as a whole is performing.
  • Node level – It is the 2nd layer from the top. We must check how the individual nodes are performing, their resource (CPU & memory) utilization, etc.
  • Pod level – It is the 3rd layer from the top. Pods are the smallest deployable units in Kubernetes. The logs generated by the individual pods and their metrics are very important.
  • Container level – It is the bottom layer. This level comes into the picture because in some cases we need to run multiple containers inside the same pod.

Kubernetes logging & monitoring

Different types of logs in Kubernetes Cluster

There are various types of logs generated in a Kubernetes cluster –

  • Logs generated by the various Kubernetes components (e.g., the API server, kubelet, etc.).
  • Logs generated by the application or application pod. These logs are crucial to ensure our application is running properly. There can be two types of application logs –
    • Logs which are written to stdout/stderr.
    • Logs which are not written to stdout/stderr, or which are not in a standard format.
  • If there are multiple containers inside a pod, then we might need to check the logs of each of those containers individually (see the example below).
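
As a small illustration of the last point (the pod and container names here are hypothetical), ‘kubectl logs’ needs the -c flag once a pod has more than one container:

    # List the containers inside the pod
    kubectl get pod my-app-pod -o jsonpath='{.spec.containers[*].name}'

    # Fetch the logs of each specific container
    kubectl logs my-app-pod -c app-container
    kubectl logs my-app-pod -c log-sidecar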

Different types of metrics in a Kubernetes Cluster

There are many important metrics –

  • At node level –
    • CPU utilization
    • Memory utilization
    • Load
    • Network traffic
    • Log rate
  • Overall health status of the cluster.
  • Resource (CPU & memory) utilization of each application pod.
  • At container level –
    • CPU utilization
    • Memory utilization
    • Inbound traffic
    • Outbound traffic
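
Under the hood, most of these numbers come from the kubelet running on each node, which is also what the monitoring agents discussed below scrape. For a quick ad-hoc look (node and pod names are placeholders; ‘kubectl top’ needs metrics-server):

    # Container-level CPU & memory usage of one pod
    kubectl top pod my-app-pod --containers

    # Raw node/pod/container stats from one node's kubelet Summary API
    kubectl get --raw "/api/v1/nodes/my-node-1/proxy/stats/summary"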

How to fetch/collect logs from Kubernetes?

  • To fetch the logs of the Kubernetes components or system logs, we can run a logging agent as a daemonset, which means the logging agent will run as a separate pod on each node.
  • If we run the logging agent as a daemonset, it will also collect logs from the application pods. For application pods there are two cases –
    • Logs which are written to stdout/stderr will be collected by the logging agent running as a daemonset.
    • Logs which are not written to stdout/stderr, or which are not in a standard format, will not be collected by the logging agent by default.
    • To collect the application logs which are not written to stdout/stderr, we need to run a sidecar container (e.g., busybox) alongside the application container, so two containers in total run inside the application pod. The sidecar container continuously reads those log files and writes them to its own stdout/stderr, and once the logs appear on stdout/stderr they are collected by the logging agent running as a daemonset (see the sketch after this list).
    • While running a sidecar container with the application container, we must keep an eye on the sidecar’s resource utilization; otherwise it can impact the application container.
  • In some cases you might be interested only in the logs of the application pod. In that case you don’t need to run the logging agent as a daemonset; instead, you can run the logging agent itself as a sidecar container with the application container (again, two containers in total inside the application pod). The logging agent running as a sidecar will collect all the application logs and send them to a centralized location.
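
Below is a minimal sketch of the busybox sidecar pattern described above. It assumes the application writes its log to a file (here /var/log/app/app.log – a hypothetical path) on a shared emptyDir volume; the sidecar tails that file to its own stdout, from where the daemonset logging agent picks it up.

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-with-log-sidecar        # hypothetical name
    spec:
      containers:
      - name: app
        image: my-app:1.0               # hypothetical application image
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
      # Sidecar: streams the file-based log to stdout so the node-level
      # logging agent (running as a daemonset) can collect it
      - name: log-sidecar
        image: busybox:1.36
        args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app/app.log']
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
      volumes:
      - name: app-logs
        emptyDir: {}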

For example, we can use Filebeat (from the Elastic stack), Fluentd, etc. as the logging agent, deployed as a daemonset. In the daemonset manifest file we need to provide the details of the backend or centralized location where the logs will be sent and stored (a trimmed sketch follows below).

Kubernetes logging agent as daemonset and sidecar
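
A trimmed sketch of what such a daemonset setup might look like with Filebeat – the Elasticsearch URL is an assumed placeholder, and a real deployment also needs a ServiceAccount/RBAC and a ConfigMap mount for filebeat.yml, which are omitted here for brevity.

    # filebeat.yml – collect container logs and ship them to the backend
    filebeat.inputs:
    - type: container
      paths:
        - /var/log/containers/*.log
    output.elasticsearch:
      hosts: ["https://elasticsearch.example.internal:9200"]   # assumed backend

    # DaemonSet – one Filebeat pod per node, reading the node's log directory
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: filebeat
      namespace: kube-system
    spec:
      selector:
        matchLabels: {app: filebeat}
      template:
        metadata:
          labels: {app: filebeat}
        spec:
          containers:
          - name: filebeat
            image: docker.elastic.co/beats/filebeat:8.6.0
            volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
          volumes:
          - name: varlog
            hostPath:
              path: /var/log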

How to fetch/collect metrics from Kubernetes?

Just as we run a logging agent to fetch the logs from Kubernetes, we need to run an agent to fetch the metrics. Here too we run the agent as a daemonset, which ensures one pod of that agent runs on each node and collects all the metrics from that node. From these metrics we will be able to see the health of each node and the overall cluster health.

If there is any application-specific metric which is not collected by this agent, we might need to run such an agent as a sidecar container inside the application pod, keeping in mind that the sidecar container should not affect the performance of the application container.

For example, we can use Metricbeat from the Elastic stack to fetch metrics from Kubernetes (see the configuration sketch below).
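
A trimmed Metricbeat configuration sketch for the daemonset case – the kubelet endpoint below matches Elastic’s reference setup, while the Elasticsearch URL is an assumed placeholder:

    # metricbeat.yml – run as a daemonset, one pod per node
    metricbeat.modules:
    # Node, pod & container metrics from the local node's kubelet
    - module: kubernetes
      metricsets: ["node", "pod", "container", "volume"]
      period: 10s
      hosts: ["https://${NODE_NAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"    # fine for a lab; tighten in production
    # Host-level CPU, memory, load & network metrics
    - module: system
      metricsets: ["cpu", "memory", "load", "network"]
      period: 10s
    output.elasticsearch:
      hosts: ["https://elasticsearch.example.internal:9200"]   # assumed backend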

Use some visualization tool

Once we have the logs & metrics in a centralized location, it is very important to create some visualization charts/dashboards so that we can use those logs & metrics to resolve any issue as soon as we find one. In some cases they even let us anticipate upcoming issues.

Once the dashboard is ready we can share its URL with the other members of the team so that they can use it. It is crucial that everybody understands the dashboard properly, so they can read the logs from the dashboard itself without needing to run CLI commands.

Example: one great tool is Kibana, which comes with the Elastic (ELK) stack. It is a powerful visualization tool. Usually Kibana is configured with Elasticsearch (where the logs & metrics are stored), and Kibana fetches the data from Elasticsearch to show it on Kibana dashboards.
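
For instance, if the agents above ship their data with Kubernetes metadata enabled, a Kibana (KQL) search like the hypothetical one below can narrow a dashboard down to the error output of a single namespace:

    kubernetes.namespace : "my-namespace" and stream : "stderr"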

Conclusion

Throughout this article we saw what the various logs & metrics in a Kubernetes environment are and how to fetch them to a centralized place for visualization.

Final thoughts – it is very important to set up a monitoring tool in your Kubernetes cluster so that you know the cluster health, node health, health of the application pods, etc. If you depend on a manual process to check the logs & metrics, it will delay your response, and in a production environment that kind of delay might not be acceptable. So we need to think it through and act carefully.

 

Thank You.

If you are interested in learning DevOps, please have a look at the below articles, which will help you greatly.