Monitor Kubernetes Control Plane Services Availability with Heartbeat [ELK]

Introduction

Kubernetes is the most popular container orchestrator currently available. Most major cloud providers, including Azure, AWS, and GCP, already offer it as a managed service, which shows how quickly it has been adopted.

There are multiple aspects to monitoring a Kubernetes cluster and its services using ELK and Beats.

For example, you can use Metricbeat to collect resource metrics from nodes, pods, and containers, and Filebeat to collect system and container logs. In this article, however, we look specifically at how to monitor Kubernetes control plane services using Heartbeat.

Kubernetes Control Plane

The Kubernetes control plane is responsible for coordinating with each node in the cluster, assigning work through pod scheduling, providing administrative interfaces to the cluster, and managing cluster-wide health and services.

The following article from Rancher highlights the key components of the Kubernetes architecture: "Introduction to Kubernetes Architecture" (rancher.com)

Kubernetes Control Plane Availability

Control plane services must be fully available for the cluster to operate normally, so monitoring them is very important from an operations perspective.

To monitor these services with Heartbeat, we need to deploy the agent as a DaemonSet so that it runs on each node. We will use a Helm chart to deploy Heartbeat as a DaemonSet and to configure the required monitors.

Refer to the following Git repository to get started: Abmun/helm-charts (github.com)

Monitors

We have six services to monitor in total. Four of them (kube-apiserver, kube-scheduler, kube-controller-manager, and etcd) run only on master nodes, and two (kubelet and kube-proxy) run on the worker nodes as well.

The monitors therefore have to be configured so that the master-only services are checked only on nodes that carry the master label. By default, the label below is set on master nodes, so we can use it to differentiate master and worker nodes. Newer kubeadm releases use node-role.kubernetes.io/control-plane instead, so check which label your nodes actually carry.

node-role.kubernetes.io/master
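To make the label concrete, here is a minimal sketch of how it typically appears on a Node object. The node name is hypothetical. The label normally has an empty value, which is why the monitor condition later in this article compares against "". Note also that Beats, by default, replaces the dots in label names with underscores, which is why the condition refers to node-role_kubernetes_io/master rather than the original label name.

```yaml
# Excerpt of a Node object as returned by the Kubernetes API.
# "master-1" is a hypothetical node name used for illustration only.
apiVersion: v1
kind: Node
metadata:
  name: master-1
  labels:
    kubernetes.io/hostname: master-1
    # Role label with an empty value; only its presence matters.
    node-role.kubernetes.io/master: ""
```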

We should use autodiscover for these monitors so that when nodes are added to or removed from the cluster, Heartbeat discovers them automatically and enables the monitors.

Below is an example of control plane monitors. You can change the conditions and ports according to your cluster configuration.

heartbeat.autodiscover:
  providers:
    # Provider 1: services that run on every node (kubelet, kube-proxy).
    - type: kubernetes
      host: ${NODE_NAME}
      templates:
        - condition:
            contains:
              kubernetes.namespace: kube-system
          config:
            - type: tcp
              name: Kubelet_Service_Monitor
              hosts: ["${data.kubernetes.node.name}:10250"]
              ipv4: true
              schedule: "@every 5s"
            - type: tcp
              name: Kube_Proxy_Monitor
              hosts: ["${data.kubernetes.node.name}:10256"]
              ipv4: true
              schedule: "@every 5s"
    # Provider 2: services that run only on master nodes.
    - type: kubernetes
      host: ${NODE_NAME}
      # Attach the node's master label to events so the condition can match it.
      add_resource_metadata:
        node:
          enabled: true
          include_labels: ["node-role.kubernetes.io/master"]
      templates:
        - condition:
            contains:
              # Dots in the label name are replaced with underscores.
              kubernetes.node.labels.node-role_kubernetes_io/master: ""
          config:
            - type: tcp
              name: Kube-Scheduler-Monitor
              hosts: ["${NODE_NAME}:10259"]
              schedule: "@every 5s"
              timeout: 5s
            - type: tcp
              name: Kube-Controller-Manager-Monitor
              hosts: ["${NODE_NAME}:10257"]
              schedule: "@every 5s"
              timeout: 5s
            - type: tcp
              name: Kube-API-Monitor
              hosts: ["${NODE_NAME}:6443"]
              schedule: "@every 5s"
              timeout: 5s
            - type: tcp
              name: ETCD-Service-Monitor
              hosts: ["${NODE_NAME}:2379"]
              schedule: "@every 5s"
              timeout: 5s
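The configuration above relies on a ${NODE_NAME} environment variable being present in the Heartbeat container. A common way to provide it is the Kubernetes downward API, as in the minimal sketch below. This is an illustration rather than the chart's actual template: the resource names, namespace, labels, service account, and image tag are all assumptions, and the toleration is only needed if your master nodes are tainted against regular workloads.

```yaml
# Hypothetical DaemonSet excerpt; names, namespace, and image tag are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: heartbeat
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: heartbeat
  template:
    metadata:
      labels:
        app: heartbeat
    spec:
      serviceAccountName: heartbeat
      # Allow scheduling on tainted master nodes so their services get monitored.
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      containers:
        - name: heartbeat
          image: docker.elastic.co/beats/heartbeat:7.17.0
          env:
            # Expose the name of the node this pod runs on as NODE_NAME.
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
```

With this in place, each Heartbeat pod resolves ${NODE_NAME} to its own node, so every node checks the ports of the services running locally.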

Heartbeat Uptime View

[Screenshot: Heartbeat monitors in the Uptime view]

Once you have all the availability data in ELK, you can set up dashboards and alerts on these monitors to get notified as soon as any of the services is reported down.

NOTE: For a tutorial on how to set up ELK and Beats agents with a Kubernetes cluster, refer to the following article: "Setup and operate ELK Stack on Kubernetes cluster using Argo CD" (medium.com)

Access/Permission error:

If you encounter an access denied or permission error with the above monitors, make sure to add nodes and services to the resources listed in the Heartbeat cluster role.
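As a rough sketch of what such a cluster role might look like, the example below grants read-only access to the resources the Kubernetes autodiscover provider usually watches. The role name is an assumption and should match whatever your Helm chart or manifests create. The role also has to be bound to the Heartbeat service account through a ClusterRoleBinding.

```yaml
# Hypothetical ClusterRole for Heartbeat; the name "heartbeat" is a placeholder.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: heartbeat
rules:
  - apiGroups: [""]   # "" refers to the core API group
    resources:
      - nodes
      - namespaces
      - pods
      - services
    verbs: ["get", "list", "watch"]
```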


Please reach out in case of any issues or queries.
