Setup Prometheus and Grafana on Kubernetes using prometheus-operator


Monitoring production Kubernetes clusters is an important, ongoing task for any cluster administrator. There is a myriad of solutions in the Kubernetes monitoring space, and two of the most popular are Prometheus and Grafana. This guide is intended to walk Kubernetes users through setting up Prometheus and Grafana on Kubernetes using prometheus-operator.

Prometheus is a full-fledged monitoring solution that gives developers and SysAdmins access to advanced metrics capabilities in Kubernetes. By default, metrics are scraped at a 30-second interval. The information collected includes resources such as memory, CPU, disk performance, and network I/O, as well as read/write rates. By default the metrics are retained in your cluster for up to 14 days, but this setting can be adjusted to suit your environment.

Grafana is used for analytics and interactive visualization of the metrics collected and stored in the Prometheus database. You can create custom charts, graphs, and alerts for your Kubernetes cluster, with Prometheus as the data source. In this guide we will install both Prometheus and Grafana on a Kubernetes cluster. For this setup a working kubectl configuration is required, with a cluster-admin role binding.

Prometheus Operator

We will be using the Prometheus Operator in this installation to deploy the Prometheus monitoring stack on Kubernetes. The Prometheus Operator is written to ease the deployment and overall management of Prometheus and its related monitoring components. By using the Operator we simplify and automate Prometheus configuration on any Kubernetes cluster using Kubernetes custom resources.

The diagram below shows the components of the Kubernetes monitoring that we’ll deploy:

[Diagram: Prometheus Operator monitoring components on Kubernetes]

The Operator uses the following custom resource definitions (CRDs) to deploy and configure Prometheus monitoring stack:

  • Prometheus – This defines a desired Prometheus deployment on Kubernetes
  • Alertmanager – This defines a desired Alertmanager deployment on Kubernetes cluster
  • ThanosRuler – This defines Thanos desired Ruler deployment.
  • ServiceMonitor – Specifies how groups of Kubernetes services should be monitored
  • PodMonitor – Declaratively specifies how group of pods should be monitored
  • Probe – Specifies how groups of ingresses or static targets should be monitored
  • PrometheusRule – Defines a desired set of Prometheus alerting and recording rules. The Operator generates a rule file, which can be used by Prometheus instances.
  • AlertmanagerConfig – Declaratively specifies subsections of the Alertmanager configuration, allowing routing of alerts to custom receivers, and setting inhibit rules.
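As an illustration of how these custom resources are used, a ServiceMonitor tells Prometheus which Services to scrape, matched by label. The sketch below is a hypothetical example: the example-app name, the app label, and the port name web are assumptions for illustration, not part of this stack.

```shell
# Hypothetical ServiceMonitor: scrape every Service labeled app=example-app
# on its port named "web" every 30 seconds. Adjust the names and labels
# to match your own application before applying.
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
    interval: 30s
EOF
```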

Deploy Prometheus / Grafana Monitoring Stack on Kubernetes

To get a complete monitoring stack we will use the kube-prometheus project, which includes the Prometheus Operator among its components. The kube-prometheus stack is meant for cluster monitoring and comes pre-configured to collect metrics from all Kubernetes components, with a default set of dashboards and alerting rules.

You should have kubectl configured and confirmed to be working:

$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.10.12:6443
CoreDNS is running at https://192.168.10.12:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Step 1: Clone kube-prometheus project

Use git command to clone kube-prometheus project to your local system:

git clone https://github.com/prometheus-operator/kube-prometheus.git

Navigate to the kube-prometheus directory:

cd kube-prometheus
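The main branch tracks recent Kubernetes releases, so you may want to check out a release branch that matches your cluster version; the compatibility matrix in the project README lists which branch supports which Kubernetes version. For example, assuming release-0.13 is the right branch for your cluster:

```shell
# List the available release branches, then check one out
# (release-0.13 is only an example; pick the branch matching your cluster)
git branch -r | grep release
git checkout release-0.13
```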

Step 2: Create monitoring namespace and CustomResourceDefinitions

Create a namespace and required CustomResourceDefinitions:

kubectl create -f manifests/setup

The command output should look similar to this:

customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
namespace/monitoring created

The namespace created with CustomResourceDefinitions is named monitoring:

$ kubectl get ns monitoring
NAME         STATUS   AGE
monitoring   Active   2m41s
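Before deploying the rest of the stack, it helps to wait until the new CustomResourceDefinitions are established in the API server; otherwise the next kubectl create can race against CRD registration:

```shell
# Block until every CRD reports the Established condition
# (CRDs are cluster-scoped, so the namespace flag is harmless here)
kubectl wait \
  --for condition=Established \
  --all CustomResourceDefinition \
  --namespace=monitoring
```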

Step 3: Deploy Prometheus Monitoring Stack on Kubernetes

Once the CustomResourceDefinitions are in place you can go ahead and deploy the Prometheus monitoring stack.

kubectl create -f manifests/

Here is my deployment progress output:

poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
servicemonitor.monitoring.coreos.com/prometheus-operator created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created

Give it a few minutes and the pods should start coming online. This can be checked with the command below:

$ kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS        AGE
alertmanager-main-0                    2/2     Running   0               3m8s
alertmanager-main-1                    2/2     Running   1 (2m55s ago)   3m8s
alertmanager-main-2                    2/2     Running   1 (2m40s ago)   3m8s
blackbox-exporter-69684688c9-nk66w     3/3     Running   0               6m47s
grafana-7bf8dc45db-q2ndq               1/1     Running   0               6m47s
kube-state-metrics-d75597b45-d9bhk     3/3     Running   0               6m47s
node-exporter-2jzcv                    2/2     Running   0               6m47s
node-exporter-5k8pk                    2/2     Running   0               6m47s
node-exporter-9852n                    2/2     Running   0               6m47s
node-exporter-f5dmp                    2/2     Running   0               6m47s
prometheus-adapter-5f68766c85-hjcz9    1/1     Running   0               6m46s
prometheus-adapter-5f68766c85-shjbz    1/1     Running   0               6m46s
prometheus-k8s-0                       2/2     Running   0               3m7s
prometheus-k8s-1                       2/2     Running   0               3m7s
prometheus-operator-748bb6fccf-b5ppx   2/2     Running   0               6m46s

To list all the services created, run:

$ kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP   10.100.171.41    <none>        9093/TCP,8080/TCP            7m2s
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m23s
blackbox-exporter       ClusterIP   10.108.187.73    <none>        9115/TCP,19115/TCP           7m2s
grafana                 ClusterIP   10.97.236.243    <none>        3000/TCP                     7m2s
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            7m2s
node-exporter           ClusterIP   None             <none>        9100/TCP                     7m2s
prometheus-adapter      ClusterIP   10.109.119.234   <none>        443/TCP                      7m1s
prometheus-k8s          ClusterIP   10.101.253.211   <none>        9090/TCP,8080/TCP            7m1s
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     3m22s
prometheus-operator     ClusterIP   None             <none>        8443/TCP                     7m1s

Step 4: Access Prometheus, Grafana, and Alertmanager dashboards

We now have the monitoring stack deployed, but how do we access the Grafana, Prometheus, and Alertmanager dashboards? There are two ways to achieve this.

Method 1: Accessing Prometheus UI and Grafana dashboards using kubectl port-forward

An easy way to access Prometheus, Grafana, and Alertmanager dashboards is by using kubectl port-forward once all the services are running:

Grafana Dashboard
kubectl --namespace monitoring port-forward svc/grafana 3000

Then access the Grafana dashboard in your local browser at http://localhost:3000
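With the port-forward running in another terminal, you can optionally verify that Grafana is up before opening the browser; Grafana's HTTP API exposes a health endpoint for this:

```shell
# Returns a small JSON document with "database": "ok" when Grafana is healthy
curl -s http://localhost:3000/api/health
```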


The default login credentials are:

Username: admin
Password: admin

You’re required to change the password on first login:


Prometheus Dashboard

For Prometheus port forwarding run the command below:

kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090

The web console is then accessible at http://localhost:9090
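You can also query the Prometheus HTTP API directly through the port-forward; for example, the built-in up metric reports which scrape targets are reachable:

```shell
# Instant query against the Prometheus API; returns JSON with one
# result per scrape target (a value of 1 means the target is up)
curl -s 'http://localhost:9090/api/v1/query?query=up'
```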


Alert Manager Dashboard

For the Alertmanager dashboard:

kubectl --namespace monitoring port-forward svc/alertmanager-main 9093

Access URL is http://localhost:9093
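Similarly, Alertmanager exposes a v2 HTTP API you can probe through the port-forward:

```shell
# Returns Alertmanager status as JSON, including cluster peers and version info
curl -s http://localhost:9093/api/v2/status
```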

Method 2: Accessing Prometheus UI and Grafana dashboard using NodePort / LoadBalancer

To access the Prometheus, Grafana, and Alertmanager dashboards using a worker node's IP address and a port, you have to edit the services and set the type to NodePort.

You need a Load Balancer implementation in your cluster to use service type LoadBalancer.

The NodePort method is only recommended for local clusters that are not exposed to the internet, since the Prometheus and Alertmanager services are not secured by default.

Prometheus:

# If you need NodePort
kubectl --namespace monitoring patch svc prometheus-k8s -p '{"spec": {"type": "NodePort"}}'

# If you have a working LoadBalancer
kubectl --namespace monitoring patch svc prometheus-k8s -p '{"spec": {"type": "LoadBalancer"}}'

Alertmanager:

# If you need NodePort
kubectl --namespace monitoring patch svc alertmanager-main -p '{"spec": {"type": "NodePort"}}'

# If you have a working LoadBalancer
kubectl --namespace monitoring patch svc alertmanager-main -p '{"spec": {"type": "LoadBalancer"}}'

Grafana:

# If you need NodePort
kubectl --namespace monitoring patch svc grafana -p '{"spec": {"type": "NodePort"}}'

# If you have a working LoadBalancer
kubectl --namespace monitoring patch svc grafana -p '{"spec": {"type": "LoadBalancer"}}'

Confirm that each of the services has a NodePort assigned or a LoadBalancer IP address:

$ kubectl -n monitoring get svc  | grep NodePort
alertmanager-main       NodePort    10.254.220.101   <none>        9093:31237/TCP               45m
grafana                 NodePort    10.254.226.247   <none>        3000:31123/TCP               45m
prometheus-k8s          NodePort    10.254.92.43     <none>        9090:32627/TCP               45m

$ kubectl -n monitoring get svc  | grep LoadBalancer
grafana                 LoadBalancer   10.97.236.243    192.168.1.31   3000:30513/TCP               11m

In this example we can access the services as below:

# Grafana
NodePort: http://node_ip:31123
LB: http://lb_ip:3000

# Prometheus
NodePort: http://node_ip:32627
LB: http://lb_ip:9090

# Alert Manager
NodePort: http://node_ip:31237
LB: http://lb_ip:9093

An example of a default Grafana dashboard showing cluster-wide compute resource usage:


Destroying / Tearing down Prometheus monitoring stack

If at some point you want to tear down the Prometheus monitoring stack in your Kubernetes cluster, run kubectl delete and pass the paths to the manifest files we used during deployment.

kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

Within a few minutes the stack is deleted, and you can re-deploy it if that was the intention.
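Note that deleting the monitoring namespace and its resources can take a little while to finish; you can confirm everything is gone with:

```shell
# Should eventually report Error from server (NotFound) once teardown completes
kubectl get ns monitoring
```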

