Monitoring

📁 See the Monitor Folder →

Monitoring is implemented inside the EKS Cluster in order to observe its resources. This ensures a proactive approach to potential failures or malfunctions in production.

Monitoring is built on five key principles:

  • Infrastructure Monitoring;
  • Application Tracing;
  • Log Management;
  • Alerting;
  • Visualizations.

For each of these principles, a tool has been deployed on Kubernetes:

  • Prometheus for Infrastructure Monitoring;
  • Jaeger for Application Tracing;
  • Loki for Log Management;
  • AlertManager for Alerting;
  • Grafana for Visualizations.

Persistent Storage

Monitoring tools continuously retrieve and analyze data in real time. In production, we want this data to persist over time, so that it can still be analyzed with these tools several weeks or even several months later. It is therefore necessary to provide the monitoring tools with persistent storage in production.

Each of the tools described below is automatically equipped with a Persistent Volume (PV) via a Persistent Volume Claim (PVC), which provides it with an EBS gp2 volume configured via a Storage Class (SC).
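
As an illustration, the manifests below sketch what such a Storage Class and claim look like. This is only a sketch: the claim name, namespace and 10Gi size are placeholders, and in practice the claims are created automatically by the tools' charts and operators.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
provisioner: kubernetes.io/aws-ebs        # in-tree EBS provisioner
parameters:
  type: gp2                               # general-purpose SSD
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data                   # placeholder name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 10Gi                       # placeholder size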

📁 See Persistent Storage Folder →

These are the automatically generated volumes for each monitoring tool:

Persistent Volumes

Prometheus + Grafana + AlertManager

Prometheus, Grafana and AlertManager are all preconfigured and can be deployed quickly through the kube-prometheus project, maintained by the Prometheus Operator team and community.

📁 See kube-prometheus Folder →

The kube-prometheus project provides out of the box a complete metrics configuration for a Kubernetes cluster in Prometheus, a set of dashboards for Grafana and multiple alerts for AlertManager. For more specific needs, the initial configuration of the project can be customized through jsonnet files or by directly modifying the generated YAML configuration.
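
As an example of such a customization, one possible way to add an extra alerting rule is to apply an additional PrometheusRule resource. The sketch below is illustrative only: the rule name, threshold and severity are placeholders, and the labels must match the rule selector of the deployed Prometheus (the ones shown are assumed defaults).

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-rules                      # placeholder name
  namespace: monitoring
  labels:
    prometheus: k8s                       # assumed to match the Prometheus rule selector
    role: alert-rules
spec:
  groups:
    - name: custom.rules
      rules:
        - alert: PodHighMemory            # hypothetical alert
          expr: container_memory_working_set_bytes{namespace="default"} > 1e9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: A pod has been using more than 1GB of memory for 5 minutes.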

Prometheus

We use port-forwarding to get to the Prometheus User Interface:

kubectl -n monitoring port-forward svc/prometheus-operated 9090

And by going to the “Targets” tab, you can see all the targets scraped by Prometheus:

Prometheus Targets

Grafana

We use port-forwarding to get to the Grafana User Interface:

kubectl -n monitoring port-forward svc/grafana 3000

And we have access to multiple dashboards, including a dashboard showing the memory and CPU used by the pods of the EKS cluster:

Grafana Dashboard

AlertManager

We use port-forwarding to get to the AlertManager User Interface:

kubectl -n monitoring port-forward svc/alertmanager-main 9093

And we access the list of alerts:

AlertManager
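
How these alerts are grouped and routed to notification channels is driven by the AlertManager configuration (with kube-prometheus it typically lives in a Secret mounted into the AlertManager pods). The sketch below only shows the general shape, with a generic webhook receiver as a placeholder, since the actual receivers depend on the notification tooling used.

route:
  receiver: default                       # fallback receiver
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  repeat_interval: 12h
receivers:
  - name: default
    webhook_configs:
      - url: http://example.internal/alerts   # placeholder endpoint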

Loki

Loki was created by the team behind Grafana, so it integrates very easily with Grafana. Simply add Loki to the EKS cluster via the Helm package manager, and configure Grafana to add Loki as the default datasource.
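
For example, an installation along these lines is possible with Helm (the repository URL is Grafana's public chart repository; the chart and release names are assumptions and depend on the chart version used):

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack --namespace monitoring

Grafana can then be pointed at Loki with a datasource provisioning file similar to this sketch (the service URL is a placeholder):

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.monitoring.svc:3100   # placeholder Loki service address
    isDefault: true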

Here is an example of logs retrieved with Loki, filtered on the application’s pods:

Loki Logs
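
Such a view corresponds to a LogQL query of the following form (the label names and values are placeholders for the application’s pods):

{namespace="app", container="api"} |= "error"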

Jaeger + OpenTelemetry + ElasticSearch

📁 See Jaeger Folder →

Jaeger takes more work to deploy, since it requires the integration of two additional tools:

  • Using OpenTelemetry, which takes care of generating application traces and sending them to the Jaeger Collector;
  • Deploying ElasticSearch as a storage tool for Jaeger traces.

OpenTelemetry

In order to collect traces from the application, OpenTelemetry uses a file named tracing.js, which takes care of generating the traces and sending them to the Jaeger Collector. 📃 See tracing.js file →

There is no need to modify the application’s code: OpenTelemetry automatically measures the duration of each of the application’s functions.
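
One common way to load such a file, assuming a Node.js application, is to preload it when the process starts; the entry point name below is a placeholder:

node --require ./tracing.js server.js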

ElasticSearch

Jaeger uses Cassandra or ElasticSearch to store traces in production, and its documentation strongly advises prioritizing ElasticSearch. We therefore deploy three ElasticSearch pods, each with a dedicated volume.

📃 See ElasticSearch configuration file →
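
For reference, a production Jaeger instance backed by ElasticSearch has roughly the following shape when described as a Jaeger custom resource, assuming the Jaeger Operator is installed (the name matches the simple-prod-query service used below; the ElasticSearch URL is a placeholder):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production                    # separate collector and query pods
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200   # placeholder ElasticSearch address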

Jaeger UI

We use port-forwarding to get to the Jaeger UI:

kubectl port-forward svc/simple-prod-query 16686

And we can observe the traces of the application:

Jaeger UI

Bonus: Goldilocks

As previously explained, the Vertical Pod Autoscaler generates a recommendation on the compute resources needed by the pods in the cluster. This recommendation can be monitored in real time using Goldilocks, which is deployed in the cluster.
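
Goldilocks only shows recommendations for namespaces that have been opted in, which is typically done by labelling the namespace; for example (the namespace name is a placeholder):

kubectl label namespace default goldilocks.fairwinds.com/enabled=true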

We use port-forwarding to get to the Goldilocks User Interface:

kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80

And here’s a look at a VPA recommendation prettied up by Goldilocks:

Goldilocks