How To Set Up Monitoring Using Prometheus and Grafana
Monitoring our microservices is as important as developing them. In this article, we see how you can monitor your microservices using Prometheus and Grafana.
Introduction
So you have finally written and deployed your first microservice? Or maybe you have decided to embark on the microservices adventure to future-proof yourself? Either way, congratulations!
It’s time to take the next step. It's time to set up monitoring!
Why Setting Up Monitoring Is Important
Monitoring is super important. It's the part of your system that lets you know what’s going on in your app. And it isn’t just a dashboard with fancy charts.
Monitoring is the systematic process of aggregating actionable metrics and logs.
The keyword here is actionable. You are collecting all these metrics and logs to make decisions based on them.
For example, you would want to track the health of your VMs & microservices to ensure you have enough healthy capacity to service user requests. You’d also like to trigger emails & notifications when failures cross a certain threshold.
This is what monitoring helps you achieve. This is why you need to set up monitoring.
As is clearly evident, your monitoring stack is a source of truth that several automated processes can be built on. So it's really important to make sure your monitoring stack is reliable and that it can scale with your application.
Prometheus has become the go-to monitoring stack in recent times. Its novel pull-based architecture, along with its in-built support for alerting, makes it an ideal choice for a wide variety of workloads.
In this article, we’ll use Prometheus to set up monitoring. For visualizations, we’ll use Grafana.
This guide is available in video format as well. Feel free to refer to it.
I’ve set up a GitHub repository for you guys as well. Use it to reproduce everything we’ll be doing today.
Prometheus and Grafana
Prometheus is a metrics aggregator. Instead of your services pushing metrics to it (as they would to a database), Prometheus pulls metrics from your services.
It expects services to expose an endpoint that serves all their metrics in a particular text format. All we need to do is tell Prometheus the addresses of such services, and it will begin scraping them periodically.
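To make that format concrete, here is a hypothetical snippet of what a `/metrics` endpoint might serve: plain text with one sample per line, optionally preceded by HELP and TYPE comments. The metric names below are made up for illustration and don't come from our services:

```
# HELP http_requests_total Total number of HTTP requests handled
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="GET",code="500"} 3
# HELP process_resident_memory_bytes Resident memory size in bytes
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 24576000
```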
Prometheus then stores these metrics for a configured duration and makes them available for querying. It pretty much acts like a time-series database in that regard.
Along with that, Prometheus has got first-class support for alerting using AlertManager. With AlertManager, you can send notifications via Slack, email, PagerDuty, and tons of other mediums when certain triggers go off.
All of this makes it super easy to set up monitoring using Prometheus.
While Prometheus and AlertManager provide a well-rounded monitoring solution, we need dashboards to visualize what’s going on within our cluster. That’s where Grafana comes in.
Grafana is a fantastic tool that can create outstanding visualizations. Grafana itself can’t store data, but you can hook it up to various data sources, including Prometheus, and pull metrics from them.
So the way it works is this: we use an aggregator like Prometheus to scrape all the metrics. Then we link Grafana to it to visualize the data in the form of dashboards.
Easy!
What We'll Be Monitoring
Microservices
We’ll be monitoring two microservices today. The first microservice is written by a math genius to perform insane calculations. By insane calculations, I mean adding two numbers. It’s a simple HTTP endpoint that expects two numbers in the request and responds with their sum.
The second one is a polite greeter service that takes a name as a URL parameter and responds with a greeting.
Here’s what the endpoints look like:
| Service | URL | Response |
|---|---|---|
| Math Service | /add/:num1/:num2 | { "value": SUM_OF_NUMBERS } |
| Greeter Service | /greeting/:name | { "greeting": "hello NAME" } |
And here’s the link to the code:
- Math Service: Written in Node.js
- Greeter Service: Written in Golang
Both of these services will be running inside Docker. Neither of them has any code related to monitoring whatsoever. In fact, they don’t even know that they are being monitored.
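To give you an idea of how plain these services are, here's a minimal sketch of what the greeter endpoint could look like in Go. This is not the repo's code; the handler shape and the port are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Respond to /greeting/:name with a JSON greeting
	http.HandleFunc("/greeting/", func(w http.ResponseWriter, r *http.Request) {
		name := strings.TrimPrefix(r.URL.Path, "/greeting/")
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprintf(w, `{"greeting": "hello %s"}`, name)
	})
	// Port 8080 is an assumption; check the repo for the real one
	http.ListenAndServe(":8080", nil)
}
```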
Metrics
We will be monitoring a couple of metrics. This is what the final dashboard will look like.
First is the CPU & memory utilization of our services. Since all of our services will be running in Docker, we’ll use cAdvisor to collect these metrics.
cAdvisor is a really neat tool. It collects container metrics directly from the host and makes them available for Prometheus to scrape. You don’t really need to configure anything for it to work.
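Once cAdvisor is being scraped, queries along these lines give per-container CPU and memory usage. This is a sketch that assumes the compose project is named monitoring, so container names start with monitoring_svc:

```
# CPU usage in cores per service container, averaged over the last minute
sum(rate(container_cpu_usage_seconds_total{name=~"monitoring_svc.*"}[1m])) by (name)

# Memory usage in MiB per service container
sum(container_memory_usage_bytes{name=~"monitoring_svc.*"}) by (name) / 1024 / 1024
```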
We’ll also be collecting HTTP metrics. These are also called L7 (application layer) metrics.
What we are interested in is the requests coming in per second grouped by response status code. Using this metric alone, we can infer the error rates and total throughput of each service.
Now, we could modify our services to collect these metrics and make them available for Prometheus to scrape. But that sounds like a lot of work. Instead, we will put a reverse proxy like HAProxy in front of our services, which can collect these metrics on our behalf.
The reverse proxy plays two roles: it exposes our apps to the outside world, and it collects metrics while it’s at it. Unfortunately, HAProxy doesn’t export metrics in a Prometheus-compatible format. Luckily, the community has built an exporter that can do this for us.
We still have a small problem with this setup. HAProxy won’t capture metrics for direct service-to-service communication since such traffic bypasses it altogether. We can close this loophole entirely by using something called a service mesh. You can read more about it in this article.
Our Final Monitoring Setup
Our final monitoring setup using Prometheus and Grafana will look something like this:
Deploying Our Monitoring Stack
Finally, it’s time to get our hands dirty.
To get our services up, we’ll write a docker-compose file. Docker-compose is an awesome way to describe all the containers we need in a single YAML file. Again, all the resources can be found in this GitHub repo.
```yaml
version: '3'

services:
  ###############################################################
  #                  Our core monitoring stack                  #
  ###############################################################
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"                 # Prometheus listens on port 9090
    volumes:
      # We mount a custom prometheus config file
      # to scrape cAdvisor and HAProxy
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M

  grafana:
    # Grafana needs no config file since we configure it once it's up
    image: grafana/grafana
    ports:
      - "3000:3000"                 # Grafana listens on port 3000
    depends_on:
      - prometheus

  ###############################################################
  #                Agent to collect runtime metrics             #
  ###############################################################
  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    volumes:
      # Don't ask me why I mounted all these directories.
      # I simply copied these mounts from the documentation.
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M

  ###############################################################
  #                           HAProxy                           #
  ###############################################################
  haproxy:
    # We are using HAProxy as our reverse proxy here
    image: haproxy:2.3
    ports:
      - "11000:11000"               # I've configured HAProxy to run on 11000
    volumes:
      # We mount a custom config file to proxy between both the services
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
    depends_on:
      - svc-greeter
      - svc-math

  haproxy-exporter:
    image: prom/haproxy-exporter
    # Need to point the exporter to haproxy
    command: '--haproxy.scrape-uri="http://haproxy:8404/stats;csv"'
    depends_on:
      - haproxy

  ###############################################################
  #                      Our Microservices                      #
  ###############################################################
  svc-greeter:
    # These are our services. Nothing fancy
    image: spaceuptech/greeter
    deploy:
      resources:
        limits:
          cpus: '0.05'
          memory: 512M

  svc-math:
    image: spaceuptech/basic-service
    deploy:
      resources:
        limits:
          cpus: '0.05'
          memory: 512M
```
I have also created custom config files for Prometheus and HAProxy.
For Prometheus, we are setting up scrape jobs for cAdvisor and the HAProxy exporter. All we need to do is give Prometheus the host and port of the targets. Unless told otherwise, Prometheus fires HTTP requests at the `/metrics` endpoint of each target to retrieve metrics.
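For reference, a minimal prometheus.yml along these lines might look like the sketch below. The scrape interval and ports are assumptions (cAdvisor defaults to 8080 and the HAProxy exporter to 9101); the actual file in the repo may differ:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']          # cAdvisor's default metrics port
  - job_name: 'haproxy'
    static_configs:
      - targets: ['haproxy-exporter:9101']  # haproxy-exporter's default port
```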
For HAProxy, we configure one `backend` for each service. We will be splitting traffic between the two microservices based on the incoming request’s URL. We’ll forward the request to the greeter service if the URL begins with `/greeting`. Otherwise, we forward the request to the math service.
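A minimal haproxy.cfg implementing this routing could look like the sketch below. The backend ports and timeouts are assumptions, and the stats listener on 8404 matches the URI the exporter scrapes; the repo's config may differ:

```
defaults
  mode http
  timeout connect 5s
  timeout client  30s
  timeout server  30s

# Stats page on 8404, scraped by the haproxy-exporter as CSV
listen stats
  bind *:8404
  stats enable
  stats uri /stats
  stats refresh 10s

frontend public
  bind *:11000
  acl is_greeting path_beg /greeting
  use_backend greeter if is_greeting
  default_backend math

backend greeter
  server greeter1 svc-greeter:8080 check

backend math
  server math1 svc-math:8080 check
```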
Bringing Our Services Online
Now that we have described all our services in a single docker-compose file, we can copy all of this onto our VM, ssh into it, and then run `docker-compose -p monitoring up -d`.
That’s it. Docker will get everything up and running.
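If you want the exact commands, something along these lines works; the user, host, and directory names here are placeholders:

```sh
# Copy the compose file and configs to the VM, then bring the stack up
scp -r ./monitoring user@YOUR_IP:~/monitoring
ssh user@YOUR_IP
cd ~/monitoring
docker-compose -p monitoring up -d

# Verify that all containers are running
docker-compose -p monitoring ps
```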
These are the ports all the exposed services will be listening to. Make sure these ports are accessible.
Service | Port |
---|---|
Prometheus | 9090 |
Grafana | 3000 |
HAProxy | 11000 |
We can check if Prometheus was configured correctly by visiting `http://YOUR_IP:9090/targets`. Both the targets (cAdvisor and the HAProxy exporter) should be healthy.
We can check if our proxy is configured properly by opening the following URLs:
| Service | URL | Expected Response |
|---|---|---|
| Math Service | http://YOUR_IP:11000/add/1/2 | { "value": 3 } |
| Greeter Service | http://YOUR_IP:11000/greeting/YourTechBud | { "greeting": "hello YourTechBud" } |
Setting Up Our Monitoring Dashboard
The last remaining step is to configure Grafana. Visit `http://YOUR_IP:3000` to open up Grafana. The default username and password are both `admin`.
Add Prometheus as a Data Source
All our monitoring metrics are being scraped and stored in Prometheus. Hence, the first step is to add Prometheus as a data source. To do that, select `Add a datasource` > select `Prometheus` from the dropdown > enter `http://prometheus:9090` as the Prometheus URL > select `Save & Test`.
That’s all we need to do to link Grafana with Prometheus.
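As an aside, if you'd rather skip the clicking, Grafana can also pick up the data source from a provisioning file mounted under /etc/grafana/provisioning/datasources/. That's not part of this setup, just a sketch of the alternative:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # same address we entered in the UI
    isDefault: true
```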
Create Our Dashboard
Creating a dashboard from scratch can take time. To speed things up, you can import the dashboard I’ve already made. After hitting the `Import Dashboard` button in the `Manage Dashboards` section, simply copy and paste the dashboard JSON into the text area and select Prometheus as the data source.
That’s it!
Feel free to refer to this 10-minute video below if you get lost.
Conclusion
You’ve just set up monitoring using Prometheus and Grafana. You can edit a chart to see what I’ve done. You’ll see that all it takes to populate a chart is a Prometheus query.
Here’s one query as an example: `sum(container_memory_usage_bytes{name=~"monitoring_svc.*"} / 1024 / 1024) by (name)`. This query calculates the memory (in MiB) being consumed by each of our two microservices.
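For the L7 metrics coming from the HAProxy exporter, a query along these lines (illustrative, assuming the exporter's standard metric names) gives the requests per second hitting each backend, grouped by response class:

```
sum(rate(haproxy_backend_http_responses_total[1m])) by (backend, code)
```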
I agree that the Prometheus queries can be a bit overwhelming. But that doesn’t really matter. You can simply import this dashboard and expect it to just work. Use it as a boilerplate. Mess around with it. Enjoy!
Did this article help you? How do you make sure your apps are cloud-native? Share your experiences below.