Prometheus
==========
Prometheus is a monitoring system that is designed to process a large
number of metrics, centralize them on one (or multiple) servers and
serve them with a well-defined API. That API is queried through a
domain-specific language (DSL) called "PromQL" or "Prometheus Query
Language". Prometheus also supports basic graphing capabilities
although those are limited enough that we use a separate graphing
layer on top (Grafana).
The Prometheus web interface is available at:
<https://prometheus.torproject.org>
A simple query you can try is to pick any metric in the list and click
`Execute`. For example, [this link](https://prometheus1.torproject.org/graph?g0.range_input=2w&g0.expr=node_load5&g0.tab=0) will show the 5-minute load
over the last two weeks for the known servers.
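The query behind that link is just the bare metric name. As a
minimal PromQL sketch (the `instance` label value below is a made-up
example, not a real host):

```
# 5-minute load average for every scraped machine, as in the link above:
node_load5

# The same metric for a single (hypothetical) host, averaged over one hour:
avg_over_time(node_load5{instance="example.torproject.org:9100"}[1h])
```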
All machines configured through Puppet are scraped by the central
server every 15 seconds.
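On the server side, that scrape interval corresponds to a
`scrape_configs` entry roughly like the following (a minimal sketch
with a hypothetical job name and target; the actual configuration is
generated by Puppet):

```yaml
scrape_configs:
  - job_name: node            # hypothetical job name
    scrape_interval: 15s      # the interval mentioned above
    static_configs:
      - targets: ['example.torproject.org:9100']  # hypothetical target
```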
Munin expatriates
-----------------
Here's a quick cheat sheet from people used to Munin and switching to
Prometheus:
| What | Munin | Prometheus |
| --- | ----- | ---------- |
| Scraper | munin-update | prometheus |
| Agent | munin-node | prometheus node-exporter and others |
| Graphing | munin-graph | prometheus or grafana |
| Alerting | munin-limits | prometheus alertmanager |
| Network port | 4949 | 9100 and others |
| Protocol | TCP, text-based | HTTP, [text-based][] |
| Storage format | RRD | custom TSDB |
| Downsampling | yes | no |
| Default interval | 5 minutes | 15 seconds |
| Authentication | no | no |
| Federation | no | yes (can fetch from other servers) |
| High availability | no | yes (alert-manager gossip protocol) |
[text-based]: https://prometheus.io/docs/instrumenting/exposition_formats/
Basically, Prometheus is similar to Munin in many ways:
* it "pulls" metrics from the nodes, although it does it over HTTP
(to <http://host:9100/metrics>) instead of a custom TCP protocol
like Munin
* the agent running on the nodes is called `prometheus-node-exporter`
  instead of `munin-node`. It exposes only a set of built-in
  metrics like CPU and disk space; different exporters are
  necessary for different applications (like
  `prometheus-apache-exporter`), and any application can easily
  implement an exporter by exposing a Prometheus-compatible
  `/metrics` endpoint (see the sample output after this list)
* like Munin, the node exporter doesn't have any form of
  authentication built in. We rely on IP-level firewalls to avoid
  leakage
* the central server is simply called `prometheus` and runs as a
  daemon that wakes up on its own, instead of `munin-update`, which
  is run by `munin-cron`, itself triggered by `cron`
* graphs are generated on the fly through the crude Prometheus web
  interface or by frontends like Grafana, instead of being constantly
  regenerated by `munin-graph`
* samples are stored in a custom "time series database" (TSDB) in
Prometheus instead of the (ad-hoc) RRD standard
* unlike RRD, Prometheus performs *no* downsampling; it relies on
  smart compression to spare disk space, but still uses more of it
  than Munin
* Prometheus scrapes samples much more aggressively than Munin by
default, but that interval is configurable
* Prometheus can scale horizontally (by sharding different services
  to different servers) and vertically (by aggregating different
  servers into a central one with a different sampling frequency)
  natively - `munin-update` and `munin-graph` can only run on a
  single (and the same) server
* Prometheus can act as a high-availability alerting system thanks
  to its `alertmanager`, which can run multiple copies in parallel
  without sending duplicate alerts - `munin-limits` can only run on
  a single server
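For reference, the [text-based][] format served on the `/metrics`
endpoint mentioned above looks like this (a hand-written sample, not
output captured from one of our hosts):

```
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 0.21
```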
Puppet implementation
---------------------
Every node is configured as a `node-exporter` through the
`roles::monitored` role, which is included everywhere. The role might
eventually be expanded to cover alerting and other monitoring
resources as well. This role, in turn, includes
`profile::prometheus::client`, which configures each client correctly,
with the right firewall rules.
The firewall rules are exported from the server, defined in
`profile::prometheus::server`. We hacked around limitations of the
upstream Puppet module to install Prometheus using backported Debian
packages. The monitoring server itself is defined in
`roles::monitoring`.
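As a rough sketch, the layering described above amounts to something
like the following (illustrative only; the actual class bodies live
in our Puppet source tree):

```puppet
# Included on every machine: pulls in the node exporter and firewall rules.
class roles::monitored {
  include profile::prometheus::client
}

# Included only on the monitoring server itself.
class roles::monitoring {
  include profile::prometheus::server
}
```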