Commit f4a188d5 authored by anarcat

merge template into the prometheus documentation

parent aa1af171
@@ -11,8 +11,10 @@ layer on top (see [[Grafana]]).
[Prometheus]: https://prometheus.io/
Basic design
------------
[[!toc levels=3]]
Tutorial
========
The Prometheus web interface is available at:
@@ -24,32 +26,12 @@ over the last two weeks for the known servers.
[this link]: https://prometheus1.torproject.org/graph?g0.range_input=2w&g0.expr=node_load5&g0.tab=0
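The link above carries the query in its URL parameters. As a minimal sketch, assuming the `g0.*` parameter names seen in that link are all that is needed, a similar link for another expression could be built like this (the `graph_url` helper is hypothetical and not part of Prometheus):

```python
from urllib.parse import urlencode

# Base URL taken from the link above.
BASE = "https://prometheus1.torproject.org/graph"


def graph_url(expr, range_input="2w", base=BASE):
    """Build a graph link for one expression over a given time range."""
    params = {
        "g0.range_input": range_input,  # time window, e.g. "2w" for two weeks
        "g0.expr": expr,                # query expression, e.g. node_load5
        "g0.tab": "0",                  # tab index, copied from the link above
    }
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(graph_url("node_load5"))
```

Changing `expr` or `range_input` yields a link for a different metric or time window.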
Here you can see, from the [Prometheus overview documentation][] the
basic architecture of a Prometheus site:
# How-to
[Prometheus overview documentation]: https://prometheus.io/docs/introduction/overview/
## Pager playbook
## Disaster recovery
<img src="https://prometheus.io/assets/architecture.png" alt="A
drawing of Prometheus' architecture, showing the push gateway and
exporters adding metrics, service discovery through file_sd and
Kubernetes, alerts pushed to the Alertmanager and the various UIs
pulling from Prometheus" />
As you can see, Prometheus is somewhat tailored towards
[Kubernetes][] but it can be used without it. We're deploying it with
the `file_sd` discovery mechanism, where Puppet collects all exporters
into the central server, which then scrapes those exporters every
`scrape_interval` (by default 15 seconds). The architecture graph also
shows the Alertmanager which could be used to (eventually) replace our
Nagios deployment.
[Kubernetes]: https://kubernetes.io/
It does not show that Prometheus can federate to multiple instances
and that the Alertmanager can be configured for high availability.
Munin expatriates
-----------------
## Migrating from Munin
Here's a quick cheat sheet for people used to Munin and switching to
Prometheus:
@@ -117,8 +99,12 @@ Basically, Prometheus is similar to Munin in many ways:
without sending duplicate alerts - `munin-limits` can only run on a
single server
Puppet implementation
---------------------
Reference
=========
## Installation
### Puppet implementation
Every node is configured as a `node-exporter` through the
`roles::monitored` that is included everywhere. The role might
@@ -145,3 +131,69 @@ policies.
[use of Debian packages for installation]: https://github.com/voxpupuli/puppet-prometheus/pull/303
[allow scrape job collection]: https://github.com/voxpupuli/puppet-prometheus/pull/304
[Prometheus Puppet module]: https://github.com/voxpupuli/puppet-prometheus/
## SLA
## Design
Here is, from the [Prometheus overview documentation][], the
basic architecture of a Prometheus site:
[Prometheus overview documentation]: https://prometheus.io/docs/introduction/overview/
<img src="https://prometheus.io/assets/architecture.png" alt="A
drawing of Prometheus' architecture, showing the push gateway and
exporters adding metrics, service discovery through file_sd and
Kubernetes, alerts pushed to the Alertmanager and the various UIs
pulling from Prometheus" />
As you can see, Prometheus is somewhat tailored towards
[Kubernetes][] but it can be used without it. We're deploying it with
the `file_sd` discovery mechanism, where Puppet collects all exporters
into the central server, which then scrapes those exporters every
`scrape_interval` (by default 15 seconds). The architecture graph also
shows the Alertmanager which could be used to (eventually) replace our
Nagios deployment.
[Kubernetes]: https://kubernetes.io/
It does not show that Prometheus can federate to multiple instances
and that the Alertmanager can be configured for high availability.
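To make the `file_sd` mechanism more concrete, here is a minimal sketch of the kind of target file the central server could read. It assumes the JSON form of `file_sd` (a list of target groups, each with `targets` and optional `labels`). The host names, the `env` label and the helper names are hypothetical; in this deployment the real files are generated by Puppet, not by this script.

```python
import json


def file_sd_targets(exporters, labels=None):
    """Return one file_sd target group covering the given exporter addresses."""
    group = {"targets": sorted(exporters)}
    if labels:
        # Labels are attached to every target in the group.
        group["labels"] = dict(labels)
    return [group]


def render_file_sd(exporters, labels=None):
    """Serialize the target groups as JSON, as a file_sd file would hold them."""
    return json.dumps(file_sd_targets(exporters, labels), indent=2)


if __name__ == "__main__":
    # Hypothetical hosts; 9100 is the default node exporter port.
    hosts = ["host2.example.org:9100", "host1.example.org:9100"]
    print(render_file_sd(hosts, labels={"env": "production"}))
```

Prometheus rereads such files when they change, so a configuration management tool only has to rewrite the file to add or remove a scrape target.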
## Issues
There is no issue tracker specifically for this project. [File][] or
[search][] for issues in the [generic internal services][search] component.
[File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
[search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team
## Monitoring and testing
# Discussion
## Overview
<!-- describe the overall project. should include a link to a ticket -->
<!-- that has a launch checklist -->
## Goals
<!-- include bugs to be fixed -->
### Must have
### Nice to have
### Non-Goals
## Approvals required
<!-- for example, legal, "vegas", accounting, current maintainer -->
## Proposed Solution
## Cost
## Alternatives considered
<!-- include benchmarks and procedure if relevant -->