merge template into the prometheus documentation

f4a188d5 · anarcat · aa1af171 · f4a188d5
Verified Commit f4a188d5 authored 4 years ago by anarcat
--- a/tsa/howto/prometheus.mdwn
+++ b/tsa/howto/prometheus.mdwn
@@ -11,8 +11,10 @@ layer on top (see [[Grafana]]).

 [Prometheus]: https://prometheus.io/

-Basic design
------------
+[[!toc levels=3]]
+
+Tutorial
+========

 The Prometheus web interface is available at:

@@ -24,32 +26,12 @@ over the last two weeks for the known servers.

 [this link]: https://prometheus1.torproject.org/graph?g0.range_input=2w&g0.expr=node_load5&g0.tab=0

-Here you can see, from the [Prometheus overview documentation][] the
-basic architecture of a Prometheus site:
+# How-to

-[Prometheus overview documentation]: https://prometheus.io/docs/introduction/overview/
+## Pager playbook
+## Disaster recovery

-<img src="https://prometheus.io/assets/architecture.png" alt="A
-drawing of Prometheus' architecture, showing the push gateway and
-exporters adding metrics, service discovery through file_sd and
-Kubernetes, alerts pushed to the Alertmanager and the various UIs
-pulling from Prometheus" />
-
-As you can see, Prometheus is somewhat tailored towards
-[Kubernetes][] but it can be used without it. We're deploying it with
-the `file_sd` discovery mechanism, where Puppet collects all exporters
-into the central server, which then scrapes those exporters every
-`scrape_interval` (by default 15 seconds). The architecture graph also
-shows the Alertmanager which could be used to (eventually) replace our
-Nagios deployment.
-
-[Kubernetes]: https://kubernetes.io/
-
-It does not show that Prometheus can federate to multiple instances
-and the Alertmanager can be configured with High availability.
-
-Munin expatriates
-----------------
+## Migrating from Munin

 Here's a quick cheat sheet from people used to Munin and switching to
 Prometheus:
@@ -117,8 +99,12 @@ Basically, Prometheus is similar to Munin in many ways:
   without sending duplicate alerts - `munin-limits` can only run on a
   single server

-Puppet implementation
---------------------
+Reference
+=========
+
+## Installation
+
+### Puppet implementation

 Every node is configured as a `node-exporter` through the
 `roles::monitored` that is included everywhere. The role might
@@ -145,3 +131,69 @@ policies.
 [use of Debian packages for installation]: https://github.com/voxpupuli/puppet-prometheus/pull/303
 [allow scrape job collection]: https://github.com/voxpupuli/puppet-prometheus/pull/304
 [Prometheus Puppet module]: https://github.com/voxpupuli/puppet-prometheus/
+
+## SLA
+
+## Design
+
+Here is, from the [Prometheus overview documentation][], the
+basic architecture of a Prometheus site:
+
+[Prometheus overview documentation]: https://prometheus.io/docs/introduction/overview/
+
+<img src="https://prometheus.io/assets/architecture.png" alt="A
+drawing of Prometheus' architecture, showing the push gateway and
+exporters adding metrics, service discovery through file_sd and
+Kubernetes, alerts pushed to the Alertmanager and the various UIs
+pulling from Prometheus" />
+
+As you can see, Prometheus is somewhat tailored towards
+[Kubernetes][] but it can be used without it. We're deploying it with
+the `file_sd` discovery mechanism, where Puppet collects all exporters
+into the central server, which then scrapes those exporters every
+`scrape_interval` (by default 15 seconds). The architecture graph also
+shows the Alertmanager which could be used to (eventually) replace our
+Nagios deployment.
+
+[Kubernetes]: https://kubernetes.io/
+
+It does not show that Prometheus can federate to multiple instances
+and that the Alertmanager can be configured for high availability.
+
+## Issues
+
+There is no issue tracker specifically for this project. [File][] or
+[search][] for issues in the [generic internal services][search] component.
+
+ [File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
+ [search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team
+
+## Monitoring and testing
+
+
+# Discussion
+
+## Overview
+
+<!-- describe the overall project. should include a link to a ticket -->
+<!-- that has a launch checklist -->
+
+## Goals
+<!-- include bugs to be fixed -->
+
+### Must have
+
+### Nice to have
+
+### Non-Goals
+
+## Approvals required
+<!-- for example, legal, "vegas", accounting, current maintainer -->
+
+## Proposed Solution
+
+## Cost
+
+## Alternatives considered
+
+<!-- include benchmarks and procedure if relevant -->
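
The Design section moved by this change says Puppet collects all exporters into the central server, which discovers them through `file_sd`. As a rough illustration only, the sketch below builds the kind of target-group file that `file_sd` reads. It is not the Puppet module's code. The hostnames, the `team` label, and both function names are hypothetical. Port 9100 is the upstream default for the node exporter, and the deployment described here may use a different one.

```python
import json

# Upstream default port for node_exporter; an assumption for this deployment.
NODE_EXPORTER_PORT = 9100


def build_target_groups(hosts, port=NODE_EXPORTER_PORT, labels=None):
    """Return a file_sd-style list holding one target group for the given hosts."""
    # Sort so the rendered file is stable from one run to the next.
    group = {"targets": [f"{host}:{port}" for host in sorted(hosts)]}
    if labels:
        group["labels"] = dict(labels)
    return [group]


def render_file_sd(hosts, **kwargs):
    """Serialize the target groups as JSON, one of the formats file_sd accepts."""
    return json.dumps(build_target_groups(hosts, **kwargs), indent=2)


if __name__ == "__main__":
    # Hypothetical hosts and label, for illustration only.
    print(render_file_sd(["web1.example.org", "db1.example.org"],
                         labels={"team": "tsa"}))
```

A scrape job whose `file_sd_configs` entry lists such a file picks up changes to it without a restart, so adding a host only means regenerating the file.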