From 9460db41fb0080dcf3d465eac877b4eb42da063c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Wed, 8 May 2024 17:50:13 -0400
Subject: [PATCH] styling (tpo/tpa/team#40755)

---
 policy/tpa-rfc-33-monitoring.md | 34 +++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/policy/tpa-rfc-33-monitoring.md b/policy/tpa-rfc-33-monitoring.md
index c6e34f52..3cc4b081 100644
--- a/policy/tpa-rfc-33-monitoring.md
+++ b/policy/tpa-rfc-33-monitoring.md
@@ -510,30 +510,40 @@ servers](tpa-rfc-33-monitoring/architecture-after.png)
 The above shows a diagram of a highly available Prometheus server
 setup. Each server has its own set of services running:
 
- * Prometheus: the primary pulls metrics from exporters including a
+ * **Prometheus**: the primary pulls metrics from exporters including a
    node exporter on every machine but also other exporters defined by
    service admins, for which configuration is a mix of Puppet and a
-   GitLab repository pulled by Puppet. The secondary server keeps long
-   term metrics and pulls all the metrics from the primary server
-   using a longer scrape interval. Bother Prometheus server monitor
-   each other.
+   GitLab repository pulled by Puppet.
+
+   The secondary server keeps long term metrics and pulls all the
+   metrics from the primary server using a longer scrape
+   interval. Both Prometheus servers monitor each other.
 
- * blackbox exporter: this exporter runs on the primary Prometheus
+ * **blackbox exporter**: this exporter runs on the primary Prometheus
    server and is scraped by the primary Prometheus server for
    arbitrary metrics like ICMP, HTTP or TLS response times
 
- * Grafana: the primary server runs a Grafana service which should be
+ * **Grafana**: the primary server runs a Grafana service which should be
    fully configured in Puppet, with some dashboards being pulled from
    a GitLab repository. Local configuration is completely ephemeral
-   and discouraged. It pulls metrics from the local Prometheus server
-   which has a "remote read" interface to pull backlog from the
-   secondary server.
-
- * Alertmanager: each server also runs its own Alertmanager which
+   and discouraged.
+
+   It pulls metrics from the local Prometheus server which has a
+   "remote read" interface to pull backlog from the secondary
+   server.
+
+   In the above diagram, it is shown as pulling directly from Prom2,
+   but that's a symbolic shortcut: it would only use `localhost` as an
+   actual data source.
+
+ * **Alertmanager**: each server also runs its own Alertmanager which
    fires off notifications to IRC, email, or (eventually) GitLab,
    deduplicating alerts between the two servers using its gossip
    protocol.
 
+ * **Karma**: alerting dashboard which pulls alerts from Alertmanager
+   and can issue silences.
+
 The current prometheus1/prometheus2 server will actually be retired in
 favor of two *new* servers which will be rebuilt from scratch,
 entirely from Puppet, LDAP, and GitLab repository, ensuring they are
-- 
GitLab