Verified commit 9460db41 authored by anarcat

styling (team#40755)

parent 0a0ff08c
+22 −12
@@ -510,30 +510,40 @@ servers](tpa-rfc-33-monitoring/architecture-after.png)
The above shows a diagram of a highly available Prometheus server
setup. Each server has its own set of services running:

 * Prometheus: the primary pulls metrics from exporters including a
 * **Prometheus**: the primary pulls metrics from exporters including a
   node exporter on every machine but also other exporters defined by
   service admins, for which configuration is a mix of Puppet and a
   GitLab repository pulled by Puppet. The secondary server keeps long
   term metrics and pulls all the metrics from the primary server
   using a longer scrape interval. Both Prometheus servers monitor
   each other.
   GitLab repository pulled by Puppet. 
   
 * blackbox exporter: this exporter runs on the primary Prometheus
   The secondary server keeps long term metrics and pulls all the
   metrics from the primary server using a longer scrape
   interval. Both Prometheus servers monitor each other.

 * **blackbox exporter**: this exporter runs on the primary Prometheus
   server and is scraped by the primary Prometheus server for
   arbitrary metrics like ICMP, HTTP or TLS response times.

 * Grafana: the primary server runs a Grafana service which should be
 * **Grafana**: the primary server runs a Grafana service which should be
   fully configured in Puppet, with some dashboards being pulled from
   a GitLab repository. Local configuration is completely ephemeral
   and discouraged. It pulls metrics from the local Prometheus server
   which has a "remote read" interface to pull backlog from the
   secondary server.
   and discouraged. 
   
 * Alertmanager: each server also runs its own Alertmanager which
   It pulls metrics from the local Prometheus server which has a
   "remote read" interface to pull backlog from the secondary
   server.
   
   In the above diagram, it is shown as pulling directly from Prom2,
   but that's a symbolic shortcut: it would only use `localhost` as an
   actual data source.

 * **Alertmanager**: each server also runs its own Alertmanager which
   fires off notifications to IRC, email, or (eventually) GitLab,
   deduplicating alerts between the two servers using its gossip
   protocol.

 * **Karma**: alerting dashboard which pulls alerts from Alertmanager
   and can issue silences.

The current prometheus1/prometheus2 servers will actually be retired in
favor of two *new* servers which will be rebuilt from scratch,
entirely from Puppet, LDAP, and GitLab repository, ensuring they are