review prometheus documentation after service overhaul
We're moving a lot of things around in Prometheus. Make sure the Prometheus documentation in the wiki is up to date. In particular, perform the following checks:
priority A
These need to be done as part of %TPA-RFC-33-A: emergency Icinga retirement, before we give the training (#41767 (closed)).
- overall document structure review (done until pager playbooks section)
- quickly review the monitoring and testing section, to see if there are any urgent changes to be made there
- how to scrape a new target? (present, but messy)
- how to add an existing alert to Prometheus
- document IRC channel
- how to write an alert?
- "where is my Nagios check?" howto
- document silences
- how to create a blackbox check?
- document blackbox exporter oddities
See also the questions raised in the training, in #41767 (closed).
priority B
These may be done after Icinga is retired, as part of priority B (%TPA-RFC-33-B: Prometheus server merge, more exporters).
- sync with template.md
- review backups
- review monitoring and testing, yes, again
- review architecture
- review design, possibly copying a lot of TPA-RFC-33 in here
- storage
- queues
- authentication
- implementation
- related services
- Security and risk assessment
- Technical debt and next steps
- document TPA-RFC-33 in the wiki page (proposed solution?)
- remaining TODO items
/cc @lelutin
Edited by anarcat