merge prometheus2 targets into prometheus1

I am not exactly sure what that means, but prometheus1 needs to supplant prometheus2 so the latter can be retired and replaced with a highly available replica.

I think it means we need to deploy the prometheus-alerts repository onto the prometheus1 server. We might also need to do something about all those dashboards in Grafana.

We'll run both servers in parallel for a year (or whatever the prometheus2 retention period is) and then retire prometheus2.

As usual, notify users and all.

flight check:

  • deploy prometheus-alerts repository on prom1 - currently does not produce alerts since prom1 doesn't scrape the same targets
  • configure all of the scrape targets from prom2 on prom1, and possibly add firewall rules so that prom1 can poll the metrics. create a long (1 year) silence for all alerts with team tags other than TPA. let prom1 gather data for a little while and verify that scraping works for all sources
  • deploy the dashboards that the other teams use on grafana2 onto grafana1 and check that they show mostly the same information
  • copy over any other bit of prometheus and alertmanager configuration from prom2 to prom1 that's not currently common to both
  • when we're confident enough, create a long (2 years) silence on karma2 for all alerts, then remove the silence from prom1 so that prom1 sends the alerts instead
  • after maybe two weeks of checking that everything behaves the same, create an issue for decommissioning prom2 after 1y
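
The "verify that scraping is working" and "long silence" steps above could be partly scripted. What follows is only a rough sketch, not something taken from our setup: it assumes the JSON returned by each server's `/api/v1/targets` endpoint has already been fetched, that targets can be compared by their `job` and `instance` labels, and that the team tag is a label named `team` (a guess). The function names are made up for illustration. The silence body follows the Alertmanager v2 API (`POST /api/v2/silences`) and relies on `isEqual`, which needs a reasonably recent Alertmanager.

```python
from datetime import datetime, timedelta, timezone


def target_keys(targets_response):
    """Return the set of (job, instance) pairs found in a /api/v1/targets response."""
    active = targets_response["data"]["activeTargets"]
    return {
        (t["labels"].get("job", ""), t["labels"].get("instance", ""))
        for t in active
    }


def missing_targets(prom2_response, prom1_response):
    """List targets scraped by prom2 that prom1 does not scrape yet."""
    return sorted(target_keys(prom2_response) - target_keys(prom1_response))


def unhealthy_targets(targets_response):
    """List targets whose last scrape did not succeed (health is not "up")."""
    active = targets_response["data"]["activeTargets"]
    return sorted(
        (t["labels"].get("job", ""), t["labels"].get("instance", ""))
        for t in active
        if t.get("health") != "up"
    )


def long_silence(start, days, created_by, comment,
                 team_label="team", team_value="TPA"):
    """Build a silence body matching every alert whose team label is not TPA.

    The label name "team" is an assumption. A negative matcher also matches
    alerts that carry no such label at all.
    """
    return {
        "matchers": [
            {
                "name": team_label,
                "value": team_value,
                "isRegex": False,
                "isEqual": False,  # negative matcher: team != "TPA"
            }
        ],
        "startsAt": start.isoformat(),
        "endsAt": (start + timedelta(days=days)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
```

An empty result from `missing_targets(prom2_json, prom1_json)` and from `unhealthy_targets(prom1_json)` would be one signal that prom1 has caught up with prom2. The dict returned by `long_silence(datetime.now(timezone.utc), 365, ...)` would still need to be sent to Alertmanager, or the silence can simply be created by hand in karma.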
Edited by lelutin