merge prometheus2 targets into prometheus1
I am not sure exactly what that means, but prometheus1 needs to supplant prometheus2 so the latter can be retired and replaced with a replica that's highly available.
I think it means we need to deploy the prometheus-alerts repository onto the prometheus1 server. We might also need to do something about all those dashboards in Grafana.
We'll run both servers in parallel for a year (or whatever the prometheus2 retention period is) and then retire prometheus2.
As usual, notify users and all.
flight check:
- deploy the prometheus-alerts repository on prom1 (this currently does not produce alerts, since prom1 doesn't scrape the same targets)
- configure all of the scrape targets from prom2 on prom1, and possibly add firewall rules so that prom1 is able to poll the metrics. create a long (1 year) silence for all alerts with team tags other than TPA. let prom1 gather some data for a little while and verify that scraping is working for all sources
- deploy the dashboards that are used on grafana2 for the other teams onto grafana1 and check that they show mostly the same information
- copy over any other bits of prometheus and alertmanager configuration from prom2 to prom1 that are not currently common to both
- when we're confident enough, create a long (2 years) silence on karma2 for all alerts. then remove the silence from prom1 so that it sends the alerts instead
- after maybe two weeks of checking that everything behaves the same, create an issue for decommissioning prom2 after 1 year
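
The "verify that scraping is working for all sources" and "long silence" steps could be scripted roughly as below. This is only a sketch under assumptions, not something that exists in our setup: it assumes both servers expose the standard Prometheus `/api/v1/targets` endpoint, that the "team tag" is an alert label named `team`, and that silences are created with a POST to Alertmanager's `/api/v2/silences`. The function names and the example URLs are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def fetch_targets(base_url):
    """Fetch the target list from a Prometheus server (GET /api/v1/targets)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as resp:
        return json.load(resp)


def active_targets(payload):
    """Map (job, instance) -> health for every active target in a payload."""
    return {
        (t["labels"].get("job", ""), t["labels"].get("instance", "")): t.get("health")
        for t in payload["data"]["activeTargets"]
    }


def compare_targets(old_payload, new_payload):
    """Compare the old server's targets (prom2) with the new one's (prom1).

    Returns two sorted lists of (job, instance) pairs:
    - targets the old server scrapes that the new one does not know about
    - targets present on both servers but not "up" on the new one
    """
    old = active_targets(old_payload)
    new = active_targets(new_payload)
    missing = sorted(set(old) - set(new))
    unhealthy = sorted(k for k in old if k in new and new[k] != "up")
    return missing, unhealthy


def non_tpa_silence(days, created_by, now=None):
    """Build a silence body matching alerts whose `team` label is not TPA.

    The label name `team` is an assumption. The resulting dict is meant to be
    sent as JSON in a POST to <alertmanager>/api/v2/silences.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "matchers": [
            # isEqual=False turns this into a negative matcher: team != "TPA"
            {"name": "team", "value": "TPA", "isRegex": False, "isEqual": False}
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(days=days)).isoformat(),
        "createdBy": created_by,
        "comment": "prom2 to prom1 migration: silence non-TPA alerts",
    }


# Example use (hypothetical URLs, replace with the real servers):
# missing, unhealthy = compare_targets(
#     fetch_targets("http://prom2.example.org:9090"),
#     fetch_targets("http://prom1.example.org:9090"),
# )
# print("not scraped by prom1:", missing)
# print("scraped by prom1 but not up:", unhealthy)
```

Note that a negative matcher would presumably also catch alerts that carry no `team` label at all, so it's worth checking how untagged alerts are routed before relying on it. Target label sets may also differ slightly between the two servers, in which case comparing on `(job, instance)` alone will need adjusting.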
Edited by lelutin