prometheus: estimate time to recovery
authored Nov 20, 2025 by lelutin
+ be a bit more precise on information held by alertmanager.
service/prometheus.md
View page @
f44ec9ba
@@ -2404,12 +2404,18 @@ Puppet.
Non-configuration data should be restored from backup, with
`/var/lib/prometheus/` being sufficient to reconstruct history.
+The time to restore data depends on the data size and the state of the network.
+As a rough indication, on 2025-11-19 the dataset was 144 GB and the transfer
+took between 2.5 and 3 hours.
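
To make the estimate above easier to reuse for a different dataset size, here is a minimal sketch of the arithmetic. It assumes the 144 figure is in decimal gigabytes and that throughput stays roughly constant; the function names are illustrative and not part of any existing tooling.

```python
def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s for size_gb (decimal GB) moved in `hours`."""
    return size_gb * 1000 / (hours * 3600)


def estimate_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a constant rate_mb_per_s."""
    return size_gb * 1000 / rate_mb_per_s / 3600


# Figures from the 2025-11-19 restore: 144 GB in 2.5 to 3 hours.
fast = throughput_mb_per_s(144, 2.5)  # best case observed
slow = throughput_mb_per_s(144, 3.0)  # worst case observed

if __name__ == "__main__":
    print(f"observed throughput: {slow:.1f} to {fast:.1f} MB/s")
    # Hypothetical future dataset size, for illustration only.
    size = 200
    low, high = estimate_hours(size, fast), estimate_hours(size, slow)
    print(f"{size} GB would take roughly {low:.1f} to {high:.1f} hours")
```

The observed transfer works out to somewhere around 13 to 16 MB/s, so the estimate scales linearly with dataset size as long as the network conditions are comparable.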
If even backups are destroyed, history will be lost, but the server should still
recover and start tracking new metrics.
-Note that neither Alertmanager nor Karma hold specific state data, so nothing
-needs to be taken out of backups for those and as long as prometheus is tracking
-metrics they should both be working as well.
+Note that Alertmanager holds information about the alert silences currently in
+place. If those are lost, we can recreate silences on an as-needed basis. Karma
+does not hold specific state data, so nothing needs to be taken out of backups
+for it. Also, as long as Prometheus is tracking metrics, both services should
+be working as well.
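
As an illustration of recreating a lost silence, the sketch below builds the JSON body that, to the best of our knowledge, the Alertmanager v2 API accepts on `POST /api/v2/silences`. The alert name, author, and comment are hypothetical placeholders, and the payload is only printed, not sent; treat it as a starting point to check against the deployed Alertmanager version rather than as the documented procedure.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(matchers, hours, created_by, comment, now=None):
    """Build a silence payload with exact-match (non-regex) label matchers."""
    now = now or datetime.now(timezone.utc)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False, "isEqual": True}
            for name, value in matchers.items()
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


if __name__ == "__main__":
    # "ExampleAlert" and "ops" are placeholders, not real alert or user names.
    payload = build_silence(
        {"alertname": "ExampleAlert"},
        hours=2,
        created_by="ops",
        comment="recreated after restore",
    )
    print(json.dumps(payload, indent=2))
```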
# Reference