Prometheus inhibitions
Quote from TPA-RFC-33:
Alertmanager supports two different concepts for turning off notifications:
silences: an operator-issued override that turns off notifications for a given amount of time
inhibitions: a configured override that turns off notifications for an alert while another alert is already firing
We will make sure we can silence alerts from the Karma dashboard, which should work out of the box. It should also be possible to silence alerts in the built-in Alertmanager web interface, although that might require some manual work to deploy correctly in the Debian package.
By default, silences have a time limit in Alertmanager. If that becomes a problem, we could deploy kthxbye to automatically extend silences.
The other system, inhibitions, needs configuration to be effective. Micah said it is worth spending at least some time configuring some basic inhibitions to keep major outages from flooding operators with alerts, for example turning off alerts on reboots and so on. There are also ways to write alerting rules that do not need inhibitions at all.
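As a rough illustration of what a basic inhibition could look like, here is a minimal sketch of an inhibit_rules entry for the Alertmanager configuration file. The alert name HostDown and the severity and instance labels are hypothetical and not taken from TPA-RFC-33; they would need to match whatever alerting rules and labels are actually deployed. The sketch assumes Alertmanager 0.22 or later, which introduced the source_matchers and target_matchers syntax; older versions use source_match and target_match instead.

```yaml
# Sketch only: alert and label names below are assumptions, not the
# names used in any deployed configuration.
inhibit_rules:
  # While a (hypothetical) HostDown alert is firing for a host...
  - source_matchers:
      - alertname = "HostDown"
    # ...turn off notifications for warning-level alerts...
    target_matchers:
      - severity = "warning"
    # ...but only for alerts carrying the same "instance" label value,
    # so an outage on one host does not hide warnings on another.
    equal:
      - instance
```

The equal list is what scopes the inhibition: without it, a single firing source alert would mute every matching target alert across all hosts.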