Draft: Create alerts for CPU and RAM pressure counters (team#41639)

anarcat requested to merge 41639_prio_b into main

These thresholds are what we currently observe as reasonable limits. Each alert waits 15 minutes before firing so that we don't alert on momentary blips, which wouldn't indicate a genuinely problematic situation.

We've decided to exclude the ganeti nodes from the pressure counter alerts, since in nearly all cases those metrics reflect activity on the VMs running on them rather than on the physical machines themselves. We can already see this clearly with the IO pressure counters, where the alert would be duplicated between the VM and the physical machine.
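
For readers who want a picture of the shape of such a rule, here is a minimal sketch, not the actual rules from this merge request. The `node_pressure_*_seconds_total` counters are the ones exposed by node_exporter's pressure collector. The alert names, the group name, the `0.5` and `0.1` thresholds, the 5-minute rate window, the severity label, and the `role!="ganeti"` matcher are all assumptions made for illustration; the real thresholds and the way ganeti nodes are identified may differ.

```yaml
groups:
  - name: pressure  # hypothetical group name
    rules:
      - alert: CPUPressureHigh  # hypothetical alert name
        # rate() over a *_seconds_total counter gives the fraction of time
        # (0 to 1) that tasks spent waiting on CPU during the window.
        # The 0.5 threshold and the role label are placeholders.
        expr: rate(node_pressure_cpu_waiting_seconds_total{role!="ganeti"}[5m]) > 0.5
        # Only fire once the condition has held for 15 minutes,
        # so that short blips do not page anyone.
        for: 15m
        labels:
          severity: warning  # assumed severity
        annotations:
          summary: "Sustained CPU pressure on {{ $labels.instance }}"

      - alert: MemoryPressureHigh  # hypothetical alert name
        # Same idea for memory; the 0.1 threshold is a placeholder.
        expr: rate(node_pressure_memory_waiting_seconds_total{role!="ganeti"}[5m]) > 0.1
        for: 15m
        labels:
          severity: warning  # assumed severity
        annotations:
          summary: "Sustained memory pressure on {{ $labels.instance }}"
```

The `for: 15m` clause is what implements the 15-minute wait described above, and the negative label matcher is one possible way to keep the physical ganeti nodes out of the alert.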
