Verified commit 30146e2f authored by anarcat

try to rephrase the group_wait stuff again

I didn't find the result to be particularly legible, and it had lost
the separation of source code references I had before. I am not sure
we should dig too much into implementation details (like "threads"),
but I kept that anyways.

This is mostly reformulations.
parent 209eff5b
Pipeline #205893 passed with warnings
......@@ -1376,24 +1376,25 @@ false. We're ignoring this for now.
At this point, the alert has reached Alertmanager and it needs to make a
decision of what to do with it. More timers are involved.
Alerts will be evaluated against the alert routes, thus aggregated into a new
group or added to an existing group according to that route's `group_by`
setting, and then Alertmanager will evaluate the timers set on the particular
route that was matched. An alert group is created when an alert is received and
no other alerts already match the same values for the `group_by` criteria. An
alert group is removed when all alerts in a group are in state `inactive` (e.g.
resolved).
Alerts will be evaluated against the alert routes, thus aggregated
into a new group or added to an existing group according to that
route's `group_by` setting, and then Alertmanager will evaluate the
timers set on the particular route that was matched. An alert group is
created when an alert is received and no other alerts already match
the same values for the `group_by` criteria. An alert group is removed
when all alerts in a group are in state `inactive` (e.g. resolved).
Fourth, there's the `group_wait` setting (defaults to 30 seconds, can
be [customized by
route](https://prometheus.io/docs/alerting/latest/configuration/#route)).
This timer is initialized when the alerting group is created, see
[`dispatch/dispatch.go`, line 415, function `newAggrGroup`](https://github.com/prometheus/alertmanager/blob/e9904f93a7efa063bac628ed0b74184acf1c7401/dispatch/dispatch.go#L415).
This will keep Alertmanager from routing any alerts for a while thus allowing it
to group the _first_ alert notification for all alerts in the same group in one
batch. It implies that you will not receive a notification for a new alert
before that timer has elapsed. See also the too short [documentation on
grouping](https://prometheus.io/docs/alerting/latest/alertmanager/#grouping).
be [customized by route](https://prometheus.io/docs/alerting/latest/configuration/#route)). This will keep Alertmanager from
routing any alerts for a while thus allowing it to group the _first_
alert notification for all alerts in the same group in one batch. It
implies that you will not receive a notification for a new alert
before that timer has elapsed. See also the too short [documentation
on grouping](https://prometheus.io/docs/alerting/latest/alertmanager/#grouping).
(The `group_wait` timer is initialized when the alerting group is
created, see [`dispatch/dispatch.go`, line 415, function
`newAggrGroup`](https://github.com/prometheus/alertmanager/blob/e9904f93a7efa063bac628ed0b74184acf1c7401/dispatch/dispatch.go#L415).)
Now, *more* alerts might be sent by Prometheus if more metrics match
the above expression. They are *different* alerts because they have
......@@ -1402,18 +1403,22 @@ more commonly, other hosts require a reboot). Prometheus will then
relay that alert to the Alertmanager, and another timer comes in.
Fifth, before relaying that new alert that's already part of a firing
group, the `group_interval` (defaults to 5m) timer comes into play. When
Alertmanager first creates an alert group, a thread is started for that group
and in that thread the value of `group_interval` that's set in the _route_ acts
like a time ticker: every `group_interval`, if there's a new notification that
needs to be sent, it will be sent. So new alerts will need to wait _up to_
`group_interval` before being relayed.
That timer is also initialized [in `dispatch.go`, line 460, function
`aggrGroup.run()`](https://github.com/prometheus/alertmanager/blob/e9904f93a7efa063bac628ed0b74184acf1c7401/dispatch/dispatch.go#L460). It's done *after* that function waits for the
previous timer which is normally based on the `group_wait` value, but
can be switched to `group_interval` after that very iteration, of
course.
group, Alertmanager will wait `group_interval` (defaults to 5m) before
resending a notification to the group.
When Alertmanager first creates an alert group, a thread is started
for that group and the _route_'s `group_interval` acts like a time
ticker. Notifications are only sent when the `group_interval` period
repeats.
So new alerts merged in a group wait _up to_ `group_interval` before
being relayed.
(The `group_interval` timer is also initialized [in `dispatch.go`, line
460, function `aggrGroup.run()`](https://github.com/prometheus/alertmanager/blob/e9904f93a7efa063bac628ed0b74184acf1c7401/dispatch/dispatch.go#L460). It's done *after* that function
waits for the previous timer which is normally based on the
`group_wait` value, but can be switched to `group_interval` after that
very iteration, of course.)
So, conclusions:
......
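
To make the interplay between `group_wait` and `group_interval` described in this diff easier to follow, here is a minimal sketch of the timing in Python. It is a simplified model, not Alertmanager's actual implementation: it assumes a single alert group, created by the earliest alert, with the first flush happening `group_wait` after creation and later flushes ticking every `group_interval` after that. It ignores `repeat_interval`, resolved alerts and group removal. The function name `flush_schedule` and the sample values are hypothetical.

```python
import math


def flush_schedule(arrivals, group_wait, group_interval):
    """Return (arrival, notified_at) pairs for alerts joining one group.

    Simplified model (an assumption, not Alertmanager's real code):
    the group is created by the earliest arrival, the first flush
    happens group_wait seconds later, and further flushes tick every
    group_interval seconds after that first flush.
    """
    if not arrivals:
        return []
    ordered = sorted(arrivals)
    first_flush = ordered[0] + group_wait
    schedule = []
    for arrival in ordered:
        if arrival <= first_flush:
            # Alert arrived during group_wait: batched in the first notification.
            notified_at = first_flush
        else:
            # Alert arrived later: waits for the next group_interval tick.
            ticks = math.ceil((arrival - first_flush) / group_interval)
            notified_at = first_flush + ticks * group_interval
        schedule.append((arrival, notified_at))
    return schedule


if __name__ == "__main__":
    # Illustrative values only: group_wait=30s, group_interval=300s.
    for arrival, notified_at in flush_schedule([0, 10, 40, 400], 30, 300):
        print(f"alert at t={arrival}s is notified at t={notified_at}s")
```

In this model, alerts arriving within `group_wait` of the group's creation go out together in the first notification, while later alerts wait _up to_ `group_interval` for the next tick.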