TPA-RFC-33: monitoring system upgrade or replacement
in #29864 (closed), we've gone pretty deep into comparing prometheus and icinga and how the former could replace the latter.
but now we're stuck at "i like this one better than the other" because we don't have a clear set of requirements.
the task here is to write a set of requirements for the new alerting system and, ultimately, make a proposal for the replacement of the deprecated Icinga 1 deployment we have now.
- establish requirements
- approve requirements
- if replacing Icinga:
  - review #29864 (closed) for ideas and tasks
  - decide whether we keep the prometheus1/2 distinction
  - draft specification of all components, personas, etc., see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-33-monitoring
- if keeping Icinga:
  - review work from @weasel done on DSA's Puppet/Icinga integration
  - deploy that module or another Icinga module inside Puppet
  - rewrite all the checks from the nagios-master.cfg file into Puppet (300+)
  - rebuild a new Icinga 2 server
  - retire the old Icinga 1 server

current status: awaiting adoption on June 12th.
update: tracked in %TPA-RFC-33-A: emergency Icinga retirement and next.
Edited by anarcat