TPA-RFC-33: monitoring system upgrade or replacement

in #29864 (closed), we went pretty deep into comparing Prometheus and Icinga, and into how the former could replace the latter.

but now we're stuck at "i like this one better than the other" because we don't have a clear set of requirements.

the task here is to write a set of requirements for the new alerting system and, ultimately, make a proposal for the replacement of the deprecated Icinga 1 deployment we have now.

  • establish requirements
  • approve requirements
  • if replacing icinga:
  • if keeping icinga:
    • review work from @weasel done on DSA's Puppet/Icinga integration
    • deploy that module or another Icinga module inside puppet
    • rewrite all the checks from the nagios-master.cfg file into puppet (300+)
    • rebuild a new Icinga 2 server
    • retire the old Icinga 1 server

current status: awaiting adoption on June 12th.

update: tracked in %TPA-RFC-33-A: emergency Icinga retirement and next.

Edited by anarcat