train TPA team on new monitoring system
We've progressed far enough on the new monitoring system that there is now a knowledge gap between some of our teammates (specifically @lavamind and the new folks) and those who built it (@lelutin and @anarcat).
This needs to happen before retiring nagios (#40695 (closed)).
- review documentation (#41655, the Priority A part)
- create material for a quick presentation on how to use and modify the prometheus+alertmanager monitoring service, see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#training-course-plan
- schedule a session presenting the various tools so that people get up to speed: 1400 UTC Wednesday
- give the actual training
@lelutin feel free to take this as well if you want to give the training.
Edited by anarcat