add incident response template
This is mostly to provide us with a post-mortem template as part of enhancing our incident response procedures (#40421 (closed)).
The actual incident template is different from the default "bug report" template, and deliberately so: it's designed to be fast to fill out, with a clear separation between the first responder duties and the comms/research roles that come later.
The basic issue structure (next steps, dashboards, related issues) is based on existing incidents like lists-01 performance issues (OOM, latency) (#41957 - closed).
The overall post-mortem template is based on The Practice of Cloud System Administration by Thomas A. Limoncelli (p. 981).
The roles are inspired by the Google SRE book:
https://sre.google/sre-book/managing-incidents/#recursive-separation-of-responsibilities
We preferred them over the driver/comms/command/records/research roles proposed here:
https://bitfieldconsulting.com/blog/got-game-secrets-of-great-incident-management
... because:
- that's more people (5 instead of 4)
- no one has the job of calling for food
- we like the idea of the note-taker having a more active role in planning ahead and being responsible for the post-mortem
We discarded the more complex (6-person!) structure suggested by PagerDuty because of our modest size: