title: Incident and emergency response: what to do in case of fire
This documentation is for sysadmins to figure out what to do when things go wrong. If you don't have the required access and haven't been trained for such situations, you might be better off just trying to wake up someone who can deal with them. See the doc/how-to-get-help documentation instead.
Specific situations
Server down
If a server is non-responsive, you can first check if it is actually reachable over the network:
ping -c 10 server.torproject.org
If it does respond, you can try to diagnose the issue by looking at Nagios and/or Grafana and analyse what exactly is going on.
If it does not respond, you should see if it's a virtual machine, and in this case, which server is hosting it. This information is available in howto/ldap (or the web interface, under the physicalHost field). Then log in to that server to diagnose the issue.
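The first triage step above can be sketched as a small shell function. This is only an illustration: `check_host` and the `PING` override variable are hypothetical names, not part of any existing tooling, and the printed hints simply restate the two branches described above.

```shell
# Hypothetical helper: ping a host and print which branch of the
# procedure applies. PING can be overridden (e.g. for a dry run);
# it defaults to the system ping command.
check_host() {
    host="$1"
    if ${PING:-ping} -c 10 "$host" >/dev/null 2>&1; then
        echo "$host: reachable; check Nagios and/or Grafana"
    else
        echo "$host: unreachable; look up physicalHost in LDAP"
    fi
}
```

It would be called as `check_host server.torproject.org`.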
If the physical host is not responding, or if the physicalHost field is empty (in which case the server is itself a physical host), you need to file a ticket with the upstream provider. To find out which provider that is, look in Nagios:
- search for the server name in the search box
- click on the server
- drill down the "Parents" until you find something that resembles a hosting provider (e.g. hetzner-hel1-01 is Hetzner, gw-cymru is Cymru, gw-scw-* are at Scaleway, gw-sunet is Sunet)
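As a memory aid, the parent names listed above can be mapped to providers with a shell `case` statement. The function name `provider_for_parent` is hypothetical, and the `hetzner-*` pattern is an assumption generalized from the single `hetzner-hel1-01` example; other naming schemes may exist.

```shell
# Hypothetical helper: map a Nagios "parent" name to a hosting provider,
# using only the examples given in this page.
provider_for_parent() {
    case "$1" in
        hetzner-*) echo "Hetzner" ;;   # assumption: all hetzner-* parents are Hetzner
        gw-cymru)  echo "Cymru" ;;
        gw-scw-*)  echo "Scaleway" ;;
        gw-sunet)  echo "Sunet" ;;
        *)         echo "unknown" ;;   # keep drilling down the parents
    esac
}
```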
What follows are per-provider instructions:
Hetzner robot (physical servers)
- Visit the Hetzner Robot server page (password in tor-passwords/hosts-extra-info)
- Select the right server (hostname is the second column)
- Select the "reset" tab
- Select the "Execute an automatic hardware reset" radio button and hit "Send". This is equivalent to hitting the "reset" button on a computer.
- Wait a "few" (2? 5? 10? 20?) minutes for the server to return, depending on how hopeful you are this simple procedure will work.
- If that fails, select the "Order a manual hardware reset" option and hit "Send". This will send an actual human to attend the server and see if they can bring it back online.
If all else fails, select the "Support" tab and open a support request.
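While waiting for a server to come back after a reset, a polling loop saves retyping the ping command. The following is a minimal sketch: `wait_for_host`, its default of 20 attempts 30 seconds apart, and the `PING` override variable are all hypothetical choices, not an established procedure.

```shell
# Hypothetical helper: poll a host until it answers a ping or we give up.
# Usage: wait_for_host HOST [ATTEMPTS] [DELAY_SECONDS]
wait_for_host() {
    host="$1"
    attempts="${2:-20}"   # assumption: 20 tries by default
    delay="${3:-30}"      # assumption: 30 seconds between tries
    i=1
    while [ "$i" -le "$attempts" ]; do
        if ${PING:-ping} -c 1 "$host" >/dev/null 2>&1; then
            echo "$host is back after $i attempt(s)"
            return 0
        fi
        # Sleep only if another attempt is coming.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "$host still down after $attempts attempt(s)"
    return 1
}
```

For example, `wait_for_host server.torproject.org 10 60` would poll once a minute for about ten minutes before giving up, at which point the manual reset or a support request is the next step.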
Hetzner Cloud (virtual servers)
- Visit the Hetzner Cloud console (password in tor-passwords/hosts-extra-info)
- Select the project (usually "default")
- Select the affected server
- Open the console (the >_ sign on the top right), and see if there are any error messages and/or if you can log in there (using the root password in tor-passwords/hosts)
- If that fails, attempt a "Power cycle" in the "Power" tab (on the left)
- If that fails, you can also try to boot a rescue system by selecting "Enable Rescue & Power Cycle" in the "Rescue" tab
If all else fails, create a support request. The support menu is in the "Person" menu on the top right of the page.
Cymru
Open a ticket by writing to support@cymru.com.
Sunet
TBD
Support policies
Please see /policy/tpa-rfc-2-support/