---
title: Incident and emergency response: what to do in case of fire
---

This documentation is for sysadmins to figure out what to do when
things go wrong. If you don't have the required access and haven't
been trained for such situations, you might be better off just trying
to wake up someone who can deal with them. See the
[doc/how-to-get-help](doc/how-to-get-help) documentation instead.

[[_TOC_]]

Specific situations
===================

Server down
-----------

If a server is non-responsive, you can first check if it is actually
reachable over the network:

    ping -c 10 server.torproject.org

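As a rough sketch, the check above can be wrapped in a small shell function that says which branch of this procedure to follow next. The `triage` name and the `PING` override are illustrative assumptions, not existing tooling.

```shell
# Hypothetical helper: decide which branch of the "server down" procedure applies.
# PING can be overridden (e.g. for testing); it defaults to the command shown above.
triage() {
    host="$1"
    # ping normally exits 0 only if at least one reply was received
    if ${PING:-ping -c 10} "$host" >/dev/null 2>&1; then
        echo "$host responds: check Nagios and Grafana"
    else
        echo "$host does not respond: find its physical host"
    fi
}
```

For example, `triage server.torproject.org` prints one of the two messages depending on whether the host answers.
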
If it does respond, you can try to diagnose the issue by looking at
[Nagios][] and/or [Grafana](https://grafana.torproject.org) to analyse what exactly is going on.

[Nagios]: https://nagios.torproject.org

If it does *not* respond, you should check whether it's a virtual machine
and, if so, which server is hosting it. This information is
available in LDAP (see [howto/ldap](howto/ldap))
or in [the web interface](https://db.torproject.org/machines.cgi), under the
`physicalHost` field. Then log in to that server to diagnose the
issue.

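If you query LDAP from the command line, the lookup might look roughly like the sketch below. It only assumes that the host entry comes back as LDIF with a `physicalHost` attribute (the field name used above). The `LDAP_URI` and `BASE_DN` variables and the `hostname` filter in the usage comment are placeholders, not confirmed values; see [howto/ldap](howto/ldap) for the real ones.

```shell
# Hypothetical helper: print the physicalHost value from LDIF read on stdin.
# Prints nothing if the attribute is absent, which suggests the machine
# is itself a physical host.
physical_host() {
    sed -n 's/^physicalHost: *//p'
}

# Possible usage (LDAP_URI, BASE_DN and the filter are placeholders):
#   ldapsearch -x -H "$LDAP_URI" -b "$BASE_DN" \
#       "(hostname=server.torproject.org)" physicalHost | physical_host
```
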
If the physical host is not responding, or if the `physicalHost` field
is empty (in which case the server *is* a physical host), you need to
file a ticket with the upstream provider. To find out which provider
that is, look in [Nagios][]:

1. search for the server name in the search box
2. click on the server
3. drill down the "Parents" until you find something that resembles
   a hosting provider (e.g. `hetzner-hel1-01` is Hetzner, `gw-cymru`
   is Cymru, `gw-scw-*` are at Scaleway, `gw-sunet` is Sunet)

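The naming examples in the last step can be written down as a small lookup, sketched here as a shell function. The function name is hypothetical, and the `hetzner-*` pattern is an assumption generalised from the single `hetzner-hel1-01` example.

```shell
# Hypothetical helper: map a Nagios parent name to a hosting provider,
# based only on the examples listed above.
provider_for_parent() {
    case "$1" in
        hetzner-*) echo "Hetzner" ;;    # assumption: generalised from hetzner-hel1-01
        gw-cymru)  echo "Cymru" ;;
        gw-scw-*)  echo "Scaleway" ;;
        gw-sunet)  echo "Sunet" ;;
        *)         echo "unknown"; return 1 ;;
    esac
}
```

A parent that matches none of the patterns prints `unknown` and returns a non-zero status, so keep drilling down the "Parents" in that case.
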
What follows are per-provider instructions:

### Hetzner robot (physical servers)

1. Visit the [Hetzner Robot server page](https://robot.your-server.de/server) (password in
   `tor-passwords/hosts-extra-info`)
2. Select the right server (hostname is the second column)
3. Select the "reset" tab
4. Select the "Execute an automatic hardware reset" radio button and
   hit "Send". This is equivalent to hitting the "reset" button on a
   computer.
5. Wait a "few" (2? 5? 10? 20?) minutes for the server to return,
   depending on how hopeful you are this simple procedure will work.
6. If that fails, select the "Order a manual hardware reset" option
   and hit "Send". This will send an actual human to attend the
   server and see if they can bring it back online.

If all else fails, select the "Support" tab and open a support
request.

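The waiting step after an automatic reset can be scripted instead of watched by hand. The following is a minimal sketch, not existing tooling: the function name, the 10 minute default timeout and the `PROBE` override are all assumptions.

```shell
# Hypothetical helper: wait until a host answers a probe, or give up.
# usage: wait_for_host HOST [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# PROBE can be overridden (e.g. for testing); by default it sends one ping.
wait_for_host() {
    host="$1"
    timeout="${2:-600}"    # assumption: 10 minutes, adjust to how hopeful you are
    interval="${3:-10}"
    waited=0               # counts sleep time only, so it is approximate
    while [ "$waited" -lt "$timeout" ]; do
        if ${PROBE:-ping -c 1} "$host" >/dev/null 2>&1; then
            echo "$host is back after about ${waited}s"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "$host still down after ${timeout}s"
    return 1
}
```

For example, `wait_for_host server.torproject.org 1200 30` probes every 30 seconds for up to 20 minutes and returns a non-zero status if the server never answers, at which point the manual reset is the next step.
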
### Hetzner Cloud (virtual servers)

1. Visit the [Hetzner Cloud console](https://console.hetzner.cloud/) (password in
   `tor-passwords/hosts-extra-info`)
2. Select the project (usually "default")
3. Select the affected server
4. Open the console (the `>_` sign on the top right), and see if
   there are any error messages and/or if you can log in there (using
   the root password in `tor-passwords/hosts`)
5. If that fails, attempt a "Power cycle" in the "Power" tab (on the
   left)
6. If that fails, you can also try to boot a rescue system by
   selecting "Enable Rescue & Power Cycle" in the "Rescue" tab

If all else fails, create a support request. The support menu is in
the "Person" menu on the top right of the page.

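The power cycle and rescue steps above can probably also be done from the command line, as sketched below. This assumes the `hcloud` CLI is installed and configured with an API token for the right project, which this documentation does not confirm. It also assumes `hcloud server reset` corresponds to the "Power cycle" button. The function name and the `RUN` switch are hypothetical.

```shell
# Hypothetical helper: Hetzner Cloud recovery steps from the command line.
# Assumes the `hcloud` CLI is installed and has a valid API token.
# By default the commands are only printed; set RUN=1 to execute them.
hcloud_recover() {
    server="$1"
    mode="${2:-cycle}"     # "cycle" (step 5) or "rescue" (step 6)
    run=""
    [ "${RUN:-0}" = 1 ] || run="echo"
    if [ "$mode" = "rescue" ]; then
        # boot into the rescue system on the next power cycle
        $run hcloud server enable-rescue "$server"
    fi
    # hard reset, assumed equivalent to "Power cycle" in the web console
    $run hcloud server reset "$server"
}
```

Running `hcloud_recover myserver rescue` without `RUN=1` only prints the two commands, which makes it easy to review them before touching a production machine.
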
### Cymru

Open a ticket by writing to <support@cymru.com>.

### Sunet

TBD

Support policies
================

Please see [/policy/tpa-rfc-2-support/](https://gitlab.torproject.org/anarcat/wikitest/-/wikis/policy/tpa-rfc-2-support/)