
1 Name

exit-ops - Exit Scanner, TorDNSEL and Tor Check Operations

2 Synopsis

While the three services described in this document could be implemented as discrete components, they are currently tightly coupled, which means they must all be deployed on the same host.

2.1 Exit Scanner

The exit scanner performs active measurement of Tor exit relays in order to determine the IP addresses that are used for exit connections. The active measurement uses an exitmap module, which is wrapped in a script to produce output formatted as an [Exit List](https://metrics.torproject.org/collector.html#type-tordnsel).

The exit list results are consumed by CollecTor, [TorDNSEL](tordnsel) and [Tor Check](../check-ops/). Exit lists and bulk exit lists are also consumed by third-party external applications at the following URLs:

2.2 TorDNSEL

2.3 Tor Check

The primary contact for this service is the Metrics Team <[metrics-team@lists.torproject.org](mailto:metrics-team@lists.torproject.org)>. For urgent queries, contact karsten, irl, or gaba in [#tor-project](ircs://irc.oftc.net:6697/tor-project).

The underlying infrastructure for the Onionoo service is provided by the Tor Sysadmin Team (TSA). A number of HTTP caches (onionoo-frontend-\*.torproject.org, currently running Varnish) sit in front of a number of backends (onionoo-backend-\*.torproject.org, running the Java components described below).

The frontends are entirely managed by TSA. The frontends communicate with the backends via IPsec tunnels managed by TSA.

The backend hosts are managed by TSA with the Onionoo services being managed by Metrics Team. The Onionoo services get their data from the "collector" service.

The backends are redundant and can survive outages under the following conditions:

  • shorter than 72 hours: backends can self-heal
  • longer partial outage: as long as one backend remains, the other backends can be restored from it, although that is a manual process
  • longer total outage: if all backends go down for more than 72 hours, data can still be recovered from collector, but that is a different manual process that has yet to be implemented

Note that data is recovered from collector, which has similar self-healing systems that cover 72 hours.

The Disaster Recovery section details how to recover from those situations.
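The outage cases listed above can be summarised as a small decision helper. This is only a sketch: the function name `recovery_path` and its output strings are made up for illustration, and an outage of exactly 72 hours, which the list does not cover, is treated here as a long outage.

```shell
# recovery_path OUTAGE_HOURS SURVIVING_BACKENDS
# Hypothetical helper that maps an outage onto the recovery cases above.
recovery_path() {
    hours="$1"
    surviving="$2"
    if [ "$hours" -lt 72 ]; then
        # Short outage: the backends catch up on their own.
        echo "self-heal"
    elif [ "$surviving" -gt 0 ]; then
        # Long partial outage: copy data from a backend that stayed up.
        echo "manual: restore from a surviving backend"
    else
        # Long total outage: the data exists in collector, but the
        # recovery process has not been implemented.
        echo "manual: recover from collector (not yet implemented)"
    fi
}

recovery_path 100 1
```

For example, `recovery_path 5 0` reports that a five-hour total outage needs no manual intervention.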

## Onionoo Service Architecture

The Onionoo service consists of two parts: the hourly updater and the web server.

Both parts run on each backend host, with privilege separation between them.

## Hourly Updater

The hourly updater is contained in the JAR file, which is built from the sources with:

``` ant jar ```

The JAR file is also included in the release tarballs, in the generated/dist/ folder. The filename looks like onionoo-{protocol version}-{software version}.jar, and on the backend host it should be found in /srv/onionoo.torproject.org/onionoo.
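As an illustration of the naming scheme, the following sketch builds the path where the updater JAR would be expected on a backend. The helper name `updater_jar_path` is hypothetical, the version numbers are placeholders rather than a real release, and the absolute /srv prefix is assumed from the paths used in the deployment instructions.

```shell
# Hypothetical helper: expected location of the updater JAR on a backend.
# Arguments: protocol version, software version.
updater_jar_path() {
    protocol_version="$1"
    software_version="$2"
    echo "/srv/onionoo.torproject.org/onionoo/onionoo-${protocol_version}-${software_version}.jar"
}

# Placeholder versions, for illustration only.
updater_jar_path 8.0 1.2.3
```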

## Web Server

The web server is contained in the WAR file, which is built from the sources with:

``` ant war ```

The WAR file is also included in the release tarballs, in the generated/dist/ folder. The filename looks like onionoo-{protocol version}-{software version}.war, and on the backend host it should be found in /srv/onionoo.torproject.org/onionoo.

Onionoo releases are available [from dist.torproject.org](https://dist.torproject.org/onionoo/) with the source code available [from Tor Project git](https://gitweb.torproject.org/onionoo.git).

Deployment and maintenance scripts are part of [metrics-cloud](https://gitweb.torproject.org/metrics-cloud.git).

## Initial deployment

The initial deployment procedure is split into 3 parts:

1. System setup
2. Importing history
3. Installing and starting the service

### Development/testing in AWS

For development or testing in AWS, a CloudFormation template is available named [onionoo-dev.yml](https://gitweb.torproject.org/metrics-cloud.git/plain/cloudformation/onionoo-dev.yml). The header for this template includes the command required to deploy the stack. It will deploy in your local user's namespace (the output of whoami) and must be provided with the name of your SSH key pair.

From the [CloudFormation portal](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks), select your stack and view the outputs. You will find here the public IP address for the EC2 instance that has been created. Add this instance to ansible/dev in your local copy of metrics-cloud.git under "[onionoo-backends]".

You can now set up the machine with Ansible by running:

``` ansible-playbook -i dev onionoo-backends-aws.yml ```

Note that the AWS AMI used has passwordless sudo, so no password need be given.

### Fresh machine from TSA

Begin by copying the state and out directories from another Onionoo backend to /srv/onionoo.torproject.org/onionoo/{state,out}.
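One way to do the copy is with rsync, run on the new host. The sketch below only prints the commands so they can be reviewed first. The use of rsync, the helper name `copy_cmds`, and the choice of onionoo-backend-01 as the source are assumptions, not a documented procedure.

```shell
# copy_cmds SOURCE_HOST
# Print (do not run) rsync commands that would copy the state and out
# directories from an existing backend to the same paths on this host.
copy_cmds() {
    src_host="$1"
    base="/srv/onionoo.torproject.org/onionoo"
    for dir in state out; do
        # -a preserves permissions, ownership and timestamps.
        echo "rsync -a ${src_host}:${base}/${dir}/ ${base}/${dir}/"
    done
}

copy_cmds onionoo-backend-01.torproject.org
```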

Add the host name of the new instance to ansible/production in your local copy of metrics-cloud.git under "[onionoo-backends]" and commit the change.

You can now set up the machine with Ansible by running:

``` ansible-playbook -i production -K onionoo-backends.yml ```

## Upgrade

The version number of Onionoo to install is stored as a variable in the main onionoo-backends.yml playbook. Begin by changing this to the new version number in your local clone, commit and push the change.

Log in to each of the Onionoo backends and check the logs for the hourly updater to ensure it is not mid-update. If everything is idle, stop the updater:

``` sudo -u onionoo -i bash -c 'systemctl --user stop onionoo' ```

You can now run the Ansible playbook to update the installed versions and restart the services:

``` ansible-playbook -i production -K onionoo-backends.yml ```

## Logs

Logs for the hourly updater can be found in /srv/onionoo.torproject.org/logs, and for the web server in /srv/onionoo.torproject.org/web-logs.

## Web Server Direct Connection

It may be necessary to determine if an issue is caused by a frontend or a backend.

To connect directly to an Onionoo backend's web server, you will need to use SSH port forwarding. This is because the web server is not available for incoming connections on its Internet address. For example:

``` ssh -L8039:localhost:8080 onionoo-backend-01.torproject.org ```

You'll then be able to connect to localhost:8039 in your web browser.
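The same check can be made from the command line. The sketch below builds a request URL for the forwarded port; the helper name `backend_url` is hypothetical, and it assumes the backend serves Onionoo documents such as `summary` at the root path, which may not match the actual servlet configuration.

```shell
# backend_url [LOCAL_PORT] [DOCUMENT]
# Hypothetical helper: URL of an Onionoo document on the tunnelled backend.
# Defaults: port 8039 (from the ssh example above) and the summary document.
# limit=1 keeps the response small; it is a parameter of the Onionoo protocol.
backend_url() {
    port="${1:-8039}"
    doc="${2:-summary}"
    printf 'http://localhost:%s/%s?limit=1\n' "$port" "$doc"
}

# With the SSH tunnel open in another terminal:
#   curl -s "$(backend_url)"
backend_url
```

Comparing this response with the same request made through a frontend helps narrow down which layer is at fault.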

Onionoo is monitored by the TSA Nagios instance (future task: add to Metrics Nagios) using the [tor-check-onionoo](https://gitweb.torproject.org/admin/tor-nagios.git/tree/tor-nagios-checks/checks/tor-check-onionoo) plugin. The frontends and backends are each monitored. The check is simple: it looks for the last time that bridge and relay information was updated and alerts if they are too old.

Alerts are sent to the metrics-alerts mailing list.
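The freshness test can be sketched as follows. This is not the tor-check-onionoo code: the helper name `is_stale` and the three-hour threshold are assumptions for illustration. The timestamps would come from the `relays_published` and `bridges_published` fields of an Onionoo response.

```shell
# is_stale LAST_UPDATE_EPOCH NOW_EPOCH MAX_AGE_SECONDS
# Succeeds (exit status 0) when the last update is older than MAX_AGE_SECONDS.
is_stale() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

# Example with fixed values: data four hours old, three-hour limit (assumed).
if is_stale 0 14400 10800; then
    echo "CRITICAL: data too old"
else
    echo "OK"
fi

# Converting an Onionoo timestamp to epoch seconds (needs GNU date):
#   last=$(date -u -d "2020-03-31 14:00:00" +%s)
#   is_stale "$last" "$(date -u +%s)" 10800 && echo "CRITICAL"
```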

## Single backend data corruption, no hardware failure

Stop both services, then clear their home directories and the Onionoo data directory:

```
sudo -u onionoo -i bash -c 'systemctl --user stop onionoo'
sudo -u onionoo-unpriv -i bash -c 'systemctl --user stop onionoo-web'
rm -rf /srv/onionoo.torproject.org/onionoo/home/{.,}*
rm -rf /srv/onionoo.torproject.org/onionoo/home-unpriv/{.,}*
rm -rf /srv/onionoo.torproject.org/onionoo/onionoo/{.,}*
```

Then pretend you are deploying a new backend from the instructions above.

## Single backend failure, hardware failure

In the event of a single backend failure, ask TSA to trash it and make a new one. Once Puppet has configured their side of it, pretend you are deploying a new backend from the instructions above.

## Total loss

In the event of a total loss, ask TSA to trash all the backends and make new ones. Once Puppet has configured one host, restore the state and out directories from the latest good backup. It may be necessary to refer to the logs, which should also be backed up, to work out when the latest good backup was made. Once state and out are in place, pretend you are deploying a new backend from the instructions above.

## Total loss including all backups

In the event that the backups have also been lost, it will not be possible to restore history. The data does exist in CollecTor to do this, but there is no code that actually does it.

If no out directory is present on the instance when the Ansible playbook is run to install and start the service, it will perform an initial single run of the updater to bootstrap. This will be where history starts.

Try to avoid this happening.

The Onionoo service implements the [Onionoo protocol](https://metrics.torproject.org/onionoo.html).

Known bugs can be found in the Tor Project Trac using [this query](https://trac.torproject.org/projects/tor/query?status=!closed&component=Metrics/Onionoo).

New bug reports should also be [filed at the Tor Project Trac](https://trac.torproject.org/projects/tor/newticket?component=Metrics/Onionoo) in the Metrics/Onionoo component.

Author: Iain Learmonth

Created: 2020-03-31 Tue 15:01
