**exit-ops** - Exit Scanner, TorDNSEL and Tor Check Operations
* TODO Synopsis
While the three services described in this document could be implemented as discrete components,
they are currently tightly coupled and must all be deployed on the same host.
** TODO Exit Scanner [0/3]
The exit scanner performs active measurement of Tor exit relays to determine the IP addresses that are used for exit connections.
The active measurement uses an exitmap module, which is wrapped in a script to produce output formatted as an [[https://metrics.torproject.org/collector.html#type-tordnsel][Exit List]].
The exit list results are consumed by CollecTor, [[tordnsel][TorDNSEL]] and [[../check-ops/][Tor Check]].
Exit lists and bulk exit lists are also consumed by third-party applications at the following URLs:
- https://check.torproject.org/exit-addresses - Latest exit list
- https://check.torproject.org/torbulkexitlist - Latest bulk exit list
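For reference, each relay entry in an exit list looks roughly like the following (the fingerprint, address and timestamps are illustrative; the linked format description is authoritative):
#+BEGIN_EXAMPLE
ExitNode 63BA2AE7F1D9D00172DF7C4B05BF1C4B6F5A7F2C
Published 2020-03-01 07:35:55
LastStatus 2020-03-01 08:10:11
ExitAddress 192.0.2.42 2020-03-01 08:21:44
#+END_EXAMPLE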
Documentation questions:
- [ ] How long do we keep old measurements in the exit list?
- [ ] What are the timings for measurement runs?
- [ ] How many old exit lists do we keep around?
** TODO TorDNSEL [0/2]
TorDNSEL is a DNS list service that behaves in a similar way to [[https://en.wikipedia.org/wiki/Domain_Name_System-based_Blackhole_List][Domain Name System-based Blackhole Lists]].
An IP address gives a positive result if it has been found to be in use by an exit relay in a recent scan.
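Purely as an illustration of the DNSBL-style query model (the zone name and exact query format here are assumptions based on general DNSBL convention rather than this service's documented interface; see the open questions below):
#+BEGIN_SRC shell
# Hypothetical lookup asking whether 192.0.2.42 was recently seen as an exit
# address; DNSBLs conventionally answer positives with an address in 127.0.0.0/8.
dig +short 42.2.0.192.dnsel.example.org A
#+END_SRC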
Documentation questions:
- [ ] For how long does an address give a positive result?
- [ ] Do we also include all IP addresses of exit flagged relays in the consensus?
** TODO Tor Check [0/1]
Tor Check is a website that can be used to determine whether a browser's requests are reaching it via the Tor network.
It also checks the User-Agent header to determine whether the user appears to be using Tor Browser.
It is accessed via HTTPS at https://check.torproject.org/.
Documentation questions:
- [ ] Where is the JSON API?
* DONE Contacts
The primary contact for this service is the Metrics Team <[[mailto:metrics-team@lists.torproject.org][metrics-team@lists.torproject.org]]>.
For urgent queries, contact *karsten*, *irl*, or *gaba* in [[ircs://irc.oftc.net:6697/tor-project][#tor-project]].
* TODO Overview
The underlying infrastructure for the exit scanner, TorDNSEL and Tor Check services is provided by the
Tor Sysadmin Team (TSA). All services run on one virtual machine with the hostname ~check-01.torproject.org~.
** TODO Exit Scanner
Documentation questions:
- [ ] Where is the exitmap module?
- [ ] What are the services called?
- [ ] What user is used?
** TODO TorDNSEL
Documentation questions:
- [ ] Where does the zone file live?
- [ ] Ticket about doing DNSSEC signing
- [ ] Where is DNS served?
- [ ] What name is delegated?
- [ ] Can delegation work in testing environment?
* DONE Sources
The sources for exitmap are available on GitHub: https://github.com/NullHypothesis/exitmap.
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/exit-scanner/files/exitscan.py][exitmap wrapper]] and [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/exit-scanner/files/ipscan.py][module]] used by the exit scanner can be found in the metrics-cloud repository.
The wrapper script is also responsible for writing out the zone file to be used by the TorDNSEL service
and triggering a reload of the zone.
The sources for Tor Check are available in the Tor Project Git repository: https://gitweb.torproject.org/check.git.
* TODO Deployment
** DONE Initial deployment
The initial deployment procedure is split into two parts:
- System setup
- Installing and starting the services
There are no manual steps required to load state, and backups do not need to be performed for the host running this service.
Everything can be configured from scratch with only the Ansible playbook.
*** DONE Development/testing in AWS
For development or testing in AWS, a CloudFormation template is available named [[https://gitweb.torproject.org/metrics-cloud.git/plain/cloudformation/exit-scanner-dev.yml][~exit-scanner-dev.yml~]].
From the [[https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks][CloudFormation portal]], select your stack and view the outputs.
There you will find the public IP address of the EC2 instance that has been created.
Add this instance to *ansible/dev* in your local copy of metrics-cloud.git under "[exit-scanners]".
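A minimal sketch of the relevant inventory entry (the address is illustrative; use the public IP from the stack outputs):
#+BEGIN_EXAMPLE
[exit-scanners]
# public IP taken from the CloudFormation stack outputs (example address)
203.0.113.10
#+END_EXAMPLE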
You can now set up the machine with Ansible by running:
#+BEGIN_SRC shell
ansible-playbook -i dev exit-scanners-aws.yml
#+END_SRC
Note that the AWS AMI used has passwordless sudo, so no password needs to be given (hence there is no ~-K~ flag).
*** DONE Fresh machine from TSA
Add the host name of the new instance to *ansible/production* in your local
copy of metrics-cloud.git under "[exit-scanners]" and commit the change.
You can now set up the machine with Ansible by running:
#+BEGIN_SRC shell
ansible-playbook -i production -K exit-scanners.yml
#+END_SRC
** TODO Upgrade [0/2]
The upstream sources for the applications that make up this service do not have managed releases,
which makes upgrades difficult.
To fix a bug in the exit scanner wrapper script, make the change in the metrics-cloud repository and re-run
the deployment playbook.
- [ ] Can we upgrade exitmap sensibly?
- [ ] Can we upgrade Tor Check sensibly?
* TODO Diagnostics
** TODO Logs [0/2]
- [ ] What things log?
- [ ] Where do the logs go?
* TODO Monitoring [0/2]
- [ ] CollecTor log messages
- [ ] Nagios
* DONE Disaster Recovery
The exit scanner service does not need to maintain any state between runs.
Keeping state helps it cope with a relay that happened to be down at the time we tried to measure it,
but in the event of a failure it is perfectly acceptable to throw away the old box and provision a new one.
Follow the initial deployment instructions above.
* TODO Service Level Agreement
* TODO See Also
* TODO Standards
The exit scanner service produces exit lists according to the [[https://2019.www.torproject.org/tordnsel/exitlist-spec.txt][TorDNSEL exit list format]].
* TODO History
* TODO Authors
* DONE Major Caveats
The exit scanner service does not support IPv6.
* DONE Bugs
Known bugs can be found in the Tor Project Trac for:
#+TITLE: metrics-cloud: Scripts for orchestrating Tor Metrics services
#+OPTIONS: ^:nil
* DONE Synopsis
The metrics-cloud framework aims to enable:
- reproducible deployments of software
- consistency between those software deployments
Side-effects of these goals are:
- reproducible experiments (good science)
- reduced maintenance costs
- reduced human error
There are currently two components to the metrics-cloud framework: CloudFormation templates and Ansible playbooks.
The CloudFormation templates are relevant only to testing and development, while the Ansible playbooks are applicable
to both environments.
* DONE Usage of AWS for Tor Metrics Development
Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice,
we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. It is
still important to minimize costs when using AWS and the use of CloudFormation templates and Ansible playbooks for
rapid creation, provisioning and destruction should help with this.
** DONE CloudFormation Templates
CloudFormation is an AWS service allowing the definition of /stacks/. These stacks describe a series of AWS services
using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources
in a stack are tagged with the stack name which allows for tracking of costs per project. Each stack can also have all
resources terminated together easily, allowing stacks to exist for only as long as they are needed.
The CloudFormation templates used in the framework can be found in the [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation][cloudformation]] folder of the repository.
Some services have very simple templates while others are more complex. Regardless of the level of
complexity, we still want to use the templates to ensure we are meeting the key goals of the framework and to simplify
tracking of spending in the billing portal through the tags.
Documentation for CloudFormation, including an API reference, can be found at: https://docs.aws.amazon.com/cloudformation/.
*** DONE Quickstart: Deploying a template
Each template begins with comments containing any relevant notes about the template, together with a deployment command
that will upload and deploy the template on AWS.
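A rough sketch of what such a command looks like (the stack name, template file and parameter name below are illustrative; always copy the exact command from the template's own header comments):
#+BEGIN_SRC shell
# Illustrative only: parameter names and paths vary per template.
aws cloudformation deploy \
  --stack-name "$(whoami)-exit-scanner-dev" \
  --template-file exit-scanner-dev.yml \
  --parameter-overrides "SSHKeyName=$(./identify_user.sh)"
#+END_SRC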
You'll notice that the command includes a call to ~whoami~ to prefix the stack name with your current username, and also
that the ~identify_user.sh~ script is used to determine which SSH key to use for new instances.
You do not need to modify this command line before running it.
Once the stack has been deployed from the template, you can view its resources and delete it through
the [[https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false][CloudFormation management console]].
*** DONE SSH Key Selection
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/identify_user.sh][identify_user.sh]] script prints out the name of the SSH public key to be used based on either:
- the ~TOR_METRICS_SSH_KEY~ environment variable, or
- the current user name.
The environment variable takes precedence over the username-to-key mapping.
To change the default key used for your username, update the mapping in this shell script.
SSH keys are managed through the [[https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:][EC2 management console]] and are not (currently) managed by a CloudFormation template.
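A minimal sketch of the selection logic (the user names and key names in the mapping are purely illustrative; the real mapping lives in ~identify_user.sh~):
#+BEGIN_SRC shell
#!/bin/sh
# Print the name of the EC2 key pair to use for new instances.
if [ -n "$TOR_METRICS_SSH_KEY" ]; then
    # An explicit override always wins.
    echo "$TOR_METRICS_SSH_KEY"
    exit 0
fi
case "$(whoami)" in
    alice) echo "alice-laptop" ;;   # illustrative mapping entries
    bob)   echo "bob-desktop" ;;
    *)     echo "no key mapped for $(whoami); set TOR_METRICS_SSH_KEY" >&2
           exit 1 ;;
esac
#+END_SRC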
** DONE Templates and Stacks
There is no directory hierarchy for the templates in the ~cloudformation~ folder of the repository. There are a couple of naming
conventions used though:
- Development/testing templates/stacks use a ~-dev~ suffix after the service name
- Long-term and shared templates/stacks start with ~metrics-~
*** DONE ~billing-alerts~
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/billing-alerts.yml][~billing-alerts~ template]] sends notifications to the subscribed individuals whenever the predicted spend for the month will be
over 50USD. Email addresses can be added here if other people should be notified too.
*** DONE ~metrics-vpc~
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/metrics-vpc.yml][~metrics-vpc~ template]] contains shared resources for Tor Metrics development templates. This includes:
**** MetricsVPC and MetricsSubnet
The subnet should be referenced by any resource that requires it. Use of the default VPC should be avoided as we
share the AWS account with other Tor teams.
For example, to create an EC2 instance:
#+BEGIN_SRC yaml
Instance:
  Type: AWS::EC2::Instance
  Properties:
    AvailabilityZone: !Select [ 0, !GetAZs ]
    ImageId: ami-01db78123b2b99496
    InstanceType: t2.large
    SubnetId:
      Fn::ImportValue: 'MetricsSubnet'
    KeyName: !Ref myKeyPair
    SecurityGroupIds:
      - Fn::ImportValue: 'MetricsInternetSecurityGroup'
      - Fn::ImportValue: 'MetricsPingableSecurityGroup'
      - Fn::ImportValue: 'MetricsHTTPASecurityGroup'
#+END_SRC
Note also that the availability zone is not hardcoded, to allow for portability between regions if we ever want that.
**** Various security groups
The EC2 example above uses some of the security groups from the ~metrics-vpc~ template. Refer to the template source
for details on each group's rules.
**** The development DNS zone
Often services require TLS certificates, or require DNS names for other reasons. To facilitate this, a zone is hosted
using Route53 allowing for DNS records to be created in CloudFormation templates. This zone is:
~tm-dev-aws.safemetrics.org~.
As an example, an A record for an EC2 instance can be created using the stack name as the subdomain.
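A sketch of such a record (assuming the instance resource is named ~Instance~ as in the earlier EC2 example; the TTL is arbitrary):
#+BEGIN_SRC yaml
DNSRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: tm-dev-aws.safemetrics.org.
    Name: !Sub '${AWS::StackName}.tm-dev-aws.safemetrics.org'
    Type: A
    TTL: '300'
    ResourceRecords:
      - !GetAtt Instance.PublicIp
#+END_SRC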
It is not common to use other AWS services as part of these templates, as the goal is usually to have these services
deployed on TPA-managed hosts.
** DONE Linting
[[https://github.com/aws-cloudformation/cfn-python-lint][~cfn-lint~]] is used to ensure we are complying with best practices. None of the team have formal training in the use of CloudFormation
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
things efficiently and correctly.
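To run the same check locally (assuming ~cfn-lint~ has been installed, for example with pip):
#+BEGIN_SRC shell
pip install cfn-lint
cfn-lint cloudformation/*.yml
#+END_SRC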
This is also run as part of the [[https://travis-ci.org/github/torproject/metrics-cloud/][continuous integration checks]] on Travis CI.
* TODO Ansible Playbooks
Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python,
is mature, and has an extensive selection of modules for almost everything we could need.
** TODO Inventories and site.yml
In general, there are two inventories: [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/production][production]] and dev. Only the production inventory is committed to git; the dev inventory will
vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default
inventory in the ~ansible.cfg~ file, so you must specify an inventory for every invocation of ~ansible-playbook~ using the ~-i~ flag:
#+BEGIN_SRC shell
ansible-playbook -i dev ...
#+END_SRC
Inside the inventory, hosts are grouped by their purpose. For each group there is a corresponding YAML file in the root of the
~ansible~ directory that specifies a playbook for the group. All of these files are included in the ~site.yml~ master playbook to
allow multiple hosts to be provisioned together.
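As a sketch, ~site.yml~ can be assumed to simply import each per-group playbook (only ~exit-scanners.yml~ is named elsewhere in this document; other file names would follow the same pattern):
#+BEGIN_SRC yaml
# site.yml - master playbook importing one playbook per inventory group
- import_playbook: exit-scanners.yml
# - import_playbook: <other-group>.yml
#+END_SRC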
** TODO ~metrics-common~
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common][~metrics-common~]] role allows us to have a consistent environment between services, and closely matches the environment that
would be provided by a TSA-managed machine. The role does the following:
- installs dependency packages from Debian (optionally from the backports repository)
- formats additional volumes attached to the instance using the specified filesystem
- sets the timezone to UTC (Q: /is this what TSA do?/)
- creates user accounts for each member of the team
- all team members can perform unlimited passwordless sudo (TSA hosts require a password)
- SSH password authentication is disabled
- all user account passwords are removed/disabled
- creates service user accounts as specified
- home directories are created as specified, and linked from ~/home/$user~
- lingering is enabled for service users
This is all configured via group variables in the [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/group_vars][~ansible/group_vars/~]] folder. Examples there should help you to understand how
these work. These override the [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common/defaults/main.yml][defaults]] set in the role.
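A hypothetical group_vars entry might look like the following (the variable names are invented for illustration; the real names are defined in the role's defaults):
#+BEGIN_SRC yaml
# group_vars/exit-scanners.yml (illustrative variable names only)
metrics_backports_packages:
  - tor
metrics_service_users:
  - name: exitscanner
    home: /srv/exitscanner.torproject.org
#+END_SRC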
** TODO Service roles
** DONE Linting
[[https://docs.ansible.com/ansible-lint/][~ansible-lint~]] is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
things efficiently and correctly.
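To run the same check locally (assuming ~ansible-lint~ has been installed, for example with pip):
#+BEGIN_SRC shell
pip install ansible-lint
ansible-lint site.yml
#+END_SRC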
This is also run as part of the [[https://travis-ci.org/github/torproject/metrics-cloud/][continuous integration checks]] on Travis CI.
* TODO Common Tasks
** TODO Add a new member to the team
** TODO Update an SSH key for a team member
** TODO Deploy and provision a development environment for a service