Commit 67f94c0c authored by irl

metrics/exit-ops: progress

parent e30adf0e
......@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- 2020-03-31 Tue 15:01 -->
<!-- 2020-03-31 Tue 15:26 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>&lrm;</title>
......@@ -233,22 +233,47 @@ for the JavaScript code in this tag.
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#orgf8a5360">1. Name</a></li>
<li><a href="#org421d74b">2. Synopsis</a>
<li><a href="#orgf71f11f">1. <span class="todo TODO">TODO</span> Name</a></li>
<li><a href="#orgb479f4d">2. <span class="todo TODO">TODO</span> Synopsis</a>
<ul>
<li><a href="#org1e1fe6c">2.1. Exit Scanner</a></li>
<li><a href="#org3a6eff2">2.2. TorDNSEL</a></li>
<li><a href="#orgb90836d">2.3. Tor Check</a></li>
<li><a href="#org9de5d07">2.1. <span class="todo TODO">TODO</span> Exit Scanner <code>[0/3]</code></a></li>
<li><a href="#org0ea311a">2.2. <span class="todo TODO">TODO</span> TorDNSEL <code>[0/2]</code></a></li>
<li><a href="#org08eae9a">2.3. <span class="todo TODO">TODO</span> Tor Check <code>[0/1]</code></a></li>
</ul>
</li>
<li><a href="#org8a3c346">3. System setup</a></li>
<li><a href="#org267a2f7">4. Importing history</a></li>
<li><a href="#orga62ddac">5. Installing and starting the service</a></li>
<li><a href="#org7a758ce">3. <span class="done DONE">DONE</span> Contacts</a></li>
<li><a href="#org2c30690">4. <span class="todo TODO">TODO</span> Overview</a></li>
<li><a href="#orga7ae53c">5. <span class="todo TODO">TODO</span> Sources</a></li>
<li><a href="#org031de7e">6. <span class="todo TODO">TODO</span> Deployment</a>
<ul>
<li><a href="#orgbfdd485">6.1. <span class="todo TODO">TODO</span> Initial deployment</a>
<ul>
<li><a href="#orgd642448">6.1.1. <span class="todo TODO">TODO</span> Development/testing in AWS</a></li>
<li><a href="#org98bb6b7">6.1.2. <span class="todo TODO">TODO</span> Fresh machine from TSA</a></li>
</ul>
</li>
<li><a href="#org3c3171d">6.2. <span class="todo TODO">TODO</span> Upgrade</a></li>
</ul>
</li>
<li><a href="#org347ade0">7. <span class="todo TODO">TODO</span> Diagnostics</a>
<ul>
<li><a href="#orgf6324dd">7.1. <span class="todo TODO">TODO</span> Logs</a></li>
</ul>
</li>
<li><a href="#org57c3daf">8. <span class="todo TODO">TODO</span> Monitoring</a></li>
<li><a href="#org65d6ec1">9. <span class="todo TODO">TODO</span> Disaster Recovery</a></li>
<li><a href="#org087a077">10. <span class="todo TODO">TODO</span> Service Level Agreement</a></li>
<li><a href="#org2c819c4">11. <span class="todo TODO">TODO</span> See Also</a></li>
<li><a href="#orged19433">12. <span class="todo TODO">TODO</span> Standards</a></li>
<li><a href="#org8aa303f">13. <span class="todo TODO">TODO</span> History</a></li>
<li><a href="#orgcc23786">14. <span class="todo TODO">TODO</span> Authors</a></li>
<li><a href="#org76ddcbd">15. <span class="todo TODO">TODO</span> Bugs</a></li>
</ul>
</div>
</div>
<div id="outline-container-orgf8a5360" class="outline-2">
<h2 id="orgf8a5360"><span class="section-number-2">1</span> Name</h2>
<div id="outline-container-orgf71f11f" class="outline-2">
<h2 id="orgf71f11f"><span class="section-number-2">1</span> <span class="todo TODO">TODO</span> Name</h2>
<div class="outline-text-2" id="text-1">
<p>
<b><b>exit-ops</b></b> - Exit Scanner, TorDNSEL and Tor Check Operations
......@@ -256,8 +281,8 @@ for the JavaScript code in this tag.
</div>
</div>
<div id="outline-container-org421d74b" class="outline-2">
<h2 id="org421d74b"><span class="section-number-2">2</span> Synopsis</h2>
<div id="outline-container-orgb479f4d" class="outline-2">
<h2 id="orgb479f4d"><span class="section-number-2">2</span> <span class="todo TODO">TODO</span> Synopsis</h2>
<div class="outline-text-2" id="text-2">
<p>
While the three services described in this document could be implemented as discrete components,
......@@ -265,8 +290,8 @@ they currently have tight coupling which means they must all be deployed on the
</p>
</div>
<div id="outline-container-org1e1fe6c" class="outline-3">
<h3 id="org1e1fe6c"><span class="section-number-3">2.1</span> Exit Scanner</h3>
<div id="outline-container-org9de5d07" class="outline-3">
<h3 id="org9de5d07"><span class="section-number-3">2.1</span> <span class="todo TODO">TODO</span> Exit Scanner <code>[0/3]</code></h3>
<div class="outline-text-3" id="text-2-1">
<p>
The exit scanner performs active measurement of Tor exit relays in order to determine the IP addresses that are used for exit connections.
......@@ -282,162 +307,102 @@ Exit lists and bulk exit lists are also consumed by third-party external applica
<li><a href="https://check.torproject.org/exit-addresses">https://check.torproject.org/exit-addresses</a> - Latest exit list</li>
<li><a href="https://check.torproject.org/torbulkexitlist">https://check.torproject.org/torbulkexitlist</a> - Latest bulk exit list</li>
</ul>
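<p>
As an illustration of how a third-party application might consume the exit list, the following is a minimal Python sketch. It assumes the list is made of <code>ExitNode</code> lines, each followed by one or more <code>ExitAddress</code> lines, which this document does not itself specify. The function name, fingerprints and addresses are hypothetical.
</p>

```python
def parse_exit_addresses(text):
    """Map each relay fingerprint to the exit addresses listed for it.

    Assumed layout (not confirmed by this document): an "ExitNode" line
    starts a relay entry and each "ExitAddress" line carries one address.
    """
    exits = {}
    fingerprint = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ExitNode" and len(fields) == 2:
            fingerprint = fields[1]
            exits.setdefault(fingerprint, [])
        elif fields[0] == "ExitAddress" and len(fields) == 4:
            if fingerprint is not None:
                exits[fingerprint].append(fields[1])
    return exits


# Hypothetical sample using documentation-only addresses.
SAMPLE = """\
ExitNode AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Published 2020-03-31 10:00:00
LastStatus 2020-03-31 11:00:00
ExitAddress 192.0.2.10 2020-03-31 11:30:00
ExitNode BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Published 2020-03-31 10:05:00
LastStatus 2020-03-31 11:00:00
ExitAddress 198.51.100.7 2020-03-31 11:31:00
ExitAddress 198.51.100.8 2020-03-31 11:32:00
"""

if __name__ == "__main__":
    for relay, addresses in parse_exit_addresses(SAMPLE).items():
        print(relay, addresses)
```

<p>
Grouping addresses by fingerprint keeps relays that exit from several addresses together. A consumer that only needs a flat set of addresses could read the bulk exit list instead.
</p>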
</div>
</div>
<div id="outline-container-org3a6eff2" class="outline-3">
<h3 id="org3a6eff2"><span class="section-number-3">2.2</span> TorDNSEL</h3>
</div>
<div id="outline-container-orgb90836d" class="outline-3">
<h3 id="orgb90836d"><span class="section-number-3">2.3</span> Tor Check</h3>
<div class="outline-text-3" id="text-2-3">
<p>
The primary contact for this service is the Metrics Team &amp;lt;[metrics-team@lists.torproject.org](<a href="mailto:metrics-team@lists.torproject.org">mailto:metrics-team@lists.torproject.org</a>)&amp;gt;.
For urgent queries, contact <b>karsten</b>, <b>irl</b>, or <b>gaba</b> in [#tor-project](ircs://irc.oftc.net:6697/tor-project).
</p>
<p>
The underlying infrastructure for the Onionoo service is provided by the
Tor Sysadmin Team (TSA). There are a number of HTTP caches
(<b>onionoo-frontend-*.torproject.org</b>, currently running varnish)
that sit in front of a number of backends
(<b>onionoo-backend-*.torproject.org</b>, running various Java components
described below).
</p>
<p>
The frontends are entirely managed by TSA. The frontends communicate
with the backends via IPsec tunnels managed by TSA.
</p>
<p>
The backend hosts are managed by TSA with the Onionoo services being managed
by Metrics Team. The Onionoo services get their data from the
"collector" service.
</p>
<p>
The backends are redundant and can survive outages under the following
conditions:
Documentation questions:
</p>
<ul class="org-ul">
<li>shorter than 72 hours: backends can self-heal</li>
<li>longer partial outage: as long as a backend remains, the other
backends can be restored from the remaining backend, although that
is a manual process.</li>
<li>longer total outage: if all backends go down for more than 72h,
data can still be recovered from collector, but that's another,
different manual process that still has to be implemented</li>
<li class="off"><code>[&#xa0;]</code> How long do we keep old measurements in the exit list?</li>
<li class="off"><code>[&#xa0;]</code> What are the timings for measurement runs?</li>
<li class="off"><code>[&#xa0;]</code> How many old exit lists do we keep around?</li>
</ul>
</div>
</div>
<div id="outline-container-org0ea311a" class="outline-3">
<h3 id="org0ea311a"><span class="section-number-3">2.2</span> <span class="todo TODO">TODO</span> TorDNSEL <code>[0/2]</code></h3>
<div class="outline-text-3" id="text-2-2">
<p>
Note that data is recovered from collector, which has similar
self-healing systems that cover 72 hours.
</p>
<p>
The Disaster Recovery section details how to recover from those situations.
</p>
<p>
## Onionoo Service Architecture
</p>
<p>
The Onionoo service consists of two parts: the hourly updater and the web
server.
</p>
<p>
Both parts run on each backend host, with privilege separation between them.
</p>
<p>
## Hourly Updater
</p>
<p>
The hourly updater is contained in the JAR file, which is built from the
sources with:
</p>
<p>
ant jar
</p>
<p>
The JAR file is also included in the tarballs made available with releases in
the
<b>generated/dist/</b>
folder.
The filename will look like
<b>onionoo-{protocol version}-{software version}.jar</b>
and on the backend host should be found in
<b>/srv/onionoo.torproject.org/onionoo/</b>.
</p>
<p>
## Web Server
TorDNSEL is a DNS list service that behaves in a similar way to <a href="https://en.wikipedia.org/wiki/Domain_Name_System-based_Blackhole_List">Domain Name System-based Blackhole Lists</a>.
IP addresses will give positive results in the event that an address has been found to be used by an exit relay in a recent scan.
</p>
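<p>
DNS-based blackhole lists are conventionally queried by reversing the octets of an IPv4 address and appending the list's zone. The Python sketch below builds such a query name. Whether TorDNSEL uses exactly this query form is an assumption, and both the function name and the zone <code>dnsel.example.org</code> are hypothetical placeholders.
</p>

```python
import ipaddress


def dnsel_query_name(address, zone):
    """Build a DNSBL-style query name: reversed IPv4 octets, then the zone.

    The zone is supplied by the caller; no real zone name is assumed here.
    """
    ip = ipaddress.IPv4Address(address)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return reversed_octets + "." + zone.strip(".")


if __name__ == "__main__":
    # Hypothetical zone and a documentation-only address.
    print(dnsel_query_name("192.0.2.10", "dnsel.example.org"))
```

<p>
A client would then resolve the resulting name and treat a successful answer as a positive result, in the same way as for other lists of this kind.
</p>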
<p>
The web server is contained in the WAR file, which is built from the
sources with:
Documentation questions:
</p>
<p>
ant war
</p>
<ul class="org-ul">
<li class="off"><code>[&#xa0;]</code> For how long does an address give a positive result?</li>
<li class="off"><code>[&#xa0;]</code> Do we also include all IP addresses of exit flagged relays in the consensus?</li>
</ul>
</div>
</div>
<div id="outline-container-org08eae9a" class="outline-3">
<h3 id="org08eae9a"><span class="section-number-3">2.3</span> <span class="todo TODO">TODO</span> Tor Check <code>[0/1]</code></h3>
<div class="outline-text-3" id="text-2-3">
<p>
The WAR file is also included in the tarballs made available with releases in
the
<b>generated/dist/</b>
folder.
The filename will look like
<b>onionoo-{protocol version}-{software version}.war</b>
and on the backend host should be found in
<b>/srv/onionoo.torproject.org/onionoo/</b>.
Tor Check is a website that can be used to determine if a browser is using the Tor network for queries.
It will also check the User-Agent to determine if a user is using Tor Browser.
It is accessed via HTTPS at <a href="https://check.torproject.org/">https://check.torproject.org/</a>.
</p>
<p>
Onionoo releases are available
[from dist.torproject.org](<a href="https://dist.torproject.org/onionoo/">https://dist.torproject.org/onionoo/</a>)
with the source code available
[from Tor Project git](<a href="https://gitweb.torproject.org/onionoo.git">https://gitweb.torproject.org/onionoo.git</a>).
Documentation questions:
</p>
<p>
Deployment and maintenance scripts are part of
[metrics-cloud](<a href="https://gitweb.torproject.org/metrics-cloud.git">https://gitweb.torproject.org/metrics-cloud.git</a>).
</p>
<ul class="org-ul">
<li class="off"><code>[&#xa0;]</code> Where is the JSON API?</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-org7a758ce" class="outline-2">
<h2 id="org7a758ce"><span class="section-number-2">3</span> <span class="done DONE">DONE</span> Contacts</h2>
<div class="outline-text-2" id="text-3">
<p>
## Initial deployment
The primary contact for this service is the Metrics Team &lt;<a href="mailto:metrics-team@lists.torproject.org">metrics-team@lists.torproject.org</a>&gt;.
For urgent queries, contact <b>karsten</b>, <b>irl</b>, or <b>gaba</b> in <a href="ircs://irc.oftc.net:6697/tor-project">#tor-project</a>.
</p>
</div>
</div>
<div id="outline-container-org2c30690" class="outline-2">
<h2 id="org2c30690"><span class="section-number-2">4</span> <span class="todo TODO">TODO</span> Overview</h2>
<div class="outline-text-2" id="text-4">
<p>
The initial deployment procedure is split into 3 parts:
The underlying infrastructure for the exit scanner, TorDNSEL and Tor Check services is provided by the
Tor Sysadmin Team (TSA). All services run on one virtual machine with the hostname <code>check-01.torproject.org</code>.
</p>
</div>
</div>
</div>
<div id="outline-container-org8a3c346" class="outline-2">
<h2 id="org8a3c346"><span class="section-number-2">3</span> System setup</h2>
<div id="outline-container-orga7ae53c" class="outline-2">
<h2 id="orga7ae53c"><span class="section-number-2">5</span> <span class="todo TODO">TODO</span> Sources</h2>
</div>
<div id="outline-container-org267a2f7" class="outline-2">
<h2 id="org267a2f7"><span class="section-number-2">4</span> Importing history</h2>
<div id="outline-container-org031de7e" class="outline-2">
<h2 id="org031de7e"><span class="section-number-2">6</span> <span class="todo TODO">TODO</span> Deployment</h2>
<div class="outline-text-2" id="text-6">
</div>
<div id="outline-container-orga62ddac" class="outline-2">
<h2 id="orga62ddac"><span class="section-number-2">5</span> Installing and starting the service</h2>
<div class="outline-text-2" id="text-5">
<div id="outline-container-orgbfdd485" class="outline-3">
<h3 id="orgbfdd485"><span class="section-number-3">6.1</span> <span class="todo TODO">TODO</span> Initial deployment</h3>
<div class="outline-text-3" id="text-6-1">
<p>
### Development/testing in AWS
The initial deployment procedure is split into 2 parts:
</p>
<ul class="org-ul">
<li>System setup</li>
<li>Installing and starting the services</li>
</ul>
</div>
<div id="outline-container-orgd642448" class="outline-4">
<h4 id="orgd642448"><span class="section-number-4">6.1.1</span> <span class="todo TODO">TODO</span> Development/testing in AWS</h4>
<div class="outline-text-4" id="text-6-1-1">
<p>
For development or testing in AWS, a CloudFormation template is available
named
......@@ -469,11 +434,12 @@ ansible-playbook -i dev onionoo-backends-aws.yml
<p>
Note that the AWS AMI used has passwordless sudo, so no password need be given.
</p>
</div>
</div>
<p>
### Fresh machine from TSA
</p>
<div id="outline-container-org98bb6b7" class="outline-4">
<h4 id="org98bb6b7"><span class="section-number-4">6.1.2</span> <span class="todo TODO">TODO</span> Fresh machine from TSA</h4>
<div class="outline-text-4" id="text-6-1-2">
<p>
Begin by copying the <b>state</b> and <b>out</b> directories from another Onionoo backend
to <b>/srv/onionoo.torproject.org/onionoo/{state,out}</b>.
......@@ -493,11 +459,13 @@ You can now setup the machine with Ansible by running:
ansible-playbook -i production -K onionoo-backends.yml
```
</p>
</div>
</div>
</div>
<p>
## Upgrade
</p>
<div id="outline-container-org3c3171d" class="outline-3">
<h3 id="org3c3171d"><span class="section-number-3">6.2</span> <span class="todo TODO">TODO</span> Upgrade</h3>
<div class="outline-text-3" id="text-6-2">
<p>
The version number of Onionoo to install is stored as a variable in the main
onionoo-backends.yml playbook. Begin by changing this to the new version number
......@@ -525,11 +493,17 @@ restart the services:
ansible-playbook -i production -K onionoo-backends.yml
```
</p>
</div>
</div>
</div>
<p>
## Logs
</p>
<div id="outline-container-org347ade0" class="outline-2">
<h2 id="org347ade0"><span class="section-number-2">7</span> <span class="todo TODO">TODO</span> Diagnostics</h2>
<div class="outline-text-2" id="text-7">
</div>
<div id="outline-container-orgf6324dd" class="outline-3">
<h3 id="orgf6324dd"><span class="section-number-3">7.1</span> <span class="todo TODO">TODO</span> Logs</h3>
<div class="outline-text-3" id="text-7-1">
<p>
Logs for the hourly updater can be found in
<b>/srv/onionoo.torproject.org/logs/</b>, and for the web server in
......@@ -560,7 +534,13 @@ ssh -L8039:localhost:8080 onionoo-backend-01.torproject.org
<p>
You'll then be able to connect to localhost:8039 in your web browser.
</p>
</div>
</div>
</div>
<div id="outline-container-org57c3daf" class="outline-2">
<h2 id="org57c3daf"><span class="section-number-2">8</span> <span class="todo TODO">TODO</span> Monitoring</h2>
<div class="outline-text-2" id="text-8">
<p>
Onionoo is monitored by the <b><b>TSA</b></b> Nagios instance (future task: add to Metrics
Nagios) using the
......@@ -573,73 +553,42 @@ alerts if they are too old.
<p>
Alerts are sent to the metrics-alerts mailing list.
</p>
</div>
</div>
<p>
## Single backend data corruption, no hardware failure
</p>
<p>
```
sudo -u onionoo -i bash -c 'systemctl --user stop onionoo'
sudo -u onionoo-unpriv -i bash -c 'systemctl --user stop onionoo-web'
rm -rf /srv/onionoo.torproject.org/onionoo/home/{.,}*
rm -rf /srv/onionoo.torproject.org/onionoo/home-unpriv/{.,}*
rm -rf /srv/onionoo.torproject.org/onionoo/onionoo/{.,}*
```
</p>
<p>
Then pretend you are deploying a new backend from the instructions above.
</p>
<p>
## Single backend failure, hardware failure
</p>
<p>
In the event of a single backend failure, ask TSA to trash it and make a new
one. Once Puppet has configured their side of it, pretend you are deploying a
new backend from the instructions above.
</p>
<p>
## Total loss
</p>
<p>
In the event of a total loss, ask TSA to trash all the backends and make new
ones. Once Puppet has configured one host, restore the state and out
directories from the latest good backup. It may be necessary to refer to the
logs to work out when the latest good backup might be, which should also be
backed up. Once state and out are in place, pretend you are deploying a new
backend from the instructions above.
</p>
<p>
## Total loss including all backups
</p>
<p>
In the event that the backups have also been lost, it will not be possible to
restore history. The data does exist in CollecTor to do this, but there is no
code that actually does it.
</p>
<div id="outline-container-org65d6ec1" class="outline-2">
<h2 id="org65d6ec1"><span class="section-number-2">9</span> <span class="todo TODO">TODO</span> Disaster Recovery</h2>
</div>
<p>
If no out directory is present on the instance when the Ansible playbook is run
to install and start the service, it will perform an initial single run of the
updater to bootstrap. This will be where history starts.
</p>
<div id="outline-container-org087a077" class="outline-2">
<h2 id="org087a077"><span class="section-number-2">10</span> <span class="todo TODO">TODO</span> Service Level Agreement</h2>
</div>
<p>
Try to avoid this happening.
</p>
<div id="outline-container-org2c819c4" class="outline-2">
<h2 id="org2c819c4"><span class="section-number-2">11</span> <span class="todo TODO">TODO</span> See Also</h2>
</div>
<div id="outline-container-orged19433" class="outline-2">
<h2 id="orged19433"><span class="section-number-2">12</span> <span class="todo TODO">TODO</span> Standards</h2>
<div class="outline-text-2" id="text-12">
<p>
The Onionoo service implements the [Onionoo
protocol](<a href="https://metrics.torproject.org/onionoo.html">https://metrics.torproject.org/onionoo.html</a>).
</p>
</div>
</div>
<div id="outline-container-org8aa303f" class="outline-2">
<h2 id="org8aa303f"><span class="section-number-2">13</span> <span class="todo TODO">TODO</span> History</h2>
</div>
<div id="outline-container-orgcc23786" class="outline-2">
<h2 id="orgcc23786"><span class="section-number-2">14</span> <span class="todo TODO">TODO</span> Authors</h2>
</div>
<div id="outline-container-org76ddcbd" class="outline-2">
<h2 id="org76ddcbd"><span class="section-number-2">15</span> <span class="todo TODO">TODO</span> Bugs</h2>
<div class="outline-text-2" id="text-15">
<p>
Known bugs can be found in the Tor Project Trac using
[this query](<a href="https://trac.torproject.org/projects/tor/query?status=!closed&amp;component=Metrics%2FOnionoo">https://trac.torproject.org/projects/tor/query?status=!closed&amp;component=Metrics/Onionoo</a>).
......@@ -657,7 +606,7 @@ component.
</div>
<div id="postamble" class="status">
<p class="author">Author: Iain Learmonth</p>
<p class="date">Created: 2020-03-31 Tue 15:01</p>
<p class="date">Created: 2020-03-31 Tue 15:26</p>
<p class="validation"><a href="http://validator.w3.org/check?uri=referer">Validate</a></p>
</div>
</body>
......