How to run the Snakes on a Tor Exit Scanner I. Introduction The Snakes on a Tor Exit Scanner scans the Tor network for misbehaving and misconfigured exit nodes. It has several tests that it performs, including HTML, javascript, arbitrary HTTP, SSL and DNS scans. The mechanisms by which these scans operate will be covered in another document. This document concerns itself only with running the scanner. II. Prerequisites Python 2.5+ Tor 0.2.2.x py-openssl/pyOpenSSL Bonus: Secondary external IP address Having a second external IP address will allow your scanner to filter out false positives for dynamic pages that arise due to pages encoding your IP address in documents. III. Setup A. Compiling Tor To run SoaT you will need Tor 0.2.2.x. It is also strongly recommended that you have a custom Tor instance that is devoted only to exit scanning, and is not performing any other function (including serving as a relay or a directory authority). B. Configuring SoaT for Randomized Testing If you just want to run a simple static test, skip this section. To configure SoaT you will need to edit soat_config.py. In particular, you'll want to change 'refetch_ip' to be set to your secondary IP address. If you don't have a secondary IP, set it to None. If you're feeling ambitious, you can edit soat_config.py to change the set of 'scan_filetypes' and increase 'max_content_size' to something large enough to support these filetypes. However, you should balance this with our more immediate need for the scanner to run quickly so that the code is exercised and can stabilize quickly. If you plan on doing search-based tests, you'll also want to edit ./wordlist.txt and change its contents to be a smattering of random and/or commonly censored words. If you speak other languages (especially any that have unicode characters), using keywords from them would be especially useful for testing and scanning. Note that these queries WILL be issued in plaintext via non-Tor, and the resulting urls fetched via non-Tor as well, so bear that and your server's legal jurisdiction in mind when choosing keywords. You can also separate out the wordlist.txt file into three files by changing the soat_config.py settings 'ssl_wordlist_file', 'html_wordlist_file', and 'filetype_wordlist_file'. This will allow you to use separate keywords for obtaining SSL, HTML, and Filetype urls. This can be useful if you believe it likely for an adversary to target only certain keywords/concepts/sites in a particular context. You can edit the contents of the wordlist files while SoaT runs. It will pick up the changes after it completes a full network scan with the old list. IV. Running Tor and SoaT Once you have everything compiled and configured, you should be ready to run the pieces. You probably want to do this as a separate, unprivileged user. First, start up your custom Tor with the sample torrc provided in the TorFlow svn root: # ~/src/tor-git/src/or/tor -f ./data/tor/torrc & Now you're ready to run SoaT. The next section describes SoaT's different tests and operating modes, but if you'd like to get started immediately, you may choose to run a search based SSL and HTTP test. These are the most complete of the currently implemented tests, and have very low false positive rates. # ./soat.py --ssl --http >& ./data/soat.log & or # ./soat.py --ssl --target=ip:port >& ./data/soat.log & V. Tests and Operating Modes Currently, SoaT's most developed tests are those for SSL and HTTP requests. But SoaT is also capable of doing HTML, and DNS Rebind tests. Any combination of these tests may be performed during a SoaT run, although DNS Rebind requires at least one other test to be performed in parallel. To enable a test, simply pass SoaT its flag: --ssl, --http, --html, or --dnsrebind. By default the tests are run in search based mode, this means that the URLs to be requested during the run are gathered by querying search engines for the terms in your ./wordlist.txt file. An alternative, and potentially less false positive prone, operating mode is the fixed target mode. Fixed target mode is enabled by passing SoaT one or more --target= flags. Only the URLs referenced by the target flags will be requested. This operating mode has several attractive features, for instance, you can reduce false positive rates by selecting static content, and you can shorten the duration of runs by selecting small files on highly responsive servers. It should be noted that, despite their attractive features, fixed target scans are likely to miss many of the results which search based scans detect. Principally this is because it is difficult as a SoaT operator to pick a diverse set of targets. Consider, for example, that if in selecting your targets you neglect to include a site on one of OpenDNS' blacklists, then you're going to miss one of the most common configuration issues that SoaT detects. Another issue is that it's quite likely some malicious exit nodes limit their activity to a small set of sites. As such, any reduction in your search space limits the likelihood that you'll make a request through such an exit which triggers its malicious behavior. VI. Monitoring and Results A. Issues with automated search engine queries and Randomized Scans SoaT can use Ixquick, Google, or Yahoo to perform its search queries. The current default is Ixquick, and for most purposes this should be fine. If you do find that you're having trouble discovering URLs (particularly for the SSL test), then you may wish to switch to Google or Yahoo. To do so, open your soat_config.py and change: default_search_mode = ixquick_search_mode to default_search_mode = google_search_mode or default_search_mode = yahoo_search_mode Regardless of the engine used, you'll need to keep an eye on the beginning of the soat.log to make sure it is actually retrieving URLs. Google's servers can periodically decide that you are not worthy to query them, especially if you restart soat several times in a row. If this happens, you'll need to temporarily switch search engines. Be warned that the Yahoo search mode is not acceptable for conducting SSL tests as Yahoo lacks the necessary query terms. If neither Ixquick nor Google are working, you'll either need to stop your SSL tests or switch to a fixed target scan. B. Handling Crashes At this stage in the game, your primary task will be to periodically check the scanner for exceptions and hangs. For that you'll just want to tail the soat.log file to make sure it is putting out recent loglines and is continuing to run. If there are any issues, please mail me your soat.log. If/When SoaT crashes, you should be able to resume it exactly where it left off with: # ./soat.py --resume=-1 [other options you used last time] >& soat.log & Keeping the same options during a --resume is a Really Good Idea. Soat actually saves a snapshot to a unique name each time you run it without --resume, so you can suspend and resume arbitrary runs by specifying their number: # ls ./data/soat/ # ./soat.py --resume=2 --ssl --html --http --dnsrebind >& soat.log & Using --resume=-1 indicates that SoaT should resume its most recent run. C. Handling Results As things stabilize, you'll want to begin grepping your soat.log for ERROR lines. These indicate serious scanning errors and content modifications. There will likely be false positives at first, and these will require you tar up your ./data directory and soat.log and send it to me to improve the filters for them: # tar -jcf soat-data.tbz2 ./data/soat ./soat.log If you're feeling adventurous, you can inspect the results yourself by running snakeinspector.py. Running it with no arguments will dump all failures to your screen in a semi-human readable format. You can add a --verbose to get unified diffs of content modifications, and you can filter on specific Test Result types with --resultfilter, and on specific exit idhexes with --exit. Ex: # ./snakeinspector.py --verbose --exit=80972D30FE33CB8AD60726C5272AFCEBB05CD6F7 --resultfilter=SSLTestResult Other useful filters are --after, --before, --finishedafter, and --finishedbefore. These each take a timestamp such as "Thu Jan 1 00:00:00 1970". --after and --before are useful while a test is in progress to see what's been discovered so far. The finishedafter and finishedbefore flags filter results based on when the test during which they were discovered was completed, and provide a nice way to group all results from the same test together. If you wanted to see all results from tests completed the week of August 9, 2010, you could run: # ./snakeinspector.py --verbose --finishedafter="Mon Aug 9 00:00:00 2010" --finishedbefore="Mon Aug 16 00:00:00 2010" You can see the full list of available filters by running: # ./snakeinspector.py --help D. Verifying Results If you would like to verify a set of results, you can use the --rescan option of soat, which crawls your data directory and creates a list of nodes to scan that consist only of failures, and then scans those with fresh URLs: # ./soat.py --rescan --ssl --html --http --dnsrebind >& soat.log & Rescans can also be resumed with --resume should they fail. SoaT can also do a rescan at the end of every loop through the node list. This is governed by the rescan_at_finish soat_config option. Note that rescanning does not prune out geolocated URLs that differ across the majority of exit nodes. It can thus cause many more false positives to accumulate than a regular scan. E. Reporting Results You'll notice in your soat_config.py that there are several variables prefixed by "mail_". Set appropriately, these allow you to automatically email results to us through snakeinspector (you'll also have to add our email address to the to_email list). If you have a gmail account, you can set these variables as follows: mail_server = "smtp.gmail.com" mail_auth = True mail_tls = False mail_starttls = True mail_user = "your_username@example.com" mail_password = "your_password" If you're wary of leaving your email password in plaintext in the soat_config, you can set mail_password = None, and you'll be prompted to provide it when snakeinspector is run. In this current directory is a cron.sh script that calls snakeinspector to email results that completed in the last hour, or since the last time you've run it. Add it to `crontab -e` like so: 0 * * * * ~/code/torflow.git/NetworkScanners/ExitAuthority/cron.sh Alright that covers the basics. Let's get those motherfuckin snakes off this motherfuckin Tor!