Raw import from Trac using Trac markup language. Authored by Alexander Hansen Færøy.
[[TOC]]
== Design ==
The easiest way to understand how !GetTor works is by enumerating the
different steps involved in the problem we want to solve. Consider the
following steps:
1. Receive requests from users via different channels.
2. Process the received requests, extracting the information needed to provide a useful response, namely: source address/user, operating system and language.
3. Construct a response according to the information extracted.
4. Create an anti-flood mechanism that allows blacklisting specific users.
5. Verify that the source address/user of the request is not permanently or temporarily blacklisted.
6. Send back a reply with the links to download Tor Browser from some popular ''non-blocked'' cloud service.
7. Keep track of the number of requests received by !GetTor.
8. Upload Tor Browser to popular ''non-blocked'' cloud services.
The current design of !GetTor consists of a series of modules, each one
intended for a specific task. There are two big groups: the main modules
and the service modules. The main modules are Core, Blacklist and Database,
which cover points 3), 4), 5) and 7). The service modules are SMTP, XMPP
and Twitter, which cover points 1), 2), 5) and 6).
Whenever a request is received, it is handled by one of the service modules
according to the channel the user sent it through: email for SMTP, chat for
XMPP, and direct messages (DMs) for Twitter. The corresponding module processes
the request, collecting all the data needed to provide a useful reply, namely:
operating system, language and source address/user. It also makes sure that
the source address/user is not blacklisted (see Blacklisting for details). If
no valid data is found, a help message is sent back to the user. Otherwise,
the service module asks the Core module for the links and then replies to
the user. In both cases, the Core module increases the number of requests
received in the database. A very simple diagram of the modules' interaction
looks like this:
{{{
                      -------------                    -----------
              +----->| SMTP Module |-----+       +--->| Blacklist |<---+
              |       -------------      |       |     -----------     |
              |       -------------      |     ------               ----------
  USERS <-----+----->| XMPP Module |-----+--->| Core |<----------->| Database |
              |       -------------      |     ------               ----------
              |       ----------------   |
              +----->| Twitter Module |--+
              |       ----------------   |
              |       ----------------   |
              +----->| Other Services |--+
                      ----------------
}}}
One of the points enumerated before is not covered by these modules:
uploading Tor Browser to popular ''non-blocked'' cloud services (point 8).
This is handled by a series of scripts, one for each supported cloud
service. Currently, there are scripts for Dropbox and Google Drive.

Below you will find a more detailed description of each of the modules
and scripts of !GetTor.
=== Core
As its name suggests, this is the core module of !GetTor, and its main
purpose is to provide a simple and robust interface for obtaining the links
to download Tor Browser. The design of this module is based on one
main concept: storing the links in files. The idea is to have
one file for each cloud service or provider, where each file follows the
INI-style format read by Python's {{{ConfigParser}}} module, which means that
the data is grouped under sections and accessible by keys. Every ''links file''
must have the following five sections:
'''[provider]''': Contains only one key, the name of the cloud service/provider.
'''[key]''': Contains only one key, the fingerprint of the PGP key used to sign
the Tor Browser packages.
'''[linux]''': Contains all the links for the Linux operating system, with one
key for each available locale. Every locale should have no more than six
lines: one line for the Tor Browser package link, one for the ASC
signature link, and one for the SHA256 checksum of the package, with one
set of three lines for 32-bit and another for 64-bit (six lines in total).
'''[windows]''': Contains all the links for the Windows operating system, with
one key for each available locale. Every locale should have no more than
three lines: one line for the Tor Browser package link, one for the
ASC signature link, and one for the SHA256 checksum of the package. The Windows
package of Tor Browser is intended for both 32-bit and 64-bit systems.
'''[osx]''': Contains all the links for the Mac OS X operating system, with one
key for each available locale. Every locale should have no more than six
lines: one line for the Tor Browser package link, one for the ASC
signature link, and one for the SHA256 checksum of the package, with one
set of three lines for 32-bit and another for 64-bit (six lines in total).
A sample ''links file'' should look like this:
{{{
[provider]
name = Dropbox
[key]
fingerprint = 8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659
[linux]
en = Package (64-bit): link-to-dropbox-en64
    ASC signature (64-bit): link-to-dropbox-en64.asc
    Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4,
    Package (32-bit): link-to-dropbox-en32
    ASC signature (32-bit): link-to-dropbox-en32.asc
    Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
es = Package (32-bit): link-to-dropbox-es32
    ASC signature (32-bit): link-to-dropbox-es32.asc
    Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
[windows]
...
[osx]
...
}}}
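For illustration only, the snippet below reads a trimmed-down version of the sample above with the standard {{{configparser}}} module (Python 3). It is a sketch, not the Core module's own parsing code. Note that the continuation lines of each locale must be indented for {{{configparser}}} to treat them as a single multi-line value.

{{{#!python
import configparser

# Trimmed-down links file. Continuation lines are indented so that
# configparser joins them into one multi-line value per locale.
SAMPLE = """\
[provider]
name = Dropbox

[key]
fingerprint = 8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659

[linux]
en = Package (64-bit): link-to-dropbox-en64
    ASC signature (64-bit): link-to-dropbox-en64.asc
es = Package (32-bit): link-to-dropbox-es32
    ASC signature (32-bit): link-to-dropbox-es32.asc
"""

# interpolation=None so that '%' characters in real URLs are kept verbatim.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

provider = parser.get("provider", "name")   # name of the provider
locales = parser.options("linux")           # one key per locale
en_links = parser.get("linux", "en")        # lines joined with '\n'

print(provider)
print(locales)
print(en_links)
}}}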
To keep things simple, the name of a ''links file'' should be
''provider_in_lowercase.links'' (e.g. ''dropbox.links''). This layout gives easy
access to the links for whichever operating system and language we need. The
public method for retrieving them is the following:
{{{#!python
get_links(service, os, lc)
}}}
This returns a string with the links, where:
'''service''': String that identifies the service communicating with the core module. This is for stats purposes only.
'''os''': The operating system for which we need the links. There are currently three options: ''windows'', ''linux'', and ''osx''.
'''lc''': The locale for which we need the links. There is currently one supported option: ''en'' (for English).
Below is a sample script that communicates with the core module:
{{{#!python
#!/usr/bin/python
import gettor.core
core = gettor.core.Core()
links = core.get_links('dummy service', 'linux', 'en')
print(links)
}}}
For more details, see the implementation in the code repository.\\
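To make the file-based design more concrete, here is a hypothetical sketch of the kind of lookup {{{get_links}}} performs. The function name {{{get_links_sketch}}}, the {{{links_dir}}} argument and the output layout (provider name followed by its links) are assumptions, not the real Core API. It also drops the {{{service}}} argument, which the real method only uses for stats, and renames {{{os}}} to {{{os_name}}} to avoid shadowing Python's {{{os}}} module.

{{{#!python
import configparser
import glob
import os


def get_links_sketch(links_dir, os_name, lc):
    """Hypothetical stand-in for Core.get_links().

    Reads every *.links file in links_dir and returns one string with the
    links each provider has for the given operating system and locale.
    """
    chunks = []
    for path in sorted(glob.glob(os.path.join(links_dir, "*.links"))):
        # interpolation=None keeps '%' characters in URLs untouched.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)
        # has_option() is False when the section or the key is missing,
        # so providers without links for this OS/locale are skipped.
        if parser.has_option(os_name, lc):
            provider = parser.get("provider", "name")
            chunks.append("%s:\n%s" % (provider, parser.get(os_name, lc)))
    if not chunks:
        raise ValueError("no links for %s/%s" % (os_name, lc))
    return "\n\n".join(chunks)
}}}

Reading every ''links file'' on each call keeps the sketch short; a real implementation might cache the parsed files instead.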
The secondary purpose of the core module is to provide methods to ease
the creation of ''links files'' for cloud services. There are two public
methods for this:
{{{#!python
create_links_file(provider, fingerprint)
}}}
This creates a ''links file'' named ''provider_in_lowercase.links'', where:
'''provider''': String for the name of the provider/cloud service (e.g. Dropbox)
'''fingerprint''': String that represents the fingerprint of the PGP key used to sign the Tor Browser packages.
The second method is:
{{{#!python
add_link(provider, os, lc, link)
}}}
This adds a link to the ''links file'' of the provider, where:
'''provider''': String that identifies the provider/cloud service. This is also the name of the ''links file''.
'''os''': The operating system for which we intend to add the link. There are currently three options: ''windows'', ''linux'', and ''osx''.
'''lc''': Locale for which we intend to add the link. There is currently one supported option: ''en'' (for English).
'''link''': String that represents the actual link to be added.
Below is a sample script to create a ''links file'' and add a couple of links to it:
{{{#!python
#!/usr/bin/python
import gettor.core
link64 = """Package (64-bit): link-to-dropbox?dl=1
ASC signature (64-bit): link-to-dropbox.asc?dl=1
Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4"""
link32 = """Package (32-bit): link-to-dropbox?dl=1
ASC signature (32-bit): link-to-dropbox.asc?dl=1
Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4"""
core = gettor.core.Core()
core.create_links_file('Dropbox', '8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659')
core.add_link('Dropbox', 'linux', 'en', link64)
core.add_link('Dropbox', 'linux', 'en', link32)
}}}
For more details on these methods, please check the code repository and/or the Cloud Services section.
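As a rough picture of what these two methods do on disk, the hypothetical helpers below ({{{create_links_file_sketch}}} and {{{add_link_sketch}}}, plus their {{{links_dir}}} argument) are built on Python 3's {{{configparser}}}. They are assumptions for illustration: in particular, joining repeated links for the same OS and locale with a comma only mirrors the sample ''links file'' above and may not match what the real {{{add_link}}} does.

{{{#!python
import configparser
import os


def _links_path(links_dir, provider):
    # Links files are named provider_in_lowercase.links.
    return os.path.join(links_dir, provider.lower() + ".links")


def create_links_file_sketch(links_dir, provider, fingerprint):
    """Hypothetical: write a links file with [provider], [key] and empty OS sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser["provider"] = {"name": provider}
    parser["key"] = {"fingerprint": fingerprint}
    for os_name in ("linux", "windows", "osx"):
        parser.add_section(os_name)
    path = _links_path(links_dir, provider)
    with open(path, "w") as f:
        parser.write(f)
    return path


def add_link_sketch(links_dir, provider, os_name, lc, link):
    """Hypothetical: add a (possibly multi-line) link block under os_name/lc."""
    path = _links_path(links_dir, provider)
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if parser.has_option(os_name, lc):
        # Separate consecutive link blocks with a comma, as in the sample file.
        link = parser.get(os_name, lc) + ",\n" + link
    parser.set(os_name, lc, link)
    # configparser.write() indents continuation lines, so the multi-line
    # value reads back as a single entry.
    with open(path, "w") as f:
        parser.write(f)
}}}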
=== Distribution Channels
Ideally, a user should have various ways to contact !GetTor and receive
Tor Browser. These ''distribution channels'' should parse a request,
get the user's OS and language, ask the Core module for the links and
then send this information back to the user. Each distribution channel
should be handled by a separate module. Currently, there is one distribution
channel deployed (SMTP), one implemented but not deployed (XMPP), and one
not finished (Twitter).
==== SMTP
This module is in charge of receiving and replying to requests via email. Back
in 2008, when !GetTor was conceived, SMTP was the main and only distribution
channel. Requests were answered with the actual bundle as an attachment
instead of links. This approach worked, but the bundles grew
larger to the point where it was no longer feasible to send them as
attachments (the current size of Tor Browser is ~40 MB).
There are three steps involved in sending links via email:
* Listen for users' emails directed to the !GetTor robot.
* Determine the type of request and get the data needed to reply to it.
* Send back a reply to the user.
The first point is handled by the mail server provided by the Tor Project.
In addition, we use email forwarding to make sure we get all the emails
directed to the !GetTor robot. For this, a .forward file like the following is used:
{{{#!bash
|"python2.7 /path/to/gettor/smtp_process.py"
}}}
With this, the only concern of the smtp_process.py script is to read
emails from the standard input and pass them to the SMTP module for processing.
The SMTP module has only one public method:
{{{#!python
process_email(raw_msg)
}}}
Where:
'''raw_msg''': String that represents the email received.
A basic script for communicating with the SMTP module should look like this:
{{{#!python
#!/usr/bin/env python
import sys
import gettor.smtp
service = gettor.smtp.SMTP()
incoming = sys.stdin.read()
service.process_email(incoming)
}}}
The other two points are handled by the SMTP module. The first step after
receiving a request is to determine whether the address is blacklisted. See the
Blacklisting section for the current process. The
next step is to determine the type of request received. For now, there are
only two types of request that can be received: help and links. The decision
process to determine which type we have received is the following:
* Does the body of the message include the words ''windows'', ''linux'', or ''osx''?
If so, we have received a links request.
* Any other case should be considered as a help request, including blank
emails.
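The decision rule above can be sketched as a small function. The name {{{classify_request}}} is hypothetical, and two details are assumptions not stated in the text: matching is case-insensitive on whole words, and if several OS names appear, the first one in {{{OS_KEYWORDS}}} wins.

{{{#!python
import re

# OS keywords that turn an email into a links request.
OS_KEYWORDS = ("windows", "linux", "osx")


def classify_request(body):
    """Return ('links', os_name) or ('help', None) for an email body."""
    # Lower-case the body and split it into alphanumeric words.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    for os_name in OS_KEYWORDS:
        if os_name in words:
            return "links", os_name
    # Anything else, including a blank email, is a help request.
    return "help", None
}}}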
For both types of request, the language is obtained from the address the email
was sent to: gettor+lc@torproject.org, where lc is one of the locales
supported by Tor Browser. Currently, the only supported locale is English.
If no locale is specified, we assume English by default.
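A minimal sketch of the locale extraction, assuming the locale is whatever follows the first {{{+}}} in the local part of the address. The helper name {{{locale_from_address}}} is hypothetical, and falling back to English for an unsupported locale (not just a missing one) is an assumption.

{{{#!python
from email.utils import parseaddr

SUPPORTED_LOCALES = ("en",)   # currently only English, per the text above
DEFAULT_LOCALE = "en"


def locale_from_address(to_header, supported=SUPPORTED_LOCALES):
    """Extract lc from a 'gettor+lc@torproject.org' recipient address."""
    # parseaddr() handles headers such as 'GetTor <gettor+en@torproject.org>'.
    _, address = parseaddr(to_header)
    local_part = address.split("@", 1)[0].lower()
    if "+" in local_part:
        lc = local_part.split("+", 1)[1]
        if lc in supported:
            return lc
    # No (or unsupported) locale: default to English.
    return DEFAULT_LOCALE
}}}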
Knowing the type and language of the request is enough to construct a reply
and send it to the user. Every time a reply is sent, the number of requests
received is increased in the database. See Database to check the current
DB schema.
==== XMPP
To be written.
==== Twitter
To be written.
=== Database
The database module, as its name suggests, is in charge of interacting
with the !GetTor database. The current design is quite simple and satisfies
two needs:
'''Add a request'''. For now this consists only of knowing how many requests we have received so far. No other data is collected.
'''Add/delete/update a user'''. This allows us to know how many requests a single user has made and thus avoid any type of flood (see Blacklisting). For this purpose we collect the following data:
* ''user'': SHA-256 hash of the user address/account.
* ''service'': string that represents the distribution method used by the user
(e.g. SMTP).
* ''times'': number of requests received from the same user.
* ''blocked'': boolean flag to know if user is permanently blacklisted.
* ''last_request'': timestamp that represents the last time a given user
made a request from the same distribution channel.
The initial design of the database module (during the revamp) considered
collecting a lot of data (type of request, language, OS, etc.), but
eventually we decided to keep just the data needed to know how many
requests !GetTor has received and to avoid floods. The database
chosen for this purpose was SQLite. You can find a sample database in
the code repository (gettor.db).
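The two needs above could be covered by a schema like the following. This is a hypothetical sketch using Python's {{{sqlite3}}} module: the table and column names, the single-row request counter and the ISO timestamp format are assumptions, so check gettor.db in the repository for the actual schema.

{{{#!python
import hashlib
import sqlite3
from datetime import datetime

# Hypothetical schema; the real gettor.db may use other names and types.
SCHEMA = """
CREATE TABLE IF NOT EXISTS requests (
    total INTEGER NOT NULL              -- number of requests received so far
);
CREATE TABLE IF NOT EXISTS users (
    user TEXT NOT NULL,                 -- SHA-256 hex digest of address/account
    service TEXT NOT NULL,              -- distribution channel, e.g. 'SMTP'
    times INTEGER NOT NULL,             -- requests received from this user
    blocked INTEGER NOT NULL,           -- 1 if permanently blacklisted
    last_request TEXT NOT NULL,         -- ISO 8601 timestamp
    PRIMARY KEY (user, service)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the counter row exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    if conn.execute("SELECT COUNT(*) FROM requests").fetchone()[0] == 0:
        conn.execute("INSERT INTO requests (total) VALUES (0)")
        conn.commit()
    return conn


def add_request(conn):
    """'Add a request': bump the global counter; nothing else is stored."""
    conn.execute("UPDATE requests SET total = total + 1")
    conn.commit()


def hash_user(address):
    """Hash an address/account so the plain value is never stored."""
    return hashlib.sha256(address.encode("utf-8")).hexdigest()


def update_user(conn, user, service, times, blocked, now=None):
    """'Add/update a user': insert or overwrite the (user, service) record."""
    now = now or datetime.now()
    conn.execute(
        "INSERT OR REPLACE INTO users (user, service, times, blocked, last_request)"
        " VALUES (?, ?, ?, ?, ?)",
        (user, service, times, int(blocked), now.isoformat()),
    )
    conn.commit()


def delete_user(conn, user, service):
    """'Delete a user': drop the (user, service) record."""
    conn.execute("DELETE FROM users WHERE user = ? AND service = ?", (user, service))
    conn.commit()
}}}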
=== Blacklisting
The current blacklisting mechanism is quite simple and is based on the data
stored in the 'users' table of !GetTor's database, plus some
extra parameters defined in blacklist.cfg, which help us establish limits
to avoid floods. The current mechanism depends on four parameters:
* '''user''': Hashed address/account of the user. It helps to identify malicious users.
* '''service''': Service or distribution channel used by the user trying to contact !GetTor.
* '''max_req''': Maximum number of requests allowed per user ''and'' service before the user has to wait.
* '''wait_time''': Number of minutes a user has to wait, once she has reached '''max_req''', before making another request.
Both the '''user''' and '''service''' parameters are obtained in real time
when !GetTor receives a request. The other two, '''max_req''' and '''wait_time''',
are specified in blacklist.cfg. Each service module (e.g. SMTP) should be
in charge of specifying the path to this configuration file and interacting
with the !Blacklisting module according to that information. The current
mechanism also depends on the '''last_request''', '''times''', and '''blocked'''
fields of the database record associated with '''user''' and '''service'''.
With all of this, the decision algorithm can be described as follows:
{{{#!python
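from datetime import timedelta

# blocked, times and last_request come from the database record for
# (user, service); last_request and now are datetime objects.
# max_req and wait_time (in minutes) come from blacklist.cfg.
# update_user_on_db(user, service, times, blocked) also stores the
# current time as last_request.
if blocked:
    update_user_on_db(user, service, times + 1, 1)
    raise BlacklistError("Blocked user")
elif times >= max_req:
    next_allowed = last_request + timedelta(minutes=wait_time)
    if now < next_allowed:
        # too many requests from the same user
        update_user_on_db(user, service, times + 1, 0)
        raise BlacklistError("Too many requests")
    else:
        # fresh user again!
        update_user_on_db(user, service, 1, 0)
else:
    # adding up a request for user
    update_user_on_db(user, service, times + 1, 0)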
}}}
This simple mechanism prevents malicious users from flooding one or
more services/distribution channels with endless requests. As you may
notice, if a user makes a request before '''wait_time''' has passed, then
she must wait another '''wait_time''' to make a request again, and
if a user makes a request after she has reached the maximum number of requests
and waited '''wait_time''', then her request counter is reset to
one. You can check the {{{_is_blacklisted}}} method of the SMTP module to
see how a service should interact with the Blacklisting module.
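To see the waiting-time behaviour in action, the decision algorithm can be run against an in-memory dictionary instead of the database. This is a self-contained sketch: {{{check_request}}}, the dictionary layout and the default values for an unseen user are hypothetical, and, as described above, every stored request refreshes {{{last_request}}}.

{{{#!python
from datetime import datetime, timedelta


class BlacklistError(Exception):
    """Raised when a request must be rejected."""


def check_request(records, user, service, now, max_req, wait_time):
    """Apply the blacklisting decision to an in-memory record store.

    records maps (user, service) to a dict with 'times', 'blocked' and
    'last_request'; wait_time is in minutes. Every call stores the request,
    refreshing last_request, which is why an early retry restarts the wait.
    """
    rec = records.get((user, service),
                      {"times": 0, "blocked": False, "last_request": now})

    def store(times, blocked):
        records[(user, service)] = {"times": times, "blocked": blocked,
                                    "last_request": now}

    if rec["blocked"]:
        store(rec["times"] + 1, True)
        raise BlacklistError("Blocked user")
    elif rec["times"] >= max_req:
        if now < rec["last_request"] + timedelta(minutes=wait_time):
            # too many requests from the same user
            store(rec["times"] + 1, False)
            raise BlacklistError("Too many requests")
        # wait_time has passed: fresh user again
        store(1, False)
    else:
        # adding up a request for the user
        store(rec["times"] + 1, False)


# Usage: max_req = 2, wait_time = 30 minutes.
records = {}
start = datetime(2015, 1, 1, 12, 0)
for minute in (0, 1, 2, 31, 62):
    try:
        check_request(records, "hashed-user", "SMTP",
                      start + timedelta(minutes=minute), 2, 30)
        print(minute, "accepted")
    except BlacklistError as err:
        print(minute, "rejected:", err)
}}}

With these settings, the requests at minutes 0 and 1 are accepted, those at minutes 2 and 31 are rejected (the rejection at minute 2 restarts the wait, so minute 31 is still too early), and the request at minute 62 is accepted with the counter reset to one.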
This mechanism could certainly be improved. If you have any ideas/comments
about it, please tell us (ideally by filing a ticket :)
=== Cloud Services
For each service used by !GetTor to distribute the Tor Browser files there
should be a script in charge of uploading these files using the methods
provided by that service. Each of these scripts must assume that the
latest Tor Browser files have been downloaded (see Other Scripts) and carry out the
following tasks (in order):
1. Get the fingerprint from the key used to sign the Tor Browser.
2. Use the Core module to create a new links file (core.create_links_file).
3. Obtain the sha256 checksum of each {tar.xz, exe, dmg} file to be uploaded.
4. Check that the corresponding .asc signature exists for each {tar.xz, exe, dmg} file to be uploaded.
5. Identify the architecture, language and operating system associated with each {tar.xz, exe, dmg} file to be uploaded.
6. Create a string describing a new link, using the information identified before.
7. Use the Core module to add a link to the new links file created (core.add_link), specifying the service, the operating system, and the language (locale).
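Steps 3, 5 and 6 could look roughly like this. The file-name patterns are assumptions based on the usual Tor Browser naming ({{{tor-browser-linux64-<version>_<locale>.tar.xz}}}, {{{torbrowser-install-<version>_<locale>.exe}}}, {{{TorBrowser-<version>-osx64_<locale>.dmg}}}) and may need adjusting for a given release. The helper names are hypothetical, and the Windows entry is written without an architecture label since that package covers both 32 and 64 bits.

{{{#!python
import hashlib
import re

# Assumed file-name patterns, tried in order: (regex, operating system).
PATTERNS = [
    (re.compile(r"tor-browser-linux(?P<arch>32|64)-.+_(?P<lc>[A-Za-z-]+)\.tar\.xz$"), "linux"),
    (re.compile(r"torbrowser-install-.+_(?P<lc>[A-Za-z-]+)\.exe$"), "windows"),
    (re.compile(r"TorBrowser-.+-osx(?P<arch>32|64)_(?P<lc>[A-Za-z-]+)\.dmg$"), "osx"),
]


def sha256_of(path):
    """Step 3: hex SHA-256 checksum of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify(filename):
    """Step 5: return (os_name, arch, lc) for a Tor Browser file name."""
    for pattern, os_name in PATTERNS:
        m = pattern.search(filename)
        if m:
            arch = m.groupdict().get("arch")          # None for Windows
            lc = m.group("lc").split("-")[0].lower()  # 'en-US' -> 'en'
            return os_name, arch, lc
    raise ValueError("unrecognized file: %s" % filename)


def link_string(arch, package_url, asc_url, checksum):
    """Step 6: build the three-line entry used in links files."""
    label = " (%s-bit)" % arch if arch else ""
    return ("Package%s: %s\n"
            "ASC signature%s: %s\n"
            "Package SHA256 checksum%s: %s"
            % (label, package_url, label, asc_url, label, checksum))
}}}

The resulting string is what step 7 would pass as the {{{link}}} argument of {{{core.add_link}}}.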
You can check the existing scripts for Dropbox and Google Drive to see the current
methods used for the steps listed above, especially 1, 3, 5, and 6. For more details
on how the links files are created and how the links are stored, check the documentation
about the Core module. Below is a list of the current services/providers integrated
with !GetTor:
* '''Dropbox''': Deployed. In use for a long time.
* '''Google Drive''': Implemented, but not yet deployed.
* '''!GitHub''': Implemented, but not yet deployed. This one should be especially useful to distribute Tor Browser
in places where Dropbox and Google Drive are blocked (e.g. China).
If you have an idea for a new service that could be used (even if you don't know how to implement it),
please contact us (ideally by filing a ticket :).
=== Other Scripts
Below is a list of scripts used for diverse and "smaller" tasks:
* '''blacklist.py''': Handle blacklisting of users. Execute blacklist.py -h for more details.
* '''create_db.py''': Handle the creation of the SQLite database used by !GetTor for managing the blacklisting of users and keeping track of basic stats. Execute create_db.py -h for more details.
* '''stats.py''': Handle basic stats according to the information stored in the SQLite database. Execute stats.py -h for more details.
* '''fetch_latest_torbrowser.py''': Automate downloading the Tor Browser files from the Tor Project's website and uploading them to the services used by !GetTor every time a new stable version of Tor Browser is available. Implemented, but not yet deployed. See the source file in the repository for more details.