Changes

Alexander Hansen Færøy · 53a16e3a
--- a/org/roadmaps/GetTor/design.md
+++ b/org/roadmaps/GetTor/design.md
+[[TOC]]
+== Design ==
+The easiest way to understand how !GetTor works is by enumerating the
+different steps involved in the problem we want to solve:
+1. Receive requests from users via different channels.
+2. Process the received requests, extracting the information needed to provide a useful response, namely: source address/user, operating system and language.
+3. Construct a response according to the information extracted.
+4. Create an anti-flood mechanism that allows blacklisting specific users.
+5. Verify that the source address/user of the request is not permanently or temporarily blacklisted.
+6. Send back a reply with the links to download Tor Browser from some popular ''non-blocked'' cloud service.
+7. Keep track of the number of requests received by !GetTor.
+8. Upload Tor Browser to popular ''non-blocked'' cloud services.
+The current design of !GetTor consists of a series of modules, each one
+intended for a specific task. There are two main groups: the main modules
+and the service modules. The main modules are Core, Blacklist and Database,
+which cover points 3), 4), 5) and 7). The service modules are SMTP, XMPP
+and Twitter, which cover points 1), 2), 5) and 6).
+Whenever a request is received, it is handled by one of the service modules
+according to the channel through which the user sent it. These channels are email
+for SMTP, chat for XMPP, and DM for Twitter. The corresponding module processes
+the request, collecting all the data necessary to provide
+a useful reply, namely: operating system, language and source address/user.
+It also makes sure that the source address/user is not blacklisted (see
+Blacklisting for details). If no valid data is found, a help message
+is sent back to the user. Otherwise, the service module asks the Core
+module for the links and then replies to the user.
+In both cases, the Core module increases the number of requests received
+in the database. A very simple diagram of how the modules interact looks
+like this:
+{{{
+                  -----------
+               ->|SMTP Module|          -----------
+             /    -----------  \      >| Blacklist |<
+            /                   \   /   -----------   \
+           /     -----------     \ /    ------         \       ----------
+    USERS <---> |XMPP Module| <------> | Core | <-----------> | Database |
+           \     -----------     /      ------                 ----------
+            \                   /         |
+             \     --------------         |
+              \-->|Twitter Module|        |
+                   --------------         |
+                \                         |
+                 \             ----------------
+                  \---------->| Other Services |
+                               ----------------
+}}}
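+The flow shown in the diagram can be summarized with a minimal Python sketch. The function name and its callable parameters (`is_blacklisted`, `get_links`, `count_request`) are hypothetical stand-ins for the Blacklist, Core and Database modules, not the actual !GetTor interfaces:

```python
def handle_request(user, request, is_blacklisted, get_links, count_request):
    """Illustrative flow of a service module handling one request.

    `request` is either None (no valid data was found) or an (os, lc) tuple.
    Returns the text to send back, or None if the user is blacklisted.
    """
    if is_blacklisted(user):
        # Assumption: blacklisted users receive no reply.
        return None
    if request is None:
        # No valid data found: answer with a help message.
        reply = "help message"
    else:
        osys, lc = request
        reply = get_links(osys, lc)
    # In both cases the number of requests received is increased.
    count_request()
    return reply
```

+Passing the collaborators as callables keeps the sketch self-contained; in !GetTor these roles are played by the modules described below.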
+One of the points enumerated above is not covered by these modules:
+uploading Tor Browser to popular ''non-blocked''
+cloud services. This is handled by a series of scripts, one for each
+supported cloud service. Currently, there are scripts for Dropbox and Google
+Drive.\
+\
+Below you will find a more detailed description of each one of the modules
+and scripts of !GetTor.
+=== Core
+As its name suggests, this is the core module of !GetTor, and its main
+purpose is to provide a simple and robust interface for obtaining the links
+to download the Tor Browser. The design of this module is based on one
+main concept: storing the links in files. The idea is to have
+one file for each cloud service or provider, where each file follows the
+INI-style format for configuration files read by Python's ConfigParser
+module, which means that the data is
+categorized under sections and accessible by keys. Every ''links file''
+must have the following five sections:
+ '''[provider]''': Contains only one key, the name of the cloud service/provider.
+ '''[key]''': Contains only one key, the fingerprint of the PGP key used to sign
+ the Tor Browser packages.
+ '''[linux]''': Contains all the links for the Linux operating system, with one
+ key for each locale available. Every locale should have no more than six
+ lines. There is one line for the Tor Browser link, another for the ASC
+ signature link, and another for the sha256 of Tor Browser. There is one
+ set of three lines for 32-bit and another for 64-bit (six lines in total).
+ '''[windows]''': Contains all the links for the Windows operating system, with
+ one key for each locale available. Every locale should have no more than
+ three lines. There is one line for the Tor Browser link, another for the
+ ASC signature link, and another for the sha256 of Tor Browser. The Windows
+ package of Tor Browser is intended for both 32-bit and 64-bit systems.
+ '''[osx]''': Contains all the links for the Mac OS X operating system, with one
+ key for each locale available. Every locale should have no more than six
+ lines. There is one line for the Tor Browser link, another for the ASC
+ signature link, and another for the sha256 of Tor Browser. There is one
+ set of three lines for 32-bit and another for 64-bit (six lines in total).
+A sample ''links file'' should look like this:
+{{{
+[provider]
+name = Dropbox
+[key]
+fingerprint = 8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659
+[linux]
+en = Package (64-bit): link-to-dropbox-en64
+	ASC signature (64-bit): link-to-dropbox-en64.asc
+	Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4,
+	Package (32-bit): link-to-dropbox-en32
+	ASC signature (32-bit): link-to-dropbox-en32.asc
+	Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
+es = Package (32-bit): link-to-dropbox-es32
+	ASC signature (32-bit): link-to-dropbox-es32.asc
+	Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
+[windows]
+...
+[osx]
+....
+}}}
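+As an illustration of why this format is convenient, a file like the one above can be read with the standard library alone. The following is a minimal sketch, not the Core module's implementation: it uses Python 3's `configparser` (the successor of Python 2's `ConfigParser`), and the `read_links` helper and the shortened sample data are assumptions made for the example.

```python
import configparser

# Shortened sample in the links-file format; continuation lines are indented.
SAMPLE = """\
[provider]
name = Dropbox

[key]
fingerprint = 8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659

[linux]
en = Package (64-bit): link-to-dropbox-en64
    ASC signature (64-bit): link-to-dropbox-en64.asc
    Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
"""


def read_links(text, osys, lc):
    """Return the links stored for (osys, lc), or None if there are none.

    Hypothetical helper for illustration only.
    """
    # Interpolation is disabled so that '%' characters in URLs are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_option(osys, lc):
        return None
    return parser.get(osys, lc)


links = read_links(SAMPLE, "linux", "en")
provider = configparser.ConfigParser(interpolation=None)
provider.read_string(SAMPLE)
print(provider.get("provider", "name"))
print(links)
```

+Each operating system maps to a section and each locale to a key, so looking up the links for a request is a single section/key access.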
+Please note that, to make things easier, the name of a ''links file'' should
+be ''provider_in_lowercase.links''. All of the above gives us easy access
+to the links depending on the operating system and language that we need. The public method
+for doing this is the following:
+{{{#!python 
+  get_links(service, os, lc)
+}}}
+This returns a string with the links, where:
+ '''service''': String that identifies the service communicating with the core module. This is for stats purposes only.
+ '''os''': The operating system for which we need the links. There are currently three options: ''windows'', ''linux'', and ''osx''.
+ '''lc''': The locale for which we need the links. There is currently one supported option: ''en'' (for English).
+Below is a sample script that communicates with the core module:
+{{{#!python
+#!/usr/bin/python
+import gettor.core
+core = gettor.core.Core()
+links = core.get_links('dummy service', 'linux', 'en')
+print links
+}}}
+For more details, you are welcome to see the implementation in the code repository.\\
+The secondary purpose of the core module is to provide methods to ease
+the creation of ''links files'' for cloud services. There are two public
+methods for this:
+{{{#!python
+create_links_file(provider, fingerprint)
+}}}
+This creates a ''links file'' with the format ''provider_in_lowercase.links'', where:
+ '''provider''': String for the name of the provider/cloud service (e.g. Dropbox)
+ '''fingerprint''': String that represents the fingerprint of the key used to sign the Tor Browser packages.
+And,
+{{{#!python
+   add_link(provider, os, lc, link)
+}}}
+This adds a link to the ''links file'' of the provider, where:
+   '''provider''': String that identifies the provider/cloud service. This is also the name of the ''links file''.
+   '''os''': The operating system for which we intend to add the link. There are currently three options: ''windows'', ''linux'', and ''osx''.
+   '''lc''': Locale for which we intend to add the link. There is currently one supported option: ''en'' (for English).
+   '''link''': String that represents the actual link to be added.
+Below is a sample script to create a ''links file'' and add a couple of links to it:
+{{{#!python
+#!/usr/bin/python
+import gettor.core
+link64 = """Package (64-bit): link-to-dropbox?dl=1
+ASC signature (64-bit): link-to-dropbox.asc?dl=1
+Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4"""
+link32 = """Package (32-bit): link-to-dropbox?dl=1
+ASC signature (32-bit): link-to-dropbox.asc?dl=1
+Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4"""
+core = gettor.core.Core()
+core.create_links_file('Dropbox', '8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659')
+core.add_link('Dropbox', 'linux', 'en', link64)
+core.add_link('Dropbox', 'linux', 'en', link32)
+}}}
+For more details on these methods please check the code repository and/or see the cloud service scripts section.
+=== Distribution Channels
+Ideally, a user should have various ways to contact !GetTor and receive
+the Tor Browser. These ''distribution channels'' should parse a request,
+get the user's OS and language, ask the core module for the links and
+then send this info back to the user. Ideally, each distribution channel
+should be handled by a separate module. Currently, there is one distribution
+channel deployed (SMTP), one implemented but not deployed (XMPP), and one
+not finished (Twitter).
+==== SMTP
+This module is in charge of receiving and replying to requests via email. Back
+in 2008 when !GetTor was conceived, SMTP was the main and only distribution
+channel. Requests were answered with the actual bundle as an attachment
+instead of links. This approach worked well, but the bundles grew
+in size to the point where it was no longer feasible to send them as
+attachments (the current size of Tor Browser is ~40 MB).
+There are three steps involved in sending links via email:
+ * Listen for users' emails directed to the !GetTor robot.
+ * Determine the type of request and get the data necessary to reply to it.
+ * Send back a reply to the user.
+The first point is handled by the mail server provided by the Tor Project.
+In addition, we use email forwarding to make sure we get all the emails
+directed to the !GetTor robot. For this, a .forward file like the following is used:
+{{{#!bash
+|"python2.7 /path/to/gettor/smtp_process.py"
+}}}
+With this, the only concern of the smtp_process.py script is to receive
+emails from the standard input and talk to the SMTP module to process them.
+The SMTP module has only one public method:
+{{{#!python
+process_email(raw_msg)
+}}}
+Where:
+ '''raw_msg''': String that represents the email received.
+A basic script for communicating with the SMTP module should look like this:
+{{{#!python
+#!/usr/bin/env python
+import sys
+import gettor.smtp
+service = gettor.smtp.SMTP()
+incoming = sys.stdin.read()
+service.process_email(incoming)
+}}}
+The other two points are handled by the SMTP module. The first step after
+receiving a request is to determine if the address is blacklisted. See the
+Blacklisting section for the current process to do that. The
+next step is to determine the type of request received. For now, there are
+only two types of request that can be received: help and links. The decision
+process to determine which type we have received is the following:
+ * Does the body of the message include the words ''windows'', ''linux'', or ''osx''?
+ If so, we have received a links request.
+ * Any other case should be considered as a help request, including blank
+ emails.
+For both types of request, the language is obtained from the address the email
+was sent to: gettor+lc@torproject.org, where lc stands for one of the locales
+supported by Tor Browser. Currently, the only locale supported is English.
+If no locale is specified, we assume English by default.
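+The decision process above can be sketched with the standard `email` package. This is an illustrative sketch, not the SMTP module's code: the `parse_request` helper and its return value are hypothetical, and falling back to English for an unsupported locale is an assumption that goes beyond the "no locale specified" case described above.

```python
import re
from email import message_from_string
from email.utils import parseaddr

SUPPORTED_OS = ("windows", "linux", "osx")
SUPPORTED_LC = ("en",)


def body_text(msg):
    """Return the plain-text body of a parsed email message."""
    if msg.is_multipart():
        parts = [part.get_payload() for part in msg.walk()
                 if part.get_content_type() == "text/plain"]
        return "\n".join(parts)
    return msg.get_payload()


def parse_request(raw_msg):
    """Return (request_type, lc, os) for a raw email (hypothetical helper)."""
    msg = message_from_string(raw_msg)

    # The locale comes from the address: gettor+lc@torproject.org
    _, to_addr = parseaddr(msg.get("To", ""))
    match = re.match(r"gettor\+([a-z]{2})@", to_addr.lower())
    lc = match.group(1) if match else "en"
    if lc not in SUPPORTED_LC:
        lc = "en"  # assumption: unsupported locales fall back to English

    # A links request mentions one of the supported operating systems.
    words = set(re.findall(r"[a-z]+", body_text(msg).lower()))
    for osys in SUPPORTED_OS:
        if osys in words:
            return ("links", lc, osys)

    # Anything else, including a blank email, is a help request.
    return ("help", lc, None)
```

+Matching whole words rather than substrings is a design choice of this sketch; the text above only requires that the body include one of the three words.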
+Knowing the type and language of the request is enough to construct a reply
+and send it to the user. Every time a reply is sent, the number of requests 
+received is increased in the database. See Database to check the current
+DB schema.
+==== XMPP
+To be written.
+==== Twitter
+To be written.
+=== Database
+The database module, as its name suggests, is in charge of interacting
+with the !GetTor database. The current design is quite simple and satisfies
+two needs:
+'''Add a request'''. For now, this only records how many requests we have received so far. No other data is collected.
+'''Add/delete/update a user'''. This allows us to know how many requests a single user has made and thus avoid any type of flood (see Blacklisting). For this purpose we collect the following data:
+ * ''user'': SHA-256 hash of the user address/account.
+ * ''service'': string that represents the distribution method used by the user
+   (e.g. SMTP).
+ * ''times'': number of requests received from the same user.
+ * ''blocked'': boolean flag to know if user is permanently blacklisted.
+ * ''last_request'': timestamp that represents the last time a given user
+   made a request from the same distribution channel.
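+To make these two needs concrete, here is a minimal sketch using Python's `sqlite3` module. The field names come from the list above; the `requests` table, its `counter` column, the column types, the helper functions and the use of SHA-256 for the ''user'' field are assumptions made for illustration and may differ from the schema in the code repository.

```python
import hashlib
import sqlite3
import time


def create_schema(conn):
    """Create the two tables used in this sketch (layout is assumed)."""
    conn.execute("CREATE TABLE requests (counter INTEGER)")
    conn.execute("INSERT INTO requests VALUES (0)")
    conn.execute(
        "CREATE TABLE users (user TEXT, service TEXT, times INTEGER, "
        "blocked INTEGER, last_request INTEGER, PRIMARY KEY (user, service))"
    )


def hash_user(address):
    """Hash the address so the plain address is never stored."""
    return hashlib.sha256(address.encode("utf-8")).hexdigest()


def add_request(conn):
    """Increase the global counter of requests received."""
    conn.execute("UPDATE requests SET counter = counter + 1")


def add_user(conn, address, service, now=None):
    """Insert a first-time user for a given service, not blocked."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "INSERT INTO users VALUES (?, ?, 1, 0, ?)",
        (hash_user(address), service, now),
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_request(conn)
add_user(conn, "user@example.com", "SMTP")
```

+Keying the `users` table on (user, service) mirrors the fact that requests are counted per user ''and'' per distribution channel.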
+The initial design of the database module (during the revamp) considered
+collecting a lot of data (type of request, language, os, etc.), but
+eventually we decided to keep just the data necessary to know how many
+requests !GetTor has received and to avoid flood. The type of database
+chosen for this purpose was SQLite. You can check a sample database in
+the code repository (gettor.db).
+=== Blacklisting
+The current blacklisting mechanism is quite simple and is based on the data
+collected by the 'users' table specified in !GetTor's database, plus some
+extra parameters defined in blacklist.cfg, which help us establish limits
+to avoid flood. The current mechanism depends on four parameters:
+ * '''user''': Hashed address/account of the user. It helps to identify malicious users.
+ * '''service''': Service or distribution channel used by the user trying to contact !GetTor.
+ * '''max_req''': Maximum number of requests per user ''and'' service allowed at the moment.
+ * '''wait_time''': Number of minutes a user must wait before making another request once she has reached '''max_req'''.
+Both the '''user''' and '''service''' parameters are obtained in real time
+when !GetTor receives a request. The other two, '''max_req''' and '''wait_time'''
+are specified in blacklist.cfg. Each service module (e.g. SMTP) should be
+in charge of specifying the path to this configuration file and interact
+with the !Blacklisting module according to that information. The current
+mechanism also depends on the '''last_request''', '''times''', and '''blocked'''
+fields of the database for the record associated with '''user''' and '''service'''.
+With all of this, the decision algorithm can be described as follows:
+{{{#!python
+ # Pseudocode: times, blocked and last_request are read from the database
+ # record for (user, service); wait_time is expressed in minutes.
+ if blocked:
+     # permanently blacklisted user
+     update_user_on_db(user, service, times+1, 1)
+     raise BlacklistError("Blocked user")
+ elif times >= max_req:
+     next_allowed = last_request + wait_time
+     if now < next_allowed:
+         # too many requests from the same user
+         update_user_on_db(user, service, times+1, 0)
+         raise BlacklistError("Too many requests")
+     else:
+         # fresh user again!
+         update_user_on_db(user, service, 1, 0)
+ else:
+     # adding up a request for user
+     update_user_on_db(user, service, times+1, 0)
+}}}
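+The pseudocode above can be turned into a small runnable sketch. To stay self-contained it keeps the user record in a plain dictionary instead of the database, so `check_request` and the record layout are hypothetical; it also assumes that timestamps are in seconds and that '''last_request''' is refreshed on every request.

```python
import time


class BlacklistError(Exception):
    """Raised when a request must be rejected."""


def check_request(record, max_req, wait_time, now=None):
    """Apply the blacklisting decision to one user/service record.

    `record` is a dict with the keys 'times', 'blocked' and 'last_request'
    (epoch seconds). `wait_time` is expressed in minutes.
    """
    now = time.time() if now is None else now
    times = record["times"]
    last_request = record["last_request"]
    # Assumption: every request refreshes the timestamp.
    record["last_request"] = now

    if record["blocked"]:
        record["times"] = times + 1
        raise BlacklistError("Blocked user")
    if times >= max_req:
        next_allowed = last_request + wait_time * 60
        if now < next_allowed:
            # Too many requests from the same user.
            record["times"] = times + 1
            raise BlacklistError("Too many requests")
        # The user waited long enough: start counting again.
        record["times"] = 1
    else:
        record["times"] = times + 1
    return record
```

+Because the timestamp is refreshed even for rejected requests, a user who keeps sending requests during the waiting period keeps pushing back the moment at which she is allowed again.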
+This simple mechanism helps us prevent malicious users from flooding one or
+more services/distribution channels with endless requests. As you may
+notice, if a user makes a request before the '''wait_time''' has passed, then
+the user must wait another '''wait_time''' to make a request again, and
+if a user makes a request after she has reached the maximum number of requests
+and waited '''wait_time''', then her request counter is reset to
+one. You can check the {{{_is_blacklisted}}} method of the SMTP module to
+see how a service should interact with the Blacklisting module.
+This mechanism could certainly be improved. If you have any ideas/comments
+about it, please tell us (ideally by filing a ticket :)
+=== Cloud Services
+For each service used by !GetTor to distribute the Tor Browser files, there
+should be a script in charge of uploading such files according to the methods
+provided by that service. Each one of these scripts must assume that the
+latest Tor Browser files have been downloaded (see Other Scripts) and perform the
+following tasks (in order):
+  1. Get the fingerprint from the key used to sign the Tor Browser.
+  2. Use the Core module to create a new links file (core.create_links_file).
+  3. Obtain the sha256 checksum of each {tar.xz, exe, dmg} file to be uploaded.
+  4. Check that the corresponding .asc signature exists for each {tar.xz, exe, dmg} file to be uploaded.
+  5. Identify the architecture, language and operating system associated with each {tar.xz, exe, dmg} file to be uploaded.
+  6. Create a string describing a new link, using the information identified before.
+  7. Use the Core module to add a link to the new links file created (core.add_link), specifying the service, the operating system, and the language (locale).
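+Steps 3 to 6 can be sketched with the standard library alone. The helpers below (`sha256_of`, `detect_os`, `describe_link`) are hypothetical and are not taken from the existing Dropbox or Google Drive scripts; the link text follows the format of the ''links file'' sample in the Core section, and the URLs are expected to come from whatever upload method the service provides.

```python
import hashlib
import os

# Assumed mapping from file extension to operating system.
OS_BY_EXTENSION = {".tar.xz": "linux", ".exe": "windows", ".dmg": "osx"}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def detect_os(filename):
    """Guess the operating system from the file extension, or None."""
    for extension, osys in OS_BY_EXTENSION.items():
        if filename.endswith(extension):
            return osys
    return None


def describe_link(path, url, asc_url, arch):
    """Build the three-line link description for one uploaded file."""
    if not os.path.isfile(path + ".asc"):
        raise IOError("Missing .asc signature for %s" % path)
    return "\n".join([
        "Package (%s): %s" % (arch, url),
        "ASC signature (%s): %s" % (arch, asc_url),
        "Package SHA256 checksum (%s): %s" % (arch, sha256_of(path)),
    ])
```

+The string returned by `describe_link` is the kind of value that would then be passed as the `link` argument of `core.add_link`.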
+You can check the existing scripts for Dropbox and Google Drive to see the current
+methods used to carry out the points listed above, especially 1, 3, 5, and 6. For more details
+on how the links files are created and how the links are stored, check the documentation
+about the Core module. Below is a list of the current services/providers integrated
+with !GetTor:
+ * '''Dropbox''': Deployed. In use for a long time.
+ * '''Google Drive''': Implemented, but not yet deployed.
+ * '''GitHub''': Implemented, but not yet deployed. This one should be especially useful to distribute the Tor Browser
+ in places where Dropbox and Google Drive are blocked (e.g. China).
+If you have an idea for a new service that could be used (even if you don't know how to implement it),
+please contact us (ideally by filing a ticket :).
+=== Other Scripts
+Below is a list of scripts used for diverse and "smaller" tasks:
+ * '''blacklist.py''': Handle blacklisting of users. Execute blacklist.py -h for more details.
+ * '''create_db.py''': Handle the creation of the SQLite database used by !GetTor for managing blacklisting of users and keeping track of basic stats. Execute create_db.py -h for more details.
+ * '''stats.py''': Handle basic stats according to the information stored in the SQLite database. Execute stats.py -h for more details.
+ * '''fetch_latest_torbrowser.py''': Automate the download of Tor Browser files from Tor Project's website and upload of these files to the services used by !GetTor every time a new stable version of Tor Browser is available. Implemented, but not yet deployed. See the source file in the repository for more details.