GitLab is used only for code review, issue tracking and project management. Canonical locations for source code are still https://gitweb.torproject.org/ https://git.torproject.org/ and git-rw.torproject.org.

architecture.md 6.33 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155
Design and architecture
=======================

If you are new to rdsys and seek to understand its design and architecture, this
document is for you.  It helps you understand rdsys's information flow, its
layers of abstraction, and how the code is organised.

Layers of abstraction
---------------------

Rdsys aspires to implement the design philosophy of [The Clean
Architecture](https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html).
Applied to our domain of circumvention proxy distribution, this means the
following:

At the lowest layer, we have rdsys's *domain logic*.  These are abstract
building blocks that implement rdsys's "axioms".  An example is the notion of a
*resource* – a "thing" that rdsys hands out to users.  This can be a Tor bridge,
a VPN proxy, a Snowflake proxy, or download links for circumvention software.
In Go, a resource is an interface (it's called `Resource` and is defined in
[domain.go](https://gitlab.torproject.org/tpo/anti-censorship/rdsys/-/blob/master/pkg/core/domain.go)),
which dictates what methods an object has to implement.

On top of the domain logic are *use cases*.  These are actually useful
manifestations of the building blogs in the domain logic.  An abstract resource
turns into a Tor bridge – a concrete "thing" that we can hand out to users.
Distributors are a different kind of use case because they turn the abstract
idea of hashrings into code that actually hands out things to users.

Rdsys's processes talk to each other via a *delivery mechanism* – currently
implemented as HTTP connections but this could also be domain sockets, or remote
procedure calls.  The Go interface `Mechanism` (defined in
[ipc.go](https://gitlab.torproject.org/tpo/anti-censorship/rdsys/-/blob/master/pkg/delivery/ipc.go))
specifies the methods that a delivery mechanism must implement.

Finally, there's the *presentation* layer, which implements the user interfaces
that users interact with.  An example is a Web page that allows users to request
resources.  The presentation layer wraps the distributor code in the use cases
layer.

It is important to understand that software dependencies *only point downwards*
in the abstraction layers: The presentation layer uses code that's part of the
delivery and use cases layer but the use cases layer *must not* use code that's
in the presentation layer.  This principle keeps the code reasonably easy to
maintain and allows for drop-in replacements of, say, the delivery layer.

The following diagram illustrates a simplification of the relationships
discussed above.  The black arrows denote dependencies.  Note how all
dependencies only point downwards.

```mermaid
graph TB
  subgraph "Core logic"
  Resource[Resource]
  Hashring[Hashring]
  end

  subgraph "Delivery"
  IPC[IPC mechanisms]
  end

  subgraph "Use cases"
  Transports[Tor bridge, e.g. obfs4] --> Resource
  Distributor[Distributor, e.g. Salmon] --> Hashring
  Distributor --> Resource
  Distributor --> IPC
  end

  subgraph "Presentation"
  Frontend[Web frontend] --> Distributor
  Frontend --> IPC
  end
```

Software architecture
---------------------

Rdsys implements a microservice-based architecture and consists of one backend
process and several distributor processes.  The distributors talk to the backend
over an HTTP streaming API.  The backend exposes two APIs and one Web page:

1. A public API that lets resources register themselves.
2. A private API that supplies distributors with resources.
3. A Web page that shows the status of specific resources.

To test resources (or more specifically: Tor bridges), rdsys relies on
[bridgestrap](https://gitlab.torproject.org/tpo/anti-censorship/bridgestrap).

```mermaid
graph TB

  subgraph "rdsys"
  Backend[Backend] --> Bridgestrap[Bridgestrap]
  Distributor1[Distributor 1] --> Backend
  Distributor2[Distributor 2] --> Backend
  Distributor3[Distributor 3] --> Backend
end
```

Information flow
----------------

This subsection discusses how a resource goes from being added to rdsys to being
handed out to users.

1. A resource can make it into rdsys in two ways:

   1.1 By being passively pulled by rdsys (see
       [kraken.go](https://gitlab.torproject.org/tpo/anti-censorship/rdsys/-/blob/master/internal/kraken.go)).
       An example are Tor bridge descriptors, which rdsys parses periodically.

   1.2 By actively registering itself.  Resources can send an HTTP GET request
       to rdsys to register themselves for distribution (see
       [backend.go](https://gitlab.torproject.org/tpo/anti-censorship/rdsys/-/blob/master/internal/backend.go).
       An example are [Snowflakes proxies](https://snowflake.torproject.org)
       at least in theory because Snowflakes are stand-alone proxies without a
       Tor process that would take care of "publishing" a resource.

2. Once a resource is in rdsys's backend, it's ready to be handed out to a
   distributor because the backend does not handle resource distribution.  Each
   resource maps to one – and only one – distributor.

3. Distributors obtain resources from rdsys's backend by talking to an HTTP API.
   This API provides distributors with an initial, deterministically-selected
   set of resources, followed by resource updates, which are sent whenever the
   set of resources changes, i.e. resources disappear, change, or are added.

4. The distribution of resources is at the discretion of distributors.  It is
   the distributor's responsibility to 1) smartly hand out resources to users,
   2) implement intuitive UIs, and 3) thwart Sybil attacks.

Directory structure
-------------------

Here is an overview of rdsys's directory structure and the purpose of each
directory.

* bin/ 
* cmd/ (Command line tools that invoke the backend and distributors.)
* conf/ (Rdsys's configuration file.)
* doc/ (Documentation, e.g. the document you are reading right now.)
* internal/ (code that's internal to rdsys)
* pkg/ (code that can be reused by other projects)
  - core/ (Rdsys's elementary building blocks; the lowest layer of abstraction.)
  - usecases/ (Use cases that emerge from the building blocks, e.g. resources and distributors.)
  - delivery/ (Rdsys's IPC mechanisms.)
  - presentation/ (The user-facing frontends for distributors.)

If you are new to rdsys's code base, it's best to work your way up the layers
of abstraction.  Read the code in the following order:

1. pkg/core/
2. pkg/usecases/
3. pkg/delivery/
4. pkg/presentation/