Commit ee86cb27 authored by anarcat

expand the static site docs

parent b14e243a
......@@ -176,9 +176,20 @@ denial of service attacks.
The static mirror system is built of three kinds of hosts:
* `source` - builds and hosts the original content
(`roles::static_source` in Puppet)
* `master` - receives the contents from the source, dispatches it
(atomically) to the mirrors (`roles::static_master` in Puppet)
* `mirror` - serves the contents to the user
(`roles::static_mirror_web` in Puppet)
Content is split into different "components", which are units of
content that get synchronized atomically across the different
hosts. Those components are defined in a YAML file in the
`tor-puppet.git` repository
(`modules/roles/misc/static-components.yaml` at the time of writing,
but it might move to Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet)).
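
As a rough illustration of what "synchronized atomically across the different hosts" means, the following Python sketch models a two-phase push: every mirror first fetches the new version, and only once all of them have it does any mirror start serving it. The `Mirror` class, its fields and the `push` function are hypothetical names invented for this example and are not part of the staticsync scripts. Aborting the whole push when a single mirror fails is also an assumption of the sketch, not a description of the real scripts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mirror:
    """Hypothetical in-memory stand-in for a mirror host."""
    name: str
    reachable: bool = True
    live: str = "v1"               # version currently served to users
    staged: Optional[str] = None   # version fetched but not yet live

    def sync(self, version: str) -> bool:
        """Phase 1: fetch the new tree without serving it yet."""
        if not self.reachable:
            return False
        self.staged = version
        return True

    def flip(self) -> None:
        """Phase 2: make the staged tree the live one."""
        if self.staged is not None:
            self.live, self.staged = self.staged, None


def push(mirrors: List[Mirror], version: str) -> bool:
    """Flip the mirrors only if every one of them synced successfully."""
    results = [m.sync(version) for m in mirrors]
    if not all(results):
        return False  # nobody flips: users keep seeing the old version
    for m in mirrors:
        m.flip()
    return True
```

With two reachable mirrors, `push(mirrors, "v2")` leaves both serving `v2`. If one is unreachable, both keep serving `v1`, so users never see a mix of versions for the same component.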
TODO: make a diagram?
<!-- this is a rephrased copy of -->
<!-- https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/roles/README.static-mirroring.txt -->
......@@ -188,14 +199,7 @@ provides: whereas individual virtual machines are power-cycled for
scheduled maintenance (e.g. kernel upgrades), static mirroring
machines are removed from the DNS during their maintenance.
The term static mirroring infrastructure includes:
• components, specifying the data source and other config options.
See `modules/roles/misc/static-components.yaml`
• a `master` host for each component, responsible only for distributing data,
not for serving data to end users.
• machines with the `static_mirror` Puppet role
• a few scripts around `rsync(1)`
### Change process
When data changes, the `source` is responsible for running
`static-update-component`, which instructs the `master` via SSH to run
......@@ -213,11 +217,153 @@ point the updated data will be served to end users.
<!-- end of the copy -->
TODO: expand design. talk about mininag and walk through the [scripts overview](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/staticsync/files/OVERVIEW)
### Source code inventory
The source code of the static mirror system is spread out in different
files and directories in the `tor-puppet.git` repository:
* `modules/roles/misc/static-components.yaml` lists the "components"
* `modules/roles/manifests/` holds the different Puppet roles:
* `roles::static_mirror` - a generic mirror, see
`staticsync::static_mirror` below
* `roles::static_mirror_web` - a web mirror, including most (but
not necessarily all) components defined in the YAML
configuration. Configures Apache (which the above
does not). Includes `roles::static_mirror` (and therefore
`staticsync::static_mirror`)
* `roles::static_mirror_onion` - configures the hidden services for
the web mirrors defined above
* `roles::static_source` - a generic static source, see
`staticsync::static_source`, below
* `roles::static_master` - a generic static master, see
`staticsync::static_master` below
* `modules/staticsync/` is the core Puppet module holding most of the
source code:
* `staticsync::static_source` - source, which:
* exports the static user SSH key to the master, punching a hole
in the firewall
* collects the SSH keys from the master(s)
* `staticsync::static_mirror` - a mirror which does the above and:
* deploys the `static-mirror-run` and `static-mirror-run-all`
scripts (see below)
* configures a cron job for `static-mirror-run-all`
* exports a configuration snippet of `/etc/static-clients.conf`
for the **master**
* `staticsync::static_master` - a master which:
* deploys the `static-master-run` and
`static-master-update-component` scripts (see below)
* collects the `static-clients.conf` configuration file, which
lists the hostname (`$::fqdn`) of each of the
`staticsync::static_mirror` exports
* configures the `basedir` (currently
`/srv/static.torproject.org`) and `user` home directory
(currently `/home/mirroradm`)
* collects the SSH keys from sources, mirrors and other masters
* exports the SSH key to the mirrors and sources
* `staticsync::base`, included by all of the above, deploys:
* `/etc/static-components.conf`: a file derived from the
`static-components.yaml` config file
* `/etc/staticsync.conf`: polyglot (bash and Python)
configuration file propagating the `base` (currently
`/srv/static.torproject.org`), `masterbase` (currently
`$base/master`) and `staticuser` (currently `mirroradm`)
settings
* `staticsync-ssh-wrap` and `static-update-component` (see below)
TODO: try to figure out why we have `/etc/static-components.conf` and
not directly the `YAML` file shipped to hosts, in
`staticsync::base`. See the `static-components.conf.erb` Puppet
template.
### Scripts walkthrough
<!-- this is a reformatted copy of the `OVERVIEW` in the staticsync
puppet module -->
- `static-update-component` is run by the user on the **source** host.
If not already run under sudo as the `staticuser`, it sudos to the
`staticuser`, re-executing itself. It then connects over SSH to the
`static-master` for that component to run `static-master-update-component`.
LOCKING: none, but see `static-master-update-component`
- `static-master-update-component` is run on the **master** host.
It rsyncs the contents from the **source** host to the static
**master**, and then triggers `static-master-run` to push the
content to the mirrors.
The sync happens to a new `<component>-updating.incoming-XXXXXX`
directory. On sync success, `<component>` is replaced with that new
tree, and the `static-master-run` trigger happens.
LOCKING: exclusive locks are held on `<component>.lock`
- `static-master-run` triggers all the mirrors for a component to
initiate syncs.
When all mirrors have an up-to-date tree, they are
instructed to update the `cur` symlink to the new tree.
To begin with, `static-master-run` copies `<component>` to
`<component>-current-push`.
This is the tree all the mirrors then sync from. If the push was
successful, `<component>-current-push` is renamed to
`<component>-current-live`.
LOCKING: exclusive locks are held on `<component>.lock`
- `static-mirror-run` runs on a mirror and syncs components.
There is a symlink called `cur` that points to either `tree-a` or
`tree-b` for each component. The `cur` tree is the one that is
live; the other one usually does not exist, except when a sync is
ongoing (or a previous one failed and we keep a partial tree).
During a sync, we sync to the `tree-<X>` that is not the live one.
When instructed by `static-master-run`, we update the symlink and
remove the old tree.
`static-mirror-run` rsyncs either `-current-push` or `-current-live`
for a component.
LOCKING: during all of `static-mirror-run`, we keep an exclusive
lock on the `<component>` dir, i.e., the directory that holds
`tree-[ab]` and `cur`.
- `static-mirror-run-all`
Run `static-mirror-run` for all components on this mirror, fetching
the `-live-` tree.
LOCKING: none, but see `static-mirror-run`.
- `staticsync-ssh-wrap`
Wrapper for SSH job dispatching on source, master, and mirror.
LOCKING: on **master**, when syncing `-live-` trees, a shared lock
is held on `<component>.lock` during the rsync process.
TODO: make a diagram?
<!-- end of the copy -->
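
The master-side update described for `static-master-update-component` (sync into a fresh `<component>-updating.incoming-XXXXXX` directory, then replace `<component>` on success, all under an exclusive lock on `<component>.lock`) can be sketched in Python as follows. This is a minimal illustration, not the actual script: the `update_component` function and its `fetch` callback (standing in for the rsync from the source) are hypothetical, and the use of `flock(2)` as the locking mechanism is an assumption, since the overview only says that exclusive locks are held.

```python
import fcntl
import os
import shutil
import tempfile


def update_component(base, component, fetch):
    """Refresh base/component from a source, under an exclusive lock.

    `fetch(target_dir)` is a hypothetical callback standing in for the
    rsync from the source host; it must raise an exception on failure.
    """
    live = os.path.join(base, component)
    lock_fd = os.open(live + ".lock", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Assumption: flock(2)-style advisory lock, one update at a time.
        fcntl.flock(lock_fd, fcntl.LOCK_EX)
        incoming = tempfile.mkdtemp(
            prefix=component + "-updating.incoming-", dir=base)
        try:
            fetch(incoming)
        except Exception:
            shutil.rmtree(incoming)  # failed sync: live tree is untouched
            raise
        old = None
        if os.path.exists(live):
            # Move the previous tree aside before renaming the new one in.
            old = tempfile.mkdtemp(prefix=component + "-old-", dir=base)
            os.rename(live, os.path.join(old, "tree"))
        os.rename(incoming, live)
        if old is not None:
            shutil.rmtree(old)
    finally:
        os.close(lock_fd)  # closing the descriptor releases the lock
```

The point of the staging directory is that a failed or interrupted sync never leaves `<component>` half-updated: the tree is only swapped in once the fetch has completed.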
The scripts are written in bash except `static-master-run`, written in
Python 2.
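
The `tree-a`/`tree-b` scheme used on the mirrors can be illustrated with a short Python sketch: sync into whichever tree is not live, then repoint `cur` and remove the old tree. The helper names `inactive_tree` and `flip` are hypothetical, and swapping the symlink with a rename is an assumption about how such a flip is typically made atomic, not a description of what `static-mirror-run` actually does.

```python
import os
import shutil


def inactive_tree(component_dir):
    """Return the name of the tree that `cur` does not point to."""
    cur = os.path.join(component_dir, "cur")
    live = os.readlink(cur) if os.path.islink(cur) else None
    return "tree-b" if live == "tree-a" else "tree-a"


def flip(component_dir, new_tree):
    """Point `cur` at new_tree in one step, then remove the old tree."""
    cur = os.path.join(component_dir, "cur")
    old = os.readlink(cur) if os.path.islink(cur) else None
    tmp = cur + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)              # leftover from an interrupted flip
    os.symlink(new_tree, tmp)       # relative link, e.g. cur -> tree-b
    os.replace(tmp, cur)            # rename(2) replaces the old link
    if old and old != new_tree:
        shutil.rmtree(os.path.join(component_dir, old), ignore_errors=True)
```

A sync would first call `inactive_tree()` to find the target directory, fill it, and only call `flip()` once instructed to go live. Readers following `cur` see either the old tree or the new one, never a partially synced one.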
### Authentication
Authentication between the static site hosts is entirely done through
SSH. The source hosts are accessible to normal users, who can `sudo`
to a "role" user that has privileges to run the static sync scripts
as the sync user. That user then has privileges to contact the master
server which, in turn, can log in to the mirrors over SSH as well.
The user's `sudo` configuration is therefore critical and that
`sudoers` configuration could also be considered part of the static
mirror system.
TODO: "audit" the static site mirror design as per https://bluesock.org/~willkg/blog/dev/auditing_projects.html
## Issues
......@@ -257,6 +403,8 @@ The IP addresses are replaced with:
Logs are kept for two weeks.
Errors may be sent by email.
Metrics are scraped by [Prometheus](prometheus) using the "apache"
exporter.
......@@ -281,6 +429,44 @@ The goal of this discussion section is to consider improvements to the
static site mirror system at torproject.org. It might also apply to
debian.org, but the focus is currently on TPO.
The static site mirror system has been designed for hosting Debian.org
content. Interestingly, it is not used for the operating system
mirrors themselves, which are synchronized using another, separate system
([archvsync](https://salsa.debian.org/mirror-team/archvsync/)).
The maintenance status of the mirror code is unclear: while it is
still in use at Debian.org, it is made of a few sets of components
which are not bundled in a single package. This makes it hard to
follow "upstream", although, in theory, it should be possible to
follow the [dsa-puppet](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/) repository. In practice, that's pretty
difficult because the dsa-puppet and tor-puppet repositories have
disconnected histories. Even if they had a common ancestor, the code is
spread across multiple directories, which makes it hard to track. There
has been some refactoring to move most of the code into a `staticsync`
module, but we still have files strewn over other modules.
The static mirror system was written for Debian.org by Peter
Palfrader. It has also been patched by other DSA members (Stephen
Gran and Julien Cristau both have more than 100 commits on the old
code base).
This service is critical: it distributes the main torproject.org
websites, but also software releases like the tor project source code
and other websites.
The static site system has no unit tests, linting, release process, or
CI. Code is deployed directly through Puppet, on the live servers.
There hasn't been a security audit of the system, as far as we could
tell.
Python 2 porting is probably the most pressing issue in this project:
the `static-master-run` program is written in old Python 2.4
code. Thankfully it is fairly short and should be easy to port.
The YAML configuration duplicates the YAML parsing and data structures
present in Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet).
## Goals
TODO: document requirements
......