... | ... | @@ -176,9 +176,20 @@ denial of service attacks. |
The static mirror system is built of three kinds of hosts:

* `source` - builds and hosts the original content
  (`roles::static_source` in Puppet)
* `master` - receives the contents from the source, dispatches it
  (atomically) to the mirrors (`roles::static_master` in Puppet)
* `mirror` - serves the contents to the user
  (`roles::static_mirror_web` in Puppet)

Content is split into different "components", which are units of
content that get synchronized atomically across the different
hosts. Those components are defined in a YAML file in the
`tor-puppet.git` repository
(`modules/roles/misc/static-components.yaml` at the time of writing,
but it might move to Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet)).

TODO: make a diagram?

<!-- this is a rephrased copy of -->
<!-- https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/roles/README.static-mirroring.txt -->
... | ... | @@ -188,14 +199,7 @@ provides: whereas individual virtual machines are power-cycled for |
scheduled maintenance (e.g. kernel upgrades), static mirroring
machines are removed from the DNS during their maintenance.

The term static mirroring infrastructure includes:

* components, specifying the data source and other config options.
  See `modules/roles/misc/static-components.yaml`
* a `master` host for each component, responsible only for distributing data,
  not for serving data to end users.
* machines with the `static_mirror` Puppet role
* a few scripts around `rsync(1)`

### Change process

When data changes, the `source` is responsible for running
`static-update-component`, which instructs the `master` via SSH to run
... | ... | @@ -213,11 +217,153 @@ point the updated data will be served to end users. |
<!-- end of the copy -->

TODO: expand design. Talk about mininag and walk through the [scripts overview](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/staticsync/files/OVERVIEW).
### Source code inventory

The source code of the static mirror system is spread out in different
files and directories in the `tor-puppet.git` repository:

* `modules/roles/misc/static-components.yaml` lists the "components"
* `modules/roles/manifests/` holds the different Puppet roles:
  * `roles::static_mirror` - a generic mirror, see
    `staticsync::static_mirror` below
  * `roles::static_mirror_web` - a web mirror, including most (but
    not necessarily all) components defined in the YAML
    configuration. Configures Apache (which the above
    doesn't). Includes `roles::static_mirror` (and therefore
    `staticsync::static_mirror`)
  * `roles::static_mirror_onion` - configures the hidden services for
    the web mirrors defined above
  * `roles::static_source` - a generic static source, see
    `staticsync::static_source`, below
  * `roles::static_master` - a generic static master, see
    `staticsync::static_master` below
* `modules/staticsync/` is the core Puppet module holding most of the
  source code:
  * `staticsync::static_source` - source, which:
    * exports the static user SSH key to the master, punching a hole
      in the firewall
    * collects the SSH keys from the master(s)
  * `staticsync::static_mirror` - a mirror which does the above and:
    * deploys the `static-mirror-run` and `static-mirror-run-all`
      scripts (see below)
    * configures a cron job for `static-mirror-run-all`
    * exports a configuration snippet of `/etc/static-clients.conf`
      for the **master**
  * `staticsync::static_master` - a master which:
    * deploys the `static-master-run` and
      `static-master-update-component` scripts (see below)
    * collects the `static-clients.conf` configuration file, which
      is the hostname (`$::fqdn`) of each of the
      `staticsync::static_mirror` exports
    * configures the `basedir` (currently
      `/srv/static.torproject.org`) and `user` home directory
      (currently `/home/mirroradm`)
    * collects the SSH keys from sources, mirrors and other masters
    * exports the SSH key to the mirrors and sources
  * `staticsync::base`, included by all of the above, deploys:
    * `/etc/static-components.conf`: a file derived from the
      `static-components.yaml` config file
    * `/etc/staticsync.conf`: polyglot (bash and Python)
      configuration file propagating the `base` (currently
      `/srv/static.torproject.org`), `masterbase` (currently
      `$base/master`) and `staticuser` (currently `mirroradm`)
      settings
    * `staticsync-ssh-wrap` and `static-update-component` (see below)

TODO: try to figure out why we have `/etc/static-components.conf` and
not directly the `YAML` file shipped to hosts, in
`staticsync::base`. See the `static-components.conf.erb` Puppet
template.
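To illustrate what "polyglot" means for the `/etc/staticsync.conf` file listed in the inventory above, the following sketch reads a file made only of `name='value'` assignments, a form that both bash and Python accept. The file contents and the `load_polyglot_config` helper are assumptions made up for this example, not a copy of the deployed file. In particular, `masterbase` is spelled out as a literal path here, because a bash-style `$base` reference would not be expanded by Python.

```python
"""Sketch: read a bash/Python polyglot configuration file from Python."""

# Hypothetical file contents: plain assignments with no spaces around "="
# and quoted values are valid in both bash and Python.
EXAMPLE_CONF = """\
# comments start with "#" in both languages
base='/srv/static.torproject.org'
masterbase='/srv/static.torproject.org/master'
staticuser='mirroradm'
"""


def load_polyglot_config(text):
    """Execute the assignments as Python and return them as a dict."""
    namespace = {}
    # exec() is only acceptable because the file is trusted and deployed by
    # configuration management; never do this with untrusted input.
    exec(text, {"__builtins__": {}}, namespace)
    return namespace


if __name__ == "__main__":
    conf = load_polyglot_config(EXAMPLE_CONF)
    print(conf["base"], conf["masterbase"], conf["staticuser"])
```

A bash script would read the same file by sourcing it, for example with `. /etc/staticsync.conf`.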
### Scripts walkthrough

<!-- this is a reformatted copy of the `OVERVIEW` in the staticsync
puppet module -->

- `static-update-component` is run by the user on the **source** host.

  If not run under sudo as the `staticuser` already, it sudos to the
  `staticuser`, re-execing itself. It then connects over SSH to the
  `static-master` for that component to run
  `static-master-update-component`.

  LOCKING: none, but see `static-master-update-component`

- `static-master-update-component` is run on the **master** host.

  It rsyncs the contents from the **source** host to the static
  **master**, and then triggers `static-master-run` to push the
  content to the mirrors.

  The sync happens to a new `<component>-updating.incoming-XXXXXX`
  directory. On sync success, `<component>` is replaced with that new
  tree, and the `static-master-run` trigger happens.

  LOCKING: exclusive locks are held on `<component>.lock`

- `static-master-run` triggers all the mirrors for a component to
  initiate syncs.

  When all mirrors have an up-to-date tree, they are
  instructed to update the `cur` symlink to the new tree.

  To begin with, `static-master-run` copies `<component>` to
  `<component>-current-push`.

  This is the tree all the mirrors then sync from. If the push was
  successful, `<component>-current-push` is renamed to
  `<component>-current-live`.

  LOCKING: exclusive locks are held on `<component>.lock`

- `static-mirror-run` runs on a mirror and syncs components.

  There is a symlink called `cur` that points to either `tree-a` or
  `tree-b` for each component. The `cur` tree is the one that is
  live, the other one usually does not exist, except when a sync is
  ongoing (or a previous one failed and we keep a partial tree).

  During a sync, we sync to the `tree-<X>` that is not the live one.
  When instructed by `static-master-run`, we update the symlink and
  remove the old tree.

  `static-mirror-run` rsyncs either `-current-push` or `-current-live`
  for a component.

  LOCKING: during all of `static-mirror-run`, we keep an exclusive
  lock on the `<component>` dir, i.e., the directory that holds
  `tree-[ab]` and `cur`.

- `static-mirror-run-all`

  Run `static-mirror-run` for all components on this mirror, fetching
  the `-live-` tree.

  LOCKING: none, but see `static-mirror-run`.

- `staticsync-ssh-wrap`

  Wrapper for SSH job dispatching on source, master, and mirror.

  LOCKING: on **master**, when syncing `-live-` trees, a shared lock
  is held on `<component>.lock` during the rsync process.

TODO: make a diagram?
<!-- end of the copy -->
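The stage-then-swap update that `static-master-update-component` performs under an exclusive lock, as described in the walkthrough above, can be sketched in Python as follows. This is an illustration only, not the real script: `shutil.copytree` stands in for the rsync from the source host, and the `update_component` function and the `.old` suffix are hypothetical names chosen for the example.

```python
import fcntl
import os
import shutil
import tempfile


def update_component(masterbase, component, source_dir):
    """Stage a fresh copy of source_dir, then swap it in as <component>."""
    live = os.path.join(masterbase, component)
    with open(live + ".lock", "w") as lock_file:
        # Exclusive lock: only one update per component at a time.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        # Stage into a uniquely named sibling directory so that the final
        # rename stays on the same filesystem.
        staging = tempfile.mkdtemp(
            prefix=component + "-updating.incoming-", dir=masterbase
        )
        try:
            # Stand-in for the rsync from the source host (Python 3.8+).
            shutil.copytree(source_dir, staging, dirs_exist_ok=True)
            old = live + ".old"  # hypothetical name for the previous tree
            shutil.rmtree(old, ignore_errors=True)
            if os.path.isdir(live):
                os.rename(live, old)
            # Two renames are not one atomic step; the lock is what keeps
            # other lock holders from observing the gap.
            os.rename(staging, live)
            shutil.rmtree(old, ignore_errors=True)
        except BaseException:
            # A failed sync must not leave a partial staging tree behind.
            shutil.rmtree(staging, ignore_errors=True)
            raise
    # The lock is released when the lock file is closed.
    return live
```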
The scripts are written in bash, except `static-master-run`, which is
written in Python 2.
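The two-phase push described in the walkthrough (every mirror first syncs into its inactive tree, and only once all of them have succeeded are they told to flip `cur`) can be sketched as below. This is a local simulation in Python 3 under stated assumptions, not a port of `static-master-run`: mirrors are plain directories instead of remote hosts, `shutil.copytree` replaces rsync over SSH, locking is left out, and the names `inactive_tree`, `sync_inactive_tree`, `flip` and `push` are hypothetical.

```python
import os
import shutil


def inactive_tree(component_dir):
    """Return the tree name ('tree-a' or 'tree-b') that `cur` does not use."""
    cur = os.path.join(component_dir, "cur")
    live = os.readlink(cur) if os.path.islink(cur) else None
    return "tree-b" if live == "tree-a" else "tree-a"


def sync_inactive_tree(component_dir, push_tree):
    """Phase 1: copy the pushed tree into the tree that is not live."""
    os.makedirs(component_dir, exist_ok=True)
    name = inactive_tree(component_dir)
    target = os.path.join(component_dir, name)
    shutil.rmtree(target, ignore_errors=True)  # drop any partial leftover
    shutil.copytree(push_tree, target)  # stand-in for rsync
    return name


def flip(component_dir, new_tree):
    """Phase 2: atomically repoint `cur`, then remove the old tree."""
    cur = os.path.join(component_dir, "cur")
    old = os.readlink(cur) if os.path.islink(cur) else None
    tmp = cur + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_tree, tmp)
    os.replace(tmp, cur)  # rename(2) swaps the symlink in one step
    if old and old != new_tree:
        shutil.rmtree(os.path.join(component_dir, old), ignore_errors=True)


def push(mirror_dirs, push_tree):
    """Flip the mirrors only if every one of them synced successfully."""
    staged = {}
    for mirror in mirror_dirs:
        try:
            staged[mirror] = sync_inactive_tree(mirror, push_tree)
        except OSError:
            return False  # leave every `cur` untouched
    for mirror, tree in staged.items():
        flip(mirror, tree)
    return True
```

Keeping the flip in a separate second phase is what prevents mirrors from serving different versions of a component for longer than the flip itself takes: a mirror that fails to sync stops the whole push before any `cur` symlink has moved.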
### Authentication

Authentication between the static site hosts is entirely done through
SSH. The source hosts are accessible by normal users, who can `sudo`
to a "role" user which has privileges to run the static sync scripts
as the sync user. That user then has privileges to contact the master
server which, in turn, can log in to the mirrors over SSH as well.

The user's `sudo` configuration is therefore critical and that
`sudoers` configuration could also be considered part of the static
mirror system.

TODO: "audit" the static site mirror design as per https://bluesock.org/~willkg/blog/dev/auditing_projects.html

## Issues
... | ... | @@ -257,6 +403,8 @@ The IP addresses are replaced with: |
Logs are kept for two weeks.

Errors may be sent by email.

Metrics are scraped by [Prometheus](prometheus) using the "apache"
exporter.
... | ... | @@ -281,6 +429,44 @@ The goal of this discussion section is to consider improvements to the |
static site mirror system at torproject.org. It might also apply to
debian.org, but the focus is currently on TPO.

The static site mirror system has been designed for hosting Debian.org
content. Interestingly, it is not used for the operating system
mirrors themselves, which are synchronized using another, separate system
([archvsync](https://salsa.debian.org/mirror-team/archvsync/)).

The maintenance status of the mirror code is unclear: while it is
still in use at Debian.org, it is made of a few sets of components
which are not bundled in a single package. This makes it hard to
follow "upstream", although, in theory, it should be possible to
follow the [dsa-puppet](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/) repository. In practice, that's pretty
difficult because the dsa-puppet and tor-puppet repositories have
disconnected histories. Even if they had a common ancestor, the code is
spread across multiple directories, which makes it hard to track. There
has been some refactoring to move most of the code into a `staticsync`
module, but we still have files strewn over other modules.

The static mirror system was written for Debian.org by Peter
Palfrader. It has also been patched by other DSA members (Stephen
Gran and Julien Cristau both have more than 100 commits on the old
code base).

This service is critical: it distributes the main torproject.org
websites, but also software releases like the tor project source code
and other websites.

The static site system has no unit tests, linting, release process, or
CI. Code is deployed directly through Puppet, on the live servers.

There hasn't been a security audit of the system, as far as we could
tell.

Python 2 porting is probably the most pressing issue in this project:
the `static-master-run` program is written in old Python 2.4
code. Thankfully it is fairly short and should be easy to port.

The YAML configuration duplicates the YAML parsing and data structures
present in Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet).

## Goals

TODO: document requirements
... | ... | |