diff --git a/howto/static-component.md b/howto/static-component.md index d43bb32d94368000728907aeb8fbcfdcdae62760..558d6426acf6f62669e086afd2c61d35dee34336 100644 --- a/howto/static-component.md +++ b/howto/static-component.md @@ -176,9 +176,20 @@ denial of service attacks. The static mirror system is built of three kinds of hosts: * `source` - builds and hosts the original content + (`roles::static_source` in Puppet) * `master` - receives the contents from the source, dispatches it - (atomically) to the mirrors + (atomically) to the mirrors (`roles::static_master` in Puppet) * `mirror` - serves the contents to the user + (`roles::static_mirror_web` in Puppet) + +Content is split into different "components", which are units of +content that get synchronized atomically across the different +hosts. Those components are defined in a YAML file in the +`tor-puppet.git` repository +(`modules/roles/misc/static-components.yaml` at the time of writing, +but it might move to Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet)). + +TODO: make a diagram? <!-- this is a rephrased copy of --> <!-- https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/roles/README.static-mirroring.txt --> @@ -188,14 +199,7 @@ provides: whereas individual virtual machines are power-cycled for scheduled maintenance (e.g. kernel upgrades), static mirroring machines are removed from the DNS during their maintenance. -The term static mirroring infrastructure includes: - - • components, specifying the data source and other config options. - See `modules/roles/misc/static-components.yaml` - • a `master` host for each component, responsible only for distributing data, - not for serving data to end users.
- • machines with the `static_mirror` Puppet role - • a few scripts around `rsync(1)` +### Change process When data changes, the `source` is responsible for running `static-update-component`, which instructs the `master` via SSH to run @@ -213,11 +217,153 @@ point the updated data will be served to end users. <!-- end of the copy --> -TODO: expand design. talk about mininag and walk through the [scripts overview](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/staticsync/files/OVERVIEW) +### Source code inventory + +The source code of the static mirror system is spread out in different +files and directories in the `tor-puppet.git` repository: + + * `modules/roles/misc/static-components.yaml` lists the "components" + * `modules/roles/manifests/` holds the different Puppet roles: + * `roles::static_mirror` - a generic mirror, see + `staticsync::static_mirror` below + * `roles::static_mirror_web` - a web mirror, including most (but + not necessarily all) components defined in the YAML + configuration. Configures Apache (which the above + doesn't).
Includes `roles::static_mirror` (and therefore + `staticsync::static_mirror`) + * `roles::static_mirror_onion` - configures the hidden services for + the web mirrors defined above + * `roles::static_source` - a generic static source, see + `staticsync::static_source` below + * `roles::static_master` - a generic static master, see + `staticsync::static_master` below + * `modules/staticsync/` is the core Puppet module holding most of the + source code: + * `staticsync::static_source` - source, which: + * exports the static user SSH key to the master, punching a hole + in the firewall + * collects the SSH keys from the master(s) + * `staticsync::static_mirror` - a mirror which does the above and: + * deploys the `static-mirror-run` and `static-mirror-run-all` + scripts (see below) + * configures a cron job for `static-mirror-run-all` + * exports a configuration snippet of `/etc/static-clients.conf` + for the **master** + * `staticsync::static_master` - a master which: + * deploys the `static-master-run` and + `static-master-update-component` scripts (see below) + * collects the `static-clients.conf` configuration file, which + is the hostname (`$::fqdn`) of each of the + `staticsync::static_mirror` exports + * configures the `basedir` (currently + `/srv/static.torproject.org`) and `user` home directory + (currently `/home/mirroradm`) + * collects the SSH keys from sources, mirrors and other masters + * exports the SSH key to the mirrors and sources + * `staticsync::base`, included by all of the above, deploys: + * `/etc/static-components.conf`: a file derived from the + `static-components.yaml` config file + * `/etc/staticsync.conf`: polyglot (bash and Python) + configuration file propagating the `base` (currently + `/srv/static.torproject.org`), `masterbase` (currently + `$base/master`) and `staticuser` (currently `mirroradm`) + settings + * `staticsync-ssh-wrap` and `static-update-component` (see below) + +TODO: try to figure out why we have
`/etc/static-components.conf` and +not directly the `YAML` file shipped to hosts, in +`staticsync::base`. See the `static-components.conf.erb` Puppet +template. + +### Scripts walkthrough + +<!-- this is a reformatted copy of the `OVERVIEW` in the staticsync +puppet module --> + +- `static-update-component` is run by the user on the **source** host. + + If not run under sudo as the `staticuser` already, it sudos to the + `staticuser`, re-execing itself. It then connects over SSH to the `static-master` + for that component to run `static-master-update-component`. + + LOCKING: none, but see `static-master-update-component` + +- `static-master-update-component` is run on the **master** host + + It rsyncs the contents from the **source** host to the static + **master**, and then triggers `static-master-run` to push the + content to the mirrors. + + The sync happens to a new `<component>-updating.incoming-XXXXXX` + directory. On sync success, `<component>` is replaced with that new + tree, and the `static-master-run` trigger happens. + + LOCKING: exclusive locks are held on `<component>.lock` + +- `static-master-run` triggers all the mirrors for a component to + initiate syncs. + + When all mirrors have an up-to-date tree, they are + instructed to update the `cur` symlink to the new tree. + + To begin with, `static-master-run` copies `<component>` to + `<component>-current-push`. + + This is the tree all the mirrors then sync from. If the push was + successful, `<component>-current-push` is renamed to + `<component>-current-live`. + + LOCKING: exclusive locks are held on `<component>.lock` + +- `static-mirror-run` runs on a mirror and syncs components. + + There is a symlink called `cur` that points to either `tree-a` or + `tree-b` for each component. The `cur` tree is the one that is + live; the other one usually does not exist, except when a sync is + ongoing (or a previous one failed and we keep a partial tree).
+ + During a sync, we sync to the `tree-<X>` that is not the live one. + When instructed by `static-master-run`, we update the symlink and + remove the old tree. + + `static-mirror-run` rsyncs either `-current-push` or `-current-live` + for a component. + + LOCKING: during all of `static-mirror-run`, we keep an exclusive + lock on the `<component>` dir, i.e., the directory that holds + `tree-[ab]` and `cur`. + +- `static-mirror-run-all` + + Run `static-mirror-run` for all components on this mirror, fetching + the `-live-` tree. + + LOCKING: none, but see `static-mirror-run`. + +- `staticsync-ssh-wrap` + + Wrapper for SSH job dispatching on source, master, and mirror. + + LOCKING: on **master**, when syncing `-live-` trees, a shared lock + is held on `<component>.lock` during the rsync process. -TODO: make a diagram? +<!-- end of the copy --> + +The scripts are written in bash except `static-master-run`, which is written in +Python 2. + +### Authentication + +Authentication between the static site hosts is done entirely through +SSH. The source hosts are accessible by normal users, who can `sudo` +to a "role" user which has privileges to run the static sync scripts +as the sync user. That user then has privileges to contact the master +server which, in turn, can log in to the mirrors over SSH as well. + +The user's `sudo` configuration is therefore critical and that +`sudoers` configuration could also be considered part of the static +mirror system. -TODO: "audit" the static site mirror design as per https://bluesock.org/~willkg/blog/dev/auditing_projects.html ## Issues @@ -257,6 +403,8 @@ The IP addresses are replaced with: Logs are kept for two weeks. +Errors may be sent by email. + Metrics are scraped by [Prometheus](prometheus) using the "apache" exporter. @@ -281,6 +429,44 @@ The goal of this discussion section is to consider improvements to the static site mirror system at torproject.org. It might also apply to debian.org, but the focus is currently on TPO.
+The static site mirror system was designed for hosting Debian.org +content. Interestingly, it is not used for the operating system +mirrors themselves, which are synchronized using another, separate system +([archvsync](https://salsa.debian.org/mirror-team/archvsync/)). + +The maintenance status of the mirror code is unclear: while it is +still in use at Debian.org, it is made of a few sets of components +which are not bundled in a single package. This makes it hard to +follow "upstream", although, in theory, it should be possible to +follow the [dsa-puppet](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/) repository. In practice, that's pretty +difficult because the dsa-puppet and tor-puppet repositories have disconnected +histories. Even if they had a common ancestor, the code is +spread across multiple directories, which makes it hard to track. There +has been some refactoring to move most of the code into a `staticsync` +module, but we still have files strewn over other modules. + +The static mirror system was written for Debian.org by Peter +Palfrader. It has also been patched by other DSA members (Stephen +Gran and Julien Cristau both have more than 100 commits on the old +code base). + +This service is critical: it distributes the main torproject.org +websites, but also software releases like the tor project source code +and other websites. + +The static site system has no unit tests, linting, release process, or +CI. Code is deployed directly through Puppet, on the live servers. + +There hasn't been a security audit of the system, as far as we could +tell. + +Python 2 porting is probably the most pressing issue in this project: +the `static-master-run` program is written in old Python 2.4 +code. Thankfully it is fairly short and should be easy to port. + +The YAML configuration duplicates the YAML parsing and data structures +present in Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet).
+ ## Goals TODO: document requirements