
A "status" dashboard is a simple website that allows service admins to clearly and simply announce down times and recovery.

Note that this could be considered part of the documentation system, but is documented separately.

Tutorial

Local development environment

To install the development environment for the status site, you should have a copy of the Hugo static site generator and the git repository:

sudo apt install hugo
git clone --recursive -b main git@git-rw.torproject.org:project/web/status-site
cd status-site

WARNING: the URL of the Git repository changed! It used to be hosted at GitLab, but is now hosted at Gitolite. The repository is mirrored to GitLab, but pushing there will not trigger build jobs.
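If the repository was cloned without --recursive, the cState theme in themes/cstate/ is empty and Hugo cannot build the site. The following is a minimal sketch of a pre-build check, assuming bash and that the theme's archetypes/default.md is a good marker for a complete checkout. ensure_cstate_theme is a hypothetical name, not something shipped in the repository:

```shell
# Hypothetical helper (not part of the status-site repository):
# make sure the cState theme submodule is checked out before building.
ensure_cstate_theme() {
  local marker="themes/cstate/archetypes/default.md"
  if [ ! -f "$marker" ]; then
    # Same effect as having cloned with --recursive in the first place.
    git submodule update --init --recursive || return 1
  fi
  # Still missing? Then the checkout is not what we expect.
  if [ ! -f "$marker" ]; then
    echo "cState theme not found in themes/cstate/" >&2
    return 1
  fi
}
```

Run it from the top of the checkout, before hugo serve or hugo.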

Then you can start a local development server to preview the site with:

hugo serve --baseURL=http://localhost/
firefox http://localhost:1313/

The content can also be built into the public/ directory with:

hugo

Creating new issues

Issues are stored in content/issues/. You can create a new issue with hugo new, for example:

hugo new issues/2021-02-03-testing-cstate-again.md

This creates the file from a pre-filled template (called an archetype in Hugo) and puts it in content/issues/2021-02-03-testing-cstate-again.md.

If you do not have hugo installed locally, you can also copy the template directly (from themes/cstate/archetypes/default.md), or copy an existing issue and use it as a template.
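Without Hugo, the skeleton can also be written by hand. A minimal sketch in bash: the function name new_issue is hypothetical, and the front-matter fields (title, date, resolved, severity, affected, section) are assumptions based on cState's usual issue format, so compare them with themes/cstate/archetypes/default.md before relying on them:

```shell
# Hypothetical helper: create a draft issue without Hugo installed.
# The front-matter fields are an assumption based on cState's usual
# archetype; compare with themes/cstate/archetypes/default.md.
new_issue() {
  local slug="${1:?usage: new_issue SLUG [START_TIME]}"
  # The date is when the incident started, not when it is posted.
  local started="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  # File name follows the YYYY-MM-DD-slug.md convention used above.
  local file="content/issues/${started:0:10}-${slug}.md"

  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  mkdir -p content/issues
  cat > "$file" <<EOF
---
title: "${slug//-/ }"
date: ${started}
resolved: false
severity: notice
# list the affected systems, using names from config.yml
affected: []
section: issue
draft: true
---

Describe the problem here.
EOF
  echo "$file"
}
```

For example, new_issue testing-cstate-again 2021-02-03T10:00:00Z creates content/issues/2021-02-03-testing-cstate-again.md, the same path as the hugo new example above. The issue starts as a draft; remove that line once it is ready to publish.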

Beyond that, the upstream guide on how to create issues is fairly thorough and should be followed.

In general, keep in mind that the date field is when the issue started, not when you posted it; see this feature request asking for an explicit "update" field.

Also note that you can add draft: true to the front-matter (the block on top) to keep the post from being published on the front page before it is ready.
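Before pushing, it can be useful to check which issues are still drafts, since hugo leaves them out of the build unless run with -D (--buildDrafts). A small sketch, assuming the front matter is YAML with draft: true on its own line; list_draft_issues is a hypothetical name:

```shell
# Hypothetical helper: list issues still marked "draft: true", i.e.
# the ones hugo skips when building the public site.
list_draft_issues() {
  grep -l -E '^draft:[[:space:]]*true[[:space:]]*$' content/issues/*.md 2>/dev/null
}
```

To preview drafts locally anyway, hugo serve -D renders them on the development server.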

Uploading site to the static mirror system

The status.torproject.org site currently lives in the static mirror system and is built from the git-rw repository by Jenkins whenever you push to it.

In other words, uploading the site is automated by continuous integration. So you simply need to commit and push:

git commit -a -myolo
git push

Note that only the webwml group has access to the repository for now.

You can follow the progress of the Jenkins jobs.

If all goes well, the changes should propagate to the mirrors within about 5 to 10 minutes, depending on how busy Jenkins is.

If the jobs did not trigger, make sure you are pushing to the Gitolite server (git-rw.torproject.org) and NOT the GitLab server, which is just a mirror and cannot currently trigger Jenkins jobs.
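A quick way to check where a push will go before running git push. This sketch assumes bash and a remote called origin by default; check_push_remote is a hypothetical helper that only inspects the configured push URL:

```shell
# Hypothetical helper: check that a remote pushes to the Gitolite server
# (git-rw.torproject.org), since pushes to the GitLab mirror do not
# trigger the Jenkins build.
check_push_remote() {
  local remote="${1:-origin}" url
  url=$(git remote get-url --push "$remote") || return 1
  case "$url" in
    *git-rw.torproject.org*)
      echo "ok: $remote pushes to $url"
      ;;
    *)
      echo "warning: $remote pushes to $url, not git-rw.torproject.org" >&2
      return 1
      ;;
  esac
}
```

If it warns, git remote set-url origin git@git-rw.torproject.org:project/web/status-site points the remote back at Gitolite.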

Merge requests may also be issued from the mirror of the repository on GitLab:

https://gitlab.torproject.org/tpo/tpa/status-site

... but will need to be merged into the git-rw server by someone in the above group to take effect. More people have access to the GitLab repository and should therefore be able to collaborate there.
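Someone with access to git-rw can pull a GitLab merge request into a local checkout and push it from there. A sketch, assuming bash and a second remote named gitlab pointing at the mirror; GitLab exposes the head of each merge request under refs/merge-requests/<iid>/head, and fetch_gitlab_mr is a hypothetical name:

```shell
# Hypothetical helper: fetch the head of a GitLab merge request into a
# local branch named mr-<iid>, so it can be reviewed and merged locally.
fetch_gitlab_mr() {
  local iid="${1:?usage: fetch_gitlab_mr IID [REMOTE]}"
  local remote="${2:-gitlab}"
  git fetch "$remote" "refs/merge-requests/${iid}/head:mr-${iid}"
}
```

For example, after git remote add gitlab https://gitlab.torproject.org/tpo/tpa/status-site.git, running fetch_gitlab_mr 12 (12 being a placeholder number) creates a local mr-12 branch that can be reviewed, merged with git merge mr-12, and pushed to git-rw as usual.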

See also the disaster recovery options below.

Keep in mind that this is a public website. You might want to talk with the comms@ people before publishing big or sensitive announcements.

How-to

Changing categories

cState relies on "systems" which live inside a "category". For example, the "v3 onion services" system is in the "Tor network" category. Both are defined in the config.yml file, and each issue (in content/issues) refers to one or more affected "systems".
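Since issues refer to systems by name (assuming the names are listed in each issue's affected front matter), renaming or removing a system in config.yml can leave older issues pointing at a name that no longer exists. A rough sketch to find those issues first; issues_for_system is hypothetical, and it does a plain text search over the whole file, so treat the hits as candidates:

```shell
# Hypothetical helper: list issue files that mention a system name.
# Plain fixed-string search over front matter and body alike.
issues_for_system() {
  local system="${1:?usage: issues_for_system NAME}"
  grep -l -F -- "$system" content/issues/*.md
}
```

For example, issues_for_system "v3 onion services" lists the issues to review before renaming that system.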

Theming

The logo lives in static/logo.png. Some colors are defined in config.yml; search that file for "Colors throughout cState".
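To find those settings without reading the whole file, a case-insensitive search with some context is enough. A trivial sketch; show_color_settings is a hypothetical name and the eight lines of context are an arbitrary choice:

```shell
# Hypothetical helper: show color-related lines of config.yml (or another
# file given as argument) with line numbers and a few lines of context.
show_color_settings() {
  grep -n -i -A 8 -- 'colors' "${1:-config.yml}"
}
```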

Pager playbook

The only Nagios warning that can come out of this service is if the static synchronisation fails. See the static site system for more information on diagnosing those.

Disaster recovery

It should be possible to deploy the static website anywhere that supports plain HTML, assuming you have a copy of the git repository. Make sure you follow the installation instructions to also clone the submodules! If the git repository is not available, you could start from scratch using the example repository instead; the procedures below apply to either copy.

These procedures have not been tested.

Manual deployment to the static mirror system

If git-rw is down, you can upload the contents of the public/ folder to /srv/status.torproject.org/htdocs.

The canonical source for the static websites rotation is defined in Puppet (in modules/roles/misc/static-components.yaml) and is currently set to staticiforme.torproject.org. This rsync command should be enough:

rsync -rtP public/ anarcat@staticiforme.torproject.org:/srv/status.torproject.org/htdocs

NOTE: there is a copy of the git repository in /status-site as well. Ignore it: it's out of date but could be used to build the website in a pinch.

Then the new source material needs to be synchronized to the mirrors, with:

sudo -u torwww static-update-component status.torproject.org

This requires membership in the torwww group.
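The manual steps above can be chained in one helper. A minimal sketch, assuming bash, that you can SSH to staticiforme.torproject.org with your own username, and that static-update-component is run there through ssh -t so sudo can prompt; the function name deploy_status_site and the DRY_RUN switch are hypothetical:

```shell
# Hypothetical helper: build locally, upload to the static master and
# trigger the mirror sync. DRY_RUN=1 prints the commands instead.
deploy_status_site() {
  local user="${1:?usage: deploy_status_site USER}"
  local host="staticiforme.torproject.org"
  local dest="/srv/status.torproject.org/htdocs"
  local run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then
    run="echo"
  fi

  # 1. Build the site into public/.
  $run hugo || return 1
  # 2. Upload the built files to the canonical source for the rotation.
  $run rsync -rtP public/ "${user}@${host}:${dest}" || return 1
  # 3. Push the new content out to the mirrors (needs torwww membership).
  $run ssh -t "${user}@${host}" \
    sudo -u torwww static-update-component status.torproject.org
}
```

Running DRY_RUN=1 deploy_status_site alice first (alice being a placeholder username) prints the three commands without running them.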

Don't forget to push the changes to the git repository once it is available again. This is important so that the next person can start from your changes:

git commit -a -myolo
git push

Netlify deployment

Upstream has instructions to deploy to Netlify, which, in our case, might be as simple as following this link and filling in those settings:

  • Build command: hugo
  • Publish directory: public
  • Add one build environment variable
    • Key: HUGO_VERSION
    • Value: 0.48 (or later)
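Instead of filling in the settings through the web interface, Netlify can also read them from a netlify.toml file at the root of the repository. A sketch that writes one with the settings above; write_netlify_toml is a hypothetical name, and HUGO_VERSION defaults to the minimum listed but should probably be raised to whatever version the site is actually built with:

```shell
# Hypothetical helper: write a netlify.toml carrying the build settings
# listed above (build command, publish directory, HUGO_VERSION).
write_netlify_toml() {
  local hugo_version="${1:-0.48}"
  cat > netlify.toml <<EOF
[build]
  command = "hugo"
  publish = "public"

[build.environment]
  HUGO_VERSION = "${hugo_version}"
EOF
}
```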

Then, of course, DNS needs to be updated to point there.

GitLab pages deployment

A site could also be deployed on another GitLab server with "GitLab pages" enabled. For example, if the repository is pushed to https://gitlab.com/, the GitLab CI/CD system there will automatically pick it up and publish it.

Then DNS needs to be tweaked to point there as well.
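By default, GitLab pages publishes the public/ artifact of a CI job named pages, so unless the repository already has one, a .gitlab-ci.yml is needed for that automatic pickup. A sketch that writes a minimal one; write_gitlab_pages_ci is a hypothetical name, the container image is an assumption (any image with a hugo binary recent enough for the theme should do), and GIT_SUBMODULE_STRATEGY makes the runner fetch the cState theme submodule:

```shell
# Hypothetical helper: write a minimal .gitlab-ci.yml for GitLab pages.
# The job must be called "pages" and publish the public/ directory.
write_gitlab_pages_ci() {
  local image="${1:-registry.gitlab.com/pages/hugo:latest}"
  cat > .gitlab-ci.yml <<EOF
pages:
  image: ${image}
  variables:
    # fetch the cState theme, like "git clone --recursive"
    GIT_SUBMODULE_STRATEGY: recursive
  script:
    - hugo
  artifacts:
    paths:
      - public
EOF
}
```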

Reference

The status.tpo dashboard is built with cState, which is a theme for the Hugo static site generator.

Installation

See the instructions on how to set up a local development environment, and the design section for more information on how this is set up.

SLA

This service should be highly available. It should survive the failure of one or all points of presence: if all fail, it should be easy to deploy it to a third-party provider.

Design

The status site is part of the static mirror system and is built by Jenkins jobs from a git repository on the git server. It was set up this way because that is how every other static website is currently built.

This involved:

We also considered using GitLab CI for deployment, but (a) GitLab pages is not set up yet and (b) it doesn't integrate well with the static mirror system for now. See the broader discussion of the static site system improvements.

Issues

There is no issue tracker specifically for this project; file or search for issues in the team issue tracker.

Upstream issues can be found and filed in the GitHub issue tracker.

Monitoring and testing

The site, like other static mirrors, is monitored by Nagios with the dsa_check_staticsync check, which ensures all mirrors are up to date.

Logs and metrics

There are no logs or metrics specific to this service; see the static site service for details.

Backups

This service does not need special backups: it is backed up as part of the regular static site and git services.

Other documentation

Discussion

Overview

This project comes from two places:

  1. during the 2020 TPA user survey, some respondents suggested documenting "down times of 1h or longer" and communicating better about service statuses

  2. separately, following a major outage in the Tor network due to a DDoS, the network and network health teams asked for a dashboard to inform Tor users about such problems in the future

This is therefore a project spanning multiple teams, with different stakeholders. The general idea is to have a site (say status.torproject.org) that simply shows users how things are going, in an easy-to-understand form.

Goals

In general, the goal is to provide a simple interface that gives users status updates.

Must have

  • user-friendly: the public website must be easy to understand for the wider Tor community of users (not just TPI/TPA)
  • status updates and progress: "post status problem we know about so the world can learn if problems are known to the Tor team."
    • example: "[recent] v3 outage where we could have put out a small FAQ right away (go static HTML!) and then update the world as we figure out the problem but also expected return to normal."
  • multi-stakeholder: "easily editable by many of us namely likely the network health team and we could also have the network team to help out"
  • simple to deploy and use: pushing an update shouldn't require complex software or procedures. Editing a text file, committing and pushing, or building with a single command and pushing the HTML, for example, is simple enough. Installing a MySQL database and PHP server, for example, is not simple enough.
  • keep it simple
  • free-software based

Nice to have

  • deployment through GitLab (pages?), with contingency plans
  • separate TLD to thwart DNS-based attacks against torproject.org
  • same tool for multiple teams
  • per-team filtering
  • RSS feeds
  • integration with social media?
  • responsive design

Non-Goals

  • automation: updating the site is a manual process. There are no automatic reports from sensors, metrics or Nagios, as these tend to complicate the implementation and cause false positives

Approvals required

TPA, network team, network health team.