A "status" dashboard is a simple website that allows service admins to clearly and simply announce down times and recovery.
Note that this could be considered part of the documentation system, but it is documented separately.
Tutorial
Local development environment
To install the development environment for the status site, you should have a copy of the Hugo static site generator and the git repository:
sudo apt install hugo
git clone --recursive -b main git@git-rw.torproject.org:project/web/status-site
cd status-site
WARNING: the URL of the Git repository changed! It used to be hosted at GitLab, but is now hosted at Gitolite. The repository is mirrored to GitLab, but pushing there will not trigger build jobs.
Then you can start a local development server to preview the site with:
hugo serve --baseUrl=http://localhost/
firefox http://localhost:1313/
The content can also be built in the public/ directory with, simply:
hugo
Creating new issues
Issues are stored in content/issues/. You can create a new issue with hugo new, for example:
hugo new issues/2021-02-03-testing-cstate-again.md
This creates the file from a pre-filled template (called an archetype in Hugo) and puts it in content/issues/2021-02-03-testing-cstate-again.md.
If you do not have hugo installed locally, you can also copy the template directly (from themes/cstate/archetypes/default.md), or copy an existing issue and use it as a template.
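The copy described above can be sketched as a small shell helper. This is only an illustration: the function name new_issue_from_archetype is hypothetical and not part of the repository, and it assumes it is run from the top of the status-site checkout with the theme submodule present. The copied file may still contain template placeholders that hugo new would normally fill in, so review the front-matter by hand afterwards.

```shell
# Hypothetical helper, not part of the repository: create a new issue file
# by copying the cState archetype, for use when hugo is not installed.
# Assumes the current directory is the top of the status-site checkout.
new_issue_from_archetype() {
    name="$1"    # for example: 2021-02-03-testing-cstate-again
    archetype="themes/cstate/archetypes/default.md"
    target="content/issues/${name}.md"

    if [ -z "$name" ]; then
        echo "usage: new_issue_from_archetype NAME" >&2
        return 2
    fi
    if [ ! -f "$archetype" ]; then
        echo "archetype not found: $archetype (were submodules cloned?)" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    # Copy the template and print the path of the new file.
    cp "$archetype" "$target" && echo "$target"
}
```

For example, new_issue_from_archetype 2021-02-03-testing-cstate-again prints the path of the new file, which can then be edited, committed and pushed like any other issue.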
Otherwise the upstream guide on how to create issues is fairly thorough and should be followed.
In general, keep in mind that the date field is when the issue started, not when you posted the issue; see this feature request asking for an explicit "update" field.
Also note that you can add draft: true to the front-matter (the block on top) to keep the post from being published on the front page before it is ready.
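Before pushing, it can be handy to see which issues are still marked as drafts. The following is a minimal sketch, assuming YAML-style front-matter with a literal draft: true line as described above; the function name list_draft_issues is hypothetical. It matches the line anywhere in the file, so an issue whose body happens to contain that exact line would also be listed.

```shell
# Hypothetical helper: list issue files that contain a "draft: true" line.
# The directory defaults to content/issues, relative to the checkout.
list_draft_issues() {
    dir="${1:-content/issues}"
    # -l prints only the names of matching files; -E enables the extended
    # regular expression. Errors (for example, no .md files) are silenced.
    grep -lE '^draft:[[:space:]]*true[[:space:]]*$' "$dir"/*.md 2>/dev/null
    # grep exits non-zero when nothing matches; treat that as success.
    return 0
}
```

Running list_draft_issues from the top of the checkout prints one file name per draft, or nothing if every issue is ready to be published.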
Uploading site to the static mirror system
The status.torproject.org site currently lives in the static mirror system and uses the git-rw repository, which gets built via Jenkins once you push to the repository.
In other words, uploading the site is automated by continuous integration. So you simply need to commit and push:
git commit -a -myolo
git push
Note that only the webwml group has access to the repository for now.
You will see progress of the Jenkins jobs:
- hugo-website-status (build job)
- hugo-website-status-install (install job)
If all goes well, the changes should propagate to the mirrors within about 5 to 10 minutes, depending on how busy Jenkins is.
If the jobs did not trigger, make sure you are pushing to the Gitolite server (git-rw.torproject.org) and NOT the GitLab server, which is just a mirror and cannot currently trigger Jenkins jobs.
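One way to check where a clone pushes is to inspect its push URL. The sketch below is an illustration only: the function names is_gitolite_url and check_push_remote are hypothetical, and the check simply looks for the git-rw.torproject.org host name mentioned above in the URL reported by git remote get-url.

```shell
# Hypothetical helpers: warn when a remote does not push to Gitolite.

# Succeeds if the given URL mentions the Gitolite host.
is_gitolite_url() {
    case "$1" in
        *git-rw.torproject.org*) return 0 ;;
        *) return 1 ;;
    esac
}

# Looks up the push URL of a remote (default: origin) and reports on it.
# Must be run inside the git checkout.
check_push_remote() {
    remote="${1:-origin}"
    url=$(git remote get-url --push "$remote") || return 2
    if is_gitolite_url "$url"; then
        echo "ok: $remote pushes to $url"
    else
        echo "warning: $remote pushes to $url, which will not trigger Jenkins" >&2
        return 1
    fi
}
```

Calling check_push_remote inside the checkout before git push gives a quick confirmation that the push will reach the server that triggers the build jobs.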
Merge requests may also be issued from the mirror of the repository on GitLab:
https://gitlab.torproject.org/tpo/tpa/status-site
... but will need to be merged into the git-rw server by someone in the above group to take effect. More people have access to the GitLab repository and should therefore be able to collaborate there.
See also the disaster recovery options below.
Keep in mind that this is a public website. You might want to talk with the comms@ people before publishing big or sensitive announcements.
How-to
Changing categories
cState relies on "systems" which live inside a "category". For example, the "v3 onion services" are in the "Tor network" category. Those are defined in the config.yml file, and each issue (in content/issues) refers to one or more "systems" affected by it.
Theming
The logo lives in static/logo.png. Some colors are defined in config.yml; search for "Colors throughout cState".
Pager playbook
The only Nagios warning that can come out of this service is if the static synchronisation fails. See the static site system for more information on diagnosing those.
Disaster recovery
It should be possible to deploy the static website anywhere that supports plain HTML, assuming you have a copy of the git repository.
The instructions below assume you have a copy of the git repository. Make sure you follow the installation instructions to also clone the submodules! If the git repository is not available, you could start from scratch using the example repository as well.
From here on, it is assumed you have a copy of the git repository (or the example one).
Those procedures were not tested.
Manual deployment to the static mirror system
If git-rw is down, you can upload the public/ folder content under /srv/status.torproject.org/htdocs.
The canonical source for the static websites rotation is defined in Puppet (in modules/roles/misc/static-components.yaml) and is currently set to staticiforme.torproject.org. This rsync command should be enough:
rsync -rtP public/ anarcat@staticiforme.torproject.org:/srv/status.torproject.org/htdocs
NOTE: there is a copy of the git repository in /status-site as well. Ignore it: it's out of date but could be used to build the website in a pinch.
Then the new source material needs to be synchronized to the mirrors, with:
sudo -u torwww static-update-component status.torproject.org
This requires membership in the torwww group.
Don't forget to push the changes to the git repository, once that is available. It's important so that the next people can start from your changes:
git commit -a -myolo
git push
Netlify deployment
Upstream has instructions to deploy to Netlify, which, in our case, might be as simple as following this link and filling in those settings:
- Build command: hugo
- Publish directory: public
- Add one build environment variable:
  - Key: HUGO_VERSION
  - Value: 0.48 (or later)
Then, of course, DNS needs to be updated to point there.
GitLab pages deployment
A site could also be deployed on another GitLab server with "GitLab pages" enabled. For example, if the repository is pushed to https://gitlab.com/, the GitLab CI/CD system there will automatically pick it up and publish it.
Then DNS needs to be tweaked to point there as well.
Reference
The status.tpo dashboard is built with cstate, which is a theme for the Hugo static site generator.
Installation
See the instructions on how to set up a local development environment and the design section for more information on how this is set up.
SLA
This service should be highly available. It should tolerate the failure of one or all points of presence: if all fail, it should be easy to deploy it to a third-party provider.
Design
The status site is part of the static mirror system and is built with Jenkins jobs, from a git repository on the git server. It was set up this way because that is how every other static website is currently built.
This involved:
- a new static component owned by torwww (in the tor-puppet.git repository)
- a new build script in the jenkins/tools.git repository
- a new build job in the jenkins/jobs.git repository
- a new entry in the ssh wrapper in the admin/static-builds.git repository
- a new gitolite repository with hooks to ping the Jenkins server and mirror to GitLab
We also considered using GitLab CI for deployment but (a) GitLab pages is not yet set up and (b) it doesn't integrate well with the static mirror system for now. See the broader discussion of the static site system improvements.
Issues
There is no issue tracker specifically for this project. File or search for issues in the team issue tracker.
Upstream issues can be found and filed in the GitHub issue tracker.
Monitoring and testing
The site, like other static mirrors, is monitored by Nagios with the dsa_check_staticsync check, which ensures all mirrors are up to date.
Logs and metrics
There are no logs or metrics specific to this service; see the static site service for details.
Backups
Does not need special backups: backed up as part of the regular static site and git services.
Other documentation
- cState home page
- demo site
- cState wiki, see in particular the usage and configuration guides
Discussion
Overview
This project comes from two places:
- during the 2020 TPA user survey, some respondents suggested documenting "down times of 1h or longer" and communicating better about service statuses
- separately, following a major outage in the Tor network due to a DDoS, the network team and network health teams asked for a dashboard to inform Tor users about such problems in the future
This is therefore a project spanning multiple teams, with different stakeholders. The general idea is to have a site (say status.torproject.org) that simply shows users how things are going, in an easy to understand form.
Goals
In general, the goal is to provide a simple interface to provide users with status updates.
Must have
- user-friendly: the public website must be easy to understand by the wider Tor community of users (not just TPI/TPA)
- status updates and progress: "post status problem we know about so the world can learn if problems are known to the Tor team."
  - example: "[recent] v3 outage where we could have put out a small FAQ right away (go static HTML!) and then update the world as we figure out the problem but also expected return to normal."
- multi-stakeholder: "easily editable by many of us namely likely the network health team and we could also have the network team to help out"
- simple to deploy and use: pushing an update shouldn't require complex software or procedures. editing a text file, committing and pushing, or building with a single command and pushing the HTML, for example, is simple enough. installing a MySQL database and PHP server, for example, is not simple enough.
- keep it simple
- free-software based
Nice to have
- deployment through GitLab (pages?), with contingency plans
- separate TLD to thwart DNS-based attacks against torproject.org
- same tool for multiple teams
- per-team filtering
- RSS feeds
- integration with social media?
- responsive design
Non-Goals
- automation: updating the site is a manual process. no automatic reports of sensors/metrics or Nagios, as this tends to complicate the implementation and cause false positives
Approvals required
TPA, network team, network health team.