Roll call: who's there and emergencies

gaba, hiro and anarcat on mumble, weasel (briefly) checked in on IRC.

No emergencies.

BTCPayServer hosting

team#33750 (closed)

We weren't receiving donations, so hiro set up this service on Lunanode because we were in a rush. We're still not receiving donations, but that's because of troubles with the wallet, which hiro will resolve out of band.

So this issue is about where we host this service: at Lunanode, or within TPA? The Lunanode server is already a virtual machine running Docker (and not a "pure container" setup), so we need to perform upgrades, create users, and so on in the virtual machine.
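
To make concrete what maintaining that VM involves, here is a minimal sketch of the recurring work, assuming a conventional docker-compose layout for BTCPayServer (the compose directory is hypothetical):

```python
#!/usr/bin/env python3
"""Minimal sketch of routine maintenance on the Docker VM: OS package
upgrades plus a refresh of the containerized service. The compose
directory is an assumption about how BTCPayServer is deployed."""
import subprocess

def run(cmd, cwd=None):
    # Fail loudly if any maintenance step breaks.
    subprocess.run(cmd, cwd=cwd, check=True)

# Keep the VM itself patched, as we would for any TPA host.
run(["apt-get", "update"])
run(["apt-get", "--yes", "upgrade"])

# Refresh the containerized service (hypothetical compose directory).
COMPOSE_DIR = "/srv/btcpayserver"
run(["docker-compose", "pull"], cwd=COMPOSE_DIR)
run(["docker-compose", "up", "-d"], cwd=COMPOSE_DIR)
```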

Let's host it, because we kind of already do anyway: it's just that only hiro has access for now.

Let's host this in a VM in the new Ganeti cluster at Cymru. If the performance is not good enough (because the spec mentions SSD, which we do not have at Cymru: we have SAS), make some room at Hetzner by migrating some other machines to Cymru and then create the VM at Hetzner.
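
Whether the SAS disks are "good enough" can be checked with a quick benchmark before committing. A minimal sketch using fio (which must be installed); the test file location, job parameters, and the idea of comparing the result against the spec's SSD numbers are our assumptions:

```python
#!/usr/bin/env python3
"""Minimal sketch: measure random-read IOPS on the Cymru SAS disks so we
can compare against the spec's SSD assumption. Requires fio; the test
file path and job parameters are illustrative."""
import json
import subprocess

def random_read_iops(testfile="/var/tmp/fio-test"):
    out = subprocess.run(
        ["fio", "--name=randread", "--rw=randread", "--bs=4k",
         "--size=1G", "--runtime=30", "--time_based",
         "--ioengine=libaio", "--direct=1",
         f"--filename={testfile}", "--output-format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    # fio's JSON output nests per-job stats under "jobs".
    return json.loads(out)["jobs"][0]["read"]["iops"]

if __name__ == "__main__":
    print(f"random 4k read: {random_read_iops():.0f} IOPS")
```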

hiro is lead on the next steps.

Tor browser build VM - review requirements

team#34122 (closed)

Brief discussion about the security implications of enabling user namespaces on a Debian server. By default, this is disabled in Debian because of concerns that the elevated privileges ("root" inside a namespace) can be leveraged to get root outside the namespace. In the Debian bug report discussing this, anarcat asked why exactly this was still disabled, and Ben Hutchings responded by giving a few examples of security issues that were mitigated by it.

But because, in our use case, the alternative is to give root directly, enabling user namespaces seems like a good mitigation. Worst case, our users get root access, but that's no worse than giving them root directly. So we are go on granting user namespace access.
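
Concretely, this is the Debian-specific `kernel.unprivileged_userns_clone` sysctl. A minimal sketch of checking and enabling it; the drop-in file name is our own invention, not an existing convention:

```python
#!/usr/bin/env python3
"""Minimal sketch: check and enable unprivileged user namespaces on a
Debian host. Assumes the Debian-specific kernel.unprivileged_userns_clone
sysctl; the drop-in file name is hypothetical."""
from pathlib import Path
import subprocess

SYSCTL = "kernel.unprivileged_userns_clone"
DROPIN = Path("/etc/sysctl.d/90-userns.conf")  # hypothetical file name

def userns_enabled():
    # /proc/sys mirrors sysctl keys, with dots mapped to path components.
    return Path("/proc/sys", *SYSCTL.split(".")).read_text().strip() == "1"

if not userns_enabled():
    # Persist the setting across reboots, then apply it immediately.
    DROPIN.write_text(f"{SYSCTL} = 1\n")
    subprocess.run(["sysctl", "--system"], check=True)
```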

The virtual machine will be created in the new Cymru cluster, assuming disk performance is satisfactory.

TPA-RFC-7: root access policy

https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-7-root

Anarcat presented the proposal draft as sent to the team on November 9th. A few questions remained in the draft:

  1. what is the process to allow/revoke access to the TPA team?
  2. are the new permissions (to grant limited sudo rights to some service admins) acceptable?

In other services, we use a vetting process: a sponsor that already has access should file the ticket for the person, the person doesn't request access. That is basically how it works for TPA as well. The revocation procedure was not directly discussed and still needs to be drafted.
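
To illustrate what the limited sudo rights in question could look like, here is a minimal sketch that renders a sudoers drop-in for one service admin and validates it with visudo before installing it; the user, the command, and the file name are all hypothetical:

```python
#!/usr/bin/env python3
"""Minimal sketch: install a sudoers drop-in granting a service admin one
specific command. The user, command, and file name are illustrative, not
the actual policy."""
import os
import shutil
import subprocess
import tempfile

RULE = "exampleadmin ALL = (root) NOPASSWD: /usr/sbin/service onionperf restart\n"
TARGET = "/etc/sudoers.d/90-service-admins"  # hypothetical drop-in name

with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write(RULE)
    path = tmp.name

# visudo -c -f checks syntax without touching the live configuration.
subprocess.run(["visudo", "-c", "-f", path], check=True)
shutil.move(path, TARGET)
os.chmod(TARGET, 0o440)  # the mode sudo expects for drop-ins
```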

It was noted that other teams run servers outside of TPA (karsten, phw and cohosh, for example) because of the current limitations, so other people might use those accesses as well. It will be worth talking with other stakeholders about this proposal to make sure it is attuned to the other teams' requirements. The current Prometheus issue is a good counter-example, where service admins do not actually require root on the servers (issue 40089).

Another example is the onionperf servers, which were set up elsewhere because they needed custom iptables rules. This might not require root, only iptables access, or at least special iptables rules configured by TPA.
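
As an illustration of that middle ground, here is a minimal sketch of TPA applying a fixed set of iptables rules on behalf of the service, so the service admins never need root themselves; the ports are made up:

```python
#!/usr/bin/env python3
"""Minimal sketch: apply a fixed, TPA-maintained set of iptables rules
idempotently. The ports are illustrative; the real rules would come from
the onionperf requirements."""
import subprocess

RULES = [
    ["INPUT", "-p", "tcp", "--dport", "443", "-j", "ACCEPT"],
    ["INPUT", "-p", "tcp", "--dport", "8443", "-j", "ACCEPT"],
]

for rule in RULES:
    # -C checks whether the rule already exists, keeping reruns idempotent.
    check = subprocess.run(["iptables", "-C"] + rule, capture_output=True)
    if check.returncode != 0:
        subprocess.run(["iptables", "-A"] + rule, check=True)
```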

In general, the spirit of the proposal is to give the TPA team more flexibility in what changes we allow on servers. We want to help teams host their servers with us, but that also comes with the understanding that we need the capacity (in terms of staff and hardware resources) to do so. This was agreed upon by the people present in the mumble meeting, so anarcat will finish the draft and propose it formally to the team later.

Roadmap review

Did not have time to review the team board.

anarcat ranted about people not updating their tickets and was (rightly) corrected that people are updating their tickets. So keep up the good work!

We noted that the top-level TPA board is not used for triage because it picks up too many tickets from outside the core TPA team that we cannot do anything about (e.g. the Outreachy stuff in the GitLab lobby).

Other discussions

Should we rotate triage responsibility bi-weekly or monthly?

Will be discussed on IRC, by email, or in a later meeting, as we ran out of time.

Next meeting

We should resume our normal schedule of meeting on the first Wednesday of the month, which brings us to December 2nd 2020 at 15:00 UTC, equivalent to: 07:00 US/Pacific, 10:00 US/Eastern, 16:00 Europe/Paris

Metrics of the month

  • hosts in Puppet: 78, LDAP: 81, Prometheus exporters: 132
  • number of apache servers monitored: 28, hits per second: 199
  • number of nginx servers: 2, hits per second: 2, hit ratio: 0.87
  • number of self-hosted nameservers: 6, mail servers: 12
  • pending upgrades: 36, reboots: 0
  • average load: 0.64, memory available: 1.43 TiB/2.02 TiB, running processes: 480
  • bytes sent: 243.83 MB/s, received: 138.97 MB/s
  • planned buster upgrades completion date: 2020-09-16
  • GitLab tickets: 126 issues including...
    • open: 1
    • icebox: 84
    • backlog: 32
    • next: 5
    • doing: 4
    • (closed: 2119)

Note that only two "stretch" machines remain and the "buster" upgrade is considered mostly complete: those two machines are the SVN and Trac servers which are both scheduled for retirement.

Upgrade prediction graph (which is becoming a "how many machines do we have" graph) still lives at https://help.torproject.org/tsa/howto/upgrades/

Now also available as the main Grafana dashboard. Head to https://grafana.torproject.org/, change the time period to 30 days, and wait a while for results to render.
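
Most of these numbers can also be pulled programmatically instead of read off a dashboard. A minimal sketch against the Prometheus HTTP API; the server URL and the query are assumptions about our setup, not documented endpoints:

```python
#!/usr/bin/env python3
"""Minimal sketch: count scraped exporter targets via the Prometheus HTTP
API. The server URL and query are assumptions, not documented endpoints."""
import json
import urllib.parse
import urllib.request

PROM = "https://prometheus.torproject.org"  # assumed address
QUERY = "count(up)"  # one "up" series per scraped exporter target

url = PROM + "/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# /api/v1/query returns an instant vector; value is [timestamp, "count"].
print("exporter targets:", data["data"]["result"][0]["value"][1])
```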