Roll call: who's there and emergencies

gaba, hiro and anarcat on mumble, weasel (briefly) checked in on IRC.

No emergencies.

BTCPayServer hosting

team#33750 (closed)

We weren't receiving donations, so hiro set up this service on Lunanode because we were in a rush. We're still not receiving donations, but that's because of troubles with the wallet, which hiro will resolve out of band.

So this issue is about where we host this service: at Lunanode, or within TPA? The Lunanode server is already a virtual machine running Docker (and not a "pure container" setup), so we need to perform upgrades, create users, and so on in the virtual machine.
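
To make concrete what maintaining that VM involves, here is a minimal sketch of the recurring work, assuming a conventional docker-compose layout for BTCPayServer (the compose directory is hypothetical):

```python
#!/usr/bin/env python3
"""Minimal sketch of routine maintenance on the Docker VM: OS package
upgrades plus a refresh of the containerized service. The compose
directory is an assumption about how BTCPayServer is deployed."""
import subprocess

def run(cmd, cwd=None):
    # Fail loudly if any maintenance step breaks.
    subprocess.run(cmd, cwd=cwd, check=True)

# Keep the VM itself patched, as we would for any TPA host.
run(["apt-get", "update"])
run(["apt-get", "--yes", "upgrade"])

# Refresh the containerized service (hypothetical compose directory).
COMPOSE_DIR = "/srv/btcpayserver"
run(["docker-compose", "pull"], cwd=COMPOSE_DIR)
run(["docker-compose", "up", "-d"], cwd=COMPOSE_DIR)
```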

Let's host it, because we kind of already do anyway: it's just that only hiro has access for now.

Let's host this in a VM in the new Ganeti cluster at Cymru. If the performance is not good enough (because the spec mentions SSD, which we do not have at Cymru: we have SAS), make some room at Hetzner by migrating some other machines to Cymru and then create the VM at Hetzner.
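
Whether the SAS disks are "good enough" can be checked with a quick benchmark before committing. A minimal sketch using fio (which must be installed); the test file location, job parameters, and the idea of comparing the result against the spec's SSD numbers are our assumptions:

```python
#!/usr/bin/env python3
"""Minimal sketch: measure random-read IOPS on the Cymru SAS disks so we
can compare against the spec's SSD assumption. Requires fio; the test
file path and job parameters are illustrative."""
import json
import subprocess

def random_read_iops(testfile="/var/tmp/fio-test"):
    out = subprocess.run(
        ["fio", "--name=randread", "--rw=randread", "--bs=4k",
         "--size=1G", "--runtime=30", "--time_based",
         "--ioengine=libaio", "--direct=1",
         f"--filename={testfile}", "--output-format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    # fio's JSON output nests per-job stats under "jobs".
    return json.loads(out)["jobs"][0]["read"]["iops"]

if __name__ == "__main__":
    print(f"random 4k read: {random_read_iops():.0f} IOPS")
```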

hiro is lead on the next steps.

Tor browser build VM - review requirements

team#34122 (closed)

Brief discussion about the security implications of enabling user namespaces on a Debian server. By default, this is disabled in Debian because of concerns that the elevated privileges ("root" inside a namespace) can be leveraged to get root outside the namespace. In the Debian bug report discussing this, anarcat asked why exactly this was still disabled, and Ben Hutchings responded by giving a few examples of security issues that were mitigated by it.

But because, in our use case, the alternative is to give root directly, enabling user namespaces seems like a good mitigation. Worst case, our users get root access, but that's no worse than giving them root directly. So we are go on granting user namespace access.
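
Concretely, this is the Debian-specific `kernel.unprivileged_userns_clone` sysctl. A minimal sketch of checking and enabling it; the drop-in file name is our own invention, not an existing convention:

```python
#!/usr/bin/env python3
"""Minimal sketch: check and enable unprivileged user namespaces on a
Debian host. Assumes the Debian-specific kernel.unprivileged_userns_clone
sysctl; the drop-in file name is hypothetical."""
from pathlib import Path
import subprocess

SYSCTL = "kernel.unprivileged_userns_clone"
DROPIN = Path("/etc/sysctl.d/90-userns.conf")  # hypothetical file name

def userns_enabled():
    # /proc/sys mirrors sysctl keys, with dots mapped to path components.
    return Path("/proc/sys", *SYSCTL.split(".")).read_text().strip() == "1"

if not userns_enabled():
    # Persist the setting across reboots, then apply it immediately.
    DROPIN.write_text(f"{SYSCTL} = 1\n")
    subprocess.run(["sysctl", "--system"], check=True)
```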

The virtual machine will be created in the new Cymru cluster, assuming disk performance is satisfactory.

TPA-RFC-7: root access policy

https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-7-root

Anarcat presented the proposal draft as sent to the team on November 9th. A few questions remained in the draft:

  1. what is the process to allow/revoke access to the TPA team?
  2. are the new permissions (to grant limited sudo rights to some service admins) acceptable?

In other services, we use a vetting process: a sponsor that already has access should file the ticket for the person, the person doesn't request access. That is basically how it works for TPA as well. The revocation procedure was not directly discussed and still needs to be drafted.
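
To illustrate what the limited sudo rights in question could look like, here is a minimal sketch that renders a sudoers drop-in for one service admin and validates it with visudo before installing it; the user, the command, and the file name are all hypothetical:

```python
#!/usr/bin/env python3
"""Minimal sketch: install a sudoers drop-in granting a service admin one
specific command. The user, command, and file name are illustrative, not
the actual policy."""
import os
import shutil
import subprocess
import tempfile

RULE = "exampleadmin ALL = (root) NOPASSWD: /usr/sbin/service onionperf restart\n"
TARGET = "/etc/sudoers.d/90-service-admins"  # hypothetical drop-in name

with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write(RULE)
    path = tmp.name

# visudo -c -f checks syntax without touching the live configuration.
subprocess.run(["visudo", "-c", "-f", path], check=True)
shutil.move(path, TARGET)
os.chmod(TARGET, 0o440)  # the mode sudo expects for drop-ins
```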

It was noted that other teams run servers outside of TPA (karsten, phw and cohosh, for example) because of the current limitations, so other people might use those accesses as well. It will be worth talking with other stakeholders about this proposal to make sure it is attuned to the other teams' requirements. The current Prometheus issue is a good counter-example, where service admins do not actually require root on the servers (issue 40089).

Another example is the onionperf servers, which were set up elsewhere because they needed custom iptables rules. This might not require root, only iptables access, or at least special iptables rules configured by TPA.
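
As an illustration of that middle ground, here is a minimal sketch of TPA applying a fixed set of iptables rules on behalf of the service, so the service admins never need root themselves; the ports are made up:

```python
#!/usr/bin/env python3
"""Minimal sketch: apply a fixed, TPA-maintained set of iptables rules
idempotently. The ports are illustrative; the real rules would come from
the onionperf requirements."""
import subprocess

RULES = [
    ["INPUT", "-p", "tcp", "--dport", "443", "-j", "ACCEPT"],
    ["INPUT", "-p", "tcp", "--dport", "8443", "-j", "ACCEPT"],
]

for rule in RULES:
    # -C checks whether the rule already exists, keeping reruns idempotent.
    check = subprocess.run(["iptables", "-C"] + rule, capture_output=True)
    if check.returncode != 0:
        subprocess.run(["iptables", "-A"] + rule, check=True)
```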

In general, the spirit of the proposal is to give the TPA team more flexibility in what changes we allow on servers. We want to help teams host their servers with us, but that also comes with the understanding that we need the capacity (in terms of staff and hardware resources) to do so. This was agreed upon by the people present in the mumble meeting, so anarcat will finish the draft and propose it formally to the team later.

Roadmap review

Did not have time to review the team board.

anarcat ranted about people not updating their tickets and was (rightly) corrected that people are updating their tickets. So keep up the good work!

We noted that the top-level TPA board is not used for triage because it picks up too many tickets from outside the core TPA team that we cannot do anything about (e.g. the Outreachy stuff in the GitLab lobby).

Other discussions

Should we rotate triage responsibility bi-weekly or monthly?

Will be discussed on IRC, by email, or in a later meeting, as we ran out of time.

Next meeting

We should resume our normal schedule of meeting on the first Wednesday of the month, which brings us to December 2nd 2020 at 15:00 UTC, equivalent to: 07:00 US/Pacific, 10:00 US/Eastern, 16:00 Europe/Paris

Metrics of the month

  • hosts in Puppet: 78, LDAP: 81, Prometheus exporters: 132
  • number of apache servers monitored: 28, hits per second: 199
  • number of nginx servers: 2, hits per second: 2, hit ratio: 0.87
  • number of self-hosted nameservers: 6, mail servers: 12
  • pending upgrades: 36, reboots: 0
  • average load: 0.64, memory available: 1.43 TiB/2.02 TiB, running processes: 480
  • bytes sent: 243.83 MB/s, received: 138.97 MB/s
  • planned buster upgrades completion date: 2020-09-16
  • GitLab tickets: 126 issues including...
    • open: 1
    • icebox: 84
    • backlog: 32
    • next: 5
    • doing: 4
    • (closed: 2119)

Note that only two "stretch" machines remain and the "buster" upgrade is considered mostly complete: those two machines are the SVN and Trac servers which are both scheduled for retirement.

Upgrade prediction graph (which is becoming a "how many machines do we have" graph) still lives at https://help.torproject.org/tsa/howto/upgrades/

Now also available as the main Grafana dashboard. Head to https://grafana.torproject.org/, change the time period to 30 days, and wait a while for results to render.
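
Most of these numbers can also be pulled programmatically instead of read off a dashboard. A minimal sketch against the Prometheus HTTP API; the server URL and the query are assumptions about our setup, not documented endpoints:

```python
#!/usr/bin/env python3
"""Minimal sketch: count scraped exporter targets via the Prometheus HTTP
API. The server URL and query are assumptions, not documented endpoints."""
import json
import urllib.parse
import urllib.request

PROM = "https://prometheus.torproject.org"  # assumed address
QUERY = "count(up)"  # one "up" series per scraped exporter target

url = PROM + "/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# /api/v1/query returns an instant vector; value is [timestamp, "count"].
print("exporter targets:", data["data"]["result"][0]["value"][1])
```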