TPA team issues
https://gitlab.torproject.org/tpo/tpa/team/-/issues (updated 2022-04-07T16:00:21Z)

establish the "standard" virtual machine / instance size in Ganeti
https://gitlab.torproject.org/tpo/tpa/team/-/issues/33786 (updated 2022-04-07T16:00:21Z, by anarcat)

Ganeti clusters can define parameters for minimum, maximum and "standard" instance sizes. This is currently:
```
# gnt-cluster info
[...]
Instance policy - limits for instances:
  bounds specs:
    - max/0:
        cpu-count: 8
        disk-count: 16
        disk-size: 1048576
        memory-size: 32768
        nic-count: 8
        spindle-use: 12
      min/0:
        cpu-count: 1
        disk-count: 1
        disk-size: 1024
        memory-size: 128
        nic-count: 1
        spindle-use: 1
  std:
    cpu-count: 1
    disk-count: 1
    disk-size: 1024
    memory-size: 128
    nic-count: 1
    spindle-use: 1
  allowed disk templates: drbd, plain
  vcpu-ratio: 4
  spindle-ratio: 32
[...]
```
We should at least define some sort of "standard" here, and decide what the minimum and maximum should be.
for what it's worth, the average memory size right now is around 5GB, counting only the instances whose memory is reported in gigabytes:
```
root@fsn-node-01:~# echo \($(gnt-instance list | awk '{ print $NF }' | grep G'$' | sed 's/G$/+/')0\) / $(gnt-instance list | awk '{ print $NF }' | grep G'$' | wc -l) | bc -l
4.80769230769230769230
```
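The one-liner above only counts instances whose memory column ends in `G`, so smaller instances listed in megabytes are left out of the average. A minimal Ruby sketch of the same calculation that also converts `M` and `T` suffixes follows; it assumes, like the one-liner, that memory is the last column of `gnt-instance list`, and the helper names are hypothetical.

```ruby
# Hypothetical helpers: average instance memory from `gnt-instance list` output.
# Assumption (same as the shell one-liner): memory is the last column and has a
# unit suffix, e.g. "512M", "4.0G" or "1.0T". Results are in GiB.

UNIT_TO_GIB = { 'M' => 1.0 / 1024, 'G' => 1.0, 'T' => 1024.0 }.freeze

# Convert a size like "4.0G" to GiB; returns nil for anything else
# (the header line, or a placeholder such as "-" for a stopped instance).
def size_to_gib(field)
  match = /\A(\d+(?:\.\d+)?)([MGT])\z/.match(field)
  return nil unless match

  match[1].to_f * UNIT_TO_GIB.fetch(match[2])
end

# Mean memory in GiB over every line whose last column parses, or nil if none do.
def average_memory_gib(listing)
  sizes = listing.each_line.filter_map { |line| size_to_gib(line.split.last.to_s) }
  return nil if sizes.empty?

  sizes.sum / sizes.size
end
```

It could be fed with something like `gnt-instance list | ruby -r ./avg_memory.rb -e 'p average_memory_gib(STDIN.read)'`, where `avg_memory.rb` is a hypothetical file holding the sketch.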
more investigation would be required to evaluate standard disk and CPU sizes.

DNS renumbering procedure fails if git server is unavailable
https://gitlab.torproject.org/tpo/tpa/team/-/issues/33766 (updated 2022-04-07T16:08:36Z, by anarcat)

it's unclear to me how to renumber a server that *only* has a record in LDAP (as opposed to something in `dns/domains.git` as well).
in legacy/trac#33730, the DNS records were not picked up by nevii, the authoritative nameserver, even after running `ud-generate` on alberti and `ud-replicate` on nevii. and indeed, those commands only populate `/var/lib/misc/` on nevii, which doesn't touch the full zone loaded by bind.
so this needs clarification and documentation and aaargh.

automate retirements
https://gitlab.torproject.org/tpo/tpa/team/-/issues/33477 (updated 2022-04-07T16:20:57Z, by anarcat)

we're removing a surprising number of servers: in the last few months, we've retired about half a dozen. that procedure is currently entirely manual and quite error-prone, especially because of errors in at(1) jobs or omitted steps. backup removal can be forgotten or typo'd, and we've forgotten to remove entries in the spreadsheet or in Nagios a few times.
we should automate this process. this also has the added benefit of simplifying the _migration_ process: we have a ton of servers that need to move from libvirt into the ganeti cluster, and part of that process involves retiring the old copy of the server.
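As a first step towards automation, the procedure could be modelled as an explicit checklist, so that an omitted step shows up as pending instead of being silently skipped. This is a hypothetical Ruby sketch: the step names only paraphrase the failure points mentioned above (at(1) jobs, backups, the spreadsheet, Nagios), and none of them perform any real action.

```ruby
# Hypothetical sketch: track a host retirement as an explicit checklist so an
# omitted step stays visible. Step names paraphrase the failure points listed
# above; nothing here actually touches backups, at(1), the spreadsheet or Nagios.

RetirementStep = Struct.new(:name, :done)

class Retirement
  STEPS = [
    'check scheduled at(1) jobs',
    'remove backups',
    'remove spreadsheet entry',
    'remove Nagios configuration'
  ].freeze

  attr_reader :host

  def initialize(host)
    @host = host
    @steps = STEPS.map { |name| RetirementStep.new(name, false) }
  end

  # Mark a step as done; an unknown (e.g. typo'd) step name raises an error.
  def complete(name)
    step = @steps.find { |s| s.name == name }
    raise ArgumentError, "unknown step: #{name}" unless step

    step.done = true
  end

  # Names of the steps not yet completed, in procedure order.
  def pending
    @steps.reject(&:done).map(&:name)
  end

  def finished?
    pending.empty?
  end
end
```

A real tool would attach an action to each step; the point of the sketch is only that "done" is tracked per step rather than remembered.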
right now the documentation is in the [retire-a-host](https://help.torproject.org/tsa/howto/retire-a-host/) page, but i somehow got into the habit of calling tickets `decomission host FOO`. so we need to decide between "retire" and "decommission" as a naming convention, before we start writing code.
the two contestants are:
* retire, retirement, retiring, retired
* decommission, decommissioning procedure, decommissioning, decommissioned
a quick IRC survey indicates people favor the former family because it's shorter and more familiar.
decommission was seen as a "nice word". it's also less ambiguous: "retire" can also refer to a user, and it could also imply the host sticks around for a while, rants about pain in his lower back and babbles nonsense when guests are around, just to embarrass us. the problem with decommission is that I can't spell it to save my life. it also doesn't have a "name" (a noun for the action, like "retirement"), so it's sometimes awkward to refer to.
so we favor converging on "retire" for now.

Assignee: anarcat

move root passwords to trocla?
https://gitlab.torproject.org/tpo/tpa/team/-/issues/33332 (updated 2022-04-07T15:58:50Z, by anarcat)

one manual step of our install process is to initialize the root password and set it in the password manager. that manual step could be completely skipped if we just set the root password in trocla.

investigate kreb's advice on DNS hijacking
https://gitlab.torproject.org/tpo/tpa/team/-/issues/33062 (updated 2022-06-03T23:47:50Z, by anarcat)

After reviewing [this article about recent DNS hijacking incidents](https://krebsonsecurity.com/2019/02/a-deep-dive-on-the-recent-widespread-dns-hijacking-attacks/), I think it might be worth reviewing the recommendations given in the article, which are basically:
1. [x] use DNSSEC
2. [ ] Use registration features like Registry Lock that can help protect domain names records from being changed
3. [ ] Use access control lists for applications, Internet traffic and monitoring
4. [ ] Use 2-factor authentication, and require it to be used by all relevant users and subcontractors
5. [x] In cases where passwords are used, pick unique passwords and consider password managers
6. [ ] Review accounts with registrars and other providers
7. [ ] Monitor certificates by monitoring, for example, Certificate Transparency Logs (#40677)
Some of those are impractical: for example 2FA will not work for us if we have one shared account with a provider.
Others have already been done: we have a good DNSSEC deployment and manage passwords properly.
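For the Certificate Transparency item, a Nagios-style check might boil down to comparing certificate issuers against an allowlist. The sketch below assumes certificate entries were already fetched from a CT search service and parsed into hashes with `issuer_name` and `name_value` keys (believed to match crt.sh's JSON output, but treat the field names as an assumption). It only implements the decision logic, following the usual Nagios plugin convention of exit code 0 for OK and 2 for CRITICAL.

```ruby
# Hypothetical sketch of the decision logic for a Nagios-style Certificate
# Transparency check. Assumption: entries were already fetched (e.g. from a CT
# search service) and parsed into hashes with "issuer_name" and "name_value".

NAGIOS_OK = 0
NAGIOS_CRITICAL = 2

# Returns [exit_code, message]. A certificate whose issuer does not match any
# allowlisted substring is reported as CRITICAL, as it may signal a hijack.
def check_ct_entries(entries, allowed_issuers)
  unexpected = entries.reject do |entry|
    issuer = entry.fetch('issuer_name', '').to_s
    allowed_issuers.any? { |allowed| issuer.include?(allowed) }
  end

  if unexpected.empty?
    [NAGIOS_OK, "OK: #{entries.size} certificate(s), all from expected issuers"]
  else
    # name_value may hold several names separated by newlines.
    names = unexpected.flat_map { |e| e['name_value'].to_s.split("\n") }.uniq
    [NAGIOS_CRITICAL, "CRITICAL: unexpected issuer for #{names.join(', ')}"]
  end
end
```

A wrapper would then print the message and `exit` with the returned code, which is all Nagios needs from a plugin.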
Mainly, I'm curious about investigating Registry Lock and CT log monitoring, the latter of which could maybe be added as a Nagios check.

deploy a puppet dashboard
https://gitlab.torproject.org/tpo/tpa/team/-/issues/31969 (updated 2023-11-23T21:08:17Z, by anarcat)

it would be useful to have a way to browse reports and facts in the cluster. there's a lot of information in PuppetDB that's only visible when you inspect the database, and it would help to have a way to browse it and diagnose issues with Puppet.

publish HTML documentation of our puppet source
https://gitlab.torproject.org/tpo/tpa/team/-/issues/31633 (updated 2022-04-06T20:54:12Z, by anarcat)

there are ways of generating HTML versions of Puppet source code, based on the docstrings littering the source code. i've done some tentative runs of this and it looks ... interesting. the utility of this is currently limited by the fact that only 35% of the source is documented, according to `puppet strings`, but I figured I would document what I've done so far.
Koumbit uses the following Rakefile to generate the docs for their monorepo:
```
#require 'bundler/gem_tasks'

task :default do
  # nothing to do by default
  puts('no action')
end

task :doc do
  require 'puppet-strings/tasks/generate'
  # This doesn't seem to really process node files, but
  # an exclude of manifests/ might be interesting.
  Rake::Task['strings:generate'].invoke(
    # This list of included files was taken from
    # https://github.com/puppetlabs/puppet-strings#generating-documentation-with-puppet-strings
    # and should correspond to what puppet-strings does by default, but spanned
    # over all of the code directories in the control repos.
    # It's possible that some directories might include .rb files that were not
    # specified. We'll have to fix this if we ever encounter such an issue.
    '**/manifests/**/*.pp **/functions/**/*.pp **/types/**/*.pp **/tasks/**/*.pp **/lib/**/*.rb',
    'false',   # debug
    'false',   # backtrace
    'markdown' # markup
  )
end

# Generate documentation only for manifests in site/
# This will help to verify if there's anything in our own code that's missing
# comments for documentation. The run will be faster and less noisy than when
# we generate everything.
# Note, though, that it will create an index only for things in site/
task :doc_site do
  require 'puppet-strings/tasks/generate'
  # This doesn't seem to really process node files, but
  # an exclude of manifests/ might be interesting.
  Rake::Task['strings:generate'].invoke(
    'site/**/*.pp site/**/*.rb',
    'false',   # debug
    'false',   # backtrace
    'markdown' # markup
  )
end

task :doc_clean do
  rm_rf 'doc'
end

task :doc_upload, [:ftp_host, :ftp_port, :ftp_user, :ftp_pass, :ftp_dir] do |_t, args|
  mirror = "mirror -R doc #{args[:ftp_dir]}; quit"
  # Echo the command with the password masked, so credentials don't leak into logs.
  puts "lftp -e \"#{mirror}\" -u #{args[:ftp_user]},******** -p #{args[:ftp_port]} #{args[:ftp_host]}"
  # Passing a list of arguments bypasses the shell, so nothing needs quoting.
  ok = system('lftp', '-e', mirror, '-u', "#{args[:ftp_user]},#{args[:ftp_pass]}",
              '-p', args[:ftp_port].to_s, args[:ftp_host].to_s)
  abort('lftp upload failed') unless ok
end
```
Notice the two separate tasks: `doc` covers every code directory, including the public `modules`, while `doc_site` covers only the private `site` code.

Ask holder of torproject.be to stop serving the zone
https://gitlab.torproject.org/tpo/tpa/team/-/issues/30672 (updated 2022-04-07T16:05:57Z, by Linus Nordberg <linus@torproject.org>)

We've asked the holder of torproject.be to stop serving the zone in ticket legacy/trac#27951.
Tracking progress here.

Ask holder of torproject.fr to stop serving the zone
https://gitlab.torproject.org/tpo/tpa/team/-/issues/30671 (updated 2022-04-07T16:05:55Z, by Linus Nordberg <linus@torproject.org>)

We've asked the holder of torproject.fr to stop serving the zone in legacy/trac#27951.
Tracking progress here.

improve inventory of hardware resources
https://gitlab.torproject.org/tpo/tpa/team/-/issues/30273 (updated 2023-11-09T15:38:05Z, by anarcat)

We currently have a few hosting providers and locations where we have "stuff":
* virtual machines
* colocated servers
* Raspberry Pi under a desk
* routers
* "cloud" things (like AWS)
* test machines
* etc
TPO machines are currently documented in LDAP, but they are also in Puppet, and there's a spreadsheet (which we want to replace with something else, probably a Grafana dashboard, in legacy/trac#29816). Many other things (like AWS) are not tracked formally anywhere that I am aware of.
So this project is about establishing a clearer process to keep such an inventory. It should at least cover the following, TPO-managed infrastructure:
* physical servers
* virtual machines on those physical servers *or* on other cloud providers
Ideally, we would also have a unified view of this for all machines paid for by TPI, regardless of the team.
Each machine should have documentation on:
* remote console access or control panel
* cost
* location
* responsible team
* purpose
* age and lifecycle (see parent legacy/trac#29304)
The last bit is of course related to another problem, which is lifecycle management (see parent ticket legacy/trac#29304).
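To make the per-machine fields listed above concrete, here is a hypothetical Ruby record holding them, with a helper that flags incomplete entries. The field names are made up for illustration; "age and lifecycle" is approximated by a commissioning date and an expiration date, and none of this reflects an existing LDAP schema.

```ruby
require 'date'

# Hypothetical inventory record with the per-machine fields listed above.
# Field names are illustrative only; "age and lifecycle" is approximated by a
# commissioning date and an expiration date.
InventoryEntry = Struct.new(
  :hostname, :console, :cost, :location, :team, :purpose,
  :commissioned_on, :expires_on,
  keyword_init: true
) do
  # Fields that are still blank, so incomplete entries can be flagged.
  def missing_fields
    members.select { |field| self[field].to_s.strip.empty? }
  end

  # True once the expiration date has passed; entries without one never expire.
  def expired?(today = Date.today)
    !expires_on.nil? && expires_on < today
  end
end
```

Whether such records end up in LDAP or in a new system, listing missing fields per host is a cheap way to measure how complete the inventory is.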
A lot of that stuff is currently in LDAP, and maybe it should just be added there. But I wonder if it would be useful to create another system (which might eventually supersede LDAP) that would be more flexible. If that ever happens, we would first need to thoroughly document how hosts are integrated into LDAP and so on, of course.

Milestone: cleanup and publish the sysadmin codebase

Host-alive checks (ping) on IPv6
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29409 (updated 2022-04-07T16:05:28Z, by Linus Nordberg <linus@torproject.org>)

Find another authoritative DNS provider
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29394 (updated 2022-04-07T16:00:27Z, by Linus Nordberg <linus@torproject.org>)

"Shop around and figure out prices"
cf https://trac.torproject.org/projects/tor/wiki/org/meetings/2019BrusselsAdminTeamMinutes#DNSproviders
Implement and deploy script for spamming people about account, group and host expiration
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29386 (updated 2022-05-03T17:45:37Z, by Linus Nordberg <linus@torproject.org>)

Assignee: weasel (Peter Palfrader)

Adapt LDAP scripts to honour expiration dates
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29385 (updated 2022-05-03T17:45:37Z, by Linus Nordberg <linus@torproject.org>)

Assignee: weasel (Peter Palfrader)

Add to LDAP, for each group, an expiration date
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29384 (updated 2022-05-03T17:45:37Z, by Linus Nordberg <linus@torproject.org>)

Assignee: weasel (Peter Palfrader)

Add to LDAP, for each user account, an expiration date
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29383 (updated 2022-05-03T17:45:37Z, by Linus Nordberg <linus@torproject.org>)

Assignee: weasel (Peter Palfrader)

Add to LDAP, for each host, expiration date and list of "stakeholders"
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29382 (updated 2022-05-03T17:45:37Z, by Linus Nordberg <linus@torproject.org>)

Add to LDAP expiration date and list of "stakeholders"
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29381 (updated 2022-05-03T17:45:37Z, by Linus Nordberg <linus@torproject.org>)

cf https://trac.torproject.org/projects/tor/wiki/org/meetings/2019BrusselsAdminTeamMinutes

Link to db.tpo from infrastructure page, for service expiration dates
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29380 (updated 2022-05-03T17:45:36Z, by Linus Nordberg <linus@torproject.org>)

Information about when a system expires will be present in LDAP.
Add appropriate links to db.tpo on https://trac.torproject.org/projects/tor/wiki/org/operations/Infrastructure, per service, making it easy to figure out when a service is about to expire.

Write a script to mail people informing them about expiration of their service
https://gitlab.torproject.org/tpo/tpa/team/-/issues/29306 (updated 2022-05-03T17:45:36Z, by Jens Kubieziel)

We agreed that machines should have an expiration date (see legacy/trac#29304). For this we need a script which looks at the expiration date, sends a mail to the people responsible for the service informing them about it, and suggests possible actions. weasel agreed to write it.

Assignee: weasel (Peter Palfrader)
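The core of such a script is deciding who to mail and when. A minimal Ruby sketch of that part follows: it builds, but does not send, reminder messages for hosts expiring within a configurable window (30 days here, an arbitrary default). The record types, their fields and the suggested actions in the mail body are hypothetical placeholders, since the actual LDAP attributes are not described here.

```ruby
require 'date'

# Hypothetical records: the real LDAP attribute names aren't described here.
ExpiringHost = Struct.new(:name, :expires_on, :stakeholders)
Notice = Struct.new(:to, :subject, :body)

# Build (but do not send) reminders for hosts that expire within `warn_days`
# days of `today`, including hosts that have already expired.
def expiration_notices(hosts, today: Date.today, warn_days: 30)
  hosts.filter_map do |host|
    next if host.expires_on.nil? || host.stakeholders.empty?

    days_left = (host.expires_on - today).to_i
    next if days_left > warn_days

    subject =
      if days_left.negative?
        "#{host.name} expired #{-days_left} day(s) ago"
      else
        "#{host.name} expires in #{days_left} day(s)"
      end
    # Placeholder wording: the actual suggested actions are still to be decided.
    body = <<~MAIL
      #{subject} (expiration date: #{host.expires_on.iso8601}).

      Possible actions: extend the expiration date, or ask for the host to be retired.
    MAIL
    Notice.new(host.stakeholders, subject, body)
  end
end
```

The resulting `Notice` objects could then be handed to whatever mail delivery the script ends up using; keeping the selection logic separate makes it easy to test without sending mail.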