About us
The sysadmin team is responsible for managing machines under the torproject.org domain. It does not operate the Tor network in any form, nor is it responsible for all services running on torproject.org: that is the job of the various service admins responsible for those services.
Most of the documentation of the sysadmin team is in a different wiki for now.
Roadmap
This page documents a possible roadmap for the TPA team for the year 2020.
Items should be SMART, that is:
- specific
- measurable
- achievable
- relevant
- time-bound
Main objectives (need to have):
- decommissioning of old machines (moly in particular)
- move critical services into ganeti
- buster upgrades before stretch enters LTS
- within budget
Secondary objectives (nice to have):
- new mail service
- conversion of the kvm* fleet to ganeti for higher reliability and availability
- completion of buster upgrades before anarcat's vacation
Non-objectives:
- service admin roadmapping?
- kubernetes cluster deployment?
Assertions:
- new gnt-fsn nodes use the current hardware (PX62-NVMe, 118EUR/mth); per-node cost savings are possible with the AX line (-20EUR/mth) or by reducing disk space requirements (-39EUR/mth)
- cymru actually delivers hardware and is used for moly decom
- gitlab hardware requirements covered by another budget
- we absorb the extra bandwidth costs from the new hardware design (currently 38EUR per month, but this could rise as new bandwidth usage comes in); these could be shifted to the TBB team or at least labeled as such
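To put the per-node prices from the assertions in perspective, this small Python sketch multiplies them out for the four new gnt-fsn nodes that carry budget figures in this roadmap (fsn-node-04 to fsn-node-07). It assumes the AX-line and reduced-disk savings are alternatives rather than cumulative; the page does not say whether they can be combined.

```python
# Per-node monthly prices quoted in the Assertions list (EUR/month).
BASE_NODE_COST = 118  # PX62-NVMe, current hardware
AX_SAVING = 20        # switching to the AX line
DISK_SAVING = 39      # reducing disk space requirements

# Assumption: the two savings are alternatives, not cumulative.
OPTIONS = {
    "PX62-NVMe (current)": BASE_NODE_COST,
    "AX line": BASE_NODE_COST - AX_SAVING,
    "reduced disk": BASE_NODE_COST - DISK_SAVING,
}

# fsn-node-04 to fsn-node-07: the four new nodes budgeted in the roadmap.
PLANNED_NODES = 4


def fleet_cost(per_node: int, nodes: int = PLANNED_NODES) -> int:
    """Monthly cost in EUR of `nodes` new gnt-fsn nodes at `per_node` EUR each."""
    return per_node * nodes


for name, per_node in OPTIONS.items():
    print(f"{name}: {per_node} EUR/node, {fleet_cost(per_node)} EUR/month for {PLANNED_NODES} nodes")
```

With these numbers the four nodes cost 472, 392 or 316 EUR/month depending on the option.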
TODO
- nextcloud roadmap
- identify critical services and realistic improvements #31243 (done)
- (anarcat & gaba) sort out each month by priority (mostly done for feb/march)
- (gaba) add keywords #tpa-roadmap- for each month (doing for february and march to test how this would work) (done)
- (anarcat) create missing tickets for february/march (partially done, missing some from hiro)
- (at tpa meeting) estimate tickets! (1pt = 1 day)
- (gaba) reorganize budget file per month
- (gaba) create a roadmap for gitlab migration
- (gaba) find service admins for gitlab (nobody is listed for trac on the services page): gaba to talk with isa and alex and look for service admins (sent a mail to las vegas but nobody replied... I will talk with each team lead). A service admin is expected to:
  - have a shell account on the server
  - restart/stop the service
  - upgrade the service
  - handle problems with the service
January
- catch up after holidays
- agree internally on a roadmap for 2020
- first phase of installer automation (setup-storage and friends) #31239
- new FSN node in the Ganeti cluster (fsn-node-03) #32937
- textile shutdown and VM relocation, 2 VMs to migrate #31686 (+86EUR)
- enable needrestart fleet-wide (#31957)
- review website build errors (#32996)
- evaluate if discourse can be used as a comments platform for the blog (#33105) (can we push this further down the road, past February, until gitlab is migrated?)
- communicate buster upgrade timeline to service admins DONE
- buster upgrade 63% done: 48 buster, 28 stretch machines
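The buster progress figures in this and the following months are the share of the fleet already running buster. This Python sketch recomputes them from the machine counts quoted in the roadmap. The planned counts all add up to a fleet of 76 machines; April's 68/76 works out to about 89%, which the roadmap rounds to 90%.

```python
# (buster, stretch) machine counts as quoted in the roadmap.
MILESTONES = {
    "January": (48, 28),
    "February (target)": (53, 23),
    "March (target)": (61, 15),
    "April (target)": (68, 8),
    "May (target)": (76, 0),
}


def buster_progress(buster: int, stretch: int, jessie: int = 0) -> float:
    """Percentage of the fleet running buster; older releases count as not done."""
    total = buster + stretch + jessie
    return 100 * buster / total


for month, (buster, stretch) in MILESTONES.items():
    print(f"{month}: {buster_progress(buster, stretch):.0f}% on buster")
```

The optional jessie count covers cases like the February result (54 buster, 22 stretch, 1 jessie), which still rounds to 70%.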
February
capacity around 15 days (counting 2.5 days per week for anarcat and 5 days per month for hiro)
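The capacity figure mixes a weekly and a monthly allocation, so it depends on how many weeks a month is taken to have. A quick Python check, assuming 4 weeks per month for the round figure (an assumption, not stated in the page); since the TODO list sets 1pt = 1 day, the result is also the number of ticket points a month can absorb.

```python
ANARCAT_DAYS_PER_WEEK = 2.5
HIRO_DAYS_PER_MONTH = 5


def monthly_capacity(weeks_per_month: float = 4.0) -> float:
    """Team capacity in person-days (equal to ticket points at 1pt = 1 day).

    Assumption: 4 weeks per month by default.
    """
    return ANARCAT_DAYS_PER_WEEK * weeks_per_month + HIRO_DAYS_PER_MONTH


print(monthly_capacity())                 # 15.0
print(round(monthly_capacity(52 / 12), 1))  # 15.8 with an average-length month
```

Either way the "around 15 days" estimate holds, with a little slack in longer months.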
- 2020 roadmap officially adopted - done
- second phase of installer automation #31239 (esp. puppet automation, e.g. #32901, #32914) - done
- new gnt-fsn node (fsn-node-04) -118EUR=+40EUR (#33081) - done
- storm shutdown #32390 - done
- unifolium decom (after storm), 5 VMs to migrate, #33085 +72EUR=+158EUR - not completed
- buster upgrade 70% done: 53 buster (+5), 23 stretch (-5) - done: 54 buster (+6), 22 stretch (-6), 1 jessie
- migrate gitlab-01 to a new VM (gitlab-02) and use the omnibus package instead of ansible (#32949) - done
- migrate CRM machines to gnt and test with Giant Rabbit #32198 (priority) - not done
- automate upgrades: enable unattended-upgrades fleet-wide (#31957) - not done
- anti-censorship monitoring (external prometheus setup assistance) #31159 - not done
March
capacity around 15 days (counting 2.5 days per week for anarcat and 5 days per month for hiro)
High risk of overload here (two major decoms and many machine setups). Could the moly/cymru work be pushed to April?
- 2021 budget proposal?
- possible gnt-cymru cluster setup (~6 machines) #29397
- moly decom #29974, 5 VMs to migrate
- kvm3 decom, 7 VMs to migrate (inc. crm-int and crm-ext), #33082 +72EUR=+112EUR
- new gnt-fsn node (fsn-node-05) #33083 -118EUR=-6EUR
- eugeni VM migration to gnt-fsn #32803
- buster upgrade 80% done: 61 buster (+8), 15 stretch (-8)
- solr deployment (#33106)
- anti-censorship monitoring (external prometheus setup assistance) #31159
- nc.riseup.net cleanup #32391
- SVN shutdown? #17202
April
- kvm4 decom, 9 VMs to migrate #32802 (excluding eugeni), +121EUR=+115EUR
- new gnt-fsn node (fsn-node-06) -118EUR=-3EUR
- buster upgrade 90% done: 68 buster (+7), 8 stretch (-7)
- solr configuration
May
- kvm5 decom, 9 VMs to migrate #33084, +111EUR=+108EUR
- new gnt-fsn node (fsn-node-07) -118EUR=-10EUR
- buster upgrade 100% done: 76 buster (+8), 0 stretch (-8)
- current planned completion date of Buster upgrades
- start ramping down work, training and documentation
- solr text updates and maintenance
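The EUR figures on the decommissioning and new-node items read as a running monthly budget: a decom frees its hosting cost (positive), each new gnt-fsn node adds 118EUR/mth (negative), and the number after the "=" sign is the cumulative change. That reading is an interpretation rather than something the page spells out, but the Python sketch below reproduces every quoted total under it. The totals only line up if the unifolium decom is counted before fsn-node-04, so it is listed first here even though February lists the node first.

```python
# (item, monthly change in EUR): decoms save money (+), new nodes cost money (-).
# Figures as quoted in the roadmap; ordering chosen so the quoted totals match.
BUDGET_STEPS = [
    ("textile shutdown", +86),
    ("unifolium decom", +72),
    ("fsn-node-04", -118),
    ("kvm3 decom", +72),
    ("fsn-node-05", -118),
    ("kvm4 decom", +121),
    ("fsn-node-06", -118),
    ("kvm5 decom", +111),
    ("fsn-node-07", -118),
]


def running_totals(steps):
    """Yield (item, delta, cumulative total) for each budget step in order."""
    total = 0
    for item, delta in steps:
        total += delta
        yield item, delta, total


for item, delta, total in running_totals(BUDGET_STEPS):
    # Same "+72EUR=+158EUR" notation as the roadmap entries.
    print(f"{item}: {delta:+d}EUR={total:+d}EUR")
```

Under this reading the series ends at -10EUR, i.e. the new nodes end up costing about 10EUR/mth more than the savings from the decommissioned machines.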
June
- Debian jessie LTS EOL, chiwui forcibly shut down #29399
- finish ramp-down, final bugfixing and training before vacation
- search.tp.o soft launch
July
- Debian stretch EOL, final deadline for buster upgrades
- anarcat vacation
- tor meeting?
- hiro's tentative vacation
August
- anarcat vacation
- web metrics R&D (investigate a platform for web metrics) (#32996)
September
- plan contingencies for the Christmas holidays
- catch up following vacation
- web metrics deployment
October
- puppet work (finish prometheus module development, puppet environments, trocla, Hiera, publish code #29387)
- varnish to nginx conversion #32462
- web metrics soft launch (in time for the end-of-year campaign)
- submit service R&D #30608
November
- first submit service prototype? #30608
December
- stabilisation & bugfixing
- 2021 roadmapping
- one- or two-week Christmas holiday
- CCC?
2021 preview
Objectives:
- complete puppetization
- experiment with containers/kubernetes?
- close and merge more services
- replace nagios with prometheus? #29864
- new hire?
Monthly goals:
- january: roadmap approval
- march/april: anarcat vacation