This document is manually synchronized regularly between and . Keep the format in markdown for now. It should eventually be moved into Trac keywords.
[[!toc]]
Introduction
============
This page documents a possible roadmap for the TPA team for the year 2020.
Items should be "SMART", that is:
* specific
* measurable
* achievable
* relevant
* time-bound
DEPRECATED (kept only to preserve gaba's notes on the roadmap)
TODO
----
* nextcloud roadmap
* identify critical services and realistic improvements [#31243][] (done)
* (anarcat & gaba) sort out each month by priority (mostly done for feb/march)
* (gaba) add keywords #tpa-roadmap- for each month (doing for february and march to test how this would work) (done)
* (anarcat) create missing tickets for february/march (partially done, missing some from hiro)
* (at tpa meeting) estimate tickets! (1pt = 1 day)
* (gaba) reorganize budget file per month https://nc.torproject.net/apps/onlyoffice/7374?filePath=%2FTeams%2FSysadmin%2FBudget%20Sysadmin.xlsx
* (gaba) create a roadmap for gitlab migration
* (gaba) find service admins for gitlab (nobody for trac in https://trac.torproject.org/projects/tor/wiki/org/operations/services ) - gaba to talk with isa and alex and look for service admins (sent a mail to las vegas but nobody replied... I will talk with each team lead)
    * have a shell account in the server
    * restart/stop service
    * upgrade services
    * problems with the service
[#31243]: https://bugs.torproject.org/31243
Main objectives (need to have):
* decommissioning of old machines (moly in particular)
* move critical services into ganeti
* buster upgrades before LTS
* within budget
Secondary objectives (nice to have):
* new mail service
* conversion of the kvm* fleet to ganeti for higher reliability and availability
* buster upgrade completion before anarcat vacation
Non-objective:
* service admin roadmapping?
* kubernetes cluster deployment?
Assertions:
* new gnt-fsn nodes with current hardware (PX62-NVMe, 118EUR/mth), cost savings possible with the AX line (-20EUR/mth) or by reducing disk space requirements (-39EUR/mth) per node
* cymru actually delivers hardware and is used for moly decom
* gitlab hardware requirements covered by another budget
* we absorb the extra bandwidth costs from the new hardware design (currently 38EUR per month but could rise when new bandwidth usage comes in) - could be shifted to TBB team or at least labeled as such
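
As a rough check on the first assertion, the sketch below compares the monthly cost of the hardware options. It assumes that both discounts apply per node, and that four new nodes are affected (fsn-node-04 to fsn-node-07, the ones listed with a -118EUR figure); the names `OPTIONS`, `NEW_NODES` and `monthly_cost` are made up for illustration.

```python
# Monthly price of one PX62-NVMe node, as quoted in the assertions above.
BASE_EUR = 118

# Per-node monthly adjustment for each option (assumed to apply per node).
OPTIONS = {
    "current hardware": 0,
    "AX line": -20,
    "reduced disk": -39,
}

# Assumption: fsn-node-04 to fsn-node-07 are the nodes still to be ordered.
NEW_NODES = 4


def monthly_cost(option, nodes=NEW_NODES):
    """Total monthly cost in EUR for `nodes` nodes under the given option."""
    return (BASE_EUR + OPTIONS[option]) * nodes


for option in OPTIONS:
    print(f"{option:17} {monthly_cost(option):4d} EUR/mth")
```

The two savings are alternatives in the assertion above, so the sketch does not combine them.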
January
=======
* [x] catchup after holidays
* [x] agree internally on a roadmap for 2020
* [x] first phase of installer automation (setup-storage and friends) [#31239][]
* [x] new FSN node in the Ganeti cluster (fsn-node-03) [#32937][]
* [ ] textile shutdown and VM relocation, 2 VMs to migrate [#31686][] (+86EUR) IN PROGRESS
* [ ] enable needrestart fleet-wide ([#31957][])
* [ ] review website build errors ([#32996][])
* [ ] evaluate if discourse can be used as comments platform for the blog ([#33105][]) <-- can we move this further down the road (not february) until gitlab is migrated? -->
* [x] communicate buster upgrade timeline to service admins DONE
* [x] buster upgrade 63% done: 48 buster, 28 stretch machines
[#31957]: https://bugs.torproject.org/31957
[#31686]: https://bugs.torproject.org/31686
[#32937]: https://bugs.torproject.org/32937
[#31239]: https://bugs.torproject.org/31239
[#32996]: https://bugs.torproject.org/32996
[#33105]: https://bugs.torproject.org/33105
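
The buster upgrade percentages quoted in this roadmap can be rechecked from the machine counts. The sketch below is only a sanity check: the counts are copied from the monthly items, and the names `MILESTONES` and `percent_done` are made up for illustration.

```python
# (month, buster machines, stretch machines) as quoted in the monthly items.
MILESTONES = [
    ("January", 48, 28),
    ("February", 53, 23),
    ("March", 61, 15),
    ("April", 68, 8),
    ("May", 76, 0),
]


def percent_done(buster, stretch):
    """Share of machines already running buster, in percent."""
    return 100 * buster / (buster + stretch)


for month, buster, stretch in MILESTONES:
    done = percent_done(buster, stretch)
    print(f"{month:9} {done:5.1f}% ({buster} buster, {stretch} stretch)")
```

The fleet size stays at 76 machines throughout, and the computed values land within a percentage point of the rounded targets (63%, 70%, 80%, 90%, 100%).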
February
========
* 2020 roadmap officially adopted
* second phase of installer automation [#31239][] (esp. puppet automation, e.g. [#32901][], [#32914][])
* new gnt-fsn node (fsn-node-04) -118EUR=+40EUR ([#33081][])
* storm shutdown [#32390][]
* unifolium decom (after storm), 5 VMs to migrate, [#33085][] +72EUR=+158EUR
* buster upgrade 70% done: 53 buster (+5), 23 stretch (-5)
* migrate gitlab-01 to a new VM (gitlab-02) and use the omnibus package instead of ansible ([#32949][])
* migrate CRM machines to gnt and test with Giant Rabbit [#32198][] (priority)
* automate upgrades: enable unattended-upgrades fleet-wide ([#31957][])
* anti-censorship monitoring (external prometheus setup assistance) [#31159][]
[#31159]: https://bugs.torproject.org/31159
[#32198]: https://bugs.torproject.org/32198
[#32949]: https://bugs.torproject.org/32949
[#33085]: https://bugs.torproject.org/33085
[#32390]: https://bugs.torproject.org/32390
[#33081]: https://bugs.torproject.org/33081
[#32914]: https://bugs.torproject.org/32914
[#32901]: https://bugs.torproject.org/32901
March
=====
High possibility of overload here (two major decoms and many machine setups). Possible to push moly/cymru work to April?
* 2021 budget proposal?
* possible gnt-cymru cluster setup (~6 machines) [#29397][]
* moly decom [#29974][], 5 VMs to migrate
* kvm3 decom, 7 VMs to migrate (inc. crm-int and crm-ext), [#33082][] +72EUR=+112EUR
* new gnt-fsn node (fsn-node-05) [#33083][] -118EUR=-6EUR
* eugeni VM migration to gnt-fsn [#32803][]
* buster upgrade 80% done: 61 buster (+8), 15 stretch (-8)
* solr deployment ([#33106][])
* anti-censorship monitoring (external prometheus setup assistance) [#31159][]
* nc.riseup.net cleanup [#32391][]
* SVN shutdown? [#17202][]
[#17202]: https://bugs.torproject.org/17202
[#32391]: https://bugs.torproject.org/32391
[#32803]: https://bugs.torproject.org/32803
[#33083]: https://bugs.torproject.org/33083
[#33082]: https://bugs.torproject.org/33082
[#29974]: https://bugs.torproject.org/29974
[#29397]: https://bugs.torproject.org/29397
[#33106]: https://bugs.torproject.org/33106
April
=====
* kvm4 decom, 9 VMs to migrate [#32802][] (w/o eugeni), +121EUR=+115EUR
* new gnt-fsn node (fsn-node-06) -118EUR=-3EUR
* buster upgrade 90% done: 68 buster (+7), 8 stretch (-7)
* solr configuration
[#32802]: https://bugs.torproject.org/32802
May
===
* kvm5 decom, 9 VMs to migrate [#33084][], +111EUR=+108EUR
* new gnt-fsn node (fsn-node-07) -118EUR=-10EUR
* buster upgrade 100% done: 76 buster (+8), 0 stretch (-8)
* current planned completion date of Buster upgrades
* start ramping down work, training and documentation
* solr text updates and maintenance
[#33084]: https://bugs.torproject.org/33084
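
The running EUR totals attached to the decommissioning and new-node items from January to May can be reproduced with a cumulative sum. This is a sketch under an assumed reading of the figures: positive values are monthly savings from a decommissioned machine, negative values are the monthly cost of a new gnt-fsn node, and within February the unifolium figure is applied before fsn-node-04 (the only order that matches the quoted totals). The names `DELTAS` and `running_balance` are made up for illustration.

```python
from itertools import accumulate

# Monthly deltas in EUR, in the order that reproduces the quoted totals.
DELTAS = [
    ("textile shutdown", 86),
    ("unifolium decom", 72),
    ("fsn-node-04", -118),
    ("kvm3 decom", 72),
    ("fsn-node-05", -118),
    ("kvm4 decom", 121),
    ("fsn-node-06", -118),
    ("kvm5 decom", 111),
    ("fsn-node-07", -118),
]


def running_balance(deltas):
    """Return (label, delta, cumulative total) for each entry."""
    totals = accumulate(delta for _, delta in deltas)
    return [(label, delta, total) for (label, delta), total in zip(deltas, totals)]


for label, delta, total in running_balance(DELTAS):
    print(f"{label:17} {delta:+5d} EUR = {total:+5d} EUR")
```

Under that reading, the balance ends at -10EUR per month after fsn-node-07, matching the last figure in the May list.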
June
====
* Debian jessie LTS EOL, chiwui forcibly shut down [#29399][]
* finish ramp-down, final bugfixing and training before vacation
* search.tp.o soft launch
[#29399]: https://bugs.torproject.org/29399
July
====
* Debian stretch EOL, final deadline for buster upgrades
* anarcat vacation
* tor meeting?
* hiro tentative vacations
August
======
* anarcat vacation
* web metrics R&D (investigate a platform for web metrics) ([#32996][])
[#32996]: https://bugs.torproject.org/32996
September
=========
* plan contingencies for christmas holidays
* catchup following vacation
* web metrics deployment
October
=======
* puppet work (finish prometheus module development, puppet environments, trocla, Hiera, publish code [#29387][])
* varnish to nginx conversion [#32462][]
* web metrics soft launch (in time for eoy campaign)
* submit service R&D [#30608][]
[#30608]: https://bugs.torproject.org/30608
[#32462]: https://bugs.torproject.org/32462
[#29387]: https://bugs.torproject.org/29387
November
========
* first submit service prototype? [#30608][]
December
========
* stabilisation & bugfixing
* 2021 roadmapping
* one or two week xmas holiday
* CCC?
2021 preview
============
Objectives:
* complete puppetization
* experiment with containers/kubernetes?
* close and merge more services
* replace nagios with prometheus? [#29864][]
* new hire?
[#29864]: https://bugs.torproject.org/29864
Monthly goals:
* january: roadmap approval
* march/april: anarcat vacation