This document is manually synchronized regularly between and . Keep the format in markdown for now. It should eventually be moved into Trac keywords.

[[!toc]]

Introduction
============

This page documents a possible roadmap for the TPA team for the year 2020.

Items should be "SMART", that is:

* specific
* measurable
* achievable
* relevant
* time-bound

DEPRECATED (only to be able to have gaba's notes on the roadmap)

TODO
----

* nextcloud roadmap
* identify critical services and realistic improvements [#31243][] (done)
* (anarcat & gaba) sort out each month by priority (mostly done for feb/march)
* (gaba) add keywords #tpa-roadmap- for each month (doing for february and march to test how this would work) (done)
* (anarcat) create missing tickets for february/march (partially done, missing some from hiro)
* (at tpa meeting) estimate tickets! (1pt = 1 day)
* (gaba) reorganize budget file per month https://nc.torproject.net/apps/onlyoffice/7374?filePath=%2FTeams%2FSysadmin%2FBudget%20Sysadmin.xlsx
* (gaba) create a roadmap for gitlab migration
* (gaba) find service admins for gitlab (nobody for trac in https://trac.torproject.org/projects/tor/wiki/org/operations/services ) - gaba to talk with isa and alex and look for service admins (sent a mail to las vegas but nobody replied... I will talk with each team lead)
  * have a shell account in the server
  * restart/stop service
  * upgrade services
  * problems with the service

[#31243]: https://bugs.torproject.org/31243

Main objectives (need to have):

* decommissioning of old machines (moly in particular)
* move critical services in ganeti
* buster upgrades before LTS
* within budget

Secondary objectives (nice to have):

* new mail service
* conversion of the kvm* fleet to ganeti for higher reliability and availability
* buster upgrade completion before anarcat vacation

Non-objective:

* service admin roadmapping?
* kubernetes cluster deployment?

Assertions:

* new gnt-fsn nodes with current hardware (PX62-NVMe, 118EUR/mth), cost savings possible with the AX line (-20EUR/mth) or by reducing disk space requirements (-39EUR/mth) per node
* cymru actually delivers hardware and is used for moly decom
* gitlab hardware requirements covered by another budget
* we absorb the extra bandwidth costs from the new hardware design (currently 38EUR per month but could rise when new bandwidth usage comes in) - could be shifted to TBB team or at least labeled as such

January
=======

* [x] catchup after holidays
* [x] agree internally on a roadmap for 2020
* [x] first phase of installer automation (setup-storage and friends) [#31239][]
* [x] new FSN node in the Ganeti cluster (fsn-node-03) [#32937][]
* [ ] textile shutdown and VM relocation, 2 VMs to migrate [#31686][] (+86EUR) IN PROGRESS
* [ ] enable needrestart fleet-wide ([#31957][])
* [ ] review website build errors ([#32996][])
* [ ] evaluate if discourse can be used as comments platform for the blog ([#33105][]) <-- can we move this further down the road (not february) until gitlab is migrated? -->
* [x] communicate buster upgrade timeline to service admins DONE
* [x] buster upgrade 63% done: 48 buster, 28 stretch machines

[#31957]: https://bugs.torproject.org/31957
[#31686]: https://bugs.torproject.org/31686
[#32937]: https://bugs.torproject.org/32937
[#31239]: https://bugs.torproject.org/31239
[#32996]: https://bugs.torproject.org/32996
[#33105]: https://bugs.torproject.org/33105

February
========

* 2020 roadmap officially adopted
* second phase of installer automation [#31239][] (esp. puppet automation, e.g.
  [#32901][], [#32914][])
* new gnt-fsn node (fsn-node-04) -118EUR=+40EUR ([#33081][])
* storm shutdown [#32390][]
* unifolium decom (after storm), 5 VMs to migrate, [#33085][] +72EUR=+158EUR
* buster upgrade 70% done: 53 buster (+5), 23 stretch (-5)
* migrate gitlab-01 to a new VM (gitlab-02) and use the omnibus package instead of ansible ([#32949][])
* migrate CRM machines to gnt and test with Giant Rabbit [#32198][] (priority)
* automate upgrades: enable unattended-upgrades fleet-wide ([#31957][])
* anti-censorship monitoring (external prometheus setup assistance) [#31159][]

[#31159]: https://bugs.torproject.org/31159
[#32198]: https://bugs.torproject.org/32198
[#32949]: https://bugs.torproject.org/32949
[#33085]: https://bugs.torproject.org/33085
[#32390]: https://bugs.torproject.org/32390
[#33081]: https://bugs.torproject.org/33081
[#32914]: https://bugs.torproject.org/32914
[#32901]: https://bugs.torproject.org/32901

March
=====

High possibility of overload here (two major decoms and many machines to set up). Possible to push moly/cymru work to april?

* 2021 budget proposal?
* possible gnt-cymru cluster setup (~6 machines) [#29397][]
* moly decom [#29974][], 5 VMs to migrate
* kvm3 decom, 7 VMs to migrate (inc. crm-int and crm-ext), [#33082][] +72EUR=+112EUR
* new gnt-fsn node (fsn-node-05) [#33083][] -118EUR=-6EUR
* eugeni VM migration to gnt-fsn [#32803][]
* buster upgrade 80% done: 61 buster (+8), 15 stretch (-8)
* solr deployment ([#33106][])
* anti-censorship monitoring (external prometheus setup assistance) [#31159][]
* nc.riseup.net cleanup [#32391][]
* SVN shutdown?
  [#17202][]

[#17202]: https://bugs.torproject.org/17202
[#32391]: https://bugs.torproject.org/32391
[#32803]: https://bugs.torproject.org/32803
[#33083]: https://bugs.torproject.org/33083
[#33082]: https://bugs.torproject.org/33082
[#29974]: https://bugs.torproject.org/29974
[#29397]: https://bugs.torproject.org/29397
[#33106]: https://bugs.torproject.org/33106

April
=====

* kvm4 decom, 9 VMs to migrate [#32802][] (w/o eugeni), +121EUR=+115EUR
* new gnt-fsn node (fsn-node-06) -118EUR=-3EUR
* buster upgrade 90% done: 68 buster (+7), 8 stretch (-7)
* solr configuration

[#32802]: https://bugs.torproject.org/32802

May
===

* kvm5 decom, 9 VMs to migrate [#33084][], +111EUR=+108EUR
* new gnt-fsn node (fsn-node-07) -118EUR=-10EUR
* buster upgrade 100% done: 76 buster (+8), 0 stretch (-8)
* current planned completion date of Buster upgrades
* start ramping down work, training and documentation
* solr text updates and maintenance

[#33084]: https://bugs.torproject.org/33084

June
====

* Debian jessie LTS EOL, chiwui forcibly shutdown [#29399][]
* finish ramp-down, final bugfixing and training before vacation
* search.tp.o soft launch

[#29399]: https://bugs.torproject.org/29399

July
====

* Debian stretch EOL, final deadline for buster upgrades
* anarcat vacation
* tor meeting?
* hiro tentative vacations

August
======

* anarcat vacation
* web metrics R&D (investigate a platform for web metrics) ([#32996][])

[#32996]: https://bugs.torproject.org/32996

September
=========

* plan contingencies for christmas holidays
* catchup following vacation
* web metrics deployment

October
=======

* puppet work (finish prometheus module development, puppet environments, trocla, Hiera, publish code [#29387][])
* varnish to nginx conversion [#32462][]
* web metrics soft launch (in time for eoy campaign)
* submit service R&D [#30608][]

[#30608]: https://bugs.torproject.org/30608
[#32462]: https://bugs.torproject.org/32462
[#29387]: https://bugs.torproject.org/29387

November
========

* first submit service prototype? [#30608][]

December
========

* stabilisation & bugfixing
* 2021 roadmapping
* one or two week xmas holiday
* CCC?

2021 preview
============

Objectives:

* complete puppetization
* experiment with containers/kubernetes?
* close and merge more services
* replace nagios with prometheus? [#29864][]
* new hire?

[#29864]: https://bugs.torproject.org/29864

Monthly goals:

* january: roadmap approval
* march/april: anarcat vacation
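
The running cost figures attached to the decom and new-node items in the January to May sections (for example `-118EUR=+40EUR`) can be re-derived with a short script. This is only a sketch for checking the arithmetic, under two assumptions that the roadmap does not state explicitly: a positive number is a monthly saving from a decommissioned machine and a negative number is the monthly cost of a new node, and within February the unifolium saving is counted before the fsn-node-04 cost (the order implied by the `+158EUR` and `+40EUR` totals). The names `CHANGES` and `running_balance` are made up for this example.

```python
# Monthly cost changes in EUR/month, copied from the roadmap items.
# Assumption: positive = saving from a decom, negative = cost of a new node.
# Assumption: ordered as the running totals in the roadmap imply.
CHANGES = [
    ("textile shutdown", 86),
    ("unifolium decom", 72),
    ("fsn-node-04", -118),
    ("kvm3 decom", 72),
    ("fsn-node-05", -118),
    ("kvm4 decom", 121),
    ("fsn-node-06", -118),
    ("kvm5 decom", 111),
    ("fsn-node-07", -118),
]


def running_balance(changes):
    """Return (label, cumulative total) pairs for a list of (label, delta)."""
    total = 0
    balances = []
    for label, delta in changes:
        total += delta
        balances.append((label, total))
    return balances


if __name__ == "__main__":
    for label, total in running_balance(CHANGES):
        # Print each step with an explicit sign, like the roadmap does.
        print(f"{label}: {total:+d}EUR")
```

Comparing the printed totals with the `=...EUR` figures in each month is a quick way to catch a tally that was not updated after an item moved to another month.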