... | @@ -8,63 +8,106 @@ work in the coming year. |
|
|
|
|
|
# Overall goals
|
|
|
|
|
|
|
|
## Brainstorm
|
|
These goals are based on the user survey performed in December 2020
|
|
|
|
and are going to be discussed in the TPA team in January 2021. As of
|
|
|
|
2021-01-19, this is just a draft proposed by @anarcat and not formally
|
|
|
|
adopted by the team.
|
|
|
|
|
|
The following are conclusions drawn from the survey, below:
|
|
As a reminder, the priority suggested by the survey is "service
|
|
|
|
stabilisation" before "new services". Furthermore, some services are
|
|
|
|
way more popular than others, so those services should get special
|
|
|
|
attention. In general, the over-arching goals are therefore:
|
|
|
|
|
|
* email delivery needs to be improved, multiple possible solutions
|
|
* stabilisation (particularly email but also GitLab, schleuder, blog)
|
|
* split eugeni into lists and forwards
|
|
* better communication (particularly with devs)
|
|
* set up submit-01 to deliver people's emails (#30608)
|
|
|
|
* stop treating eugeni as a smart host: have CiviCRM and RT and
|
|
## Need to have
|
|
other machines deliver their own email
|
|
|
|
* CiviCRM needs to handle its bounces
|
|
* email delivery improvements:
|
|
* follow up on abuse complaints
|
|
* handle bounces in CiviCRM ([issue 33037](https://gitlab.torproject.org/tpo/tpa/team/-/issues/33037))
|
|
* continue the GitLab migration:
|
|
* systematically follow up on and respond to abuse complaints
|
|
* set up GitLab CI for everyone, deprecate Jenkins
|
|
* diagnose and resolve delivery issues (e.g. [Yahoo delivery
|
|
* migrate away from Gitolite and Gitweb
|
|
problems](https://gitlab.torproject.org/tpo/tpa/team/-/issues/34134))
|
|
* fix the blog, possible solutions:
|
|
* provide reliable delivery for users ("my email ends up in spam!")
|
|
* migrate to static website and Discourse
|
|
* possible implementations:
|
|
|
|
* split mailing lists out of eugeni
|
|
|
|
* set up submit-01 to deliver people's emails ([issue 30608](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30608))
|
|
|
|
* split schleuder out of eugeni (or retire)
|
|
|
|
* stop using eugeni as a smart host (each host sends its own
|
|
|
|
email, particularly RT and CiviCRM)
|
|
|
|
* retire old services:
|
|
|
|
* SVN ([issue 17202](https://gitlab.torproject.org/tpo/tpa/team/-/issues/17202))
|
|
|
|
* fpcentral ([issue 40009](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40009))
|
|
|
|
* gitolite (replaced with GitLab, see above)
|
|
|
|
* gitweb (replaced with GitLab, see above)
|
|
|
|
* jenkins (replaced with GitLab, see above)
|
|
|
|
* scale GitLab with ongoing and surely expanding usage
|
|
|
|
* possibly split across multiple servers
|
|
|
|
* throw more hardware at it?
|
|
|
|
* monitoring?
|
|
|
|
* provide reliable and simple continuous integration services
|
|
|
|
* retire Jenkins
|
|
|
|
* replace with GitLab CI, with Windows, Mac and Linux runners
|
|
|
|
* avoid duplicate git hosting infrastructure
|
|
|
|
* retire gitolite, gitweb ([issue 36](https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/36))
|
|
|
|
* fix the blog moderation and comment moderation, possible solutions:
|
|
|
|
* migrate to a static website and Discourse
|
|
* fix formatting and improve moderation within Drupal
|
|
|
|
* retire more services:
|
|
* improve communications and monitoring:
|
|
* SVN
|
|
* document "downtimes of 1 hour or longer", in a status page [issue
|
|
* fpcentral
|
|
40138](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40138)
|
|
* schleuder?
|
|
* reduce alert fatigue in Nagios
|
|
* testnet?
|
|
* publicize debugging tools (Grafana, user-level logging in systemd
|
|
* gitolite (to GitLab, see above)
|
|
services)
|
|
* gitweb (to GitLab, see above)
|
|
* encourage communication and ticket creation
|
|
* jenkins (to GitLab, see above)
|
|
* move root@ and tpa "noise" to RT ([ticket 31242](https://gitlab.torproject.org/tpo/tpa/team/-/issues/31242)), make a real
|
|
* stabilise service offering, possible solutions:
|
|
mailing list for admins so that gaba and non-tech can join
|
|
* retire services (see above)
|
|
* plan for hiro's vacation (replacement?)
|
|
* balance FSN/CHI ganeti clusters
|
|
|
|
* finish transitions and migrations (e.g. GitLab, main website,
|
|
|
|
etc.)
|
|
|
|
* document "downtimes of 1 hour or longer", maybe part of the
|
|
|
|
monthly report? "how many 9's?" suggest mitigations when
|
|
|
|
downtimes occur (maybe just a static site made with [cstate](https://github.com/cstate/cstate)?
|
|
|
|
with contingencies for when the static site network goes down, of
|
|
|
|
course) see https://gitlab.torproject.org/tpo/tpa/team/-/issues/40138
|
|
|
|
* above probably requires auditing and reducing noise in Nagios
|
|
|
|
alerts, because alert fatigue makes it useless for detecting
|
|
|
|
outages right now
|
|
|
|
* improve developer experience:
|
|
|
|
* provide development/experimental VMs?
|
|
|
|
* give developers more tools to debug problems (e.g. Grafana, stack
|
|
|
|
traces hidden in syslog)
|
|
|
|
* improve interaction between TPA and devs when new services are
|
|
|
|
set up
|
|
|
|
|
|
|
|
Also note the following 2020 goals that are not mentioned above and
|
|
|
|
might be added:
|
|
|
|
|
|
|
|
* moly retirement
|
|
## Nice to have
|
|
* solr/search.tpo deployment
|
|
|
|
* web metrics (#32996)
|
|
|
|
* varnish to nginx conversion (#32462)
|
|
|
|
|
|
|
|
## TODO: Need to have
|
|
* improve sysadmin code base
|
|
## TODO: Nice to have
|
|
* avoid YOLO commits in Puppet (possibly: server-side linting, CI)
|
|
## TODO: Non-goals
|
|
* publish our Puppet repository ([ticket 29387](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387))
|
|
# TODO: Quarterly breakdown
|
|
* reduce dependency on Python 2 code (see [short term LDAP plan](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/ldap#short-term-merge-with-upstream-port-to-python-3-if-necessary))
|
|
|
|
* reduce dependency on LDAP (move hosts to Puppet? see [mid term
|
|
|
|
LDAP plan](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/ldap#mid-term-move-hosts-to-puppet-possibly-replace-ud-ldap-with-simpler-dashboard))
|
|
|
|
* retire more old services:
|
|
|
|
* testnet?
|
|
|
|
* schleuder?
|
|
|
|
* provide secure, end-to-end authentication of Tor source code
|
|
|
|
([issue 81](https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/81))
|
|
|
|
* finish retiring old hardware (moly, [ticket 29974](https://gitlab.torproject.org/legacy/trac/-/issues/29974))
|
|
|
|
* varnish to nginx conversion (#32462)
|
|
|
|
* solr/search.tpo deployment (#33106)?
|
|
|
|
* web metrics (#32996)?
|
|
|
|
* GitLab pages hosting
|
|
|
|
* experiment with containers/kubernetes for CI/CD
|
|
|
|
|
|
|
|
## Non-goals
|
|
|
|
|
|
|
|
* complete email service: not enough time / budget (or delegate + pay Riseup)
|
|
|
|
* "provide development/experimental VMs": would be possible through
|
|
|
|
GitLab CD, to be investigated once we have GitLab CI solidly
|
|
|
|
running
|
|
|
|
* "improve interaction between TPA and devs when new services are
|
|
|
|
set up": see "improve communications" above, and "experimental
|
|
|
|
VMs". The endgame here is people will be able to deploy their own
|
|
|
|
services through Docker, but this will likely not happen in 2021
|
|
|
|
* static mirror network retirement / rearchitecture: we want to test
|
|
|
|
out GitLab pages first and see if it can provide a decent
|
|
|
|
alternative
|
|
|
|
* TODO: "finish main website transition", "broken links on
|
|
|
|
website"... should TPA cover for web stuff?
|
|
|
|
* TODO: are service admins still a thing? should we cover for things
|
|
|
|
like the metrics team?
|
|
|
|
* complete puppetization: old legacy services are not in Puppet. that
|
|
|
|
is fine: we keep maintaining them by hand when relevant, but new
|
|
|
|
services should all be built in Puppet
|
|
|
|
* replace Nagios with Prometheus: not a short term goal, no clear
|
|
|
|
benefit. reduce the noise in Nagios instead
|
|
|
|
|
|
|
|
# Quarterly breakdown
|
|
|
|
|
|
## Q1
|
|
|
|
|
|
|
... | @@ -73,6 +116,17 @@ this roadmap is concerned. It should include items we are fairly |
|
certain to be able to complete within the next few months or
|
|
|
|
so. Postponing those could cause problems.
|
|
|
|
|
|
|
|
|
|
* email delivery improvements:
|
|
|
|
* handle bounces in CiviCRM ([issue 33037](https://gitlab.torproject.org/tpo/tpa/team/-/issues/33037))
|
|
|
|
* follow up on abuse complaints
|
|
|
|
* diagnose and resolve delivery issues (e.g. [Yahoo delivery
|
|
|
|
problems](https://gitlab.torproject.org/tpo/tpa/team/-/issues/34134))
|
|
|
|
* GitLab CI deployment, plan for Jenkins retirement
|
|
|
|
* set up a Discourse instance, deprecate blog comments?
|
|
|
|
* plan for blog replacement?
|
|
|
|
* document "downtimes of 1 hour or longer", in a status page [issue
|
|
|
|
40138](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40138)
|
|
|
|
|
|
## Q2
|
|
|
|
|
|
|
|
Second quarter is a little more vague, but should still be
|
|
|
... | @@ -80,6 +134,17 @@ Second quarter is a little more vague, but should still be |
|
wait a little longer or that are part of longer projects that will
|
|
|
|
take longer to complete.
|
|
|
|
|
|
|
|
|
|
* retire old services:
|
|
|
|
* SVN ([issue 17202](https://gitlab.torproject.org/tpo/tpa/team/-/issues/17202))
|
|
|
|
* fpcentral ([issue 40009](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40009))
|
|
|
|
* establish plan for gitolite/gitweb retirement
|
|
|
|
* improve sysadmin code base
|
|
|
|
* avoid YOLO commits in Puppet (possibly: server-side linting, CI)
|
|
|
|
* publish our Puppet repository ([ticket 29387](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387))
|
|
|
|
* reduce dependency on Python 2 code (see [short term LDAP plan](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/ldap#short-term-merge-with-upstream-port-to-python-3-if-necessary))
|
|
|
|
* reduce dependency on LDAP (move hosts to Puppet? see [mid term
|
|
|
|
LDAP plan](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/ldap#mid-term-move-hosts-to-puppet-possibly-replace-ud-ldap-with-simpler-dashboard))
|
|
|
|
|
|
## Q3
|
|
|
|
|
|
|
|
From our experience, after three quarters, things get difficult to
|
|
|
... | @@ -88,13 +153,19 @@ time before this time, which totally changed basic assumptions about |
|
worker availability and priorities.
|
|
|
|
|
|
|
|
Also, a global pandemic basically tore the world apart, throwing
|
|
|
|
everything in the air.
|
|
everything in the air, so obviously plans kind of went out the
|
|
|
|
window. Hopefully this won't happen again and the pandemic will
|
|
|
|
somewhat subside, but we should plan for the worst.
|
|
|
|
|
|
|
|
* jenkins retirement?
|
|
|
|
|
|
## Q4
|
|
|
|
|
|
|
|
Obviously, the fourth quarter is sheer crystal balling at this stage,
|
|
|
|
but it should still be an interesting exercise to perform.
|
|
|
|
|
|
|
|
|
|
* gitolite/gitweb retirement?
|
|
|
|
|
|
# 2020 roadmap evaluation
|
|
|
|
|
|
|
|
The following is a review of the 2020 roadmap.
|
|
|
... | | ... | |