tsa/howto/gitlab.mdwn +73 −6

@@ -51,16 +51,41 @@ lists: <tor-dev@lists.torproject.org> would be best.

<!-- how to deal with them. this should be easy to follow: think of -->
<!-- your future self, in a stressful situation, tired and hungry. -->

 * Grafana Dashboards:
   * [GitLab overview](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus)
   * [Gitaly](https://grafana.torproject.org/d/x6Z50y-iz/gitlab-gitaly)

## Disaster recovery

<!-- what to do if all goes to hell. e.g. restore from backups? -->
<!-- rebuild from scratch? not necessarily those procedures (e.g. see -->
<!-- "Installation" below but some pointers. -->

In case the entire GitLab machine is destroyed, a new server should be
provisioned in the [[ganeti]] cluster (or elsewhere) and backups
should be restored using the procedure below.

### Running an emergency backup

TBD

### Baremetal recovery

TBD

# Reference

## Installation

<!-- how to setup the service from scratch -->

The current GitLab server was set up in the [[ganeti]] cluster in a
regular virtual machine. It was configured through [[puppet]], using
`roles::gitlab`.

This installs the [GitLab Omnibus](https://docs.gitlab.com/omnibus/)
distribution, which duplicates a lot of resources we would otherwise
manage elsewhere in Puppet, including (but possibly not limited to):

 * [[prometheus]]
 * [[postgresql]]

This leads to a "particular" situation regarding monitoring and
PostgreSQL backups.

## SLA

<!-- this describes an acceptable level of service for this service -->

@@ -278,6 +303,38 @@ around.

<!-- a good guide to "audit" an existing project's design: -->
<!-- https://bluesock.org/~willkg/blog/dev/auditing_projects.html -->

GitLab is a fairly large program with multiple components.
The [upstream documentation](https://docs.gitlab.com/ee/development/architecture.html)
has good details on the architecture, but this section aims to
provide a shorter summary. Here's an overview diagram, first:

The web frontend is Nginx (which we incidentally also use in our
[[cache]] system) but GitLab wrote their own reverse proxy called
[GitLab Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse/),
which in turn talks to the underlying GitLab Rails application, served
by the [Unicorn](https://yhbt.net/unicorn/) application server. The
Rails app stores its data in a [[postgresql]] database (although not
our own deployment, for now: TODO). GitLab also offloads long-term
background tasks to a tool called
[sidekiq](https://github.com/mperham/sidekiq).

Those all serve HTTP(S) requests, but GitLab is of course also
accessible over SSH to push/pull git repositories. This is handled by
a separate component called
[gitlab-shell](https://gitlab.com/gitlab-org/gitlab-shell), which acts
as a shell for the `git` user.

Workhorse, Rails, sidekiq and gitlab-shell all talk with Redis to
store temporary information, caches and session information. They can
also communicate with the
[Gitaly](https://gitlab.com/gitlab-org/gitaly) server, which handles
all communication with the git repositories themselves.

Finally, GitLab also features GitLab Pages and Continuous Integration
("pages" and CI, neither of which we currently use). CI is handled by
[GitLab runners](https://gitlab.com/gitlab-org/gitlab-runner/), which
can be deployed by anyone and registered in the Rails app to pull CI
jobs. [GitLab pages](https://gitlab.com/gitlab-org/gitlab-pages) is "a
simple HTTP server written in Go, made to serve GitLab Pages with
CNAMEs and SNI using HTTP/HTTP2".

## Issues

<!-- such projects are never over.
add a pointer to well-known issues -->

@@ -289,10 +346,20 @@ There is no issue tracker specifically for this project, [File][] or

 [File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
 [search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team

TODO.

## Monitoring and testing

<!-- describe how this service is monitored and how it can be tested -->
<!-- after major changes like IP address changes or upgrades -->

Monitoring right now is minimal: normal host-level metrics like disk
space, CPU usage, web port and TLS certificates are monitored by
Nagios with our normal infrastructure, as a black box.

Prometheus monitoring is built into the GitLab Omnibus package, so it
is *not* configured through our Puppet like other Prometheus
servers. It has still been (manually) integrated in our Prometheus
setup and Grafana dashboards (see [pager playbook](#Pager_playbook))
have been deployed.

More work is underway to improve monitoring in
[issue 33921](https://gitlab.torproject.org/tpo/tpa/services/-/issues/33921).

## Backups
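
The emergency backup and baremetal recovery procedures are still
marked TBD in this page, but one thing can be checked independently of
them: whether a recent backup archive exists at all. The following is
only a minimal sketch in Python, not a description of how backups are
actually configured here. It assumes backups land as `*.tar` files in
a single directory; the directory path, the file pattern and the
24-hour threshold are all hypothetical and need to be adjusted to the
real setup.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: str, pattern: str = "*.tar",
                      now: Optional[float] = None) -> Optional[float]:
    """Return the age, in seconds, of the newest file matching `pattern`.

    Returns None when no matching file exists. `backup_dir` and
    `pattern` are assumptions, not values taken from this setup.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime
              for p in Path(backup_dir).glob(pattern)
              if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def backup_is_fresh(backup_dir: str, max_age_hours: float = 24.0,
                    pattern: str = "*.tar",
                    now: Optional[float] = None) -> bool:
    """True if at least one matching backup is newer than `max_age_hours`."""
    age = newest_backup_age(backup_dir, pattern, now)
    return age is not None and age <= max_age_hours * 3600
```

Calling `backup_is_fresh("/path/to/backups")` (path hypothetical)
returns `False` both when the newest archive is too old and when the
directory contains no archive at all, so a missing backup is never
mistaken for a healthy one.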