start filling up the gitlab doc: installation, design, monitoring... (4c97abe8) · Commits · The Tor Project / TPA / Wiki Replica

tsa/howto/gitlab.mdwn

+73 −6

		@@ -51,16 +51,41 @@ lists: <tor-dev@lists.torproject.org> would be best.
		<!-- how to deal with them. this should be easy to follow: think of -->
		<!-- your future self, in a stressful situation, tired and hungry. -->

		* Grafana Dashboards:
		* [GitLab overview](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus)
		* [Gitaly](https://grafana.torproject.org/d/x6Z50y-iz/gitlab-gitaly)

		## Disaster recovery

		<!-- what to do if all goes to hell. e.g. restore from backups? -->
		<!-- rebuild from scratch? not necessarily those procedures (e.g. see -->
		<!-- "Installation" below) but some pointers. -->
		If the entire GitLab machine is destroyed, a new server should be
		provisioned in the [[ganeti]] cluster (or elsewhere) and backups
		should be restored using the procedure below.

		### Running an emergency backup

		TBD

		### Baremetal recovery

		TBD

		# Reference

		## Installation
		<!-- how to set up the service from scratch -->

		The current GitLab server was set up in the [[ganeti]] cluster in a
		regular virtual machine. It was configured with [[puppet]] using
		`roles::gitlab`.

		This installs the [GitLab Omnibus](https://docs.gitlab.com/omnibus/) distribution which duplicates a
		lot of resources we would otherwise manage elsewhere in Puppet,
		including (but possibly not limited to):

		* [[prometheus]]
		* [[postgresql]]

		This leads to a "particular" situation, especially regarding
		monitoring and PostgreSQL backups.

		## SLA
		<!-- this describes an acceptable level of service for this service -->
		@@ -278,6 +303,38 @@ around.
		<!-- a good guide to "audit" an existing project's design: -->
		<!-- https://bluesock.org/~willkg/blog/dev/auditing_projects.html -->

		GitLab is a fairly large program with multiple components. The
		[upstream documentation](https://docs.gitlab.com/ee/development/architecture.html) describes the architecture in detail, but
		this section aims at providing a shorter summary. Here's an overview
		diagram, first:

		![GitLab's architecture diagram](https://docs.gitlab.com/ee/development/img/architecture_simplified.png)

		The web frontend is Nginx (which we incidentally also use in our
		[[cache]] system) but GitLab wrote their own reverse proxy called
		[GitLab Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse/) which in turn talks to the underlying GitLab
		Rails application, served by the [Unicorn](https://yhbt.net/unicorn/) application
		server. The Rails app stores its data in a [[postgresql]] database
		(although not our own deployment, for now: TODO). GitLab also offloads
		long-running background tasks to a tool called [sidekiq](https://github.com/mperham/sidekiq).

		Those all serve HTTP(S) requests, but GitLab is of course also
		accessible over SSH to push/pull git repositories. This is handled by
		a separate component called [gitlab-shell](https://gitlab.com/gitlab-org/gitlab-shell) which acts as a shell
		for the `git` user.

		Workhorse, Rails, sidekiq and gitlab-shell all talk with Redis to
		store temporary information, caches and session information. They can
		also communicate with the [Gitaly](https://gitlab.com/gitlab-org/gitaly) server which handles all
		communication with the git repositories themselves.
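
As a reading aid, the relationships described above can be written down as a small graph. This is only a sketch of the prose, not something GitLab ships: the component names are plain labels, and the edges are a simplification (for example, "Rails offloads to sidekiq" is modelled as a direct edge, and every component that talks to Redis is assumed to also talk to Gitaly, as the text suggests).

```python
from collections import deque

# "A talks to B" edges, transcribed from the summary above.
# These are labels for illustration only, not service or socket names.
TALKS_TO = {
    "nginx": ["workhorse"],
    "workhorse": ["rails", "redis", "gitaly"],
    "rails": ["postgresql", "sidekiq", "redis", "gitaly"],
    "sidekiq": ["redis", "gitaly"],
    "gitlab-shell": ["redis", "gitaly"],
    "postgresql": [],
    "redis": [],
    "gitaly": [],
}


def reachable(start):
    """Return the set of components reachable from `start` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in TALKS_TO.get(node, []):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen
```

For example, `reachable("gitlab-shell")` only yields Redis and Gitaly, which matches the idea that SSH access bypasses the HTTP stack, while `reachable("nginx")` covers the whole web path down to PostgreSQL.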

		Finally, GitLab also features GitLab Pages and Continuous Integration
		("pages" and CI, neither of which we currently use). CI is
		handled by [GitLab runners](https://gitlab.com/gitlab-org/gitlab-runner/) which can be deployed by anyone and
		registered in the Rails app to pull CI jobs. [GitLab Pages](https://gitlab.com/gitlab-org/gitlab-pages) is "a
		simple HTTP server written in Go, made to serve GitLab Pages with
		CNAMEs and SNI using HTTP/HTTP2".

		## Issues

		<!-- such projects are never over. add a pointer to well-known issues -->
		@@ -289,10 +346,20 @@ There is no issue tracker specifically for this project, [File][] or
		[File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
		[search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team

		TODO.

		## Monitoring and testing

		<!-- describe how this service is monitored and how it can be tested -->
		<!-- after major changes like IP address changes or upgrades -->
		Monitoring right now is minimal: host-level metrics like disk
		space and CPU usage, the web port and TLS certificates are monitored
		by Nagios through our normal infrastructure, as a black box.
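
To illustrate what a "black box" check of the web port and TLS certificate amounts to, here is a minimal sketch using only the Python standard library. It is not the check Nagios actually runs; the function names `days_left` and `check_tls` are made up for this example.

```python
import socket
import ssl
import time


def days_left(not_after, now=None):
    """Days until a certificate's notAfter timestamp.

    `not_after` uses the format returned by getpeercert(),
    e.g. 'Jan  5 09:34:43 2030 GMT'.
    """
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check_tls(host, port=443, timeout=10):
    """Connect to host:port, verify the certificate chain and hostname,
    and return the number of days before the certificate expires."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_left(cert["notAfter"])
```

Calling `check_tls("gitlab.torproject.org")` from a machine with network access would raise an exception if the port is closed or the certificate is invalid, and otherwise return the remaining validity in days, which a wrapper could compare against a warning threshold.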

		Prometheus monitoring is built into the GitLab Omnibus package, so it
		is not configured through our Puppet like other Prometheus
		servers. It has nevertheless been (manually) integrated into our
		Prometheus setup, and Grafana dashboards (see
		[pager playbook](#Pager_playbook)) have been deployed.
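
Once the metrics are in Prometheus, a quick sanity check after an upgrade is to ask which scrape targets are down. The sketch below uses the standard Prometheus HTTP API (`/api/v1/query`) and the built-in `up` metric. The base URL and any label values are assumptions and need to be adapted to the actual setup.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def down_instances(payload):
    """From the JSON result of an `up` query, list the `instance` label
    of every series whose current value is 0 (target not scraped)."""
    return [
        series["metric"].get("instance", "?")
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0
    ]


def fetch(base_url, expr, timeout=10):
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=timeout) as response:
        return json.load(response)
```

Something like `down_instances(fetch("http://localhost:9090", "up"))` (with a hypothetical local Prometheus URL) returns an empty list when every target is being scraped.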

		More work is underway to improve monitoring in [issue 33921](https://gitlab.torproject.org/tpo/tpa/services/-/issues/33921).

		## Backups