## Pager playbook
<!-- how to deal with them. this should be easy to follow: think of -->
<!-- your future self, in a stressful situation, tired and hungry. -->
* Grafana Dashboards:
  * [GitLab overview](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus)
  * [Gitaly](https://grafana.torproject.org/d/x6Z50y-iz/gitlab-gitaly)
## Disaster recovery
<!-- what to do if all goes to hell. e.g. restore from backups? -->
<!-- rebuild from scratch? not necessarily those procedures (e.g. see -->
<!-- "Installation" below) but some pointers. -->
In case the entire GitLab machine is destroyed, a new server should be
provisioned in the [[ganeti]] cluster (or elsewhere) and backups
should be restored using the procedures below.
### Running an emergency backup
TBD
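
In the meantime, a minimal sketch using the stock Omnibus backup task
(this assumes the default backup path of `/var/opt/gitlab/backups` and
enough free disk space there):

```sh
# create a full application backup (repositories, database, uploads,
# etc.) under /var/opt/gitlab/backups by default
sudo gitlab-backup create

# the main configuration and secrets are *not* included above and
# must be saved separately
sudo tar -C /etc -czf /root/gitlab-config-$(date -I).tar.gz gitlab/
```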
### Bare-metal recovery
TBD
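
Again, only a rough sketch of the standard Omnibus restore steps,
assuming a fresh machine running the same GitLab version (for example
installed through the `roles::gitlab` Puppet role) with the backup
archive and configuration tarball copied over:

```sh
# restore configuration and secrets first, then regenerate the
# bundled services' configuration
sudo tar -C /etc -xzf gitlab-config-YYYY-MM-DD.tar.gz
sudo gitlab-ctl reconfigure

# stop the processes that talk to the database during the restore
sudo gitlab-ctl stop unicorn
sudo gitlab-ctl stop sidekiq

# restore the backup archive sitting in /var/opt/gitlab/backups,
# identified by its timestamp prefix (placeholder below)
sudo gitlab-backup restore BACKUP=<timestamp_of_backup>

# bring everything back up and run the built-in sanity checks
sudo gitlab-ctl restart
sudo gitlab-rake gitlab:check SANITIZE=true
```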
# Reference
## Installation
<!-- how to setup the service from scratch -->
The current GitLab server was set up in the [[ganeti]] cluster as a
regular virtual machine. It was configured with [[puppet]], using the
`roles::gitlab` class.
This installs the [GitLab Omnibus](https://docs.gitlab.com/omnibus/) distribution, which duplicates a
lot of resources we would otherwise manage elsewhere in Puppet,
including (but possibly not limited to):
* [[prometheus]]
* [[postgresql]]
This leads to a somewhat unusual situation for monitoring and
PostgreSQL backups in particular, which are handled differently here
than on the rest of our infrastructure.
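
To see exactly which bundled services Omnibus manages on the host (and
therefore which ones overlap with what Puppet would normally deploy),
the following can help; this is just an illustration and the output
will vary:

```sh
# list the runit-managed services shipped with the Omnibus bundle
# (postgresql, redis, gitaly, the Prometheus exporters, etc.)
sudo gitlab-ctl status

# dump the effective configuration generated from /etc/gitlab/gitlab.rb
sudo gitlab-ctl show-config
```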
## SLA
<!-- this describes an acceptable level of service for this service -->
## Design
<!-- a good guide to "audit" an existing project's design: -->
<!-- https://bluesock.org/~willkg/blog/dev/auditing_projects.html -->
GitLab is a fairly large program with multiple components. The
[upstream documentation](https://docs.gitlab.com/ee/development/architecture.html) describes the architecture in good detail,
but this section aims to provide a shorter summary. Here's an overview
diagram, first:
![GitLab's architecture diagram](https://docs.gitlab.com/ee/development/img/architecture_simplified.png)
The web frontend is Nginx (which we incidentally also use in our
[[cache]] system), but GitLab wrote their own reverse proxy called
[GitLab Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse/), which in turn talks to the underlying GitLab
Rails application, served by the [Unicorn](https://yhbt.net/unicorn/) application
server. The Rails app stores its data in a [[postgresql]] database
(although not our own deployment, for now: TODO). GitLab also offloads
long-running background tasks to a tool called [sidekiq](https://github.com/mperham/sidekiq).
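
A quick way to exercise that whole HTTP path (Nginx, Workhorse, then
Rails) from the outside is a simple request against the login page;
just an illustrative check:

```sh
# a "200 OK" here means Nginx, Workhorse and the Rails application
# all answered; anything else warrants a closer look
curl -sSI https://gitlab.torproject.org/users/sign_in | head -n1
```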
These all serve HTTP(S) requests, but GitLab is of course also
accessible over SSH to push and pull git repositories. This is handled
by a separate component called [gitlab-shell](https://gitlab.com/gitlab-org/gitlab-shell), which acts as a shell
for the `git` user.
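
The SSH side can be tested in a similar way: gitlab-shell answers a
plain SSH connection with a greeting instead of a shell (the exact
message varies between GitLab versions):

```sh
# should print something like "Welcome to GitLab, @username!" and
# close the connection, since no interactive shell is provided
ssh -T git@gitlab.torproject.org
```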
Workhorse, Rails, sidekiq and gitlab-shell all talk to Redis to store
temporary data, caches and session information. They can also
communicate with the [Gitaly](https://gitlab.com/gitlab-org/gitaly) server, which handles all
communication with the git repositories themselves.
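
GitLab ships rake tasks that check those internal connections, which
can be handy after an upgrade; for example (a sketch, to be run on the
GitLab server itself):

```sh
# verify that the Rails application can reach the Gitaly server(s)
sudo gitlab-rake gitlab:gitaly:check

# check that the bundled Redis instance is up
sudo gitlab-ctl status redis
```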
Finally, GitLab also features GitLab Pages and Continuous Integration
("pages" and CI, neither of which we currently use). CI is handled by
[GitLab runners](https://gitlab.com/gitlab-org/gitlab-runner/), which can be deployed by anyone and registered in
the Rails app to pull CI jobs. [GitLab pages](https://gitlab.com/gitlab-org/gitlab-pages) is "a simple HTTP server
written in Go, made to serve GitLab Pages with CNAMEs and SNI using
HTTP/HTTP2".
## Issues
<!-- such projects are never over. add a pointer to well-known issues -->
There is no issue tracker specifically for this project, [File][] or
[search][] for issues in the Tor Sysadmin Team Trac component.
[File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
[search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team
TODO.
## Monitoring and testing
<!-- describe how this service is monitored and how it can be tested -->
<!-- after major changes like IP address changes or upgrades -->
Monitoring is currently minimal: normal host-level metrics (disk
space, CPU usage) and black-box checks on the web port and TLS
certificate are handled by Nagios through our normal infrastructure.
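
Those checks can be reproduced by hand after a change, for example to
confirm the TLS certificate is not about to expire; a sketch:

```sh
# show the certificate expiry date as seen from the outside
echo | openssl s_client -connect gitlab.torproject.org:443 \
    -servername gitlab.torproject.org 2>/dev/null \
  | openssl x509 -noout -enddate
```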
Prometheus monitoring is built into the GitLab Omnibus package, so it
is *not* configured through Puppet like our other Prometheus
exporters. It has nevertheless been (manually) integrated into our
Prometheus setup, and Grafana dashboards have been deployed (see the
[pager playbook](#Pager_playbook)).
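
Since those exporters come from the Omnibus bundle rather than from
Puppet, they can also be inspected directly on the host; a sketch,
assuming the default Omnibus listen addresses and ports:

```sh
# node-level metrics from the bundled node_exporter
curl -s http://localhost:9100/metrics | head

# application metrics from the bundled gitlab-exporter
curl -s http://localhost:9168/metrics | head
```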
More work is underway to improve monitoring in [issue 33921](https://gitlab.torproject.org/tpo/tpa/services/-/issues/33921).
## Backups