- Tutorial
- How to get an account?
- How to report an issue in Tor software?
- How to report an issue in the bugtracker itself?
- Note about confidential issues
- How to contribute code?
- How to quote a comment in a reply?
- How-to
- Continuous Integration (CI)
- Container registry operations
- Logging in
- Uploading an image
- Email interactions
- Creating a new issue
- Commenting on an issue
- Quick status updates by email
- How to migrate a Git repository from legacy to GitLab?
- How to mirror a Git repository from legacy to GitLab?
- How to mirror a Git repository from GitLab to GitHub
- How to find the right emoji?
- Publishing notifications on IRC
- Setting up two-factor authentication (2FA)
- Deleting sensitive attachments
- Publishing GitLab pages
- Accepting merge requests on wikis
- Renaming a branch globally
- Modifying open Merge Requests
- Updating gitolite
- Updating Transifex
- Find the project associated with a project ID
- Find the project associated with a hashed repository name
- Connect to the PostgreSQL server
- Pager playbook
- Troubleshooting
- Filtering through json logs
- GitLab pages not found
- PostgreSQL debugging
- Disk full on GitLab server
- Incoming email routing
- Outgoing email
- Gitlab registry troubleshooting
- HTTP 500 Internal Server Error
- HTTP 502 Bad Gateway
- Disaster recovery
- Running an emergency backup
- Recovering this wiki from backups
- Restoring from backups
- Reference
- Installation
- Main GitLab installation
- PostgreSQL standalone transition
- GitLab CI installation
- GitLab pages installation
- GitLab registry
- Upgrades
- SLA
- Design
- Continuous integration
- Spam control
- Scalability
- GitLab pages
- Redacting GitLab confidential issues
- Issues
- Known
- Resolved
- Monitoring and metrics
- Tests
- Logs
- Backups
- Other documentation
- Discussion
- Meetings
- Overview
- Goals
- Must have
- Nice to have
- Non-Goals
- Approvals required
- Proposed Solution
- Cost
- Alternatives considered
- GitLab command line clients
- Migration tools
- Git repository integrity solutions
- Guix: sign all commits
- Arista: sign all commits in Gerrit
- Gerwitz: sign all commits or at least merge commits
- Torvalds: signed tags
- Vick: git signatures AKA git notes
- Walters: extended validation tags
- Ryabitsev: b4 and patch attestations
- Ryabitsev: Secure Scuttlebutt
- Stelzer: ssh signatures
- Lorenc: sigstore
- Sirish: gittuf
- Other caveats
- Related
- Migration from Trac
GitLab is a web-based DevOps lifecycle tool that provides a Git repository manager with wiki, issue-tracking and continuous integration/continuous deployment pipeline features, using an open-source license, developed by GitLab Inc. (Wikipedia). Tor uses GitLab for issue tracking, source code and wiki hosting at https://gitlab.torproject.org, after migrating from Trac and gitolite.
Note that continuous integration is documented separately, in the CI page.
Tutorial
How to get an account?
You might already have an account! If you were active on Trac, your account was migrated with the same username and email address as Trac, unless you have an LDAP account, in which case that was used. So head over to the password reset page to get access to your account.
If your account was not migrated, send a mail to gitlab-admin@torproject.org to request a new one.
If you did not have an account in Trac and want a new account, you should request a new one at https://gitlab.onionize.space/.
How to report an issue in Tor software?
You first need to figure out which project the issue resides in. The project list is a good place to get started. Here are a few quick links for popular projects:
If you do not have a GitLab account or can't figure it out for any reason, you can also use the mailing lists. The tor-dev@lists.torproject.org mailing list is the best for now.
How to report an issue in the bugtracker itself?
If you have access to GitLab, you can file a new issue after you have searched the GitLab project for similar bugs.
If you do not have access to GitLab, you can email gitlab-admin@torproject.org.
Note about confidential issues
Note that you can mark issues as "confidential", which will make them visible only to members of the project the issue is reported on (the "developers" group and above, specifically).
Keep in mind, however, that it is still possible for issue information to leak in cleartext. For example, GitLab sends email notifications in cleartext for private issues, a known upstream issue.
We have deployed a workaround for this which redacts outgoing mail, but there's still some metadata leaking there:
- the issue number
- the reporter
- the project name
- the reply token (allowing someone to impersonate a reply)
Some repositories might also have "web hooks" that notify IRC bots in clear text as well, although at the time of writing all projects are correctly configured. The IRC side of things, of course, might also leak information.
Note that internal notes are currently not being redacted, unless they are added to confidential issues, see issue 145.
How to contribute code?
As with reporting an issue, you first need to figure out which project you are working on in the GitLab project list. Then, if you are not familiar with merge requests, you should read the merge requests introduction in the GitLab documentation. If you are unfamiliar with merge requests but know GitHub's pull requests, the two are similar.
Note that we do not necessarily use merge requests in all teams yet, and Gitolite still has the canonical version of the code. See issue 36 for a followup on this.
Also note that different teams might have different workflows. If a team has a special workflow that diverges from the one here, it should be documented here. Those are the workflows we know about:
- Network Team
- Web Team
- Bridge DB: merge requests
If you do not have access to GitLab, please use one of the mailing lists: tor-dev@lists.torproject.org would be best.
How to quote a comment in a reply?
The "Reply" button only creates a new comment without any quoted text
by default. It seems the solution to that is currently highlighting
the text to quote and then pressing the r
-key. See also the other
keyboard shortcuts.
Alternatively, you can copy-paste the text in question in the comment
form, select the pasted text, and hit the Insert a quote
button
which look like a styled, curly, and closing quotation mark ”
.
How-to
Continuous Integration (CI)
All CI documentation resides in a different document; see service/ci.
Container registry operations
Logging in
To upload content to the registry, you first need to log in. This can be done with the login command:
podman login containers.torproject.org
This will ask you for your GitLab username and a password, for which you should use a personal access token.
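If you prefer a non-interactive login, for example from a script, something like this should also work (a sketch; GITLAB_USER and GITLAB_TOKEN are placeholder variables for your username and personal access token):
# non-interactive login to the Tor container registry; GITLAB_USER and GITLAB_TOKEN are placeholders
echo "$GITLAB_TOKEN" | podman login --username "$GITLAB_USER" --password-stdin containers.torproject.org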
Uploading an image
Assuming you already have an image built (below we have it labeled with containers.torproject.org/anarcat/test/airsonic-test), you can upload it with:
podman push containers.torproject.org/anarcat/test/airsonic-test containers.torproject.org/anarcat/test
Notice the two arguments: the first is the label of the image to upload and the second is where to upload it, the "destination". The destination is made of two parts: the first component is the host name of the container registry (in our case containers.torproject.org) and the second part is the path to the project to upload into (in our case anarcat/test).
The uploaded container image should appear under Deploy -> Container Registry in your project. In the above case, it is in:
https://gitlab.torproject.org/anarcat/test/container_registry/4
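If you do not have an image yet, here is a minimal sketch of building one with the destination name baked in, assuming a Containerfile in the current directory and the same anarcat/test project as above:
# build and tag the image directly with its destination name in the registry
podman build -t containers.torproject.org/anarcat/test/airsonic-test .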
Email interactions
You can interact with GitLab by email too.
Creating a new issue
Clicking on the project issues gives a link at the bottom of the page which says "Email a new issue to this project".
That link should go into the "To" field of your email. The email subject becomes the title of the issue and the body the description. You can use shortcuts in the body, like /assign @foo, /estimate 1d, etc.
See the upstream docs for more details.
Commenting on an issue
If you just reply to the particular comment notification you received by email, as you would reply to an email in a thread, that comment will show up in the issue.
You need to have email notifications enabled for this to work, naturally.
You can also add a new comment to any issue by copy-pasting the issue-specific email address in the right sidebar (labeled "Issue email", introduced in GitLab 13.8).
This also works with shortcuts like /estimate 1d or /spend -1h. Note that for those you won't get notification emails back, while for others like /assign @foo you would.
See the upstream docs for more details.
Quick status updates by email
There are a bunch of quick actions available which are handy to update an issue. As mentioned above, they can be sent by email as well, either within a comment (be it a reply to a previous one or a new one) or on their own. So, for example, if you want to update the amount of time spent on ticket $foo by one hour, find any notification email for that issue and reply to it, replacing any quoted text with /spend 1h.
How to migrate a Git repository from legacy to GitLab?
See the git documentation for this procedure.
How to mirror a Git repository from legacy to GitLab?
See the git documentation for this procedure.
How to mirror a Git repository from GitLab to GitHub
Some repositories are mirrored to the torproject organization on GitHub. This section explains how that works and how to create a new mirror from GitLab. In this example, we're going to mirror the tor browser manual.
- head to the "Mirroring repositories" section of the settings/repository part of the project
- as a Git repository URL, enter: ssh://git@github.com/torproject/manual.git
- click "detect host keys"
- choose "SSH" as the "Authentication method"
- don't check any of the boxes, click "Mirror repository"
- the page will reload and show the mirror in the list of "Mirrored repositories"; click the little "paperclip" icon which says "Copy SSH public key"
- head over to the settings/keys section of the target GitHub project and click "Add deploy key", filling in:
  Title: https://gitlab.torproject.org/tpo/web/manual mirror key
  Key: <paste public key here>
- check the "Allow write access" checkbox and click "Add key"
- back in the "Mirroring repositories" section of the GitLab project, click the "Update now" button represented by circling arrows
If there is an error, it will show up as a little red "Error" button. Hovering your mouse over the button will show you the error.
If you want to retry the "Update now" button, you need to let the update interval pass (1 minute for protected branch mirroring, 5 minutes for all branches), otherwise it will have no effect.
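The mirror status can also be checked from the command line through the GitLab API, for example with something like this (a sketch; PROJECT_ID and GITLAB_PRIVATE_TOKEN are placeholders for the numeric project ID and a personal access token with the api scope):
# list the push mirrors configured on a project, with their last error if any
curl -s --header "PRIVATE-TOKEN: $GITLAB_PRIVATE_TOKEN" \
  "https://gitlab.torproject.org/api/v4/projects/$PROJECT_ID/remote_mirrors" \
  | jq '.[] | {url, update_status, last_error}'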
How to find the right emoji?
It's possible to add "reaction emojis" to comments and issues and merge requests in GitLab. Just hit the little smiley face and a dialog will pop up. You can then browse through the list and pick the right emoji for how you feel about the comment, but remember to be nice!
It's possible you get lost in the list. You can type the name of the emoji to restrict your search, but be warned that some emojis have particular, non-standard names that might not be immediately obvious. For example, 🎉 (U+1F389 PARTY POPPER) is found as tada in the list! See this upstream issue for more details.
Publishing notifications on IRC
By default, new projects do not have notifications set up in #tor-bots like all the others. To do this, you need to configure a "Webhook", in the Settings -> Webhooks section of the project. The URL should be:
https://kgb-bot.torproject.org/webhook/
... and you should select the notifications you wish to see in #tor-bots. You can also enable notifications to other channels by adding more parameters to the URL, like (say) ?channel=tor-foo.
Important note: do not try to put the # in the channel name, or if you do, URL-encode it (e.g. like %23tor-foo), otherwise this will silently fail to change the target channel.
Other parameters are documented in the KGB documentation. In particular, you might want to use private=yes;channel=tor-foo if you do not want the bot to send notifications to #tor-bots, which it also does by default.
IMPORTANT: Again, even if you tell the bot to send a notification to the channel #tor-foo, the bot still defaults to also sending to #tor-bots, unless you use that private flag above. Be careful not to accidentally leak sensitive information to a public channel, and test with a dummy repository if you are unsure.
The KGB bot can also send notifications to channels that require a password.
In the /etc/kgb.conf configuration file, add a secret to a channel so the bot can access a password-protected channel. For example:
channels:
  - name: '#super-secret-channel'
    network: 'MyNetwork'
    secret: 'ThePasswordIsPassw0rd'
    repos:
      - SecretRepo
Note: support for channel passwords is not implemented in the upstream KGB bot. There's an open merge request for it and the patch has been applied to TPA's KGB install, but new installs will need to manually apply that patch.
Note that GitLab admins might be able to configure system-wide hooks in the admin section, although it's not entirely clear how those relate to the per-project hooks, so those have not been enabled. Furthermore, it is possible for GitLab admins with root access to enable webhooks on all projects, with the webhook rake task. For example, running this on the GitLab server (currently gitlab-02) will enable the above hook on all repositories:
sudo gitlab-rake gitlab:web_hook:add URL='https://kgb-bot.torproject.org/webhook/'
Note that by default, the rake task only enables Push events. You need the following patch to enable others:
modified lib/tasks/gitlab/web_hook.rake
@@ -10,7 +10,19 @@ namespace :gitlab do
puts "Adding webhook '#{web_hook_url}' to:"
projects.find_each(batch_size: 1000) do |project|
print "- #{project.name} ... "
- web_hook = project.hooks.new(url: web_hook_url)
+ web_hook = project.hooks.new(
+ url: web_hook_url,
+ push_events: true,
+ issues_events: true,
+ confidential_issues_events: false,
+ merge_requests_events: true,
+ tag_push_events: true,
+ note_events: true,
+ confidential_note_events: false,
+ job_events: true,
+ pipeline_events: true,
+ wiki_page_events: true,
+ )
if web_hook.save
puts "added".color(:green)
else
See also the upstream issue and our GitLab issue 7 for details.
You can also remove a given hook from all repos with:
sudo gitlab-rake gitlab:web_hook:rm URL='https://kgb-bot.torproject.org/webhook/'
And, finally, list all hooks with:
sudo gitlab-rake gitlab:web_hook:list
The hook needs a secret token to be operational. This secret is stored in Puppet's Trocla database as profile::kgb_bot::gitlab_token:
trocla get profile::kgb_bot::gitlab_token plain
That is configured in profile::kgb_bot in case that is not working.
Note that if you have a valid personal access token, you can manage the hooks with the gitlab-hooks.py script from the gitlab-tools repository. For example, this created a webhook for the deb.torproject.org-keyring project:
export HTTP_KGB_TOKEN=$(ssh root@puppet.torproject.org trocla get profile::kgb_bot::gitlab_token plain)
./gitlab-hooks.py -p tpo/tpa/debian/deb.torproject.org-keyring create --no-releases-events --merge-requests-events --issues-events --push-events --url https://kgb-bot.torproject.org/webhook/?channel=tor-admin
Setting up two-factor authentication (2FA)
We strongly recommend you enable two-factor authentication on GitLab. This is well documented in the GitLab manual, but basically:
- first, pick a 2FA "app" (and optionally a hardware token) if you don't have one already
- head to your account settings
- register your 2FA app and save the recovery codes somewhere. if you need to enter a URL by hand, you can scan the qrcode with your phone or create one by following this format:
  otpauth://totp/$ACCOUNT?secret=$KEY&issuer=gitlab.torproject.org
  where...
  - $ACCOUNT is the Account field in the 2FA form
  - $KEY is the Key field in the 2FA form, without spaces
- register the 2FA hardware token if available
GitLab requires a 2FA "app" even if you intend to use a hardware token. The 2FA "app" must implement the TOTP protocol, for example the Google Authenticator or a free alternative (for example free OTP plus, see also this list from the Nextcloud project). The hardware token must implement the U2F protocol, which is supported by security tokens like the YubiKey, Nitrokey, or similar.
Deleting sensitive attachments
If a user uploaded a secret attachment by mistake, just deleting the issue is not sufficient: it turns out that doesn't remove the attachments from disk!
To fix this, ask a sysadmin to find the file in the /var/opt/gitlab/gitlab-rails/uploads/ directory. Assuming the attachment URL is:
https://gitlab.torproject.org/anarcat/test/uploads/7dca7746b5576f6c6ec34bb62200ba3a/openvpn_5.png
There should be a "hashed" directory and a hashed filename in there, which looks something like:
./@hashed/08/5b/085b2a38876eeddc33e3fbf612912d3d52a45c37cee95cf42cd3099d0a3fd8cb/7dca7746b5576f6c6ec34bb62200ba3a/openvpn_5.png
The second directory (7dca7746b5576f6c6ec34bb62200ba3a above) is the one visible in the attachment URL. The last part is the actual attachment filename, but since those can overlap between issues, it's safer to look for the hash. So to find the above attachment, you should use:
find /var/opt/gitlab/gitlab-rails/uploads/ -name 7dca7746b5576f6c6ec34bb62200ba3a
And delete the file in there. The following should do the trick:
find /var/opt/gitlab/gitlab-rails/uploads/ -name 7dca7746b5576f6c6ec34bb62200ba3a | sed 's/^/rm /' > delete.sh
Verify delete.sh and run it if happy.
Note that GitLab is working on an attachment manager that should allow web operators to delete old files, but it's unclear how or when this will be implemented, if ever.
Publishing GitLab pages
GitLab features a way to publish websites directly from the continuous integration pipelines, called GitLab pages. Complete documentation on how to publish such pages is better served by the official documentation, but creating a .gitlab-ci.yml should get you rolling. For example, this will publish a hugo site:
image: registry.gitlab.com/pages/hugo/hugo_extended:0.65.3
pages:
  script:
    - hugo
  artifacts:
    paths:
      - public
  only:
    - main
If .gitlab-ci.yml already contains a job in the build stage that generates the required artifacts in the public directory, then including the pages-deploy.yml CI template should be sufficient:
include:
  - project: tpo/tpa/ci-templates
    file: pages-deploy.yml
GitLab pages are published under the *.pages.torproject.org wildcard domain. There are two types of projects hosted at the TPO GitLab: sub-group projects, usually under the tpo/ super-group, and user projects, for example anarcat/myproject. You can also publish a page specifically for a user. The URLs will look something like this:
| Type of GitLab page | Name of the project created in GitLab | Website URL |
|---|---|---|
| User pages | username.pages.torproject.net | https://username.pages.torproject.net |
| User projects | user/projectname | https://username.pages.torproject.net/projectname |
| Group projects | tpo/group/projectname | https://tpo.pages.torproject.net/group/projectname |
Accepting merge requests on wikis
Wiki permissions are not great, but there's a workaround: accept merge requests for a git replica of the wiki.
This documentation was moved to the documentation section.
Renaming a branch globally
While git supports renaming branches locally with the git branch --move $to_name command, this doesn't actually rename the remote branch. That process is more involved.
Changing the name of a default branch both locally and on remotes can be partially automated with the use of anarcat's branch rename script. The script basically renames the branch locally, pushes the new branch and deletes the old one, with special handling of GitLab remotes, where it "un-protects" and "re-protects" the branch.
You should run the script with an account that has "Maintainer" or "Owner" access to GitLab, so that it can do the above GitLab API changes. You will then need to provide an access token through the GITLAB_PRIVATE_TOKEN environment variable, which should have the api scope.
So, for example, this will rename the master branch to main on the local and remote repositories:
GITLAB_PRIVATE_TOKEN=REDACTED git-branch-rename-remote
If you want to rename another branch or remote, you can specify those on the commandline as well. For example, this will rename the develop branch to dev on the gitlab remote:
GITLAB_PRIVATE_TOKEN=REDACTED git-branch-rename-remote --remote gitlab --from-branch develop --to-branch dev
The command can also be used to fix other repositories so that they correctly rename their local branch too. In that case, the GitLab repository is already up to date, so there is no need for an access token.
Other users can then just run this command, which will rename master to main on the local repository, including remote tracking branches:
git-branch-rename-remote
Obviously, users without any extra data in their local repository can just destroy their local repository and clone a new one to get the correct configuration.
Keep in mind that there may be a few extra steps and considerations to make when changing the name of a heavily used branch, detailed below.
Modifying open Merge Requests
A merge request that is open against the modified branch may be bricked as a result of deleting the old branch name from the GitLab remote. To avoid this, after creating and pushing the new branch name, edit each merge request to target the new branch name before deleting the old branch.
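This can be done by hand in the web interface or, if there are many merge requests, scripted through the API, for example with a sketch like this (PROJECT_ID and GITLAB_PRIVATE_TOKEN are placeholders, and a master to main rename is assumed as above):
# retarget all open merge requests from master to main
curl -s --header "PRIVATE-TOKEN: $GITLAB_PRIVATE_TOKEN" \
  "https://gitlab.torproject.org/api/v4/projects/$PROJECT_ID/merge_requests?state=opened&target_branch=master" \
  | jq -r '.[].iid' | while read -r iid; do
    curl -s --request PUT --header "PRIVATE-TOKEN: $GITLAB_PRIVATE_TOKEN" \
      --data "target_branch=main" \
      "https://gitlab.torproject.org/api/v4/projects/$PROJECT_ID/merge_requests/$iid" >/dev/null
done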
Updating gitolite
Many GitLab repositories are mirrored or maintained manually on Gitolite (git-rw.torproject.org) and Gitweb. The ssh step for the above automation script will fail for Gitolite and these steps need to be done manually by a sysadmin. Open a TPA ticket with a list of the Gitolite repositories you would like to update and a sysadmin will perform the following magic:
cd /srv/git.torproject.org/repositories/
for repo in $list; do
git -C "$repo" symbolic-ref HEAD refs/heads/$to_branch
done
This will update Gitolite, but it won't update Gitweb until the repositories have been pushed to. To update Gitweb immediately, ask your friendly sysadmin to run the above command on the Gitweb server as well.
Updating Transifex
If your repository relies on Transifex for translations, make sure to update the Transifex config to pull from the new branch. To do so, open a l10n ticket with the new branch name changes.
Find the project associated with a project ID
Sometimes you'll find a numeric project ID instead of a human-readable one. For example, you can see on the arti project that it says:
Project ID: 647
So you can easily find the project ID of a project right on the project's front page. But what if you only have the ID and need to find what project it represents? You can talk with the API, with a URL like:
https://gitlab.torproject.org/api/v4/projects/<PROJECT_ID>
For example, this is how I found the above arti project from the Project ID 647:
$ curl -s 'https://gitlab.torproject.org/api/v4/projects/647' | jq .web_url
"https://gitlab.torproject.org/tpo/core/arti"
Find the project associated with a hashed repository name
Git repositories are not stored under the project name in GitLab anymore, but under a hash of the project ID. The easiest way to get to the project URL from a hash is through the rails console, for example:
sudo gitlab-rails console
then:
ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project
... will return the project object. You probably want the path_with_namespace from there:
ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project.path_with_namespace
You can chain those in the console to display multiple repos:
['@hashed/e0/b0/e0b08ad65f5b6f6b75d18c8642a041ca1160609af1b7dfc55ab7f2d293fd8758',
'@hashed/f1/5a/f15a3a5d34619f23d79d4124224e69f757a36d8ffb90aa7c17bf085ceb6cd53a',
'@hashed/09/dc/09dc1bb2b25a72c6a5deecbd211750ba6f81b0bd809a2475eefcad2c11ab9091',
'@hashed/a0/bd/a0bd94956b9f42cde97b95b10ad65bbaf2a8d87142caf819e4c099ed75126d72',
'@hashed/32/71/32718321fcedd1bcfbef86cac61aa50938668428fddd0e5810c97b3574f2e070',
'@hashed/7d/a0/7da08b799010a8dd3e6071ef53cd8f52049187881fbb381b6dfe33bba5a8f8f0',
'@hashed/26/c1/26c151f9669f97e9117673c9283843f75cab75cf338c189234dd048f08343e69',
'@hashed/92/b6/92b690fedfae7ea8024eb6ea6d53f64cd0a4d20e44acf71417dca4f0d28f5c74',
'@hashed/ff/49/ff49a4f6ed54f15fa0954b265ad056a6f0fdab175ac8a1c3eb0a98a38e46da3d',
'@hashed/9a/0d/9a0d49266d4f5e24ff7841a16012f3edab7668657ccaee858e0d55b97d5b8f9a',
'@hashed/95/9d/959daad7593e37c5ab21d4b54173deb4a203f4071db42803fde47ecba3f0edcd'].each do |hash| print( ProjectRepository.find_by(disk_path: hash).project.path_with_namespace, "\n") end
Finally, you can also generate a rainbow table of all possible hashes to get the project ID, and from there, find the project using the API above. Here's a Python blob that will generate a hash for every project ID up to 2000:
import hashlib
for i in range(2000):
    h = hashlib.sha256()
    h.update(str(i).encode('ascii'))
    print(i, h.hexdigest())
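Alternatively, a small shell sketch can do the same brute-force lookup and query the API directly (again assuming the project ID is below 2000):
# find the project ID matching a given @hashed directory name, then ask the API which project it is
hash=b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9  # example hash from above
for i in $(seq 1 2000); do
  if [ "$(printf %s "$i" | sha256sum | cut -d' ' -f1)" = "$hash" ]; then
    curl -s "https://gitlab.torproject.org/api/v4/projects/$i" | jq .web_url
    break
  fi
done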
Connect to the PostgreSQL server
We previously had instructions on how to connect to the GitLab Omnibus PostgreSQL server, with the upstream instructions but this is now deprecated. Normal PostgreSQL procedures should just work, like:
sudo -u postgres psql
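For example, to check the size of the GitLab database (named gitlabhq_production in our install), this should work:
sudo -u postgres psql -c "SELECT pg_size_pretty(pg_database_size('gitlabhq_production'));"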
Pager playbook
- Grafana Dashboards:
TODO: document how to handle common problems in GitLab
Troubleshooting
Upstream recommends running this command to self-test a GitLab instance:
sudo gitlab-rake gitlab:check SANITIZE=true
This command also shows general info about the GitLab instance:
sudo gitlab-rake gitlab:env:info
It is especially useful to find on-disk files and package versions.
Filtering through json logs
The most useful log to look into when trying to identify errors or traffic patterns is /var/log/gitlab/gitlab-rails/production_json.log. It shows all of the activity on the web interface.
Since the file is formatted in JSON, you need to use jq to filter through it. Here are some useful examples that you can build upon for your search:
To find requests that got a server error (e.g. 500 http status code) response:
jq 'select(.status==500)' production_json.log
To get lines only from a defined period of time:
jq --arg s '2024-07-16T07:10:00' --arg e '2024-07-16T07:19:59' 'select(.time | . >= $s and . <= $e + "z")' production_json.log
To identify the individual IP addresses with the highest number of requests for the day:
jq -rC '.remote_ip' production_json.log | sort | uniq -c | sort -n | tail -10
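To find the slowest requests, a filter like this should also work (a sketch, assuming the duration_s field shipped in recent GitLab versions):
# show the 10 slowest requests with their duration in seconds, method and path
jq -r 'select(.duration_s != null) | "\(.duration_s) \(.method) \(.path)"' production_json.log | sort -rn | head -10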
GitLab pages not found
If you're looking for a way to track GitLab pages errors, know that the webserver logs are in /var/log/nginx/gitlab_pages_access, but that only proxies requests for the GitLab Pages engine, whose (JSON!) logs live in /var/log/gitlab/gitlab-pages/current.
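Since those Pages logs are JSON as well, jq can filter them too, for example with something like this (a sketch, assuming the usual level field in those logs):
# show only error-level entries from the GitLab Pages engine log
jq -c 'select(.level == "error")' /var/log/gitlab/gitlab-pages/current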
If you get a "error":"domain does not exist"
problem, make sure the
entire pipeline actually succeeds. Typically, the "pages:deploy" job
can fail with:
Artifacts for pages are too large
In that case, you need to go into the Admin Area -> Settings -> Preferences -> Pages and bump the size limit. It defaults to 100MB and we bumped it to 1024MB at the time of writing. Note that GitLab CI/CD also has a similar setting which might (or might not?) affect such problems.
PostgreSQL debugging
The PostgreSQL configuration in GitLab used to be particular, but you should now follow our normal PostgreSQL procedures.
Disk full on GitLab server
If the main GitLab server is running out of space (as opposed to runners, see Runner disk fills up for that scenario), then it's projects that are taking up space. We've typically had trouble with artifacts taking up space, for example (team#40615 (closed), team#40517 (closed)).
You can see the largest disk users in the GitLab admin area in Overview -> Projects -> Sort by: Largest repository.
Note that, although it's unlikely, it's technically possible that an archived project takes up space, so make sure you check the "Show archived projects" option in the "Sort by" drop down.
In the past, we have worked around that problem by reducing the default artifact retention period from 4 to 2 weeks (team#40516 (closed)), but that obviously does not take effect immediately. More recently, we have tried to tweak individual projects' retention policies and scheduling strategies (details in team#40615 (closed)).
Please be aware of the known upstream issues that affect those diagnostics as well.
To obtain a list of projects sorted by space usage, log on to GitLab using an account with administrative privileges and open the Projects page sorted by Largest repository. The total space consumed by each project is displayed, and clicking on a specific project shows a breakdown of how this space is consumed by different components of the project (repository, LFS, CI artifacts, etc.).
If a project is consuming an unexpected amount of space for artifacts, the scripts from the tpo/tpa/gitlab-tools project can be utilized to obtain a breakdown of the space used by job logs and artifacts, per job or per pipeline. These scripts can also be used to manually remove such data, see the gitlab-tools README. Additional guidance regarding job artifacts can be found on the Job artifacts using too much space upstream documentation page.
It's also possible to compile some CI artifact usage statistics directly on the GitLab server. To see if expiration policies work (or if "kept" artifacts or old job.log files are a problem), use this command (which takes a while to run):
find -mtime +14 -print0 | du --files0-from=- -c -h | tee find-mtime+14-du.log
To limit this to job.log, of course, you can do:
find -name "job.log" -mtime +14 -print0 | du --files0-from=- -c -h | tee find-mtime+14-joblog-du.log
If we ran out of space on the object storage because of the GitLab registry, consider purging untagged manifests by tweaking the cron job defined in profile::gitlab::app in Puppet.
Incoming email routing
Incoming email may sometimes still get routed through mx-dal-01, but generally gets delivered directly to the Postfix server on gitlab-02, and from there, to a dovecot mailbox. You can use postfix-trace to confirm the message correctly ended up there.
Normally, GitLab should be picking mails from the mailbox (/srv/mail/git@gitlab.torproject.org/Maildir/) regularly, and deleting them when done. If that is not happening, look at the mailroom logs:
tail -f /var/log/gitlab/mailroom/mail_room_json.log | jq -c
A working run will look something like this:
{"severity":"INFO","time":"2022-08-29T20:15:57.734+00:00","context":{"email":"git@gitlab.torproject.org","name":"inbox"},"action":"Processing started"}
{"severity":"INFO","time":"2022-08-29T20:15:57.734+00:00","context":{"email":"git@gitlab.torproject.org","name":"inbox"},"uid":7788,"action":"asking arbiter to deliver","arbitrator":"MailRoom::Arbitration::Redis"}.734+00:00","context":{"email":"git@gitlab.torproject.org","name":"inbox"},"action":"Getting new messages","unread":{"count":1,"ids":[7788]},"to_be_delivered":{"count":1,"ids":[7788]}}ext":{"email":"git@gitlab.torproject.org","name":"inbox"},"uid":7788,"action":"sending to deliverer","deliverer":"MailRoom::Delivery::Sidekiq","byte_size":4162}","delivery_method":"Sidekiq","action":"message pushed"}
{"severity":"INFO","time":"2022-08-29T20:15:57.744+00:00","context":{"email":"git@gitlab.torproject.org","name":"inbox"},"action":"Processing started"}
{"severity":"INFO","time":"2022-08-29T20:15:57.744+00:00","context":{"email":"git@gitlab.torproject.org","name":"inbox"},"action":"Getting new messages","unread":{"count":0,"ids":[]},"to_be_delivered":{"count":0,"ids":[]}}0","context":{"email":"git@gitlab.torproject.org","name":"inbox"},"action":"Idling"}
Emails should be processed every minute or so. If they are not, the mailroom process might have crashed; you can see if it's running with:
gitlab-ctl status mailroom
Example running properly:
root@gitlab-02:~# gitlab-ctl status mailroom
run: mailroom: (pid 3611591) 247s; run: log: (pid 2993172) 370149s
Example stopped:
root@gitlab-02:~# gitlab-ctl status mailroom
finish: mailroom: (pid 3603300) 5s; run: log: (pid 2993172) 369429s
Startup failures do not show up in the JSON log file, but instead in another logfile, see:
tail -f /var/log/gitlab/mailroom/current
If you see a crash, it might be worth looking for an upstream regression, also look in omnibus-gitlab.
Outgoing email
Follow the email not sent procedure. TL;DR:
sudo gitlab-rails console
(Yes it takes forever.) Then check if the settings are sane:
--------------------------------------------------------------------------------
Ruby: ruby 3.0.5p211 (2022-11-24 revision ba5cf0f7c5) [x86_64-linux]
GitLab: 15.10.0 (496a1d765be) FOSS
GitLab Shell: 14.18.0
PostgreSQL: 12.12
------------------------------------------------------------[ booted in 28.31s ]
Loading production environment (Rails 6.1.7.2)
irb(main):003:0> ActionMailer::Base.delivery_method
=> :smtp
irb(main):004:0> ActionMailer::Base.smtp_settings
=>
{:user_name=>nil,
:password=>nil,
:address=>"localhost",
:port=>25,
:domain=>"localhost",
:enable_starttls_auto=>false,
:tls=>false,
:ssl=>false,
:openssl_verify_mode=>"none",
:ca_file=>"/opt/gitlab/embedded/ssl/certs/cacert.pem"}
Then test an email delivery:
Notify.test_email('noreply@torproject.org', 'Hello World', 'This is a test message').deliver_now
A working delivery will look something like this, with the last line in green:
irb(main):001:0> Notify.test_email('noreply@torproject.org', 'Hello World', 'This is a test message').deliver_now
Delivered mail 64219bdb6e919_10e66548d042948@gitlab-02.mail (20.1ms)
=> #<Mail::Message:296420, Multipart: false, Headers: <Date: Mon, 27 Mar 2023 13:36:27 +0000>, <From: GitLab <git@gitlab.torproject.org>>, <Reply-To: GitLab <noreply@torproject.org>>, <To: noreply@torproject.org>, <Message-ID: <64219bdb6e919_10e66548d042948@gitlab-02.mail>>, <Subject: Hello World>, <Mime-Version: 1.0>, <Content-Type: text/html; charset=UTF-8>, <Content-Transfer-Encoding: 7bit>, <Auto-Submitted: auto-generated>, <X-Auto-Response-Suppress: All>>
A failed delivery will also say Delivered mail but will include an error message as well. For example, in issue 139 we had this error:
irb(main):006:0> Notify.test_email('noreply@torproject.org', 'Hello World', 'This is a test message').deliver_now
Delivered mail 641c797273ba1_86be948d03829@gitlab-02.mail (7.2ms)
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/net-protocol-0.1.3/lib/net/protocol.rb:46:in `connect_nonblock': SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain) (OpenSSL::SSL::SSLError)
Gitlab registry troubleshooting
If something goes wrong with the GitLab Registry feature, you should first look at the logs in:
tail -f /var/log/gitlab/registry/current /var/log/gitlab/nginx/gitlab_registry_*.log /var/log/gitlab/gitlab-rails/production.log
The first one might be the one with more relevant information, but is the hardest to parse, as it's this weird "date {JSONBLOB}" format that no human or machine can parse.
You can restart just the registry with:
gitlab-ctl restart registry
A misconfiguration of the object storage backend will look like this when uploading a container:
Error: trying to reuse blob sha256:61581d479298c795fa3cfe95419a5cec510085ec0d040306f69e491a598e7707 at destination: pinging container registry containers.torproject.org: invalid status code from registry 503 (Service Unavailable)
The registry logs might have something like this:
2023-07-18_21:45:26.21751 time="2023-07-18T21:45:26.217Z" level=info msg="router info" config_http_addr="127.0.0.1:5000" config_http_host= config_http_net= config_http_prefix= config_http_relative_urls=true correlation_id=01H5NFE6E94A566P4EZG2ZMFMT go_version=go1.19.8 method=HEAD path="/v2/anarcat/test/blobs/sha256:61581d479298c795fa3cfe95419a5cec510085ec0d040306f69e491a598e7707" root_repo=anarcat router=gorilla/mux vars_digest="sha256:61581d479298c795fa3cfe95419a5cec510085ec0d040306f69e491a598e7707" vars_name=anarcat/test version=v3.76.0-gitlab
2023-07-18_21:45:26.21774 time="2023-07-18T21:45:26.217Z" level=info msg="authorized request" auth_project_paths="[anarcat/test]" auth_user_name=anarcat auth_user_type=personal_access_token correlation_id=01H5NFE6E94A566P4EZG2ZMFMT go_version=go1.19.8 root_repo=anarcat vars_digest="sha256:61581d479298c795fa3cfe95419a5cec510085ec0d040306f69e491a598e7707" vars_name=anarcat/test version=v3.76.0-gitlab
2023-07-18_21:45:26.30401 time="2023-07-18T21:45:26.303Z" level=error msg="unknown error" auth_project_paths="[anarcat/test]" auth_user_name=anarcat auth_user_type=personal_access_token code=UNKNOWN correlation_id=01H5NFE6CZBE49BZ6KBK4EHSJ1 detail="SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.\n\tstatus code: 403, request id: 17731468F69A0F79, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8" error="unknown: unknown error" go_version=go1.19.8 host=containers.torproject.org method=HEAD remote_addr=64.18.183.94 root_repo=anarcat uri="/v2/anarcat/test/blobs/sha256:a55f9a4279c12800590169f7782b956e5c06ec88ec99c020dd111a7a1dcc7eac" user_agent="containers/5.23.1 (github.com/containers/image)" vars_digest="sha256:a55f9
If you suspect the object storage backend to be the problem, you should try to communicate with the MinIO server by configuring the rclone client on the GitLab server and trying to manipulate the server. Look for the access token in /etc/gitlab/gitlab.rb and use it to configure rclone like this:
rclone config create minio s3 provider Minio endpoint https://minio.torproject.org:9000/ region dallas access_key_id gitlab-registry secret_access_key REDACTED
Then you can list the registry bucket:
rclone ls minio:gitlab-registry/
See how to Use rclone as an object storage client for more ideas.
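A quick way to check how much space the registry bucket uses, with the same minio remote configured above:
rclone size minio:gitlab-registry/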
The above may reproduce the error from the registry mentioned earlier:
SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.
That is either due to an incorrect access key or bucket. An error that was made during the original setup was to treat gitlab/registry as a bucket, while it's a subdirectory... This was fixed by switching to gitlab-registry as a bucket name. Another error we had was to use endpoint instead of regionendpoint.
Another tweak that was done was to set a region in MinIO. Before the right region was set and matching in the configuration, we had this error in the registry logs:
2023-07-18_21:04:57.46099 time="2023-07-18T21:04:57.460Z" level=fatal msg="configuring application: 1 error occurred:\n\t* validating region provided: dallas\n\n"
As a last resort, you can revert back to the filesystem storage by commenting out the storage => { ... 's3' ... } block in profile::gitlab::app and adding a line in the gitlab_rails block like:
registry_path => '/var/opt/gitlab/gitlab-rails/shared/registry',
Note that this is a risky operation, as you might end up with a "split brain" where some images are on the filesystem, and some on object storage. Warning users with maintenance announcement on the GitLab site might be wise.
In the same section, you can disable the registry by default on all projects with:
gitlab_default_projects_features_container_registry => false,
... or disable it site-wide with:
registry => {
enable => false
# [...]
}
Note that the registry configuration is stored inside the Docker Registry config.yaml file as a single line that looks like JSON. You may think it's garbled and the reason why things don't work, but it isn't: that is valid YAML, just harder to parse. Blame gitlab-ctl's Chef cookbook for that... A non-mangled version of the working config would look like:
storage:
  s3:
    accesskey: gitlab-registry
    secretkey: REDACTED
    region: dallas
    regionendpoint: https://minio.torproject.org:9000/
    bucket: gitlab-registry
Another option that was explored while setting up the registry is enabling the debug server.
HTTP 500 Internal Server Error
If pushing an image to the registry fails with an HTTP 500 error, it's possible one of the image's layers is too large and exceeds the Nginx buffer. This can be confirmed by looking in /var/log/gitlab/nginx/gitlab_registry_error.log:
2024/08/07 14:10:58 [crit] 1014#1014: *47617170 pwritev() "/run/nginx/client_body_temp/0000090449" has written only 110540 of 131040, client: [REDACTED], server: containers.torproject.org, request: "PATCH /v2/lavamind/ci-test/torbrowser/blobs/uploads/df0ee99b-34cb-4cb7-81d7-232640881f8f?_state=HMvhiHqiYoFBC6mZ_cc9AnjSKkQKvAx6sZtKCPSGVZ97Ik5hbWUiOiJsYXZhbWluZC9jaS10ZXN0L3RvcmJyb3dzZXIiLCJVVUlEIjoiZGYwZWU5OWItMzRjYi00Y2I3LTgxZDctMjMyNjQwODgxZjhmIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA4LTA3VDEzOjU5OjQ0Ljk2MTYzNjg5NVoifQ%3D%3D HTTP/1.1", host: "containers.torproject.org"
This happens because Nginx buffers such uploads under /run, which is a tmpfs with a default size of 10% of the server's total memory. Possible solutions include increasing the size of the tmpfs, or disabling buffering (but this is untested and might not work).
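As a stopgap, the tmpfs can be grown on the fly with something like this (a sketch, untested here; a permanent change should rather go through Puppet):
# temporarily grow the tmpfs backing /run to 2GB; this does not survive a reboot
mount -o remount,size=2G /run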
HTTP 502 Bad Gateway
If such an error occurs when pushing an image that takes a long time (eg. because of a slow uplink) it's possible the authorization token lifetime limit is being exceeded.
By default the token lifetime is 5 minutes. This setting can be changed via the GitLab admin web interface, in the Container registry configuration section.
Disaster recovery
In case the entire GitLab machine is destroyed, a new server should be provisioned in the howto/ganeti cluster (or elsewhere) and backups should be restored using the procedure below.
Running an emergency backup
A full backup can be run as root with:
/usr/bin/gitlab-rake gitlab:backup:create
Backups are stored as a tar file in /srv/gitlab-backup and do not include secrets, which are backed up separately, for example with:
umask 0077 && tar -C /var/opt/gitlab -czf /srv/gitlab-backup/config_backup$(date +"\%Y\%m\%dT\%H\%M").tar.gz
See /etc/cron.d/gitlab-config-backup, and the gitlab::backup and profile::gitlab::app classes for the actual jobs that run nightly.
Recovering this wiki from backups
If you need to immediately restore the wiki from backups, you can head to the backup server and restore the directory:
/var/opt/gitlab/git-data/repositories/@hashed/11/f8/11f8e31ccbdbb7d91589ecf40713d3a8a5d17a7ec0cebf641f975af50a1eba8d.git
The hash above is the SHA256 checksum of the wiki-replica project id (695):
$ printf 695 | sha256sum
11f8e31ccbdbb7d91589ecf40713d3a8a5d17a7ec0cebf641f975af50a1eba8d -
On the backup server, that would be something like:
bconsole
restore
5
46
cd /var/opt/gitlab/git-data/repositories/@hashed/11/f8
mark 11f8e31ccbdbb7d91589ecf40713d3a8a5d17a7ec0cebf641f975af50a1eba8d.git
done
yes
The files will end up in /var/tmp/bacula-restore on gitlab-02. Note that the number 46, above, will vary according to other servers backed up on the backup server, of course.
This should give you a copy of the git repository, which you can then use, presumably, to read this procedure and restore the rest of GitLab.
(Although then, how did you read this part of the procedure? Anyways, I thought this could save your future self one day. You'll thank me later.)
Restoring from backups
The upstream documentation has a fairly good restore procedure, but because our backup procedure is non-standard -- we exclude repositories and artifacts, for example -- you should follow this procedure instead.
TODO: note that this procedure was written before upstream reorganized their documentation to create a dedicated migration manual that is similar to this procedure. The following procedure should be reviewed and possibly updated in comparison.
Note that the procedure assumes some familiarity with the general backup and restore procedures, particularly how to restore a bunch of files from the backup server (see the restore files section).
This entire procedure will take many hours to complete. In our tests, it took:
- an hour or two to setup a VM
- less than an hour to do a basic GitLab install
- 20 minutes to restore the basic system (database, tickets are visible at this point)
- an hour to restore repositories
- another hour to restore artifacts
This gives a time to recovery of about 5 to 6 hours. Most of that time is spent waiting for files to be copied, interspersed with a few manual commands.
So here's the procedure that was followed to deploy a development server, from backups, in tpo/tpa/team#40820 (run everything as root):
- install GitLab using Puppet: basically create a server large enough for everything, apply the role::gitlab Puppet role. That includes creating new certificates and DNS records, if not already present (those may be different if you are creating a dev server from backups, for example, which was the case for the above ticket). Also note that you need to install the same GitLab version as the one from the backup. If you are unsure of the GitLab version that's in the backup (bad day uh?), try to restore the /var/opt/gitlab/gitlab-rails/VERSION file from the backup server first.
- at this point, a blank GitLab installation should be running. Verify that you can reach the login page, possibly trying to login with the root account, because a working GitLab installation is a pre-requisite for the rest of the restore procedure.
(it might be technically possible to restore the entire server from scratch using only the backup server, but that procedure has not been established or tested.)
- on the backup server (currently bacula-director-01), restore the latest GitLab backup job from the /srv/gitlab-backup directory and the secrets from /etc/gitlab:
  # bconsole
  *restore
  To select the JobIds, you have the following choices:
  [...]
  5: Select the most recent backup for a client
  [...]
  Select item: (1-13): 5
  Defined Clients:
  [...]
  47: gitlab-02.torproject.org-fd
  [...]
  Select the Client (1-98): 47
  Automatically selected FileSet: Standard Set
  [...]
  Building directory tree for JobId(s) 199535,199637,199738,199847,199951 ...
  ++++++++++++++++++++++++++++++++
  596,949 files inserted into the tree.
  [...]
  cwd is: /
  $ cd /etc
  cwd is: /etc/
  $ mark gitlab
  84 files marked.
  $ cd /srv
  cwd is: /srv/
  $ mark gitlab-backup
  12 files marked.
  $ done
This took about 20 minutes in a simulation done in June 2022, including 5 minutes to load the file list.
- move the files in place and fix ownership, possibly moving pre-existing backups out of place, if the new server has been running for more than 24 hours:
  mkdir /srv/gitlab-backup.blank
  mv /srv/gitlab-backup/* /srv/gitlab-backup.blank
  cd /var/tmp/bacula-restores/srv/gitlab-backup
  mv *.tar.gz backup_information.yml db /srv/gitlab-backup/
  cd /srv/gitlab-backup/
  chown git:git *.tar.gz backup_information.yml
- stop GitLab services that talk with the database (those might have changed since the time of writing, review upstream documentation just in case):
  gitlab-ctl stop puma
  gitlab-ctl stop sidekiq
- restore the secrets files (note: this wasn't actually tested, but should work):
  chown root:root /var/tmp/bacula-restores/etc/gitlab/*
  mv /var/tmp/bacula-restores/etc/gitlab/{gitlab-secrets.json,gitlab.rb} /etc/gitlab/
Note that if you're setting up a development environment, you do not want to perform that step, which means that CI/CD variables and 2FA tokens will be lost, which means people will need to reset those and login with their recovery codes. This is what you want for a dev server, because you do not want a possible dev server compromise to escalate to the production server, or the dev server to have access to the prod deployments.
Also note that this step was not performed on the dev server test and this led to problems during login: while it was possible to use a recovery code to bypass 2FA, it wasn't possible to reset the 2FA configuration afterwards.
- restore the files:
  gitlab-backup restore
  This last step will ask you to confirm the restore, because it actually destroys the existing install. It will also ask you to confirm the rewrite of the authorized_keys file, which you want to accept (unless you specifically want to restore that from backup as well).
- restore the database: note that this was never tested. Now you should follow the direct backup recovery procedure.
- restart the services and check everything:
  gitlab-ctl reconfigure
  gitlab-ctl restart
  gitlab-rake gitlab:check SANITIZE=true
  gitlab-rake gitlab:doctor:secrets
  gitlab-rake gitlab:lfs:check
  gitlab-rake gitlab:uploads:check
  gitlab-rake gitlab:artifacts:check
  Note: in the simulation, GitLab was started like this instead, which just worked as well:
  gitlab-ctl start puma
  gitlab-ctl start sidekiq
We did try the "verification" tasks above, but many of them failed, especially in the
gitlab:doctor:secrets
job, possibly because we didn't restore the secrets (deliberately).
At this point, basic functionality like logging in and issues should be working again, but not wikis (because they are not restored yet). Note that it's normal to see a 502 error message ("Whoops, GitLab is taking too much time to respond.") when GitLab restarts: it takes a long time to start (think minutes)... You can follow its progress in /var/log/gitlab/gitlab-rails/*.log.
Be warned that the new server will start sending email notifications, for example for issues with a due date, which might be confusing for users if this is a development server. If this is a production server, that's a good thing. If it's a development server, you may want to disable email altogether in the GitLab server, with this line in Hiera data (eg. hiera/roles/gitlab_dev.yml) in the tor-puppet.git repository:
profile::gitlab::app::email_enabled: false
Note that GitLab 16.6 also ships with a silent mode that could significantly improve on the above.
So the above procedure only restores a part of the system, namely what is covered by the nightly backup job. To restore the rest (at the time of writing: artifacts and repositories, which includes wikis!), you also need to specifically restore those files from the backup server.
For example, this procedure will restore the repositories from the backup server:
$ cd /var/opt/gitlab/git-data
cwd is: /var/opt/gitlab
$ mark repositories
113,766 files marked.
$ done
The files will then end up in /var/tmp/bacula-restores/var/opt/gitlab/git-data. They will need to be given to the right users and moved into place:
chown -R git:root /var/tmp/bacula-restores/var/opt/gitlab/git-data/repositories
mv /var/opt/gitlab/git-data/repositories /var/opt/gitlab/git-data/repositories.orig
mv /var/tmp/bacula-restores/var/opt/gitlab/git-data/repositories /var/opt/gitlab/git-data/repositories/
During the last simulation, restoring repositories took an hour.
Restoring artifacts is similar:
$ cd /srv/gitlab-shared
cwd is: /srv/gitlab-shared/
$ mark artifacts
434,788 files marked.
$ done
Then the files need to be given to the right user and moved as well; notice the git:git instead of git:root:
chown -R git:git /var/tmp/bacula-restores/srv/gitlab-shared/artifacts
mv /var/opt/gitlab/gitlab-rails/shared/artifacts/ /var/opt/gitlab/gitlab-rails/shared/artifacts.orig
mv /var/tmp/bacula-restores/srv/gitlab-shared/artifacts /var/opt/gitlab/gitlab-rails/shared/artifacts/
Restoring the artifacts took another hour of copying.
And that's it! Note that this procedure may vary if the subset of files backed up by the GitLab backup job changes.
Reference
Installation
Main GitLab installation
The current GitLab server was setup in the howto/ganeti cluster in a regular virtual machine. It was configured with howto/puppet with the roles::gitlab role. That, in turn, includes a series of profile classes which configure:
- profile::gitlab::web: nginx vhost and TLS cert, which depends on profile::nginx built for the howto/cache service and relying on the puppet/nginx module from the Forge
- profile::gitlab::app: the core of the configuration of gitlab itself, uses the puppet/gitlab module from the Forge, with Prometheus, Grafana, PostgreSQL and Nginx support disabled, but Redis and some exporters enabled
- profile::gitlab::db: the PostgreSQL server
- profile::dovecot::private: a simple IMAP server to receive mails destined to GitLab
This installs the GitLab Omnibus distribution which duplicates a lot of resources we would otherwise manage elsewhere in Puppet, mostly Redis now.
The install takes a long time to complete. It's going to take a few minutes to download, unpack, and configure GitLab. There's no precise timing of this procedure yet, but assume each of those steps takes about 2 to 5 minutes.
Note that you'll need special steps to configure the database during the install, see below.
After the install, the administrator account details are stored in /etc/gitlab/initial_root_password. After logging in, you most likely want to disable new signups as recommended, or possibly restore from backups.
Note that the first gitlab server (gitlab-01) was setup using the Ansible recipes used by the Debian.org project. That install was not working so well (e.g. 503 errors on merge requests) so we migrated to the omnibus package in March 2020, which seems to work better. There might still be some leftovers of that configuration here and there, but some effort was done during the 2022 hackweek (2022-06-28) to clean that up in Puppet at least. See tpo/tpa/gitlab#127 for some of that cleanup work.
PostgreSQL standalone transition
In early 2024, PostgreSQL was migrated to its own setup, outside of GitLab Omnibus, to ease maintenance and backups (see issue 41426). This is how that was performed.
First, there are two different documents upstream explaining how to do this, one is Using a non-packaged PostgreSQL database management server, and the other is Configure GitLab using an external PostgreSQL service. This discrepancy was filed as a bug.
In any case, the profile::gitlab::db Puppet class is designed to create a database capable of hosting the GitLab service. It only creates the database and doesn't actually populate it, which is something the Omnibus package normally does.
In our case, we backed up the production "omnibus" cluster and restored to the managed cluster using the following procedure:
- deploy the profile::gitlab::db profile, make sure the port doesn't conflict with the omnibus database (e.g. use port 5433 instead of 5432), and note that the postgres exporter will fail to start, which is normal because it conflicts with the omnibus one:
  pat
- backup the GitLab database a first time, note down the time it takes:
  gitlab-backup create SKIP=tar,artifacts,repositories,builds,ci_secure_files,lfs,packages,registry,uploads,terraform_state,pages
- restore said database into the new database created, noting down the time it took to restore:
  date ; time pv /srv/gitlab-backup/db/database.sql.gz | gunzip -c | sudo -u postgres psql -q gitlabhq_production; date
  Note that the last step (CREATE INDEX) can take a few minutes on its own, even after the pv progress bar completed.
- drop the database and recreate it:
  sudo -u postgres psql -c 'DROP DATABASE gitlabhq_production'; pat
- post an announcement of a 15-60 minute downtime (adjust according to the above test)
- change the parameters in `gitlab.rb` to point to the other database cluster (in our case, this is done in `profile::gitlab::app`), and make sure you also turn off `postgres` and `postgres_exporter`, with:

  ```
  postgresql['enable'] = false
  postgres_exporter['enable'] = false
  gitlab_rails['db_adapter'] = "postgresql"
  gitlab_rails['db_encoding'] = "utf8"
  gitlab_rails['db_host'] = "127.0.0.1"
  gitlab_rails['db_password'] = "[REDACTED]"
  gitlab_rails['db_port'] = 5433
  gitlab_rails['db_user'] = "gitlab"
  ```
  ... or, in Puppet:

  ```
  class { 'gitlab':
    postgresql => {
      enable => false,
    },
    postgres_exporter => {
      enable => false,
    },
    gitlab_rails => {
      db_adapter  => 'postgresql',
      db_encoding => 'utf8',
      db_host     => '127.0.0.1',
      db_user     => 'gitlab',
      db_port     => '5433',
      db_password => trocla('profile::gitlab::db', 'plain'),
      # [...]
    },
  }
  ```

  That configuration is detailed in this guide.
- stop GitLab, but keep postgres running:

  ```
  gitlab-ctl stop
  gitlab-ctl start postgresql
  ```
- do one final backup and restore:

  ```
  gitlab-backup create SKIP=tar,artifacts,repositories,builds,ci_secure_files,lfs,packages,registry,uploads,terraform_state,pages
  date ; time pv /srv/gitlab-backup/db/database.sql.gz | gunzip -c | sudo -u postgres psql -q gitlabhq_production; date
  ```
- apply the above changes to `gitlab.rb` (or just run Puppet):

  ```
  pat
  gitlab-ctl reconfigure
  gitlab-ctl start
  ```
- make sure only one database is running; this should be empty:

  ```
  gitlab-ctl status | grep postgresql
  ```

  And this should show only the Debian package cluster:

  ```
  ps axfu | grep postgresql
  ```
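Once the switchover is done, a quick sanity check can confirm the Rails application is actually talking to the external cluster. This is only a sketch, not part of the documented procedure; it assumes the standard Omnibus wrappers (`gitlab-rake`, `gitlab-rails`) are available:

```
# GitLab's built-in self-check
sudo gitlab-rake gitlab:check SANITIZE=true

# show which host/port the Rails application is connected to
sudo gitlab-rails runner \
  'puts ActiveRecord::Base.connection_db_config.configuration_hash.slice(:host, :port)'
```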
GitLab CI installation
See the CI documentation for documentation specific to GitLab CI.
GitLab pages installation
To setup GitLab pages, we followed the GitLab Pages administration manual. The steps taken were as follows:
- add `pages.torproject.net` to the public suffix list (issue 40121 and upstream PR), although that takes months or years to propagate everywhere
- add `*.pages.torproject.net` and `pages.torproject.net` to DNS (`dns/domains.git` repository), as A records so that LE DNS-01 challenges still work, along with a CAA record to allow the wildcard on `pages.torproject.net`
- get the wildcard cert from Let's Encrypt (in `letsencrypt-domains.git`)
- deploy the TLS certificate, some GitLab config and an nginx vhost to gitlab-02 with Puppet
- run the status-site pipeline to regenerate the pages
The GitLab pages configuration lives in the `profile::gitlab::app` Puppet class. The following GitLab settings were added:

```
gitlab_pages => {
  ssl_certificate     => '/etc/ssl/torproject/certs/pages.torproject.net.crt-chained',
  ssl_certificate_key => '/etc/ssl/private/pages.torproject.net.key',
},
pages_external_url => 'https://pages.torproject.net',
```
The virtual host for the pages.torproject.net
domain was configured
through the profile::gitlab::web
class.
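To confirm the wildcard certificate and the vhost are actually being served, a check along these lines can be run from any host with network access (illustrative only, not part of the original setup; `-ext` needs OpenSSL 1.1.1 or later):

```
# inspect the certificate presented for the pages domain
echo | openssl s_client -connect pages.torproject.net:443 \
       -servername pages.torproject.net 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName
```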
GitLab registry
The GitLab registry was setup first by deploying an object storage server (see minio). An access key was created with:
mc admin user svcacct add admin gitlab --access-key gitlab-registry
... and the secret key stored in Trocla.
Then the config was injected in the profile::gitlab::app
class,
mostly inline. The registry itself is configured through the
profile::gitlab::registry
class, so that it could possibly be moved
onto its own host.
That configuration was fraught with perils, partly documented in
tpo/tpa/gitlab#89. One challenge was to get everything working at
once. The software itself is the Docker Registry shipped with
GitLab Omnibus, and it's configured through Puppet, which passes
the value to the /etc/gitlab/gitlab.rb
file which then writes the
final configuration into /var/opt/gitlab/registry/config.yml
.
We take the separate-bucket approach: each service using object storage has its own bucket assigned. This required a special
policy to be applied to the gitlab
MinIO user:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketAccessForUser",
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"arn:aws:s3:::gitlab/*",
"arn:aws:s3:::gitlab"
]
},
{
"Sid": "BucketAccessForUser",
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"arn:aws:s3:::gitlab*"
]
}
]
}
That is the policy called gitlab-star-bucket-policy
which grants
access to all buckets prefixed with gitlab
(as opposed to only the
gitlab
bucket itself).
Then we have an access token specifically made for this project called
gitlab-registry
and that restricts the above policy to only the
gitlab-registry
bucket.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:*"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::gitlab-registry",
"arn:aws:s3:::gitlab-registry/*"
],
"Sid": "BucketAccessForUser"
}
]
}
It might be possible to manage the Docker registry software and configuration directly from Puppet, with the Debian package, but that configuration has been deprecated since GitLab 15.8 and is unsupported in GitLab 16. I explained our rationale for why this could be interesting in the relevant upstream issue.
We have created a registry
user on the host because that's what
GitLab expects, but it might be possible to use a different, less
generic username by following this guide.
A cron job runs every Saturday to clean up unreferenced layers. Untagged manifests are not purged, even though they are invisible, as we suspect purging them could result in needless double-uploads. If we do run out of disk space on images, that is a policy we could implement.
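The cleanup presumably boils down to the garbage collection command shipped with Omnibus; a minimal sketch of what the weekly job runs (the actual cron entry lives in Puppet):

```
# delete unreferenced layers; adding -m would also delete untagged
# manifests, which is precisely the policy we chose not to enable
gitlab-ctl registry-garbage-collect
```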
Upstream documentation on how to manage the registry is available here:
https://docs.gitlab.com/ee/administration/packages/container_registry.html
Upgrades
GitLab upgrades are generally done automatically through unattended-upgrades, but major upgrades are pinned in a preferences file, so they need to be manually approved.
That is done in `tor-puppet.git`, in the `hiera/roles/gitlab.yaml` file, through the `profile::gitlab::app::major_version` variable.
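To see whether a new major release is currently being held back by that pin, checking the APT policy on the GitLab host should be enough (assuming the Omnibus package is named `gitlab-ce`):

```
# shows the installed version, the candidate version and the pin priority
apt-cache policy gitlab-ce
```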
SLA
Design
GitLab is a fairly large program with multiple components. The upstream documentation has a good description of the architecture, but this section aims at providing a shorter summary. Here's an overview diagram, first:
Note: the above image may be broken as upstream frequently changes the URL. It should be visible in the Simplified component overview of the architecture documentation.
The web frontend is Nginx (which we incidentally also use in our howto/cache system) but GitLab wrote their own reverse proxy called GitLab Workhorse which in turn talks to the underlying GitLab Rails application, served by the Unicorn application server. The Rails app stores its data in a howto/postgresql database. GitLab also offloads long-term background tasks to a tool called sidekiq.
Those all serve HTTP(S) requests but GitLab is of course also
accessible over SSH to push/pull git repositories. This is handled by
a separate component called gitlab-shell which acts as a shell
for the git
user.
Workhorse, Rails, sidekiq and gitlab-shell all talk with Redis to store temporary information, caches and session information. They can also communicate with the Gitaly server which handles all communication with the git repositories themselves.
Continuous integration
GitLab also features Continuous Integration (CI). CI is handled by GitLab runners which can be deployed by anyone and registered in the Rails app to pull CI jobs. This is documented in the service/ci page.
Spam control
TODO: document lobby.
Discuss alternatives, e.g. this hackernews discussion about mediawiki moving to gitlab. Their gitlab migration documentation might give us hints on how to improve the spam situation on our end.
A few ideas on tools:
- Tornevall blocklist
- Mediawiki spam control tricks
- Friendly CAPTCHA, considered for inclusion in GitLab
Scalability
We have not looked much into GitLab scalability. Upstream has reference architectures which explain how to scale for various user counts. We have not yet needed any of this: so far we have just thrown hardware at GitLab when performance issues come up.
GitLab pages
GitLab pages is "a simple HTTP server written in Go, made to serve GitLab Pages with CNAMEs and SNI using HTTP/HTTP2". In practice, the way this works is that artifacts from GitLab CI jobs get sent back to the central server.
GitLab pages is designed to scale horizontally: multiple pages servers can be deployed and fetch their content and configuration through NFS. Upstream is rearchitecting this around object storage (i.e. S3 through MinIO by default, or existing external providers), which might simplify running the service but actually adds complexity to a previously fairly simple design. Note that they have tried using CephFS instead of NFS but that did not work for some reason.
The new pages architecture also relies on the GitLab Rails API for configuration (it was a set of JSON files before), which makes pages availability dependent on the Rails API, although that part of the design uses exponential back-off when the API is unavailable, so it may survive a Rails API downtime.
GitLab pages is not currently in use in our setup, but could be used as an alternative to the static mirroring system. See the discussion there for more information about how that compares with the static mirror system.
Update: some tests of GitLab pages were performed in January 2021, with moderate success. There are still concerns about the reliability and scalability of the service, but the service could be used for small sites at this stage. See the GitLab pages installation instructions for details on how this was setup.
Note that the pages are actually on disk, in
/var/opt/gitlab/gitlab-rails/shared/pages/GROUP/.../PROJECT
, for
example the status site pipeline publishes to:
/var/opt/gitlab/gitlab-rails/shared/pages/tpo/tpa/status-site/
Maybe this could be abused to act as a static source in the static mirror system?
Update: see service/static-shim for the chosen solution to deploy websites built in GitLab CI to the static mirror system.
Redacting GitLab confidential issues
Back in 2022, we embarked on the complicated affair of making GitLab stop sending email notifications in cleartext for private issues. This involved MR 101558 and MR 122343, merged in GitLab 16.2 for the GitLab application side. Those add a header like:
X-GitLab-ConfidentialIssue: true
To outgoing email when a confidential issue is created or commented on. Note that internal notes are currently not being redacted, unless they are added to confidential issues, see issue 145.
That header, in turn, is parsed by the outgoing Postfix server to
redact those emails. This is done through a header_checks(5) rule in `/etc/postfix/header_filter_check`:
/^X-GitLab-ConfidentialIssue:\ true/ FILTER confidential_filter:
That, in turn, sends the email through a pipe(8) transport
defined in master.cf
:
confidential_filter unix - n n - 10 pipe
flags=Rq user=gitlab-confidential null_sender=
argv=/usr/local/sbin/gitlab_confidential_filter --from ${sender} -- ${recipient}
... which, in turn, calls the gitlab_confidential_filter
Python
program which does the following:
- parse the email
- if it has a
X-GitLab-ConfidentialIssue: true
header, parse the email to find the "signature" which links to the relevant GitLab page - prepend a message to that signature
- replace the body of the original message with that redaction
- resend the message after changing the `X-GitLab-ConfidentialIssue` header to `redacted`, to avoid loops
Note that if a filtered message does not have a
X-GitLab-ConfidentialIssue: true
header (which should never happen),
it just resends the email as is, as a safety fallback.
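The header check can be exercised without sending any actual email, for example with postmap(1); this assumes the map is declared as a `regexp:` table, as the syntax above suggests:

```
# should print: FILTER confidential_filter:
postmap -q "X-GitLab-ConfidentialIssue: true" regexp:/etc/postfix/header_filter_check
```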
The canonical copy of the script is in our tor-puppet.git
repository, in
modules/profile/files/gitlab/gitlab_confidential_filter.py
. A copy
of the script from launch time is available in issue gitlab#23.
The filter also relies on other GitLab headers to find the original issue and synthesize a replacement body for the redaction.
The replacement message is:
A new confidential issue was reported and its content was redacted
from this email notification.
... followed by the standard boilerplate GitLab normally appends to outgoing email:
Reply to this email directly or view it on GitLab: $URL
New comments on issues see a slightly different message:
A comment was added to a confidential issue and its content was
redacted from this email notification.
... followed by the same standard boilerplate.
All of this is deployed by Puppet in the profile::gitlab::app
class
and some hacks buried in the profile::postfix
class and its templates.
An alternative would be to encrypt outgoing mails with PGP/MIME. Some software that could do this was considered in the Schleuder retirement, see TPA-RFC-41.
Note that this doesn't work with external participants, which can be used to CC arbitrary email addresses that do not have a GitLab account. If such an email gets added, confidential contents will leak through clear text email, see the discussion in tpo/tpa/gitlab#157.
Issues
File or search for issues in the gitlab project.
Upstream manages its issue queue in GitLab, naturally. You may want to look for upstream regressions; also look in omnibus-gitlab.
Known
- Wikis:
- Issues:
- Confidential issues leak cleartext by email (see the Note about confidential issues above, now redacted by a custom Postfix extension)
- Cannot move issues to projects I do not maintain
- (lacking the) Ability to invite users to a confidential issue
- No pinned issues
- Merge requests:
- in general, dealing with large number of merge requests is hard, as it's hard to tell what the status of each individual one is, see upstream issues
- General:
- fails to detect fr-CA locale, workaround: use `en-GB` or set date to 24-hour format (starting in 16.6), now a dedicated epic
- search sucks
See also issues YOU have voted on.
Resolved
- Wikis:
- Issues:
- incident checklists cannot be checked (fixed in 16.7)
- Issues warn about LFS
- General:
- keep gitlab artifacts disk space usage under control, resolved through a home-made script (gitlab-pipeline-vacuum) but also upstream, partially: Clean up old expired artifacts for self-managed instances is done, but not:
- Does not allow users to select 12 vs 24-hour format, fixed in 16.6
- regressions (non-exhaustive list, listing started after 16.6
release, see also this upstream list):
- mailroom fails to start (16.6, hot-patched, fixed in 16.6.1)
- Expired artifacts are not deleted although they should have been (16.5), internal incident, fixed in 16.6.1
- Files in `pages_deployments` are not deleted on disk when `deactivated_pages_deployments_delete_cron_worker` runs, 16.5, fixed in 16.6.1
- copy reference shortcut disappeared (16.6, worked around by providing a keybinding, c r)
Monitoring and metrics
Monitoring right now is minimal: normal host-level metrics like disk space, CPU usage, web port and TLS certificates are monitored with our normal infrastructure, as a black box.
Prometheus monitoring is built into the GitLab Omnibus package, so it is not configured through our Puppet like other Prometheus targets. It has still been (manually) integrated in our Prometheus setup and Grafana dashboards (see pager playbook) have been deployed.
Another problem with the current monitoring is that some GitLab exporters are currently hardcoded.
We could also use the following tools to integrate alerting into GitLab better:
- moosh3/gitlab-alerts: autogenerate issues from Prometheus Alertmanager (with the webhook)
- FUSAKLA/prometheus-gitlab-notifier: similar
- 11.5 shipped a bunch of alerts which we might want to use directly
- the "Incident management" support has various integrations including Prometheus (starting from 13.1) and Pagerduty (which is supported by Prometheus)
We also lack visibility on certain key aspects of GitLab. For example, it would be nice to monitor issue counts in Prometheus or have better monitoring of GitLab pipelines like wait time, success/failure rates and so on. There was an issue open about monitoring individual runners but the runners do not expose (nor do they have access to) that information, so that was scrapped.
There used to be a development server called gitlab-dev-01
that
could be used to test dangerous things if there is a concern a change
could break the production server, but it was retired, see
tpo/tpa/team#41151 for details.
Tests
When we perform important maintenance on the service, like for example when moving the VM from one cluster to another, we want to make sure that everything is still working as expected. This section is a checklist of things to test in order to gain confidence that everything is still working:
- logout/login
- check if all the systemd services are ok
- running gitlab-ctl status
- repository interactions
- cloning
- pushing a commit
- running a ci pipeline with build artifacts
- pulling an image from containers.tpo
- checking if the API is responsive (TODO: pick a canonical test command; one possible check is sketched after this list)
- look at the web dashboard in the admin section
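For the API check, one possible unauthenticated smoke test (illustrative only, not an established procedure) is to list a public project through the REST API:

```
# should return HTTP 200 and a non-empty JSON array
curl -sf "https://gitlab.torproject.org/api/v4/projects?per_page=1" | head -c 300; echo
```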
Logs
GitLab keeps an extensive (excessive?) amount of logs in `/var/log/gitlab`, which include PII, such as IP addresses.
To see live logs, you can type the handy command:
gitlab-ctl tail
... but that is sort of like drinking from a fire hose. You can inspect the logs of a specific component by passing it as an argument, for example to inspect the mail importer:
gitlab-ctl tail mailroom
Each component is in its own directory, so the equivalent of the above is:
tail -f /var/log/gitlab/mailroom/{current,mail_room_json.log}
Notice how both regular and JSON logs are kept.
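The JSON variants are convenient to filter with jq; for example, to watch only errors from mailroom (the `severity` field name is an assumption and may vary between components):

```
tail -f /var/log/gitlab/mailroom/mail_room_json.log \
  | jq -r 'select(.severity == "ERROR")'
```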
Logs seem to be kept for a month.
Backups
There is a backup job (`tpo-gitlab-backup`, in the `root` user crontab) that is a simple wrapper script which calls `gitlab-backup` to dump some components of the GitLab installation in the backup directory (`/srv/gitlab-backup`).
The backup system is deployed by Puppet and (at the time of writing!) skips the database, repositories and artifacts. It contains the following (a quick way to inspect the result on disk is shown after this list):
- GitLab CI build logs (
builds.tar.gz
) - Git Large Files (Git LFS,
lfs.tar.gz
) - packages (
packages.tar.gz
) - GitLab pages (
pages.tar.gz
) - some terraform thing (
terraform_state.tar.gz
) - uploaded files (
uploads.tar.gz
)
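Because the wrapper skips the final tar step (as the individual file names above suggest), the component tarballs land directly in the backup directory, so a quick freshness check is simply:

```
# list the component tarballs and their timestamps
ls -lh /srv/gitlab-backup/
```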
The backup job is run nightly. GitLab also creates a backup on upgrade. Jobs are purged daily, and are assumed to be covered by regular Bacula backups.
The backup job does NOT include the following components because they take up a tremendous amount of disk space and are already backed up by Bacula. Those need to be restored from the regular backup server, separately:
- Git repositories (found in
/var/opt/gitlab/git-data/repositories/
) - GitLab CI artifacts (normally found in
/var/opt/gitlab/gitlab-rails/shared/artifacts/
, in our case bind-mounted over/srv/gitlab-shared/artifacts
)
It is assumed that the existing backup system
will pick up those files, but also the actual backup files in
/srv/gitlab-backup
and store them for our normal rotation
periods. For repositories, this is actually not completely clear, see
upstream issue 432743 for that discussion.
This implies that some of the files covered by the gitlab-backup
job
are also already backed up by Bacula and are therefore duplicated on
the backup storage server. Ultimately, we need to make sure everything
is covered by our normal backup system and possibly retire the rake
task, see issue 40518 to track that work.
Note that, since 16.6 (late 2023), GitLab has slightly better documentation about how backups work. We experimented with server-side backups in late 2023, and found many issues:
- lacking documentation about server-side backups
- backups are never pruned (!)
- incremental support is unclear
- runaway backup size
The backup size is particularly problematic. In the 2023 test, we
found that our 90GiB of repositories were generating a new 200GiB of
object storage data at every backup. It seems like shared @pool
repositories are not backed up correctly, which raises the question of the backups' integrity in the first place.
Other documentation
- GitLab has a built-in help system and online documentation
- Support forum
Discussion
Meetings
Some meetings about tools discussed GitLab explicitly. Those are the minutes:
Overview
The GitLab project at Tor has been a long time coming. If you look at the Trac history section, you'll see it has been worked on since at least 2016, at which point an external server was set up for the "network team" to do code review. This server was ultimately retired.
The current server has been worked on since 2019, with the master ticket, issue 29400, created in the footsteps of the 2019 Brussels meeting. The service launched some time in June 2020, with a full migration of Trac tickets.
Goals
Must have
- replacement of the Trac issue tracking server
- rough equivalent of Trac features in GitLab
Nice to have
- identical representation of Trac issues in GitLab, including proper issue numbering
Non-Goals
- replacement of Gitolite (git hosting)
- replacement of Gitweb (git hosting)
- replacement of Jenkins (CI) -- although that was eventually done
- replacement of the static site hosting system
Those are not part of the first phase of the project, but it is understood that if one of those features gets used more heavily in GitLab, the original service MUST be eventually migrated into GitLab and turned off. We do not want to run multiple similar services at the same time (for example run both gitolite and gitaly on all git repositories, or run Jenkins and GitLab runners).
Approvals required
The GitLab migration was approved at the 2019 Brussels dev meeting.
Proposed Solution
The solution to the "code review" and "project management" problems is to deploy a GitLab instance which does not aim at managing all source code, in the first stage.
Cost
Staff not evaluated.
In terms of hardware, we start with a single virtual machine and agree that, in the worst case, we can throw a full Hetzner PX62-NVMe node at the problem (~70EUR/mth).
Alternatives considered
GitLab is such a broad project that multiple alternatives exist for different components:
- GitHub
- Pros:
- widely used in the open source community
- Good integration between ticketing system and code
- Cons:
- It is hosted by a third party (Microsoft!)
- Closed source
- GitLab:
- Pros:
- Mostly free software
- Feature-rich
- Cons:
- Complex software, high maintenance
- "Opencore" - some interesting features are closed-source
GitLab command line clients
If you want to do batch operations or integrations with GitLab, you might want to use one of those tools, depending on your environment or preferred programming language:
- bugwarrior (Debian) - support for GitLab, GitHub and other bugtrackers for the taskwarrior database
- git-lab - python commandline client, lists, pulls MR; creates snippets
- GitLab-API-v4 (Debian) - perl library and commandline client
- GitLabracadabra (Debian) - configure a GitLab instance from a YAML configuration, using the API: project settings like labels, admins, etc
- glab (Debian) - inspired by GitHub's official `gh` client
- python-gitlab (also known as `gitlab-cli` in Debian)
- ruby-gitlab (Debian), also includes a commandline client
- salsa (in Debian devscripts) is specifically built for salsa but might be coerced into talking to other GitLab servers
GitLab upstream has a list of third-party commandline tools that is interesting as well.
Migration tools
ahf implemented the GitLab migration using his own home-made tools that talk to the GitLab and Trac APIs, but there's also tracboat, which is designed to migrate from Trac to GitLab.
We did not use Tracboat because it uses GitLab's DB directly and thus only works with some very specific versions: each time the database schema changes in GitLab, Tracboat needs to be ported to it. We preferred to use something that talked with the GitLab API.
We also didn't like the output entirely, so we modified it but still used some of its regular expressions and parser.
We also needed to implement the "ticket movement" hack (with the legacy project) which wasn't implemented in Tracboat.
Finally, we didn't want to do complete user migration, but lazily transfer only some users.
Git repository integrity solutions
This section is a summary of the discussion in ticket tpo/tpa/gitlab#81. A broader discussion of the security issues with GitLab vs Gitolite and the choices made during that migration are available in Gitolite: security concerns.
Some developers expressed concerns about using GitLab as a canonical location for source code repositories, mainly because of the much broader attack surface GitLab provides, compared to the legacy, gitolite-based infrastructure, especially considering that the web application basically has write access to everything.
One solution to this problem is to use cryptographic signatures. We already use OpenPGP extensively in the Tor infrastructure, and it's well integrated in git, so it's an obvious candidate. But it's not necessarily obvious how OpenPGP would be used to sign code inside Tor, so this section provides a short review of existing solutions in that space.
Guix: sign all commits
Guix uses OpenPGP to sign commits, using an approach that is basically:
- The repository contains a .guix-authorizations file that lists the OpenPGP key fingerprints of authorized committers.
- A commit is considered authentic if and only if it is signed by one of the keys listed in the .guix-authorizations file of each of its parents. This is the authorization invariant.
[...] Since .guix-authorizations is a regular file under version control, granting or revoking commit authorization does not require special support.
Note the big caveat:
It has one downside: it prevents pull-request-style workflows. Indeed, merging the branch of a contributor not listed in .guix-authorizations would break the authorization invariant. It’s a good tradeoff for Guix because our workflow relies on patches carved into stone tablets (patch tracker), but it’s not suitable for every project out there.
Also note there's a bootstrapping problem in their design:
Which commit do we pick as the first one where we can start verifying the authorization invariant?
They solve this with an out of band "channel introduction" mechanism which declares a good hash and a signing key.
This also requires a custom client. But it serves as a good example of an extreme approach (validate everything) one could take.
Note that GitLab Premium (non-free) has support for push rules and in particular a "Reject unsigned commits" rule.
Another implementation is SourceWare's gitsigur, which verifies all commits (a 200-line Python script); see also this discussion for a comparison. A similar project is Gentoo's update-02-gpg bash script.
Arista: sign all commits in Gerrit
Arista wrote a blog post called Commit Signing with Git at Enterprise Scale (archive) which takes a radically different approach.
- all OpenPGP keys are centrally managed (which solves the "web of trust" mess) in a Vault
- Gerrit is the gatekeeper: for patches to be merged, they must be signed by a trusted key
It is a rather obtuse system: because the final patches are rebased on top of the history, the git signatures are actually lost, so they keep a reference to the Gerrit change ID in the git history, and the Gerrit change does have a copy of the OpenPGP signature.
Gerwitz: sign all commits or at least merge commits
Mike Gerwitz wrote an article in 2012 (which he warns is out of date) but which already correctly identified the issues with merge and rebase workflows. He argues there is a way to implement the desired workflow by signing merges: because maintainers are the ones committing merge requests to the tree, they are in a position to actually sign the code provided by third parties. Therefore it can be assumed that if a merge commit is signed, then the code it imported is also signed.
The article also provides a crude checking script for such a scenario.
Obviously, in the case of GitLab, it would make the "merge" button less useful, as it would break the trust chain. But it's possible to merge "out of band" (in a local checkout) and push the result, which GitLab generally detects correctly as closing the merge request.
Note that sequoia-git implements this pattern, according to this.
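With stock git, that workflow roughly corresponds to verifying the signature on the tip of the contributed branch at merge time and signing the resulting merge commit, e.g. (the branch name is hypothetical):

```
# abort if the tip of the branch is not signed by a key in the local keyring,
# then create a signed merge commit
git merge --verify-signatures -S contributor/feature
```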
Torvalds: signed tags
Linus Torvalds, the original author and maintainer of the Linux
kernel, simply signs the release tags. In an article called "what
does a pgp signature on a git commit prove?", Konstantin Ryabitsev
(the kernel.org
sysadmin), provides a good primer on OpenPGP signing
in git. It also shows how to validate Linux releases by checking the
tag and argues this is sufficient to ensure trust.
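Verifying such a release tag is a one-liner with stock git, provided the signer's public key is already in the local keyring; for example, against a Linux kernel release tag:

```
# both forms check the OpenPGP signature embedded in the tag object
git verify-tag v6.6
git tag -v v6.6
```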
Vick: git signatures AKA git notes
The git-signatures project, authored by Lance R. Vick, makes it possible to "attach an arbitrary number of GPG signatures to a given commit or tag.":
Git already supports commit signing. These tools are intended to compliment that support by allowing a code reviewer and/or release engineer attach their signatures as well.
Downside: third-party tool not distributed with git and not packaged in Debian.
The idea of using git-notes was also proposed by Owen Jacobsen.
Walters: extended validation tags
The git-evtag project from Colin Walters tries to address the perceived vulnerability of the SHA-1 hash by implementing a new signing procedure for tags, based on SHA-512 and OpenPGP.
Ryabitsev: b4 and patch attestations
Konstantin Ryabitsev (the kernel.org sysadmin, again) proposed a new cryptographic scheme to sign patches in Linux, which he called "patch attestation". The protocol is designed to survive mailing list transports, rebases and all sorts of mangling. It does not use GnuPG and is based on a Trust On First Use (TOFU) model.
The model is not without critics.
Update, 2021-06-04: there was another iteration of that concept, this time based on DKIM-like headers, with support for OpenPGP signatures but also "native" ed25519.
One key takeaway from this approach, which we could reuse, is the way public keys are stored. In patatt, the git repository itself holds the public keys:
On the other hand, within the context of git repositories, we already have a suitable mechanism for distributing developer public keys, which is the repository itself. Consider this:
- git is already decentralized and can be mirrored to multiple locations, avoiding any single points of failure
- all contents are already versioned and key additions/removals can be audited and “git blame’d”
- git commits themselves can be cryptographically signed, which allows a small subset of developers to act as “trusted introducers” to many other contributors (mimicking the “keysigning” process)
The idea of using git itself for keyring management was originally suggested by the did:git project, though we do not currently implement the proposed standard itself.
<https://github.com/dhuseby/did-git-spec/blob/master/did-git-spec.md>
It's unclear, however, why the latter spec wasn't reused. To be investigated.
Update, 2022-04-20: someone actually went through the trouble of auditing the transparency log, which is an interesting exercise in itself. The verifier source code is available, but probably too specific to Linux for our use case. Their notes are also interesting. This is also in the kernel documentation and the logs themselves are in this git repository.
Ryabitsev: Secure Scuttlebutt
A more exotic proposal is to use the Secure Scuttlebutt (SSB) protocol instead of emails to exchange (and also, implicitly) sign git commits. There is even a git-ssb implementation, although it's hard to see because it's been migrated to .... SSB!
Obviously, this is not quite practical and is shown only as a more radical example, as a stand-in for the other end of the decentralization spectrum.
Stelzer: ssh signatures
Fabian Stelzer made a pull request for git which was actually merged in October 2021 and therefore might make it to 2.34. The PR adds support for SSH signatures on top of the already existing OpenPGP and X.509 systems that git already supports.
It does not address the above issues of "which commits to sign" or "where to store keys", but it does allow users to drop the OpenPGP/GnuPG dependency if they so desire. Note that there may be compatibility issues with different OpenSSH releases, as the PR explicitly says:
I will add this feature in a follow up patch afterwards since the released 8.7 version has a broken ssh-keygen implementation which will break ssh signing completely.
We do not currently have plans to get rid of OpenPGP internally, but it's still nice to have options.
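For reference, switching a local setup to SSH signatures only takes a few git config settings (the key paths below are illustrative); verification additionally needs an "allowed signers" file mapping identities to public keys:

```
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true
# needed for "git log --show-signature" and "git verify-commit" to work
git config --global gpg.ssh.allowedSignersFile ~/.ssh/allowed_signers
```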
Lorenc: sigstore
Dan Lorenc, an engineer at Google, designed a tool that allows users to sign "artifacts". Typically, those are container images (e.g. cosign is named so because it signs "containers"), but anything can be signed.
It also works with a transparency log server called rekor. They run a public instance, but we could also run our own. It is currently unclear if we could have both, but it's apparently possible to run a "monitor" that would check the log for consistency.
There's also a system for signing binaries with ephemeral keys which seems counter-intuitive but actually works nicely for CI jobs.
Seems very promising, maintained by Google, RedHat, and supported by the Linux foundation. Complementary to in-toto and TUF. TUF is actually used to create the root keys which are controlled, at the time of writing, by:
- Bob Callaway (Google)
- Dan Lorenc (Google)
- Luke Hinds (RedHat)
- Marina Moore (NYU)
- Santiago Torres (Purdue)
Update: gitsign is specifically built to use this infrastructure for Git. GitHub and GitLab are currently lacking support for verifying those signatures. See tutorial.
Similar projects:
- SLSA, which has a well documented threat model
- Trillian (Google)
- sigsum, similar to sigstore, but more minimal
Sirish: gittuf
Aditya Sirish, a PhD student under TUF's Cappos, is building gittuf, a "security layer for Git repositories" which allows things like multiple signatures, key rotation and in-repository attestations of things like "CI ran green on this commit".
It is designed to be backend-agnostic, so it should support GPG and sigstore; it also includes in-toto attestations.
Other caveats
Also note that git has limited security guarantees regarding checksums, since it uses SHA-1, but that is about to change. Most Git implementations also have protections against collisions, see for example this article from GitHub.
There are, of course, a large number of usability (and some would say security) issues with OpenPGP (or, more specifically, the main implementation, GnuPG). There has even been security issues with signed Git commits, specifically.
So I would also be open to alternative signature verification schemes. Unfortunately, none of those are implemented in git, as far as I can tell.
There are, however, alternatives to GnuPG itself. This article from Saoirse Shipwreckt shows how to verify commits without GnuPG, for example. That still relies on OpenPGP keys of course...
... which brings us to the web of trust and key distribution problems. The OpenPGP community is in this problematic situation right now where the traditional key distribution mechanisms (the old keyserver network) has been under attack and is not as reliable as it should be. This brings the question of keyring management, but that is already being discussed in tpo/tpa/team#29671.
Finally, note that OpenPGP keys are not permanent: they can be revoked, or expired. Dealing with this problem has its specific set of solutions as well. GitHub marks signatures as verified for expired or revoked (but not compromised) keys, but has a special mouse-over showing exactly what's going on with that key, which seems like a good compromise.
Related
- gitid: easier identity management for git
- signed git pushes
- TUF: generic verification mechanism, used by Docker, no known Git implementation just yet (update: gittuf in pre-alpha as of dec 2023)
- SLSA: "security framework, a check-list of standards and controls to prevent tampering, improve integrity, and secure packages and infrastructure", built on top of in-toto
- jcat: used by fwupd
- git-signify: using signify, a non-OpenPGP alternative
- crev: Code REView system, used by Rust (and Cargo) to vet dependencies, delegates sharing signatures to git, but cryptographically signs them so should be resilient against a server compromise
- arch linux upstream tag verifications
- Linux kernel OpenPGP keys distribution repository
- sequoia authenticate commits - to be evaluated
Migration from Trac
GitLab was put online as part of a migration from Trac, see the Trac documentation for details on the migration.