We have already started mirroring (gitlab#18 (closed), gitlab#35 (closed)) repositories from gitolite to GitLab. We need to decide how and/or if we will accept such requests in the future, and, in particular, whether we want to host all our git repositories on GitLab in the long term.
If so, we need to come up with a migration plan on how the old repositories on gitolite will "map" to the ones in GitLab. This is particularly complicated by the fact that the namespace established on GitLab does not necessarily reflect the one in use on Gitolite, so we are very likely to have to come up with some rewrite rules to handle those redirections.
But at the very least, we need a plan, and we need it fast, because I am worried this migration will happen organically and we will then have to maintain two git hosting systems in parallel. This is similar to the problem of "hosting both trac and gitlab in parallel" that we have (successfully, I think) avoided, but it was a near hit. ;)
TL;DR: this is the current state of the non-official policy:
gitolite and gitweb will eventually be retired, probably in 2022
an RFC will be written before the final migration is started, and discussed here
By "small ones", I think we meant "not tor browser".
We need to define the following policies:
do we keep gitolite around forever? no. gitolite and gitweb will be replaced by GitLab eventually.
if we do, do we keep the old codebase or upgrade? N/A
if we do not, when do we retire git-rw (cupani) and gitweb (vineale)? within 1 or 2 years, that is 2021 or 2022
if we do not, how do we protect our code against the larger attack surface of GitLab? opened gitlab#81 (closed) for that discussion; this roadblock is removed. it's the responsibility of teams to implement commit signing or other integrity measures they need.
where do people create new git repositories? gitolite or gitlab? new repositories are created on gitlab
can people mirror their git repositories from gitolite to gitlab? yes, in a limited way: there are known issues with protected branches (gitlab#38 (closed))
how do we mirror a repo from gitolite to gitlab? documented, see this section
can people migrate their git repositories from gitolite to gitlab? yes, but only small projects right now, documented here, missing support for redirection on clone
how do we migrate a repo from gitolite to gitlab? using the migration procedure
how do we redirect users from gitolite to gitlab? using the above migration procedure, although there are issues with SSH clones (which don't fire a hook) and HTTP clones need webserver-level redirection
Remaining task list:
figure out how to do redirections on git clone
figure out gitweb redirection patterns
propose a migration plan for retiring the legacy gitolite infrastructure (cupani/git-rw, vineale/gitweb)
make a list of repositories that will be migrated and those that will be archived (to archive.org? or to gitlab?)
make a map of the repositories path on git-rw to gitlab
some projects progressively move, as tests
eventually migrate all remaining projects at once on a flag day
shut down gitolite and the git web interface (cgit), and their respective servers, cupani and vineale
there are a couple of caveats here:
jenkins. does gitlab talk to jenkins? or do we retire it as well and just switch to gitlab ci?
gitolite hooks
private repositories, especially TPA's
Regarding the last point, we have sensitive, special requirements for TPA. We especially do not want the reference puppet repository to be hosted on gitlab, as that would mean granting gitlab root access to all our servers! There are similar issues with other repositories, which would need to be audited. Those are the repositories I am concerned about:
The latter (puppet) is the most critical ones, as we can't grant GitLab puppet access. But we already have that repository on the puppetmaster, so it's somewhat orthogonal. The other are either public already, or encrypted (tor-passwords), so maybe they could live on gitlab, but they are tangled up with a lot of gitolite hooks which would need to be ported.
In general, we need to audit the gitolite hooks and see what can be implemented with the current stuff we have in GitLab and what will need CI and whether it's a dealbreaker.
I am concerned about the security here. I know that the gitolite installation is old and crufty, but the gitlab code-base is HUGE. For gitolite, as I understand it, you have to be able to make an ssh connection to the server before you can change anything in a repository. But in gitlab, it seems that the gitlab server has write access to all the repositories. That makes a much much bigger attack surface, unless I grievously misunderstand the architectures here.
My argument is that the space of users having access to gitolite is large enough that you essentially need to trust it with your code. We have dozens of people with SSH access, and all of those have, in practice, write access everywhere if gitolite is compromised.
Of course, GitLab is larger, and if there's an unauthenticated attack against GitLab, that could compromise our repositories. And there have been vulnerabilities in GitLab before, for sure. But looking at that list sorted by priority, I cannot find an unauthenticated vulnerability after reviewing the top ten. And I do not remember such an issue ever being disclosed, although it's theoretically possible...
So my concern with GitLab is not that unauthenticated users might hack at it, but more what authenticated, trusted users could do. And there, of course, there is a plethora of privilege escalation vulnerabilities that have been disclosed (and fixed!). And that's where, when you compare to gitolite, there is a problem: our current gitolite install is vulnerable to 4 such issues at least, and possibly more, partly because it's so old, but also partly because gitolite is less well-maintained.
We are actively maintaining gitlab, following upstream releases quite closely. And while I'm not super happy about the attack surface (nevermind the complexity of managing the thing underneath!), everything (and especially security) is made of tradeoffs. And in this case, my opinion is that we should go towards GitLab for code hosting, because it will facilitate code review workflows, which is a large part of securing a product...
As I mentioned in the meeting, if we are worried about trust in our supply chain, we need to look elsewhere anyways, because gitolite cannot be trusted in its current state. I would encourage all teams to start signing their commits and verifying cryptographic history when doing releases. Having reproducible builds is also part of the equation, of course.
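As a rough illustration of what "verifying cryptographic history" could look like at release time, here is a minimal sketch. It relies on git's `%G?` format placeholder, which prints a one-letter signature status per commit (`G` for a good signature, `N` for none, `B` for bad, and so on). The function names and the policy of accepting only `G` are assumptions made for this example, not something the teams have agreed on.

```python
import subprocess

# Status letters printed by git's %G? placeholder; only "G" (good, valid
# signature) is accepted here. That strictness is an assumption of this
# sketch, not an agreed policy.
ACCEPTED = {"G"}


def unverified_commits(log_output, accepted=ACCEPTED):
    """Return (commit, status) pairs whose signature status is not accepted.

    `log_output` is the text printed by: git log --format='%H %G?' <range>
    """
    failures = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        commit, status = line.split()
        if status not in accepted:
            failures.append((commit, status))
    return failures


def audit_range(rev_range, repo="."):
    """Run git log on `rev_range` in `repo` and report unverified commits."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H %G?", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return unverified_commits(out)
```

A release script could call `audit_range("v1.0..HEAD")` (a hypothetical range) and refuse to tag if the returned list is not empty. This only checks that signatures exist and verify against the local keyring; deciding which keys are trusted is a separate question.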
Otherwise, what is the alternative? We would keep gitolite around forever and have two replicas of the source code? That would mean a significant burden on the sysadmins and service "team" (ahf and hiro, really) responsible for those services, and I don't think it's fair to ask that for those people.
In there, we can see that ssh access is hardened through a gitlab-shell, a program that acts a bit like gitolite. But obviously, that's not our concern as much as the web stuff that runs on top: the GitLab Rails app is probably where a problem would occur if there is any, and it does have write access to the git repositories, as far as I understand.
I guess that, if we would want to harden that architecture, we could somehow revoke access to the webserver to the Gitaly storage servers and restrict access to the gitlab-shell (so that only SSH users could effect changes on repositories), but I don't know if that's possible or desirable (in part because it would block merge requests workflows).
The only downside with that approach is that a git clone will not warn about the project redirection, but I am not sure there's a way to fix that. Maybe in gitweb or git.torproject.org? I do not believe that hooks are run on clone, on the server side, so there's little we can do here in terms of ssh cloning...?
I think we might be able to do 302 redirects at the webserver level for URL 1.
In the above procedure, there's a pre-receive hook in gitolite that denies users access to the repositories and tells them about the new repo, which fixes URL 2.
I don't know if there is a way to fix URL 3, because there's no "pre-clone" hook, either in git or in gitolite. But maybe, if we migrate all the repos, we can replace gitolite by a simple shell that would guess the mapping depending on the requested repository path....
And this is (one of the reasons) why it's hard to do everything at once...
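To make the "simple shell that would guess the mapping" idea a bit more concrete, here is a minimal sketch of a program that could stand in for gitolite as an SSH forced command once everything has moved. It reads the command the client asked for (sshd exposes it as `SSH_ORIGINAL_COMMAND`), looks up the repository in a map, prints the new location and then fails. The map contents and the `gitlab.example.org` URL are hypothetical, and this has not been tested against our setup.

```python
import shlex
import sys

# Hypothetical map from gitolite repository paths to GitLab project URLs.
REPO_MAP = {
    "admin/tsa-misc": "https://gitlab.example.org/tpo/tpa/tsa-misc",
}

GIT_COMMANDS = {"git-upload-pack", "git-receive-pack", "git-upload-archive"}


def new_location(original_command, repo_map=REPO_MAP):
    """Return the new URL for the repository named in an SSH git command.

    `original_command` is what sshd exposes as SSH_ORIGINAL_COMMAND, for
    example "git-upload-pack 'admin/tsa-misc.git'". Returns None when the
    command is not a git command or the repository is unknown.
    """
    try:
        parts = shlex.split(original_command)
    except ValueError:
        return None
    if len(parts) != 2 or parts[0] not in GIT_COMMANDS:
        return None
    path = parts[1].strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return repo_map.get(path)


def main(environ):
    """Print where the repository went and return a failing exit status."""
    url = new_location(environ.get("SSH_ORIGINAL_COMMAND", ""))
    if url is None:
        print("This git server has been retired.", file=sys.stderr)
    else:
        print(f"This repository has moved to {url}", file=sys.stderr)
    return 1  # always fail: this shell never serves git data
```

To try it, one could end the file with `sys.exit(main(os.environ))` and point a `command=` option in `authorized_keys` at it. The clone still fails, so this is a notice rather than a real redirect: the user sees the message and has to update the remote URL by hand.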
also, if I can reframe the debate surrounding gitolite here a little... we have the following three options with any given, or all, repositories hosted on git./git-rw./gitweb.tpo:
repository stays on gitolite forever, not mirrored to gitlab
repository is mirrored between gitolite and gitlab
repository is migrated to gitlab
I am personally advocating for option 3, because I do not believe it is sustainable, secure, or useful to keep running gitolite for the foreseeable future. But whether you agree with this or not is somewhat beside the point: anyone who wants any of the above solutions will need to work on implementing a plan for those.
I am promoting the (in my opinion simpler) plan of doing only step three (progressively, of course), not because I'm a GitLab fanboy, but because I'm a fan of keeping things simple and reducing the number of tools my users need to learn. We're introducing GitLab, which is a huge deal, and is going to take time. But keeping Gitolite around forever is just going to hurt us (and, I would argue, extend the attack surface of our hosting services).
Options 1 and 2 are not "free". Maintaining gitolite right now has a significant churn in terms of support tickets (e.g. "please change access to repo X") and in terms of managing hardware for larger repositories. So I think we should avoid that in future.
People are already doing option 2, and it's causing its share of support calls already (e.g. gitlab#41 (closed)). I suspect this will significantly increase our support load when larger and more complex repositories show up. I am particularly concerned about maintaining the tor browser git repository on both infrastructures, for example, as it's already causing load issues on the old infra, quite regularly.
Some people are already doing option 3 (migration) and it seems more straightforward: I have already migrated two repositories without too much trouble (see #34437 (closed)).
And of course, it seems many people are assuming we will stay on gitolite forever. But that doesn't change the fact that 2 and 3 are happening, and are happening quickly. So we need to plan for those so that, as @dgoulet correctly identified, people find our stuff when they look at git.torproject.org.
This ticket is not about forcing anything on anyone: it's about documenting best practices and establishing an orderly retreat from, or at least a plan on how to deal with, gitolite in the future.
after today's gitlab meeting, where it was clear this ticket was
getting too long and incomprehensible, i added a TL;DR: checklist of
the policies that need to be established.
and to be clear, here is my position on each issue:
do we keep gitolite around forever? no. gitolite is a security liability and too difficult to deploy good workflows around to still be useful.
if we do, do we keep the old codebase or upgrade? N/A
if we do not, when do we retire git-rw (cupani) and gitweb (vineale)? one or two years from now
if we do not, how do we protect our code against the larger attack surface of GitLab? OpenPGP signatures are the best we have, so far, and we should start using those even if we still use gitolite.
where do people create new git repositories? gitolite or gitlab? gitlab
can people mirror their git repositories from gitolite to gitlab? as an exceptional measure, yes and only if the repository is small.
how do we mirror a repo from gitolite to gitlab? i have no idea how this thing works, and i'm worried about the support load it causes, e.g. gitlab#38 (closed) and gitlab#41 (closed) already.
can people migrate their git repositories from gitolite to gitlab? yes. every repository should be migrated, possibly automatically.
how do we migrate a repo from gitolite to gitlab? using this procedure
how do we redirect users from gitolite to gitlab? using git hooks, although it is not clear how to do redirection on clones done over ssh and http
We can not push for a full migration of repos from gitolite to gitlab AND remove gitolite. I think we should put a date to talk again about gitolite and how to kill it.
For now we should
write down a guide for projects on how to get their repository mirrored between gitolite and gitlab
continue maintaining gitolite until June 2021 when we have this discussion again.
Answering your questions:
do we keep gitolite around forever?
we maintain it until we have the capacity to discuss this again. Let's put a date of June 2021 to look at gitolite/gitlab again.
if we do, do we keep the old codebase or upgrade?
Not sure what this implies.
if we do not, when do we retire git-rw (cupani) and gitweb (vineale)?
we don't yet
if we do not, how do we protect our code against the larger attack surface of GitLab?
mmm, not sure what you mean here.
where do people create new git repositories? gitolite or gitlab?
gitlab
can people mirror their git repositories from gitolite to gitlab?
yes, unless you have any concern? Maybe tor browser should wait?
how do we mirror a repo from gitolite to gitlab?
please let's write a specific guide about it.
can people migrate their git repositories from gitolite to gitlab?
we maintain it until we have the capacity to discuss this again. Let's put a date of June 2021 to look at gitolite/gitlab again.
that sounds like really far away for me, and i wonder who will be responsible for maintaining gitolite all that time (not to mention gitlab, of course).
if we do, do we keep the old codebase or upgrade?
Not sure what this implies.
the question here is whether we are ready to continue taking the (security and reliability) risk of a compromise of Gitolite, knowing that we run a prehistoric version that hasn't been patched in over a decade, or whether we bite the bullet and do the upgrade, knowing it might be a waste of time considering we will eventually move to gitlab.
can people mirror their git repositories from gitolite to gitlab?
yes, unless you have any concern? Maybe tor browser should wait?
i have concerns that this is going to be difficult to manage because it's not clear which server holds the canonical source. we have, at first, considered this was git-rw, but bugs like gitlab#38 (closed) and gitlab#41 (closed) show that it's not that clear cut and that we might have insurmountable problems with some mirrors.
and yes, I'm specifically concerned about hosting tor-browser, wherever it is, but specifically concerned about hosting it in two places (three, if we count gitweb, and we should).
[...]
how do we migrate a repo from gitolite to gitlab?
please let's write a specific guide about it.
I did write this procedure, hopefully it will be useful?
how do we redirect users from gitolite to gitlab?
suggestions?
in the above guide, I suggest deploying a pre-receive hook in repositories so pushes are denied, but it's only one of many options. weasel suggested to just remove the write permission from the repository in gitolite, but I think it's a better UI to use a hook because we can show the user where to push instead.
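For illustration, a pre-receive hook along those lines could look roughly like the sketch below. Git feeds the hook one `<old-sha> <new-sha> <refname>` line per pushed ref on standard input, relays whatever the hook prints back to the person pushing, and rejects the push if the hook exits with a non-zero status. The URL is a placeholder, and the actual hook in the migration procedure may well differ from this.

```python
import sys

# Hypothetical new location; a real hook would be configured per repository.
NEW_URL = "https://gitlab.example.org/tpo/example/project.git"


def rejection_message(ref_lines, new_url=NEW_URL):
    """Build the text shown to a user whose push is being refused.

    `ref_lines` are the "<old-sha> <new-sha> <refname>" lines that git
    feeds to a pre-receive hook on standard input.
    """
    refs = [line.split()[2] for line in ref_lines if len(line.split()) == 3]
    lines = [
        "This repository has moved and no longer accepts pushes.",
        f"Please push to: {new_url}",
        f"  git remote set-url origin {new_url}",
    ]
    if refs:
        lines.append("Refused refs: " + ", ".join(refs))
    return "\n".join(lines)


def main(stdin=sys.stdin, stderr=sys.stderr):
    """Print the rejection message and return a failing exit status."""
    print(rejection_message(stdin.read().splitlines()), file=stderr)
    return 1  # a non-zero exit status makes git reject the whole push
```

Compared with simply removing write permission in gitolite, the user gets the new URL and the exact command to run instead of a bare permission error.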
the problem is how to redirect readonly operations like clone, on HTTP and SSH. I haven't looked into that in detail yet and doubt I will have time to do so, unfortunately. it should be fairly easy to redirect HTTP (using 301/302) and I think git might just follow those? but to redirect SSH, we would need a "pre-clone" hook, which does not exist in git, which means hacking at gitolite, which means hell, which is one of the reasons I am concerned about half-migrations... :)
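On the HTTP side, the rewrite itself is fairly mechanical once a repository map exists. The sketch below computes the `Location` target for a git-over-HTTP request, keeping the part after `.git` (such as `/info/refs`) and the query string so the client can repeat the same request on the new host. The map and the `gitlab.example.org` host are hypothetical. For what it's worth, the git-config documentation describes `http.followRedirects` as defaulting to `initial`, which means the client follows a redirect on the first request of a clone or fetch, but that should be verified against the git versions our users actually run.

```python
# Hypothetical map from repository paths on git.torproject.org to the base
# URL of the migrated GitLab project (no trailing slash, no ".git").
REPO_MAP = {
    "admin/tsa-misc": "https://gitlab.example.org/tpo/tpa/tsa-misc",
}


def redirect_target(path, query="", repo_map=REPO_MAP):
    """Return the URL to send in a Location header, or None for no redirect.

    The repository part of the path is rewritten and whatever follows
    ".git" (for example "/info/refs") is kept, along with the query
    string, so a git client can repeat the same request on the new host.
    """
    repo, sep, rest = path.lstrip("/").partition(".git")
    if not sep or (rest and not rest.startswith("/")):
        return None
    base = repo_map.get(repo)
    if base is None:
        return None
    url = f"{base}.git{rest}"
    return f"{url}?{query}" if query else url
```

In practice this logic would more likely live in the webserver configuration as a rewrite map than in Python; the function is only meant to pin down what the mapping has to preserve.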
Rather than belabour the points I don't agree on, let me sketch out a proposal for
the parts I hope we can agree on.
STEP 1: Preparation
[All "step 1X" items can happen in parallel.]
Step 1a: figure out redirects or mirroring for git.torproject.org and
gitweb.torproject.org.
We should have a way to keep those working even as
repositories migrate.
It is not necessary that they work for every repository -- only the
top-level ones not belonging to any user.
We can make this plan under the assumption that every relevant
project either migrates to gitlab, or is mirrored on gitlab.
We don't need to set up redirects for git-rw; anybody who has push
permissions for a repository can update their push URL.
If this proves too difficult, it is more important to keep gitweb.tpo
working than to keep git.tpo working.
Step 1b: Figure out hooks
For all projects that have installed git hooks of any kind, we
need to EITHER keep those hooks working post-migration (by moving
them to gitlab somehow) or we have to give up on having them work.
In most cases, the only hooks will be for things like sending emails
to tor-commits or notifying #tor-bots or disabling non-fast-forward
pushes. We can use "work-alike" hooks for those if need be.
If a hook will not work on gitlab, we should look for a replacement
with the same functionality. If none can be found, then it is the
responsibility of the team that uses the hook to port or rewrite the
hook scripts. They should be given at least three months to do so,
between notification of the team and migration.
Step 1C: figure out admin/
The admin team has 22 repositories in the admin/ namespace. Some of
these are probably obsolete; some are probably in use. The admin
team should on their own form a plan for either disabling,
migrating, or keeping each of these repositories.
STEP 2: Primary migration
[All "step 2X" items can happen in parallel.]
Step 2A: Disable all extern/ and mirror/ repositories
I believe these are not in current use.
Let's disable all reads and writes to these repositories in order to
find out.
Step 2B: Disable user repositories and hidden repositories[*]
Declare a date after which personal gitolite repositories will no
longer be supported. Email all users with personal gitolite
repositories explaining how to migrate or move them. When the date
arrives, disable all read and write permissions for such
repositories.
NOTE that the plan is that the gitolite admins should not take on
responsibility for migrating these.
Do the same for all hidden repositories.[*]
Do not set up redirects for these repositories on gitweb.tpo or
git.tpo, unless the user specifically requests them. (Let them know
about this option when informing them.)
[*] except for gitolite-admin, which needs to exist as long as
gitolite does.
Step 2C: Migrate legacy project repositories
[All "step 2X" items can happen in parallel.]
Any project which has not had a commit for 18 months is considered a
legacy repository. These repositories should be turned into legacy
projects in gitlab, and archived as legacy projects on github too
(if they have not been so archived already).
Before archiving them, email everybody who has commit rights on
those repositories, and confirm that they are really legacy
projects. Assure these people that it would be very easy to turn
them into live projects again if needed.
(For legacy projects that never received any user, or that never got
very far, it may be better to archive them under an even more legacy
username or category.)
Step 2D: Opt-out migration of live project repositories
Any project that has had a commit in the last 18 months is a
"live" repository. By default, these will all move to gitlab, with
redirects/mirroring as appropriate on git.tpo and gitweb.tpo.
We'll do this first with an opt-out procedure: if the project
maintainers object, they can keep it on gitolite for now.
Otherwise, if they say nothing, or they don't mind, the project moves
to a new gitlab project.
When these projects migrate, we take steps as needed to keep git.tpo
and gitweb.tpo working. These are the most crucial projects to keep
mirrored/redirected.
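The legacy/live split used in steps 2C and 2D can be computed from the date of each repository's most recent commit. Here is a minimal sketch, assuming that "18 months" is counted as 18 × 30 days and that the last commit dates have already been collected (for example with `git log -1 --format=%cI` in each repository). The function names are made up for this example.

```python
from datetime import datetime, timedelta

# "18 months" approximated as 18 * 30 days; the proposal does not say how
# to count months, so this is an assumption.
LEGACY_AFTER = timedelta(days=18 * 30)


def classify(last_commit, now, threshold=LEGACY_AFTER):
    """Return "live" or "legacy" for a repository's last commit date."""
    return "live" if now - last_commit <= threshold else "legacy"


def plan(last_commits, now):
    """Group repository names by category.

    `last_commits` maps a repository name to the datetime of its most
    recent commit (for example from `git log -1 --format=%cI`).
    """
    groups = {"live": [], "legacy": []}
    for name, when in sorted(last_commits.items()):
        groups[classify(when, now)].append(name)
    return groups
```

The resulting lists are only a starting point: the proposal still asks for a human confirmation from the committers before anything is archived.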
STEP 3: Gitolite lockdown
At this point, gitolite will contain only projects that opted out
during step 2D, and projects where all read and write access is
disabled.
After a month or two, assuming we haven't run into difficulty, we
should delete every group that does not own a repository, and delete
all repositories that do not have any readers/writers.
If possible, we should disable access for every user that is not a
member of at least one remaining gitolite group.
STEP 4: Hard migration
At this point, there will be no gitolite repositories left except
for those that opted out during step 2D. [I hope there will be no
more than 2 or 3.] At this point (or earlier!) we should reassess
our gitolite maintenance burden and policies, and do one of the
following:
Form a secondary migration plan for moving these "problem
repositories" to gitlab.
Find an alternative solution for hosting these problem
repositories, if they cannot move on to gitlab.
Decide to continue supporting gitolite for these problem
repositories.
meta-comments: I don't think that's a final proposal by any means, but I hope it can last us for a while. It's designed to migrate all the easy stuff as soon as possible, and get gitolite down to the smallest size we can.
I'm leaving the timeframe deliberately vague, since deciding on deadlines will be a balancing act between maintenance issues and user requirements. Before finalizing any such proposal, the git admins should figure out what timeline makes the most sense.
i only read the proposal quickly, but I am really, truly grateful that someone stepped up and wrote a plan! and it seems to make sense, at first glance.. i haven't found the metaphorical devil in the details yet but those are bound to come up as we start down that road. :)
FWIW, wrt. hooks, for our first (now completed) iteration of migrating to GitLab, at Tails we decided to mirror the repositories that had Gitolite post-* hooks in place from GitLab to Gitolite, so the existing hooks keep working as is for now, without having to first port all the things to webhooks before we can migrate repositories. This implies keeping Gitolite running, but:
We had to keep it running for other reasons, e.g. our canonical Puppet repositories live in Gitolite and are mirrored to GitLab.
Almost no human user has access to Gitolite now, so exploiting it now requires either first exploiting GitLab, or finding an unauthenticated privilege escalation in Gitolite, which is better than the previous state of things.
I looked over this discussion and it looks plausible.
Anarcat worried about repos like tor-passwords -- it seems like our current plan is to maintain a separate independent thing for puppet anyway, so maybe we can move other 'internal' repos over to that too? Or is that a terrible idea because it would increase the surface area for puppet? I worry that some of these internal repos will be stuck in between the plan and be sad from both sides.
during the last tools meeting (full log), we have agreed on a few things in the checklist I have made in the description. i have documented it there, but it's basically:
gitolite and gitweb will be replaced by GitLab eventually, within 1 or 2 years, that is 2021 or 2022
new repositories are created on gitlab
people can mirror their git repositories from gitolite to gitlab, in a limited way: there are serious limitations like issues with protected branches (gitlab#38 (closed))
people can migrate small repositories from gitolite to GitLab, using the migration procedure, although there are issues with redirections for SSH clones (which don't fire a hook) and HTTP clones (which need webserver-level redirection)
the remaining issues we need to clarify are:
how do we protect our code against the larger attack surface of GitLab? opened gitlab#81 (closed) to discuss this
how do we mirror a repo from gitolite to gitlab? needs to be documented
So my concern with GitLab is not that unauthenticated users might hack at it, but more what authenticated, trusted users could do. And there, of course, there is a plethora of privilege escalation vulnerabilities that have been disclosed (and fixed!). And that's where, when you compare to gitolite, there is a problem: our current gitolite install is vulnerable to 4 such issues at least, and possibly more, partly because it's so old, but also partly because gitolite is less well-maintained.
I was privately told that our gitolite install is actually not vulnerable to the specific security issues listed there, because of the way we use gitolite. It seems like those specific issues are related to an attacker having access to the control repository and abusing it? I'm not exactly sure, but I'm told I've overstated the (in)security of our gitolite deployment.
In general, great care has been taken to run a gitolite version that is specifically older, to ensure a smaller attack surface, because it has less features than newer gitolite versions. That's why it's such a weird version.
I am still worried that we use an old version of the software that is essentially unmaintained. For me, that feels like technical debt and makes maintenance harder. it's true that this old gitolite has a much smaller attack surface than gitlab (or even more recent gitolite), but my approach with this problem lies more with having other mechanisms to ensure code integrity (gitlab#81 (closed), ie. code signing and supply chain integrity) or secrecy (ie. encrypted repositories) than trusting the transport.
In any case: to the good people who set up those (c)git(web) and gitolite servers: thanks! they have served us well all those years, and I trust that you did the right thing. After all, we haven't had any significant security issues with those for their lifetime, so that has to mean something. Given people's enthusiasm with GitLab right now, however, I doubt that we'll be able to keep that tide from washing over those old services, so I think it would be wise to start moving in the other direction now. :)
Here's a proposed timeline that I think I'd be okay with. I can't promise that any other human would be okay with it, so we need to ask more people:
1. On May 1*, all personal or non-public gitolite repositories become read-only.
2. On June 1*, all personal or non-public gitolite repositories are deleted.
2a. Also on June 1*, all public gitolite repositories that have not had any commits in the last 12 months become read-only.
3. On November 1*, gitolite is shut down.
3a. The admin team is responsible for making sure that https://git.torproject.org and https://gitweb.torproject.org still work, and that existing links to them remain stable. (This goes for other sites as well.) This is a blocker for the November 1 shutdown.
3b. The development teams are responsible for deciding how they want to adjust their workflows to mitigate the larger attack surface of gitlab (see gitlab#81 (closed)). (This is not a blocker for the November 1 shutdown.)
3c. If the development teams decide on any solutions in 3b above that require additional services (keyservers, CT servers, automated verifiers) to run, admin will help out and try to make sure things go smoothly. Developers should come up with requirements here earlier, not later. (This is not a blocker for the November 1 shutdown.)
(*) I expect that other people will want these dates to be later, but this is what I'd be okay with personally.
what do we mean by that? there are lots of URL patterns in cgit... do we mean that, say: https://gitweb.torproject.org/admin/tsa-misc.git/ will redirect to something, but also that all of those (and possible other iterations) also work?
those are just random examples, but the possibilities here are kind of scary... building such a rewrite map would require basically reverse-engineering cgit...
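To give a feel for the size of such a rewrite map, here is a minimal sketch that translates a handful of cgit URL shapes (summary, `tree`, `plain`, `log`, `commit`) into their GitLab counterparts. It is deliberately incomplete: the set of views, the `gitlab.example.org` target and the `master` default branch are all assumptions of this example, and every other cgit view (snapshots, diffs, refs, patches, and so on) falls through to `None`.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical GitLab base URL for one migrated repository.
REPO_MAP = {
    "admin/tsa-misc": "https://gitlab.example.org/tpo/tpa/tsa-misc",
}


def translate(cgit_url, repo_map=REPO_MAP, default_ref="master"):
    """Translate a handful of cgit URL shapes to GitLab ones.

    Covers only the repository summary, tree, plain, log and commit
    views; anything else returns None so it can be logged and reviewed.
    """
    parts = urlsplit(cgit_url)
    repo, sep, rest = parts.path.lstrip("/").partition(".git")
    base = repo_map.get(repo)
    if not sep or base is None:
        return None
    query = parse_qs(parts.query)
    ref = query.get("h", [default_ref])[0]
    view, _, subpath = rest.strip("/").partition("/")
    if view == "":
        return base
    if view == "tree":
        return f"{base}/-/tree/{ref}/{subpath}".rstrip("/")
    if view == "plain" and subpath:
        return f"{base}/-/raw/{ref}/{subpath}"
    if view == "log":
        return f"{base}/-/commits/{ref}"
    if view == "commit" and "id" in query:
        return f"{base}/-/commit/{query['id'][0]}"
    return None
```

Even this small subset needs a per-view rule and has to carry the `h=` and `id=` parameters across, which is a fair indication of why a complete map looks like reverse-engineering cgit.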
I would also set a deadline for this:
Developers should come up with requirements here earlier, not later
... prior to nov 1, say may 1st.
i'm otherwise extremely happy to see progress here and would gladly go with that timeline, with the understanding that time constraints might push certain steps back...
this is exciting! i'm happy to be moving forward on this, and unblocking gitlab#81 (closed).. :)
oh, and before i forget, this is perfect material for a TPA-RFC, again. if you do not mind, i would make a formal proposal through this process so that we can get wider approval for this...
what do we mean by that? there are lots of URL patterns in cgit... do we mean that, say: https://gitweb.torproject.org/admin/tsa-misc.git/ will redirect to something, but also that all of those (and possible other iterations) also work?
I think that as many as possible of these should work. It might mean that we have to keep gitweb around (yuck).
Another possibility is that we use logs on gitweb.tpo to find out which of the links are actually used (say, over the course of a month) and consider the rest to be not-important-to-support.
Another possibility is that we use a webcrawler on our own domains to find out which of the links we use.
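The log-based idea could start with something as small as the sketch below, which counts how often each cgit "view" (the path component right after `.git`) shows up in an access log. It assumes the logs are in Common or Combined Log Format and still contain the request path, which may not hold for our anonymized logs; the function names are made up for this example.

```python
import re
from collections import Counter

# Matches the request part of a Common/Combined Log Format line.
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def url_shape(path):
    """Reduce a cgit request path to a coarse "view" name.

    "/admin/tsa-misc.git/tree/README?h=main" becomes "tree"; the
    repository summary page becomes "summary"; paths with no ".git"
    component become "other".
    """
    path = path.split("?", 1)[0]
    _, sep, rest = path.partition(".git")
    if not sep:
        return "other"
    view = rest.strip("/").split("/", 1)[0]
    return view or "summary"


def count_shapes(log_lines):
    """Count how often each URL shape appears in an access log."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            counts[url_shape(match.group(1))] += 1
    return counts
```

Running `count_shapes(open("access.log"))` over a month of logs (the file name is a placeholder) and looking at `most_common()` would show which views matter; anything that never appears is a candidate for "not important to support".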
... prior to nov 1, say may 1st.
That seems far too early to me. July 1 is more reasonable IMO. Or you could promise different gradations of support, like "If you ask on may 1 it will be a high priority. If you ask on October 1 you will get the best I can do in 20 minutes."
this is exciting! i'm happy to be moving forward on this, and unblocking gitlab#81 (closed).. :)
Well, this is only valid if other folks are okay with it too :)
oh, and before i forget, this is perfect material for a TPA-RFC, again. if you do not mind, i would make a formal proposal through this process so that we can get wider approval for this...
Maybe? Neither of us is exactly swimming in free time. But if you link me to the TPA-RFC process document I might have a chance.
so one side-channel conversation about this migration has finally closed recently, in gitlab#81 (closed). there, @nickm and I have agreed that it's best not to block on "fixing GitLab security" before migrating away from gitolite into GitLab. the benefits of the former are just too great and the burden of maintaining both is too high to be stuck in the current situation.
i have summarized this conversation about security and gitlab#81 (closed) in:
and i will check the gitlab#81 (closed) task off the checklist here. i think the next step here is basically to review if there's any missing documentation and formulate a formal proposal, which @nickm seems to have been considering last i checked. :)
anarcat marked the checklist item if we do not, how do we protect our code against the larger attack surface of GitLab? opened gitlab#81 (closed) for that discussion as completed