... | ... | @@ -941,4 +941,154 @@ this was eventually abandoned in favor of using Puppet and the |
"Omnibus" package.

For ad hoc jobs, [fabric](fabric) is being used.

For code management, I have done a more extensive review of possible
alternatives. [This talk](https://www.youtube.com/watch?v=RdIyStATgFE) is a good introduction to git submodules,
librarian and r10k. Based on that talk and [these slides](https://arlimus.github.io/slides/librarian.and.r10k/), I've made
the following observations:
### monorepo

This is our current approach, which is that all code is committed in
one monolithic repository. This effectively makes it impossible to
share code outside of the repository with anyone else because there is
private data inside, but also because it doesn't follow the standard
role/profile/modules separation that makes collaboration possible at
all. To work around that, I designed a workflow where we locally clone
subrepos as needed, but this is clunky as it requires committing every
change twice: once for the subrepo, once for the parent.

Our giant monorepo also mixes all changes together, which can be a pro
*and* a con: on the one hand it's easy to see and audit all changes at
once, but on the other hand, it can be overwhelming and confusing.

But it does allow us to integrate with librarian right now and is a
good stopgap solution. A better solution would need to solve the
"double-commit" problem and still allow us to have smaller
repositories that we can collaborate on outside of our main tree.
## submodules

The talk partially covers how `git submodule` works and how
hard submodules are to deal with. I say partially because submodules are
even harder to deal with than the examples she gives. She shows how
submodules are hard to add and remove, because the metadata is
stored in multiple locations (`.gitmodules`, `.git/config`,
`.git/modules/` and the submodule repository itself).

She also mentions submodules don't know about dependencies and it's
likely you will break your setup if you forget one step. (See [this
post](https://web.archive.org/web/20171101202911/http://somethingsinistral.net/blog/git-submodules-are-probably-not-the-answer/) for more examples.)

In my experience, the biggest annoyance with submodules is the
"double-commit" problem: you need to make commits in the submodule,
then *redo* the commits in the parent repository to chase the head of
that submodule. This does not improve on our current situation, which
is that we need to do those two commits anyways in our giant monorepo.

One advantage with submodules is that they're mostly standard:
everyone knows about them, even if they're not familiar with them, and
their knowledge is reusable outside of Puppet.
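
To make the "double-commit" problem described in this section concrete, here is a minimal sketch using throwaway repositories. The repository names (`module`, `parent`), the `modules/apache` path and the file contents are all made up for illustration. The `protocol.file.allow` override is only there because recent git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
# Sketch of the double-commit workflow with git submodules.
# All names and paths here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the throwaway commits below
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A standalone module repository with one commit
git init -q module
(cd module && echo 'class apache {}' > init.pp \
    && git add init.pp && git commit -q -m 'initial module')

# A parent repository tracking the module as a submodule
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$work/module" modules/apache
git commit -q -m 'add apache submodule'

# Commit 1: the actual change, made inside the submodule
(cd modules/apache && echo '# tweak' >> init.pp \
    && git commit -q -am 'tweak apache')

# The parent now reports modules/apache as modified
git status --short

# Commit 2: record the new submodule head in the parent
git add modules/apache
git commit -q -m 'bump apache submodule'
```

The second commit carries no code of its own: it only moves the pointer that the parent keeps for `modules/apache`, which is why it feels like redoing the work.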
## librarian

Librarian is written in Ruby. It's built on top of [another library
called librarian](https://github.com/applicationsonline/librarian) that is used by Ruby's [bundler](https://gembundler.com/). At the time
of the talk, it was "pretty active" but unfortunately, librarian now
seems to be [abandoned](https://github.com/voxpupuli/librarian-puppet/issues/48) so we might be forced to use r10k in the
future, which has a quite different workflow.

One problem with librarian right now is that `librarian-puppet update` clears
any existing git subrepo and re-clones it from scratch. If you have
temporary branches that were not pushed remotely, all of those are
lost forever. That's really bad and annoying! But it's by design: it
"takes over your modules directory", as she explains in the talk, and
everything comes from the Puppetfile.

Librarian does resolve dependencies recursively and stores the resolved
versions in a lockfile, which allows us to "see" what happens when you
update from a Puppetfile.

But there's no cryptographic chain of trust between the repository
where the Puppetfile is and the modules that are checked out. Unless
the module is checked out from git (which isn't the default), only
version range specifiers constrain which code is checked out, which
gives a huge surface area for arbitrary code injection in the entire
Puppet infrastructure (e.g. MITM, forge compromise, hostile upstream
attacks).
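
Since an update throws away local clones, one possible safeguard is to look for unpushed commits before running it. The sketch below is not part of librarian: the function name `unpushed_commits` and the `modules` default directory are hypothetical, and it assumes each module is a regular git clone with its own `.git`.

```shell
#!/bin/sh
# List commits that exist only in local branches of each module checkout,
# so they can be pushed before an update re-clones the module.
# The function name and the default "modules" directory are assumptions.
unpushed_commits() {
    root=${1:-modules}
    for dir in "$root"/*; do
        # skip anything that is not its own git checkout
        [ -e "$dir/.git" ] || continue
        # commits reachable from local branches but from no remote-tracking ref
        out=$(git -C "$dir" log --branches --not --remotes --oneline)
        if [ -n "$out" ]; then
            printf '%s:\n%s\n' "$dir" "$out"
        fi
    done
}
```

Running `unpushed_commits modules` from the directory holding the Puppetfile prints nothing when every local commit is also on a remote. Any output means an update would destroy work.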
## r10k

r10k was written because librarian was too slow for large
deployments. But it covers more than just managing code: it also
manages environments and is designed to run on the Puppet master. It
doesn't have dependency resolution or a `Puppetfile.lock`,
however. See [this ticket](https://github.com/puppetlabs/r10k/issues/38), closed in favor of [that one](https://tickets.puppetlabs.com/browse/RK-3).

r10k is more complex and very opinionated: it requires lots of
configuration including its own YAML file, hooks into the Puppetmaster
and can [take a while to deploy](http://garylarizza.com/blog/2014/02/18/puppet-workflow-part-3/). r10k is still in [active
development](https://github.com/puppetlabs/r10k/releases) and is supported by Puppetlabs, so there's [official
documentation](https://puppet.com/docs/pe/2019.1/r10k.html) in the Puppet documentation.

It is often used in conjunction with librarian for dependency resolution.

One cool feature is that r10k allows you to create dynamic
environments based on branch names. All you need is a single repo with
a Puppetfile and r10k handles the rest. The problem, of course, is
that you need to trust it's going to do the right thing. There's the
security issue, but there's also the problem of resolving dependencies
and you *do* end up double-committing in the end if you use branches
in sub-repositories. But maybe that is unavoidable.

(Note that there are ways of resolving dependencies with external
tools, like [generate-puppetfile](https://github.com/rnelson0/puppet-generate-puppetfile) ([introduction](https://rnelson0.com/2015/11/06/introducing-generate-puppetfile-or-creating-a-ruby-program-to-update-your-puppetfile-and-fixtures-yml/)) or [this hack
that reformats librarian output](https://github.com/dharmabruce/lp2r10k/blob/master/lp2r10k) or [those rake tasks](https://github.com/voxpupuli/ra10ke). There's
also a [Go rewrite called g10k](https://github.com/xorpaul/g10k) that is much faster, but with
similar limitations.)
## git subtree

[This article](https://web.archive.org/web/20171107082413/http://somethingsinistral.net/blog/scaling-puppet-environment-deployment/) briefly mentions git subtrees from the point of view of
Puppet management. It outlines how it's cool that the history
of the subtree gets merged as is in the parent repo, which gives us
the best of both worlds (individual, per-module history view along with
a global view in the parent repo). It makes, however, rebasing in
subtrees impossible, as it breaks the parent merge. You do end up with
some of the disadvantages of the monorepo in that all the code is
actually committed in the parent repo and you *do* have to commit
twice as well.
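
The subtree workflow described above can be sketched as follows, again with throwaway repositories. The repository names, the `modules/apache` prefix and the `from-parent` branch are hypothetical, and this assumes the `git subtree` command is installed (it ships in git's contrib area, so some installations lack it).

```shell
#!/bin/sh
# Sketch of a git subtree workflow; all names and paths are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the throwaway commits below
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A standalone module repository with one commit
git init -q module
(cd module && echo 'class apache {}' > init.pp \
    && git add init.pp && git commit -q -m 'initial module')

# The parent repository; subtree add needs an existing commit
git init -q parent
cd parent
git commit -q --allow-empty -m 'initial parent'

# Merge the module, history included, under modules/apache.
# HEAD stands for whatever branch the module repository has checked out.
git subtree add --prefix=modules/apache "$work/module" HEAD

# Step 1: commit the change in the parent, like any other file
echo '# tweak' >> modules/apache/init.pp
git commit -q -am 'tweak apache'

# Step 2: split out the module's history and push it to a branch
# of the module repository
git subtree push --prefix=modules/apache "$work/module" from-parent
```

After the `add`, `git log` in the parent shows the module's own commits alongside the parent's. The `push` step is the second half of the double commit: the change already lives in the parent and has to be sent to the module repository separately.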
## subrepo

TODO. https://github.com/ingydotnet/git-subrepo
## myrepos

[myrepos](https://myrepos.branchable.com/) is one of many solutions to manage multiple git
repositories. It has been used in the past at my old workplace
(Koumbit.org) to manage and check out multiple git repositories.

Like a Puppetfile without locks, it doesn't enforce cryptographic
integrity between the master repositories and the subrepositories: all
it does is define remotes and their locations.

Like r10k, it doesn't handle dependencies and will require extra setup,
although it's much lighter than r10k.

Its main disadvantage is that it isn't well known and might seem
esoteric to people. It also has weird failure modes, but could be used
in parallel with a monorepo. For example, it might allow us to set up
specific remotes in subdirectories of the monorepo automatically.
## Summary table

| Approach   | Pros                       | Cons                                     | Summary                           |
|------------|----------------------------|------------------------------------------|-----------------------------------|
| Monorepo   | Simple                     | Double-commit                            | Status quo                        |
| Submodules | Well-known                 | Hard to use, double-commit               | Not great                         |
| Librarian  | Dep resolution client-side | Unmaintained, bad integration with git   | Not sufficient on its own         |
| r10k       | Standard                   | Hard to deploy, opinionated              | To evaluate further               |
| Subtree    | "best of both worlds"      | Still get double-commit, rebase problems | Not sure it's worth it            |
| Subrepo    | ?                          | ?                                        | ?                                 |
| myrepos    | Flexible                   | Esoteric                                 | Might be useful with our monorepo |