anarcat · abf3601e
--- a/howto/puppet.md
+++ b/howto/puppet.md
@@ -941,4 +941,154 @@ this was eventually abandoned in favor of using Puppet and the
 "Omnibus" package.

 For ad hoc jobs, [fabric](fabric) is being used.
-    
+
+For code management, I have done a more extensive review of possible
+alternatives. [This talk](https://www.youtube.com/watch?v=RdIyStATgFE) is a good introduction to git submodules,
+librarian and r10k. Based on that talk and [these slides](https://arlimus.github.io/slides/librarian.and.r10k/), I've made
+the following observations:
+
+## monorepo
+
+This is our current approach: all code is committed in one
+monolithic repository. This effectively makes it impossible to share
+code outside of the repository with anyone else because there is
+private data inside, but also because it doesn't follow the standard
+role/profile/modules separation that makes collaboration possible at
+all. To work around that, I designed a workflow where we locally clone
+subrepos as needed, but this is clunky as it requires committing every
+change twice: once for the subrepo, once for the parent.
+
+Our giant monorepo also mixes all changes together, which can be a pro
+*and* a con: on the one hand it's easy to see and audit all changes at
+once, but on the other hand, it can be overwhelming and confusing.
+
+But it does allow us to integrate with librarian right now and is a
+good stopgap solution. A better solution would need to solve the
+"double-commit" problem and still allow us to have smaller
+repositories that we can collaborate on outside of our main tree.
+
+## submodules
+
+The talk partially covers how `git submodules` work and how hard
+they are to deal with. I say partially because submodules are
+even harder to deal with than the examples she gives. She shows how
+submodules are hard to add and remove, because the metadata is
+stored in multiple locations (`.gitmodules`, `.git/config`,
+`.git/modules/` and the submodule repository itself).
+
+She also mentions submodules don't know about dependencies and it's
+likely you will break your setup if you forget one step. (See [this
+post](https://web.archive.org/web/20171101202911/http://somethingsinistral.net/blog/git-submodules-are-probably-not-the-answer/) for more examples.)
+
+In my experience, the biggest annoyance with submodules is the
+"double-commit" problem: you need to make commits in the submodule,
+then *redo* the commits in the parent repository to chase the head of
+that submodule. This does not improve on our current situation, which
+is that we need to do those two commits anyways in our giant monorepo.
+
+One advantage of submodules is that they're mostly standard:
+everyone knows about them, even if they're not familiar with them, and
+that knowledge is reusable outside of Puppet.
+
+## librarian
+
+Librarian is written in Ruby. It's built on top of [another library
+called librarian](https://github.com/applicationsonline/librarian) that is used by Ruby's [bundler](https://gembundler.com/). At the time
+of the talk, it was "pretty active" but unfortunately, librarian now
+seems to be [abandoned](https://github.com/voxpupuli/librarian-puppet/issues/48), so we might be forced to use r10k in the
+future, which has a quite different workflow.
+
+One problem with librarian right now is that `librarian update` clears
+any existing git subrepo and re-clones it from scratch. If you have
+temporary branches that were not pushed remotely, all of those are
+lost forever. That's really bad and annoying, but it's by design: it
+"takes over your modules directory", as she explains in the talk, and
+everything comes from the Puppetfile.
+
+Librarian does resolve dependencies recursively and stores the decided
+versions in a lockfile, which allows us to "see" what happens when you
+update from a Puppetfile.
+
+But there's no cryptographic chain of trust between the repository
+where the Puppetfile is and the modules that are checked out. Unless
+the module is checked out from git (which isn't the default), only
+version range specifiers constrain which code is checked out, which
+gives a huge surface area for arbitrary code injection in the entire
+puppet infrastructure (e.g. MITM, forge compromise, hostile upstream
+attacks).
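+
+As a rough sketch of that difference, assuming librarian-puppet's
+usual Puppetfile syntax (the module names, URL and commit hash below
+are placeholders, not taken from our tree), a forge module is only
+constrained by a version specifier, while a git module can be pinned
+to a full commit hash:
+
+```ruby
+# Hypothetical Puppetfile fragment, for illustration only.
+forge 'https://forgeapi.puppetlabs.com'
+
+# Forge module: any release matching the constraint may be installed,
+# so the content is not tied to anything in this repository.
+mod 'puppetlabs-stdlib', '>= 4.0.0'
+
+# Git module pinned to a full commit hash (placeholder value): the
+# checkout is tied to that exact commit instead of a moving version.
+mod 'example-profile',
+  :git => 'https://git.example.com/puppet/profile.git',
+  :ref => '0123456789abcdef0123456789abcdef01234567'
+```
+
+Pinning by hash only helps for modules listed explicitly this way; it
+says nothing about dependencies that get resolved from the forge.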
+
+## r10k
+
+r10k was written because librarian was too slow for large
+deployments. But it covers more than just managing code: it also
+manages environments and is designed to run on the Puppet master. It
+doesn't have dependency resolution or a `Puppetfile.lock`,
+however. See [this ticket](https://github.com/puppetlabs/r10k/issues/38), closed in favor of [that one](https://tickets.puppetlabs.com/browse/RK-3).
+
+r10k is more complex and very opinionated: it requires lots of
+configuration including its own YAML file, hooks into the Puppetmaster
+and can [take a while to deploy](http://garylarizza.com/blog/2014/02/18/puppet-workflow-part-3/). r10k is still in [active
+development](https://github.com/puppetlabs/r10k/releases) and is supported by Puppetlabs, so there's [official
+documentation](https://puppet.com/docs/pe/2019.1/r10k.html) in the Puppet documentation.
+
+It is often used in conjunction with librarian for dependency resolution.
+
+One cool feature is that r10k allows you to create dynamic
+environments based on branch names. All you need is a single repo with
+a Puppetfile and r10k handles the rest. The problem, of course, is
+that you need to trust it's going to do the right thing. There's the
+security issue, but there's also the problem of resolving dependencies
+and you *do* end up double-committing in the end if you use branches
+in sub-repositories. But maybe that is unavoidable.
+
+(Note that there are ways of resolving dependencies with external
+tools, like [generate-puppetfile](https://github.com/rnelson0/puppet-generate-puppetfile) ([introduction](https://rnelson0.com/2015/11/06/introducing-generate-puppetfile-or-creating-a-ruby-program-to-update-your-puppetfile-and-fixtures-yml/)) or [this hack
+that reformats librarian output](https://github.com/dharmabruce/lp2r10k/blob/master/lp2r10k) or [those rake tasks](https://github.com/voxpupuli/ra10ke). There's
+also a [Go rewrite called g10k](https://github.com/xorpaul/g10k) that is much faster, but with
+similar limitations.)
+
+## git subtree
+
+[This article](https://web.archive.org/web/20171107082413/http://somethingsinistral.net/blog/scaling-puppet-environment-deployment/) briefly covers git subtrees from the point of view of
+Puppet management. It outlines how it's cool that the history
+of the subtree gets merged as is in the parent repo, which gives us
+the best of both worlds (individual, per-module history view along with
+a global view in the parent repo). It makes, however, rebasing in
+subtrees impossible, as it breaks the parent merge. You do end up with
+some of the disadvantages of the monorepo in that all the code is
+actually committed in the parent repo and you *do* have to commit
+twice as well.
+
+## subrepo
+
+TODO. https://github.com/ingydotnet/git-subrepo
+
+## myrepos
+
+[myrepos](https://myrepos.branchable.com/) is one of many solutions to manage multiple git
+repositories. It has been used in the past at my old workplace
+(Koumbit.org) to manage and check out multiple git repositories.
+
+Like Puppetfile without locks, it doesn't enforce cryptographic
+integrity between the master repositories and the subrepositories: all
+it does is define remotes and their locations.
+
+Like r10k, it doesn't handle dependencies and will require extra setup,
+although it's much lighter than r10k.
+
+Its main disadvantage is that it isn't well known and might seem
+esoteric to people. It also has weird failure modes, but could be used
+in parallel with a monorepo. For example, it might allow us to set up
+specific remotes in subdirectories of the monorepo automatically.
+
+## Summary table
+
+| Approach   | Pros                       | Cons                                     | Summary                           |
+|------------|----------------------------|------------------------------------------|-----------------------------------|
+| Monorepo   | Simple                     | Double-commit                            | Status quo                        |
+| Submodules | Well-known                 | Hard to use, double-commit               | Not great                         |
+| Librarian  | Dep resolution client-side | Unmaintained, bad integration with git   | Not sufficient on its own         |
+| r10k       | Standard                   | Hard to deploy, opinionated              | To evaluate further               |
+| Subtree    | "Best of both worlds"      | Still get double-commit, rebase problems | Not sure it's worth it            |
+| Subrepo    | ?                          | ?                                        | ?                                 |
+| myrepos    | Flexible                   | Esoteric                                 | Might be useful with our monorepo |