http://localhost:8080/pdb/dashboard/index.html

PuppetDB itself also holds performance information about the Puppet agent runs,
which are called "reports". Those reports contain information about
the changes applied to each server, how long agent runs take, and so
on. Those metrics
could be made more visible by using a dashboard, but that has not been
implemented yet (see [issue 31969][]).

[issue 31969]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/31969

The Puppet server, Puppet agents and PuppetDB keep logs of their
operations. The latter keeps its logs in `/var/log/puppetdb/` for a
maximum of 90 days or 1GB, whichever comes first (configured in
`/etc/puppetdb/request-logging.xml` and
`/etc/puppetdb/logback.xml`). The other logs are sent to `syslog`, and
usually end up in `daemon.log`.

Puppet should hold minimal personally identifiable information, like
user names, user public keys and project names.

## Other documentation

 * [Latest Puppet docs](https://puppet.com/docs/puppet/latest/puppet_index.html) - might be too new, see also the [Puppet
   5.5 docs](https://puppet.com/docs/puppet/5.5/puppet_index.html)
   * [Function reference](https://puppet.com/docs/puppet/latest/function.html)
   * [Type reference](https://puppet.com/docs/puppet/latest/type.html)
 * [Mapping between versions of Puppet Enterprise, Facter, Hiera, Agent, etc.](https://puppet.com/docs/pe/2019.0/component_versions_in_recent_pe_releases.html)

# Discussion

This section goes more in depth into how Puppet is set up, why it was
set up the way it was, and how it could be improved.

## Overview

Our Puppet setup dates back to 2011, according to the git history,
and was probably based on the [Debian System Administrator's Puppet
codebase](https://salsa.debian.org/dsa-team/mirror/dsa-puppet), which dates back to 2009.

## Goals

The general goal of Puppet is to provide basic automation across the
architecture, so that software installation and configuration, file
distribution, user and some service management is done from a central
location, managed in a git repository. This approach is often called
[Infrastructure as code](https://en.wikipedia.org/wiki/Infrastructure_as_Code).

This section also documents possible improvements to our Puppet
configuration that we are considering.

 * **secure**: only sysadmins should have access to push configuration,
   whatever happens. This includes deploying only audited and verified
   Puppet code into production.
 * **code review**: changes on servers should be verifiable by our peers,
   through a git commit log
 * **fix permissions issues**: the deployment system should allow all
   admins to push code to the Puppet server without having to constantly fix
   permissions (e.g. through a [role account](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29663))
 * **secrets handling**: there are some secrets in Puppet. those
   should remain secret.

We mostly have this now, although there are concerns about permissions
being wrong sometimes, which a role account could fix.
The following are issues with the current architecture we would like to fix:

 * **Continuous Integration**: before deployment, code should be vetted by
   a peer and, ideally, automatically checked for errors and tested
 * **single source of truth**: when we add/remove nodes, we should not
   have to talk to multiple services (see also the [install automation
   ticket](https://gitlab.torproject.org/tpo/tpa/team/-/issues/31239) and the [new-machine discussion](new-machine#discussion))
 * **collaboration** with other sysadmins outside of TPA, for which we
   would need to...
 * ... **publicize our code** (see [ticket 29387](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387))
 * **no manual changes**: every change on every server should be committed
   to version control somewhere
 * **bare-metal recovery**: it should be possible to recover a service's
   *configuration* from a bare Debian install with Puppet (and with
   data from the [backup](backup) service of course...)
 * **one commit only**: we shouldn't have to commit "twice" to get
   changes propagated (once in a submodule, once in the parent module,
   for example)
 * **ad hoc changes** to the infrastructure: one-off jobs should be
   handled by [fabric](fabric), Cumin, or straight SSH.

## Approvals required

TPA should approve policy changes as per [tpa-rfc-1](/policy/tpa-rfc-1-policy).

## Proposed Solution

To improve on the above "Goals", I would suggest the following
configuration.

TL;DR:

 0. Publish our repository (tpo/tpa/team#29387)
 1. Use a control repository
 2. Get rid of `3rdparty`
 3. Deploy with `g10k`
 4. Authenticate with checksums
 5. Deploy to branch-specific environments (tpo/tpa/team#40861)
 6. Rename the default branch "production"
 7. Push directly on the Puppet server
 8. Use a role account (tpo/tpa/team#29663)
 9. Use local test environments
 10. Develop a test suite
 11. Hook into CI
 12. OpenPGP verification and web hook

Steps 1-8 could be implemented without too much difficulty and should
be a mid term objective. Steps 9 to 12 require significantly more work
and could be implemented once the new infrastructure stabilizes.

What follows is an explanation and justification of each step.

### Publish our repository

Right now our Puppet repository is private, because there's
sensitive information in there. The goal of this step is to make sure
we can safely publish our repository without risking disclosing
secrets.

Secret data is currently stored in Trocla, and we should keep using it
for that purpose. That would avoid having to mess around splitting the
repository in multiple components in the short term.
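
For instance, a secret can be looked up (or generated on first use)
through Trocla at catalog compilation time instead of being committed
to git. A minimal sketch, assuming the `trocla()` function from the
puppet-trocla integration; the key name is illustrative:

```puppet
# Sketch: fetch the secret from Trocla rather than hardcoding it in the
# repository. The key name is illustrative, not an actual key we use.
$smtp_password = trocla('profile::postfix::smtp_password', 'plain')
```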

This is the data that needs to be moved into Trocla at the time of writing:

 * `modules/postfix/files/virtual` - email addresses
 * `modules/postfix/files/access-1-sender-reject` and related - email addresses
 * sudoers configurations?

A full audit should be redone before this is completed.

### Use a control repository

The base of the infrastructure is a [control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo),
[another more complex example](https://github.com/example42/psick))
which chain-loads all the other modules. This implies turning all our
"modules" into "profiles" and moving "real" modules (which are fit for
public consumption) "outside", into public repositories (see also
[issue 29387: publish our puppet repository](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387)).

Note that the control repository *could* also be public: we could simply have
all the private data inside of Trocla or some other private repository.
The control repository concept originates from the proprietary version
of Puppet (Puppet Enterprise or PE) but its logic is applicable to the
open source Puppet release as well.
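
As an illustration, a control repository along those lines could be as
small as this (a sketch based on the puppetlabs example above; the
exact layout is ours to decide):

```
control-repo/
├── Puppetfile          # list of external modules to deploy
├── environment.conf    # sets the modulepath for the environment
├── hiera.yaml          # hierarchy for configuration data
├── data/               # configuration data for profiles and modules
├── manifests/
│   └── site.pp         # node classification entry point
└── site/
    ├── profile/        # site-specific wrappers around public modules
    └── role/           # one role per machine type
```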

### Get rid of 3rdparty

The control repo's core configuration file is the `Puppetfile`. We
already use a Puppetfile to manage modules inside of the `3rdparty`
directory.

Our current `modules/` directory would be split into `site/`, which
is the designated location for roles and profiles, and `legacy/`, which
would host private custom modules, with the goal of getting rid of `legacy/`
altogether by either publishing our custom modules and integrating them into
the `Puppetfile` or transforming them into a new profile class in
`site/profile/`.

In other words, this is the checklist:

 * [x] convert everything to hiera (tpo/tpa/team#30020) - this
       requires creating `roles` for each machine (more or less) --
       effectively done as far as this issue is concerned
 * [ ] sanitize repository (tpo/tpa/team#29387)
 * [ ] rename `hiera/` to `data/`
 * [ ] add `site/` and `legacy/` to the `modulepath` in the environment config
 * [ ] move `modules/profile/` and `modules/role/` modules into `site/`
 * [ ] move remaining modules in `modules/` into `legacy/`
 * [ ] move `3rdparty/*` into environment root

Once this is done, our Puppet environment would look like this:

 * `data/` - configuration data for profiles and modules

 * `modules/` - equivalent of the current `3rdparty/modules/` directory: fully
   public, reusable code that's aimed at collaboration, mostly from the
   Puppet Forge, or from our own repositories when there is no equivalent there

 * `site/profile/` - "magic sauce" on top of 3rd party `modules/` to
   configure 3rd party modules according to our site-specific requirements

 * `site/role/` - abstract classes that assemble several profiles to define
   a logical role for any given machine in our infrastructure

 * `legacy/` - remaining custom modules that still need to be either published
   and moved to their own repository in `modules/`, or replaced with an existing
   3rd party module (eg. from voxpupuli)

Although the module paths would be rearranged, no class names would
change as a result, so no changes would be required to the actual
Puppet code.
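
Concretely, the `modulepath` could be set in the environment's
`environment.conf` along these lines (a sketch; the exact search order
would need to be validated):

```ini
# environment.conf (sketch): look up classes in site/, then legacy/,
# then the Puppetfile-managed modules/ directory.
modulepath = site:legacy:modules:$basemodulepath
```
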
### Deploy with g10k

It seems clear that everyone is converging on the use of a
`Puppetfile` to deploy code. There are still monorepos out there, but
they do make our life harder, especially when we need to operate on
non-custom modules.

Instead, we should converge towards *not* following upstream modules
in our git repository. Modules managed by the `Puppetfile` would *not*
be managed in our git monorepo and, instead, would be deployed by
`r10k` or `g10k` (most likely the latter because of its support for
checksums).
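
For example, a `Puppetfile` pins each module to a specific release or
git reference (a sketch; the module names and versions are
illustrative):

```ruby
# Puppetfile (sketch): module names and versions are illustrative.
forge 'https://forge.puppet.com'

# a Forge module, pinned to a release
mod 'puppetlabs/stdlib', '6.0.0'

# a module deployed straight from git, pinned to a tag
mod 'apt',
  :git => 'https://github.com/puppetlabs/puppetlabs-apt',
  :tag => 'v7.0.0'
```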

Note that neither `r10k` nor `g10k` resolves dependencies in a
`Puppetfile`. We therefore also need a tool to verify that the file
correctly lists all required modules. The following solutions need to
be validated but could address that issue:

 * [generate-puppetfile](https://github.com/rnelson0/puppet-generate-puppetfile): take a `Puppetfile` and walk the
   dependency tree, generating a new `Puppetfile` (see also [this
   introduction to the project](https://rnelson0.com/2015/11/06/introducing-generate-puppetfile-or-creating-a-ruby-program-to-update-your-puppetfile-and-fixtures-yml/))
 * [Puppetfile-updater](https://github.com/camptocamp/puppetfile-updater): read the `Puppetfile` and fetch new releases
 * [ra10ke](https://github.com/voxpupuli/ra10ke): a bunch of Rake tasks to validate a `Puppetfile`
   * `r10k:syntax`: syntax check, see also `r10k puppetfile check`
   * `r10k:dependencies`: check for out of date dependencies
   * `r10k:solve_dependencies`: check for **missing** dependencies
   * `r10k:install`: wrapper around `r10k` to install with some
     caveats
   * `r10k:validate`: make sure modules are accessible
   * `r10k:duplicates`: look for duplicate declarations
 * [lp2r10k](https://github.com/dharmabruce/lp2r10k/): convert "librarian" `Puppetfile` (missing
   dependencies) into a "r10k" `Puppetfile` (with dependencies)
Note that this list comes from the [updating your Puppetfile](https://github.com/puppetlabs/r10k/blob/master/doc/updating-your-puppetfile.mkd#automatic-updates)
documentation in the r10k project, which is also relevant here.

### Authenticate code with checksums

This part is the main problem with moving away from a monorepo. By
using a monorepo, we can audit the code we push into production. But
if we offload this to `r10k`, it can download code from wherever the
`Puppetfile` says, effectively shifting our trust path from OpenSSH
to HTTPS, the Puppet Forge, git and whatever remote gets added to the
`Puppetfile`.

There is no obvious solution for this right now, surprisingly. Here
are two possible alternatives:

 1. [g10k](https://github.com/xorpaul/g10k/) supports using a `:sha256sum` parameter to checksum
    modules, but that only works for Forge modules. Maybe we could
    pair this with using an explicit `sha1` reference for git
    repositories, ensuring those are checksummed as well. The downside
    of that approach is that it leaves checked out git repositories in
    a "detached head" state.

 2. `r10k` has a [pending pull request](https://github.com/puppetlabs/r10k/pull/823) to add a `filter_command`
    directive which could run after a git checkout has been
    performed. It could presumably be used to verify OpenPGP
    signatures on git commits, although this would work only on
    modules we sign commits on (and therefore not third-party ones).

It seems the best approach would be to use g10k for now, with
checksums on both git commits and Forge modules.
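
In a `Puppetfile`, that could look something like this (a sketch; the
checksum and commit values below are placeholders, and `:sha256sum` is
a g10k-specific parameter):

```ruby
# Puppetfile (sketch): pinning modules by content rather than by name.
# Checksum and commit values are placeholders, not real pins.

# Forge module, verified with g10k's :sha256sum parameter
mod 'puppetlabs/stdlib', '6.0.0',
  :sha256sum => '0000000000000000000000000000000000000000000000000000000000000000'

# git module pinned to an explicit commit (checked out as a detached HEAD)
mod 'apt',
  :git    => 'https://github.com/puppetlabs/puppetlabs-apt',
  :commit => '0123456789abcdef0123456789abcdef01234567'
```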

A validation hook running *before* g10k COULD validate that all `mod`
lines have a `checksum` of some sort...

Note that this approach does *NOT* solve the "double-commit" problem
identified in the Goals. It is believed that only a "monorepo" would
fix that problem and that approach comes in direct conflict with the
"collaboration" requirement. We chose the latter.

This could be implemented as a patch to `ra10ke`.

### Deploy to branch-specific environments

A key feature of r10k (and, of course, g10k) is the ability to deploy
code to new environments depending on the branch we're working on. We
would enable that feature to allow testing large changes to critical
code paths without affecting all servers.

See tpo/tpa/team#40861.
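
With r10k, the branch-to-environment mapping is driven by its sources
configuration (g10k has an equivalent mechanism); a minimal sketch,
with an illustrative remote URL:

```yaml
# /etc/puppetlabs/r10k/r10k.yaml (sketch): every branch of the control
# repository is deployed as an environment of the same name.
sources:
  tpo:
    remote: 'git@puppet.torproject.org:control-repo.git'
    basedir: '/etc/puppetlabs/code/environments'
```

A test machine could then be switched over with `puppet agent --test
--environment my-branch` before the change lands in `production`.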

### Rename the default branch "production"

In accordance with Puppet's best practices, the control repository's
default branch would be called "production" and not "master".

Also: Black Lives Matter.

### Push directly on the Puppet server

Because we are worried about the GitLab attack surface, we could still
keep on pushing to the Puppet server for now. The control repository
could be mirrored to GitLab using a deploy key. All other repositories
would be published on GitLab anyways, and there the attack surface
would not matter because of the checksums in the control repository.

### Use a role account

To avoid permission issues, use a role account (say `git`) to accept
pushes and enforce git hooks (tpo/tpa/team#29663).

### Use local test environments

It should eventually be possible to test changes locally before
pushing to production. This would involve radically simplifying the
Puppet server configuration and probably either getting rid of the
LDAP integration or at least making it optional so that changes can be
tested without it.

This would involve "puppetizing" the Puppet server configuration so
that a Puppet server and test agent(s) could be bootstrapped
automatically. Operators would run "smoke tests" (running Puppet by
hand and looking at the result) to make sure their code works before
pushing to production.

### Develop a test suite

The next step is to start working on a test suite for services, at
least for new deployments, so that code can be tested without running
things by hand. Plenty of Puppet modules have such a test suite,
generally using [rspec-puppet](https://rspec-puppet.com/) and [rspec-puppet-facts](https://github.com/mcanevet/rspec-puppet-facts), and we
already have a few modules in `3rdparty` that have such tests. The
idea would be to have those tests on a per-role or per-profile basis.
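
As a sketch, a per-role test using those tools could look like this
(the `role::webserver` and `profile::apache` class names are
hypothetical):

```ruby
# spec/classes/role_webserver_spec.rb (sketch): class names hypothetical.
require 'spec_helper'

describe 'role::webserver' do
  # iterate over the OSes declared in metadata.json, via rspec-puppet-facts
  on_supported_os.each do |os, os_facts|
    context "on #{os}" do
      let(:facts) { os_facts }

      it { is_expected.to compile.with_all_deps }
      it { is_expected.to contain_class('profile::apache') }
    end
  end
end
```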

The Foreman people have published [their test infrastructure](https://github.com/theforeman/foreman-infra/tree/master/puppet) which
could be useful as inspiration for our purposes here.

### Hook into continuous integration

Once tests are functional, the last step is to move the control
repository into GitLab directly and start running CI against the
Puppet code base. This would probably not happen until GitLab CI is
deployed, and would require lots of work to get there, but would
eventually be worth it.

The GitLab CI would be indicative: an operator would need to push to a
topic branch there first to confirm tests pass but would still push
directly to the Puppet server for production.
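
Such a pipeline could start small; a sketch (the image and tool
choices are illustrative):

```yaml
# .gitlab-ci.yml (sketch): syntax-check the Puppetfile and lint our code.
stages:
  - validate

puppetfile:
  stage: validate
  image: ruby:3.1
  script:
    - gem install r10k
    - r10k puppetfile check

lint:
  stage: validate
  image: ruby:3.1
  script:
    - gem install puppet-lint
    - puppet-lint --fail-on-warnings site/
```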

Note that we are working on (client-side) validation hooks for now,
see [issue 31226][].

[issue 31226]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/31226

### OpenPGP verification and web hook

To stop pushing directly to the Puppet server, we could implement
OpenPGP verification on the control repository. If a hook checks that
commits are signed by a trusted party, it does not matter where the
code is hosted.

A good reference for OpenPGP verification is [this guix article](https://guix.gnu.org/blog/2020/securing-updates/)
which covers a few scenarios and establishes a pretty solid
verification workflow. There's also a larger project-wide discussion
in [GitLab](howto/gitlab) [issue 81](https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/81).
We could use the [webhook](https://github.com/voxpupuli/puppet_webhook) system to have GitLab notify the Puppet
server to pull code.
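
As a sketch, a server-side `update` hook could refuse unsigned
commits; key distribution and the merge/rebase policies discussed in
the guix article are the hard part and are not covered here:

```sh
#!/bin/sh
# update hook (sketch): reject pushes containing commits without a valid
# OpenPGP signature from a key in the hook user's GnuPG keyring.
# Arguments per githooks(5). Branch creation and deletion, where one of
# the revisions is all zeros, are not handled in this sketch.
refname="$1" oldrev="$2" newrev="$3"

for commit in $(git rev-list "$oldrev..$newrev"); do
    if ! git verify-commit "$commit" >/dev/null 2>&1; then
        echo "rejecting $refname: commit $commit has no valid signature" >&2
        exit 1
    fi
done
```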

## Cost

N/A.

## Alternatives considered

Ansible was considered for managing [GitLab](gitlab) for a while, but
this was eventually abandoned in favor of using Puppet and the
"Omnibus" package.

For ad hoc jobs, [fabric](fabric) is being used.

For code management, I have done a more extensive review of possible
alternatives. [This talk](https://www.youtube.com/watch?v=RdIyStATgFE) is a good introduction to git submodules,
librarian and r10k. Based on that talk and [these slides](https://arlimus.github.io/slides/librarian.and.r10k/), I've made
the following observations:

### monorepo

This is our current approach, which is that all code is committed in
one monolithic repository. This effectively makes it impossible to
share code outside of the repository with anyone else because there is
private data inside, but also because it doesn't follow the standard
role/profile/modules separation that makes collaboration possible at
all. To work around that, I designed a workflow where we locally clone
subrepos as needed, but this is clunky as it requires committing every
change twice: once in the subrepo, once in the parent.

Our giant monorepo also mixes all changes together, which can be a pro
*and* a con: on the one hand, it's easy to see and audit all changes at
once; on the other hand, it can be overwhelming and confusing.

But it does allow us to integrate with librarian right now and is a
good stopgap solution. A better solution would need to solve the
"double-commit" problem and still allow us to have smaller
repositories that we can collaborate on outside of our main tree.

### submodules

The talk partially covers how `git submodules` work and how hard they
are to deal with. I say partially because submodules are even harder
to deal with than the examples she gives. She shows how submodules are
hard to add and remove, because the metadata is stored in multiple
locations (`.gitmodules`, `.git/config`, `.git/modules/` and the
submodule repository itself).

She also mentions submodules don't know about dependencies and it's
likely you will break your setup if you forget one step. (See [this
post](https://web.archive.org/web/20171101202911/http://somethingsinistral.net/blog/git-submodules-are-probably-not-the-answer/) for more examples.)

In my experience, the biggest annoyance with submodules is the
"double-commit" problem: you need to make commits in the submodule,
then *redo* the commits in the parent repository to chase the head of
that submodule. This does not improve on our current situation, which
is that we need to do those two commits anyways in our giant monorepo.

One advantage with submodules is that they're mostly standard:
everyone knows about them and, even for those less familiar, that
knowledge is reusable outside of Puppet.

Others have *strong* opinions about submodules, with one Debian
developer suggesting to [Never use git submodules](https://diziet.dreamwidth.org/14666.html) and instead
recommending `git subtree`, a monorepo, `myrepos`, or ad-hoc scripts.

### librarian

Librarian is written in Ruby. It's built on top of [another library
called librarian](https://github.com/applicationsonline/librarian) that is used by Ruby's [bundler](https://gembundler.com/). At the time
of the talk, it was "pretty active" but unfortunately, librarian now
seems to be [abandoned](https://github.com/voxpupuli/librarian-puppet/issues/48), so we might be forced to use r10k in the
future, which has a quite different workflow.

One problem with librarian right now is that `librarian update` clears
any existing git subrepo and re-clones it from scratch. If you have
temporary branches that were not pushed remotely, all of those are
lost forever. That's really bad and annoying! It's by design: it
"takes over your modules directory", as she explains in the talk, and
everything comes from the Puppetfile.

Librarian does resolve dependencies recursively and stores the decided
versions in a lockfile, which allows us to "see" what happens when we
update from a Puppetfile.

But there's no cryptographic chain of trust between the repository
where the Puppetfile is and the modules that are checked out. Unless
the module is checked out from git (which isn't the default), only
version range specifiers constrain which code is checked out, which
gives a huge surface area for arbitrary code injection in the entire
puppet infrastructure (e.g. MITM, forge compromise, hostile upstream
attacks).

### r10k

r10k was written because librarian was too slow for large
deployments. But it covers more than just managing code: it also
manages environments and is designed to run on the Puppet master. It
doesn't have dependency resolution or a `Puppetfile.lock`,
however. See [this ticket](https://github.com/puppetlabs/r10k/issues/38), closed in favor of [that one](https://tickets.puppetlabs.com/browse/RK-3).

r10k is more complex and very opinionated: it requires lots of
configuration including its own YAML file, hooks into the Puppetmaster
and can [take a while to deploy](http://garylarizza.com/blog/2014/02/18/puppet-workflow-part-3/). r10k is still in [active
development](https://github.com/puppetlabs/r10k/releases) and is supported by Puppetlabs, so there's [official
documentation](https://puppet.com/docs/pe/2019.1/r10k.html) in the Puppet documentation.

It is often used in conjunction with librarian for dependency resolution.

One cool feature is that r10k allows you to create dynamic
environments based on branch names. All you need is a single repo with
a Puppetfile and r10k handles the rest. The problem, of course, is
that you need to trust it's going to do the right thing. There's the
security issue, but there's also the problem of resolving dependencies
and you *do* end up double-committing in the end if you use branches
in sub-repositories. But maybe that is unavoidable.

(Note that there are ways of resolving dependencies with external
tools, like [generate-puppetfile](https://github.com/rnelson0/puppet-generate-puppetfile) ([introduction](https://rnelson0.com/2015/11/06/introducing-generate-puppetfile-or-creating-a-ruby-program-to-update-your-puppetfile-and-fixtures-yml/)) or [this hack
that reformats librarian output](https://github.com/dharmabruce/lp2r10k/blob/master/lp2r10k) or [those rake tasks](https://github.com/voxpupuli/ra10ke). There's
also a [go rewrite called g10k](https://github.com/xorpaul/g10k) that is much faster, but with
similar limitations.)

### git subtree

[This article](https://web.archive.org/web/20171107082413/http://somethingsinistral.net/blog/scaling-puppet-environment-deployment/) briefly mentions git subtrees from the point of view
of Puppet management. It outlines how it's cool that the history of
the subtree gets merged as-is in the parent repo, which gives us the
best of both worlds (individual, per-module history along with a
global view in the parent repo). It makes rebasing in subtrees
impossible, however, as it breaks the parent merge. You do end up with
some of the disadvantages of the monorepo, in that all the code is
actually committed in the parent repo and you *do* have to commit
twice as well.

### subrepo
[git-subrepo](https://github.com/ingydotnet/git-subrepo) is "an improvement from `git-submodule` and
`git-subtree`". It is a mix between a monorepo and a submodule system,
with subrepo metadata stored in a `.gitrepo` file. It is somewhat less
well known than the other alternatives, presumably because it's newer.

It is entirely written in `bash`, which I find somewhat scary. It is
[not packaged in Debian yet](http://bugs.debian.org/911397) but might be soon.

It works around the "double-commit issue" by having a special `git
subrepo commit` command that "does the right thing". That, in general,
is its major flaw: it reproduces many git commands like `init`,
`push`, `pull` as subcommands, so you need to remember which command
to run. To quote the (rather terse) manual:

> All the subrepo commands use names of actual Git commands and try to
> do operations that are similar to their Git counterparts. They also
> attempt to give similar output in an attempt to make the subrepo
> usage intuitive to experienced Git users.
>
> Please note that the commands are not exact equivalents, and do not
> take all the same arguments

Still, its feature set is impressive and could be the perfect mix
between the "submodules" and "subtree" approach of still keeping a
monorepo while avoiding the double-commit issue.
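
The basic workflow would look something like this (a sketch; the
repository URL and paths are illustrative):

```sh
# import an external module into a subdirectory, keeping its history
git subrepo clone https://github.com/example/puppet-foo modules/foo

# hack on modules/foo and commit in the parent repo as usual, then
# split those changes out and push them back upstream in one command
git subrepo push modules/foo

# later, merge upstream changes back into the parent repo
git subrepo pull modules/foo
```
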
### myrepos

[myrepos](https://myrepos.branchable.com/) is one of many solutions to manage multiple git
repositories. It has been used in the past at my old workplace
(Koumbit.org) to manage and checkout multiple git repositories.

Like a Puppetfile without locks, it doesn't enforce cryptographic
integrity between the master repositories and the subrepositories: all
it does is define remotes and their locations.

Like r10k it doesn't handle dependencies and will require extra setup,
although it's much lighter than r10k.

Its main disadvantage is that it isn't well known and might seem
esoteric to people. It also has weird failure modes, but could be used
in parallel with a monorepo. For example, it might allow us to setup
specific remotes in subdirectories of the monorepo automatically.
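
For example, a `.mrconfig` file at the root of the monorepo could
define where each subdirectory comes from (a sketch; the path and URL
are illustrative):

```ini
# .mrconfig (sketch): one section per repository to manage
[modules/foo]
checkout = git clone https://github.com/example/puppet-foo foo
```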

### Summary table
| Approach   | Pros                       | Cons                                     | Summary                           |
|------------|----------------------------|------------------------------------------|-----------------------------------|
| Monorepo   | Simple                     | Double-commit                            | Status quo                        |
| Submodules | Well-known                 | Hard to use, double-commit               | Not great                         |
| Librarian  | Dep resolution client-side | Unmaintained, bad integration with git   | Not sufficient on its own         |
| r10k       | Standard                   | Hard to deploy, opinionated              | To evaluate further               |
| Subtree    | "Best of both worlds"      | Still get double-commit, rebase problems | Not sure it's worth it            |
| Subrepo    | Subtree + optional         | Unusual, new commands to learn           | To evaluate further               |
| myrepos    | Flexible                   | Esoteric                                 | Might be useful with our monorepo |

### Best practices survey

I made a survey of the community (mostly the [shared puppet
modules](https://gitlab.com/shared-puppet-modules-group/) and [Voxpupuli](https://voxpupuli.org/) groups) to find out what the best
current practices are.

Koumbit uses foreman/puppet but pinned at version 10.1 because it is
the last one supporting "passenger" (the puppetmaster deployment
method currently available in Debian, deprecated and dropped from
puppet 6). They [patched it](https://redmine.koumbit.net/projects/theforeman-puppet/repository/revisions/5b1b0b42f2d7d7b01eacde6584d3) to support `puppetlabs/apache < 6`.
They push to a bare repo on the puppet master, then they have
validation hooks (the inspiration for our own hook implementation, see
[issue 31226][]), and a hook deploys the code to the right branch.

They were using r10k but stopped because they had issues when r10k
would fail to deploy code atomically, leaving the puppetmaster (and
all nodes!) in an unusable state. This would happen when their git
servers were down without a locally cached copy. They also implemented
branch cleanup on deletion (although that could have been done some
other way). That issue was apparently reported against r10k but never
got a response. They now use puppet-librarian in their custom
hook. Note that it's possible r10k does not actually have that issue
because they found the issue they filed and it was... [against
librarian](https://github.com/voxpupuli/librarian-puppet/issues/73)!

Some people in #voxpupuli seem to use the Puppetlabs Debian packages
and therefore puppetserver, r10k and puppetboards. Their [Monolithic
master](https://voxpupuli.org/docs/monolithic/) architecture uses an external git repository, which pings
the puppetmaster through a [webhook](https://github.com/voxpupuli/puppet_webhook) which deploys a
[control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo)) and calls r10k to deploy the
code. They also use [foreman](https://www.theforeman.org/) as a node classifier. That procedure
uses the following modules:

 * [puppet/puppetserver](https://forge.puppet.com/puppet/puppetserver)
 * [puppetlabs/puppet_agent](https://forge.puppet.com/puppetlabs/puppet_agent)
 * [puppetlabs/puppetdb](https://forge.puppet.com/puppetlabs/puppetdb)
 * [puppetlabs/puppet_metrics_dashboard](https://forge.puppet.com/puppetlabs/puppet_metrics_dashboard)
 * [voxpupuli/puppet_webhook](https://github.com/voxpupuli/puppet_webhook)
 * [r10k](https://github.com/puppetlabs/r10k) or [g10k](https://github.com/xorpaul/g10k)
 * [Foreman](https://www.theforeman.org/)

They also have a [master of masters](https://voxpupuli.org/docs/master_agent/) architecture for scaling to
larger setups. For scaling, I have found [this article](https://puppet.com/blog/scaling-open-source-puppet/) to be more
interesting, that said.

So, in short, it seems people are converging towards r10k with a
web hook. To validate git repositories, they mirror the repositories
to a private git host.
After writing this document, anarcat decided to try a setup with a
"control-repo" and `g10k`, because the latter can cryptographically
verify third-party repositories, either through a git hash or a tarball
checksum. There's still only a single environment (the "create an
environment on a new branch" hook has not been implemented). It also
often means two check-ins when we work on shared modules, but that can
be alleviated, during development, by skipping the cryptographic check
and trusting the transport: the Puppetfile chases a branch name instead
of a checksum. In production, of course, a checksum can then be pinned
again, but that is the biggest flaw in that workflow.

### Other alternatives

 * [josh](https://github.com/josh-project/josh): "Combine the advantages of a monorepo with those of
   multirepo setups by leveraging a blazingly-fast, incremental, and
   reversible implementation of git history filtering."
 * [lerna](https://lerna.js.org/): Node/JS multi-project management
 * [lite](https://github.com/splitsh/lite): git repo splitter
 * [git-subsplit](https://github.com/dflydev/git-subsplit): "Automate
   and simplify the process of managing one-way read-only subtree
   splits"