diff --git a/howto/puppet.md b/howto/puppet.md
index a643010038f526ecdd8c3d1dc20cb86b2e00c506..a958feb52ee5948df9700108c626064aae61d01b 100644
--- a/howto/puppet.md
+++ b/howto/puppet.md
@@ -961,7 +961,180 @@ TPA should approve policy changes as per [tpa-rfc-1](/policy/tpa-rfc-1-policy).
 
 ## Proposed Solution
 
-N/A.
+To improve on the above "Goals", I would suggest the following
+configuration.
+
+TL;DR:
+
+ 1. Use a control repository
+ 2. Get rid of 3rdparty
+ 3. Deploy with g10k
+ 4. Authenticate with checksums
+ 5. Deploy to branch-specific environments
+ 6. Rename the default branch "production"
+ 7. Push directly on the Puppet server
+ 8. Use a role account
+ 9. Use local test environments
+ 10. Develop a test suite
+ 11. Hook into CI
+ 12. OpenPGP verification and web hook
+
+Steps 1 to 8 could be implemented without too much difficulty and
+should be a mid-term objective. Steps 9 to 12 require significantly
+more work and could be implemented once the new infrastructure
+stabilizes.
+
+What follows is an explanation and justification of each step.
+
+### Use a control repository
+
+The base of the infrastructure is a [control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo))
+which chain-loads all the other modules. This implies turning all our
+"modules" into "profiles" and moving "real" modules (which are fit for
+public consumption) "outside", into public repositories (see also
+[issue 29387: publish our puppet repository](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387)).
+
+Note that the control repository *could* also be public: we could
+simply keep the private data inside Hiera or some other private
+repository.
+
+The control repository concept comes from the proprietary version of
+Puppet (Puppet Enterprise or PE) but its logic should be usable with
+the open source Puppet release as well.
+
+### Get rid of 3rdparty
+
+The control repo's core configuration file is the `Puppetfile`. We
+already use a Puppetfile, but only to manage modules inside the
+`3rdparty` directory. Now it would manage *all* modules, or, more
+specifically, `3rdparty` would become the default `modules` directory,
+which would, incidentally, encourage us to upstream our modules and
+publish them to the world.
+
+Our current `modules` directory would move into `site-modules`, which
+is the designated location for "roles, profiles, and custom
+modules". This has been suggested before in [issue 29387: publish our
+puppet repository](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387) and is important for the `Puppetfile` to do its
+job.
+
+### Deploy with g10k
+
+It seems clear that everyone is converging on the use of a
+`Puppetfile` to deploy code. There are still monorepos out there, but
+they make our life harder, especially when we need to operate on
+non-custom modules.
+
+Instead, we should converge towards *not* following upstream modules
+in our git repository. Modules managed by the `Puppetfile` would *not*
+be managed in our git monorepo and, instead, would be deployed by
+`r10k` (or `g10k`, see below).
+
+### Authenticate code with checksums
+
+This part is the main problem with moving away from a monorepo. By
+using a monorepo, we can audit the code we push into production. But
+if we offload this to `r10k`, it can download code from wherever the
+`Puppetfile` says, effectively shifting our trust path from OpenSSH
+to HTTPS, the Puppet Forge, git and whatever remote gets added to the
+`Puppetfile`.
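+
+To make this concrete, here is a sketch of what the control
+repository's `Puppetfile` could look like. The pinned version, the
+commit hash and the `puppet-ssh` repository URL are made up for the
+example:
+
+```ruby
+# Puppetfile (sketch): the control repo's list of deployed modules
+forge 'https://forgeapi.puppet.com'
+
+# a Forge module, pinned to a released version
+mod 'puppetlabs/stdlib', '6.6.0'
+
+# a third-party module pulled straight from git, pinned to a commit
+mod 'apt',
+  :git => 'https://github.com/puppetlabs/puppetlabs-apt',
+  :ref => '4ba0d4af9d58eeb6a7e72672a8e4b31e45f4cb2a'  # made-up hash
+
+# hypothetical URL for one of our own modules, once published
+mod 'ssh',
+  :git => 'https://gitlab.torproject.org/tpo/tpa/puppet-ssh',
+  :branch => 'production'
+```
+
+Every one of those `mod` lines is a place where code enters
+production from a remote we do not fully control.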
+
+There is no obvious solution for this right now, surprisingly. Here
+are two possible alternatives:
+
+ 1. [g10k](https://github.com/xorpaul/g10k/) supports a `:sha256sum` parameter to checksum
+    modules, but that only works for Forge modules. Maybe we could
+    pair this with using an explicit `sha1` reference for git
+    repositories, ensuring those are checksummed as well. The
+    downside of that approach is that it leaves checked-out git
+    repositories in a "detached head" state.
+
+ 2. `r10k` has a [pending pull request](https://github.com/puppetlabs/r10k/pull/823) to add a `filter_command`
+    directive which could run after a git checkout has been
+    performed. It could presumably be used to verify OpenPGP
+    signatures on git commits, although this would work only on
+    modules we sign commits on (and therefore not third-party
+    modules).
+
+It seems the best approach would be to use g10k for now, with
+checksums on both git commits and Forge modules.
+
+A validation hook running *before* g10k could validate that all `mod`
+lines have a checksum of some sort.
+
+Note that this approach does *NOT* solve the "double-commit" problem
+identified in the Goals. It is believed that only a monorepo would
+fix that problem, and that approach directly conflicts with the
+"collaboration" requirement; between the two, we chose collaboration.
+
+### Deploy to branch-specific environments
+
+A key feature of r10k (and, of course, g10k) is the ability to deploy
+code to new environments depending on the branch we're working
+on. We would enable that feature to allow testing large changes to
+critical code paths without affecting all servers.
+
+### Rename the default branch "production"
+
+In accordance with Puppet's best practices, the control repository's
+default branch would be called "production" and not "master".
+
+Also: Black Lives Matter.
+
+### Push directly on the Puppet server
+
+Because we are worried about the GitLab attack surface, we could still
+keep pushing to the Puppet server for now. The control repository
+could be mirrored to GitLab using a deploy key. All other repositories
+would be published on GitLab anyway, and there the attack surface
+would not matter because of the checksums in the control repository.
+
+### Use a role account
+
+To avoid permission issues, use a role account (say `git`) to accept
+pushes and enforce git hooks.
+
+### Use local test environments
+
+It should eventually be possible to test changes locally before
+pushing to production. This would involve radically simplifying the
+Puppet server configuration and probably either getting rid of the
+LDAP integration or at least making it optional so that changes can be
+tested without it.
+
+This would involve "puppetizing" the Puppet server configuration so
+that a Puppet server and test agent(s) could be bootstrapped
+automatically. Operators would run "smoke tests" (running Puppet by
+hand and looking at the result) to make sure their code works before
+pushing to production.
+
+### Develop a test suite
+
+The next step is to start working on a test suite for services, at
+least for new deployments, so that code can be tested without running
+things by hand. Plenty of Puppet modules have such a test suite,
+generally using [rspec-puppet](https://rspec-puppet.com/) and [rspec-puppet-facts](https://github.com/mcanevet/rspec-puppet-facts), and we
+already have a few modules in `3rdparty` with such tests. The idea
+would be to have those tests on a per-role or per-profile basis.
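+
+As a sketch of what such a test could look like, here is a minimal
+[rspec-puppet](https://rspec-puppet.com/) spec for a hypothetical `profile::webserver` class,
+using [rspec-puppet-facts](https://github.com/mcanevet/rspec-puppet-facts) to iterate over supported operating
+systems (the profile name and the resources checked are made up):
+
+```ruby
+# spec/classes/profile_webserver_spec.rb
+require 'spec_helper'
+
+describe 'profile::webserver' do
+  on_supported_os.each do |os, facts|
+    context "on #{os}" do
+      let(:facts) { facts }
+
+      # the catalog must compile with all dependencies resolved
+      it { is_expected.to compile.with_all_deps }
+
+      # spot-check a resource the profile is expected to manage
+      it { is_expected.to contain_package('apache2') }
+    end
+  end
+end
+```
+
+Such specs would live next to each role or profile and run under CI
+(see below) or locally, typically with `rake spec`.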
+
+### Hook into continuous integration
+
+Once tests are functional, the last step is to move the control
+repository into GitLab directly and start running CI against the
+Puppet code base. This would probably not happen until GitLab CI is
+deployed, and would require lots of work to get there, but would
+eventually be worth it.
+
+The GitLab CI results would be advisory: an operator would need to
+push to a topic branch there first to confirm the tests pass, but
+would still push directly to the Puppet server for production.
+
+### OpenPGP verification and web hook
+
+To stop pushing directly to the Puppet server, we could implement
+OpenPGP verification on the control repository. If a hook checks that
+commits are signed by a trusted party, it does not matter where the
+code is hosted.
+
+We could use the [webhook](https://github.com/voxpupuli/puppet_webhook) system to have GitLab notify the Puppet
+server to pull code.
 
 ## Cost
 
@@ -1125,3 +1298,52 @@ specific remotes in subdirectories of the monorepo automatically.
 | Subtree | "best of both worlds" | Still get double-commit, rebase problems | Not sure it's worth it |
 | Subrepo | ? | ? | ? |
 | myrepos | Flexible | Esoteric | might be useful with our monorepo |
+
+### Best practices survey
+
+I surveyed the community (mostly the [shared puppet
+modules](https://gitlab.com/shared-puppet-modules-group/) and [Voxpupuli](https://voxpupuli.org/) groups) to find out what the best
+current practices are.
+
+Koumbit uses foreman/puppet, pinned at version 10.1 because it is the
+last one supporting "passenger" (the puppetmaster deployment method
+currently available in Debian, deprecated and dropped from Puppet 6).
+They [patched it](https://redmine.koumbit.net/projects/theforeman-puppet/repository/revisions/5b1b0b42f2d7d7b01eacde6584d3) to support `puppetlabs/apache < 6`. They push to a
+bare repo on the puppet master, then they have validation hooks (the
+inspiration for our #31226), and a hook deploys the code to the right
+branch.
+
+They were using r10k but stopped because they had issues when r10k
+would fail to deploy code atomically, leaving the puppetmaster (and
+all nodes!) in an unusable state. This would happen when their git
+servers were down and no locally cached copy was available. They also
+implemented branch cleanup on deletion (although that could have been
+done some other way). That issue was apparently reported against r10k
+but never got a response. They now use librarian-puppet in their
+custom hook. Note that it's possible r10k does not actually have that
+issue, because they found the issue they filed and it was... [against
+librarian](https://github.com/voxpupuli/librarian-puppet/issues/73)!
+
+Some people in #voxpupuli seem to use the Puppetlabs Debian packages
+and therefore puppetserver, r10k and puppetboard. Their [Monolithic
+master](https://voxpupuli.org/docs/monolithic/) architecture uses an external git repository that pings
+the puppetmaster through a [webhook](https://github.com/voxpupuli/puppet_webhook), which then deploys a
+[control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo)) by calling r10k.
+They also use [foreman](https://www.theforeman.org/) as a node classifier.
that procedure +uses the following modules: + + * [puppet/puppetserver](https://forge.puppet.com/puppet/puppetserver) + * [puppetlabs/puppet_agent](https://forge.puppet.com/puppetlabs/puppet_agent) + * [puppetlabs/puppetdb](https://forge.puppet.com/puppetlabs/puppetdb) + * [puppetlabs/puppet_metrics_dashboard](https://forge.puppet.com/puppetlabs/puppet_metrics_dashboard) + * [voxpupuli/puppet_webhook](https://github.com/voxpupuli/puppet_webhook) + * [r10k](https://github.com/puppetlabs/r10k) or [g10k](https://github.com/xorpaul/g10k) + * [Foreman](https://www.theforeman.org/) + +They also have a [master of masters](https://voxpupuli.org/docs/master_agent/) architecture for scaling to +larger setups. For scaling, I have found [this article](https://puppet.com/blog/scaling-open-source-puppet/) to be more +interesting, that said. + +So, in short, it seems people are converging towards r10k with a +web hook. To validate git repositories, they mirror the repositories +to a private git host.