propose improvements to the Puppet deployment

3a63110d · anarcat · d96532cb · 3a63110d
Verified Commit 3a63110d authored 4 years ago by anarcat
--- a/howto/puppet.md
+++ b/howto/puppet.md
@@ -961,7 +961,180 @@ TPA should approve policy changes as per [tpa-rfc-1](/policy/tpa-rfc-1-policy).

 ## Proposed Solution

-N/A.
+To improve on the above "Goals", I would suggest the following
+configuration.
+
+TL;DR:
+
+ 1. Use a control repository
+ 2. Get rid of 3rdparty
+ 3. Deploy with g10k
+ 4. Authenticate with checksums
+ 5. Deploy to branch-specific environments
+ 6. Rename the default branch "production"
+ 7. Push directly on the Puppet server
+ 8. Use a role account
+ 9. Use local test environments
+ 10. Develop a test suite
+ 11. Hook into CI
+ 12. OpenPGP verification and web hook
+
+Steps 1 to 8 could be implemented without too much difficulty and should
+be a mid-term objective. Steps 9 to 12 require significantly more work
+and could be implemented once the new infrastructure stabilizes.
+
+What follows is an explanation and justification of each step.
+
+### Use a control repository
+
+The base of the infrastructure is a [control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo))
+which chain-loads all the other modules. This implies turning all our
+"modules" into "profiles" and moving "real" modules (which are fit for
+public consumption) "outside", into public repositories (see also
+[issue 29387: publish our puppet repository](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387)).
+
+Note that the control repository *could* also be public: we could
+simply have the private data inside of Hiera or some other private
+repository.
+
+The control repository concept is specific to the proprietary version
+of Puppet (Puppet Enterprise or PE) but its logic should be usable
+with the open source Puppet release as well.
+
+### Get rid of 3rdparty
+
+The control repo's core configuration file is the `Puppetfile`. We
+already use a Puppetfile, but only to manage modules inside of the
+`3rdparty` directory. Now it would manage *all* modules, or, more
+specifically, `3rdparty` would become the default `modules` directory
+which would, incidentally, encourage us to upstream our modules and
+publish them to the world.
+
+Our current `modules` directory would move into `site-modules`, which
+is the designated location for "roles, profiles, and custom
+modules". This has been suggested before in [issue 29387: publish our
+puppet repository](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387) and is important for the `Puppetfile` to do its
+job.
+
+### Deploy with g10k
+
+It seems clear that everyone is converging on the use of a
+`Puppetfile` to deploy code. There are still monorepos out
+there, but they do make our life harder, especially when we need to
+operate on non-custom modules.
+
+Instead, we should converge towards *not* following upstream modules
+in our git repository. Modules managed by the `Puppetfile` would *not*
+be managed in our git monorepo and, instead, would be deployed by
+`r10k` or `g10k` (the next section explains why `g10k` is preferred).
+
+### Authenticate code with checksums
+
+This part is the main problem with moving away from a monorepo. By
+using a monorepo, we can audit the code we push into production. But
+if we offload this to `r10k`, it can download code from wherever the
+`Puppetfile` says, effectively shifting our trust path from OpenSSH
+to HTTPS, the Puppet Forge, git and whatever remote gets added to the
+`Puppetfile`.
+
+There is no obvious solution for this right now, surprisingly. Here
+are two possible alternatives:
+
+ 1. [g10k](https://github.com/xorpaul/g10k/) supports using a `:sha256sum` parameter to checksum
+    modules, but that only works for Forge modules. Maybe we could
+    pair this with an explicit `sha1` reference for git
+    repositories, ensuring those are checksummed as well. The downside
+    of that approach is that it leaves checked out git repositories in
+    a "detached head" state.
+
+ 2. `r10k` has a [pending pull request](https://github.com/puppetlabs/r10k/pull/823) to add a `filter_command`
+    directive which could run after a git checkout has been
+    performed. It could presumably be used to verify OpenPGP
+    signatures on git commits, although this would work only on
+    modules whose commits we sign (and therefore not on third-party ones).
+
+It seems the best approach would be to use g10k for now, with checksums
+on both git commits and Forge modules.
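+
+As an illustration only, a `Puppetfile` pinned this way might look
+like the sketch below. The module names, versions, and the git URL are
+hypothetical, and the `<...>` placeholders stand for real digests. I
+am assuming here that git modules are pinned with the usual `:commit`
+parameter.
+
+```ruby
+# Forge module: g10k checks the downloaded release against :sha256sum
+mod 'puppetlabs/stdlib', '6.3.0',
+  :sha256sum => '<sha256 digest of the release tarball>'
+
+# git module: pinning a full commit hash stands in for a checksum,
+# at the cost of a "detached head" checkout
+mod 'example',
+  :git    => 'https://git.example.com/puppet-example.git',
+  :commit => '<full 40-character sha1 of the audited commit>'
+```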
+
+A validation hook running *before* g10k COULD validate that all `mod`
+lines have a `checksum` of some sort...
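+
+As a rough sketch of such a hook, assuming Forge modules are pinned
+with `:sha256sum` and git modules with `:commit`, something like the
+following Ruby script could list the `mod` declarations that lack a
+pin. The function names and the `PIN` pattern are made up for
+illustration, and the parsing is deliberately naive.
+
+```ruby
+# Matches a :sha256sum or :commit parameter whose value is a hex string.
+PIN = /(?::sha256sum\s*=>|sha256sum:|:commit\s*=>|commit:)\s*['"]\h+['"]/
+
+# Split Puppetfile text into one string per `mod` declaration, joining
+# continuation lines (lines following one that ends with a comma).
+def mod_declarations(text)
+  decls = []
+  text.each_line do |line|
+    stripped = line.strip
+    next if stripped.empty? || stripped.start_with?('#')
+    if stripped.start_with?('mod ')
+      decls << stripped.dup
+    elsif !decls.empty? && decls.last.end_with?(',')
+      decls.last << ' ' << stripped
+    end
+  end
+  decls
+end
+
+# Names of the modules declared without a checksum or commit pin.
+def unpinned_modules(text)
+  mod_declarations(text)
+    .reject { |decl| decl.match?(PIN) }
+    .map { |decl| decl[/\Amod\s+['"]([^'"]+)['"]/, 1] }
+end
+
+# Only runs when a Puppetfile path is given on the command line.
+if __FILE__ == $PROGRAM_NAME && ARGV[0]
+  missing = unpinned_modules(File.read(ARGV[0]))
+  missing.each { |name| warn "no checksum or commit pin: #{name}" }
+  exit(missing.empty? ? 0 : 1)
+end
+```
+
+Called with the path to a `Puppetfile`, the script exits non-zero when
+a module lacks a pin, so it could gate the call to g10k.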
+
+Note that this approach does *NOT* solve the "double-commit" problem
+identified in the Goals. It is believed that only a "monorepo" would
+fix that problem and that approach comes in direct conflict with the
+"collaboration" requirement. We chose to favor collaboration.
+
+### Deploy to branch-specific environments
+
+A key feature of r10k (and, of course, g10k) is that they are capable
+of deploying code to new environments depending on the branch we're
+working on. We would enable that feature to allow testing some large
+changes to critical code paths without affecting all servers.
+
+### Rename the default branch "production"
+
+In accordance with Puppet's best practices, the control repository's
+default branch would be called "production" and not "master".
+
+Also: Black Lives Matter.
+
+### Push directly on the Puppet server
+
+Because we are worried about the GitLab attack surface, we could still
+keep on pushing to the Puppet server for now. The control repository
+could be mirrored to GitLab using a deploy key. All other repositories
+would be published on GitLab anyway, and there the attack surface
+would not matter because of the checksums in the control repository.
+
+### Use a role account
+
+To avoid permission issues, use a role account (say `git`) to accept
+pushes and enforce git hooks.
+
+### Use local test environments
+
+It should eventually be possible to test changes locally before
+pushing to production. This would involve radically simplifying the
+Puppet server configuration and probably either getting rid of the
+LDAP integration or at least making it optional so that changes can be
+tested without it.
+
+This would involve "puppetizing" the Puppet server configuration so
+that a Puppet server and test agent(s) could be bootstrapped
+automatically. Operators would run "smoke tests" (running Puppet by
+hand and looking at the result) to make sure their code works before
+pushing to production.
+
+### Develop a test suite
+
+The next step is to start working on a test suite for services, at
+least for new deployments, so that code can be tested without running
+things by hand. Plenty of Puppet modules have such a test suite,
+generally using [rspec-puppet](https://rspec-puppet.com/) and [rspec-puppet-facts](https://github.com/mcanevet/rspec-puppet-facts), and we
+already have a few modules in `3rdparty` that have such tests. The
+idea would be to have those tests on a per-role or per-profile basis.
+
+### Hook into continuous integration
+
+Once tests are functional, the last step is to move the control
+repository into GitLab directly and start running CI against the
+Puppet code base. This would probably not happen until GitLab CI is
+deployed, and would require lots of work to get there, but would
+eventually be worth it.
+
+The GitLab CI would be indicative: an operator would need to push to a
+topic branch there first to confirm tests pass but would still push
+directly to the Puppet server for production.
+
+### OpenPGP verification and web hook
+
+To stop pushing directly to the Puppet server, we could implement
+OpenPGP verification on the control repository. If a hook checks that
+commits are signed by a trusted party, it does not matter where the
+code is hosted.
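+
+A minimal sketch of such a hook, in Ruby, could look like the
+following. It assumes a server-side `pre-receive` hook and relies on
+`git verify-commit`, which only tells us a commit carries a good
+signature from a key in the keyring: the keyring would therefore have
+to contain only trusted keys. All function names are hypothetical.
+
+```ruby
+require 'open3'
+
+# An all-zero object name marks a ref being created or deleted.
+ZERO = /\A0+\z/
+
+# Commits introduced by a ref update, oldest first.
+def pushed_revs(old, new)
+  range = old.match?(ZERO) ? [new, '--not', '--all'] : ["#{old}..#{new}"]
+  out, status = Open3.capture2('git', 'rev-list', '--reverse', *range)
+  raise 'git rev-list failed' unless status.success?
+  out.split
+end
+
+# True when `git verify-commit` accepts the signature on rev.
+def signed?(rev)
+  system('git', 'verify-commit', rev, out: File::NULL, err: File::NULL) == true
+end
+
+# Commits from revs that fail verification.
+def unsigned_commits(revs, verify: method(:signed?))
+  revs.reject { |rev| verify.call(rev) }
+end
+
+# Reads "<old> <new> <ref>" lines as git feeds them to pre-receive
+# and returns true only if every pushed commit verifies.
+def check_push(input)
+  bad = input.each_line.flat_map do |line|
+    old, new, _ref = line.split
+    next [] if new.nil? || new.match?(ZERO) # blank line or ref deletion
+    unsigned_commits(pushed_revs(old, new))
+  end
+  bad.each { |rev| warn "rejected: #{rev} has no valid signature" }
+  bad.empty?
+end
+```
+
+The hook itself would then end with something like
+`exit(check_push($stdin) ? 0 : 1)`.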
+
+We could use the [webhook](https://github.com/voxpupuli/puppet_webhook) system to have GitLab notify the Puppet
+server to pull code.

 ## Cost

@@ -1125,3 +1298,52 @@ specific remotes in subdirectories of the monorepo automatically.
 | Subtree    | "best of both worlds"      | Still get double-commit, rebase problems | Not sure it's worth it |
 | Subrepo    | ? | ? | ? |
 | myrepos    | Flexible                   | Esoteric                                 | might be useful with our monorepo |
+
+### Best practices survey
+
+I made a survey of the community (mostly the [shared puppet
+modules](https://gitlab.com/shared-puppet-modules-group/) and [Voxpupuli](https://voxpupuli.org/) groups) to find out what the best
+current practices are.
+
+Koumbit uses foreman/puppet but pinned at version 10.1 because it is
+the last one supporting "passenger" (the puppetmaster deployment
+method currently available in Debian, deprecated and dropped from
+puppet 6). They [patched it](https://redmine.koumbit.net/projects/theforeman-puppet/repository/revisions/5b1b0b42f2d7d7b01eacde6584d3) to support `puppetlabs/apache < 6`.
+They push to a bare repo on the puppet master, then they have
+validation hooks (the inspiration for our #31226), and a hook deploys
+the code to the right branch.
+
+They were using r10k but stopped because they had issues when r10k
+would fail to deploy code atomically, leaving the puppetmaster (and
+all nodes!) in an unusable state. This would happen when their git
+servers were down without a locally cached copy. They also implemented
+branch cleanup on deletion (although that could have been done some
+other way). That issue was apparently reported against r10k but never
+got a response. They now use librarian-puppet in their custom
+hook. Note that it's possible r10k does not actually have that issue
+because they found the issue they filed and it was... [against
+librarian](https://github.com/voxpupuli/librarian-puppet/issues/73)!
+
+Some people in #voxpupuli seem to use the Puppetlabs Debian packages
+and therefore puppetserver, r10k and Puppetboard. Their [Monolithic
+master](https://voxpupuli.org/docs/monolithic/) architecture uses an external git repository, which pings
+the puppetmaster through a [webhook](https://github.com/voxpupuli/puppet_webhook) which deploys a
+[control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo)) and calls r10k to deploy the
+code. They also use [foreman](https://www.theforeman.org/) as a node classifier. That procedure
+uses the following modules:
+
+ * [puppet/puppetserver](https://forge.puppet.com/puppet/puppetserver)
+ * [puppetlabs/puppet_agent](https://forge.puppet.com/puppetlabs/puppet_agent)
+ * [puppetlabs/puppetdb](https://forge.puppet.com/puppetlabs/puppetdb)
+ * [puppetlabs/puppet_metrics_dashboard](https://forge.puppet.com/puppetlabs/puppet_metrics_dashboard)
+ * [voxpupuli/puppet_webhook](https://github.com/voxpupuli/puppet_webhook)
+ * [r10k](https://github.com/puppetlabs/r10k) or [g10k](https://github.com/xorpaul/g10k)
+ * [Foreman](https://www.theforeman.org/)
+
+They also have a [master of masters](https://voxpupuli.org/docs/master_agent/) architecture for scaling to
+larger setups. That said, I have found [this article](https://puppet.com/blog/scaling-open-source-puppet/) to be more
+interesting on the topic of scaling.
+
+So, in short, it seems people are converging towards r10k with a
+web hook. To validate git repositories, they mirror the repositories
+to a private git host.