Verified commit 3a63110d, authored by anarcat

propose improvements to the Puppet deployment

parent d96532cb
......@@ -961,7 +961,180 @@ TPA should approve policy changes as per [tpa-rfc-1](/policy/tpa-rfc-1-policy).
## Proposed Solution
N/A.
To improve on the above "Goals", I would suggest the following
configuration.
TL;DR:
1. Use a control repository
2. Get rid of 3rdparty
3. Deploy with g10k
4. Authenticate with checksums
5. Deploy to branch-specific environments
6. Rename the default branch "production"
7. Push directly on the Puppet server
8. Use a role account
9. Use local test environments
10. Develop a test suite
11. Hook into CI
12. OpenPGP verification and web hook
Steps 1 to 8 could be implemented without too much difficulty and should
be a mid-term objective. Steps 9 to 12 require significantly more work
and could be implemented once the new infrastructure stabilizes.
What follows is an explanation and justification of each step.
### Use a control repository
The base of the infrastructure is a [control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo))
which chain-loads all the other modules. This implies turning all our
"modules" into "profiles" and moving "real" modules (which are fit for
public consumption) "outside", into public repositories (see also
[issue 29387: publish our puppet repository](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387)).
Note that the control repository *could* also be public: we could
simply have the private data inside of Hiera or some other private
repository.
The control repository concept is specific to the proprietary version
of Puppet (Puppet Enterprise or PE) but its logic should be usable
with the open source Puppet release as well.
### Get rid of 3rdparty
The control repo's core configuration file is the `Puppetfile`. We
already use a Puppetfile, but only to manage modules inside of the
`3rdparty` directory. Now it would manage *all* modules, or, more
specifically, `3rdparty` would become the default `modules` directory
which would, incidentally, encourage us to upstream our modules and
publish them to the world.
Our current `modules` directory would move into `site-modules`, which
is the designated location for "roles, profiles, and custom
modules". This has been suggested before in [issue 29387: publish our
puppet repository](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29387) and is important for the `Puppetfile` to do its
job.
### Deploy with g10k
It seems clear that everyone is converging on the use of a
`Puppetfile` to deploy code. There are still monorepos out
there, but they do make our life harder, especially when we need to
operate on non-custom modules.
Instead, we should converge towards *not* following upstream modules
in our git repository. Modules managed by the `Puppetfile` would *not*
be managed in our git monorepo and, instead, would be deployed by
`r10k` or `g10k`.
### Authenticate code with checksums
This part is the main problem with moving away from a monorepo. By
using a monorepo, we can audit the code we push into production. But
if we offload this to `r10k`, it can download code from wherever the
`Puppetfile` says, effectively shifting our trust path from OpenSSH
to HTTPS, the Puppet Forge, git and whatever remote gets added to the
`Puppetfile`.
There is no obvious solution for this right now, surprisingly. Here
are two possible alternatives:
1. [g10k](https://github.com/xorpaul/g10k/) supports using a `:sha256sum` parameter to checksum
modules, but that only works for Forge modules. Maybe we could
pair this with using an explicit `sha1` reference for git
repositories, ensuring those are checksummed as well. The downside
of that approach is that it leaves checked out git repositories in
a "detached head" state.
2. `r10k` has a [pending pull request](https://github.com/puppetlabs/r10k/pull/823) to add a `filter_command`
directive which could run after a git checkout has been
performed. It could presumably be used to verify OpenPGP
signatures on git commits, although this would work only on
modules we sign commits on (and therefore not third party).
It seems the best approach would be to use g10k for now with checksums
on both git commit and forge modules.
A validation hook running *before* g10k COULD validate that all `mod`
lines have a `checksum` of some sort...
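As a rough illustration of such a validation hook, here is a minimal Python sketch that scans a `Puppetfile` and reports `mod` declarations lacking a pin. It assumes Forge modules are pinned with g10k's `:sha256sum` parameter and git modules with a full 40-character `:commit` reference. The function names and the line-joining heuristic are made up for this example, and it is not a real Ruby parser.

```python
import re

# Assumed pin formats: a 64-hex-digit :sha256sum for Forge modules and
# a full 40-hex-digit :commit for git modules.
SHA256_RE = re.compile(r""":sha256sum\s*=>\s*['"][0-9a-f]{64}['"]""")
COMMIT_RE = re.compile(r""":commit\s*=>\s*['"][0-9a-f]{40}['"]""")
GIT_RE = re.compile(r":git\s*=>")


def mod_statements(puppetfile_text):
    """Yield each `mod` declaration as a single string.

    Heuristic: a declaration starts on a line beginning with `mod ` and
    continues for as long as the previous line ends with a comma.
    """
    current = []
    for raw in puppetfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("mod "):
            if current:
                yield " ".join(current)
            current = [line]
        elif current and current[-1].endswith(","):
            current.append(line)
        else:
            # Some other directive (e.g. `forge`): close any open declaration.
            if current:
                yield " ".join(current)
            current = []
    if current:
        yield " ".join(current)


def unpinned_modules(puppetfile_text):
    """Return the `mod` declarations that carry no checksum or commit pin."""
    bad = []
    for stmt in mod_statements(puppetfile_text):
        pin = COMMIT_RE if GIT_RE.search(stmt) else SHA256_RE
        if pin.search(stmt) is None:
            bad.append(stmt)
    return bad
```

A hook could read the `Puppetfile` from the pushed commit, call `unpinned_modules()` on its content, and reject the push if the returned list is not empty.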
Note that this approach does *NOT* solve the "double-commit" problem
identified in the Goals. It is believed that only a "monorepo" would
fix that problem and that approach comes in direct conflict with the
"collaboration" requirement. We chose the latter.
### Deploy to branch-specific environments
A key feature of r10k (and, of course, g10k) is that they are capable
of deploying code to new environments depending on the branch we're
working on. We would enable that feature to allow testing some large
changes to critical code paths without affecting all servers.
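To make the branch-to-environment mapping concrete, here is a small Python sketch of how a branch name could be turned into an environment directory. Puppet environment names are limited to letters, digits and underscores, so other characters are replaced with underscores. The substitution rule is an assumption modeled on r10k's branch name correction, and the base directory and function names are only illustrative; the real path depends on how the Puppet server is packaged.

```python
import re

# Assumed location of the environments; Debian packages may use another path.
DEFAULT_BASEDIR = "/etc/puppetlabs/code/environments"


def environment_name(branch):
    """Map a git branch name to a name usable as a Puppet environment."""
    return re.sub(r"[^A-Za-z0-9_]", "_", branch)


def environment_path(branch, basedir=DEFAULT_BASEDIR):
    """Return the directory a branch would be deployed to."""
    return f"{basedir}/{environment_name(branch)}"
```

With this mapping, a topic branch named `feature/new-dns` would be deployed next to `production` as `feature_new_dns`, and a test node could be pointed at it with `puppet agent --test --environment feature_new_dns`.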
### Rename the default branch "production"
In accordance with Puppet's best practices, the control repository's
default branch would be called "production" and not "master".
Also: Black Lives Matter.
### Push directly on the Puppet server
Because we are worried about the GitLab attack surface, we could still
keep on pushing to the Puppet server for now. The control repository
could be mirrored to GitLab using a deploy key. All other repositories
would be published on GitLab anyways, and there the attack surface
would not matter because of the checksums in the control repository.
### Use a role account
To avoid permission issues, use a role account (say `git`) to accept
pushes and enforce git hooks.
### Use local test environments
It should eventually be possible to test changes locally before
pushing to production. This would involve radically simplifying the
Puppet server configuration and probably either getting rid of the
LDAP integration or at least making it optional so that changes can be
tested without it.
This would involve "puppetizing" the Puppet server configuration so
that a Puppet server and test agent(s) could be bootstrapped
automatically. Operators would run "smoke tests" (running Puppet by
hand and looking at the result) to make sure their code works before
pushing to production.
### Develop a test suite
The next step is to start working on a test suite for services, at
least for new deployments, so that code can be tested without running
things by hand. Plenty of Puppet modules have such a test suite,
generally using [rspec-puppet](https://rspec-puppet.com/) and [rspec-puppet-facts](https://github.com/mcanevet/rspec-puppet-facts), and we
already have a few modules in `3rdparty` that have such tests. The
idea would be to have those tests on a per-role or per-profile basis.
### Hook into continuous integration
Once tests are functional, the last step is to move the control
repository into GitLab directly and start running CI against the
Puppet code base. This would probably not happen until GitLab CI is
deployed, and would require lots of work to get there, but would
eventually be worth it.
The GitLab CI results would only be indicative: an operator would need to push to a
topic branch there first to confirm tests pass but would still push
directly to the Puppet server for production.
### OpenPGP verification and web hook
To stop pushing directly to the Puppet server, we could implement
OpenPGP verification on the control repository. If a hook checks that
commits are signed by a trusted party, it does not matter where the
code is hosted.
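A hook of that kind could be sketched as follows in Python, using `git rev-list` to enumerate the pushed commits and `git verify-commit` to check each signature. This is only a sketch under assumptions: it presumes the hook runs with a keyring (for example a dedicated `GNUPGHOME`) that contains only the trusted keys, since `git verify-commit` by itself only reports whether a signature verifies. The function names and the injectable `run` parameter are made up for this example.

```python
import subprocess

ZERO = "0" * 40  # git passes this as the old revision when a ref is created


def new_commits(old, new, run=subprocess.run):
    """List the commits a push from `old` to `new` would introduce."""
    if old == ZERO:
        # New branch: everything reachable from `new` but not from existing refs.
        cmd = ["git", "rev-list", new, "--not", "--all"]
    else:
        cmd = ["git", "rev-list", f"{old}..{new}"]
    result = run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.split()


def unsigned_commits(commits, run=subprocess.run):
    """Return the commits for which `git verify-commit` fails."""
    bad = []
    for commit in commits:
        result = run(["git", "verify-commit", commit],
                     capture_output=True, text=True)
        if result.returncode != 0:
            bad.append(commit)
    return bad
```

A `pre-receive` hook would read the `old new ref` triplets from standard input, call these two functions for each, and exit with a non-zero status if any commit is returned by `unsigned_commits()`.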
We could use the [webhook](https://github.com/voxpupuli/puppet_webhook) system to have GitLab notify the Puppet
server to pull code.
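To illustrate what the receiving end has to decide, here is a minimal Python sketch of the two checks a receiver could make before deploying: that the request carries the shared secret GitLab sends in the `X-Gitlab-Token` header, and which branch was pushed. This is not how `puppet_webhook` is implemented; the function names are hypothetical, and a real receiver would also need an HTTP server and a way to trigger the deployment.

```python
import hmac
import json


def is_authorized(headers, secret):
    """Compare the GitLab secret token to ours in constant time.

    `headers` is assumed to be a plain dict with the exact header name;
    real HTTP header lookups are case-insensitive.
    """
    token = headers.get("X-Gitlab-Token", "")
    return hmac.compare_digest(token.encode(), secret.encode())


def pushed_branch(body):
    """Return the branch name of a push event, or None for anything else."""
    event = json.loads(body)
    ref = event.get("ref", "")
    prefix = "refs/heads/"
    if event.get("object_kind") != "push" or not ref.startswith(prefix):
        return None
    return ref[len(prefix):]
```

The receiver would deploy only when `is_authorized()` is true and `pushed_branch()` returns a branch name, and would still rely on the OpenPGP check above rather than on the token alone.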
## Cost
......@@ -1125,3 +1298,52 @@ specific remotes in subdirectories of the monorepo automatically.
| Subtree | "best of both worlds" | Still get double-commit, rebase problems | Not sure it's worth it |
| Subrepo | ? | ? | ? |
| myrepos | Flexible | Esoteric | might be useful with our monorepo |
### Best practices survey
I made a survey of the community (mostly the [shared puppet
modules](https://gitlab.com/shared-puppet-modules-group/) and [Voxpupuli](https://voxpupuli.org/) groups) to find out what the best
current practices are.
Koumbit uses foreman/puppet but pinned at version 10.1 because it is
the last one supporting "passenger" (the puppetmaster deployment
method currently available in Debian, deprecated and dropped from
puppet 6). They [patched it](https://redmine.koumbit.net/projects/theforeman-puppet/repository/revisions/5b1b0b42f2d7d7b01eacde6584d3) to support `puppetlabs/apache < 6`.
They push to a bare repo on the puppet master, then they have
validation hooks (the inspiration for our #31226), and a hook deploys
the code to the right branch.
They were using r10k but stopped because they had issues when r10k
would fail to deploy code atomically, leaving the puppetmaster (and
all nodes!) in an unusable state. This would happen when their git
servers were down without a locally cached copy. They also implemented
branch cleanup on deletion (although that could have been done some
other way). That issue was apparently reported against r10k but never
got a response. They now use librarian-puppet in their custom
hook. Note that it's possible r10k does not actually have that issue
because they found the issue they filed and it was... [against
librarian](https://github.com/voxpupuli/librarian-puppet/issues/73)!
Some people in #voxpupuli seem to use the Puppetlabs Debian packages
and therefore puppetserver, r10k and puppetboards. Their [Monolithic
master](https://voxpupuli.org/docs/monolithic/) architecture uses an external git repository, which pings
the puppetmaster through a [webhook](https://github.com/voxpupuli/puppet_webhook) which deploys a
[control-repo](https://puppet.com/docs/pe/latest/control_repo.html) ([example](https://github.com/puppetlabs/control-repo)) and calls r10k to deploy the
code. They also use [foreman](https://www.theforeman.org/) as a node classifier. That procedure
uses the following modules:
* [puppet/puppetserver](https://forge.puppet.com/puppet/puppetserver)
* [puppetlabs/puppet_agent](https://forge.puppet.com/puppetlabs/puppet_agent)
* [puppetlabs/puppetdb](https://forge.puppet.com/puppetlabs/puppetdb)
* [puppetlabs/puppet_metrics_dashboard](https://forge.puppet.com/puppetlabs/puppet_metrics_dashboard)
* [voxpupuli/puppet_webhook](https://github.com/voxpupuli/puppet_webhook)
* [r10k](https://github.com/puppetlabs/r10k) or [g10k](https://github.com/xorpaul/g10k)
* [Foreman](https://www.theforeman.org/)
They also have a [master of masters](https://voxpupuli.org/docs/master_agent/) architecture for scaling to
larger setups. That said, for scaling, I found [this article](https://puppet.com/blog/scaling-open-source-puppet/) more
interesting.
So, in short, it seems people are converging towards r10k with a
web hook. To validate git repositories, they mirror the repositories
to a private git host.