replace PuppetDB to address security issues (CVE-2021-27021, jetty9)
in #40699 (closed), we found that a recent buster upgrade inexplicably triggered Debian #994843, which seems to have appeared (at least for some users) after the 9.4.16-0+deb10u1 libjetty9-java upgrade, which itself occurred in August 2021.
in #40699 (closed), we downgraded libjetty9 to work around the issue, but that opened up a whole lot of other vulnerabilities.
this entire thing also highlighted how badly puppetdb is maintained in debian. it's affected by at least one major vulnerability (CVE-2021-27021, SQL injection) and is generally lagging significantly behind upstream.
so we might want to consider either updating or removing puppetdb completely from our infrastructure.
with @lavamind we brainstormed an action plan that looks like this:

- test puppet-terminus-redis in a lab, to see if exported resources work correctly
- if that's a go, attempt a migration from puppetdb to redis on pauli
- replace catalog monitoring from checking puppetdb to checking each host individually (@lavamind has a plugin for that), over NRPE

(see blockers below before embarking on this, however)
For monitoring, we may want to also consider puppet-prometheus_reporter to collect catalog runs into prometheus for performance analysis. The Redis implementation doesn't keep old catalogs, which is somewhat of a step back from PuppetDB. But maybe that can be kept inside the Nagios replacement project (#29864 (closed)).
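To make the "check each host individually" idea from the plan concrete, here is a minimal sketch of what such a check could look like. It is not @lavamind's plugin: the function name `check_puppet_run` is made up for illustration, and it assumes the agent's run summary file has a `last_run` timestamp under `time:` and a `failure` count under `events:`, read with naive line scraping rather than a real YAML parser.

```shell
#!/bin/sh
# Hypothetical NRPE-style check: was the last Puppet run recent and clean?
# Assumption: the summary file contains "last_run: <epoch>" (under time:)
# and "failure: <count>" (under events:) as plain "key: value" lines.

# check_puppet_run SUMMARY_FILE MAX_AGE_SECONDS [NOW_EPOCH]
# Prints a Nagios-style status line and returns
# 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
check_puppet_run() {
    summary=$1
    max_age=$2
    now=${3:-$(date +%s)}

    if [ ! -r "$summary" ]; then
        echo "UNKNOWN: cannot read $summary"
        return 3
    fi

    # Naive scraping: take the first line whose key matches.
    last_run=$(awk '$1 == "last_run:" { print $2; exit }' "$summary")
    failures=$(awk '$1 == "failure:" { print $2; exit }' "$summary")

    # A summary without both values is treated as unknown, not as healthy.
    if [ -z "$last_run" ] || [ -z "$failures" ]; then
        echo "UNKNOWN: incomplete run summary in $summary"
        return 3
    fi

    if [ "$failures" -gt 0 ]; then
        echo "CRITICAL: last run had $failures failed events"
        return 2
    fi

    age=$((now - last_run))
    if [ "$age" -gt "$max_age" ]; then
        echo "WARNING: last run was ${age}s ago (limit ${max_age}s)"
        return 1
    fi

    echo "OK: last run was ${age}s ago"
    return 0
}
```

The summary file location can be looked up with `puppet config print lastrunfile`. The function would still need a small wrapper script and an NRPE command definition to be usable from Nagios.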
For the migration, the process discovered in #40422 (closed) may be used here. Basically, to migrate to another exported resource backend, the following should happen, in strict order:
- run puppet everywhere until all nodes have converged
- disable puppet on all nodes (`puppet agent --disable ...`)
- install the new terminus backend
- stop puppetdb
- switch to new backend (e.g. `storeconfigs_backend = redis` and so on, see the terminus README) on disk (`/etc/puppet/puppet.conf` on pauli) and in manifests
- enable/run/disable puppet in dry run on all nodes (`puppet agent --enable ; puppet agent --test --noop ; puppet agent --disable ...`)
- repeat the previous (dry run) step multiple times until you have some level of confidence you have converged (`--noop` should eventually show no change)
- enable/run puppet everywhere in "wet" (not dry or "noop") mode
- remove puppetdb
Be careful: Puppet WILL rewrite puppet.conf to point back at puppetdb if Puppet is run without changing the manifests. puppetdb should be stopped at that point, so this will only fail some manifests, but particular care should be taken to make sure the configuration is correct before puppet is run.
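The dry-run pass of the procedure above could be scripted roughly as follows. This is only a sketch: `run_on` and `noop_pass` are hypothetical helpers, the disable message in `REASON` is a placeholder, and by default the script just prints what it would run. Hosts are passed explicitly, since cumin's inventory depends on puppetdb (see blockers) and puppetdb is stopped at this point.

```shell
#!/bin/sh
# Sketch of the dry-run pass. By default it only prints the commands;
# nothing here touches a real host.

# Placeholder message for "puppet agent --disable"; pick your own.
REASON=${REASON:-"storeconfigs backend migration"}

# run_on HOST COMMAND: hypothetical transport hook. Redefine it to call ssh
# (or whatever reaches your hosts) once the printed plan looks right.
run_on() {
    printf '[%s] %s\n' "$1" "$2"
}

# noop_pass HOST...: enable, run in noop mode, then disable again on each host.
noop_pass() {
    for host in "$@"; do
        run_on "$host" "puppet agent --enable ; puppet agent --test --noop ; puppet agent --disable '$REASON'"
    done
}
```

Calling `noop_pass` with a list of hostnames prints one line per host. The pass would be repeated, reading the noop output by hand, until no host reports pending changes.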
Blockers:
- nagios relies on puppetdb for checks (we can check directly on the hosts with another check)
- cumin relies on puppetdb for host inventories