... | ... | @@ -606,11 +606,49 @@ Revocation procedures problems were discussed in [33587][] and [33446][]. |
|
|
|
|
|
5. Run `puppet agent -t` to get Puppet running on the client again.
|
|
|
|
|
|
## Pager playbook
|
|
|
|
|
|
<!-- information about common errors from the monitoring system and -->
|
|
|
<!-- how to deal with them. this should be easy to follow: think of -->
|
|
|
<!-- your future self, in a stressful situation, tired and hungry. -->
|
|
|
|
|
|
TODO.
|
|
|
|
|
|
## Disaster recovery
|
|
|
|
|
|
<!-- what to do if all goes to hell. e.g. restore from backups? -->
|
|
|
<!-- rebuild from scratch? not necessarily those procedures (e.g. see -->
|
|
|
<!-- "Installation" below) but some pointers. -->
|
|
|
|
|
|
TODO.
|
|
|
|
|
|
# Reference
|
|
|
|
|
|
This documents generally how things are set up.
|
|
|
|
|
|
## Before it all starts
|
|
|
## Installation
|
|
|
|
|
|
TODO. It is not yet clear how the Puppetmaster was set up or how to
|
|
|
build a new one. The interactions with other tools like Nagios and
|
|
|
LDAP especially need to be documented.
|
|
|
|
|
|
## SLA
|
|
|
|
|
|
No formal SLA is defined. Puppet runs on a fairly slow cron job so
|
|
|
it doesn't have to be highly available right now. This could change in
|
|
|
the future if we rely more on it for deployments.
|
|
|
|
|
|
## Design
|
|
|
<!-- how this is built -->
|
|
|
<!-- should reuse and expand on the "proposed solution", it's a -->
|
|
|
<!-- "as-built" document, whereas the "Proposed solution" is an -->
|
|
|
<!-- "architectural" document, which the final result might differ -->
|
|
|
<!-- from, sometimes significantly -->
|
|
|
|
|
|
<!-- a good guide to "audit" an existing project's design: -->
|
|
|
<!-- https://bluesock.org/~willkg/blog/dev/auditing_projects.html -->
|
|
|
|
|
|
### Before it all starts
|
|
|
|
|
|
- `puppet.tpo` is currently being run on `pauli.tpo`
|
|
|
- This is where the tor-puppet git repo lives
|
... | ... | @@ -619,7 +657,7 @@ This documents generally how things are setup. |
|
|
- All paths in this document are relative to the root of this
|
|
|
repository.
|
|
|
|
|
|
## File layout
|
|
|
### File layout
|
|
|
|
|
|
- `3rdparty/modules` includes modules that are shared publicly and do
|
|
|
not contain any TPO-specific configuration. There is a `Puppetfile`
|
... | ... | @@ -672,13 +710,13 @@ pattern. See [ticket #29387][] for an in-depth discussion. |
|
|
[role/profile/module]: https://puppet.com/docs/pe/2017.2/r_n_p_intro.html
|
|
|
[ticket #29387]: https://bugs.torproject.org/29387
|
|
|
|
|
|
## Custom facts
|
|
|
### Custom facts
|
|
|
|
|
|
`modules/torproject_org/lib/facter/software.rb` defines our custom
|
|
|
facts, making it possible to get answers to questions like "Is this
|
|
|
host running apache2?" by simply looking at a Puppet variable.
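
As an illustration of how such a fact can be written, here is a minimal
sketch. It is not the content of `software.rb`: the helper
`software_installed?`, the fact name `apache2_example` and the path
`/usr/sbin/apache2` are assumptions made for the example, and the check
only tests that an executable exists, not that the service is running.

```ruby
# Hypothetical helper (not from software.rb): true if any of the given
# paths points to an executable file.
def software_installed?(paths)
  paths.any? { |path| File.executable?(path) }
end

# Only register the fact when running under Facter, so the helper can
# also be loaded and tested in plain Ruby.
if defined?(Facter)
  # Assumed fact name and binary path, for illustration only.
  Facter.add(:apache2_example) do
    setcode do
      software_installed?(['/usr/sbin/apache2'])
    end
  end
end
```

A manifest could then branch on `$facts['apache2_example']`, assuming
the fact is deployed under that name.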
|
|
|
|
|
|
## Style guide
|
|
|
### Style guide
|
|
|
|
|
|
Puppet manifests should generally follow the [Puppet style
|
|
|
guide][]. This can be easily done with [Flycheck][] in Emacs,
|
... | ... | @@ -698,7 +736,7 @@ Otherwise the style already in use in the file should be followed. |
|
|
[Flycheck]: http://flycheck.org/
|
|
|
[vim-puppet]: https://github.com/rodjek/vim-puppet
|
|
|
|
|
|
## Hiera
|
|
|
### Hiera
|
|
|
|
|
|
[Hiera][] is a "key/value lookup tool for configuration data" which
|
|
|
Puppet uses to look up values for class parameters and node
|
... | ... | @@ -710,7 +748,7 @@ currently use Hiera. |
|
|
|
|
|
[Hiera]: https://puppet.com/docs/hiera/3.2/
|
|
|
|
|
|
### Classes definitions
|
|
|
#### Classes definitions
|
|
|
|
|
|
Each host declares which class it should include through a `classes`
|
|
|
parameter. For example, this is what configures a Prometheus server:
|
... | ... | @@ -730,7 +768,7 @@ those should be ported to shared modules from the Puppet forge, with |
|
|
our glue built into a profile on top of the third-party module. The
|
|
|
role `roles::monitoring` follows that pattern correctly.
|
|
|
|
|
|
### Node configuration
|
|
|
#### Node configuration
|
|
|
|
|
|
On top of the host configuration, some node-specific configuration can
|
|
|
be performed from Hiera. This should be avoided as much as possible,
|
... | ... | @@ -757,7 +795,7 @@ modules. For example, the Bacula director is hardcoded in the `bacula` |
|
|
base class (in `modules/bacula/manifests/init.pp`). That should be
|
|
|
moved into a class parameter, probably in `common.yaml`.
|
|
|
|
|
|
## Cron and scheduling
|
|
|
### Cron and scheduling
|
|
|
|
|
|
The Puppet agent is *not* running as a daemon; it's running through
|
|
|
good old `cron`.
|
... | ... | @@ -768,3 +806,85 @@ hours. |
|
|
|
|
|
This configuration is in `/etc/cron.d/puppet-crontab` and deployed by
|
|
|
Puppet itself, currently as part of the `torproject_org` module.
|
|
|
|
|
|
## Issues
|
|
|
|
|
|
There is no issue tracker specifically for this project. [File][] or
|
|
|
[search][] for issues in the [team issue tracker][search].
|
|
|
|
|
|
[File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
|
|
|
[search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues
|
|
|
|
|
|
## Monitoring and testing
|
|
|
|
|
|
Puppet is hooked into Nagios in two ways:
|
|
|
|
|
|
* one job runs on the Puppetmaster and checks PuppetDB for
|
|
|
  reports. This was done with a [patched](https://github.com/evgeni/check_puppetdb_nodes/pull/14) version of the
|
|
|
[check_puppetdb_nodes](https://github.com/evgeni/check_puppetdb_nodes/_) Nagios check, now packaged inside the
|
|
|
`tor-nagios-checks` Debian package
|
|
|
* another job runs on each Puppet node and will therefore work even
|
|
|
  if the Puppetmaster dies for some reason. This is done with the
|
|
|
[check_puppet_agent](https://github.com/aswen/nagios-plugins/blob/master/check_puppet_agent) Nagios check, now also packaged inside the
|
|
|
`tor-nagios-checks` Debian package
|
|
|
|
|
|
This was [implemented in March 2019](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29676). An alternative implementation
|
|
|
[using Prometheus](https://forge.puppet.com/puppet/prometheus_reporter) was considered but [Prometheus still hasn't
|
|
|
replaced Nagios](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864) at the time of writing.
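
To give an idea of what a per-node check does, here is a rough sketch
of the logic. It is not the code of `check_puppet_agent`: the function
names, the four-hour default threshold and the exact decision order are
assumptions. It relies on the `time.last_run` and `events.failure` keys
that the Puppet agent writes to its last run summary file.

```ruby
require 'yaml'

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Classify a parsed last run summary (a Hash) into a Nagios status.
# max_age is an arbitrary example threshold, in seconds.
def classify_run(summary, now: Time.now.to_i, max_age: 4 * 3600)
  last_run = summary.dig('time', 'last_run')
  return [UNKNOWN, 'no last_run timestamp in summary'] if last_run.nil?

  failures = summary.dig('events', 'failure').to_i
  return [CRITICAL, "#{failures} failed event(s) in last run"] if failures > 0

  age = now - last_run.to_i
  return [WARNING, "last run was #{age}s ago"] if age > max_age

  [OK, "last run #{age}s ago, no failures"]
end

# Hypothetical wrapper: load the summary from disk and classify it.
def check_file(path)
  return [UNKNOWN, "#{path} not found"] unless File.exist?(path)

  classify_run(YAML.load_file(path) || {})
end
```

The path to pass to `check_file` can be found with
`puppet config print lastrunfile`. Because this only reads local state,
it keeps working when the Puppetmaster is unreachable, which is the
point of the second check.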
|
|
|
|
|
|
There are no validation checks and *a priori* no peer review of code:
|
|
|
code is directly pushed to the Puppetmaster without validation. Work
|
|
|
is being done to [implement automated checks](https://gitlab.torproject.org/tpo/tpa/team/-/issues/31226) but that is only
|
|
|
being deployed on some clients for now.
|
|
|
|
|
|
# Discussion
|
|
|
|
|
|
This section goes more in depth into how Puppet is set up, why it was
|
|
|
set up the way it was, and how it could be improved.
|
|
|
|
|
|
## Overview
|
|
|
|
|
|
Our Puppet setup dates back to 2011, according to the git history,
|
|
|
and was probably based off the [Debian System Administrator's Puppet
|
|
|
codebase](https://salsa.debian.org/dsa-team/mirror/dsa-puppet) which dates back to 2009.
|
|
|
|
|
|
## Goals
|
|
|
|
|
|
The general goal of Puppet is to provide basic automation across the
|
|
|
architecture, so that software installation and configuration, file
|
|
|
distribution, and user and some service management are done from a central
|
|
|
location, managed in a git repository. This approach is often called
|
|
|
[Infrastructure as code](https://en.wikipedia.org/wiki/Infrastructure_as_Code).
|
|
|
|
|
|
### Must have
|
|
|
|
|
|
TBD.
|
|
|
|
|
|
### Nice to have
|
|
|
|
|
|
TBD.
|
|
|
|
|
|
### Non-Goals
|
|
|
|
|
|
TBD.
|
|
|
|
|
|
## Approvals required
|
|
|
|
|
|
TPA should approve policy changes as per [tpa-rfc-1](/policy/tpa-rfc-1-policy).
|
|
|
|
|
|
## Proposed Solution
|
|
|
|
|
|
N/A.
|
|
|
|
|
|
## Cost
|
|
|
|
|
|
N/A.
|
|
|
|
|
|
## Alternatives considered
|
|
|
|
|
|
Ansible was considered for managing [GitLab](gitlab) for a while, but
|
|
|
this was eventually abandoned in favor of using Puppet and the
|
|
|
"Omnibus" package.
|
|
|
|
|
|
For ad-hoc jobs, [fabric](fabric) is being used.
|
|
|
|