
TPA uses Puppet to manage all servers it operates. It handles most of the configuration management of the base operating system and some services. It is not designed to handle ad hoc tasks, for which we favor the use of fabric.

Tutorial

This page is long! This first section hopes to get you running with a simple task quickly.

Adding an IP address to the global allow list

In this tutorial, we will add an IP address to the global allow list, on all firewalls on all machines. This is a big deal! It will allow that IP address to access the SSH servers on all boxes and more. This should be a static IP address on a trusted network.

If you have never used Puppet before or are nervous at all about making such a change, it is a good idea to have a more experienced sysadmin nearby to help you. They can also confirm this tutorial is what is actually needed.

  1. To make any change on the Puppet server, you will first need to clone the git repository:

    git clone pauli.torproject.org:/srv/puppet.torproject.org/git/tor-puppet

    This needs to be done only once.

  2. The firewall rules are defined in the ferm module, which lives in modules/ferm. The file you specifically need to change is modules/ferm/templates/defs.conf.erb, so open that in your editor of choice:

    $EDITOR modules/ferm/templates/defs.conf.erb
  3. The code you are looking for is ADMIN_IPS. Add a @def for your IP address and add the new macro to the ADMIN_IPS macro. When you exit your editor, git should show you a diff that looks something like this:

    --- a/modules/ferm/templates/defs.conf.erb
    +++ b/modules/ferm/templates/defs.conf.erb
    @@ -77,7 +77,10 @@ def $TPO_NET = (<%= networks.join(' ') %>);
     @def $linus   = ();
     @def $linus   = ($linus 193.10.5.2/32); # kcmp@adbc
     @def $linus   = ($linus 2001:6b0:8::2/128); # kcmp@adbc
    -@def $ADMIN_IPS = ($weasel $linus);
    +@def $anarcat = ();
    +@def $anarcat = ($anarcat 203.0.113.1/32); # home IP
    +@def $anarcat = ($anarcat 2001:DB8::DEAD/128 2001:DB8:F00F::/56); # home IPv6
    +@def $ADMIN_IPS = ($weasel $linus $anarcat);
    
    
     @def $BASE_SSH_ALLOWED = ();
  4. Then you can commit this and push:

    git commit -m'add my home address to the allow list' && git push
  5. Then you should log in to one of the hosts and make sure the code applies correctly:

    ssh -tt perdulce.torproject.org sudo puppet agent -t

Puppet shows colorful messages. If nothing is red and it returns correctly, you are done. If something fails, go back to step 2. If you are still stuck, ask a colleague on the Tor sysadmin team for help.

If this works, congratulations, you have made your first change across the entire Puppet infrastructure! You might want to look at the rest of the documentation to learn more about how to do different tasks and how things are set up. A key "How to" we recommend is the Progressive deployment section below, which will teach you how to make a change like the above while making sure you don't break anything, even if it affects a lot of machines.

How-to

Modifying an existing configuration

For new deployments, this is NOT the preferred method. For example, if you are deploying new software that is not already in use in our infrastructure, do not follow this guide and instead follow the Adding a new module guide below.

If you are touching an existing configuration, however, things are much simpler: you simply go to the module where the code already exists and make changes. You git commit and git push the code, then immediately run puppet agent -t on the affected node.

Look at the File layout section below to find the right piece of code to modify. If you are making changes that potentially affect more than one host, you should also definitely look at the Progressive deployment section below.

Adding a new module

This is a broad topic, but let's take the Prometheus monitoring system as an example, which followed the role/profile/module pattern.

First, the Prometheus modules on the Puppet Forge were evaluated for quality and popularity. There was a clear winner: the Prometheus module from Vox Pupuli had hundreds of thousands more downloads than the next option, which was deprecated.

Next, the module was added to the Puppetfile (in 3rdparty/Puppetfile):

mod 'puppet-prometheus', '6.4.0'

... and Librarian was run:

librarian-puppet install

This fetched a lot of code from the Puppet forge: the stdlib, archive and system modules were all installed or updated. All those modules were audited manually, by reading each file and looking for obvious security flaws or back doors. Then the code was committed into git:

git add 3rdparty
git commit -m'install prometheus module after audit'

Then the module was configured in a profile, in modules/profile/manifests/prometheus/server.pp:

class profile::prometheus::server {
  class { 'prometheus::server':
    # follow prom2 defaults
    localstorage      => '/var/lib/prometheus/metrics2',
    storage_retention => '15d',
  }
}

The above contains our local configuration for the upstream prometheus::server class installed in the 3rdparty directory. In particular, it sets a retention period and a different path for the metrics, so that they follow the new Prometheus 2.x defaults.

Then this profile was added to a role, in modules/roles/manifests/monitoring.pp:

# the monitoring server
class roles::monitoring {
  include profile::prometheus::server
}

Notice how the role does not refer to any implementation detail, like the fact that the monitoring server runs Prometheus. It looks like a trivial, even useless, class, but it can grow to include multiple profiles.
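
For illustration, here is a hypothetical sketch (not actual TPA code) of how such a role could grow while still hiding implementation details behind profiles:

# hypothetical: a monitoring role grown to include several profiles
class roles::monitoring {
  include profile::prometheus::server
  include profile::prometheus::alertmanager  # hypothetical profile
  include profile::grafana                   # hypothetical profile
}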

Then that role is added to the Hiera configuration of the monitoring server, in hiera/nodes/hetzner-nbg1-01.torproject.org.yaml:

classes:
  - roles::monitoring

And Puppet was run on the host, with:

puppet agent --enable ; puppet agent -t --noop ; puppet agent --disable "testing prometheus deployment"

This led to some problems as the upstream module doesn't support installing from Debian packages. Support for Debian was added to the code in 3rdparty/modules/prometheus, and committed into git:

emacs 3rdparty/modules/prometheus/manifests/*.pp # magic happens
git commit -m'implement all the missing stuff' 3rdparty
git push

And the above Puppet command line was run again, continuing that loop until things were good.

If you need to deploy the code to multiple hosts, see the Progressive deployment section below. To contribute changes back upstream (and you should do so), see the section right below.

Contributing changes back upstream

For simple changes, the above workflow works well, but eventually it is preferable to actually fork the upstream repository and operate on our fork until the changes are merged upstream.

First, the modified module is moved out of the way:

mv 3rdparty/modules/prometheus{,.orig}

The module is then forked on GitHub or wherever it is hosted, and then added to the Puppetfile:

mod 'puppet-prometheus',
    :git => 'https://github.com/anarcat/puppet-prometheus.git',
    :branch => 'deploy'

Then Librarian is run again to fetch that code:

librarian-puppet install

Because Librarian is a little dumb, it might check out your module in "detached HEAD" mode, in which case you will want to fix the checkout:

cd 3rdparty/modules/prometheus
git checkout deploy
git reset --hard origin/deploy
git pull

Note that the deploy branch here is a merge of all the different branches proposed upstream in different pull requests, but it could also be the master branch or a single branch if only a single pull request was sent.
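
One way such a deploy branch can be assembled is by merging the individual pull request branches on top of master (a sketch; the branch names are hypothetical):

git checkout -b deploy master
git merge upstream-pr-123    # hypothetical PR branch
git merge upstream-pr-456    # hypothetical PR branch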

Since you now have a clone of the upstream repository, you can push and pull normally with upstream. When you make a change, however, you need to commit (and push) the change both in the sub-repository and the main repository:

cd 3rdparty/modules/prometheus
$EDITOR manifests/init.pp # more magic stuff
git commit -m'change the frobatz to a argblu'
git push
cd ..
git commit -m'change the frobatz to a argblu'
git push

Often, I make commits directly in our main Puppet repository, without pushing to the third party fork, until I am happy with the code, and then I craft a nice pretty commit that can be pushed upstream, reversing that process:

$EDITOR 3rdparty/prometheus/manifests/init.pp # dirty magic stuff
git commit -m'change the frobatz to a quuxblah'
git push
# see if that works, generally not
git commit -m'rah. wanted a quuxblutz'
git push
# now we are good, update our pull request
cd 3rdparty/modules/prometheus
git commit -m'change the frobatz to a quuxblutz'
git push

It's annoying to double-commit things, but I haven't found a better way to do this just yet. This problem is further discussed in ticket #29387.

Also note that when you update code like this, the Puppetfile does not change, but the Puppetfile.lock file does: its GIT.sha parameter needs to be updated. This can be done by hand, but since that is error-prone, you might want to simply run this to update modules:

librarian-puppet update

This will also update dependencies so make sure you audit those changes before committing and pushing.

Running tests

Ideally, Puppet modules have a test suite. This is done with rspec-puppet and rspec-puppet-facts. This is not very well documented upstream, but it's apparently part of the Puppet Development Kit (PDK). Anyway, assuming tests exist, you will want to run them before pushing your code upstream, or at least upstream might ask you to before accepting your changes. Here's how to get set up:

sudo apt install ruby-rspec-puppet ruby-puppetlabs-spec-helper ruby-bundler
bundle install --path vendor/bundle

This installs some basic libraries system-wide (Ruby bundler and the rspec tooling). Unfortunately, the required Ruby code is rarely all packaged in Debian, so you still need to install extra gems. In this case we install them into the vendor/bundle directory to isolate them from the global search path.

Finally, to run the tests, you need to wrap your invocation with bundle exec, like so:

bundle exec rake test

Validating Puppet code

You SHOULD run validation checks locally, on commit, before pushing your manifests. To install those hooks, clone this repository:

git clone https://github.com/anarcat/puppet-git-hooks

... and deploy it as a pre-commit hook:

ln -s $PWD/puppet-git-hooks/pre-commit tor-puppet/.git/hooks/pre-commit

A server-side validation hook hasn't been enabled yet because our manifests would sometimes fail and the hook was found to be somewhat slow. That is being worked on in issue 31226.

Listing all hosts under puppet

This will list all active hosts known to the Puppet master:

ssh -t pauli.torproject.org 'sudo -u postgres psql puppetdb -P pager=off -A -t -c "SELECT c.certname FROM certnames c WHERE c.deactivated IS NULL"'

The following will list all hosts under Puppet and their virtual value:

ssh -t pauli.torproject.org "sudo -u postgres psql puppetdb -P pager=off -F',' -A -t -c \"SELECT c.certname, value_string FROM factsets fs INNER JOIN facts f ON f.factset_id = fs.id INNER JOIN fact_values fv ON fv.id = f.fact_value_id INNER JOIN fact_paths fp ON fp.id = f.fact_path_id INNER JOIN certnames c ON c.certname = fs.certname WHERE fp.name = 'virtual' AND c.deactivated IS NULL\""  | tee hosts.csv

The resulting file is a Comma-Separated Values (CSV) file which can be used for other purposes later.

Possible values of the virtual field can be obtained with a similar query:

ssh -t pauli.torproject.org "sudo -u postgres psql puppetdb -P pager=off -A -t -c \"SELECT DISTINCT value_string FROM factsets fs INNER JOIN facts f ON f.factset_id = fs.id INNER JOIN fact_values fv ON fv.id = f.fact_value_id INNER JOIN fact_paths fp ON fp.id = f.fact_path_id WHERE fp.name = 'virtual';\""

The currently known values are: kvm, physical, and xenu.

As a bonus, this query will show the number of hosts running each release:

SELECT COUNT(c.certname), value_string FROM factsets fs INNER JOIN facts f ON f.factset_id = fs.id INNER JOIN fact_values fv ON fv.id = f.fact_value_id INNER JOIN fact_paths fp ON fp.id = f.fact_path_id INNER JOIN certnames c ON c.certname = fs.certname WHERE fp.name = 'lsbdistcodename' AND c.deactivated IS NULL GROUP BY value_string;
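
Like the previous queries, it can be run remotely by wrapping it in the same ssh and psql invocation, for example:

ssh -t pauli.torproject.org "sudo -u postgres psql puppetdb -P pager=off -A -t -c \"SELECT COUNT(c.certname), value_string FROM factsets fs INNER JOIN facts f ON f.factset_id = fs.id INNER JOIN fact_values fv ON fv.id = f.fact_value_id INNER JOIN fact_paths fp ON fp.id = f.fact_path_id INNER JOIN certnames c ON c.certname = fs.certname WHERE fp.name = 'lsbdistcodename' AND c.deactivated IS NULL GROUP BY value_string;\""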

Other ways of extracting a host list

  • Using the PuppetDB API:

     curl -s -G http://localhost:8080/pdb/query/v4/facts  | jq -r ".[].certname"

    The fact API is quite extensive and allows for very complex queries. For example, this shows all hosts with the apache2 fact set to true:

     curl -s -G http://localhost:8080/pdb/query/v4/facts --data-urlencode 'query=["and", ["=", "name", "apache2"], ["=", "value", true]]' | jq -r ".[].certname"

    This will list all hosts sorted by their report date, oldest first, followed by the timestamp, space-separated:

     curl -s -G http://localhost:8080/pdb/query/v4/nodes  | jq -r 'sort_by(.report_timestamp) | .[] | "\(.certname) \(.report_timestamp)"' | column -s\  -t

    This will list all hosts with the roles::static_mirror class:

     curl -s -G http://localhost:8080/pdb/query/v4 --data-urlencode 'query=inventory[certname] { resources { type = "Class" and title = "Roles::Static_mirror" }} ' | jq .[].certname

    This will show all hosts running Debian buster:

     curl -s -G http://localhost:8080/pdb/query/v4 --data-urlencode 'query=nodes { facts { name = "lsbdistcodename" and value = "buster" }}' | jq .[].certname
  • Using howto/cumin

  • Using LDAP:

     HOSTS=$(ssh alberti.torproject.org 'ldapsearch -h db.torproject.org -x -ZZ -b dc=torproject,dc=org -LLL "hostname=*.torproject.org" hostname | awk "\$1 == \"hostname:\" {print \$2}" | sort')
     for i in `echo $HOSTS`; do mkdir hosts/x-$i 2>/dev/null || continue; echo $i; ssh $i ' ...'; done

    The mkdir is there so that I can run the same command in many terminal windows and each host gets visited only once.

Batch jobs on all hosts

With that trick, a job can be run on all hosts with parallel-ssh, for example, to check the uptime:

cut -d, -f1 hosts.csv | parallel-ssh -i -h /dev/stdin uptime

This would do the same, but only on physical servers:

grep 'physical$' hosts.csv | cut -d, -f1 | parallel-ssh -i -h /dev/stdin uptime

This would fetch the /etc/motd on all machines:

cut -d, -f1 hosts.csv | parallel-slurp -h /dev/stdin -L motd /etc/motd motd

To run batch commands through sudo that require a password, you will need to fool both sudo and ssh a little more:

cut -d, -f1 hosts.csv | parallel-ssh -P -I -i -x -tt -h /dev/stdin -o pvs sudo pvs

You should then type your password, followed by Control-d. Warning: this will show your password on your terminal and probably in the logs as well.

Batch jobs can also be run on all Puppet hosts with Cumin:

ssh -N -L8080:localhost:8080 pauli.torproject.org &
cumin '*' uptime

See howto/cumin for more examples.

Progressive deployment

If you are making a major change to the infrastructure, you may want to deploy it progressively. A good way to do so is to include the new class manually in the node configuration, say in hiera/nodes/$fqdn.yaml:

classes:
  - my_new_class

Then you can check the effect of the class on the host with the --noop mode. Make sure you disable Puppet so that automatic runs do not actually execute the code, with:

puppet agent --disable "testing my_new_class deployment"

Then the new manifest can be simulated with this command:

puppet agent --enable ; puppet agent -t --noop ; puppet agent --disable "testing my_new_class deployment"

Examine the output and, once you are satisfied, you can re-enable the agent and actually run the manifest with:

puppet agent --enable ; puppet agent -t

If the change is inside an existing class, that change can be enclosed in a class parameter and that parameter can be passed as an argument from Hiera. This is how the transition to a managed /etc/apt/sources.list file was done:

  1. first, a parameter was added to the class that would remove the file, defaulting to false:

    class torproject_org(
      Boolean $manage_sources_list = false,
    ) {
      if $manage_sources_list {
        # the above repositories overlap with most default sources.list
        file {
          '/etc/apt/sources.list':
            ensure => absent,
        }
      }
    }
  2. then that parameter was enabled on one host, say in hiera/nodes/brulloi.torproject.org.yaml:

    torproject_org::manage_sources_list: true
  3. Puppet was run on that host using the simulation mode:

    puppet agent --enable ; puppet agent -t --noop ; puppet agent --disable "testing my_new_class deployment"
  4. when satisfied, the real operation was done:

    puppet agent --enable ; puppet agent -t
  5. then this was added to two other hosts, and Puppet was run there

  6. finally, all hosts were checked to see if the file was still present anywhere and had any content, with howto/cumin (see above for alternative ways of running a command on all hosts):

    cumin '*' 'du /etc/apt/sources.list'
  7. since it was missing everywhere, the parameter was set to true by default and the custom configuration removed from the three test nodes

  8. then Puppet was run by hand everywhere, using Cumin, with a batch of 5 hosts at a time:

    cumin -o txt -b 5 '*' 'puppet agent -t'

    Because Puppet returns a non-zero value when changes are made, the above will stop whenever any host in a batch of 5 actually makes a change. You can then examine the output and either confirm the change is legitimate or abort the configuration change.

Troubleshooting

Running Puppet by hand and logging

When a Puppet manifest is not behaving as it should, the first step is to run it by hand on the host:

puppet agent -t

If that doesn't yield enough information, you can see pretty much everything that Puppet does with the --debug flag. This will, for example, include the onlyif commands of Exec resources and allow you to see why they do not work correctly (a common problem):

puppet agent -t --debug

Finally, some errors show up only on the Puppet server; for those, look in /var/log/daemon.log there.

Finding exported resources with SQL queries

Connecting to the PuppetDB database itself can sometimes be easier than trying to operate the API. There you can inspect the entire thing as a normal SQL database; use this to connect:

sudo -u postgres psql puppetdb

Exported resources can sometimes do surprising things. It is useful to look at the actual PuppetDB to figure out which tags exported resources have. For example, this query lists all exported resources with troodi in the name:

SELECT certname_id,type,title,file,line,tags FROM catalog_resources WHERE exported = 't' AND title LIKE '%troodi%';

Keep in mind that there are automatic tags in exported resources which can complicate things.

Finding exported resources with PuppetDB

This query will look for exported resources with the type Backupninja::Server::Account (which can be a class, define, or builtin resource) and a title (the "name" of the resource as defined in the manifests) of backup-blah@backup.koumbit.net:

curl -s -X POST http://localhost:8080/pdb/query/v4 \
    -H 'Content-Type:application/json' \
    -d '{"query": "resources { type = \"Backupninja::Server::Account\" and title = \"backup-blah@backup.koumbit.net\" }"}' \
    | jq . | less -SR

TODO: update the above query to match resources actually in use at TPO. That example is from koumbit.org folks.

Password management

If you need to set a password in a manifest, there are special functions to handle this. We do not want to store passwords directly in Puppet source code, for various reasons: it is hard to erase because code is stored in git, but also, ultimately, we want to publish that source code publicly.

We now have two mechanisms for this: an HKDF function that derives passwords by hashing a common secret, and Trocla, which generates random passwords and stores the hash or, if necessary, the clear text in a YAML file. The HKDF function is deprecated and should eventually be replaced by Trocla.

hkdf

NOTE: this procedure is DEPRECATED and Trocla should be used instead, see the trocla migration ticket for details.

Old passwords in Puppet are managed through a Key Derivation Function (KDF), more specifically a hash-based KDF that takes a secret stored on the Puppet master (in /etc/puppet/secret), concatenates it with a unique token picked by the caller, and generates a secret unique to that token. An example:

$secret = hkdf('/etc/puppet/secret', "dip-${::hostname}-base-secret")

This generates a unique password for the given token. The password is then used, in clear text, by the Puppet client as appropriate.

The function is an implementation of RFC5869, a SHA256-based HKDF taken from an earlier version of John Downey's Rubygems implementation.

Trocla

Trocla is another password-management solution that takes a different approach. With Trocla, each password is generated on the fly from a secure entropy source (Ruby's SecureRandom module) and stored inside a state file (/var/lib/trocla/trocla_data.yml, configured in /etc/puppet/troclarc.yaml) on the Puppet master.

Trocla can return "hashed" versions of the passwords, so that the plain text password is never visible from the client. The plain text can still be stored on the Puppet master, or it can be deleted once it's been transmitted to the user or another password manager. This makes it possible to have Trocla not keep any secret at all.

This piece of code will generate a bcrypt-hashed password for the Grafana admin, for example:

$grafana_admin_password = trocla('grafana_admin_password', 'bcrypt')

The plain text for that password will never leave the Puppet master. It will still be stored on the Puppet master, and you can see the value with:

trocla get grafana_admin_password plain

... on the command-line.

A password can also be set with this command:

trocla set grafana_guest_password plain

Note that this might erase other formats for this password, although those will get regenerated as needed.

Also note that trocla get will fail if the particular password or format requested does not exist. For example, say you generate a plain-text password and then ask for the bcrypt version:

trocla create test plain
trocla get test bcrypt

This will return the empty string instead of the hashed version. Instead, use trocla create to generate that password. In general, it's safe to use trocla create as it will reuse an existing password. This is actually how the trocla() function behaves in Puppet as well.

Getting information from other nodes

A common pattern in Puppet is to deploy resources on a given host with information from another host. For example, you might want to grant access to host A from host B. And while you can hardcode host B's IP address in host A's manifest, it's not good practice: if host B's IP address changes, you need to change the manifest, and that practice makes it difficult to introduce host C into the pool...

So we need ways of having a node use information from other nodes in our Puppet manifests. There are 5 methods in our Puppet source code at the time of writing:

  • Exported resources
  • PuppetDB lookups
  • Puppet Query Language (PQL)
  • LDAP lookups
  • Hiera lookups

This section walks through how each method works, outlining the advantages and disadvantages of each.

Exported resources

Our Puppet configuration supports exported resources, a key component of complex Puppet deployments. Exported resources allow one host to define a configuration that will be exported to the Puppet server and then realized on another host.

We commonly use this to punch holes in the firewall between nodes. For example, this manifest in the roles::puppetmaster class:

@@ferm::rule::simple { "roles::puppetmaster-${::fqdn}":
    tag         => 'roles::puppetmaster',
    description => 'Allow Puppetmaster access to LDAP',
    port        => ['ldap', 'ldaps'],
    saddr       => $base::public_addresses,
  }

... exports a firewall rule that will, later, allow the Puppet server to access the LDAP server (hence the port => ['ldap', 'ldaps'] line). This rule doesn't take effect on the host applying the roles::puppetmaster class, but only on the LDAP server, through this rather exotic syntax:

Ferm::Rule::Simple <<| tag == 'roles::puppetmaster' |>>

This tells the LDAP server to apply whatever rule was exported with the @@ syntax and the specified tag. Any Puppet resource can be exported and realized that way.

Note that there are security implications with collecting exported resources: it delegates part of a node's resource specification to another node. So, in the above scenario, the Puppet master could decide to open other ports on the LDAP server (say, the SSH port), because it exports the port number and the LDAP server just blindly applies the directive. A more secure specification would explicitly specify the sensitive information, like so:

Ferm::Rule::Simple <<| tag == 'roles::puppetmaster' |>> {
    port => ['ldap'],
}

But then a compromised server could send a different saddr and there's nothing the LDAP server could do here: it cannot override the address because it's exactly the information we need from the other server...

PuppetDB lookups

A common pattern in Puppet is to extract information from host A and use it on host B. The above "exported resources" pattern can do this for files, commands and many more resources, but sometimes we just want a tiny bit of information to embed in a configuration file. This could, in theory, be done with an exported concat resource, but this can become prohibitively complicated for something as simple as an allowed IP address in a configuration file.

For this we use the puppetdbquery module, which allows us to do elegant queries against PuppetDB. For example, this will extract the IP addresses of all nodes with the roles::gitlab class applied:

$allow_ipv4 = query_nodes('Class[roles::gitlab]', 'networking.ip')
$allow_ipv6 = query_nodes('Class[roles::gitlab]', 'networking.ip6')

This code, in profile::kgb_bot, propagates those values into a template through the allow_addresses variable, which gets expanded like this:

<% if $allow_addresses { -%>
<% $allow_addresses.each |String $address| { -%>
    allow <%= $address %>;
<% } -%>
    deny all;
<% } -%>

Note that there is a potential security issue with that approach. The same way that exported resources trust the exporter, we trust that the node exported the right fact. So it's in theory possible that a compromised Puppet node exports an evil IP address in the above example, granting access to an attacker instead of the proper node. If that is a concern, consider using LDAP or Hiera lookups instead.

Also note that this will eventually fail when the node goes down: after a while, resources are expired from the PuppetDB server and the above query will return an empty list. This seems reasonable: we do want to eventually revoke access to nodes that go away, but it's still something to keep in mind.

Keep in mind that the networking.ip fact, in the above example, might be incorrect in the case of a host that's behind NAT. In that case, you should use LDAP or Hiera lookups.

Note that this could also be implemented with a concat exported resource, but it would be much harder: you would need a special case for when no resource is exported (to avoid adding the deny) and take into account that other configurations might also be needed in the file. It would have the same security and expiry issues anyway.

Puppet query language

Note that there's also a way to do those queries without a Forge module, through the Puppet query language and the puppetdb_query function. The problem with that approach is that the function is not very well documented and the query syntax is somewhat obtuse. For example, this is what I came up with to do the equivalent of the query_nodes call, above:

$allow_ipv4 = puppetdb_query(
  ['from', 'facts',
    ['and',
      ['=', 'name', 'networking.ip'],
      ['in', 'certname',
        ['extract', 'certname',
          ['select_resources',
            ['and',
              ['=', 'type', 'Class'],
              ['=', 'title', 'roles::gitlab']]]]]]])

It seems like I did something wrong, because that returned an empty array. I could not figure out how to debug this, and apparently I needed more functions (like map and filter) to get what I wanted (see this gist). I gave up at that point: the puppetdbquery abstraction is much cleaner and more usable.

If you are merely looking for a hostname, however, PQL might be a little more manageable. For example, this is how the roles::onionoo_frontend class finds its backends to setup the IPsec network:

$query = 'nodes[certname] { resources { type = "Class" and title = "Roles::Onionoo_backend" } }'
$peer_names = sort(puppetdb_query($query).map |$value| { $value["certname"] })
$peer_names.each |$peer_name| {
  $network_tag = [$::fqdn, $peer_name].sort().join('::')
  ipsec::network { "ipsec::${network_tag}":
    peer_networks => $base::public_addresses
  }
}

LDAP lookups

Our Puppet server is hooked up to the LDAP server and has information about the hosts defined there. Information about the node running the manifest is available in the global $nodeinfo variable, but there is also an $allnodeinfo parameter with information about every host known in LDAP.

A simple example of how to use the $nodeinfo variable is how the base::public_address and base::public_address6 parameters -- which represent the IPv4 and IPv6 public address of a node -- are initialized in the base class:

class base(
  Stdlib::IP::Address $public_address            = filter_ipv4(getfromhash($nodeinfo, 'ldap', 'ipHostNumber'))[0],
  Optional[Stdlib::IP::Address] $public_address6 = filter_ipv6(getfromhash($nodeinfo, 'ldap', 'ipHostNumber'))[0],
) {
  $public_addresses = [ $public_address, $public_address6 ].filter |$addr| { $addr != undef }
}

This loads the ipHostNumber field from the $nodeinfo variable, and uses the filter_ipv4 or filter_ipv6 functions to extract the IPv4 or IPv6 addresses respectively.

A good example of the $allnodeinfo parameter is how the roles::onionoo_frontend class finds the IP addresses of its backends. After having loaded the host list from PuppetDB, it then uses the parameter to extract the IP addresses:

$backends = $peer_names.map |$name| {
    [
      $name,
      $allnodeinfo[$name]['ipHostNumber'].filter |$a| { $a =~ Stdlib::IP::Address::V4 }[0]
    ] }.convert_to(Hash)

Such a lookup is considered more secure than going through PuppetDB as LDAP is a trusted data source. It is also our source of truth for this data, at the time of writing.

Hiera lookups

For more security-sensitive data, we should use a trusted data source to extract information about hosts. We do this through Hiera lookups, with the lookup function. A good example is how we populate the SSH public keys on all hosts, for the admin user. In the profile::ssh class, we do the following:

$keys = lookup('profile::admins::keys', Data, 'hash')

This will look up the profile::admins::keys field in Hiera, which is a trusted source because it is under the control of the Puppet git repository. This refers to the following data structure in hiera/common.yaml:

profile::admins::keys:
  anarcat:
    type: "ssh-rsa"
    pubkey: "AAAAB3[...]"

The key point with Hiera is that it's a "hierarchical" data structure, so each host can have its own override. In theory, the above keys could therefore be overridden per host. Similarly, the IP address information for each host could be stored in Hiera instead of LDAP. But in practice, we do not currently do this and the per-host information is limited.
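
Since the lookup above uses the hash merge strategy, a per-host file could, in principle, add an extra key on a single machine. This is a hypothetical sketch (as noted, we do not currently do this), placed in that host's hiera/nodes/$fqdn.yaml:

profile::admins::keys:
  example-operator:  # hypothetical extra admin for this host only
    type: "ssh-ed25519"
    pubkey: "AAAAC3[...]"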

Revoking and generating a new certificate for a host

Revocation procedures problems were discussed in 33587 and 33446.

  1. Clean the certificate on the master

    puppet cert clean host.torproject.org
  2. Clean the certificate on the client:

    find /var/lib/puppet/ssl -name host.torproject.org.pem -delete
  3. Then run the bootstrap script on the client from tsa-misc/installer/puppet-bootstrap-client and get a new checksum

  4. Run tpa-puppet-sign-client on the master and pass the checksum

  5. Run puppet agent -t to have puppet running on the client again.

Pager playbook

catalog run: PuppetDB warning: did not update since...

If you see an error like:

Check last node runs from PuppetDB WARNING - cupani.torproject.org did not update since 2020-05-11T04:38:54.512Z

It may eventually also be accompanied by the Puppet server reporting the same problem:

Subject: ** PROBLEM Service Alert: pauli/puppet - all catalog runs is WARNING **
[...]
Check last node runs from PuppetDB WARNING - cupani.torproject.org did not update since 2020-05-11T04:38:54.512Z

One of the following is happening, in decreasing order of likelihood:

  1. the node's Puppet manifest has an error of some sort that makes it impossible to run the catalog
  2. the node is down and has failed to report since the last time specified
  3. the Puppet server is down and all nodes will fail to report in the same way (in which case a lot more warnings will show up, and other warnings about the server will come in)

The first situation will usually happen after someone pushed a commit introducing the error. We try to keep all manifests compiling all the time and such errors should be immediately fixed. Look at the history of the Puppet source tree and try to identify the faulty commit. Reverting such a commit is acceptable to restore the service.

The second situation can happen if a node is in maintenance for an extended duration. Normally, the node will recover when it goes back online. If a node is to be permanently retired, it should be removed from Puppet, using the [host retirement procedures][retire-a-host].

Finally, if the main Puppet server is down, it should definitely be brought back up. See disaster recovery, below.

In any case, running the Puppet agent on the affected node should give more information:

ssh NODE puppet agent -t

Problems pushing to the Puppet server

Normally, when you push new commits to the Puppet server, a hook runs and updates the working copy. But sometimes this fails with an error like:

remote: error: unable to unlink old 'modules/ipsec/misc/config.yaml': Permission denied.

The problem, in such cases, is that the files in the /etc/puppet/ checkout are not writable by your user. It could also happen that the repository itself (in /srv/puppet.torproject.org/git/tor-puppet) could have permission issues.

This problem is described in issue 29663 and is due to someone not pushing properly before you. To fix the permissions, try:

sudo chown -R root:adm /etc/puppet
sudo chown :puppet /etc/puppet/secret
sudo chmod -R g+rw /etc/puppet
sudo chmod g-w /etc/puppet/secret

A similar recipe could be applied to the git repository, as needed. Hopefully this will be resolved when we start deploying with a role account instead.
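
A sketch of what that could look like, assuming the repository follows the same root:adm group-writable convention as the /etc/puppet checkout (verify the actual ownership and group before running this):

sudo chown -R root:adm /srv/puppet.torproject.org/git/tor-puppet
sudo chmod -R g+rw /srv/puppet.torproject.org/git/tor-puppet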

Disaster recovery

Ideally, the main Puppet server would be deployable from Puppet bootstrap code and the main installer. But in practice, much of its configuration was done manually over the years and it MUST be restored from backups in case of failure.

This probably includes a restore of the PostgreSQL database backing the PuppetDB server as well. It's possible this step could be skipped in an emergency, because most of the information in PuppetDB is a cache of exported resources, reports and facts. But it could also break hosts and make converging the infrastructure impossible, as there might be dependency loops in exported resources.

In particular, the Puppet server needs access to the LDAP server, and that is configured in Puppet. So if the Puppet server needs to be rebuilt from scratch, it will need to be manually allowed access to the LDAP server to compile its manifest.

So it is strongly encouraged to restore the PuppetDB server database as well in case of disaster.

This also applies in case of an IP address change of the Puppet server, in which case access to the LDAP server needs to be manually granted before the configuration can run and converge. This is a known bootstrapping issue with the Puppet server and is further discussed in the design section.

Reference

This documents, in general terms, how things are set up.

Installation

Setting up a new Puppet server from scratch is not supported, or, to be more accurate, would be somewhat difficult. The server expects various external services to populate it with data, in particular LDAP, Nagios, Let's Encrypt and auto-ca (detailed in the design section below).

The auto-ca component is also deployed manually, and so are the git hooks, repositories and permissions.

This needs to be documented, automated and improved. Ideally, it should be possible to install a new Puppet server from scratch using nothing but a Puppet bootstrap manifest, see issue 30770 and issue 29387, along with discussion about those improvements in this page, for details.

SLA

No formal SLA is defined. Puppet runs on a fairly slow cron job, so it doesn't have to be highly available right now. This could change in the future if we rely more on it for deployments.

Design

The Puppet server and PuppetDB currently live on pauli. That server was setup in 2011 by weasel. It follows the configuration of the Debian Sysadmin (DSA) Puppet server, which has its source code available in the dsa-puppet repository.

The service is maintained by TPA and manages all TPA-operated machines. Ideally, all services are managed by Puppet, but historically, only basic services were configured through Puppet, leaving service admins responsible for deploying their services on top of it. That tendency has shifted recently (~2020) with the deployment of the GitLab service through Puppet, for example.

The source code to the Puppet manifests (see below for a Glossary) is managed through git on a repository hosted directly on the Puppet server. Agents are deployed as part of the install process, and talk to the central server using a Puppet-specific certificate authority (CA).

As mentioned in the installation section, the Puppet server assumes a few components (namely LDAP, Nagios, Let's Encrypt and auto-ca) feed information into it. This is also detailed in the sections below. In particular, Puppet constitutes a duplicate "source of truth" for some information about servers. For example, LDAP has a "purpose" field describing what a server is for, but Puppet also has the concept of a role, attributed through Hiera (see issue 30273). A similar problem exists with IP addresses and user access control, in general.

Puppet is generally considered stable, but the code base is somewhat showing its age and has accumulated some technical debt.

For example, much of the Puppet code deployed is specific to Tor (and DSA, to a certain extent) and therefore is only maintained by a handful of people. It would be preferable to migrate to third-party, externally maintained modules (e.g. systemd, but also many others, see issue 29387 for details). A similar problem exists with custom Ruby code implemented for various functions, which is being replaced with Hiera (issue 30020).

The Puppet infrastructure is being kept up to date with the latest versions in Debian, but will require some work to port to Puppet 6, as the current deployment system ("puppetmaster") has been removed in that new release (see issue 33588).

Glossary

This is a subset of the Puppet glossary to quickly get you started with the vocabulary used in this document.

  • Puppet node: a machine (virtual or physical) running Puppet
  • Manifest: Puppet source code
  • Catalog: a compiled set of Puppet source which gets applied on a node by a Puppet agent
  • Puppet agents: the Puppet program that runs on all nodes to apply manifests
  • Puppet server: the server which all agents connect to to fetch their catalog, also known as a Puppet master in older Puppet versions (pre-6)
  • Facts: information collected by Puppet agents on nodes, and exported to the Puppet server
  • Reports: log of changes done on nodes recorded by the Puppet server
  • PuppetDB server: an application server on top of a PostgreSQL database providing an API to query various resources like node names, facts, reports and so on

File layout

The Puppet server and PuppetDB server run on pauli.torproject.org. That is where the main git repository (tor-puppet) lives, in /srv/puppet.torproject.org/git/tor-puppet. That repository has hooks to populate /etc/puppet which is the live checkout from which the Puppet server compiles its catalogs.

All paths below are relative to the root of that git repository.

  • 3rdparty/modules contains modules that are shared publicly and do not contain any TPO-specific configuration. There is a Puppetfile there that documents where each module comes from and that can be maintained with r10k or librarian.

  • modules includes roles, profiles, and classes that make up the bulk of our configuration.

  • each node is assigned a "role" through Hiera, in hiera/nodes/$FQDN.yaml

    To be more accurate, Hiera assigns a Puppet class to each node, although each node should have only one special purpose class, a "role", see issue 40030 for progress on that transition.

  • The torproject_org module (modules/torproject_org/manifests/init.pp) performs basic host initialisation, like configuring Debian mirrors and APT sources, installing a base set of packages, configuring puppet and timezone, setting up a bunch of configuration files and running ud-replicate.

  • There is also the hoster.yaml file (modules/torproject_org/misc/hoster.yaml) which defines hosting providers and specifies things like which network blocks they use, if they have a DNS resolver or a Debian mirror. hoster.yaml is read by

    • the nodeinfo() function (modules/puppetmaster/lib/puppet/parser/functions/nodeinfo.rb), used for setting up the $nodeinfo variable
    • ferm's def.conf template (modules/ferm/templates/defs.conf.erb)
  • The root of Puppet's definitions and execution is found in the manifests/site.pp file, but this file is now mostly empty, in favor of Hiera.

Note that the above is the current state of the file hierarchy. As part of the Hiera transition (issue 30020), a lot of the above architecture will change in favor of the more standard role/profile/module pattern.

Note that this layout might also change in the future with the introduction of a role account (issue 29663) and when/if the repository is made public (which requires changing the layout).

See ticket #29387 for an in-depth discussion.

Installed packages facts

The modules/torproject_org/lib/facter/software.rb file defines our custom facts, making it possible to get answer to questions like "Is this host running apache2?" by simply looking at a puppet variable.

Those facts are deprecated; we should install packages through Puppet instead of installing them manually on hosts.
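
For illustration, this is roughly how such a fact could be used in a manifest (a sketch: the apache2 fact name appears in the PuppetDB examples above, but the profile name here is hypothetical):

# hypothetical use of the custom "apache2" fact defined in software.rb
if $facts['apache2'] {
  include profile::apache_maintenance  # hypothetical profile
}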

Style guide

Puppet manifests should generally follow the Puppet style guide. This can be easily done with Flycheck in Emacs, vim-puppet, or a similar plugin in your favorite text editor.

Many files do not currently follow the style guide, as they predate the creation of said guide. Files should not be completely reformatted unless there's a good reason. For example, if a conditional covering a large part of a file is removed and the file needs to be re-indented, it's a good opportunity to fix style in the file. Same if a file is split in two components or for some other reason completely rewritten.

Otherwise the style already in use in the file should be followed.

Hiera

Hiera is a "key/value lookup tool for configuration data" which Puppet uses to look up values for class parameters and node configuration in general.

We are in the process of transitioning over to this mechanism from our previous custom YAML lookup system. This documents the way we currently use Hiera.

Classes definitions

Each host declares which class it should include through a classes parameter. For example, this is what configures a Prometheus server: