- installing a server (TODO)
- retiring a server (howto/retire-a-host)
- migrating machines (howto/ganeti)
- retiring a user (TODO)
- reboots (howto/upgrades)
- ... etc
Fabric makes easy things reproducible and hard things possible. It is not designed to handle larger-scale configuration management, for which we use howto/puppet.
All of the instructions below assume you have a copy of the TPA Fabric library. Fetch it with:
git clone firstname.lastname@example.org:admin/tsa-misc.git && cd tsa-misc
Running a command on hosts
Fabric can be used from the commandline to run arbitrary commands on servers, like this:
fab -H hostname.example.com -- COMMAND
$ fab -H perdulce.torproject.org -- uptime
 17:53:22 up 24 days, 19:34, 1 user, load average: 0.00, 0.00, 0.07
This is equivalent to:
ssh hostname.example.com COMMAND
... except that you can run it on multiple servers:
$ fab -H perdulce.torproject.org,chives.torproject.org -- uptime
 17:54:48 up 24 days, 19:36, 1 user, load average: 0.00, 0.00, 0.06
 17:54:52 up 24 days, 17:35, 21 users, load average: 0.00, 0.00, 0.00
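The same multi-host run can also be driven from Python rather than from the fab commandline. The following is a minimal sketch, not part of the TPA library: the run_on_hosts helper is a hypothetical name, and it only assumes that each connection object offers the run() method and host attribute that Fabric 2.x Connection objects provide.

```python
def run_on_hosts(connections, command):
    """Run `command` on every connection and map each host name to its output.

    `connections` is any iterable of Fabric-style Connection objects
    (hypothetical helper, not part of the TPA library).
    """
    results = {}
    for con in connections:
        # hide=True keeps Fabric from echoing the remote output locally
        result = con.run(command, hide=True)
        results[con.host] = result.stdout.strip()
    return results


# Usage against real hosts (requires Fabric 2.x and working SSH access):
#
#     from fabric import Connection
#     hosts = ["perdulce.torproject.org", "chives.torproject.org"]
#     print(run_on_hosts([Connection(h) for h in hosts], "uptime"))
```

The hosts are visited one after the other, in the order given, which matches the sequential output shown above.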
Listing tasks and self-documentation
The tsa-misc repository has a good library of tasks that can be run from the commandline. To show the list, use:
fab -l
Help for individual tasks can also be inspected with -h, for example:
$ fab -h host.fetch-ssh-host-pubkey
Usage: fab [--core-opts] host.fetch-ssh-host-pubkey [--options] [other tasks here ...]

Docstring:
  fetch public host key from server

Options:
  -t STRING, --type=STRING
The name of the server to run the command against is implicit in the usage: it must be passed with the -H (short for --hosts) argument. For example:
$ fab -H perdulce.torproject.org host.fetch-ssh-host-pubkey
b'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGOnZX95ZQ0mliL0++Enm4oXMdf1caZrGEgMjw5Ykuwp root@perdulce\n'
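The help output above is derived from the task's Python signature: the docstring becomes the "Docstring" section, and a keyword argument with a string default becomes an option such as -t STRING, --type=STRING. The sketch below illustrates that mapping with a hypothetical reimplementation. It is not the actual code of the TPA task, and the host key path is an assumption based on the usual OpenSSH layout. In a real fabfile the function would also carry Fabric's @task decorator.

```python
def fetch_ssh_host_pubkey(con, type="ed25519"):
    """fetch public host key from server"""
    # Assumption: host public keys live in the standard OpenSSH location.
    # The keyword argument is named `type` only to mirror the --type option.
    path = "/etc/ssh/ssh_host_%s_key.pub" % type
    # hide=True suppresses local echo; the key is returned to the caller
    return con.run("cat %s" % path, hide=True).stdout
```

Underscores in the function name are exposed as dashes on the commandline, which is why the task is called host.fetch-ssh-host-pubkey there.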
A simple Fabric function
Each procedure mentioned in the introduction above has its own documentation. This tutorial aims more to show how to make a simple Fabric program inside TPA. Here we will create an uptime task which will simply run the uptime command on the provided hosts. It's a trivial example that shouldn't be implemented (it is easier to just use fab to run the shell command) but it should give you an idea of how to write new tasks.
edit the source
we pick the "generic" host library (
host.py) here, but there are other libraries that might be more appropriate, for example
reboot. Fabric-specific extensions, monkeypatching and other hacks should live in
add a task, which is simply a Python function:
@task
def uptime(con):
    return con.run('uptime')
The @task string is a decorator which indicates to Fabric that the function should be exposed as a command-line task. In that case, it gets passed a Connection object which we can run stuff from. In this case, we run the uptime command over SSH.
the task will automatically be loaded as it is part of the host module, but if this is a new module, add it to fabfile.py in the parent directory
the task should now be available:
$ fab -H perdulce.torproject.org host.uptime
 18:06:56 up 24 days, 19:48, 1 user, load average: 0.00, 0.00, 0.02
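A task does not have to stop at running a command: the object returned by con.run() carries the remote output in its stdout attribute, so it can be processed in Python. The following sketch extends the uptime example with a hypothetical load_average function (not part of the TPA library) that parses the output shown above. It assumes the remote uptime prints load averages with a dot as decimal separator.

```python
import re


def load_average(con):
    """Return the 1, 5 and 15 minute load averages of the remote host as floats."""
    # hide=True keeps the raw uptime line off the local terminal
    result = con.run("uptime", hide=True)
    match = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", result.stdout)
    if match is None:
        # Fail loudly rather than return a guess on unexpected output
        raise ValueError("unexpected uptime output: %r" % result.stdout)
    return tuple(float(value) for value in match.groups())
```

Written as a plain function, it can be reused from other modules; decorating it with @task would additionally expose it to the fab tool.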
N/A for now. Fabric is an ad-hoc tool and, as such, doesn't have monitoring that should trigger a response. It could however be used for some oncall work, which remains to be determined.
Fabric is available as a Debian package:
apt install fabric
See also the upstream instructions for other platforms (e.g. Pip).
Fabric code grew out of the installer and reboot scripts in the
tsa-misc repository. To get access to the code, simply clone the
repository and run from the top level directory:
git clone email@example.com:admin/tsa-misc.git && cd tsa-misc && fab -l
This code could also be moved to its own repository altogether.
Installing Fabric on Debian buster
Fabric has been part of Debian since at least Debian jessie, but you should install the newer, 2.x version that is only available in bullseye and later. The bullseye version is a "trivial backport" which means it can be installed directly in stable with:
apt install -t bullseye fabric
apt install -t buster-backports python3-paramiko
TPA's fabric library lives in the
tsa-misc repository and consists
of multiple Python modules, at the time of writing:
anarcat@curie:tsa-misc(master)$ wc -l fabric_tpa/*.py
  463 fabric_tpa/ganeti.py
  297 fabric_tpa/host.py
   46 fabric_tpa/__init__.py
  262 fabric_tpa/libvirt.py
  224 fabric_tpa/reboot.py
  125 fabric_tpa/retire.py
 1417 total
Each module encompasses Fabric tasks that can be called from the
fab tool or Python functions, both of which can be
reused in other modules as well. There are also wrapper scripts for
certain jobs that are a poor fit for the
fab tool, especially
reboot which requires particular host scheduling.
The fabric functions currently only communicate with the rest of the
infrastructure through SSH. It is assumed the operator will have
root access on all the affected servers. Server lists are
provided by the operator but should eventually be extracted from
PuppetDB or LDAP. It's also possible scripts will eventually edit
existing (but local) git repositories.
Most of the TPA-specific code was written and is maintained by
anarcat. The Fabric project itself is headed by Jeff Forcier AKA
bitprophet; it is, obviously, a much smaller community than Ansible
but still active. There is a mailing list, IRC channel, and GitHub
issues for upstream support (see contact) along with commercial
support through Tidelift.
There are no formal releases of the code for now.
Those are the main jobs being automated by fabric:
Monitoring and testing
There is no monitoring of this service, as it's not running continuously.
Fabric tasks should implement some form of unit testing. Ideally, we would have 100% test coverage.
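Because a task only talks to the Connection object it receives, it can be tested without any SSH access by passing it a stand-in object. The following is a minimal sketch in the pytest style, not taken from the TPA test suite. The uptime function is repeated here without its @task decorator so that the example stays self-contained, and the test name is hypothetical.

```python
from unittest import mock


def uptime(con):
    """Undecorated copy of the uptime task from the tutorial."""
    return con.run('uptime')


def test_uptime_runs_the_uptime_command():
    # A Mock stands in for the Fabric Connection, so no SSH session is opened
    con = mock.Mock()
    con.run.return_value = mock.Mock(stdout=" 18:06:56 up 24 days\n")

    result = uptime(con)

    # The task must run exactly one command, and hand back its result
    con.run.assert_called_once_with('uptime')
    assert result.stdout.startswith(" 18:06:56 up")
```

pytest collects functions whose names start with test_, so no extra registration is needed for a test written this way.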
We use pytest to write unit tests. To run the test suite, use:
There are multiple tasks in TPA that require manual copy-pasting of
code from documentation to the shell or, worse, grepping backwards in
history to find the magic command (e.g.
ldapvi). A lot of those jobs
are error-prone and hard to do correctly.
In the case of the installer, this leads to significant variation and chaos in the installs, which results in instability and inconsistencies between servers. It was determined that the installs would be automated as part of ticket 31239 and that analysis and work is being done in howto/new-machine.
It was later realised that other areas were suffering from a similar problem. The upgrade process, for example, had been mostly manual until ad hoc shell scripts were written. Unfortunately, we now have many shell scripts, none of which work correctly. So work started on automating reboots as part of ticket 33406.
And then it was time to migrate the second libvirt server to howto/ganeti (unifolium/kvm2, ticket 33085) and by then it was clear some more generic solution was required. An attempt to implement this work in Ansible only led to frustration at the complexity of the task and tests were started on Fabric instead, which were positive. A few weeks later, a library of functions was available and the migration procedure was almost entirely automated.
LDAP integration might be something we could consider, because it's a
large part of the automation that's required in a lot of our work. One
alternative is to talk with
ldapvi or commandline tools, the other
is to implement some things natively in Python:
- Python LDAP could be used to automate talking with ud-ldap, see the Python LDAP functions, in particular add and delete
- The above docs are very limited, and they suggest external resources also:
ease of use - it should be easy to write new tasks and to understand existing ones
operation on multiple servers - many of the tricky tasks we need to do operate on multiple servers synchronously, something that, for example, is hard to do in Puppet
Nice to have
- long term maintenance - this should not be Legacy Code and must be unit tested, at least for parts that are designed to stay in the long term (e.g. not the libvirt importer)
sharing with the community - it is assumed that those are tasks too site-specific to be reused by other groups, although the code is still shared publicly. Shared code belongs in Puppet.
performance - this does not need to be high performance, as those tasks are done rarely
TPA. Approved in /meeting/2020-03-09/.
We are testing Fabric.
Time and labor.
Ansible makes easy things easy and scalable, but makes it hard to do hard stuff
for example, how would you do a disk inventory and pass it to another host to recreate those disks? for an Ansible ignorant like me, it's far from trivial. it probably implies something like this dictionary type, but in Fabric, it's:
json.loads(con.run('qemu-img info --output=json %s' % disk_path).stdout)
Any person somewhat familiar with Python can tell what this does.
we use Puppet for high-level configuration management, and Ansible conflicts with that problem space, leading to higher cognitive load
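To make the disk inventory example above concrete, the qemu-img one-liner can be wrapped in a small function and applied to several disks. This is only a sketch under assumptions: the disk_info and disk_inventory names are hypothetical and are not the functions used in the TPA library, and the "virtual-size" key is the field qemu-img reports in its JSON output.

```python
import json
import shlex


def disk_info(con, disk_path):
    """Return the `qemu-img info` metadata of a remote disk image as a dict."""
    # shlex.quote protects paths containing spaces or shell metacharacters
    command = "qemu-img info --output=json %s" % shlex.quote(disk_path)
    return json.loads(con.run(command, hide=True).stdout)


def disk_inventory(con, disk_paths):
    """Map each disk path to its virtual size in bytes."""
    return {path: disk_info(con, path)["virtual-size"] for path in disk_paths}
```

The resulting dictionary is plain Python data, so it can be handed to another function that talks to a different host to recreate the disks there.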
- MCollective was (it's deprecated) a tool that could be used to fire jobs on Puppet nodes from the Puppet master
- Not relevant for our use case because we want to bootstrap Puppet (in which case Puppet is not available yet) or retire Puppet (in which case it will go away).
- does not have much privileged access to PuppetDB or the Puppet CA infrastructure, that needs to be bolted on by hand
Doing things by hand
- timing is sometimes critical
- sets best practices in code instead of in documentation
- makes recipes easily reusable
Another custom Python script
- is it
run? what if you want both the output and the status code? can you remember?
- argument parsing code built-in, self-documenting code
- exposes Python functions as commandline jobs
- hard to reuse
- hard to read, audit
- missing a lot of basic programming primitives (hashes, objects, etc)
- no unit testing out of the box
- notoriously hard to read