# Cumin

[Cumin][] is a tool to operate arbitrary shell commands on [howto/Puppet](howto/Puppet)
hosts that match a certain criteria. It can match classes, facts and
other things stored in the PuppetDB.

It is useful to do adhoc or emergency changes on a bunch of machines
at once. It is especially useful to run Puppet itself on multiple
machines at once to do progressive deployments.

It should *not* be used as a replacement for Puppet itself: most
configuration on server should *not* be done manually and should
instead be done in Puppet manifests so they can be reproduced and
documented.

 [Cumin]: https://doc.wikimedia.org/cumin/master/introduction.html

## Installation

### Virtualenv / pip

If Cumin is not available from your normal packages (see [bug 924685][]
for Debian), you must install it in a [Python virtualenv][].

[bug 924685]: https://bugs.debian.org/924685

First, install dependencies, Cumin and some patches:

    sudo apt install python3-clustershell python3-pyparsing python3-requests python3-tqdm python3-yaml
    python3 -m venv --system-site-packages ~/.virtualenvs/cumin
    ~/.virtualenvs/cumin/bin/pip3 install cumin
    ~/.virtualenvs/cumin/bin/pip3 uninstall tqdm pyparsing clustershell # force using trusted system packages

Then drop the following configuration in
`~/.config/cumin/config.yaml`:

    transport: clustershell
    puppetdb:
        host: localhost
        scheme: http
        port: 8080
        api_version: 4  # Supported versions are v3 and v4. If not specified, v4 will be used.
    log_file: cumin.log
    default_backend: puppetdb

From here on we'll assume you use the following alias:

    alias cumin="~/.virtualenvs/cumin/bin/cumin --config ~/.config/cumin/config.yaml"

You should also make sure your machine has access to the PuppetDB
server configured above, with:

    ssh -L8080:localhost:8080 pauli.torproject.org

## Example commands

This will run the `uptime` command on all hosts:

    cumin '*' uptime

To run against only a subset, you need to use the Cumin grammar, which
is [briefly described in the Wikimedia docs][]. For example, this
will run the same command only on physical hosts:

    cumin 'F:virtual=physical' uptime

Just check the monitoring server:

    cumin 'R:class=roles::monitoring' uptime

Any Puppet fact or class can be queried that way. This also serves as
a ad-hoc interface to query PuppetDB for certain facts, as you don't
have to provide a command. In that case, `cumin` runs in "dry mode"
and will simply show which hosts match the request:

    $ cumin 'F:virtual=physical'
    16 hosts will be targeted:
    [...]

[Python virtualenv]: https://virtualenv.pypa.io/
[briefly described in the Wikimedia docs]: https://wikitech.wikimedia.org/wiki/Cumin#PuppetDB_host_selection
[parallel-ssh]: https://code.google.com/archive/p/parallel-ssh/

# Discussion

## Alternatives considered

 * [Choria](https://choria.io/) - to be evaluated
 * Ansible?
 * Puppet mcollective?
 * simple SSH loop from LDAP output
 * [parallel-ssh][]
 * see also the [list in the Puppet docs](puppet#other-ways-of-extracting-a-host-list)

See also [fabric](howto/fabric).