downtime, because users and passwords are *copied* over to all
hosts. In other words, authentication doesn't rely on the LDAP server
being up.

In general, OpenLDAP is very stable and rarely crashes, so we haven't
had many emergency scenarios with it yet. If anything happens, make
sure the `slapd` service is running.
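
Here is a minimal sketch of that check. It assumes a systemd host with the OpenLDAP client tools (`ldapsearch`) installed, and it assumes `slapd` accepts plain LDAP on localhost. The `check_slapd` function is hypothetical. It first asks systemd whether the unit is active. It then makes an anonymous query of the root DSE, which needs no knowledge of the directory tree, to confirm the daemon actually answers:

```shell
#!/bin/sh
# Hypothetical helper: check that slapd is running *and* answering queries.
# Assumes systemd and the OpenLDAP client tools; adjust the URI if slapd
# only listens on ldaps:// or refuses unencrypted connections.
check_slapd() {
    # 1. Is the systemd unit active at all?
    if ! systemctl is-active --quiet slapd; then
        echo "slapd is not active; see: systemctl status slapd" >&2
        return 1
    fi
    # 2. Anonymous (-x) base-scope search of the root DSE (empty base DN).
    if ! ldapsearch -x -H ldap://localhost/ -b "" -s base >/dev/null 2>&1; then
        echo "slapd is active but not answering LDAP queries" >&2
        return 2
    fi
    echo "slapd is up and answering queries"
}
```

If it reports a problem, look at `journalctl -u slapd` before restarting the service with `sudo systemctl restart slapd`.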

The `ud-ldap` software, on the other hand, is more complicated and can
be hard to diagnose. It has many moving parts (Python, Perl, Bash and
other shell scripts) and talks over a large number of protocols
(email, DNS, HTTPS, SSH, finger). The failure modes documented here
are far from exhaustive, so expect exotic failures and error messages.
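
When triaging a `ud-ldap` problem, a cheap first step can be checking which of those protocols are reachable at all. The sketch below probes TCP ports with `nc -z`. It relies on several assumptions:

- Each service is assumed to listen on its standard port (SMTP 25, SSH 22, DNS 53, finger 79, HTTPS 443).
- `probe_ports` is a hypothetical name.
- The host you pass it is a placeholder. Different protocols may be served by different machines.

```shell
#!/bin/sh
# Hypothetical triage helper: report which standard ports answer on a host.
# Assumes a netcat supporting -z (scan only) and -w (timeout), e.g.
# netcat-openbsd. DNS is only probed over TCP here, not UDP.
probe_ports() {
    host=$1
    failed=0
    # name:port pairs; standard ports are an assumption, adjust as needed
    for pair in smtp:25 ssh:22 dns:53 finger:79 https:443; do
        name=${pair%%:*}
        port=${pair##*:}
        if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
            echo "$name ($port): open"
        else
            echo "$name ($port): closed or filtered"
            failed=1
        fi
    done
    return "$failed"
}
```

A closed port only tells you where to look next, not what is wrong. An open port also doesn't prove the service behind it is healthy.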

### LDAP server failure

If the LDAP server goes down, authentication keeps working, but
password changes will not work and the server inventory (at
<https://db.torproject.org/>) will be unavailable. As a mitigation,
use the Puppet manifests and/or PuppetDB to get a host list and server
inventory; see the [Puppet documentation](puppet) for details.
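
For example, a host list can be pulled from PuppetDB's query API. This sketch assumes it runs on the PuppetDB server itself, with the default cleartext listener on `localhost:8080`. It also assumes `curl` and `jq` are installed. The `list_nodes` function is hypothetical, and the Puppet documentation remains the reference:

```shell
#!/bin/sh
# Hypothetical helper: print the certname of every node PuppetDB knows about.
# Assumes it runs on the PuppetDB server, with the default cleartext API on
# localhost:8080, and that curl and jq are installed.
list_nodes() {
    # Fetch first so an HTTP or connection failure is reported, not hidden
    # behind the pipe into jq.
    nodes_json=$(curl -sSf http://localhost:8080/pdb/query/v4/nodes) || return 1
    printf '%s\n' "$nodes_json" | jq -r '.[].certname'
}
```

The same endpoint also accepts a query filter if only a subset of hosts is needed.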

### Git server failure

If the Git server is unavailable, the LDAP server fails to regenerate
(and therefore update) zone files and zone records, as described in
[issue 33766](https://gitlab.torproject.org/tpo/tpa/team/-/issues/33766). The fix is to recover the Git
server. As a workaround, run this command on the primary DNS server
(currently `nevii`):

    sudo -u dnsadm /srv/dns.torproject.org/bin/update --force
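
To confirm the forced update actually published new zones, one option is to compare the zone's SOA serial on the primary before and after running it. This sketch assumes `dig` is installed. The `soa_serial` helper is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the SOA serial of zone $2 as served by $1.
# Assumes dig (from bind9-dnsutils or similar) is installed.
soa_serial() {
    # With +short, an SOA answer is one line:
    #   mname rname serial refresh retry expire minimum
    # so the serial is the third field.
    dig +short @"$1" "$2" SOA | awk 'NR == 1 { print $3 }'
}
```

For example, run `soa_serial nevii.torproject.org torproject.org` once before and once after the `update --force` run. The fully qualified server name is an assumption. If the serial has not moved, the zones may not have been regenerated.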

### ud-replicate failures

TODO: I seem to recall `ud-replicate` failing somehow, possibly
because of SSH multiplexing or something?

### Dependency loop on new installs

Installing a new server requires granting it access to various
machines, including [Puppet](puppet) and the LDAP server itself. That
access is granted... by Puppet, through LDAP!

So a server cannot register itself on the LDAP server: an operator
must first create a `host` snippet on the LDAP server, and then run
Puppet on the Puppet server. This is documented in the
[installation notes](new-machine).
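
The ordering matters, so it can help to check that the host entry is actually visible in LDAP before running Puppet. This rough sketch assumes a `userdir-ldap`-style tree, where hosts live under `ou=hosts,dc=torproject,dc=org` and each host's FQDN is stored in a `hostname` attribute. The base DN, the attribute name, the `LDAP_URI` default and the `host_in_ldap` function are all assumptions to verify against the installation notes:

```shell
#!/bin/sh
# Hypothetical pre-flight check: is the new host already registered in LDAP?
# ASSUMPTIONS: the base DN, the "hostname" attribute and the LDAP_URI
# default below are placeholders; point LDAP_URI at the real LDAP server.
LDAP_URI=${LDAP_URI:-ldap://localhost/}
HOSTS_BASE=${HOSTS_BASE:-ou=hosts,dc=torproject,dc=org}

host_in_ldap() {
    # -LLL: plain LDIF output; request only the DN of matching entries.
    ldapsearch -x -LLL -H "$LDAP_URI" -b "$HOSTS_BASE" "(hostname=$1)" dn 2>/dev/null |
        grep -q '^dn:'
}
```

For example, on the Puppet server, `host_in_ldap new.torproject.org && sudo puppet agent --test` only runs Puppet once the entry exists. Here `new.torproject.org` is a placeholder name.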

## Disaster recovery

The LDAP server is mostly built by hand and should therefore be
restored from backups in case of a catastrophic failure. Care should
be taken to keep the SSH keys of the server intact.
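
One way to confirm the restore kept the keys is to print the host key fingerprints. You can then compare them with a copy recorded before the failure, or with what clients already trust. This is a sketch: `host_key_fingerprints` is hypothetical and assumes the standard OpenSSH key file names:

```shell
#!/bin/sh
# Hypothetical helper: print fingerprints of the SSH host keys in a directory
# (default /etc/ssh) so they can be compared with pre-failure records.
host_key_fingerprints() {
    dir=${1:-/etc/ssh}
    for key in "$dir"/ssh_host_*_key.pub; do
        [ -e "$key" ] || continue   # the glob matched nothing
        ssh-keygen -l -f "$key"
    done
}
```

From another machine, `ssh-keyscan <server> | ssh-keygen -lf -` prints the fingerprints the server is currently presenting, for comparison.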

The IP address (and name?) of the LDAP server should not be hardcoded
anywhere. When the server was last renumbered ([issue 33908](https://gitlab.torproject.org/tpo/tpa/team/-/issues/33908)), the
only changes necessary were on the server itself, in `/etc`. So, in
theory, a fresh server could be deployed from backups in a new
location, with a new address, without much additional work.
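
When renumbering, a quick way to find what needs changing on the server is a recursive, fixed-string search of `/etc` for the old address. This is a sketch: `find_old_address` is hypothetical, and `192.0.2.10` is a documentation placeholder, not the server's real address:

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory (default /etc) that
# mention an old address.
# -r recurse, -l print file names only, -F treat dots literally,
# -w match whole words so 192.0.2.10 does not also match 192.0.2.100.
find_old_address() {
    grep -rlwF -- "$1" "${2:-/etc}" 2>/dev/null
}
```

Run `find_old_address 192.0.2.10` as root, since some files under `/etc` are not world-readable and permission errors are silenced. Repeat the search for the old hostname and any IPv6 address as well.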

# Reference

## Installation