howto/ldap.md +42 −7

```diff
@@ -296,19 +296,50 @@ downtime, because users and passwords are *copied* over to all hosts.
 In other words, authentication doesn't rely on the LDAP server being
 up.
 
+In general, OpenLDAP is very stable and doesn't generally crash, so we
+haven't had many emergencies scenarios with it yet. If anything
+happens, make sure the `slapd` service is running.
+
+The `ud-ldap` software, on the other hand, is a little more
+complicated and can be hard to diagnose. It has a large number of
+moving parts (Python, Perl, Bash, Shell scripts) and talks over a
+large number of protocols (email, DNS, HTTPS, SSH, finger). The
+failure modes documented here are far from exhaustive and you should
+expect exotic failures and error messages.
+
+### LDAP server failure
+
 That said, if the LDAP server goes down, password changes will not
 work, and the server inventory (at <https://db.torproject.org/>) will
 be gone. A mitigation is to use Puppet manifests and/or PuppetDB to
 get a host list and server inventory, see the [Puppet
 documentation](puppet) for details.
 
-In general, OpenLDAP is very stable and doesn't generally crash, so we
-haven't had many emergencies scenarios with it yet. If anything
-happens, make sure the `slapd` service is running.
+### Git server failure
 
-The `ud-ldap` software, on the other hand, is a little more
-complicated and can be hard to diagnose. TODO: expand on the failure
-modes.
+The LDAP server will fail to regenerate (and therefore update) zone
+files and zone records if the Git server is unavailable. This is
+described in [issue
+33766](https://gitlab.torproject.org/tpo/tpa/team/-/issues/33766). The
+fix is to recover the git server. A workaround is to run this command
+on the primary DNS server (currently `nevii`):
+
+    sudo -u dnsadm /srv/dns.torproject.org/bin/update --force
+
+### ud-replicate failures
+
+TODO: i seem to recall `ud-replicate` failing somehow, possibly
+because of SSH multiplexing or something?
+
+### Dependency loop on new installs
+
+Installing a new server requires granting the new server access
+various machines, including [puppet](puppet) and the LDAP server
+itself. This is granted ... by Puppet through LDAP! So a server cannot
+register itself on the LDAP server and needs an operator to first
+create a `host` snippet on the LDAP server, and then run Puppet on the
+Puppet server. This is documented in the [installation
+notes](new-machine).
 
 ## Disaster recovery
 
@@ -316,7 +347,11 @@ The LDAP server is mostly built by hand and should therefore be
 restored from backups in case of a catastrophic failure. Care should
 be taken to keep the SSH keys of the server intact.
 
-TODO: analyse <https://gitlab.torproject.org/tpo/tpa/team/-/issues/33908>.
+The IP address (and name?) of the LDAP server should not be hardcoded
+anywhere. When the server was last renumbered ([issue 33908](https://gitlab.torproject.org/tpo/tpa/team/-/issues/33908)),
+the only changes necessary were on the server itself, in `/etc`. So in
+theory, a fresh new server could be deployed (from backups) in a new
+location (and new address) without having to do much.
 
 # Reference
 ## Installation
```
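The new text says that when something goes wrong with OpenLDAP, the first thing to check is that the `slapd` service is running. Below is a minimal sketch of that check. It assumes a systemd host where the service unit is named `slapd`. The `check_slapd` function and the `SYSTEMCTL` override variable are hypothetical names, introduced only so the logic can be exercised without systemd. They are not part of `ud-ldap` or any existing TPA tooling.

```shell
# Minimal sketch: is slapd up? Assumes systemd and a unit named "slapd".
# SYSTEMCTL is a hypothetical override hook (e.g. for testing); it
# defaults to the real systemctl binary.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

check_slapd() {
    # "is-active --quiet" prints nothing and reports state via exit code
    if "$SYSTEMCTL" is-active --quiet slapd; then
        echo "slapd: running"
        return 0
    fi
    # Not active: point the operator at the usual next steps
    echo "slapd: NOT running; check 'journalctl -u slapd', then 'sudo systemctl restart slapd'" >&2
    return 1
}
```

Sourcing the snippet and running `check_slapd` prints the state and returns non-zero when the unit is not active. This makes it usable in a quick `check_slapd || ...` one-liner. It deliberately does not restart the service on its own, since a crash-looping `slapd` usually needs a look at the logs first.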