it's unclear to me how to renumber a server that only has a record in LDAP (as opposed to something in dns/domains.git as well).
in legacy/trac#33730 (moved), the DNS records were not picked up by nevii, the authoritative nameserver, even after running ud-generate on alberti and ud-replicate on nevii. and indeed, those commands only populate /var/lib/misc/ on nevii, which doesn't touch the full zone loaded by bind.
so this needs clarification and documentation and aaargh.
ud-generate writes stuff in /var/cache/userdir-ldap/hosts, one directory per host
ud-replicate rsyncs that stuff to /var/lib/misc on all hosts
DNS servers (nevii and falax, at first glance) are special and have a precious little dns-sshfp file that gets generated with all those "automatic" records from the ipHostNumber field in LDAP
that file is therefore dropped in /var/lib/misc/thishost/dns-sshfp on nevii
the zone file used by bind is in /srv/dns.torproject.org/var/generated/torproject.org on nevii, but it doesn't include the file generated by ud-replicate, so it's generated by something else
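A quick way to check the last point on nevii is to compare timestamps: if dns-sshfp is newer than the zone file bind loads, whatever builds the zone has not run since the last ud-replicate. This is only a rough sketch. The two paths come from the notes above, `zone_is_stale` is a made-up helper name, and it assumes modification times are meaningful on both files.

```shell
#!/bin/bash
# Paths taken from the notes above; override them to try this elsewhere.
SSHFP_FILE="${SSHFP_FILE:-/var/lib/misc/thishost/dns-sshfp}"
ZONE_FILE="${ZONE_FILE:-/srv/dns.torproject.org/var/generated/torproject.org}"

# zone_is_stale (hypothetical helper): succeeds when the first file
# (the replicated records) is newer than the second (the zone file).
zone_is_stale() {
    [ "$1" -nt "$2" ]
}

if zone_is_stale "$SSHFP_FILE" "$ZONE_FILE"; then
    echo "zone file is older than dns-sshfp, a rebuild is probably pending"
else
    echo "zone file is at least as recent as dns-sshfp"
fi
```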
I'm about there in my investigation. It looks like the file is actually generated after a git hook, triggered from cupani, for example look at this push on the dns/domains.git repo:
Push to git@git-rw.torproject.org:admin/dns/domains
[...]
remote: via /srv/git.torproject.org/git-helpers/post-receive-diff
remote: == 00-sync-to-mirror ==
remote: == commit-mail ==
remote: == github-push ==
remote: == gitlab-push ==
remote: == irc-message ==
remote: == per-repo-hook ==
remote: run-parts: executing /srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin%dns%domains/trigger-dns-server admin/dns/domains /tmp/tmp.1bfbXjedly
remote: [/srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin%dns%domains/trigger-dns-server] Triggering update on dns master
remote: 2020-03-30 22:10:05 /srv/dns.torproject.org/bin/update: ***** start of script *****
remote: 2020-03-30 22:10:05 /srv/dns.torproject.org/bin/update: pre flock
remote: 2020-03-30 22:10:05 /srv/dns.torproject.org/bin/update: pre git pull
remote: 2020-03-30 22:10:05 /srv/dns.torproject.org/bin/update: pre update-keys
remote: 2020-03-30 22:10:09 /srv/dns.torproject.org/bin/update: pre build-services
remote: 2020-03-30 22:10:09 /srv/dns.torproject.org/bin/update: pre for loop
remote: 2020-03-30 22:10:09 /srv/dns.torproject.org/bin/update: pre write_zonefile for 0-26.72.229.38.in-addr.arpa
remote: 2020-03-30 22:10:09 /srv/dns.torproject.org/bin/update: pre write_zonefile for 0.0.0.5.a.5.0.0.0.b.6.0.1.0.0.2.ip6.arpa
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for 1.0.0.0.5.0.0.0.0.0.5.8.7.0.6.2.ip6.arpa
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for 144-28.132.35.154.in-addr.arpa
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for 16-28.235.45.89.in-addr.arpa
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for 30.172.in-addr.arpa
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for 64-28.132.35.154.in-addr.arpa
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for b.0.0.0.0.b.6.0.0.0.0.0.0.2.6.2.ip6.arpa
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for onion-router.net
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for rev
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for torproject.com
remote: 2020-03-30 22:10:10 /srv/dns.torproject.org/bin/update: pre write_zonefile for torproject.net
remote: 2020-03-30 22:10:11 /srv/dns.torproject.org/bin/update: pre write_zonefile for torproject.org
remote: 2020-03-30 22:10:11 /srv/dns.torproject.org/bin/update: pre dns-update
remote: 2020-03-30 22:10:11 /srv/dns.torproject.org/bin/update: done!
remote: 2020-03-30 22:10:11 /srv/dns.torproject.org/bin/update: ***** end of script *****
remote: == xx-jenkins-trigger ==
remote: [hook[4791]] Triggering jenkins build for (https://git.torproject.org/admin/dns/domains.git, master, 2f5ed1f115f9a5aa6bad82ca7e1a6737fc8088f4).
remote: No git jobs using repository: https://git.torproject.org/admin/dns/domains.git and branches: master
remote: No Git consumers using SCM API plugin for: https://git.torproject.org/admin/dns/domains.git
remote: [hook[4791]] Jenkins triggers done.
To git-rw.torproject.org:admin/dns/domains
   bdd0d4e..2f5ed1f  master -> master
updating local tracking ref 'refs/remotes/origin/master'
Therefore, the script in /srv/dns.torproject.org/bin/update seems to have the magic sauce.
I haven't dug any deeper as to why that's not done automatically or what actually takes content of dns-sshfp, or how this could be done by hand, but it's definitely something that we should document. This affects the ganeti import procedure, but also the new-machine procedure.
It is also important to figure out where exactly the TTL gets extracted from LDAP, and how to change it immediately, for the ganeti procedures.
a nice special case is when we renumber git.torproject.org, because then even the hack of pushing a change to the git repo doesn't work, because nevii pulls changes from git.torproject.org to update the zone, because i'm sad, because we can't have nice things.
rebuild_zones=0
if [ -e /var/lib/misc/thishost/dns-sshfp ]; then
    if ! cmp -s /var/lib/misc/thishost/dns-sshfp "$tempfile"; then
        rebuild_zones=1
    fi
fi
[..]
if [ "${rebuild_zones}" -gt 0 ]; then
    sudo -u dnsadm /srv/dns.torproject.org/bin/update
fi
the update can be triggered by hand with the last command above, sudo -u dnsadm /srv/dns.torproject.org/bin/update, possibly with --force
the $INCLUDE "/var/lib/misc/thishost/dns-sshfp" from the dns/domains.git zonefile is not parsed by bind, but by "makezonefile or whatever it's called to syntax check and to add the SOA header"
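To make the trigger condition concrete, here is the change detection from the quoted snippet pulled out so it can be tried without touching a DNS server. It is a sketch only: `needs_rebuild` is a hypothetical name, the records are made-up sample data in a temporary directory, and the real update command is left as a comment.

```shell
#!/bin/bash
# needs_rebuild (hypothetical name) mirrors the quoted logic: a rebuild is
# only wanted when the current file exists AND differs from the fresh copy.
needs_rebuild() {
    local current="$1" fresh="$2"
    [ -e "$current" ] && ! cmp -s "$current" "$fresh"
}

# Made-up sample data standing in for the old and new dns-sshfp files.
demo=$(mktemp -d)
printf 'host1 IN A 192.0.2.1\n' > "$demo/dns-sshfp"
printf 'host1 IN A 192.0.2.2\n' > "$demo/new"

if needs_rebuild "$demo/dns-sshfp" "$demo/new"; then
    # On the DNS server, this is where the quoted snippet runs:
    #   sudo -u dnsadm /srv/dns.torproject.org/bin/update
    echo "records changed, zones would be rebuilt"
fi
rm -rf "$demo"
```

Note that, as in the quoted snippet, a missing dns-sshfp file does not count as a change, so the very first replication to a host would not trigger a rebuild by this path.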
What seems to have happened here is specific to the migration of vineale and the git infrastructure: the update script failed because it could not pull from git (because the original server was down), and aborted everything.
So the following should have happened instead:
update should have continued with the cached copy of the git repo if git pull failed
failing that, ud-replicate should have warned about the problem instead of silently succeeding, and retried until it worked
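Those two behaviours could look roughly like the sketch below. This is not the actual code of update or ud-replicate, which I have not read in detail. `refresh_checkout` and `retry` are hypothetical helpers, and the checkout path and retry numbers in the usage comment are placeholders.

```shell
#!/bin/bash
# refresh_checkout (hypothetical): try to pull, but carry on with the
# existing checkout if the git server cannot be reached.
refresh_checkout() {
    local repo_dir="$1"
    if git -C "$repo_dir" pull --ff-only --quiet; then
        echo "pulled"
    elif git -C "$repo_dir" rev-parse --git-dir >/dev/null 2>&1; then
        echo "warning: git pull failed, continuing with cached copy in $repo_dir" >&2
        echo "cached"
    else
        echo "error: no usable checkout in $repo_dir" >&2
        return 1
    fi
}

# retry (hypothetical): run a command up to $1 times, sleeping $2 seconds
# between attempts, and complain loudly instead of failing silently.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s: $*" >&2
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example usage (placeholder path and numbers):
#   refresh_checkout /path/to/domains-checkout
#   retry 5 60 sudo -u dnsadm /srv/dns.torproject.org/bin/update
```

With something like this, a git outage would degrade to "zones rebuilt from the last known good checkout, with a warning" rather than "nothing rebuilt, no warning".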
The above two points feel like the code changes that could happen to avoid that problem in the future. Everything else seems like docs that could be thrown in tsa/howto/ldap.mdwn.
But for now, i'll just go back to business as usual and try to get some more shit done instead.
Trac: Summary: "DNS renumbering procedure fails if git is untouched" to "DNS renumbering procedure fails if git server is unavailable"