draft: notify auto-dns during reboots
Paired with the mininag retirement (team#41734 (closed)), this will ensure hosts get removed from rotation when we reboot them.
There are many problems here:
-
the zonefile gets rebuilt too often -
code is duplicated between the ganeti and non-ganeti cases (an already existing problem -
not sure i like the good=True API -
we should do this for all hosts, as this will take forever, we need a way to check if a host is a mirror, presumbaly in LDAP
Merge request reports
Activity
requested review from @lelutin
@lelutin would love to hear your thoughts on how i could improve this code.
assigned to @anarcat
if I understand correctly, those auto_dns state actions are really only required for a handful of hosts which are in a round-robin rotation for the same service as others. so maybe we should delay merging this until we have some way to check whether or not the state changes are required.
one thing that I saw is that there's currently no call to
auto_dns_set_status
withgood=True
, onlyFalse
. should we set state to good once we know that the hosts are back online (I'm guessing that would be afterif not wait_for_shutdown()
... or is it rather what's intended in the call towards the end of the function right beforereturn True
)?I agree that the zone rebuild should be split out and done only once before we wait for the shutdowns to happen, and then once more after setting state back to good.
if I understand correctly, those auto_dns state actions are really only required for a handful of hosts which are in a round-robin rotation for the same service as others. so maybe we should delay merging this until we have some way to check whether or not the state changes are required.
agreed.
one thing that I saw is that there's currently no call to
auto_dns_set_status
withgood=True
, onlyFalse
. should we set state to good once we know that the hosts are back online (I'm guessing that would be afterif not wait_for_shutdown()
... or is it rather what's intended in the call towards the end of the function right beforereturn True
)?there is one
auto_dns_set_status(primary_dns_con, con.host, True)
line there, but the last one is incorrect. that API sucks, we should just have good/bad directly in the function name.I agree that the zone rebuild should be split out and done only once before we wait for the shutdowns to happen, and then once more after setting state back to good.
yep.
what do you think of the duplication between
ganeti.stop_instances
andshutdown_and_wait
? part of the messiness here is related to that...(i mean the whole thing is a pile of ugly spaghetti code now, but that's a particilarly yucky copy-paste, imho).
hmm well the function
shutdown_and_wait
is pretty long and it has long if blocks and apparent code duplication in certain parts, so it's hard to understand what's happening with it.now is it this MR's fault? I don't think so.. maybe we should refactor the function at some point to make it easier to read and maintain. but I don't think this idea should prevent the addition of the auto_dns stuff.
fwiw fabric does have some convenience classes and functions to help out with writing tests: https://docs.fabfile.org/en/stable/api/testing.html
maybe we can start with creating tests for the
shutdown_and_wait()
function and then we'll be able to refactor it with more confidence
mentioned in issue team#41766 (closed)
i don't think this is relevant anymore. in TPA-RFC-67 (#41734), we simply retired mininag and the next step is probably to rewrite it with a prometheus backend going forward, see team#41766 (closed) in any case.
mentioned in issue team#41811 (closed)