actively monitor resources like available storage space
As the two incidents of at least temporary loss or unavailability of descriptors were due to insufficient memory (cf. here), a timed task should check this (and possibly other parameters) at regular intervals, preferably in good time before the next run, and raise a red flag when a problem is visible.
Might this be useful? Or is storage only a current problem?
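
To make the proposal more concrete, here is a minimal sketch of such a check in Python. It only covers free storage space. The function name `check_free_space`, the threshold parameter and the warning text are hypothetical and not taken from any existing code; the only real library call is `shutil.disk_usage` from the standard library.

```python
import shutil


def check_free_space(path, min_free_bytes, usage_fn=shutil.disk_usage):
    """Return a warning string if free space on `path` is below
    `min_free_bytes`, otherwise return None.

    `usage_fn` is a hypothetical hook so the check can be exercised
    without touching a real file system; by default it is
    shutil.disk_usage, which returns (total, used, free) in bytes.
    """
    usage = usage_fn(path)
    if usage.free < min_free_bytes:
        return (
            f"LOW STORAGE on {path}: {usage.free} bytes free, "
            f"at least {min_free_bytes} required"
        )
    return None
```

A timed task (for example a cron job scheduled some time before each run) could call `check_free_space` with the data directory and a threshold, and send a notification whenever the result is not `None`. Which path, threshold and notification channel make sense here is an open question.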