Long-running tor instances fail to keep up-to-date directory information
We have a small number of long-running tor instances as part of our OnionPerf setups that are running 24/7. In the past, some of these tor instances got into a state where their directory information was no longer up-to-date enough to build circuits. In some cases they recovered after hours, days, or even weeks, but in other cases we had to restart the tor processes.
I'm attaching a graph that shows the number of open circuits as reported in heartbeat log messages. That number is relatively stable most of the time, depending on whether we're using the tor instance for making requests or for providing an onion service. But in some cases the number drops to zero, which coincides with the log message:
[notice] Our directory information is no longer up-to-date enough to build circuits: [...]
The graph also shows that sometimes the number magically goes up again. Those times coincide with the following log message:
[notice] We now have enough directory information to build circuits.
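In case it helps anyone reproduce the correlation from their own logs, here is a rough sketch of how the heartbeat counts and the two directory messages could be pulled out of a tor log. It is not the script used for the attached graph. It assumes each log line starts with a timestamp followed by " [severity]", and that heartbeat lines contain "with N circuits open"; the function names are made up for illustration.

```python
import re

# Assumed heartbeat wording: "Heartbeat: ... with N circuits open. ..."
HEARTBEAT_RE = re.compile(r"Heartbeat: .*?with (\d+) circuits? open")

# Substrings of the two notice-level messages quoted in this ticket.
LOST_MARKER = "Our directory information is no longer up-to-date enough to build circuits"
REGAINED_MARKER = "We now have enough directory information to build circuits"


def scan_log(lines):
    """Return (timestamp, kind, circuits) tuples for the lines of interest.

    kind is "heartbeat", "dirinfo_lost", or "dirinfo_regained";
    circuits is an int for heartbeats and None otherwise.
    """
    events = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: everything before the first " [" is the timestamp.
        timestamp = line.split(" [", 1)[0]
        match = HEARTBEAT_RE.search(line)
        if match:
            events.append((timestamp, "heartbeat", int(match.group(1))))
        elif LOST_MARKER in line:
            events.append((timestamp, "dirinfo_lost", None))
        elif REGAINED_MARKER in line:
            events.append((timestamp, "dirinfo_regained", None))
    return events


def outage_windows(events):
    """Pair each "lost" event with the next "regained" event.

    Returns (start, end) timestamp pairs; end is None if the log ends
    (for example because of a restart) before directory info came back.
    """
    windows = []
    start = None
    for timestamp, kind, _ in events:
        if kind == "dirinfo_lost" and start is None:
            start = timestamp
        elif kind == "dirinfo_regained" and start is not None:
            windows.append((start, timestamp))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows
```

Passing an open log file to scan_log() and the result to outage_windows() gives the periods during which the instance could not build circuits, which can then be compared against the heartbeat counts.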
The purple dashed lines show when we restarted tor processes manually. Some of these restarts are unrelated to the number of open circuits. But some restarts happened explicitly because the tor instance was not working anymore for our measurements.
By the way, the op-nl instance shown in the middle was running 0.2.9.11-dev, whereas the op-us and op-hk instances were running 0.3.0.7-dev. It may be a coincidence, but the older op-nl did not run out of up-to-date directory information, whereas the newer op-us and op-hk did. Was this issue perhaps introduced in 0.3.0.x?
I have tor logs available for all these tor instances. I can easily provide them, either as a big tarball or for specific days and instances as a smaller tarball. Just let me know.