Eventual inability to connect to a HS from a client that lists most countries in ExcludeNodes
My v3 hidden services become unavailable after a while to a client with lots of countries listed in ExcludeNodes (I can't recall what used to happen to v2 services).
The setup is as follows:
I have multiple HS on one server. They are copies of each other and identical in every respect except for their keys and hostnames.
I have a client connecting to these which has lots of countries listed in ExcludeNodes; in fact it excludes most countries in the world, but not to the extent of making tor unusable.
For a while after this is set up, I'm able to use all of these services from the client. But then one day one of them will fail to connect. There's a wait of many seconds while tor is busy, and ultimately the request fails. If I retry, it often takes just as long to fail again, but after a number of failures it starts to fail instantly.
Meanwhile, the other, similar services are still accessible to this client. Also a browser connected to this tor bundle can browse the web, etc.
I used to think there was something wrong with the service that failed, so I would redefine it and restart it. I did this until I found that the problem was the client. The client with lots of countries in ExcludeNodes reaches a point where it is unable to keep using this service. The only way to get this client working correctly again is to comment out the ExcludeNodes directive in torrc and restart the client (a HUP signal does not suffice). After reinstating ExcludeNodes and restarting the client once more, the hidden service is accessible again... until some unspecified future date.
I should add in case it's relevant that the client is accessing the tor network through bridges. These are good bridges and are in good condition.
I wonder if, with the passage of time, the descriptors database on the client side loses quality and becomes unable to support these operations. I believe ExcludeNodes itself is applied correctly at runtime, whether the database has been living with many excluded countries for a number of days or has just been refreshed. But if something is missing from the database, maybe after a few days the software can't run through all the permutations (to hit the ones that would allow it to connect). Maybe the problem is that the upkeep and refilling of this database at all other times (while tor was already loaded but I was not yet trying to access the hidden service), under ExcludeNodes conditions, doesn't refill the database properly... and this condition only becomes apparent later, at HS connection time. (Though I'm not a Tor expert, so forgive me if I'm not making sense.)
Anyhow, I can't think how I could check what's in the descriptors database. I've tried replacing it (and the state file) with the ones from another tor bundle installation that didn't have this problem, but I haven't been able to complete such a test yet. All I know is the difference between a successful connection and an unsuccessful one (which I obtained by issuing SETEVENTS INFO on a control connection). During a successful connection only about a hundred lines are logged and it connects. During an unsuccessful connection many more info messages are logged, like so:
650 INFO extend_info_from_node(): Not including the ed25519 ID for $(ID)~(NAME) at (IP), since it won't be able to authenticate it
intermixed with messages issued by origin_circuit_new() and rep_hist_note_used_internal() talking about seconds of predictive building remaining. And it fails to connect.
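To make the comparison between a successful and an unsuccessful attempt less anecdotal, here is a rough Python sketch of how the two captures could be tallied by the function that emitted each INFO line. It assumes each capture has been saved as a list of lines in the `650 INFO function_name(): message` form shown above; the helper names `count_by_function` and `compare_captures` are made up for this illustration and are not part of tor.

```python
import re
from collections import Counter

# Matches single-line INFO events as they appear on the control connection,
# e.g. "650 INFO extend_info_from_node(): Not including the ed25519 ID ..."
EVENT_RE = re.compile(r"^650 INFO (\w+)\(\): ")


def count_by_function(lines):
    """Count INFO events per emitting function; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def compare_captures(ok_lines, failed_lines):
    """Return (function, extra_count) pairs for functions that logged more
    often in the failed capture than in the successful one, largest first."""
    ok = count_by_function(ok_lines)
    failed = count_by_function(failed_lines)
    extra = {
        name: failed[name] - ok[name]
        for name in failed
        if failed[name] > ok[name]
    }
    return sorted(extra.items(), key=lambda item: (-item[1], item[0]))
```

Feeding it the lines from one good and one bad attempt should show at a glance whether extend_info_from_node() really dominates the failed case, or whether some other function accounts for most of the extra output.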
Perhaps there is a threshold of ExcludeNodes above which fulfilling requests starves the client database of information, degrades it, and leaves the client incapable of doing its work, and perhaps my own setting for this option has exceeded that threshold (I have excluded most countries in the world except my own and those that border it). What I can't understand, however, is why this condition never shows up when the client first starts using the service, and only becomes apparent several days later.