"PredictedPortsRelevanceTime 0" causes stagnant/uncommunicative onion services, stale descriptors
I am running 72 tor daemons with the following spec:
Tor 0.2.9.7-rc (git-6b6ad81c) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1t and Zlib 1.2.8.
...on a cluster of identical Raspberry Pi hardware.
The goal is to experiment with Tor bandwidth via OnionBalance, so I have been tweaking configurations because a cluster of N tor daemons doesn't really benefit from predictive persistent anything.
In about 8% of cases, the configuration (text in the footer) creates a daemon which, after the initial upload, appears never (or only very rarely; I am unsure) to refresh its descriptors in an HSDir.
This behaviour stops when "PredictedPortsRelevanceTime 0" is commented out.
Using a small custom Stem script, I query the age of each of the 72 daemons' descriptors; the vast majority are less than 2 hours old, but some (the afflicted daemons) are 10+ hours old.
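For anyone wanting to reproduce the measurement, here is a minimal sketch of how such a query might look. It is not the actual script used for this report: the function names, the control socket argument and the choice to fetch through a separate tor client are all assumptions. It relies on Stem's `Controller.get_hidden_service_descriptor()` and the `published` and `version` fields of the returned v2 descriptor.

```python
from datetime import datetime, timezone


def descriptor_age_line(address, version, published, now):
    """Format one report line: version, age in seconds, publication time, address."""
    age = int((now - published).total_seconds())
    stamp = published.strftime("%Y-%m-%d %H:%M:%S")
    return "v=%d age=%d pub(%s) %s" % (version, age, stamp, address)


def report(control_socket, addresses):
    """Print one line per onion address (hypothetical helper, not the original tool).

    control_socket is assumed to be the control socket of a tor client
    that is able to fetch hidden service descriptors.
    """
    # Imported here so the formatting helper above works without Stem installed.
    from stem.control import Controller

    with Controller.from_socket_file(control_socket) as controller:
        controller.authenticate()
        for address in addresses:
            # default=None makes Stem return None instead of raising on failure.
            desc = controller.get_hidden_service_descriptor(address, default=None)
            if desc is None:
                print("# unavailable %s" % address)
                continue
            # Stem parses "published" as a naive UTC datetime, so compare in UTC.
            now = datetime.now(timezone.utc).replace(tzinfo=None)
            print(descriptor_age_line(address, desc.version, desc.published, now))
```

Calling `report()` with a control socket path and a list of onion addresses (without the `.onion` suffix) should print lines in the same shape as the sample output in this report.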
Sample output from my tool:
19:25:22 mistral:~ $ ls-hsdir `cat Dropbox/all-onions.txt`
v=2 age=5183 pub(2016-12-23 18:00:00) 2pnhm32wvh2g6bod
v=2 age=5183 pub(2016-12-23 18:00:00) 2ss5hl24km3cnedb
# unavailable 44kpqx3wj4pdj4x3
v=2 age=1583 pub(2016-12-23 19:00:00) 457vhfiipyfahsw2
v=2 age=5183 pub(2016-12-23 18:00:00) 4byeybc6yyqvxc64
v=2 age=12383 pub(2016-12-23 16:00:00) 4sj56yfqt6iimah2
v=2 age=5183 pub(2016-12-23 18:00:00) 57j6n5nsrvl2n3lm
# unavailable 5imawjwdy2332sk2
v=2 age=5183 pub(2016-12-23 18:00:00) 5k2ukr3gjxw4iuwo
v=2 age=12383 pub(2016-12-23 16:00:00) 6bdgdiyoqdaq65oh
v=2 age=1583 pub(2016-12-23 19:00:00) 6egxpvvszfzriamo
v=2 age=5183 pub(2016-12-23 18:00:00) 7rydmwifplyugjzg
v=2 age=5183 pub(2016-12-23 18:00:00) a7ls3tboibdtexpa
v=2 age=66383 pub(2016-12-23 01:00:00) apk2wb3qdwzovtdj
v=2 age=1583 pub(2016-12-23 19:00:00) av6plyhrd5j7enoo
v=2 age=1583 pub(2016-12-23 19:00:00) awocgbvyljq4nf2p
v=2 age=5183 pub(2016-12-23 18:00:00) ayzn2s76oh4eqw45
v=2 age=37583 pub(2016-12-23 09:00:00) b6rzknxn664juice
# unavailable bnuy3zlmrnvljylh
v=2 age=1583 pub(2016-12-23 19:00:00) btxtnep4ipsgiq6j
...
...
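To pick the afflicted daemons out of a listing like this one, a small filter along the following lines may help. It is only a sketch: the function name is made up, and the 2 hour cut-off is an assumption based on the observation that healthy descriptors are less than 2 hours old.

```python
import re

# Matches lines of the form: v=2 age=5183 pub(2016-12-23 18:00:00) <onion>
LINE_RE = re.compile(r"^v=(\d+) age=(\d+) pub\(([^)]*)\) (\S+)$")


def stale_onions(lines, max_age_seconds=2 * 3600):
    """Return (onion, age) pairs whose descriptor age exceeds the cut-off.

    The default cut-off of 2 hours is an assumption, not a Tor constant.
    """
    stale = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Skips "# unavailable" entries, shell prompts and "..." lines.
            continue
        age = int(match.group(2))
        if age > max_age_seconds:
            stale.append((match.group(4), age))
    return stale
```

Feeding it the lines of the listing gives the onion addresses worth pulling debug logs for.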
The daemons, despite some having such old descriptors, are all still reachable some 21 hours after launch.
I shall be taking these (cited) daemons down, but can recreate them pretty easily.
Purely speculatively, it does sound vaguely similar to this Ricochet issue which arma reported to Ricochet: https://github.com/ricochet-im/ricochet/issues/245
I have two 'debug' logs from the same physical machine, one from a 'good' daemon and the other from a 'stale' daemon, running concurrently. The 'good' log is 35MB versus 27MB for the 'stale' one, but comparison with other logs does not suggest a strong correlation between staleness and logfile size.
The files are presumably too large to attach? Even after compression they will be several MB.
Running carml on a stale daemon for HS_DESC activity showed little of note. Surprisingly little, even.
I'm stuck for ideas, but am aware that a very large site uses this option in its 0.2.7 config, so it would be good to know whether it is needed and/or helpful for SingleOnions in 0.2.9, and also to get it bugfixed.
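As a cross-check on the carml observation, the same HS_DESC events can be watched through Stem. The following is a rough sketch under assumptions: the function names and the one hour default are made up, and it simply counts events by action (for example UPLOAD and UPLOADED) on one daemon's control socket.

```python
import time
from collections import Counter


def count_actions(events):
    """Count HS_DESC events by action, given (action, address) tuples."""
    return dict(Counter(action for action, _address in events))


def watch_hs_desc(control_socket, seconds=3600):
    """Collect HS_DESC events for a while and summarise them (hypothetical helper)."""
    # Imported here so count_actions() can be used without Stem installed.
    from stem.control import Controller, EventType

    seen = []
    with Controller.from_socket_file(control_socket) as controller:
        controller.authenticate()
        # The listener is called from Stem's event thread for each HS_DESC event.
        controller.add_event_listener(
            lambda event: seen.append((event.action, event.address)),
            EventType.HS_DESC,
        )
        time.sleep(seconds)
    return count_actions(seen)
```

Running this side by side on a 'good' and a 'stale' daemon for the same period would show whether the stale one attempts any uploads at all.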
19:28:24 rig2:hs2.d $ more config
DataDirectory /home/alecm/master/halfagig/hs2.d
HiddenServiceDir /home/alecm/master/halfagig/hs2.d
ControlPort unix:/home/alecm/master/halfagig/hs2.d/control.sock
SocksPort 0
Log debug file /home/alecm/master/halfagig/hs2.d/log.txt
SafeLogging 0
HeartbeatPeriod 60 minutes
# HiddenServicePort 19 localhost:8502
# HiddenServicePort 22 localhost:22
HiddenServicePort 80 localhost:10502
HiddenServiceNumIntroductionPoints 3
LongLivedPorts 19,22,80
#
# CircuitBuildTimeout 60
# LearnCircuitBuildTimeout 0
PredictedPortsRelevanceTime 0
# UseEntryGuards 0
# UseEntryGuardsAsDirGuards 0