"PredictedPortsRelevanceTime 0" causes stagnant/uncommunicative onion services, stale descriptors
I am running 72 tor daemons on a cluster of identical Raspberry Pi hardware, with the following spec:
Tor 0.2.9.7-rc (git-6b6ad81c) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1t and Zlib 1.2.8.
The goal is to experiment with Tor bandwidth via OnionBalance, so I have been tweaking the configuration: a cluster of N tor daemons doesn't really benefit from anything predictive or persistent.
In about 8% of cases (roughly 6 of the 72 daemons), the configuration (text in the footer) creates a daemon which, after the initial upload, appears never (or only very rarely; I am unsure which) to refresh its descriptor on the HSDirs.
This behaviour stops when "PredictedPortsRelevanceTime 0" is commented out.
Using a small custom Stem script, I query the age of each of the 72 daemons' descriptors: the vast majority are less than 2 hours old, but some (the afflicted daemons) are 10+ hours old.
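The original script is not included here. As a rough illustration only, a descriptor-age check in the same spirit might look like the minimal sketch below. It assumes a Stem release that provides `Controller.get_hidden_service_descriptor`, a tor client reachable through a control socket whose path you supply, and that Stem reports `published` as a naive UTC datetime; the function names `format_age_line` and `list_descriptor_ages` are hypothetical.

```python
"""Hypothetical sketch of a descriptor-age listing in the spirit of ls-hsdir.

Not the original tool: function names and the socket path are assumptions.
"""
from datetime import datetime, timezone


def utc_now():
    # Naive UTC datetime, comparable with Stem's naive UTC 'published' value.
    return datetime.now(timezone.utc).replace(tzinfo=None)


def format_age_line(address, version, published, now):
    """Render one result in the same shape as the sample output."""
    age = int((now - published).total_seconds())
    stamp = published.strftime("%Y-%m-%d %H:%M:%S")
    return "v=%d age=%d pub(%s) %s" % (version, age, stamp, address)


def list_descriptor_ages(addresses, control_socket):
    """Fetch each onion's descriptor through a tor client and print its age."""
    # Imported lazily so format_age_line() works without Stem installed.
    from stem.control import Controller

    with Controller.from_socket_file(control_socket) as controller:
        controller.authenticate()
        for address in addresses:
            # Supplying a default makes Stem return it instead of raising
            # when the descriptor cannot be fetched.
            desc = controller.get_hidden_service_descriptor(address, default=None)
            if desc is None:
                print("# unavailable %s" % address)
                continue
            print(format_age_line(address, desc.version, desc.published, utc_now()))
```

Hypothetical usage: `list_descriptor_ages(["2pnhm32wvh2g6bod", "2ss5hl24km3cnedb"], "/path/to/client/control.sock")`. Fetching through a separate client, rather than through each service's own daemon, keeps the check independent of the daemons under test.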
Sample output from my tool:
19:25:22 mistral:~ $ ls-hsdir `cat Dropbox/all-onions.txt`
v=2 age=5183 pub(2016-12-23 18:00:00) 2pnhm32wvh2g6bod
v=2 age=5183 pub(2016-12-23 18:00:00) 2ss5hl24km3cnedb
# unavailable 44kpqx3wj4pdj4x3
v=2 age=1583 pub(2016-12-23 19:00:00) 457vhfiipyfahsw2
v=2 age=5183 pub(2016-12-23 18:00:00) 4byeybc6yyqvxc64
v=2 age=12383 pub(2016-12-23 16:00:00) 4sj56yfqt6iimah2
v=2 age=5183 pub(2016-12-23 18:00:00) 57j6n5nsrvl2n3lm
# unavailable 5imawjwdy2332sk2
v=2 age=5183 pub(2016-12-23 18:00:00) 5k2ukr3gjxw4iuwo
v=2 age=12383 pub(2016-12-23 16:00:00) 6bdgdiyoqdaq65oh
v=2 age=1583 pub(2016-12-23 19:00:00) 6egxpvvszfzriamo
v=2 age=5183 pub(2016-12-23 18:00:00) 7rydmwifplyugjzg
v=2 age=5183 pub(2016-12-23 18:00:00) a7ls3tboibdtexpa
v=2 age=66383 pub(2016-12-23 01:00:00) apk2wb3qdwzovtdj
v=2 age=1583 pub(2016-12-23 19:00:00) av6plyhrd5j7enoo
v=2 age=1583 pub(2016-12-23 19:00:00) awocgbvyljq4nf2p
v=2 age=5183 pub(2016-12-23 18:00:00) ayzn2s76oh4eqw45
v=2 age=37583 pub(2016-12-23 09:00:00) b6rzknxn664juice
# unavailable bnuy3zlmrnvljylh
v=2 age=1583 pub(2016-12-23 19:00:00) btxtnep4ipsgiq6j
...
...
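To separate afflicted daemons from healthy ones in output like the above, a small helper could bin each line by age. This is a hypothetical sketch: the 2-hour and 10-hour thresholds simply mirror the observations in this report, so descriptors such as the age=12383 (about 3.4 hour) entries land in an "intermediate" bucket between the two.

```python
"""Hypothetical triage of ls-hsdir-style output lines by descriptor age."""
import re

# Thresholds in seconds, taken from the observations in this report:
# healthy descriptors are under 2 hours old, afflicted ones 10+ hours old.
FRESH_LIMIT = 2 * 3600
STALE_LIMIT = 10 * 3600

LINE_RE = re.compile(r"^v=(\d+) age=(\d+) pub\((.+)\) (\S+)$")
UNAVAILABLE_RE = re.compile(r"^# unavailable (\S+)$")


def classify(age):
    """Map an age in seconds (or None for unavailable) to a bucket name."""
    if age is None:
        return "unavailable"
    if age < FRESH_LIMIT:
        return "fresh"
    if age < STALE_LIMIT:
        return "intermediate"
    return "stale"


def triage(lines):
    """Return {bucket: [onion address, ...]} for the given output lines."""
    buckets = {}
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match:
            address, age = match.group(4), int(match.group(2))
        else:
            match = UNAVAILABLE_RE.match(line)
            if not match:
                continue  # skip prompts, elisions and anything unrecognised
            address, age = match.group(1), None
        buckets.setdefault(classify(age), []).append(address)
    return buckets
```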
Despite some having such old descriptors, the daemons are all still reachable some 21 hours after launch.
I shall be taking these (cited) daemons down, but can recreate them pretty easily.
Purely speculatively, this sounds vaguely similar to the Ricochet issue that arma reported: https://github.com/ricochet-im/ricochet/issues/245
I have two debug logs from the same physical machine, one from a 'good' daemon and one from a 'stale' daemon, running concurrently. The 'good' log is 35 MB versus 27 MB for the 'stale' one, but comparison with other logs does not suggest a strong correlation between staleness and logfile size.
Are the files too large to attach? Even compressed, they will be several MB.
Running carml against a stale daemon to watch HS_DESC activity showed little of note; surprisingly little, even.
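A check equivalent to the carml run could also be scripted with Stem's event API and left running on a stale daemon for several hours, tallying HS_DESC actions as it goes. This is a minimal sketch, assuming Stem is installed and that the daemon's control socket (for example the `control.sock` named in the config) needs no authentication beyond `authenticate()`; `watch_hs_desc` and `summarize` are hypothetical names.

```python
"""Hypothetical HS_DESC event tally for a single tor daemon."""
import time
from collections import Counter


def summarize(actions):
    """Count HS_DESC action keywords (e.g. 'UPLOAD', 'UPLOADED', 'FAILED')."""
    return Counter(actions)


def watch_hs_desc(control_socket, duration_seconds):
    """Record HS_DESC events for a while, then return a tally of actions."""
    # Imported lazily so summarize() can be used without Stem installed.
    from stem.control import Controller, EventType

    seen = []

    def on_event(event):
        # event.action is the HS_DESC action keyword reported by tor.
        seen.append(event.action)
        print("%s %s %s" % (event.action, event.address, event.directory))

    with Controller.from_socket_file(control_socket) as controller:
        controller.authenticate()
        controller.add_event_listener(on_event, EventType.HS_DESC)
        time.sleep(duration_seconds)
    return summarize(seen)
```

On a healthy service one would expect UPLOAD and UPLOADED actions to recur each time the descriptor is republished; a stale daemon would presumably show few or none over the same window.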
I'm stuck for ideas, but I am aware that a very large site uses this option in its 0.2.7 config, so it would be good to know whether it is needed and/or helpful for SingleOnions in 0.2.9, and whether this behaviour needs a bugfix.
19:28:24 rig2:hs2.d $ more config
DataDirectory /home/alecm/master/halfagig/hs2.d
HiddenServiceDir /home/alecm/master/halfagig/hs2.d
ControlPort unix:/home/alecm/master/halfagig/hs2.d/control.sock
SocksPort 0
Log debug file /home/alecm/master/halfagig/hs2.d/log.txt
SafeLogging 0
HeartbeatPeriod 60 minutes
# HiddenServicePort 19 localhost:8502
# HiddenServicePort 22 localhost:22
HiddenServicePort 80 localhost:10502
HiddenServiceNumIntroductionPoints 3
LongLivedPorts 19,22,80
#
# CircuitBuildTimeout 60
# LearnCircuitBuildTimeout 0
PredictedPortsRelevanceTime 0
# UseEntryGuards 0
# UseEntryGuardsAsDirGuards 0