Once you're running the legacy/trac#1810 patch, you should be seeing info-severity log lines like:
May 30 16:59:32.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
May 30 17:05:20.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
May 30 17:05:22.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: config change
May 30 17:08:18.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: ORPort found reachable
May 30 17:08:19.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: DirPort found reachable
May 30 17:39:09.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: rotated onion key
May 31 09:12:59.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
Jun 01 03:13:26.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: time for new descriptor
(that one is from moria1)
What does your set of log lines look like, ideally including a period where publication was working and a period where it fell out of the consensus?
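For anyone collecting these lines from a long info-level log, here is a minimal Python sketch that pulls out the publication events and tallies the reasons. The regular expression is inferred only from the sample lines quoted above, not from Tor's source, and the names `publish_events` and `reason_counts` are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines in this ticket (an assumption,
# not taken from Tor's source): "<Mon DD HH:MM:SS.mmm> [info] mark_my_..."
PUBLISH_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[info\] "
    r"mark_my_descriptor_dirty\(\): Decided to publish new relay descriptor: "
    r"(?P<reason>.+)$"
)


def publish_events(lines):
    """Yield (timestamp, reason) for each descriptor-publication log line."""
    for line in lines:
        match = PUBLISH_RE.match(line.strip())
        if match:
            yield match.group("stamp"), match.group("reason")


def reason_counts(lines):
    """Count how often each publication reason appears."""
    return Counter(reason for _, reason in publish_events(lines))
```

Feeding it an open log file (for example `reason_counts(open(path))`, where `path` is wherever your info log goes) gives a quick view of which reasons fired, and the timestamps from `publish_events` can be lined up against the period where the relay dropped out of the consensus.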
Right now, at 22:58 UTC+2, all four blutmagie routers are flagged Running on the Tor Project's consensus-health web site, whereas only blutmagie3 and blutmagie4 are flagged Running on my Tor node dedicated to feeding the tns site. The routers blutmagie and blutmagie2 lack the Running flag. There's no indication that these routers decided to publish a new relay descriptor recently.
torstatus:~/tmp# telnet localhost 9051
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
authenticate "XXX-censored-XXX"
250 OK
getinfo ns/name/blutmagie
250+ns/name/blutmagie=
r blutmagie YpexOmh7UhpZxr15GIolAewDoGU bbf9bJUi0nIosJVoE4M52OI5sZk 2011-06-05 !1 (closed):15:17 192.251.226.206 443 80
s Exit Fast Guard HSDir Named Stable V2Dir Valid
w Bandwidth=47100
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie2
250+ns/name/blutmagie2=
r blutmagie2 Z+yEN22cTEZ9zoYhqsoQkWC1Jk4 m49nalEUoKBjRfHcc4ScLq2bXdM 2011-06-05 !1 (closed):15:18 192.251.226.206 8080 707
s Exit Fast Guard HSDir Named Stable V2Dir Valid
w Bandwidth=65500
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie3
250+ns/name/blutmagie3=
r blutmagie3 ZsqH4WTxz86MO7XAlSF6KFeLi68 +ZmFcRXlziVExGfe2xB/gMeOtr4 2011-06-05 !1 (closed):15:18 192.251.226.205 443 80
s Exit Fast Guard HSDir Named Running Stable V2Dir Valid
w Bandwidth=36400
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie4
250+ns/name/blutmagie4=
r blutmagie4 e2mNMn8WlVkECP7ZXN7hVld00TY v9Ah1Xy8dmuIigYgqaBkU8MINR0 2011-06-05 !1 (closed):15:19 192.251.226.205 22 21
s Exit Fast Guard HSDir Named Running Stable V2Dir Valid
w Bandwidth=40400
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
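Checking the flags by eye across four replies is error-prone, so here is a small Python sketch that parses `getinfo ns/name/<nickname>` reply text like the session above and lists the relays missing the Running flag. It only handles the `r`, `s` and `w` lines shown in this ticket; the function names are invented for the example, and it works on text you have already captured rather than talking to the control port itself.

```python
def parse_ns_entry(reply):
    """Parse one captured ns/name/<nick> reply into nickname, flags, bandwidth."""
    entry = {"nickname": None, "flags": set(), "bandwidth": None}
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("r "):
            # Second field of the "r" line is the relay nickname.
            entry["nickname"] = line.split()[1]
        elif line.startswith("s "):
            # Everything after "s" is a status flag.
            entry["flags"] = set(line.split()[1:])
        elif line.startswith("w "):
            for item in line.split()[1:]:
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    entry["bandwidth"] = int(value)
    return entry


def missing_running(replies):
    """Return nicknames of relays whose status line lacks the Running flag."""
    entries = (parse_ns_entry(reply) for reply in replies)
    return [e["nickname"] for e in entries if "Running" not in e["flags"]]
```

Run against the four replies above, `missing_running` should name blutmagie and blutmagie2, matching what the tns site shows.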
I upgraded the Tor instance feeding torstatus.blutmagie.de from tor-0.2.2.24-alpha to tor-0.2.3.1-alpha. Let's see if this changes something. As of today, tor-0.2.2.24-alpha still reported individual routers as not running even though they were flagged "running" in the Tor Metrics Portal's Consensus Health.
After upgrading my Tor network status box from 0.2.2.24-alpha to 0.2.3.1-alpha two weeks ago, I never saw my routers missing the Running flag again. Thus the issue seems to be solved. It looks like it was not a problem with the routers blutmagie1-4 running 0.2.3.1-alpha.
Well, isn't that fun. The legacy/trac#1810 fix is not in 0.2.3.1, so we still have some bug left in 0.2.2.x that probably got miraculously resolved in 0.2.3.1.
See branch bug3327 in my public repo. If we like it upon review, I propose that we test it out first in 0.2.3.x, and merge it into 0.2.2.x only when it's had some testing in the wild.
bug3327 looks reasonable. I'd suggest making it a 'major' feature, and changing 'our' to 'their' in the changes file. And maybe s/Routers/Relays/ while we're at it.
Note that we're going to see more publication attempts, and thus more descriptors in the wild, in some cases. The first case that comes to mind is a relay that thinks it's reachable but a quorum of directory authorities can't reach it. I expect we'll see 12 times as many descriptors for those relays. Clients won't see them, so it's not so bad, but karsten's metrics datasets will bloat. I wonder if relays that cache v2 data will fetch them too, if tor26 thinks they're reachable.
It sure will be tricky to figure out if this patch is working right, since it only kicks in for the case that we don't think exists much right now. I wonder if we might tell the directory authority our reason for generating the new descriptor, e.g. as an http header when we post? That would let us keep a better eye on whether (and for whom) this patch is seeing action.
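To make the header idea concrete, here is a rough Python sketch of both ends: a relay attaching its reason to the upload, and an authority reading it back out. Everything specific here is an assumption for illustration only: the header name `X-Desc-Gen-Reason`, the `/tor/` upload path, and both function names. It is not a description of what Tor does or what the bug3327 branch contains.

```python
# Hypothetical header name, chosen only for this sketch.
REASON_HEADER = "X-Desc-Gen-Reason"


def build_upload_request(descriptor, reason, path="/tor/"):
    """Build a raw HTTP POST carrying the descriptor and the reason header.

    The default path is an assumption, not taken from this ticket.
    """
    body = descriptor.encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"{REASON_HEADER}: {reason}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def reason_from_request(raw):
    """Return the reason header value from a raw request, or None if absent."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    # Skip the request line, then scan the header lines.
    for line in head.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == REASON_HEADER.lower():
            return value.strip()
    return None
```

An authority could then log or count the returned reason per relay, which would show directly how often (and for whom) the new code path fires. Old relays that send no such header simply yield `None`.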
I agree that we'll be happier putting this feature into 0.2.3.