Overzealous descriptor regeneration bug remains

changed milestone to %Tor: 0.2.2.x-final in legacy/trac

added component::core tor/tor in Legacy / Trac milestone::Tor: 0.2.2.x-final in Legacy / Trac priority::high in Legacy / Trac resolution::fixed in Legacy / Trac status::closed in Legacy / Trac tor-relay in Legacy / Trac type::defect in Legacy / Trac labels

Once you're running the legacy/trac#1810 (moved) patch, you should be seeing info-severity log lines like:

May 30 16:59:32.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
May 30 17:05:20.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
May 30 17:05:22.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: config change
May 30 17:08:18.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: ORPort found reachable
May 30 17:08:19.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: DirPort found reachable
May 30 17:39:09.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: rotated onion key
May 31 09:12:59.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
Jun 01 03:13:26.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: time for new descriptor

(that one is from moria1)

What does your set of log lines look like, ideally including a period where publication was working and a period where it fell out of the consensus?
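
If it helps with collecting these, here is a rough Python sketch that tallies the reasons from info-level log lines like the ones quoted above. The regular expression is an assumption based only on those sample lines, and the names tally_reasons and sample are made up for illustration; they are not part of Tor.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines in this ticket (an assumption,
# not taken from the Tor source).
PATTERN = re.compile(
    r"mark_my_descriptor_dirty\(\): "
    r"Decided to publish new relay descriptor: (?P<reason>.+)$"
)


def tally_reasons(lines):
    """Count how often each publication reason appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line.rstrip())
        if match:
            counts[match.group("reason")] += 1
    return counts


# A few of the lines quoted above, used as sample input.
sample = [
    "May 30 16:59:32.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key",
    "May 30 17:05:20.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key",
    "May 30 17:05:22.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: config change",
    "Jun 01 03:13:26.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: time for new descriptor",
]

for reason, count in tally_reasons(sample).most_common():
    print(count, reason)
```

To run it over a real log, pass an open file object instead of sample; lines that do not match the pattern are ignored.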

Adding Falo to CC per request by rransom

Trac:
Cc: N/A to Falo

The 1810 patch has been running since yesterday on blutmagie. No "Decided to publish new relay descriptor" logging has occurred so far. I'll keep you posted...

Trac:
Username: Falo

This might be related to karsten's trouble. We should see if we have any indication of this on non-dirauths.

Moving this to 0.2.2.x-final, since it seems to look important. We can kick it out again if it isn't.

Trac:
Milestone: Tor: unspecified to Tor: 0.2.2.x-final

Right now, at 22:58 UTC+2, all four blutmagie routers are flagged Running on the Tor Project's consensus-health web site, whereas only blutmagie3 and blutmagie4 are flagged Running on my Tor node dedicated to feeding the tns site. blutmagie and blutmagie2 are lacking the Running flag. There's no indication that these routers decided to publish a new relay descriptor recently.

torstatus:~/tmp# telnet localhost 9051
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
authenticate "XXX-censored-XXX"
250 OK
getinfo ns/name/blutmagie
250+ns/name/blutmagie=
r blutmagie YpexOmh7UhpZxr15GIolAewDoGU bbf9bJUi0nIosJVoE4M52OI5sZk 2011-06-05 !1 (closed):15:17 192.251.226.206 443 80
s Exit Fast Guard HSDir Named Stable V2Dir Valid
w Bandwidth=47100
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie2
250+ns/name/blutmagie2=
r blutmagie2 Z+yEN22cTEZ9zoYhqsoQkWC1Jk4 m49nalEUoKBjRfHcc4ScLq2bXdM 2011-06-05 !1 (closed):15:18 192.251.226.206 8080 707
s Exit Fast Guard HSDir Named Stable V2Dir Valid
w Bandwidth=65500
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie3
250+ns/name/blutmagie3=
r blutmagie3 ZsqH4WTxz86MO7XAlSF6KFeLi68 +ZmFcRXlziVExGfe2xB/gMeOtr4 2011-06-05 !1 (closed):15:18 192.251.226.205 443 80
s Exit Fast Guard HSDir Named Running Stable V2Dir Valid
w Bandwidth=36400
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie4
250+ns/name/blutmagie4=
r blutmagie4 e2mNMn8WlVkECP7ZXN7hVld00TY v9Ah1Xy8dmuIigYgqaBkU8MINR0 2011-06-05 !1 (closed):15:19 192.251.226.205 22 21
s Exit Fast Guard HSDir Named Running Stable V2Dir Valid
w Bandwidth=40400
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
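
For anyone who wants to repeat this check without reading the flags by eye, the following is a minimal Python sketch that pulls the nickname and the flag list out of a router status entry like the ones returned above. It only parses text that has already been fetched; talking to the control port is left out. The function names parse_ns_entry and is_running are hypothetical, and the sample "r" lines are shortened to the nickname for brevity.

```python
def parse_ns_entry(text):
    """Return (nickname, flags) from a router status entry.

    The nickname is the second token of the "r" line and the flags are
    the tokens following "s" on the "s" line.
    """
    nickname = None
    flags = set()
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) > 1:
            nickname = parts[1]
        elif parts[0] == "s":
            flags = set(parts[1:])
    return nickname, flags


def is_running(text):
    """True if the entry's flag line includes Running."""
    return "Running" in parse_ns_entry(text)[1]


# Flag lines copied from the session above; the "r" lines are shortened
# here (identity, digest, timestamp and address fields omitted).
entry_blutmagie = (
    "r blutmagie\n"
    "s Exit Fast Guard HSDir Named Stable V2Dir Valid\n"
    "w Bandwidth=47100\n"
)
entry_blutmagie3 = (
    "r blutmagie3\n"
    "s Exit Fast Guard HSDir Named Running Stable V2Dir Valid\n"
    "w Bandwidth=36400\n"
)

for entry in (entry_blutmagie, entry_blutmagie3):
    name, _ = parse_ns_entry(entry)
    print(name, "Running" if is_running(entry) else "not Running")
```

Comparing the result for the same relay across two Tor instances is one way to spot the disagreement described here.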

Trac:
Username: Falo

I upgraded the Tor instance feeding torstatus.blutmagie.de from tor-0.2.2.24-alpha to tor-0.2.3.1-alpha. Let's see if this changes something. Today tor-0.2.2.24-alpha still reported individual routers as not running, even though they were flagged "Running" in the Tor Metrics Portal's Consensus Health.

Trac:
Username: Falo

0.2.3.1 also has the bug. The recently released 0.2.2.28-beta should not have the most common case.

Oh wait, I misunderstood your comment. When upgrading to 0.2.3.2 or 0.2.2.26 or later, remember to add the FetchV2Networkstatus config option.

Any news here?

moria1's output looks great, but I would expect it to.

After upgrading my Tor network status box from 0.2.2.24-alpha to 0.2.3.1-alpha two weeks ago I never saw my routers missing the running flag again. Thus the issue seems to be solved. It looks like it was not a problem with the routers blutmagie1-4 running 0.2.3.1-alpha.

Please close this ticket.

Trac:
Username: Falo

Well, isn't that fun. The legacy/trac#1810 (moved) fix is not in 0.2.3.1, so we still have some bug left in 0.2.2.x that probably got miraculously resolved in 0.2.3.1.

rransom suggests that a legacy/trac#535 (moved) solution is needed here.

See branch bug3327 in my public repo. If we like it upon review, I propose that we test it out first in 0.2.3.x, and merge it into 0.2.2.x only when it's had some testing in the wild.

Trac:
Status: new to needs_review

bug3327 looks reasonable. I'd suggest making it a 'major' feature, and changing 'our' to 'their' in the changes file. And maybe s/Routers/Relays/ while we're at it.

Note that we're going to see more publication attempts, and thus more descriptors in the wild, in some cases. The first case that comes to mind is a relay that thinks it's reachable but a quorum of directory authorities can't reach it. I expect we'll see 12 times as many descriptors for those relays. Clients won't see them, so it's not so bad, but karsten's metrics datasets will bloat. I wonder if relays that cache v2 data will fetch them too, if tor26 thinks they're reachable.

It sure will be tricky to figure out if this patch is working right, since it only kicks in for the case that we don't think exists much right now. I wonder if we might tell the directory authority our reason for generating the new descriptor, e.g. as an http header when we post? That would let us keep a better eye on whether (and for whom) this patch is seeing action.

I agree that we'll be happier putting this feature into 0.2.3.

Uploaded a new bug3327 branch with the changes you suggested. What do you think now?

Okay, I've cleaned it up, squashed it, tested it a little, and merged it.

Trac:
Resolution: N/A to fixed
Status: needs_review to closed

Trac:
Keywords: N/A deleted, tor-relay added
