Ticket legacy/trac#20853 (moved) introduced the issue with commit 63d3ba96f973735ded16e78bd0b8406b6fcdec35 (tor-0.3.0.1-alpha) with:
+ if (BUG(rend_service_is_ephemeral(new)) ||+ BUG(rend_service_is_ephemeral(old))) {+ continue;
The new service list does not ONLY contains ephemeral service so if you have one regular service and then you add an ephemeral one, a config reload will trigger BUG(rend_service_is_ephemeral(new) because that new object is from the global list containing all types of service.
Furthermore, this whole loop that is suppose to copy the intro points from the current service to the newly configured one is broken with that commit.
I'm working on a refactoring to first fix this bug then extract this large loop into a function and improve few things along the way with unit tests. It is also an important piece of work for legacy/trac#20657 (moved) (prop224 service) because we'll need it for the v3 service list.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
Both fixes the bug and refactor with unit test that part of the code. It is also an important piece for prop224 to have that service pruning code on reload in a separate function.
Changes look reasonable to me. One question, why does circuit_get_next_service_intro_circ() need the start argument?
In the old code, we just iterated from the beginning of the circuitlist everytime. Why do we now begin from start? Is it an optimization?
Also, BTW I didn't manage to reproduce the original bug. How do we do it? I started up a Tor with a normal HS, then I added an ADD_ONION, and then I HUPed. Am I missing a step?
Changes look reasonable to me. One question, why does circuit_get_next_service_intro_circ() need the start argument?
In the old code, we just iterated from the beginning of the circuitlist everytime. Why do we now begin from start? Is it an optimization?
Yes. It is also to basically follow the circuit_get_next_by_* API that uses that technique. Without it, we can't return a circuit from the middle of the list matching our requirements and then continuing the loop at that circuit. Else, the other options is for the function to return a list of matching circuits. No strong preferences here, I would be fine with either.
Also, BTW I didn't manage to reproduce the original bug. How do we do it? I started up a Tor with a normal HS, then I added an ADD_ONION, and then I HUPed. Am I missing a step?
Yes, normal service then you do ADD_ONION NEW:BEST Port=8161 and then HUP or SIGNAL RELOAD, you should hit the BUG. Btw, it won't be on console if you setup logging and should look like this (I just reproduced it on master):
Jan 09 09:23:44.000 [warn] tor_bug_occurred_(): Bug: src/or/rendservice.c:839: rend_config_services: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed. (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)Jan 09 09:23:44.000 [warn] Bug: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed in rend_config_services at src/or/rendservice.c:839. Stack trace: (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)[...]
Changes look reasonable to me. One question, why does circuit_get_next_service_intro_circ() need the start argument?
In the old code, we just iterated from the beginning of the circuitlist everytime. Why do we now begin from start? Is it an optimization?
Yes. It is also to basically follow the circuit_get_next_by_* API that uses that technique. Without it, we can't return a circuit from the middle of the list matching our requirements and then continuing the loop at that circuit. Else, the other options is for the function to return a list of matching circuits. No strong preferences here, I would be fine with either.
I think the way you have it is fine!
Also, BTW I didn't manage to reproduce the original bug. How do we do it? I started up a Tor with a normal HS, then I added an ADD_ONION, and then I HUPed. Am I missing a step?
Yes, normal service then you do ADD_ONION NEW:BEST Port=8161 and then HUP or SIGNAL RELOAD, you should hit the BUG. Btw, it won't be on console if you setup logging and should look like this (I just reproduced it on master):
Jan 09 09:23:44.000 [warn] tor_bug_occurred_(): Bug: src/or/rendservice.c:839: rend_config_services: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed. (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)Jan 09 09:23:44.000 [warn] Bug: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed in rend_config_services at src/or/rendservice.c:839. Stack trace: (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)[...]}}}
Hmm, I'm still unable to repro this with carml. I tried with unpatched df87812 and Tor will just successfuly HUP with no issues. Here is my torrc:
{{{
SocksPort auto
ControlPort auto
Jan 12 13:16:53.000 [notice] Tor 0.3.0.1-alpha-dev (git-df87812b) opening log file.
Jan 12 13:16:57.000 [notice] New control connection opened from 127.0.0.1.
Jan 12 13:16:58.000 [notice] New control connection opened from 127.0.0.1.
Jan 12 13:16:58.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Jan 12 13:16:58.000 [notice] Read configuration file "/home/f/Computers/tor/mytor/../confs/bug21054".
Jan 12 13:16:58.000 [warn] CookieAuthFileGroupReadable is set, but will have no effect: you must specify an explicit CookieAuthFile to have it group-readable.
(For the record, I think that the circuit_get_next_by_() functions are a bit error-prone. And I think we should probably get rid of them someday if we can, maybe replacing them with some circuit_get_all_by_() variants that fill a smartlist. But that's sure another issue.)