Now that we expire 20% of our slowest circuits, there is a chance that clients may pick a rend point that
hidden services are unable to reach in 3 tries within their circuit build timeout value. This will cause the
client connection to fail.
We should look at this code and see if we can make it more resilient to timeouts, or have it back off on the
timeout value after N tries instead of giving up entirely on the connection.
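A minimal sketch of the second option (backing off on the timeout value after N tries), assuming a doubling schedule with a cap. The function name, the value of N, and the cap are hypothetical; Tor's CBT code computes its timeout differently, and this is not that code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Tor. Returns the build timeout (ms)
 * for a zero-based retry attempt: the first n_plain attempts use the
 * normal CBT value; each later attempt doubles it, capped at max_ms,
 * instead of giving up on the connection. */
static uint32_t
hs_retry_timeout_ms(uint32_t cbt_ms, unsigned attempt,
                    unsigned n_plain, uint32_t max_ms)
{
  assert(cbt_ms > 0 && cbt_ms <= max_ms);
  uint64_t t = cbt_ms;
  /* One doubling for each attempt at or beyond n_plain. */
  for (unsigned i = n_plain; i <= attempt && t < max_ms; i++)
    t *= 2;
  return t > max_ms ? max_ms : (uint32_t)t;
}
```

With a 1500 ms CBT and N = 3, attempts 0 to 2 wait 1500 ms, attempt 3 waits 3000 ms, attempt 4 waits 6000 ms, and so on up to the cap.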
[Automatically added by flyspray2trac: Operating System: All]
Is there any progress on this, or are there new ideas? I believe this might be a reason for many of the hidden service connectivity problems we've been hearing about from alpha users lately. Something I've experienced myself is that occasionally a hidden service would take 120 seconds to time out, but a new request would succeed very quickly.
Trac: Keywords: N/A deleted, N/A added. Description: removed the "[Automatically added by flyspray2trac: Operating System: All]" line; the rest of the description is unchanged.
I've seen a hidden service client time out on a rendezvous circuit, then try again with a new rendezvous circuit and introduction point, much faster than I think it should have.
Trac: Priority: minor to normal Owner: mikeperry to rransom Status: new to assigned
> Something I've experienced myself was that occasionally a hidden service would take 120 seconds to time out, but a new request would very quickly succeed.
This problem is on the client side, not the service side (which this ticket's description focuses on). The client somehow spends its pre-built general-purpose circuits (possibly on the descriptor fetch, or possibly on introduction or rendezvous circuits that immediately time out; I haven't dug into the source thoroughly enough to find out whether this happens yet). Then all of the rendezvous and introduction circuits it opens time out. When the user opens a second AP connection after the first times out, the client has some pre-built circuits ready, and the introduction and rendezvous attempts succeed before the CBT code reaps those circuits.
See bug1297a (git://git.torproject.org/rransom/tor.git bug1297a) for fixes to some timeout-induced breakage on the client side. I suspect that this doesn't completely fix legacy/trac#1297 on the client side, and it doesn't touch the hidden-service side at all.
> Something I've experienced myself was that occasionally a hidden service would take 120 seconds to time out, but a new request would very quickly succeed.
> This problem is on the client side, not the server side (which this ticket's description focuses on). The client spends its pre-built general-purpose circuits somehow (possibly on the descriptor fetch, possibly on introduction or rendezvous circuits which immediately time out; I haven't dug thoroughly enough into the source to find out whether this happens yet),
From circuit_launch_by_extend_info, if circ is being cannibalized:
    /* reset the birth date of this circ, else expire_building
     * will see it and think it's been trying to build since it
     * began. */
    tor_gettimeofday(&circ->_base.timestamp_created);
So intro and rend circuits do not die immediately after they are obtained by cannibalizing an existing circuit.
In order to give hidden services with high circuit-build timeouts a chance of working, we need to modify the client code so that when a client's intro circ times out in state C_INTRODUCE_ACK_WAIT, the client leaves its corresponding rendezvous circuit open while it tries again with a different intro/rend circuit pair. This will require creating another state for rendezvous circuits (stored in the ‘purpose’ field).
At this point, I'm having a hard time seeing this as having a good risk/reward ratio for 0.2.2.x. Once there's code, you can try to convince me otherwise if you want.
Trac: Milestone: Tor: 0.2.2.x-final to Tor: 0.2.3.x-final
See bug1297b (https://git.torproject.org/rransom/tor.git bug1297b) for a not-yet-tested branch on 0.2.3.x that makes clients keep HS circuits around longer after they reach their normal CBT, while retrying with new intro/rend circuits.
I will need to add a configuration option to allow users to disable this new behaviour, because even though it will clearly improve HS connection-establishment performance (assuming it works correctly), I suspect that it will harm performance after the connection is established, because we will now use circuits which took longer to build. We currently do not have tools designed to test latency on already-opened circuits; when we do, we will want to investigate this further.
There is one remaining change to make for this ticket, on the service side: hidden services should be able to keep their CIRCUIT_PURPOSE_S_CONNECT_REND circuits open after they time out, while building another rendezvous circuit in parallel.
The commits on bug1297b (up to c04093363803a4120bdecae82d61e357e869d1fe) do not break Tor when used in TBB. I've pushed some more commits, including a few to fix unrelated bugs; the new changes are not yet tested, and will have to be squashed and rearranged a bit.
See bug1297b-v2 (https://git.torproject.org/rransom/tor.git bug1297b-v2) for the rebased branch. This branch contains my bug4759-v2 branch, because these changes require that legacy/trac#4759 be fixed.
Looks good, I think. Could I have some comments explaining what can happen to a circuit once hs_circ_has_timed_out is set on it? The current comments do a good job of explaining when the flag is set, but not the allowable transitions out of that state. (So, the idea is that a "timed out" circuit is not really timed out, but allowed to stick around a little longer in case it works, in which case we declare it to be okay?)
Why would you want to set CloseHSClientCircuitsImmediatelyOnTimeout? Is it just there for testing, or what?
Is there any limit on how many times this code can relaunch circuits on timeout for the same request?
> Looks good, I think. Could I have some comments explaining what can happen to a circuit once hs_circ_has_timed_out is set on it? The current comments do a good job of explaining when the flag is set, but not the allowable transitions out of that state.
OK. I'll push a comment change tomorrow.
> (So, the idea is that a "timed out" circuit is not really timed out, but allowed to stick around a little longer in case it works, in which case we declare it to be okay?)
Yes, that's the idea.
> Why would you want to set CloseHSClientCircuitsImmediatelyOnTimeout? Is it just there for testing, or what?
The justification for Tor's adaptive-CBT code is that circuits which are built more quickly are also ‘faster’ after they are built. These changes will cause clients to use circuits with longer build times, in order to decrease the overall time until some circuit is connected to a hidden service. Users who connect to or host latency-sensitive hidden services (e.g. IRC) might want to set the options which disable these changes.
We will also want to use those options to test the impact of this change on performance, someday when we have a performance-measurement tool which measures the latency on an open circuit (rather than only measuring the time until a first request has completed through a Tor client with no circuits open).
> Is there any limit on how many times this code can relaunch circuits on timeout for the same request?
On the client side, HS circuits are relaunched by the existing code in circuit_get_open_circ_or_launch when it does not find an ‘acceptable’ circuit to use (as defined by circuit_is_acceptable, which never considers a circuit with hs_circ_has_timed_out set acceptable). The client will keep launching circuits as long as an AP connection is still trying to connect to the hidden service and an intro point remains to try for that HS. (Before the last legacy/trac#3825 change, clients could keep pounding an intro point for SocksTimeout seconds; now, the maximum is five intro circs per intro point.)
On the service side, rendezvous circuits are relaunched when they reach the normal timeout for four-hop circuits. The service will stop launching circuits to a client's rendezvous point after launching MAX_REND_FAILURES circuits (currently 30) or after trying to connect for MAX_REND_TIMEOUT seconds (currently 30). MAX_REND_FAILURES is too high (legacy/trac#4241), but I don't know what number to lower it to yet.
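The service-side stopping rule can be sketched the same way. MAX_REND_FAILURES and MAX_REND_TIMEOUT use the names and values quoted above; the function is a simplified stand-in, not Tor's own check.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REND_FAILURES 30 /* circuits, value quoted above */
#define MAX_REND_TIMEOUT 30  /* seconds, value quoted above */

/* Hypothetical check: may the service launch another circuit to this
 * client's rendezvous point? Times are in seconds. */
static bool
model_service_may_relaunch(int circs_launched, double first_attempt,
                           double now)
{
  assert(circs_launched >= 0 && now >= first_attempt);
  return circs_launched < MAX_REND_FAILURES &&
         now - first_attempt < MAX_REND_TIMEOUT;
}
```

Whichever limit is reached first ends the attempt, so lowering MAX_REND_FAILURES only matters when four-hop circuits time out quickly enough to exhaust the count before the 30-second window closes.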
This change does not increase the number of circuits built for a hidden-service connection attempt; it is likely to decrease the number of circuits, by decreasing the time before a client successfully connects to a hidden service (and thus decreasing the time for which it builds new circuits for the HS connection attempt).