Every connection request in tor could be modified to use random exponential backoff on failure.
This would resolve repeated-connection overloading issues in general.
In particular, it would reduce the risk that old versions would DoS the authorities (or fallback directories, see legacy/trac#15775 / legacy/trac#15228) when the clients are switched off.
Connections to authorities are the first priority for this change, followed by connections to directory servers.
We cannot possibly fix all 226 currently open 028 tickets before 028 releases. Time to move some out. This is my second pass through the "new" and tickets, looking for things to move to 0.2.9.
Trac: Milestone: Tor: 0.2.8.x-final to Tor: 0.2.9.x-final
Connection failures are detected in connection_handle_read_impl() / connection_handle_write_impl(). These generically call connection_close_immediate()/connection_mark_for_close_internal(); in addition, they call connection_or_notify_error() for orconns and connection_edge_end_errno() for edge connections.
The connection_close_immediate()/connection_mark_for_close_internal() path flows to connection_about_to_close_connection(), which can call connection_dir_about_to_close(), connection_or_about_to_close(), connection_ap_about_to_close() or connection_exit_about_to_close(). In the case of orconns and edge connections everything interesting happens from connection_or_notify_error() and connection_edge_end_errno(), but connection_dir_about_to_close() is the trigger point for retrying downloads from the directory servers.
Edge connections are either outgoing from the exit, in which case we just send an END cell down the circuit on failure, from connection_edge_end_errno() -> connection_edge_end() -> connection_edge_send_command(), or incoming from the client, in which case we don't get any choices about retrying. There's no retry policy to change there.
Orconn failures cause circuits to die or fail to attach, and these flow through circuit_n_chan_done() and circuit_unlink_all_from_channel() from channel_closed(). Ultimately, connection failures end up in circuit_about_to_free(), and then for origin circuits in circuit_build_failed() when handling a circuit closed for error.
Since all of these possible failure cases are ultimately driven from somewhere else (e.g., exit connection fails) and trigger reporting back to the cause of that connection (e.g. send END cell) rather than retrying, or are on the client side and become a matter of general circuit-building policy, for this ticket I'll be focusing attention on retries of failed downloads from the directory servers. We should think about backoffs for circuit building at some point perhaps, but it seems to be largely separable from the question of directories, less critical for DoS-resistance since there aren't analogous heavily loaded elements like the authorities, and more security-sensitive because of potential implications for behavior when we fail to connect to our preferred entry guard.
- This ultimately uses download_status_t too, just like the consensus download; see download_status_is_ready_by_sk_in_cl() and friends in routerlist.c.
- connection_dir_download_routerdesc_failed():
      /* No need to relaunch descriptor downloads here: we already do it
       * every 10 or 60 seconds (FOO_DESCRIPTOR_RETRY_INTERVAL) in main.c. */
  - The mechanism here is in launch_descriptor_fetches_callback()/reset_descriptor_failures_callback(); we can realize exponential backoff by suitable adjustments.
- connection_dir_bridge_routerdesc_failed() calls connection_dir_retry_bridges(), which calls retry_bridge_descriptor_fetch_directly(), which calls launch_direct_bridge_descriptor_fetch().
At minimum, it should be easy to implement exponential backoff for consensus and certificate downloads through the download_status_t mechanism, since they already report their successes and failures to it and ask it whether we're ready to attempt a new download yet. Further investigation of the right approach for the bridge descriptor and router descriptor download cases is pending.
Please review the implementation in my bug15942 branch. It has been tested with unit tests for the random exponential backoff download schedule in src/test/test_dir.c, and by using iptables to block outgoing TCP connections while bootstrapping a client, to observe the backoffs in progress.
(oh and also: most of my questions on that branch are actual questions, and not sneaky suggestions. "No, that would be a bad idea" is an okay answer in most cases.)