HS intro circuit retry logic fails when network interface is down
During investigations in legacy/trac#16387 (moved), we found out that mobile HSes won't retry their intro point circuits when their network interface is down.
This is a problem since mobile devices change their IP address all the time, and their network interface toggles on and off. Since the retry logic fails in this case, the mobile HS ends up rotating intro points every time the interface goes up.
The problem is this chunk of code in rend_consider_services_intro_points():
  /* Let's try to rebuild circuit on the nodes we want to retry on. */
  SMARTLIST_FOREACH_BEGIN(retry_nodes, rend_intro_point_t *, intro) {
    r = rend_service_launch_establish_intro(service, intro);
    if (r < 0) {
      log_warn(LD_REND, "Error launching circuit to node %s for service %s.",
               safe_str_client(extend_info_describe(intro->extend_info)),
               safe_str_client(service->service_id));
      /* Unable to launch a circuit to that intro point, remove it from
       * the valid list so we can create a new one. */
      smartlist_remove(service->intro_nodes, intro);
      rend_intro_point_free(intro);
      continue;
    }
    intro->circuit_retries++;
  } SMARTLIST_FOREACH_END(intro);
When our interface is down, rend_service_launch_establish_intro() will fail immediately since it eventually calls connection_or_connect() which leads to a connect() call with no network.
As you can see from the code above, when that function fails immediately, we remove the intro point no questions asked, with no subsequent retries.
This is problematic, since the failure there is because of our local network, and does not indicate any issues with the intro point, so we should not ditch it so easily. Ideally, as long as we know that the error is local, we should probably keep on trying that same intro point.
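
One possible direction, as a rough sketch only: classify the failure before deciding what to do with the intro point. The code below assumes the errno from the failed connect() could be surfaced to the caller, which the quoted code does not do today (it only sees r < 0). The names errno_is_local_network_failure(), intro_launch_failure_action() and intro_action_t are hypothetical and do not exist in the tree, and the set of errno values treated as "local" is a guess that would need checking on each platform.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result type: what to do with an intro point whose
 * circuit launch failed immediately. */
typedef enum {
  INTRO_KEEP_AND_RETRY, /* failure looks local; keep the same intro point */
  INTRO_DISCARD         /* failure may implicate the intro point; drop it */
} intro_action_t;

/* Return true if err (an errno value from connect()) suggests that our
 * own network is down, rather than a problem with the remote node.
 * Assumption: this list is illustrative, not exhaustive. EHOSTUNREACH is
 * left out on purpose because it can be either local or remote. */
static bool
errno_is_local_network_failure(int err)
{
  switch (err) {
    case ENETDOWN:      /* local interface is down */
    case ENETUNREACH:   /* no route to any network */
    case EADDRNOTAVAIL: /* no usable local address */
      return true;
    default:
      return false;
  }
}

/* Decide what to do with an intro point after an immediate launch
 * failure. A local failure says nothing about the intro point, so we
 * keep it and retry later instead of removing it from the list. */
static intro_action_t
intro_launch_failure_action(int err)
{
  if (errno_is_local_network_failure(err))
    return INTRO_KEEP_AND_RETRY;
  return INTRO_DISCARD;
}
```

With something like this, the r < 0 branch in the loop above would only call smartlist_remove() and rend_intro_point_free() when the action is INTRO_DISCARD, and would otherwise leave the intro point in service->intro_nodes for the next run.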