Reachability Tests aren't conducted if there are no exit nodes
Context:
- https://lists.torproject.org/pipermail/tor-dev/2014-October/007613.html
- https://lists.torproject.org/pipermail/tor-dev/2014-October/007654.html
On 22 October 2014 05:48, Roger Dingledine <arma@mit.edu> wrote:
What I had to do was make one of my Directory Authorities an exit - this let the other nodes start building circuits through the authorities and upload descriptors.
This part seems surprising to me -- directory authorities always publish their dirport whether they've found it reachable or not, and relays publish their descriptors directly to the dirport of each directory authority (not through the Tor network).
So maybe there's a bug that you aren't describing, or maybe you are misunderstanding what you saw?
See also https://trac.torproject.org/projects/tor/ticket/11973
Another problem I ran into was that nodes couldn't conduct reachability tests when I had exits that were only using the Reduced Exit Policy - because it doesn't list the ORPort/DirPort! (I was using nonstandard ports actually, but indeed the reduced exit policy does not include 9001 or 9030.) Looking at the current consensus, there are 40 exits that exit to all ports, and 400-something exits that use the ReducedExitPolicy. It seems like 9001 and 9030 should probably be added to that for reachability tests?
The reachability tests for the ORPort involve extending the circuit to the ORPort -- which doesn't use an exit stream. So your relays should have been able to find themselves reachable, and published a descriptor, even with no exit relays in the network.
Okay, so the behavior I saw, and reproduced, is that reachability tests didn't succeed (and therefore descriptors weren't uploaded) when there were no exits. I think I may have figured out why, but there are some internals I haven't completely figured out. I'm going to lay out what I think and then the parts I'm not completely sure about.
First off, you're (obviously) correct that I misunderstood extending the circuit via an Exit stream; that's not necessary. But still, I think the lack of Exits stopped the reachability tests from succeeding.
too long; didn't read
I don't think reachability tests happen when there are no Exit nodes because of a quirk in the bootstrapping process, where we never think we have a minimum of directory information.
target function: consider_testing_reachability
A reachability test is conducted from consider_testing_reachability. (I think it's only conducted from here? Although maybe there are other situations where it could happen?) consider_testing_reachability is called from circuit_send_next_onion_skin, circuit_testing_opened, run_scheduled_events, and directory_info_has_arrived.
call site #1: directory_info_has_arrived
This is called very frequently on router startup. But consider_testing_reachability will not be called if router_have_minimum_dir_info returns false:
void directory_info_has_arrived(time_t now, int from_cache)
{ //...
  if (!router_have_minimum_dir_info()) {
    //...
    return;
  } else { /* ... */ }
  if (server_mode(options) && !net_is_disabled() && !from_cache &&
      (can_complete_circuit || !any_predicted_circuits(now)))
    consider_testing_reachability(1, 1);
}
router_have_minimum_dir_info returns the static variable have_min_dir_info. This variable is only set to 1 in update_router_have_minimum_dir_info, and then only if there are Exits! Specifically, we will hit the paths < get_frac_paths_needed_for_circs(options,consensus) branch because we have 0% of the Exit bandwidth, as shown by this notice-level log message:
Nov 09 22:10:26.000 [notice] I learned some more directory information, but not enough to build a circuit: We need more descriptors: we have 5/5, and can only build 0% of likely paths. (We have 100% of guards bw, 100% of midpoint bw, and 0% of exit bw.)
update_router_have_minimum_dir_info(void)
{ //...
    char *status = NULL;
    int num_present=0, num_usable=0;
    double paths = compute_frac_paths_available(consensus, options, now,
                                                &num_present, &num_usable,
                                                &status);
    if (paths < get_frac_paths_needed_for_circs(options,consensus)) {
      tor_snprintf(dir_info_status, sizeof(dir_info_status),
                   "We need more %sdescriptors: we have %d/%d, and "
                   "can only build %d%% of likely paths. (We have %s.)",
                   using_md?"micro":"", num_present, num_usable,
                   (int)(paths*100), status);
      //...
      res = 0;
      goto done;
    }
    res = 1;
  }
 done:
  if (res && !have_min_dir_info) { /* ... */ }
  if (!res && have_min_dir_info) {
    int quiet = directory_too_idle_to_fetch_descriptors(options, now);
    tor_log(quiet ? LOG_INFO : LOG_NOTICE, LD_DIR,
            "Our directory information is no longer up-to-date "
            "enough to build circuits: %s", dir_info_status);
    /* a) make us log when we next complete a circuit, so we know when Tor
     * is back up and usable, and b) disable some activities that Tor
     * should only do while circuits are working, like reachability tests
     * and fetching bridge descriptors only over circuits. */
    can_complete_circuit = 0;
    control_event_client_status(LOG_NOTICE, "NOT_ENOUGH_DIR_INFO");
  }
  have_min_dir_info = res;
}
(The exact source line is in frac_nodes_with_descriptors, called by compute_frac_paths_available:)
/** For all nodes in <b>sl</b>, return the fraction of those nodes, weighted
 * by their weighted bandwidths with rule <b>rule</b>, for which we have
 * descriptors. */
double
frac_nodes_with_descriptors(const smartlist_t *sl,
                            bandwidth_weight_rule_t rule)
{
  //...
  if (smartlist_len(sl) == 0)
    return 0.0;
This prevents reachability tests from being triggered from directory_info_has_arrived.
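To make the arithmetic concrete, here is a minimal sketch of how an empty Exit list drags the path fraction to zero. It is a toy model, not Tor code: every toy_ name is hypothetical, the 0.60 threshold is an assumed stand-in for whatever get_frac_paths_needed_for_circs returns, and I'm assuming the path fraction is the product of the guard, middle and exit fractions (which is what the 100%/100%/0% log message above suggests).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for get_frac_paths_needed_for_circs(); the real value
 * comes from options and the consensus. */
#define TOY_FRAC_PATHS_NEEDED 0.60

/* Toy analogue of frac_nodes_with_descriptors(): bandwidth-weighted
 * fraction of the n nodes for which we hold a descriptor. As in the
 * excerpt, an empty list yields 0.0. */
static double
toy_frac_with_descriptors(const int *bw, const int *have_desc, int n)
{
  double total = 0.0, present = 0.0;
  int i;
  assert(n >= 0);
  if (n == 0)
    return 0.0;
  for (i = 0; i < n; ++i) {
    total += bw[i];
    if (have_desc[i])
      present += bw[i];
  }
  return total > 0.0 ? present / total : 0.0;
}

/* Assumption: usable paths = guard fraction * middle fraction * exit
 * fraction, so a zero in any position zeroes the result. */
static double
toy_frac_paths(double f_guard, double f_mid, double f_exit)
{
  return f_guard * f_mid * f_exit;
}

/* Toy analogue of the check that decides have_min_dir_info. */
static int
toy_have_min_dir_info(double paths)
{
  return paths >= TOY_FRAC_PATHS_NEEDED;
}
```

With five relays that all have descriptors, the guard and middle fractions come out as 1.0, but toy_frac_with_descriptors(NULL, NULL, 0) for the empty Exit list is 0.0, so toy_frac_paths is 0.0 and toy_have_min_dir_info stays 0 no matter how many descriptors we fetch.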
call site #2: run_scheduled_events (and call site #3)
There's a litany of conditions to call consider_testing_reachability from run_scheduled_events. In particular, there's can_complete_circuit:
if (time_to_check_descriptor < now && !options->DisableNetwork) {
  //...
  /* also, check religiously for reachability, if it's within the first
   * 20 minutes of our uptime. */
  if (is_server &&
      (can_complete_circuit || !any_predicted_circuits(now)) &&
      !we_are_hibernating()) {
    if (stats_n_seconds_working < TIMEOUT_UNTIL_UNREACHABILITY_COMPLAINT) {
      consider_testing_reachability(1, dirport_reachability_count==0);
can_complete_circuit is only set to 1 in circuit_send_next_onion_skin, and then only if a circuit is built and it is not circ->build_state->onehop_tunnel. I think this means the circuit is a full circuit, complete with Exit. Right?
int circuit_send_next_onion_skin(origin_circuit_t *circ)
{ //...
  if (circ->cpath->state == CPATH_STATE_CLOSED) {
    // ...
  } else {
    //...
    hop = onion_next_hop_in_cpath(circ->cpath);
    if (!hop) {
      //...
      if (!can_complete_circuit && !circ->build_state->onehop_tunnel) {
        can_complete_circuit=1;
        /* FFFF Log a count of known routers here */
        log_notice(LD_GENERAL,
            "Tor has successfully opened a circuit. "
            "Looks like client functionality is working.");
        //...
        if (server_mode(options) && !check_whether_orport_reachable()) {
          inform_testing_reachability();
          consider_testing_reachability(1, 1);
This is also the third place consider_testing_reachability is called - there is only one left:
call site #4: circuit_testing_opened
/** A testing circuit has completed. Take whatever stats we want.
 * Noticing reachability is taken care of in onionskin_answer(),
 * so there's no need to record anything here. But if we still want
 * to do the bandwidth test, and we now have enough testing circuits
 * open, do it.
 */
static void
circuit_testing_opened(origin_circuit_t *circ)
{
  if (have_performed_bandwidth_test ||
      !check_whether_orport_reachable()) {
    /* either we've already done everything we want with testing circuits,
     * or this testing circuit became open due to a fluke, e.g. we picked
     * a last hop where we already had the connection open due to an
     * outgoing local circuit. */
    circuit_mark_for_close(TO_CIRCUIT(circ), END_CIRC_AT_ORIGIN);
  } else if (circuit_enough_testing_circs()) {
    router_perform_bandwidth_test(NUM_PARALLEL_TESTING_CIRCS, time(NULL));
    have_performed_bandwidth_test = 1;
  } else
    consider_testing_reachability(1, 0);
}
But... as far as I can tell, a testing circuit is only used for two things: conducting a reachability test and conducting a bandwidth self-test. The only place a bandwidth self-test is called is inside circuit_testing_opened. So this call of consider_testing_reachability is a chicken-and-egg problem.
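Putting the four call sites together, here is a rough sketch of the deadlock as I understand it. This is a toy model under stated assumptions, not Tor code: all toy_ names are hypothetical, the other conditions in the excerpts (server_mode, hibernation, from_cache, uptime) are assumed to be favourable, and I'm assuming any_predicted_circuits(now) is true on the affected relay. If it were false, the gate at call site #2 would pass in this model, which is one of the parts I'm not sure about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy snapshot of the flags discussed above; not a Tor structure. */
typedef struct toy_relay_state_t {
  bool have_min_dir_info;      /* models router_have_minimum_dir_info() */
  bool can_complete_circuit;   /* models the can_complete_circuit flag */
  bool any_predicted_circuits; /* models any_predicted_circuits(now) */
  bool built_multihop_circuit; /* a non-onehop circuit finished building */
  bool testing_circuit_opened; /* a testing circuit finished opening */
} toy_relay_state_t;

/* The gate shared by call sites #1 and #2 in the excerpts. */
static bool
toy_circuit_gate(const toy_relay_state_t *s)
{
  return s->can_complete_circuit || !s->any_predicted_circuits;
}

/* True if at least one modelled call site would reach
 * consider_testing_reachability(). */
static bool
toy_any_call_site_fires(const toy_relay_state_t *s)
{
  assert(s != NULL);
  /* #1: directory_info_has_arrived, needs minimum dir info first. */
  bool site1 = s->have_min_dir_info && toy_circuit_gate(s);
  /* #2: run_scheduled_events, needs the circuit gate. */
  bool site2 = toy_circuit_gate(s);
  /* #3: circuit_send_next_onion_skin, needs a finished multi-hop circuit. */
  bool site3 = s->built_multihop_circuit;
  /* #4: circuit_testing_opened, needs a testing circuit that was itself
   * launched by an earlier reachability or bandwidth test. */
  bool site4 = s->testing_circuit_opened;
  return site1 || site2 || site3 || site4;
}
```

For a relay on a network with no Exits, the state I believe I'm seeing is { false, false, true, false, false }, and toy_any_call_site_fires returns false for it. Flipping built_multihop_circuit to true (what happens once an Exit appears and a full circuit completes) is enough to make it return true.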