It seems to me that if just a moment ago you tried to connect to a relay and you failed with connection refused, then maybe you already know what the answer is going to be for other circuits that show up in that same second.
How long should we cache the answer for? And, are there any anonymity implications to this design change? (We already batch circuit extend requests as they're waiting for the connect request to finish or fail, so I think whatever the anonymity implications are, maybe we already have them?)
I gave this ticket a new title to better reflect its urgency.
It's an important ticket because many fast relays are flooding their connection-tracking tables (or the equivalent) with this sustained flood of outgoing connection attempts.
Trac: Summary: "If you just failed to extend a circuit to a relay that's down, consider remembering that for a bit" to "All relays are constantly connecting to down relays and failing over and over"
I think we already do this part correctly. See the NCIRCS=3 argument for the event -- that's the number of circuits that got queued pending on this connection attempt's outcome.
I can take this one. But I want to get this clear. If I summarize what we have here and what we need to decide:
Fortunately, we already track OR connection failures in rephist -- see rep_hist_note_connect_failed() -- so we could use the or_history_t to know whether we should try to connect.
We'll need to decide on a "how long before I retry to reconnect" timing. We could do a simple backoff schedule like 1 sec, 5 sec, 10 sec, then give up (a rough sketch follows after these questions).
Once I've identified that tor can't reach a specific relay, how long should we consider it "unreachable"? Or should we repeat the above if no retries are pending?
Should we keep the EXTEND cells while waiting for a reasonable amount of time before reconnecting? That is, if we have a failure to connect, should we just DESTROY the circuit or retry to connect and send the cell(s) later?
Should we also note down disconnects, and cases like "I waited 60 seconds without hearing anything back from you", as potential failures?
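For the timing question above, here is a minimal sketch of what a fixed retry schedule could look like. The names are hypothetical, and later comments in this ticket settle on a flat 60-second window rather than a backoff, so this is only to make the option concrete.

```c
#include <stddef.h>

/* Hypothetical fixed schedule: 1s, 5s, 10s, then give up. */
static const int retry_delays[] = { 1, 5, 10 };   /* seconds between attempts */

/* Return the delay before retry number `n` (0-based), or -1 to give up. */
static int
next_retry_delay(size_t n)
{
  if (n >= sizeof(retry_delays) / sizeof(retry_delays[0]))
    return -1;   /* schedule exhausted: stop retrying */
  return retry_delays[n];
}
```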
Trac: Keywords: N/A deleted; tor-relay, tor-dos, performance added. Status: new to accepted. Priority: Medium to Very High. Owner: N/A to dgoulet
No need for a complex backoff thing. Just remember the failure for 60 seconds, and during those 60 seconds, send back a destroy immediately. Reducing from n attempts per minute down to 1 per minute should be enough to help us survive until the clients get a consensus update and stop asking us to try.
We should avoid having this cached-failure thing impact client behavior. That is, it should cause other people's circuits to get destroys, but it shouldn't auto-fail our own client attempts. Maybe we should change how the client behaves, but if so, let's do it later, and not introduce subtle breakage in something we'll be considering for backport.
Hm! I was going to say "but rep_hist_note_connect_failed() won't work if the relay isn't in our consensus", but actually, it is simply keyed on the intended identity digest of the next destination, so it does look like we can reuse the data struct. Except, shouldn't we also be caching the IP:port that failed? Otherwise somebody can ask us to extend to a victim digest at an unworkable IP:port, and we'll cache "failure!" and then refuse all the legit cells going to the right address for the next minute.
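To make the proposal concrete, here is a minimal sketch (not Tor's actual code) of the behavior being discussed: remember recent OR connection failures keyed by (identity digest, address, port), and for the next 60 seconds answer matching extend attempts with an immediate DESTROY instead of retrying the connect. All names are hypothetical, and the fixed-size array stands in for the real hash table.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define DIGEST_LEN 20
#define FAILURE_LIFETIME 60   /* seconds to remember a failed connect */
#define CACHE_SLOTS 1024

typedef struct connect_failure_entry_t {
  uint8_t digest[DIGEST_LEN];
  uint8_t addr[16];               /* address bytes, v4-mapped or v6 */
  uint16_t port;
  time_t last_failed_connect_ts;  /* 0 means "slot unused" */
} connect_failure_entry_t;

static connect_failure_entry_t failure_cache[CACHE_SLOTS];

/* Record that connecting to digest/addr/port just failed. */
void
note_connect_failed(const uint8_t *digest, const uint8_t *addr,
                    uint16_t port, time_t now)
{
  /* Sketch: take the first matching slot, or the first free one. */
  for (int i = 0; i < CACHE_SLOTS; i++) {
    connect_failure_entry_t *e = &failure_cache[i];
    bool matches = e->last_failed_connect_ts &&
                   e->port == port &&
                   !memcmp(e->digest, digest, DIGEST_LEN) &&
                   !memcmp(e->addr, addr, 16);
    if (matches || !e->last_failed_connect_ts) {
      memcpy(e->digest, digest, DIGEST_LEN);
      memcpy(e->addr, addr, 16);
      e->port = port;
      e->last_failed_connect_ts = now;
      return;
    }
  }
}

/* Should we answer this extend with an immediate DESTROY? */
bool
should_refuse_extend(const uint8_t *digest, const uint8_t *addr,
                     uint16_t port, time_t now)
{
  for (int i = 0; i < CACHE_SLOTS; i++) {
    const connect_failure_entry_t *e = &failure_cache[i];
    if (e->last_failed_connect_ts &&
        e->port == port &&
        !memcmp(e->digest, digest, DIGEST_LEN) &&
        !memcmp(e->addr, addr, 16))
      return (now - e->last_failed_connect_ts) < FAILURE_LIFETIME;
  }
  return false;
}
```

Because the key includes the address and port, an attacker pointing us at a victim digest with a bogus IP:port only poisons that one triplet, not the relay's real address.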
No need for a complex backoff thing. Just remember the failure for 60 seconds, and during those 60 seconds, send back a destroy immediately. Reducing from n attempts per minute down to 1 per minute should be enough to help us survive until the clients get a consensus update and stop asking us to try.
60 seconds seems sensible.
And it's about the timeframe it takes to restart a relay (tor takes 30s to shut down by default).
Anything less would be ineffective.
We should avoid having this cached-failure thing impact client behavior. That is, it should cause other people's circuits to get destroys, but it shouldn't auto-fail our own client attempts. Maybe we should change how the client behaves, but if so, let's do it later, and not introduce subtle breakage in something we'll be considering for backport.
Excluding origin circuits seems fine.
(So does including them, but only on relays. But we already have backoff and limits on origin circuits.)
Hm! I was going to say "but rep_hist_note_connect_failed() won't work if the relay isn't in our consensus", but actually, it is simply keyed on the intended identity digest of the next destination, so it does look like we can reuse the data struct. Except, shouldn't we also be caching the IP:port that failed? Otherwise somebody can ask us to extend to a victim digest at an unworkable IP:port, and we'll cache "failure!" and then refuse all the legit cells going to the right address for the next minute.
When we introduce IPv6 extends from relays (legacy/trac#24404 (moved)), we only want to fail attempts to a single IP address, and not the whole relay. So dual-stack relays will get two attempts to other dual-stack relays: an IPv4 attempt, and an IPv6 attempt. I like that design, it makes sense to try both.
Ok, after discussion on IRC with arma a bit this morning, I've ended up with a branch.
This is untested (although it compiles and chutney is happy); there are no unit tests nor proper "in the wild" testing done on it for now.
I just want to throw it here as a full draft because 1) it adds quite a bit of code and 2) I want a review of the code logic before going into review mode.
(FWIW, this has also been a finding of dawuud from months ago, where he scanned the Tor network for partitions and found that many high-bandwidth relays cannot contact other high-bandwidth relays. I guess he tried a single circuit build between those relays, so basically his findings are isomorphic to nusenu's.)
Thanks! Fixed the typos (I hope all of them!). I've merged your unit test, squashed all the things, rebased on latest 033 and added a changes file. Putting it in merge_ready for nickm's eyes.
CC src/or/connection_or.o
src/or/connection_or.c:1180:23: error: implicit conversion loses integer precision: 'uint64_t' (aka 'unsigned long long') to 'unsigned int' [-Werror,-Wshorten-64-to-32]
  unsigned int hash = tor_addr_hash(&entry->addr);
               ~~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/or/connection_or.c:1187:67: error: extra ';' outside of a function [-Werror,-Wextra-semi]
    or_connect_failure_ht_hash, or_connect_failure_ht_eq);
                                                          ^
src/or/connection_or.c:1191:48: error: extra ';' outside of a function [-Werror,-Wextra-semi]
    0.6, tor_reallocarray_, tor_free_);
the HT implementation only uses 32 bit hashes on 64-bit LP64 systems, like macOS and the BSDs, and 32-bit systems (split off into legacy/trac#25365 (moved))
the port is user-controlled, so it needs to be hashed before being combined with the other hashes
the HT implementation only uses 32 bit hashes on 64-bit LP64 systems, like macOS and the BSDs, and 32-bit systems (split off into legacy/trac#25365 (moved))
the port is user-controlled, so it needs to be hashed before being combined with the other hashes
What is the danger here? The length is fixed, so what is the difference between "+= 42" and "+= h(42)"?
For the hash fix, wouldn't it be more efficient to then combine addr + digest + port in a buffer and siphash that instead of doing 3 rounds of siphash?
the HT implementation only uses 32 bit hashes on 64-bit LP64 systems, like macOS and the BSDs, and 32-bit systems (split off into legacy/trac#25365 (moved))
the port is user-controlled, so it needs to be hashed before being combined with the other hashes
What is the danger here? The length is fixed, so what is the difference between "+= 42" and "+= h(42)"?
The hash needs to be unpredictable so that bad clients can't fill up one of your hash table buckets and cause your relay to slow down.
Adding the raw port gives the client direct control over 16 bits of your hash result, which makes the hash table less secure.
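A small sketch of the point above, with stand-in names only: if the raw 16-bit port is added into the bucket hash, a client choosing ports can shift the result by values it picks and steer entries toward one bucket; running the port through the keyed hash first removes that direct control. hash_bytes() here is an unkeyed FNV-1a used only to keep the sketch self-contained; the real code would use a keyed hash such as siphash.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a keyed hash (the real code would use siphash with a
 * secret key); FNV-1a is used here only so the sketch compiles alone. */
static uint64_t
hash_bytes(const void *buf, size_t len)
{
  const uint8_t *p = buf;
  uint64_t h = 0xcbf29ce484222325ULL;
  for (size_t i = 0; i < len; i++) {
    h ^= p[i];
    h *= 0x100000001b3ULL;
  }
  return h;
}

static uint64_t
bucket_hash_bad(uint64_t addr_hash, uint16_t port)
{
  /* The client can shift the result by any 16-bit value it chooses. */
  return addr_hash + port;
}

static uint64_t
bucket_hash_better(uint64_t addr_hash, uint16_t port)
{
  /* Hash the attacker-controlled field before mixing it in. */
  return addr_hash + hash_bytes(&port, sizeof(port));
}
```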
For the hash fix, wouldn't it be more efficient to then combine addr + digest + port in a buffer and siphash that instead of doing 3 rounds of siphash?
Hashing digest and port in a single buffer would be an easy win.
Hashing an extra buffer as part of tor_addr_hash() would also be great, because it avoids combining hashes with +.
But it would need a new argument to tor_addr_hash().
And we could fix the other use of tor_addr_hash() at the same time.
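Here is a sketch of the "one buffer, one hash pass" idea, again with hypothetical names. It reuses the hash_bytes() stand-in from the sketch above; the real code would feed the buffer to a keyed siphash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 20

/* Keyed-hash stand-in, defined in the previous sketch. */
uint64_t hash_bytes(const void *buf, size_t len);

static uint64_t
connect_failure_hash(const uint8_t *addr_bytes, size_t addr_len,
                     const uint8_t *digest, uint16_t port)
{
  uint8_t buf[16 + DIGEST_LEN + sizeof(uint16_t)];
  size_t off = 0;
  memset(buf, 0, sizeof(buf));          /* zero the fixed-width address slot */
  memcpy(buf + off, addr_bytes, addr_len);  /* at most 16 bytes */
  off += 16;
  memcpy(buf + off, digest, DIGEST_LEN);
  off += DIGEST_LEN;
  memcpy(buf + off, &port, sizeof(port));
  off += sizeof(port);
  return hash_bytes(buf, off);          /* single pass over the triplet */
}
```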
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
On unable_connect_since_ts: Do we ever clear this field (re-set it to zero) on success? It seems like it just stays set forever. If so maybe a better name would be something like "last_failed_connect_ts"?
Oh, and one more thing I noticed: It is not enough to just check "node_get_pref_orport(node, &node_addr_port);" -- the node may have multiple addresses, and "pref" just returns the one you would prefer to use yourself.
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
Oh damn, because of the union, if it is a v4, we'll copy a bunch of uninitialized bytes from the larger v6. I guess setting the size according to the family is the way to do it.
On unable_connect_since_ts: Do we ever clear this field (re-set it to zero) on success? It seems like it just stays set forever. If so maybe a better name would be something like "last_failed_connect_ts"?
It is a timestamp; we don't need to reset it. It is always compared to a cutoff, so even though it is set once and never touched again, that is fine since it will at some point always pass the cutoff.
Oh, and one more thing I noticed: It is not enough to just check "node_get_pref_orport(node, &node_addr_port);" -- the node may have multiple addresses, and "pref" just returns the one you would prefer to use yourself.
Not sure it is a problem here. We use node_get_pref_orport() only to make sure the OR connection we are about to note down as a failure matches the node_t we have from the identity digest. In this case, we use the node_t to keep the timestamp. If it doesn't match (because of another port/addr for that identity), we note it down as "unknown".
This ends up being the same result, just always indexed more precisely by addr + port + digest.
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
Oh damn, because of the union, if it is a v4, we'll copy a bunch of uninitialized bytes from the larger v6. I guess setting the size according to the family is the way to do it.
Ok, I made an attempt at this in fixup 53a26acb6172b7eb. Not the prettiest, but I don't see another way. Maybe we can abstract that into an inline function somehow, like "tor_addr_get_ptr_size()" or something along those lines?
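For illustration, here is a sketch of the family-size fix under discussion, using a stand-in struct rather than the real tor_addr_t: only the bytes that are meaningful for the address family get copied into a zeroed buffer, so uninitialized union bytes from a v4 address never reach the hash input. The helper name echoes the "tor_addr_get_ptr_size()" idea floated above and is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET / AF_INET6 */

typedef struct sketch_addr_t {
  int family;              /* AF_INET or AF_INET6 */
  union {
    uint8_t v4[4];
    uint8_t v6[16];
  } addr;
} sketch_addr_t;

/* Return a pointer to the significant bytes and set *len_out accordingly. */
static const uint8_t *
addr_bytes_and_len(const sketch_addr_t *a, size_t *len_out)
{
  if (a->family == AF_INET) {
    *len_out = sizeof(a->addr.v4);
    return a->addr.v4;
  }
  *len_out = sizeof(a->addr.v6);
  return a->addr.v6;
}

/* Copy only the initialized part of the address into a zeroed hash-input
 * buffer, so the result is deterministic for both families. */
static void
copy_addr_for_hashing(const sketch_addr_t *a, uint8_t out[16])
{
  size_t len;
  const uint8_t *bytes = addr_bytes_and_len(a, &len);
  memset(out, 0, 16);
  memcpy(out, bytes, len);
}
```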
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
Oh damn, because of the union, if it is a v4, we'll copy a bunch of uninitialized bytes from the larger v6. I guess setting the size according to the family is the way to do it.
On unable_connect_since_ts: Do we ever clear this field (re-set it to zero) on success? It seems like it just stays set forever. If so maybe a better name would be something like "last_failed_connect_ts"?
It is a timestamp; we don't need to reset it. It is always compared to a cutoff, so even though it is set once and never touched again, that is fine since it will at some point always pass the cutoff.
Is it okay with you if I change the name anyway? The problem is that the "since" implies a continuous condition. If I say "I have been unable to eat broccoli since 1982", it does not mean only that in 1982 I failed to eat broccoli: it also means that at every time since 1982, I have also been unable to eat broccoli.
Oh, and one more thing I noticed: It is not enough to just check "node_get_pref_orport(node, &node_addr_port);" -- the node may have multiple addresses, and "pref" just returns the one you would prefer to use yourself.
Not sure it is a problem here. We use node_get_pref_orport() only to make sure the OR connection we are about to note down as a failure matches the node_t we have from the identity digest. In this case, we use the node_t to keep the timestamp. If it doesn't match (because of another port/addr for that identity), we note it down as "unknown".
This ends up being the same result, just always indexed more precisely by addr + port + digest.
But the preferred address can change while Tor is running, I think, based on configuration options. That would make the value cached in the node become incorrect, and that's why I think we should maybe just not have this in the node_t at all.
Is it okay with you if I change the name anyway? The problem is that the "since" implies a continuous condition. If I say "I have been unable to eat broccoli since 1982", it does not mean only that in 1982 I failed to eat broccoli: it also means that at every time since 1982, I have also been unable to eat broccoli.
Renamed unable_connect_since_ts in commit 0410a5e49c.
But the preferred address can change while Tor is running, I think, based on configuration options. That would make the value cached in the node become incorrect, and that's why I think we should maybe just not have this in the node_t at all.
If a node_t's preferred address changes, indeed we won't match the connection to a node_t anymore, and thus we will fall back to using the cache for unknown nodes. And that object in the cache will be used until the node_t gets modified, I guess by a new descriptor at some point. So all in all, we'll still be tracking connection failures correctly.
The only reason I'm still hesitating a bit about using the "unknown cache" for everything is performance. If we used it for everything, then at every connection we would need to do a lookup whose hash function is a bit heavier (the addr + port + digest triplet) than simply looking up the node list. Furthermore, every 60 seconds (OR_CONNECT_FAILURE_CLEANUP_INTERVAL), we clean up that cache, meaning we can iterate over thousands of objects in there (potentially more than the number of relays). I don't think it would be a significant cost, but it is still important to keep in mind. Keeping it tiny can be worth it.
However, just using the cache (and not the node_t) would make the code a bit simpler; we would just need to accept the additional memory and lookup/iteration price.
My testing on a real relay shows that the majority of the time, we'll get our information out of the node_t.
At this point, I could lean towards either side, so I might just need a second opinion?
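For reference, the periodic cleanup mentioned above could look roughly like this sketch (hypothetical names; only OR_CONNECT_FAILURE_CLEANUP_INTERVAL comes from the branch being discussed): walk the cache and drop entries whose last failure is older than the lifetime, so the table stays bounded even though entries are never reset on success.

```c
#include <stddef.h>
#include <time.h>

#define OR_CONNECT_FAILURE_CLEANUP_INTERVAL 60  /* from the branch above */
#define FAILURE_LIFETIME 60                     /* hypothetical */

typedef struct cache_entry_t {
  time_t last_failed_connect_ts;
  int in_use;
} cache_entry_t;

/* Called every OR_CONNECT_FAILURE_CLEANUP_INTERVAL seconds. */
static void
cleanup_failure_cache(cache_entry_t *entries, size_t n, time_t now)
{
  for (size_t i = 0; i < n; i++) {
    if (entries[i].in_use &&
        now - entries[i].last_failed_connect_ts > FAILURE_LIFETIME) {
      /* In the real code: remove from the hash table and free the entry. */
      entries[i].in_use = 0;
    }
  }
}
```

The cost is one linear pass per minute over however many triplets have failed recently, which is the iteration price weighed above.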
I've created a _03 branch which squashes everything, because the latest changes basically modified half the patch. The code should be simpler and more straightforward to understand. The use of the node_t has been completely removed.
On a meta level, should we open a ticket about documenting this behavior in our spec? Also, would it be better in the future if we switch from the static 60-second lifetime to some sort of backoff algorithm, or to a consensus param? Should we open a ticket for this?
Okay; much simpler. I've merged it to maint-0.3.3 and forward. Let's close this once there are follow-up tickets for the issues that asn raises in comment 41 above?
On a meta level, should we open a ticket about documenting this behavior in our spec? Also, would it be better in the future if we switch from the static 60-second lifetime to some sort of backoff algorithm, or to a consensus param? Should we open a ticket for this?
Yes! I think we could do much better in a second iteration. A backoff of some sort with knobs we can change through the consensus.