It seems to me that if just a moment ago you tried to connect to a relay and you failed with connection refused, then maybe you already know what the answer is going to be for other circuits that show up in that same second.
How long should we cache the answer for? And, are there any anonymity implications to this design change? (We already batch circuit extend requests as they're waiting for the connect request to finish or fail, so I think whatever the anonymity implications are, maybe we already have them?)
I gave this ticket a new title to better reflect its urgency.
It's an important ticket because many fast relays are flooding their connection-tracking tables (or the equivalent) with this sustained flood of outgoing connection attempts.
Trac: Summary: "If you just failed to extend a circuit to a relay that's down, consider remembering that for a bit" to "All relays are constantly connecting to down relays and failing over and over"
I think we already do this part correctly. See the NCIRCS=3 argument for the event -- that's the number of circuits that got queued pending on this connection attempt's outcome.
I can take this one. But I want to get this clear. If I summarize what we have here and what we need to decide:
Fortunately, we already track OR connection failures in rephist -- see rep_hist_note_connect_failed() -- so we could use the or_history_t to know whether we should try to connect.
We'll need to decide on a "how long before I retry to reconnect" timing. We could do a simple backoff schedule like 1 sec, 5 sec, 10 sec, then give up (a rough sketch follows after these questions).
Once I've identified that tor can't reach a specific relay, how long should we consider it "unreachable"? Or should we repeat the above if no retries are pending?
Should we keep the EXTEND cells while waiting for a reasonable amount of time before reconnecting? That is, if we have a failure to connect, should we just DESTROY the circuit or retry to connect and send the cell(s) later?
Should we also note down disconnects, and cases like "I waited 60 seconds without hearing anything back from you", as potential failures?
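For the timing question above, here is a minimal sketch of what a fixed retry schedule could look like. The names are hypothetical, and later comments in this ticket settle on a flat 60-second window rather than a backoff, so this is only to make the option concrete.

```c
#include <stddef.h>

/* Hypothetical fixed schedule: 1s, 5s, 10s, then give up. */
static const int retry_delays[] = { 1, 5, 10 };   /* seconds between attempts */

/* Return the delay before retry number `n` (0-based), or -1 to give up. */
static int
next_retry_delay(size_t n)
{
  if (n >= sizeof(retry_delays) / sizeof(retry_delays[0]))
    return -1;   /* schedule exhausted: stop retrying */
  return retry_delays[n];
}
```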
Trac: Keywords: N/A deleted; tor-relay, tor-dos, performance added. Status: new to accepted. Priority: Medium to Very High. Owner: N/A to dgoulet
No need for a complex backoff thing. Just remember the failure for 60 seconds, and during those 60 seconds, send back a destroy immediately. Reducing from n attempts per minute down to 1 per minute should be enough to help us survive until the clients get a consensus update and stop asking us to try.
We should avoid having this cached-failure thing impact client behavior. That is, it should cause other people's circuits to get destroys, but it shouldn't auto-fail our own client attempts. Maybe we should change how the client behaves, but if so, let's do it later, and not introduce subtle breakage in something we'll be considering for backport.
Hm! I was going to say "but rep_hist_note_connect_failed() won't work if the relay isn't in our consensus", but actually, it is simply keyed on the intended identity digest of the next destination, so it does look like we can reuse the data struct. Except, shouldn't we also be caching the IP:port that failed? Otherwise somebody can ask us to extend to a victim digest at an unworkable IP:port, and we'll cache "failure!" and then refuse all the legit cells going to the right address for the next minute.
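To make the proposal concrete, here is a minimal sketch (not Tor's actual code) of the behavior being discussed: remember recent OR connection failures keyed by (identity digest, address, port), and for the next 60 seconds answer matching extend attempts with an immediate DESTROY instead of retrying the connect. All names are hypothetical, and the fixed-size array stands in for the real hash table.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define DIGEST_LEN 20
#define FAILURE_LIFETIME 60   /* seconds to remember a failed connect */
#define CACHE_SLOTS 1024

typedef struct connect_failure_entry_t {
  uint8_t digest[DIGEST_LEN];
  uint8_t addr[16];               /* address bytes, v4-mapped or v6 */
  uint16_t port;
  time_t last_failed_connect_ts;  /* 0 means "slot unused" */
} connect_failure_entry_t;

static connect_failure_entry_t failure_cache[CACHE_SLOTS];

/* Record that connecting to digest/addr/port just failed. */
void
note_connect_failed(const uint8_t *digest, const uint8_t *addr,
                    uint16_t port, time_t now)
{
  /* Sketch: take the first matching slot, or the first free one. */
  for (int i = 0; i < CACHE_SLOTS; i++) {
    connect_failure_entry_t *e = &failure_cache[i];
    bool matches = e->last_failed_connect_ts &&
                   e->port == port &&
                   !memcmp(e->digest, digest, DIGEST_LEN) &&
                   !memcmp(e->addr, addr, 16);
    if (matches || !e->last_failed_connect_ts) {
      memcpy(e->digest, digest, DIGEST_LEN);
      memcpy(e->addr, addr, 16);
      e->port = port;
      e->last_failed_connect_ts = now;
      return;
    }
  }
}

/* Should we answer this extend with an immediate DESTROY? */
bool
should_refuse_extend(const uint8_t *digest, const uint8_t *addr,
                     uint16_t port, time_t now)
{
  for (int i = 0; i < CACHE_SLOTS; i++) {
    const connect_failure_entry_t *e = &failure_cache[i];
    if (e->last_failed_connect_ts &&
        e->port == port &&
        !memcmp(e->digest, digest, DIGEST_LEN) &&
        !memcmp(e->addr, addr, 16))
      return (now - e->last_failed_connect_ts) < FAILURE_LIFETIME;
  }
  return false;
}
```

Because the key includes the address and port, an attacker pointing us at a victim digest with a bogus IP:port only poisons that one triplet, not the relay's real address.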
No need for a complex backoff thing. Just remember the failure for 60 seconds, and during those 60 seconds, send back a destroy immediately. Reducing from n attempts per minute down to 1 per minute should be enough to help us survive until the clients get a consensus update and stop asking us to try.
60 seconds seems sensible.
And it's about the timeframe it takes to restart a relay (tor takes 30s to shut down by default).
Anything less would be ineffective.
We should avoid having this cached-failure thing impact client behavior. That is, it should cause other people's circuits to get destroys, but it shouldn't auto-fail our own client attempts. Maybe we should change how the client behaves, but if so, let's do it later, and not introduce subtle breakage in something we'll be considering for backport.
Excluding origin circuits seems fine.
(So does including them, but only on relays. But we already have backoff and limits on origin circuits.)
Hm! I was going to say "but rep_hist_note_connect_failed() won't work if the relay isn't in our consensus", but actually, it is simply keyed on the intended identity digest of the next destination, so it does look like we can reuse the data struct. Except, shouldn't we also be caching the IP:port that failed? Otherwise somebody can ask us to extend to a victim digest at an unworkable IP:port, and we'll cache "failure!" and then refuse all the legit cells going to the right address for the next minute.
When we introduce IPv6 extends from relays (legacy/trac#24404 (moved)), we only want to fail attempts to a single IP address, and not the whole relay. So dual-stack relays will get two attempts to other dual-stack relays: an IPv4 attempt, and an IPv6 attempt. I like that design, it makes sense to try both.
Ok, after discussion on IRC with arma a bit this morning, I've ended up with a branch.
This is untested (although it compiles and chutney is happy); there are no unit tests nor proper "in the wild" testing done on it for now.
I just want to throw it here as a full draft because 1) it adds quite a bit of code and 2) I want a review of the code logic before going into review mode.
(FWIW, this has also been a finding of dawuud from months ago, where he scanned the Tor network for partitions and found that many high-bandwidth relays cannot contact other high-bandwidth relays. I guess he tried a single circuit build between those relays, so basically his findings are isomorphic to nusenu's.)
Thanks! Fixed the typos (I hope all of them!). I've merged your unit test, squashed all the things, rebased on latest 033 and added a changes file. Putting it in merge_ready for nickm's eyes.
CC src/or/connection_or.o
src/or/connection_or.c:1180:23: error: implicit conversion loses integer precision: 'uint64_t' (aka 'unsigned long long') to 'unsigned int' [-Werror,-Wshorten-64-to-32]
  unsigned int hash = tor_addr_hash(&entry->addr);
               ~~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/or/connection_or.c:1187:67: error: extra ';' outside of a function [-Werror,-Wextra-semi]
    or_connect_failure_ht_hash, or_connect_failure_ht_eq);
                                                          ^
src/or/connection_or.c:1191:48: error: extra ';' outside of a function [-Werror,-Wextra-semi]
    0.6, tor_reallocarray_, tor_free_);
the HT implementation only uses 32 bit hashes on 64-bit LP64 systems, like macOS and the BSDs, and 32-bit systems (split off into legacy/trac#25365 (moved))
the port is user-controlled, so it needs to be hashed before being combined with the other hashes
the HT implementation only uses 32 bit hashes on 64-bit LP64 systems, like macOS and the BSDs, and 32-bit systems (split off into legacy/trac#25365 (moved))
the port is user-controlled, so it needs to be hashed before being combined with the other hashes
What is the danger here? The length is fixed, so what is the difference between "+= 42" and "+= h(42)"?
For the hash fix, wouldn't it be more efficient to then combine addr + digest + port in a buffer and siphash that instead of doing 3 rounds of siphash?
the HT implementation only uses 32 bit hashes on 64-bit LP64 systems, like macOS and the BSDs, and 32-bit systems (split off into legacy/trac#25365 (moved))
the port is user-controlled, so it needs to be hashed before being combined with the other hashes
What is the danger here? The length is fixed, so what is the difference between "+= 42" and "+= h(42)"?
The hash needs to be unpredictable so that bad clients can't fill up one of your hash table buckets and cause your relay to slow down.
Adding the raw port gives the client direct control over 16 bits of your hash result, which makes the hash table less secure.
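A small sketch of the point above, with stand-in names only: if the raw 16-bit port is added into the bucket hash, a client choosing ports can shift the result by values it picks and steer entries toward one bucket; running the port through the keyed hash first removes that direct control. hash_bytes() here is an unkeyed FNV-1a used only to keep the sketch self-contained; the real code would use a keyed hash such as siphash.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a keyed hash (the real code would use siphash with a
 * secret key); FNV-1a is used here only so the sketch compiles alone. */
static uint64_t
hash_bytes(const void *buf, size_t len)
{
  const uint8_t *p = buf;
  uint64_t h = 0xcbf29ce484222325ULL;
  for (size_t i = 0; i < len; i++) {
    h ^= p[i];
    h *= 0x100000001b3ULL;
  }
  return h;
}

static uint64_t
bucket_hash_bad(uint64_t addr_hash, uint16_t port)
{
  /* The client can shift the result by any 16-bit value it chooses. */
  return addr_hash + port;
}

static uint64_t
bucket_hash_better(uint64_t addr_hash, uint16_t port)
{
  /* Hash the attacker-controlled field before mixing it in. */
  return addr_hash + hash_bytes(&port, sizeof(port));
}
```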
For the hash fix, wouldn't it be more efficient to then combine addr + digest + port in a buffer and siphash that instead of doing 3 rounds of siphash?
Hashing digest and port in a single buffer would be an easy win.
Hashing an extra buffer as part of tor_addr_hash() would also be great, because it avoids combining hashes with +.
But it would need a new argument to tor_addr_hash().
And we could fix the other use of tor_addr_hash() at the same time.
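Here is a sketch of the "one buffer, one hash pass" idea, again with hypothetical names. It reuses the hash_bytes() stand-in from the sketch above; the real code would feed the buffer to a keyed siphash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 20

/* Keyed-hash stand-in, defined in the previous sketch. */
uint64_t hash_bytes(const void *buf, size_t len);

static uint64_t
connect_failure_hash(const uint8_t *addr_bytes, size_t addr_len,
                     const uint8_t *digest, uint16_t port)
{
  uint8_t buf[16 + DIGEST_LEN + sizeof(uint16_t)];
  size_t off = 0;
  memset(buf, 0, sizeof(buf));          /* zero the fixed-width address slot */
  memcpy(buf + off, addr_bytes, addr_len);  /* at most 16 bytes */
  off += 16;
  memcpy(buf + off, digest, DIGEST_LEN);
  off += DIGEST_LEN;
  memcpy(buf + off, &port, sizeof(port));
  off += sizeof(port);
  return hash_bytes(buf, off);          /* single pass over the triplet */
}
```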
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
On unable_connect_since_ts: Do we ever clear this field (re-set it to zero) on success? It seems like it just stays set forever. If so maybe a better name would be something like "last_failed_connect_ts"?
Oh, and one more thing I noticed: It is not enough to just check "node_get_pref_orport(node, &node_addr_port);" -- the node may have multiple addresses, and "pref" just returns the one you would prefer to use yourself.
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
Oh damn, because of the union, if it is a v4, we'll copy a bunch of uninitialized bytes from the larger v6. I guess setting the size according to the family is the way to do it.
On unable_connect_since_ts: Do we ever clear this field (re-set it to zero) on success? It seems like it just stays set forever. If so maybe a better name would be something like "last_failed_connect_ts"?
It is a timestamp; we don't need to reset it. It is always compared to a cutoff, so even though it is set once and never touched again, that is fine since it will at some point always pass the cutoff.
Oh, and one more thing I noticed: It is not enough to just check "node_get_pref_orport(node, &node_addr_port);" -- the node may have multiple addresses, and "pref" just returns the one you would prefer to use yourself.
Not sure it is a problem here. We use node_get_pref_orport() only to make sure the OR connection we are about to note down as a failure matches the node_t we have from the identity digest. In this case, we use the node_t to keep the timestamp. If it doesn't match (because of another port/addr for that identity), we note it down as "unknown".
This ends up being the same result, just always indexed more precisely by addr + port + digest.
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
Oh damn, because of the union, if it is a v4, we'll copy a bunch of uninitialized bytes from the larger v6. I guess setting the size according to the family is the way to do it.
Ok, I made an attempt at this in fixup 53a26acb6172b7eb. Not the prettiest, but I don't see another way. Maybe we can abstract that into an inline function somehow, like "tor_addr_get_ptr_size()" or something along those lines?
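For illustration, here is a sketch of the family-size fix under discussion, using a stand-in struct rather than the real tor_addr_t: only the bytes that are meaningful for the address family get copied into a zeroed buffer, so uninitialized union bytes from a v4 address never reach the hash input. The helper name echoes the "tor_addr_get_ptr_size()" idea floated above and is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET / AF_INET6 */

typedef struct sketch_addr_t {
  int family;              /* AF_INET or AF_INET6 */
  union {
    uint8_t v4[4];
    uint8_t v6[16];
  } addr;
} sketch_addr_t;

/* Return a pointer to the significant bytes and set *len_out accordingly. */
static const uint8_t *
addr_bytes_and_len(const sketch_addr_t *a, size_t *len_out)
{
  if (a->family == AF_INET) {
    *len_out = sizeof(a->addr.v4);
    return a->addr.v4;
  }
  *len_out = sizeof(a->addr.v6);
  return a->addr.v6;
}

/* Copy only the initialized part of the address into a zeroed hash-input
 * buffer, so the result is deterministic for both families. */
static void
copy_addr_for_hashing(const sketch_addr_t *a, uint8_t out[16])
{
  size_t len;
  const uint8_t *bytes = addr_bytes_and_len(a, &len);
  memset(out, 0, 16);
  memcpy(out, bytes, len);
}
```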
So on the hash function: You can't use a memcpy(tor_addr_t) like that, because some of the bytes of the tor_addr_t can be unspecified. I can fix that up.
Oh damn, because of the union, if it is a v4, we'll copy a bunch of uninitialized bytes from the larger v6. I guess setting the size according to the family is the way to do it.
On unable_connect_since_ts: Do we ever clear this field (re-set it to zero) on success? It seems like it just stays set forever. If so maybe a better name would be something like "last_failed_connect_ts"?
It is a timestamp; we don't need to reset it. It is always compared to a cutoff, so even though it is set once and never touched again, that is fine since it will at some point always pass the cutoff.
Is it okay with you if I change the name anyway? The problem is that the "since" implies a continuous condition. If I say "I have been unable to eat broccoli since 1982", it does not mean only that in 1982 I failed to eat broccoli: it also means that at every time since 1982, I have also been unable to eat broccoli.
Oh, and one more thing I noticed: It is not enough to just check "node_get_pref_orport(node, &node_addr_port);" -- the node may have multiple addresses, and "pref" just returns the one you would prefer to use yourself.
Not sure it is a problem here. We use node_get_pref_orport() only to make sure the OR connection we are about to note down as a failure matches the node_t we have from the identity digest. In this case, we use the node_t to keep the timestamp. If it doesn't match (because of another port/addr for that identity), we note it down as "unknown".
This ends up being the same result, just always indexed more precisely by addr + port + digest.
But the preferred address can change while Tor is running, I think, based on configuration options. That would make the value cached in the node become incorrect, and that's why I think we should maybe just not have this in the node_t at all.
Is it okay with you if I change the name anyway? The problem is that the "since" implies a continuous condition. If I say "I have been unable to eat broccoli since 1982", it does not mean only that in 1982 I failed to eat broccoli: it also means that at every time since 1982, I have also been unable to eat broccoli.
Renamed unable_connect_since_ts in commit 0410a5e49c.
But the preferred address can change while Tor is running, I think, based on configuration options. That would make the value cached in the node become incorrect, and that's why I think we should maybe just not have this in the node_t at all.
If a node_t's preferred address changes, indeed we won't match the connection to a node_t anymore, and thus we will fall back to using the cache for unknown nodes. And that object in the cache will be used until the node_t gets modified, I guess by a new descriptor at some point. So all in all, we'll still be tracking connection failures correctly.
The only reason I'm still hesitating a bit about using the "unknown cache" for everything is performance. If we used it for everything, then at every connection we would need to do a lookup whose hash function is a bit heavier (the addr + port + digest triplet) than simply looking up the node list. Furthermore, every 60 seconds (OR_CONNECT_FAILURE_CLEANUP_INTERVAL), we clean up that cache, meaning we can iterate over thousands of objects in there (potentially more than the number of relays). I don't think it would be a significant cost, but it is still important to keep in mind. Keeping it tiny can be worth it.
However, just using the cache (and not the node_t) would make the code a bit simpler; we would just need to accept the additional memory and lookup/iteration price.
My testing on a real relay shows that the majority of the time, we'll get our information out of the node_t.
At this point, I could lean towards either side, so I might just need a second opinion?
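For reference, the periodic cleanup mentioned above could look roughly like this sketch (hypothetical names; only OR_CONNECT_FAILURE_CLEANUP_INTERVAL comes from the branch being discussed): walk the cache and drop entries whose last failure is older than the lifetime, so the table stays bounded even though entries are never reset on success.

```c
#include <stddef.h>
#include <time.h>

#define OR_CONNECT_FAILURE_CLEANUP_INTERVAL 60  /* from the branch above */
#define FAILURE_LIFETIME 60                     /* hypothetical */

typedef struct cache_entry_t {
  time_t last_failed_connect_ts;
  int in_use;
} cache_entry_t;

/* Called every OR_CONNECT_FAILURE_CLEANUP_INTERVAL seconds. */
static void
cleanup_failure_cache(cache_entry_t *entries, size_t n, time_t now)
{
  for (size_t i = 0; i < n; i++) {
    if (entries[i].in_use &&
        now - entries[i].last_failed_connect_ts > FAILURE_LIFETIME) {
      /* In the real code: remove from the hash table and free the entry. */
      entries[i].in_use = 0;
    }
  }
}
```

The cost is one linear pass per minute over however many triplets have failed recently, which is the iteration price weighed above.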
I've created a _03 branch which squashes everything, because the latest changes basically modified half the patch. The code should be simpler and more straightforward to understand. The use of the node_t has been completely removed.
On a meta level, should we open a ticket about documenting this behavior in our spec? Also, would it be better in the future if we switch from the static 60-second lifetime to some sort of backoff algorithm, or to a consensus param? Should we open a ticket for this?
Okay; much simpler. I've merged it to maint-0.3.3 and forward. Let's close this once there are follow-up tickets for the issues that asn raises in comment 41 above?
On a meta level, should we open a ticket about documenting this behavior in our spec? Also, would it be better in the future if we switch from the static 60-second lifetime to some sort of backoff algorithm, or to a consensus param? Should we open a ticket for this?
Yes! I think we could do much better in a second iteration. A backoff of some sort with knobs we can change through the consensus.