# Tor issues
https://gitlab.torproject.org/tpo/core/tor/-/issues

## tls: `buf_read_from_tls()` is exiting its loop too early
https://gitlab.torproject.org/tpo/core/tor/-/issues/40006 — David Goulet (dgoulet@torproject.org), 2022-01-18

This is a bit of a deep dive so you've been warned.
#### Context
The `buf_read_from_tls()` function is called when a connection is ready to read data from a TLS socket into its inbuf (`buf_t`).
That function loops over the total length of data we are allowed to consume (that value is affected by bandwidth limitation). In the general case this is 16k bytes, or 32 cells, the hard-coded maximum allowed per mainloop round of a connection read. The parameter is `at_most` and it is the maximum amount of data we are asking to read.
#### Wow OpenSSL
Within the read loop, we call `read_to_chunk_tls()` which essentially boils down to calling `SSL_read(at_most)` (with tor using OpenSSL). It is given the `at_most` bytes value and thus the expectation is that we'll get that amount of bytes back from the TLS socket.
Now it turns out that `SSL_read()` only returns a TLS record and will buffer the rest up to `at_most` if that limit is higher than a record size (in our case a record size is 4096 bytes).
See the man page for a better explanation. I might not have fully understood it, but experimental data shows that we get 4k bytes every time: https://www.openssl.org/docs/man1.0.2/man3/SSL_read.html
And thus the takeaway here is that `read_to_chunk_tls()` called with the upper limit of `16k` bytes will always only return `4k` bytes.
#### The Bug
Putting everything together, let's look at the reading loop in `buf_read_from_tls()` and see if you can spot the problem:
```c
while (at_most > total_read) {
  [...]
  r = read_to_chunk_tls(buf, chunk, tls, readlen);
  if (r < 0)
    return r; /* Error */
  tor_assert(total_read+r <= BUF_MAX_LEN);
  total_read += r;
  if ((size_t)r < readlen) /* eof, block, or no more to read. */
    break;
}
```
Notice the `if` condition at the end of the loop: it assumes that if the returned length is smaller than the one we asked for, we've reached EOF (exactly how `read()` behaves). But in the case of `SSL_read()` this is not true, and so we bail every time after reading only a fraction of `at_most`, namely one single TLS record.
I was able to confirm this by adding custom `log_debug()` calls:
```
Jun 19 13:53:13.551 [debug] buf_read_from_tls(): tls: read len: 14858, at_most: 14858, total: 0
Jun 19 13:53:13.551 [debug] tor_tls_read(): tls read: at_most 14858, read: 4048
```
You can see that the read is only `4048` bytes (TLS overhead is removed, thus less than 4096).
#### Solution
The obvious solution here would be to remove that `if` condition and let tor consume up to `at_most` bytes.
I did run an experiment with this and was able to see my onion service consuming 32 cells per round instead of 8.
#### Performance
Now, the naive thinking here would be: "Ok wow, if we make tor read 4 times more cells, will tor's cell relaying improve fourfold?". In practice, probably not.
But this will obviously improve performance in some ways. Take the KIST scheduler, which runs every 10 msec: it will now be able to make decisions based on 4 times the number of cells it used to see. If the TCP window allows it, more cells will go on the wire. But I also strongly believe EWMA priority will get better here, because circuits will see more cells and thus the backoffs will get sharper per circuit, instead of constantly hovering around at most 8-16 cells per round.
It remains unclear if this will trigger more congestion on the network, or buffer bloat, or anything unexpected. Because of that, we might want to not backport this right away and instead watch for a progression from newer relays.
And of course, we need the shadow people to help us measure this quantitatively :).
**Note**: I was able to find this issue because I was measuring, with the tracing branch #32910, the onion service timings to handle lots of INTRO2 cells under DDoS and noticed that we would never process more than 8 cells per mainloop round.

## Make `ProtocolWarnings 1` less noisy to run
https://gitlab.torproject.org/tpo/core/tor/-/issues/40400 — Nick Mathewson, 2021-10-20

While investigating the right fix for #40175, Sebastian reminded me that we get lots of frequent protocol warnings. We should make these less frequent so that it's less of a burden to run Tor with protocol warnings enabled.
The most common warnings that Sebastian encountered are:
```
84853 Rejecting RENDEZVOUS1 cell with unrecognized rendezvous cookie
9975 Didn't recognize cell, but circ stops here
10002 circuit_receive_relay_cell (forward) failed
```
Options are:
* Downgrade to INFO or DEBUG.
* Add rate-limiting
* Identify cases that truly should be protocol warnings, and downgrade or rate-limit the other cases.

Milestone: Tor: 0.4.7.x-freeze

## git hooks: Stop running practracker constantly
https://gitlab.torproject.org/tpo/core/tor/-/issues/40019 — David Goulet (dgoulet@torproject.org), 2020-10-22

Practracker runs on every commit with the pre-commit hook. It does _not_ run on `maint-` branches, but it runs on master and also on EVERY development branch anyone creates.
The reason is simple: we never merge things that break practracker into master, so it makes sense that every dev branch is tested.
However, I'm arguing here that running practracker constantly is a waste of time and extremely annoying for development.
There is value in practracker, but I think we could simply run it once in a while, maybe every release: update the exception file and be done with it. There is even an argument that one should only run it to measure our technical debt in some way, or when wanting to start fixing part of it.
Again, I don't see value in constantly having a perfectly up-to-date `exceptions.txt` file. There is value in having the tool available in our repository, but not in running it all the time. Yes, things can break it, but that is why we can simply run it as part of our release process.
Thus I vote to remove it from our `pre-commit-git.hook` file.
Thoughts @tpo/core?

## Don't exit if a service directory has non-private permissions when using TestingTorNetwork
https://gitlab.torproject.org/tpo/core/tor/-/issues/40338 — opara, 2021-07-01

When running tor with `TestingTorNetwork` enabled, don't exit if the hs directory has other/world permissions set.
For Tor experiments with configurations that are tracked in git, it is a bit of a pain to fix up ownership of the configuration files before running tor. To work around this, Shadow puts all of the files in an archive to preserve permissions and [commits the archive to git](https://github.com/shadow/shadow-plugin-tor/tree/main/resource), but it would be nice to be able to commit the configuration files to git directly and track changes properly.
In almost all cases where the permissions are incorrect, tor will automatically fix them and continue running, but the only exception to this that I can find is the hidden service directory.
As the `TestingTorNetwork` is explicitly about running a testing Tor network, I don't see any security implications of disabling the permissions check when this option is enabled.
(PR incoming...)

## Resurrect Conflux Proposal
https://gitlab.torproject.org/tpo/core/tor/-/issues/40202 — Mike Perry, 2021-04-06

@dgoulet wrote an initial conflux proposal draft. In the network-team meeting on Nov 17th, we discussed reviving it, but based on Proposal 324.
Things we definitely need:
- Unreliable conflux (no resumption windows)
- Use Prop324 congestion window information to decide which circ to send data on
- Add sequence numbers to relay cells on each leg (Prop 325? Or other approach?)
- Minimize the reorder window at Exits for stream assembly (pushback tradeoffs, etc)
- Side channel analysis section
Nice to have:
- Pre-emptive circuit discussion
- Minimize client reorder window
- Resumption if a circuit leg dies
This is Sponsor 61, Objective 3.1 material.

Label: Sponsor 61 - Making the Tor network faster & more reliable for users in Internet-repressive places

## add unix Socket support to MetricsPort
https://gitlab.torproject.org/tpo/core/tor/-/issues/40192 — nusenu, 2024-03-05

tor recently got support for `MetricsPort` in v0.4.5.1-alpha (#40063).
According to the man page MetricsPort does not support unix sockets (like SocksPort or ControlSocket).
Using a socket allows for better access control than, for example, binding it to localhost and only allowing localhost in MetricsPortPolicy, because that would allow any local process to access it.

## Link Tor's code into one big .a file
https://gitlab.torproject.org/tpo/core/tor/-/issues/40127 — Nick Mathewson, 2020-10-23

Managing all our little libraries has proved too much for too many downstream apps. Let's make one big static library.

Milestone: Tor: 0.4.5.x-stable

## Inbuf for outgoing SOCKS 4 proxy not cleared before reading from OR connection
https://gitlab.torproject.org/tpo/core/tor/-/issues/40017 — opara, 2022-07-07

When configured to use an outgoing SOCKS 4 proxy (as set with `Socks4Proxy ip:port`), tor does not clear the connection buffer after the SOCKS handshake and before reading bytes from the OR connection. A proxy can send extra characters at the end of the handshake which will be stored on the OR connection inbuf as if they were authenticated bytes from the first hop.
This is not actually exploitable because of a check in `connection_or_process_inbuf()`, which will close the connection if it sees that there are bytes on the inbuf while the connection is in the TLS handshaking state. This check was originally designed to protect relays and not clients, but luckily it actually protects the client in this case as well.
```c
/* This check was necessary with 0.2.2, when the TLS_SERVER_RENEGOTIATING
 * check would otherwise just let data accumulate. It serves no purpose
 * in 0.2.3.
 *
 * XXXX Remove this check once we verify that the above paragraph is
 * 100% true. */
if (buf_datalen(conn->base_.inbuf) > MAX_OR_INBUF_WHEN_NONOPEN) {
  log_fn(LOG_PROTOCOL_WARN, LD_NET, "Accumulated too much data (%d bytes) "
         "on nonopen OR connection %s %s:%u in state %s; closing.",
         (int)buf_datalen(conn->base_.inbuf),
         connection_or_nonopen_was_started_here(conn) ? "to" : "from",
         conn->base_.address, conn->base_.port,
         conn_state_to_string(conn->base_.type, conn->base_.state));
  connection_or_close_for_error(conn, 0);
  ret = -1;
}
```
I think that the comments here should be changed to show that this check is still important, and it would be good to also clear the inbuf when transitioning out of the `OR_CONN_STATE_PROXY_HANDSHAKING` state. The best solution would probably be to not use a single buffer for both authenticated and unauthenticated data, but this would probably take more work. I have also not looked at the other proxy types, but the SOCKS 5 proxy looks okay at first glance.
If you remove the check above (as mentioned in the comment), a SOCKS 4 proxy that appends for example `\x00\x00\x07\x00\x00\x01` (a VERSIONS cell) to the proxy reply will cause the tor client to log:
```
channel_tls_process_versions_cell(): Couldn't find a version in common between my version list and the list in the VERSIONS cell; closing connection.
```
This only causes the client to drop the connection (so the client can't connect to the tor network), but as the proxy would be able to bypass the TLS authentication for the first cells on a connection (*if the check above were removed*) and make the client think that it received cells that the first hop didn't send, there could potentially be other interesting things you could do with other cell types.

Milestone: Tor: 0.4.5.x-stable

## Consider setting TCP_NOTSENT_LOWAT to 1 byte on our TCP connections
https://gitlab.torproject.org/tpo/core/tor/-/issues/33763 — David Goulet (dgoulet@torproject.org), 2022-06-22

This TCP option would provide us an important performance benefit at the relay side.
As part of an effort to work on a new tor design that reduces kernel queuing delays (think of it as an improvement to KIST and the tor architecture), Rob and I submitted this paper to the NetDev 0x14 conference in the hope of sitting down with Linux kernel developers and network experts to discuss the idea.
https://www.robgjansen.com/publications/epollcwnd-netdev2020.pdf
Ultimately we want a new TCP polling option in the kernel; our experimentation shows that it improves performance drastically.
But it turns out that just setting `TCP_NOTSENT_LOWAT` to 1 byte would already improve performance significantly, close to what we wanted. See `Figure 1.` (page 3) in the paper for the performance improvements.
Setting `TCP_NOTSENT_LOWAT` is a minimal change to tor and thus we should do it.

## add support for multiple OutboundBindAddressExit IP(ranges)
https://gitlab.torproject.org/tpo/core/tor/-/issues/26646 — nusenu, 2023-04-03

tor has support for dedicated outbound IP addresses
on exit relays via OutboundBindAddressExit.
This parameter supports only a single IPv4 and a single IPv6 address.
I propose to add an extension of this feature to support IPv4 and IPv6
ranges/prefixes.
The idea is to assign an IP address to each tor circuit. The exit IP
address must never change during the lifetime of the circuit.
Exit IP addresses would be randomly assigned to circuits. Once
the exit runs out of IPs it cycles through its pool of IPs again.
With IPv6 address space availability this can take a long time; with IPv4 it will be limited.
This aims to reduce the negative impact of few "bad" users on many "good"
users since they will not share the same IP address on the exit.
This might also have a negative(?) side effect, since it demultiplexes tor clients to multiple source IPs on the exit, and an external observer (not running the exit itself) can tell clients apart by looking at source IPs.
Instead of doing it at the circuit level, you could do it based on time: change the exit IP every 5 minutes (but do _not_ change the exit IPs for _existing_ circuits, even if they live longer than 5 minutes).
https://lists.torproject.org/pipermail/tor-dev/2018-March/013036.html

Milestone: Tor: 0.4.8.x-freeze

## Retry on a new circuit for more reasons
https://gitlab.torproject.org/tpo/core/tor/-/issues/7870 — Roger Dingledine, 2023-04-03

In git commit 388ac4126 I added some code that will never be reached:
```c
if (rh->length > 0 && edge_reason_is_retriable(reason) &&
[...]
    case END_STREAM_REASON_CONNECTREFUSED:
      if (!conn->chosen_exit_optional)
        break; /* break means it'll close, below */
      /* Else fall through: expire this circuit, clear the
       * chosen_exit_name field, and try again. */
[...]
    case END_STREAM_REASON_TIMEOUT:
```
since those two reasons aren't included in `edge_reason_is_retriable()`.
One fix would be just to remove those dead code lines.
But another fix would be to include both of those reasons in the is_retriable() list, to better tolerate broken exits -- for example, an exit that sets `-j REJECT` in an iptables rule rather than putting everything in its exit policy, or an exit for which the routing to the destination has a loop, causing the TTL to expire and some internet router to send back an ICMP unreachable.
The downsides are a) it will take many seconds longer before your connection fails, if it fails, and b) we throw away circuits even more often, since we'll now chew through several circuits every time you attempt a destination that is down or unreachable.
The upside is that users will see fewer false failures, in proportion to how many broken or misconfigured exits there are.
I'm not entirely sure which side of the fence I'm on here. I think I lean slightly towards retrying for these two cases.

## dirvote.c: maxunmeasurdbw is misspelled
https://gitlab.torproject.org/tpo/core/tor/-/issues/40869 — Nick Mathewson, 2023-10-26

Have a look at this (from dirvote.c, in `networkstatus_compute_consensus`):
```c
{
  if (consensus_method < MIN_METHOD_FOR_CORRECT_BWWEIGHTSCALE) {
    max_unmeasured_bw_kb = (int32_t) extract_param_buggy(
               params, "maxunmeasuredbw", DEFAULT_MAX_UNMEASURED_BW_KB);
  } else {
    max_unmeasured_bw_kb = dirvote_get_intermediate_param_value(
               param_list, "maxunmeasurdbw", DEFAULT_MAX_UNMEASURED_BW_KB);
    if (max_unmeasured_bw_kb < 1)
      max_unmeasured_bw_kb = 1;
  }
}
```
See the problem? In the first case, we are checking an option called "maxunmeasuredbw". In the second case, we are checking "maxunmeasurdbw".
Here it is in monospace:
* `maxunmeasuredbw`
* `maxunmeasurdbw`
Whooops! I must have introduced this bug back in nickm/tor@fb3704b45982e6a97dbad4e2d6e9cf7ba8fd1151, which went into 0.4.6.1-alpha.
I see three fixes:
1. Document the current behavior in dir-spec and param-spec. If HTTP can live with the spelling of "referer", we can leave this as-is.
2. Add a new consensus method to fix the spelling.
3. Go into a corner and curse the general state of computing.
I favor option 1.
Found while working on #40835.

## Add padding to INTRODUCE1 messages
https://gitlab.torproject.org/tpo/core/tor/-/issues/40865 — Nick Mathewson, 2023-09-26

See torspec!167 and related tickets for rationale.

## Do not apply prop275 published-time transformation to bridge-authority pseudo-consensus
https://gitlab.torproject.org/tpo/core/tor/-/issues/40855 — Nick Mathewson, 2023-09-18

## Remove obsolete consensus methods, 2023 edition
https://gitlab.torproject.org/tpo/core/tor/-/issues/40835 — Nick Mathewson, 2023-10-05

I suggest that we remove obsolete consensus methods, per proposal 290.
Specifically, we should remove support for every consensus method before method 32, since that is the highest method supported by 0.4.7.7.
@dgoulet if you concur I can go ahead and do this.

## Congestion control should not update unless cwnd is full
https://gitlab.torproject.org/tpo/core/tor/-/issues/40732 — Mike Perry, 2023-01-17

While staring at conflux algorithms, I found a very interesting congestion control bug: we're updating the congestion window even if it is not full. This means that chatty protocols that don't cause queue may inflate their congestion window too much and overshoot. Then, when they do have data to send, they cause queue overload.
There's a few possible fixes. Here's a simple one: https://gitlab.torproject.org/mikeperry/tor/-/commit/663bfb5584ef1e8e2d35dd044c2924cc6b298bb5
The benefits from a shadow run with this were considerable, without much impact on performance. Queue use over 1000 disappeared, so we can lower `circ_max_cell_queue_size` down to ~750.
Other options include using `inflight` instead of `cwnd` to estimate BDP and queue use, in which case we could still backoff, but we just would not increase the cwnd in these cases. I am going to test those.
This may also enable us to make our congestion control parameters more aggressive again. Quite the find.
Cc: @dgoulet

Milestone: Tor: 0.4.7.x-post-stable

## Don't assign a high bandwidth weight to Authorities
https://gitlab.torproject.org/tpo/core/tor/-/issues/40698 — Roger Dingledine, 2022-10-28

tl;dr: dizum has a consensus weight of 18000 right now, which is silly because it means it's overloaded doing non dir auth things, which is in turn making it bad at doing dir auth things.
Background:
In the beginning, we invented the "MaxAdvertisedBandwidth" config option to let relays limit the amount of attention they would get from clients for ordinary circuit roles. Directory authorities used (and still use) this config option to leave most of their bandwidth for their dir auth functions.
But then the bwauth design came along, and suddenly you could get assigned a high consensus weight even if your descriptor claims a low rate limit.
So we put some hack in place to avoid giving high weights to relays with the Authority flag. I don't remember which hack it was -- whether bwauth stopped measuring or expressing opinions about those relays, whether authorities skipped over weights in the bw file if they're for Authorities, or what.
But we seem to have lost that hack now, or maybe I dreamed it and we never had it in the first place. In any case, I think we want one again/now. Here are two ways we could implement it:
* In C-Tor. For example, in routerstatus_format_entry() where we do:
```
if (format == NS_V3_VOTE && vrs && vrs->has_measured_bw) {
  smartlist_add_asprintf(chunks,
                         " Measured=%d", vrs->measured_bw_kb);
}
```
we could add `&& !rs->is_authority` to that if.
* In sbws, we avoid measuring relays with the Authority flag, or we measure them but we don't put an opinion into the V3BandwidthsFile, or something similar. [Edit: turns out there's a ticket for that idea! https://gitlab.torproject.org/tpo/network-health/onbasca/-/issues/62]
I would slightly favor doing the check in Tor rather than sbws, (a) so sbws doesn't need to grow more complexity and surface area about what should get measured and what shouldn't, and (b) so I don't open another ticket like this in five years talking about a missing feature in sbws's successor.
See https://gitlab.torproject.org/tpo/core/tor/-/issues/3023 for the much bigger issue.

Milestone: Tor: 0.4.7.x-post-stable

## channelpading_get_netflow_inactive_timeout_ms is fragile when high_timeout < low_timeout
https://gitlab.torproject.org/tpo/core/tor/-/issues/40645 — Nick Mathewson, 2022-08-11

See `channelpading_get_netflow_inactive_timeout_ms` and `channelpadding_update_padding_for_channel`.
Nothing there, so far as I can tell, actually enforces `high_timeout > low_timeout` in the function that generates the next timeout. So for example, if somebody sends a padding negotiate cell with `low_timeout` set very high, we can get a negative value when we compute `high_timeout - low_timeout`.
This is bad! When we pass that to crypto_rand_int(), we assert that the input is less than `INT_MAX - 1`, to detect exactly this kind of underflow.
Fortunately, I think we cannot run into that situation from an errant `PADDING_NEGOTIATE` cell: `channelpadding_update_padding_for_channel` makes sure that the high_timeout on each channel is at least the low timeout as it processes those cells. (I tested this with some code that sent those cells on a chutney network.)
Nevertheless, I would like to apply the following patch for defense-in-depth:
```diff
diff --git a/src/core/or/channelpadding.c b/src/core/or/channelpadding.c
index 47a04e5248caec..1f559f6c420d6c 100644
--- a/src/core/or/channelpadding.c
+++ b/src/core/or/channelpadding.c
@@ -186,7 +186,7 @@ channelpadding_get_netflow_inactive_timeout_ms(const channel_t *chan)
     high_timeout = MAX(high_timeout, chan->padding_timeout_high_ms);
   }
-  if (low_timeout == high_timeout)
+  if (low_timeout >= high_timeout)
     return low_timeout; // No randomization
```
What do you think?
I'm going to leave this ticket as confidential until somebody else can confirm that there isn't a way to trigger this in practice.
cc @mikeperry
Found while investigating #40591.

## Confidential - TROVE-2022-001: Congestion control RTT injected delay
https://gitlab.torproject.org/tpo/core/tor/-/issues/40626 — David Goulet (dgoulet@torproject.org), 2022-07-26

Public ticket that started this: https://gitlab.torproject.org/tpo/core/tor/-/issues/40624
This only affects >= tor-0.4.7.2-alpha. I have reserved TROVE-2022-001 at the moment and set it to HIGH considering the remote nature of the bug and its consequences on the network.
## What
It appears that congestion control can enter a state that makes it never exit the CC slow-start phase. In concrete terms this means that tor can never grow past its "initial congestion window" (set at 2 cells right now), thus having extremely slow circuits. As a client, we are talking a couple KB/sec.
This can in theory be triggered in two ways, one of which can be done remotely:
1. A clock jump.
2. A tor withholding a `SENDME` for a couple of minutes would also trigger this condition.
(2) is the one that is very worrying, because anyone can trigger it. A malicious client could do that to an onion service, effectively turning "off" congestion control for that service and sending a pretty huge signal to a Guard/Middle relay.
But it is likely also possible for a mobile tor client to go dormant just before needing to send a `SENDME` and then come back online much later, sending it and thus triggering this condition on the endpoint (onion service, relay).
Another possibility, which my non-Exit relay ended up in, is for a directory request to stall long enough to lead to this problem. Directory authorities are often overwhelmed or heavily throttled/DPI'd (Faravahar).
Or a malicious client could upload a descriptor to an HSDir (any relay) and withhold that `SENDME`, again triggering this problem and rendering the relay almost unusable.
In a nutshell, the network can come to a grinding halt if we don't fix this; otherwise we need to disable CC asap.
## How
The problem lies in `time_delta_stalled_or_jumped()`, which checks if the circuit's new RTT is very much out of range from the previous one. In that case, it sets `is_monotime_clock_broken = true`, which is global to tor, as in affecting all circuits. It then returns `true`, so the circuit RTT is **not** updated because we believe the clock is no bueno.
But from that point on, every call to `time_delta_stalled_or_jumped()` will return `true` because of the guard `if (old_delta == 0)`: `old_delta` is `circuit->cc->ewma_rtt_usec`, which starts at 0 for a new circuit, and now that `is_monotime_clock_broken = true` it will stay 0, with the flag never able to come back to `false`.
This means the circuit never gets to measure its RTT and thus never exits slow start.
## Solution
Proposed patch by @mikeperry: https://gitlab.torproject.org/mikeperry/tor/-/commit/4bdcfdf6b9cd7d11cac01eb8e4e2d91bdd5fc11b

Milestone: Tor: 0.4.7.x-post-stable

## Typo in microdesc.c
https://gitlab.torproject.org/tpo/core/tor/-/issues/40619 — cypherpunks, 2022-07-21

The file src/feature/nodelist/microdesc.c currently has the following strings:
```c
if (tor_memeq(node->rs->descriptor_digest,
              (*mdp)->digest, DIGEST256_LEN)) {
  rs_match = "Microdesc digest in RS matches";
} else {
  rs_match = "Microdesc digest in RS does match";
}
```
It looks like a typo that should read "digest in RS does *not* match".

Milestone: Tor: 0.4.8.x-freeze