Trac issues
https://gitlab.torproject.org/legacy/trac/-/issues
2020-06-13T15:48:04Z

https://gitlab.torproject.org/legacy/trac/-/issues/32438
Inconsistent failure-then-success bootstrap behavior with clock set 24h in the past
2020-06-13T15:48:04Z
intrigeri
Context: I'm investigating which part of Tails' crazy clock fixing dance we can remove thanks to https://trac.torproject.org/projects/tor/ticket/24661. Corresponding Tails ticket: https://redmine.tails.boum.org/code/issues/16471.
Environment: Tor Browser 9.0.1 x86_64 on Debian unstable, clock set 24h before UTC. Tested both with direct connection to the Tor network and with bridges.
The first time I click "Connect" in Tor Launcher, I see a bootstrapping error (see attached screenshot and Tor logs):
```
Tor failed to establish a Tor network connection.
Loading authority certificates failed (Clock skew -81944 in microdesc flavor consensus from CONSENSUS - ?).
```
I guess that's kind of expected despite the improvements implemented as part of https://trac.torproject.org/projects/tor/ticket/24661.
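For reading the number in that message, here is a minimal sketch. It assumes the skew value is in seconds and that a negative value means the local clock is behind; both are inferred from the log excerpts quoted in these tickets, not from tor's source. `describe_skew` is a hypothetical helper.

```python
def describe_skew(skew_seconds: int) -> str:
    """Render a clock-skew value (assumed to be in seconds) in a readable form."""
    # Assumption: negative means our clock is behind, positive means ahead.
    direction = "behind" if skew_seconds < 0 else "ahead"
    remaining = abs(skew_seconds)
    days, remaining = divmod(remaining, 86400)
    hours, remaining = divmod(remaining, 3600)
    minutes, seconds = divmod(remaining, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s {direction}"


print(describe_skew(-81944))  # 0d 22h 45m 44s behind
```

Under those assumptions the reported skew is a little under 23 hours rather than a full 24.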
But then, if I click "Reconfigure" and then "Connect", tor bootstraps successfully. This surprises me and feels inconsistent: the way I see it, either the clock is skewed enough for tor to fail bootstrapping, and then it should not succeed on second try; or tor can somehow deal with skewed clock, and then it should succeed on first try.
Kathleen Brade

https://gitlab.torproject.org/legacy/trac/-/issues/28813
confirm hypothesis re PT jump-to-80%
2020-06-13T15:35:30Z
Taylor Yu
traumschule reported a jump-to-80% situation that was apparently not resolved by the recent 0.3.5 bootstrap reporting changes. Check whether this is explained by the current situation where the TCP connection succeeding is enough progress to start reporting directory progress, regardless of whether the TCP connection is to a local proxy or an actual relay.
Tor: 0.4.0.x-final
Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/28654
Allow relays to serve future consensuses
2020-06-13T15:34:50Z
teor
Like #28591 for clients, we should allow relays to serve future consensuses.
Tor: 0.4.0.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/28591
Accept a future consensus for bootstrap
2020-06-13T16:06:33Z
teor
#24661 allows tor to bootstrap when the client's clock is ahead of the network by up to 1 day.
But clients can't bootstrap when the client's clock is behind the network by more than a few hours:
https://trac.torproject.org/projects/tor/ticket/24661#comment:18
Tor: 0.3.5.x-final
teor

https://gitlab.torproject.org/legacy/trac/-/issues/28319
accept a reasonably live consensus for path selection
2020-06-13T15:33:46Z
teor
When I fixed guard selection in #24661, tor said:
```
Nov 05 15:29:55.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no recent usable consensus.
```
Maybe this is a logging issue, maybe it's another constraint we need to fix.
See the full log in:
https://trac.torproject.org/projects/tor/ticket/24661#comment:13
Tor: 0.4.0.x-final
teor

https://gitlab.torproject.org/legacy/trac/-/issues/28255
verify guard selection consensus expiry constraints
2020-06-13T15:33:32Z
Taylor Yu
The hypothesis in #23605 is that bootstrapping can get stuck at #23605 if there is enough clock skew for the consensus to be expired but still "reasonably live". Let's verify this and try to record more details.
Tor: 0.4.0.x-final
teor

https://gitlab.torproject.org/legacy/trac/-/issues/27806
cannot establish connection to tor
2018-09-21T01:42:09Z
Trac
9/23/18, 00:32:39.128 [NOTICE] DisableNetwork is set. Tor will not make or accept non-control network connections. Shutting down all existing connections.
9/23/18, 00:32:39.128 [NOTICE] DisableNetwork is set. Tor will not make or accept non-control network connections. Shutting down all existing connections.
9/23/18, 00:32:39.129 [NOTICE] DisableNetwork is set. Tor will not make or accept non-control network connections. Shutting down all existing connections.
9/23/18, 00:32:39.129 [NOTICE] Opening Socks listener on 127.0.0.1:9150
9/23/18, 00:32:39.775 [NOTICE] Bootstrapped 5%: Connecting to directory server
9/23/18, 00:32:39.850 [NOTICE] Bootstrapped 10%: Finishing handshake with directory server
9/23/18, 00:32:45.951 [WARN] Received NETINFO cell with skewed time (OR:199.58.81.140:443): It seems that our clock is ahead by 1 days, 23 hours, 59 minutes, or that theirs is behind. Tor requires an accurate clock to work: please check your time, timezone, and date settings.
9/23/18, 00:32:45.952 [WARN] Problem bootstrapping. Stuck at 10%: Finishing handshake with directory server. (Clock skew 172799 in NETINFO cell from OR; CLOCK_SKEW; count 4; recommendation warn; host 74A910646BCEEFBCD2E874FC1DC997430F968145 at 199.58.81.140:443)
9/23/18, 00:32:45.952 [WARN] 3 connections have failed:
9/23/18, 00:32:45.952 [WARN] 2 connections died in state handshaking (Tor, v3 handshake) with SSL state SSL negotiation finished successfully in OPEN
9/23/18, 00:32:45.952 [WARN] 1 connections died in state connect()ing with SSL state (No SSL object)
9/23/18, 00:32:45.962 [NOTICE] Bootstrapped 15%: Establishing an encrypted directory connection
9/23/18, 00:32:45.966 [NOTICE] Closing no-longer-configured Socks listener on 127.0.0.1:9150
9/23/18, 00:32:45.967 [NOTICE] DisableNetwork is set. Tor will not make or accept non-control network connections. Shutting down all existing connections.
9/23/18, 00:32:45.967 [NOTICE] Closing old Socks listener on 127.0.0.1:9150
9/23/18, 00:32:46.518 [NOTICE] Delaying directory fetches: DisableNetwork is set.
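For triage, the skew and the stuck phase can be pulled out of a log like the one above mechanically. This is a rough sketch: the regular expression is modelled only on the single "Problem bootstrapping" line quoted in this ticket, and `parse_clock_skew_warning` is a hypothetical helper, so other tor versions may need a different pattern.

```python
import re

# Pattern modelled on the one WARN line quoted in this ticket; the wording
# may differ between tor versions.
BOOTSTRAP_WARN = re.compile(
    r"Problem bootstrapping\. Stuck at (?P<percent>\d+)%: (?P<phase>.+?)\. "
    r"\(Clock skew (?P<skew>-?\d+) in (?P<source>.+?); CLOCK_SKEW;"
)


def parse_clock_skew_warning(line: str):
    """Return the stuck phase and skew from a bootstrap warning, or None."""
    match = BOOTSTRAP_WARN.search(line)
    if match is None:
        return None
    return {
        "percent": int(match.group("percent")),
        "phase": match.group("phase"),
        "skew_seconds": int(match.group("skew")),
        "source": match.group("source"),
    }


sample = (
    "9/23/18, 00:32:45.952 [WARN] Problem bootstrapping. Stuck at 10%: "
    "Finishing handshake with directory server. (Clock skew 172799 in "
    "NETINFO cell from OR; CLOCK_SKEW; count 4; recommendation warn; "
    "host 74A910646BCEEFBCD2E874FC1DC997430F968145 at 199.58.81.140:443)"
)
print(parse_clock_skew_warning(sample))
```

Lines that do not match the pattern return `None`, so the helper can be run over a whole log without filtering first.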
**Trac**:
**Username**: NerxXm8

https://gitlab.torproject.org/legacy/trac/-/issues/27351
DisableNetwork is not unset if bootstrapping fails
2020-06-16T00:49:41Z
traumschule
Reloading TB's Tor instance disables network and keeps it turned off because it fails to attach to the SocksPort.
One has to change the SocksPort, enable the network and change back the SocksPort with nyx:
```
>>> signal reload
250 OK
>>> getconf disablenetwork
250 DisableNetwork=1
>>> setconf disablenetwork=0
553 Unable to set option: Failed to bind one of the listener ports.
>>> setconf socksport=9152
250 OK
>>> setconf disablenetwork=0
250 OK
>>> setconf socksport=9150
250 OK
```

https://gitlab.torproject.org/legacy/trac/-/issues/27167
track "first" OR_CONN
2020-06-13T17:44:18Z
Taylor Yu
Right now the first stages of the "first" OR_CONN get reported as `BOOTSTRAP_STATUS_CONN_DIR` and `BOOTSTRAP_STATUS_HANDSHAKE` (the latter is a special bootstrap phase that gets translated into `BOOTSTRAP_STATUS_HANDSHAKE_DIR` or `BOOTSTRAP_STATUS_HANDSHAKE_OR` depending on how much progress was previously reported). The logic in functions that report these events should be moved up to a new abstraction so lower level code has to track less high-level state.
This also eliminates some logic in `control_event_bootstrap()` that tries to figure out whether a given handshake attempt corresponds to a directory connection or an application circuit connection.
Tor: 0.4.0.x-final
Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/27103
report initial OR_CONN as the earliest bootstrap phases
2020-06-13T15:31:22Z
Taylor Yu
We should always make the earliest bootstrap phases be our first connection to any OR, regardless of whether we already have enough directory info to start building circuits.
When starting to bootstrap with existing directory info, there might not be a need to make an initial connection to a bridge or fallback directory server to download directory info. This means that the initial OR_CONN to a bridge or guard displays on a progress bar as 80%, when in fact a fairly "early" dependency (the initial connection to any OR) could be failing.
Intuitively, starting Tor Browser and seeing the progress bar hang at 80% for a very long time is frustrating and misleading. A user who sees the progress bar hang at 5% or 10% has a much better idea of what's going on.
Existing directory info can be reflected in the progress bar as a rapid jump after the initial OR_CONN succeeds. This seems less likely to frustrate users.
Tor: 0.4.0.x-final
Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/25756
EARLY_CONSENSUS_NOTICE_SKEW of 60 is too strict for some drifting dirauth clocks
2020-06-13T17:54:04Z
Trac
I keep getting this error on my relay. I set my timezone to Bucharest, Romania. The relay is running Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-119-generic x86_64) and using Tor 0.3.2.10.
```
[WARN] Received ns flavor consensus with skewed time (CONSENSUS): It seems that our clock is behind by 1 minutes, 1 seconds, or that theirs is ahead. Tor requires an
accurate clock to work: please check your time, timezone, and date settings.
17:58:59 [WARN] Our clock is 1 minutes, 1 seconds behind the time published in the consensus network status document (2018-04-09 15:00:00 UTC). Tor needs an accurate clock to
work correctly. Please check your time and date settings!
```
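The arithmetic behind the warning can be checked with a short sketch. The threshold value comes from the ticket title; the comparison itself, the date of the local log timestamp, and the UTC+3 offset for Bucharest in April are assumptions for illustration, not a copy of tor's code.

```python
from datetime import datetime, timedelta, timezone

# Value from the ticket title; how tor applies it is assumed here.
EARLY_CONSENSUS_NOTICE_SKEW = 60  # seconds


def consensus_looks_early(now, valid_after, allowed=EARLY_CONSENSUS_NOTICE_SKEW):
    """True when the consensus timestamp is more than `allowed` seconds ahead of `now`."""
    return (valid_after - now).total_seconds() > allowed


# Assumptions: the 17:58:59 log line is local time on 2018-04-09 at UTC+3.
bucharest_summer = timezone(timedelta(hours=3))
local_now = datetime(2018, 4, 9, 17, 58, 59, tzinfo=bucharest_summer)
valid_after = datetime(2018, 4, 9, 15, 0, 0, tzinfo=timezone.utc)

print((valid_after - local_now).total_seconds())      # 61.0
print(consensus_looks_early(local_now, valid_after))  # True
```

With a 60-second allowance, a relay clock that is 61 seconds behind already trips the check, which matches the "1 minutes, 1 seconds" in the log.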
**Trac**:
**Username**: Dbryrtfbcbhgf
Tor: 0.3.4.x-final
Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/25511
Expose TZ info on control port for better debugging of time errors
2020-06-13T17:44:06Z
Nick Mathewson
When we tell people that their clocks are wrong, it's frequently because they've set up their systems with the wrong time zone. It would be helpful to tell torbrowser (or some other controller) about this information, so that it can give more useful error messages on time-related bootstrapping failures.
~~One complication here is (IIUC) TB runs, and runs tor, with its TZ set to UTC.~~
Further investigation suggests that TB might not set `TZ` for the first time it starts tor. Opened #25823 for the Tor Launcher behavior inconsistency.
Tor: 0.3.4.x-final
Neel Chauhan neel@neelc.org

https://gitlab.torproject.org/legacy/trac/-/issues/25120
getrandom() syscall failure warning should be a notice and worded better
2020-06-13T15:21:30Z
Taylor Yu
The logging improvement for error handling of the `getrandom()` syscall in #24500 could be further improved:
* The log level should possibly be NOTICE rather than WARN.
* The log message should mention that tor will fall back to alternative sources of randomness.
* Maybe we want to mention header/kernel version mismatches as a specific common reason for this issue.
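The bullets above can be sketched as follows. This is a Python analogy, not tor's C code: `get_random_bytes` and the logger name are hypothetical, and since Python's `logging` has no NOTICE level, INFO stands in for it.

```python
import logging
import os

log = logging.getLogger("entropy_sketch")  # hypothetical logger name


def get_random_bytes(n, primary=getattr(os, "getrandom", None)):
    """Return n random bytes, preferring `primary` and falling back to os.urandom().

    `primary` defaults to os.getrandom where the platform provides it. Keep n
    small: os.getrandom() may return fewer bytes than requested for large n.
    """
    if primary is not None:
        try:
            return primary(n)
        except OSError as exc:
            # Python's logging has no NOTICE level, so INFO stands in for it.
            log.info(
                "getrandom() failed (%s); falling back to os.urandom(). "
                "A header/kernel version mismatch is one possible cause.",
                exc,
            )
    return os.urandom(n)
```

The message says what failed, what happens next, and one likely cause, so an operator reading it can tell that nothing needs doing.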
Yes, some of the fallbacks are somewhat dangerous: for example, `/dev/urandom` might not be seeded yet.
Tor: 0.3.3.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/25061
Relays log a bootstrap warning if they can't extend for somebody else's circuit
2020-06-13T15:21:17Z
Roger Dingledine
Say you have a relay that is up and listed as non-slow in the consensus. Due to the current overload (#24902), this relay is getting many many circuit requests per second. Due to bug #24767, we will make a huge number of connection attempts to other relays that are down, because as soon as we get a "connection refused", we will get another circuit request that triggers another connection attempt.
So when your relay restarts, since it's still in the consensus and clients still think it's usable, it will immediately get flooded with circuit requests, causing these connection attempts to resume.
And Tor calls every one of those connection attempts a bootstrapping attempt, even if there are no origin circuits related to that connection attempt.
Tor: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/24500
Confusing log message "Can't get entropy from getrandom()"
2020-06-13T15:21:30Z
Roger Dingledine
A relay operator on #tor shared these log lines:
```
Dec 01 16:33:00.000 [notice] Tor 0.3.1.8 (git-868c1b2b1eb7225a) opening log file.
Dec 01 16:33:00.515 [warn] Can't get entropy from getrandom().
Dec 01 16:33:00.534 [notice] Tor 0.3.1.8 (git-868c1b2b1eb7225a) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.2g, Zlib 1.2.8, Liblzma 5.1.0alpha[...]
```
The middle line is confusing -- why can't we get the entropy from it? Does that mean Tor has failed at something important? What should the relay operator do?
If the relay operator shouldn't do anything, because it's fine, this should be a notice or less. If the relay operator ought to do something, because it's not fine, then we should suggest what to do and/or what the problem is with doing nothing.
Tor: 0.3.2.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/24486
Mark all bridges as up on application activity
2020-06-13T15:18:16Z
teor
If circuit_get_open_circ_or_launch() or its callers don't already mark all bridges as up, we should make them do so.
A good way to do this is to:
* modify the bridge state so we're using the bootstrapping schedule, then
* reset the download statuses on all bridges, and
* reset the guard state on all the bridges (?).
Tor: 0.3.2.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/24392
Ignore cached bridge descriptors until we check if they are running
2020-06-13T15:17:52Z
teor
Split off #24367:
[An] underlying issue here is that every use of any_bridge_descriptors_known() in tor is wrong. Instead, we want to know if num_bridges_usable() > 0.
This is a bug introduced in commit 93a8ed3 in 0.3.2.1-alpha by #23347. But it was also a bug in commit eca2a30 in 0.2.0.3-alpha, but we've never encountered it before, because we always retried our bridges immediately (and far too often).
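The difference between the two checks can be shown with a toy model. `Bridge` and both functions below are hypothetical stand-ins written for this illustration; in particular, treating "usable" as "has a cached descriptor and is known to be running" is an assumption based on the ticket title, not tor's actual definition.

```python
from dataclasses import dataclass


@dataclass
class Bridge:  # hypothetical stand-in, not tor's bridge structure
    has_cached_descriptor: bool
    is_running: bool


def any_descriptors_known(bridges):
    """Models the check the ticket calls wrong: a cached descriptor is enough."""
    return any(b.has_cached_descriptor for b in bridges)


def num_usable(bridges):
    """Models the proposed check: count only bridges we could actually use."""
    return sum(1 for b in bridges if b.has_cached_descriptor and b.is_running)


# Two bridges with cached descriptors, neither confirmed as running yet.
bridges = [Bridge(True, False), Bridge(True, False)]
print(any_descriptors_known(bridges))  # True
print(num_usable(bridges) > 0)         # False
```

In this state the first check reports that bridges are available while the second correctly reports that none can be used yet.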
My branch bug24367 at https://github.com/teor2345/tor.git passes all of make test-network, but fails some of the guard and microdesc unit tests. This probably means the unit tests were relying on the buggy behaviour, and need to be changed. I might need nickm or asn to help me fix the unit tests, because they wrote them.
Tor: 0.2.9.x-final
teor

https://gitlab.torproject.org/legacy/trac/-/issues/23605
expired consensus causes guard selection to stall at BOOTSTRAP PROGRESS=80
2020-06-13T15:33:33Z
Taylor Yu
Tor can report `BOOTSTRAP_STATUS_CONN_OR` (PROGRESS=80, "Connecting to the Tor network") when it actually can do no such thing. In some situations (e.g., clock skew) this causes progress to get stuck at 80% indefinitely, resulting in very poor user experience.
Right now `update_router_have_minimum_dir_info()` reports the `BOOTSTRAP_STATUS_CONN_OR` event if there's a "reasonably live" consensus and enough descriptors downloaded. A client with a clock skewed several hours into the future can get stalled here indefinitely due to inability to select a guard: if the client's clock is skewed, it will never have a live consensus. (Guard selection seems to require a non-expired consensus, rather than a reasonably live consensus at least during bootstrap.)
We should either relax the guard selection consensus liveness requirement, or avoid reporting `BOOTSTRAP_STATUS_CONN_OR` when we have no reasonable chance of actually connecting to a guard for building application circuits.
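To make the mismatch concrete, here is a minimal sketch of the two liveness checks. The 24-hour grace window, the three-hour validity interval, and the function names are assumptions for illustration; they are not taken from tor's source.

```python
from datetime import datetime, timedelta, timezone

# Assumed grace period after expiry during which a consensus still counts
# as "reasonably live"; illustrative value only.
REASONABLY_LIVE_WINDOW = timedelta(hours=24)


def is_live(now, valid_after, valid_until):
    """Non-expired: `now` falls inside the consensus validity interval."""
    return valid_after <= now <= valid_until


def is_reasonably_live(now, valid_until, window=REASONABLY_LIVE_WINDOW):
    """Expired by no more than `window` (or not expired at all)."""
    return now <= valid_until + window


valid_after = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
valid_until = valid_after + timedelta(hours=3)  # assumed validity interval
skewed_now = valid_after + timedelta(hours=8)   # client clock several hours fast

print(is_live(skewed_now, valid_after, valid_until))  # False
print(is_reasonably_live(skewed_now, valid_until))    # True
```

A client in this state passes the looser check that triggers the 80% report but fails the stricter check that guard selection seems to need, which is the stall described above.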
Arguably we shouldn't start downloading descriptors until we have a non-expired consensus either, because that gets represented as a considerable chunk of the progress bar (40%->80%) in a way that could be misleading to a user. Making that change without additional work would cause bootstrap to get stuck at 40% instead of 80%, which might be an improvement. This can already happen if the client's clock is skewed several hours in the past.
Tor: 0.4.0.x-final
Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/23565
document signs of client clock skew to ease troubleshooting
2020-06-13T17:36:11Z
Taylor Yu
Ticket #23508 describes some ways that clock skews during client bootstrapping can often cause stalls without any useful user feedback. Document some signs of this behavior (e.g., specific message patterns in log files, Tor Launcher messages when stalled) so we can better help users who aren't running a modern enough release to mitigate these issues.
Tor: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/23508
large clock skews cause numerous bootstrap UX issues
2020-06-13T17:36:11Z
Taylor Yu
Setting the system clock several hours ahead or behind real time can cause many different bootstrapping problems. Often bootstrapping will get stuck with no obvious way to make progress, and no visible indication of what might actually be wrong.
These seem to lead to a disproportionate number of support requests.
Some examples are:
clock in past:
* stuck at 40% (Loading authority key certs)
clock in future:
* stuck at 80% (Connecting to the Tor network)
* stuck at 85% (Finishing handshake with first hop)
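For support triage, the examples above can be kept as a small lookup table. This is only a sketch: the table holds exactly the three cases listed in this ticket, `skew_hint` is a hypothetical helper, and a stall at one of these percentages can have causes other than clock skew.

```python
# Stall percentages and phases exactly as listed in the ticket; nothing else
# is known about other phases from this report.
STALL_HINTS = {
    40: "clock may be in the past (Loading authority key certs)",
    80: "clock may be in the future (Connecting to the Tor network)",
    85: "clock may be in the future (Finishing handshake with first hop)",
}


def skew_hint(stuck_percent: int) -> str:
    """Return a troubleshooting hint for a bootstrap stall at the given percentage."""
    return STALL_HINTS.get(stuck_percent, "no clock-skew pattern listed for this phase")


print(skew_hint(40))
print(skew_hint(85))
```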
More details and child tickets as I investigate further.
Tor: unspecified