After upgrade of Ichotolot60 1AE039EE0B11DB79E4B4B29CBA9F752864A0259E from 4.2.7 to 4.4.5 it dropped off consensus.
Apr 02 21:51:31.872 [notice] Tor 0.4.2.7 running on FreeBSD with Libevent 2.1.11-stable, OpenSSL LibreSSL 3.0.2, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.4.4.
The consensus-health on April/9 shows all auths see it running.
Sep 18 21:00:22.384 [notice] Tor 0.4.4.5 running on FreeBSD with Libevent 2.1.12-stable, OpenSSL LibreSSL 3.2.1, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.4.5.
The consensus-health after one day shows:
moria1 Fast Run Stab V2Dir Valid bw=566tor26 Fast Guard Stab V2Dir Validdizum Fast Guard Stab V2Dir Validgabel. Fast Stab V2Dir Valid bw=1130danne. Fast Guard Run Stab V2Dir Validmaatu. Fast Stab V2Dir Valid bw=1200farav. Fast Guard Run Stab V2Dir Valid bw=2970longc. Fast Stab V2Dir Valid bw=490bastet Fast Guard Run Stab V2Dir Valid bw=3520bwauth=n/a
Sep 20 10:30:45.633 [notice] Tor 0.4.4.5 running on FreeBSD with
Libevent 2.1.12-stable, OpenSSL LibreSSL 3.2.0, Zlib 1.2.11, Liblzma
5.2.4, and Libzstd 1.4.5.
The consensus-health shows 2 hours after Libressl 321 -> 320
moria1 Fast Run Stab V2Dir Valid bw=566tor26 Fast Run Stab V2Dir Validdizum Fast Run Stab V2Dir Validgabel. Fast Run Stab V2Dir Valid bw=1130danne. Fast -Guard Run Stab V2Dir Validmaatu. Fast Run Stab V2Dir Valid bw=1200farav. Fast -Guard Run Stab V2Dir Valid bw=2970longc. !Fast Run Stab V2Dir Valid bw=70bastet Fast -Guard Run Stab V2Dir Valid bw=3520bwauth=gabelmoo
Libressl 321 is not compatible to the authorities tor26, dizum, gabel., maatu. and longc. to grant a "Running. What can that be?
:: Legacy ::
We had the same trouble with Libressl 292 when 28x worked well and
30x too.
With Libressl292 the consensus-health showed Sep/7 2019
ACBBB426CE1D0641A590BF1FC1CF05416FC0FF6F Planetclaire62
Fast !Running Stable V2Dir Valid bw=8200 Fast !Running Stable V2Dir Valid!Fast Running Stable V2Dir Valid Fast !Running Stable V2Dir Valid!Fast Running Stable V2Dir Valid Fast !Running Stable V2Dir Valid Fast Running Stable V2Dir Valid bw=747 Fast Guard HSDir Running Stable V2Dir Valid Fast Guard HSDir Running Stable V2Dir Valid bw=8180 Fast Running Stable V2Dir Valid bw=8180bwauth=faravahar
A Tor relay 0.4.5.0-alpha-dev here under a stable hardened Gentoo Linux with LibreSSL 3.2.1 showed only 50% of the amount of the connections as it had have before with 3.2.0 - and its TCP traffic dropped down by nearly 100%.
I recompiled Tor after LibreSSL to ensure, that it is not an ABI issue (whilst API looks fine).
Tor doesn't play well with LibreSSL 3.2.1 - will roll back here to LibreSSL 3.2.0.
Maybe this is related so on the advice of nickm, I'm going to outline the problem.
Recently, one of my relay has been emitting this protocol warning log roughly 2-5 times every hour:
Oct 21 16:22:54.355 [warn] Received a bad AUTHENTICATE cell on OR connection (handshaking (Tor, v3 handshake)) with 193.23.244.244:26528: Some field in the AUTHENTICATE cell body was not as expected
That IP is dannenberg directory authority. So I contacted Andreas and got some information out of him. Nothing much special in the logs about my relay. But dannenberg was recently moved from FreeBSD to OpenBSD. LibreSSL 3.2.2 is linked with tor at the moment.
@nickm asked me to dump the bad AUTHENTICATE cell in hex so we could inspect it:
From dannenberg (BAD):
00 03 01 60 41 55 54 48 30 30 30 33 84 5D D2 75 18 87 9B 24 2F 96 8C 5F 36 3A 2E 20 27 5A 4D 2B EB 16 8A 7C 87 CF 34 EB 15 95 88 50 5F 3C 74 66 E0 2D 22 6271 72 2E FC 87 CA BD 0E 87 1C 7F F3 BB 9F 05 E4 34 26 0E DC 30 84 3B 38 F7 FC 45 6D 1F 56 83 12 40 8D 59 2A 8C 27 7F 4B FE 79 0F 58 AA 6C 07 9C 7C A4 DE 6A 2E AF FC 48 E1 E7 61 1B 78 16 8C 94 31 C5 F3 29 19 99 87 3D 03 B7 4F 28 19 08 73 B7 8B 17 88 D8 4A 60 F9 DE E9 F9 21 E9 D0 DE 96 33 2A DF F3 0F 32 4E FE DA80 BD 5B 72 38 D9 43 1B F8 BC B7 B4 D2 1D 98 7C 64 49 FF 68 D7 83 0F E3 9A 101B 37 34 57 04 B7 A0 BE 4B FC D2 C9 F2 ED EC 6A D7 46 2E 8E C8 67 F7 49 E5 2AD3 92 5B 25 76 5B A6 AC D3 09 16 F7 7B 4B 22 FD 91 1B 2F CE A8 23 8E 8B 82 53DA 2F 6A C6 92 1F 08 3B 2F 8D D7 EA 58 AB 81 BA AB AE EB 96 E4 1D 57 54 96 83 FE AA 9A 0D 47 A5 FC 74 3F B5 39 CA 7C 50 BD 2A EF 7D AB EE 39 09 1C DB 89 208D 59 34 6B 4A 72 50 57 DD AA 22 BA F6 EC FC 60 F4 9C 08 7B 68 76 17 76 C6 3D3B 88 A7 36 6D 06 54 AA D8 15 52 08 E6 F4 42 B6 0C AA 60 EB 52 E3 80 8C 7D 54 34 98 C0 D8 63 36 1E B7 FB 8E 7B EC FE 55 7B 0B 7F 0D
Expected AUTHENTICATE:
00 03 00 00 41 55 54 48 30 30 30 33 84 5D D2 75 18 87 9B 24 2F 96 8C 5F 36 3A 2E 20 27 5A 4D 2B EB 16 8A 7C 87 CF 34 EB 15 95 88 50 5F 3C 74 66 E0 2D 22 6271 72 2E FC 87 CA BD 0E 87 1C 7F F3 BB 9F 05 E4 34 26 0E DC 30 84 3B 38 F7 FC 45 6D 1F 56 83 12 40 8D 59 2A 8C 27 7F 4B FE 79 0F 58 AA 6C 07 9C 7C A4 DE 6A 2E AF FC 48 E1 E7 61 1B 78 16 8C 94 31 C5 F3 29 19 99 87 3D 03 B7 4F 28 19 08 73 B7 8B 17 88 D8 4A 60 F9 DE E9 F9 21 E9 D0 DE 96 33 2A DF F3 0F 32 4E FE DA80 BD 5B 72 38 D9 43 1B F8 BC B7 B4 D2 1D 98 7C 64 49 FF 68 D7 83 0F E3 9A 10 1B 37 34 57 04 B7 A0 BE 4B FC D2 C9 F2 ED EC 6A D7 46 2E 8E C8 67 F7 49 E5 2AD3 92 5B 25 76 5B A6 AC D3 09 16 F7 7B 4B 22 FD 91 1B 2F CE A8 23 8E 8B 82 53DA 2F 78 D5 2A F3 28 DB CA 28 1A 06 0E BC 77 DC 7B 02 67 E9 DB 00 52 56 F5 AE 01 6B 4F EF F4 EE FC D6 9B 20 EA 5F D5 12 E6 B4 B5 88 30 55 3F AF 34 0B AE 0935 77 BF 01 D5 79
Oh, oops. I need the AUTHENTICATE cell and the expected value. Around channeltls.c:2481 , that's the value of authand the value of expected_cell->payload.
Okay, looking at those fields above: It appears that the TLSSECRETS field is different, but the other relevant fields are the same. That field is the one that is supposed to be the output of the RFC5705 exporter. I am suspecting that either there is a LibreSSL bug here, or there is some bug in how we are using SSL_export_keying_material().
Oh! The problem here is not that any LibreSSL code changed, but that some code didn't change!
If I am diagnosing this correctly, it looks like tls1_export_keying_material() is unconditionally using the exporter function from RFC 5705. But TLS 1.3 defines a completely different exporter function, in RFC 8446 section 7.5. This means that whenever LibreSSL negotiates a TLS 1.3 connection and tries to export keying material, it does not generate the same result as other TLS 1.3 implementations.
It looks like we started encountering this with LibreSSL 3.2.1 because that's the version where LibreSSL enabled TLS 1.3 in TLS_method().
I've opened a LibreSSL issue at https://github.com/libressl-portable/portable/issues/629 . I'm not sure if that's the right place to report it, but there seems to be a fair amount of traffic on that github project, so I'm taking my chances.
Tor 0.4.5.2-alpha running on FreeBSD with Libevent 2.1.12-stable, OpenSSL LibreSSL 3.3.0, Zlib 1.2.11, Liblzma 5.2.4, Libzstd 1.4.5 and Unknown N/A as libc.
"Running" is not granted and is removed after 1 hour by six authorities.
Please find the log for the not working version:
Nov 30 23:13:31.000 [notice] Self-testing indicates your ORPort x.x.x.x:443 is reachable from the outside. Excellent.Nov 30 23:13:31.000 [notice] Now checking whether IPv4 ORPort x.x.x.x:443 is reachable... (this may take up to 20 minutes -- look for log messages indicating success)Nov 30 23:13:31.000 [notice] Now checking whether IPv6 ORPort [y:y:y:y::y]:443 is reachable... (this may take up to 20 minutes -- look for log messages indicating success)Nov 30 23:13:31.000 [notice] Now checking whether IPv4 DirPort x.x.x.x:80 is reachable... (this may take up to 20 minutes -- look for log messages indicating success)Nov 30 23:13:31.000 [notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.Nov 30 23:13:34.000 [notice] Self-testing indicates your ORPort [y:y:y:y::y]:443 is reachable from the outside. Excellent. Publishing server descriptor.
Remarkable is that the self-test message was missing:
Neither works here. When I try to connect to a bridge with Libressl 330 + commit_6e2ae6018f8ad5bba7cd973e4360c215dee79bcc the TBB stalls at:
Negotiating with a Tor relay
BUT: 330 already has the patch. 322 did not.
The long story, log from TBB:
Dec 07 21:33:23.000 [notice] Bootstrapped 1% (conn_pt): Connecting to pluggable transportDec 07 21:33:23.000 [notice] Bootstrapped 2% (conn_done_pt): Connected to pluggable transportDec 07 21:33:25.000 [notice] Bootstrapped 10% (conn_done): Connected to a relayDec 07 21:33:25.000 [notice] Bootstrapped 14% (handshake): Handshaking with a relayDec 07 21:33:26.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:33:26.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)...Dec 07 21:35:22.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:22.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:23.000 [info] connection_ap_expire_beginning(): Tried for 120 seconds to get a connection to xxx. Giving up. (waiting for circuit)Dec 07 21:35:23.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:23.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:23.000 [info] connection_free_minimal(): Freeing linked Socks connection [waiting for circuit] with 106 bytes on inbuf, 0 on outbuf.Dec 07 21:35:23.000 [info] connection_dir_client_reached_eof(): 'fetch' response not all here, but we're at eof. Closing.Dec 07 21:35:23.000 [info] entry_guards_note_guard_failure(): Recorded failure for primary guard xxxDec 07 21:35:23.000 [info] connection_dir_client_request_failed(): Giving up on serverdesc/extrainfo fetch from directory server at xxx; retryingDec 07 21:35:23.000 [info] connection_free_minimal(): Freeing linked Directory connection [client reading] with 0 bytes on inbuf, 0 on outbuf.Dec 07 21:35:23.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:24.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)...Dec 07 21:35:27.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:27.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:28.000 [info] connection_ap_make_link(): Making internal direct tunnel to xxx ...Dec 07 21:35:28.000 [info] connection_ap_make_link(): ... application connection created and linked.Dec 07 21:35:28.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:28.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:28.000 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.Dec 07 21:35:28.000 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.Dec 07 21:35:28.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:35:29.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)...Dec 07 21:37:02.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:37:02.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:37:03.000 [info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)Dec 07 21:37:03.000 [info] circuit_n_chan_done(): Channel failed; closing circ.Dec 07 21:37:03.000 [info] circuit_mark_for_close_(): Circuit 0 (id: 1) marked for close at circuitbuild.c:673 (orig reason: 8, new reason: 0)Dec 07 21:37:03.000 [info] connection_or_note_state_when_broken(): Connection died in state 'handshaking (Tor, v3 handshake) with SSL state SSL negotiation finished successfully in OPEN'Dec 07 21:37:03.000 [info] bto_status_rcvr(): ORCONN DELETE gid=7 status=2 reason=7Dec 07 21:37:03.000 [warn] Problem bootstrapping. Stuck at 14% (handshake): Handshaking with a relay. (IOERROR; IOERROR; count 1; recommendation warn; host xxx)Dec 07 21:37:03.000 [warn] 1 connections have failed:Dec 07 21:37:03.000 [warn] 1 connections died in state handshaking (Tor, v3 handshake) with SSL state SSL negotiation finished successfully in OPENDec 07 21:37:03.000 [info] circuit_build_failed(): Our circuit 0 (id: 1) died before the first hop with no connectionDec 07 21:37:03.000 [info] connection_ap_fail_onehop(): Closing one-hop stream to xxx because the OR conn just failed.Dec 07 21:37:03.000 [info] circuit_free_(): Circuit 0 (id: 1) has been freed.Dec 07 21:37:03.000 [info] connection_free_minimal(): Freeing linked Socks connection [waiting for circuit] with 106 bytes on inbuf, 0 on outbuf.Dec 07 21:37:03.000 [info] connection_dir_client_reached_eof(): 'fetch' response not all here, but we're at eof. Closing.Dec 07 21:37:03.000 [info] entry_guards_note_guard_failure(): Recorded failure for primary guard xxxDec 07 21:37:03.000 [info] connection_dir_client_request_failed(): Giving up on serverdesc/extrainfo fetch from directory server at xxx; retryingDec 07 21:37:03.000 [info] connection_free_minimal(): Freeing linked Directory connection [client reading] with 0 bytes on inbuf, 0 on outbuf.Dec 07 21:37:03.000 [info] set_max_file_descriptors(): This platform is missing getrlimit(). Proceeding.Dec 07 21:37:03.000 [notice] Closing no-longer-configured Socks listener on 127.0.0.1:9150Dec 07 21:37:03.000 [notice] DisableNetwork is set. Tor will not make or accept non-control network connections. Shutting down all existing connections.Dec 07 21:37:03.000 [notice] Tor 0.4.4.6 (git-2a8b789ea6f308d0) opening log file.Dec 07 21:37:03.000 [info] options_commit_listener_transaction(): Recomputed OOS thresholds: ConnLimit 1000, ConnLimit_ 14968, ConnLimit_high_thresh 14904, ConnLimit_low_thresh 11226Dec 07 21:37:03.000 [info] select_scheduler(): Scheduler type KIST not built inDec 07 21:37:03.000 [info] scheduler_kist_set_lite_mode(): Setting KIST scheduler without kernel support (KISTLite mode)Dec 07 21:37:03.000 [info] cmux_ewma_set_options(): Enabled cell_ewma algorithm because of value in CircuitPriorityHalflifeMsec in consensus; scale factor is 0.793701 per 10 secondsDec 07 21:37:03.000 [info] should_delay_dir_fetches(): Delaying dir fetches (DisableNetwork is set)Dec 07 21:37:03.000 [info] should_delay_dir_fetches(): Delaying dir fetches (DisableNetwork is set)Dec 07 21:37:04.000 [warn] Pluggable Transport process terminated with status code 0Dec 07 21:37:04.000 [info] should_delay_dir_fetches(): Delaying dir fetches (DisableNetwork is set)Dec 07 21:37:04.000 [info] should_delay_dir_fetches(): Delaying dir fetches (DisableNetwork is set)Dec 07 21:37:05.000 [info] should_delay_dir_fetches(): Delaying dir fetches (DisableNetwork is set)
With the OpenBSD snapshot #266 (closed) from around 20200109, the ssl handshakes started to complete. It's been fully operational since.
The following error continues to appear in the log is:
Jan 27 22:05:01.000 [warn] tor_tls_finish_handshake: Bug: For some reason, wasV2Handshake didn't get set. Fixing that. (on Tor 0.4.4.6 )
Note that v3 hidden services work if that is at all relevant.
Also, any fixes since October won't appear in -stable branch until its release around May 1, so to have an operational OpenBSD node you need to run -current or build with OpenSSL.
https://github.com/libressl-portable/portableLibreSSL Portable Release Notes:3.3.3 - Stable release * This is the first stable release from the 3.3.x series. There are no changes from 3.3.2.3.3.2 - Development release * This release adds support for DTLSv1.2 and continues the rewrite of the record layer for the legacy stack. Numerous bugs and interoperability issues were fixed in the new verifier. A few bugs and incompatibilities remain, so this release uses the old verifier by default. The OpenSSL 1.1 TLSv1.3 API is not yet available.
Good news, Libressl seems to work for us with 3.4.1 (Freebsd ports libressl/devel).
Oct 22 21:06:26.894 [notice] Tor 0.4.6.7 running on FreeBSD with Libevent 2.1.12-stable, OpenSSL LibreSSL 3.4.1, Zlib 1.2.11, Liblzma 5.2.5, Libzstd 1.5.0 and Unknown N/A as libc.
It's the first time since 3.2.0:
LibreSSL Portable Release Notes:3.4.1 - Stable release * New Features - Added support for OpenSSL 1.1.1 TLSv1.3 APIs.