Aug 03 03:40:28.000 [warn] tor_bug_occurred_: Bug: conflux_pool.c:828: record_rtt_client: Non-fatal assertion now >= leg->link_sent_usec failed. (on Tor 0.4.8.2-alpha 328f976245134501)
Aug 03 03:40:28.000 [warn] Bug: Tor 0.4.8.2-alpha (git-328f976245134501): Non-fatal assertion now >= leg->link_sent_usec failed in record_rtt_client at conflux_pool.c:828. (Stack trace not available) (on Tor 0.4.8.2-alpha 328f976245134501)
Steps to reproduce:
It looks like it appears randomly.
However, I have ntpd installed; maybe it interferes with Tor somehow.
What is the current bug behavior?
The error appears in the log.
What is the expected behavior?
No error.
Environment
- Which version of Tor are you using?
Tor 0.4.8.2-alpha (git-328f9762) running on Windows 7 with Libevent 2.1.12-stable, OpenSSL 3.0.8, Zlib 1.2.13, Liblzma 5.4.1, Libzstd 1.5.4 and Unknown N/A as libc.
- Which operating system are you using?
Windows 7 SP1 x64
- Which installation method did you use?
I built Tor from sources.
Relevant logs and/or screenshots
I have seen this problem two more times:
Jul 24 03:15:14.000 [warn] tor_bug_occurred_: Bug: conflux_pool.c:828: record_rtt_client: Non-fatal assertion now >= leg->link_sent_usec failed. (on Tor 0.4.8.2-alpha 328f976245134501)
Jul 24 03:15:14.000 [warn] Bug: Tor 0.4.8.2-alpha (git-328f976245134501): Non-fatal assertion now >= leg->link_sent_usec failed in record_rtt_client at conflux_pool.c:828. (Stack trace not available) (on Tor 0.4.8.2-alpha 328f976245134501)
...
Jul 24 18:16:48.000 [warn] tor_bug_occurred_: Bug: conflux_pool.c:828: record_rtt_client: Non-fatal assertion now >= leg->link_sent_usec failed. (on Tor 0.4.8.2-alpha 328f976245134501)
Jul 24 18:16:48.000 [warn] Bug: Tor 0.4.8.2-alpha (git-328f976245134501): Non-fatal assertion now >= leg->link_sent_usec failed in record_rtt_client at conflux_pool.c:828. (Stack trace not available) (on Tor 0.4.8.2-alpha 328f976245134501)
Possible fixes
To clarify: you do not see any other backtrace or warn logs before this?
There is a check to make sure leg->link_sent_usec is not 0 right before this, which would cause the warn log Conflux: Trying to record client RTT without a timestamp to appear along with a list of function calls.
If this log is absent, then it means that monotime_absolute_usec() is broken on Windows. :/
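For context, the shape of that guard is roughly the following (an illustrative sketch only, not the actual conflux_pool.c code; the function signature and the logging calls are simplified):

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

static bool
record_rtt_client_sketch(uint64_t link_sent_usec, uint64_t now_usec)
{
  if (link_sent_usec == 0) {
    /* This warning should appear if the send timestamp was never recorded. */
    fprintf(stderr, "Conflux: Trying to record client RTT without a timestamp\n");
    return false;
  }
  /* Getting here with now_usec < link_sent_usec means the supposedly
   * monotonic clock went backwards, which triggers the non-fatal assertion. */
  if (now_usec < link_sent_usec) {
    fprintf(stderr, "Bug: Non-fatal assertion now >= leg->link_sent_usec failed\n");
    return false;
  }
  return true;
}

int main(void)
{
  record_rtt_client_sketch(0, 100);    /* missing timestamp -> warn */
  record_rtt_client_sketch(200, 100);  /* clock went backwards -> bug warn */
  return 0;
}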
I saw some additional messages around the now >= leg->link_sent_usec assertion, but I doubt they are related to this problem.
Jul 24 05:30:32.000 [notice] Sudden decrease in circuit RTT (2765 vs 18443927094041), likely due to clock jump.
Aug 05 12:58:59.000 [notice] No circuits are opened. Relaxed timeout for circuit 5251 (a Unlinked conflux circuit 3-hop circuit in state doing handshakes with channel state open) to 60000ms. However, it appears the circuit has timed out anyway. [3 similar message(s) suppressed in last 43500 seconds]
I do not see backtraces, either because they are broken, or I disabled them somehow with my build options:
Yes, these other logs do actually look related. The problem is that monotime_absolute_usec() is supposed to only increase the clock, but it looks like on your system it can still go backwards.
This could be caused by ntpd, but it is also not supposed to happen in C-Tor. If the clock jumps backwards, the time values we use should stay the same.
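Conceptually, the guarantee I mean is a ratchet like the one below (a sketch of the idea only, not Tor's actual implementation; it ignores threading):

#include <stdint.h>

static uint64_t last_usec = 0;

/* Return a value that never decreases, even if the raw clock jumps back. */
static uint64_t
ratchet_usec(uint64_t raw_usec)
{
  if (raw_usec < last_usec)
    return last_usec;   /* raw clock went backwards: hold the last value */
  last_usec = raw_usec;
  return raw_usec;
}

int main(void)
{
  /* Example: a backwards jump from 1500 to 1200 is smoothed over. */
  (void)ratchet_usec(1500);
  return ratchet_usec(1200) == 1500 ? 0 : 1;
}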
Are you able to check ntpd logs to see if the ntp time updates match with instances of these Tor messages?
I looked into the source code and found that on Windows, monotime_get uses QueryPerformanceCounter. This article shows that QueryPerformanceCounter uses the rdtscp instruction.
So monotime_get should be as monotonic as rdtscp allows it to be.
It also means that ntpd can't affect its values.
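For anyone who wants to check their own machine, a minimal standalone program (my own sketch, not Tor code) that reads both the counter and its frequency could look like this:

#include <windows.h>
#include <stdio.h>

int main(void)
{
  LARGE_INTEGER freq, ticks;
  /* The frequency is fixed at boot; real code would query it once and cache it. */
  QueryPerformanceFrequency(&freq);
  /* The counter itself increases monotonically while the system is up. */
  QueryPerformanceCounter(&ticks);
  printf("frequency: %lld ticks/sec\n", (long long)freq.QuadPart);
  printf("counter:   %lld ticks\n", (long long)ticks.QuadPart);
  return 0;
}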
Nice digging. I think you're on to something here. This definitely looks like some kind of internal overflow in monotime.
The commit message also says "This has potential to mess us up if some future MS version declares that performance counters will use 1,000,000,007 units per second, but let's burn that bridge when we come to it."
Which sounds like exactly what is happening to you. I have to admit I am rather clueless on how to proceed here though, and @nickm (who wrote that code) is also on vacation.
I am trying to reproduce this with Tor Browser Alpha on a Windows 11 machine, and so far I am unable to get either this non-fatal assertion, or the other notice-level messages about RTT (both are caused by monotime failures).
I wonder if this is specific to your compilation with mingw, your CPU, or Windows 7.
Custom compilation should not matter.
I think there are 4 possible reasons: specific CPU, specific OS version, computer uptime and luck.
CPU + OS + luck determine the li.QuadPart value; uptime + luck determine whether there will be an overflow.
I think that to reproduce this problem, the li.QuadPart value should be examined first.
If it allows simplification, then even with a large uptime there will be no problems.
If not, then messages related to this problem will probably appear in the logs only a few times a week.
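To illustrate the simplification point, here is a standalone sketch of my own (the 1,000,000,000 nanoseconds-per-second numerator is an assumption about how the tick-conversion fraction is built):

#include <stdint.h>
#include <stdio.h>

static uint64_t
gcd64(uint64_t a, uint64_t b)
{
  while (b) {
    uint64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

int main(void)
{
  /* A "nice" frequency reduces to a tiny fraction; an odd one like mine
   * shares no factor with the numerator and stays huge, so multiplying a
   * large tick count by it risks 64-bit overflow. */
  uint64_t freqs[] = { 10000000, 3410087 };
  for (int i = 0; i < 2; i++) {
    uint64_t n = 1000000000, d = freqs[i];
    uint64_t g = gcd64(n, d);
    printf("freq %llu -> numerator %llu, denominator %llu\n",
           (unsigned long long)freqs[i],
           (unsigned long long)(n / g),
           (unsigned long long)(d / g));
  }
  return 0;
}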
And here are some Sudden decrease in circuit RTT messages; maybe they will help:
Sudden decrease in circuit RTT
Jan 09 08:31:52.000 [notice] Sudden decrease in circuit RTT (10932 vs 18370826188662), likely due to clock jump. [17 similar message(s) suppressed in last 760080 seconds]
Jan 28 12:34:37.000 [notice] Sudden decrease in circuit RTT (505 vs 16397101038859), likely due to clock jump.
Jan 31 10:35:48.000 [notice] Sudden decrease in circuit RTT (18428297329637010 vs 6136616861855226), likely due to clock jump.
Feb 15 07:16:56.000 [notice] Sudden decrease in circuit RTT (1442 vs 664765999400485), likely due to clock jump.
Feb 26 16:02:15.000 [notice] Sudden decrease in circuit RTT (101221 vs 6148912987547), likely due to clock jump. [17 similar message(s) suppressed in last 981960 seconds]
Feb 27 20:33:36.000 [notice] Sudden decrease in circuit RTT (339 vs 12297825779578), likely due to clock jump. [10 similar message(s) suppressed in last 102720 seconds]
Mar 12 19:59:55.000 [notice] Sudden decrease in circuit RTT (1937 vs 6136616861858483), likely due to clock jump.
Mar 14 10:34:15.000 [notice] Sudden decrease in circuit RTT (76938 vs 6148912916667), likely due to clock jump.
Mar 19 15:15:57.000 [notice] Sudden decrease in circuit RTT (134 vs 12297825776345), likely due to clock jump. [5 similar message(s) suppressed in last 448920 seconds]
Apr 05 23:00:08.000 [notice] Sudden decrease in circuit RTT (376 vs 16397101035755), likely due to clock jump. [15 similar message(s) suppressed in last 1493100 seconds]
Apr 12 03:45:03.000 [notice] Sudden decrease in circuit RTT (438 vs 6148912888645), likely due to clock jump. [1 similar message(s) suppressed in last 535500 seconds]
Apr 12 17:16:45.000 [notice] Sudden decrease in circuit RTT (11956 vs 18370826177595), likely due to clock jump. [12 similar message(s) suppressed in last 48720 seconds]
Apr 14 18:06:35.000 [notice] Sudden decrease in circuit RTT (18428297329637653 vs 6136616861855981), likely due to clock jump. [9 similar message(s) suppressed in last 175800 seconds]
May 08 16:11:54.000 [notice] Sudden decrease in circuit RTT (794 vs 6148911085995), likely due to clock jump.
May 11 17:49:37.000 [notice] Sudden decrease in circuit RTT (18428297329637047 vs 6136616861854514), likely due to clock jump. [13 similar message(s) suppressed in last 265080 seconds]
May 12 08:51:11.000 [notice] Sudden decrease in circuit RTT (18428297329636023 vs 6136616861854407), likely due to clock jump. [11 similar message(s) suppressed in last 54120 seconds]
May 23 09:18:56.000 [notice] Sudden decrease in circuit RTT (696 vs 664765999398278), likely due to clock jump. [13 similar message(s) suppressed in last 952080 seconds]
May 24 07:51:28.000 [notice] Sudden decrease in circuit RTT (7649 vs 18370820788164), likely due to clock jump. [17 similar message(s) suppressed in last 81180 seconds]
May 28 05:01:05.000 [notice] Sudden decrease in circuit RTT (596 vs 18370820773508), likely due to clock jump. [1 similar message(s) suppressed in last 335400 seconds]
May 30 03:35:58.000 [notice] Sudden decrease in circuit RTT (18428297329636357 vs 6136616861855111), likely due to clock jump. [14 similar message(s) suppressed in last 167700 seconds]
May 31 23:11:31.000 [notice] Sudden decrease in circuit RTT (265 vs 4099274057026), likely due to clock jump. [25 similar message(s) suppressed in last 156960 seconds]
Jun 08 08:50:55.000 [notice] Sudden decrease in circuit RTT (282 vs 1502746), likely due to clock jump.
Jun 24 22:01:39.000 [notice] Sudden decrease in circuit RTT (3019 vs 6148911089847), likely due to clock jump.
Jun 28 20:41:29.000 [notice] Sudden decrease in circuit RTT (18428297329640794 vs 6136616861858063), likely due to clock jump. [7 similar message(s) suppressed in last 340800 seconds]
Jul 04 05:55:06.000 [notice] Sudden decrease in circuit RTT (18428297329654759 vs 6136616861865803), likely due to clock jump. [16 similar message(s) suppressed in last 465240 seconds]
Jul 07 15:03:33.000 [notice] Sudden decrease in circuit RTT (18428297329636623 vs 6136616861854598), likely due to clock jump. [2 similar message(s) suppressed in last 292140 seconds]
Jul 13 09:54:26.000 [notice] Sudden decrease in circuit RTT (18428297329636297 vs 6136616861854151), likely due to clock jump.
Jul 14 05:34:08.000 [notice] Sudden decrease in circuit RTT (18428297329636132 vs 6136616861854304), likely due to clock jump.
Jul 14 13:50:02.000 [notice] Sudden decrease in circuit RTT (1714 vs 18370826161367), likely due to clock jump. [17 similar message(s) suppressed in last 29760 seconds]
Jul 17 02:57:34.000 [notice] Sudden decrease in circuit RTT (18428297329636018 vs 6136616861854252), likely due to clock jump.
Jul 18 10:30:52.000 [notice] Sudden decrease in circuit RTT (18428297329636755 vs 6136616861854833), likely due to clock jump. [11 similar message(s) suppressed in last 113640 seconds]
Jul 20 01:35:07.000 [notice] Sudden decrease in circuit RTT (117 vs 2033241124569341), likely due to clock jump. [19 similar message(s) suppressed in last 140700 seconds]
Jul 24 05:30:32.000 [notice] Sudden decrease in circuit RTT (2765 vs 18443927094041), likely due to clock jump.
Jul 26 11:36:10.000 [notice] Sudden decrease in circuit RTT (829 vs 18446703954549), likely due to clock jump. [3 similar message(s) suppressed in last 194760 seconds]
Jul 27 04:52:59.000 [notice] Sudden decrease in circuit RTT (721 vs 664765999398217), likely due to clock jump. [14 similar message(s) suppressed in last 62220 seconds]
Jul 27 12:23:45.000 [notice] Sudden decrease in circuit RTT (18428297329636288 vs 6136616861854456), likely due to clock jump. [18 similar message(s) suppressed in last 27060 seconds]
Jul 27 18:24:36.000 [notice] Sudden decrease in circuit RTT (18428297329806886 vs 2033241124734828), likely due to clock jump. [31 similar message(s) suppressed in last 21660 seconds]
Aug 03 20:12:12.000 [notice] Sudden decrease in circuit RTT (18428297329636262 vs 6136616861854400), likely due to clock jump. [11 similar message(s) suppressed in last 611280 seconds]
Aug 05 09:16:33.000 [notice] Sudden decrease in circuit RTT (18428297329684818 vs 2033241124603397), likely due to clock jump.
Aug 06 09:18:36.000 [notice] Sudden decrease in circuit RTT (18428297329635946 vs 6136616861854110), likely due to clock jump.
Aug 06 10:48:46.000 [notice] Sudden decrease in circuit RTT (373 vs 664765999399482), likely due to clock jump. [26 similar message(s) suppressed in last 5460 seconds]
Aug 06 12:18:55.000 [notice] Sudden decrease in circuit RTT (18428297329636954 vs 6136616861854510), likely due to clock jump.
Aug 06 15:19:14.000 [notice] Sudden decrease in circuit RTT (18428297329636330 vs 6136616861854625), likely due to clock jump. [3 similar message(s) suppressed in last 10860 seconds]
Aug 07 10:51:17.000 [notice] Sudden decrease in circuit RTT (476 vs 664765999397922), likely due to clock jump. [13 similar message(s) suppressed in last 70380 seconds]
Aug 09 17:55:41.000 [notice] Sudden decrease in circuit RTT (82 vs 18370827116091), likely due to clock jump. [17 similar message(s) suppressed in last 198300 seconds]
The QuadPart on my system is a different value on every single invocation. It also seems very large: 5680 billion, plus or minus random changes, if I'm reading it right.
What sort of uptime do you need before this starts becoming a problem?
The QuadPart on my system is a different value every single invocation
This is strange.
Are you sure that you checked QueryPerformanceFrequency and not QueryPerformanceCounter?
From documentation: "The frequency of the performance counter is fixed at system boot and is consistent across all processors so you only need to query the frequency from QueryPerformanceFrequency as the application initializes, and then cache the result."
Also, I always thought that it just reflects the CPU frequency. My value of 3410087 corresponds to my CPU frequency of 3.41 GHz.
What sort of uptime do you need before this starts becoming a problem?
I did not track it.
But I can say that this time I started my system at 14.07.2023 17:47:33, Tor was started at Jul 14 17:51:34, and the first Sudden decrease in circuit RTT message appeared at Jul 17 02:57:34.
Oh, my Visual Studio autocompleted to QueryPerformanceCounter, not QueryPerformanceFrequency. With QueryPerformanceFrequency(), I am now getting a consistent 10 million, which has no relation to my CPU speed.
Here is what the article from the previous link says: "when running under a hypervisor that implements the hypervisor version 1.0 interface (or always in some newer versions of Windows), the performance counter frequency is fixed to 10 MHz."
Ah. I am not running under a hypervisor, so it sounds like this is now normalized to 10 MHz. Is there any proof that it still returns non-10 MHz values on actually supported versions of Windows?
@richard Can we disable Conflux on Windows 7? We have fought monotime-related issues on Windows before and wasted a ton of time on them for very little benefit.
@ahf presuming disabling conflux would happen transparently to the browser (i.e. we don't have to do anything), then yeah, go for it, though we could use some words addressing this for our 13.0 release blog post in September.
It may be easier to just fix this bug.
For example like this:
If, after simplify_fraction64(&n, &d), n or d stays larger than 10 million, then divide n by 2 and d by 2 in a loop until both become <= 10 million.
I doubt that it can break anything.
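A rough sketch of what I mean (untested, and the helper name is made up):

#include <stdint.h>
#include <stdio.h>

/* After simplify_fraction64() has removed common factors, keep halving both
 * parts of the fraction until neither exceeds 10 million.  This gives up a
 * little precision in exchange for headroom against 64-bit overflow when a
 * large tick count is later multiplied by the numerator. */
static void
clamp_fraction64(uint64_t *n, uint64_t *d)
{
  while (*n > 10000000 || *d > 10000000) {
    *n >>= 1;
    *d >>= 1;
    if (*n == 0)
      *n = 1;   /* keep the fraction well-defined */
    if (*d == 0)
      *d = 1;
  }
}

int main(void)
{
  uint64_t n = 1000000000, d = 3410087;  /* a fraction simplify_fraction64 could not reduce */
  clamp_fraction64(&n, &d);
  printf("clamped to %llu/%llu\n", (unsigned long long)n, (unsigned long long)d);
  return 0;
}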
Here are the results of testing c9e3f502:
During an uptime of 27 days there were no assertion messages.
There were rare circuit RTT messages, but they had much smaller values.
They are probably related to some different bug:
Aug 29 08:56:36.000 [notice] Tor 0.4.8.4 (git-c9e3f502d01b8a28) opening log file.
Sep 03 15:38:48.000 [notice] Sudden decrease in circuit RTT (46 vs 312626), likely due to clock jump.
Sep 08 02:53:13.000 [notice] Sudden decrease in circuit RTT (56 vs 498796), likely due to clock jump.
Sep 08 08:24:03.000 [notice] Sudden decrease in circuit RTT (66 vs 1886539), likely due to clock jump. [1 similar message(s) suppressed in last 19800 seconds]
Sep 10 21:13:57.000 [notice] Sudden decrease in circuit RTT (55 vs 295102), likely due to clock jump. [1 similar message(s) suppressed in last 219000 seconds]
Someone is attacking relays to crash them by triggering tor_assertion in different ways with lower ports during connection attempts. Since I have not found a proper issue, I decided to comment here. In my case they are using conn->state to crash my relay at connection.c line 5773.