Extra-info descriptors have corrupt {write,read}-history lines

changed milestone to %Tor: unspecified

added bandwidth component::core tor/tor extrainfo metrics milestone::Tor: unspecified parent::29019 priority::low severity::normal status::new tor-relay type::defect labels

Do these correlate with any particular pattern of tor versions?

I found 2 relays running 0.2.1.26 on Windows and 1 relay running 0.2.1.21 on Linux. I admit that's not much. If this doesn't help, I can look at the 2009 archives, too, to find more cases.

I had a closer look at the archives and found more mysteries. I think there's a bug in the byte counting code or in the state loading code. But staring at the code for two hours didn't help me find the bug. Maybe someone else is more lucky?

Here's my analysis so far:

Affected versions range from 0.2.0.23-rc to 0.2.2.13-alpha, so it's an old bug. The bug might even be older than 0.2.0.x when relays wrote their history lines to their server descriptors which I didn't look at.
Both relays on Windows and Linux are affected. I didn't find other platforms, but I expect the bug to affect all platforms.
Reported write-history and read-history lines have dates both in the past (see attached file past) and in the future (see attached file future.bz2). I found timestamps 1970-01-02 and 1985-06-11 as well as 2020-08-01 and 2028-09-23.
For timestamps in the future, relays keep reporting the same history lines over and over again. We have a relay reporting the same bandwidth history ending on 2009-08-12 for two years between 2007-08-15 and 2009-07-11.

Someone willing to win the oldest bug prize? :)

Trac:
past

History lines lying in the past of the descriptor publication time

Trac:
future.bz2

History lines lying in the future of the descriptor publication time

How about this theory for dates in the future: You ran your relay with the wrong time (the one in the future), and wrote a state file. Then you corrected your clock. Then Tor loads the clock, and doesn't discard that strange data, but rather keeps reporting it until either your state file is erased or the incorrect time has been reached

Trac:
Component: Metrics to Tor Relay

Trac:
Milestone: N/A to Tor: 0.2.2.x-final
Priority: normal to minor

I looked at all available data and tried to extract wrong dates. The table below lists my findings. Basically I extracted the (read|write)-history dates from the data and threw away the correct year. So you'll find sometimes entries from December in January entries. Those are in my opinion not wrong dates.

My script also threw away double date entries. So there are cases where a date occurs on my than one day. So you'll find sometimes the same days on entries over weeks or even moths. One case is 2009-08-12. You can find it on each month between August 2007 and December 2009 (18 times). Other "special" dates are:

2035-08-28, 2009-06-03, 2009-05-03 (6 entries in the table below)
2001-12-31 (5 entries in the table below)
2021-01-18, 2020-08-01, 2012-04-30, 2010-03-16, 2008-12-31, 2008-08-09, 2008-05-30 (4 entries in the table below)
2014-05-27, 2007-01-24, 2002-01-01 (3 entries in the table below)

I have now a script running which extracts all descriptors with wrong dates. I'll attach it when it is finished.

The maximum and minimum date is highlighted.

02	2005	2000-01-01

03	2005

04	2005	2000-01-07

05	2005

06	2005

07	2005

08	2005

09	2005	2002-09-01

10	2005	2002-11-29

11	2005
12	2005
01	2006	2005-12-07, 2005-12-30, 2005-12-31
02	2006
03	2006
04	2006	2002-04-24
05	2006	2007-05-27
06	2006	2007-05-27
07	2006
08	2006	2007-08-10
09	2006	2008-09-08
10	2006	2007-01-24, 2010-07-17
11	2006	2007-01-24, 2007-01-26, 2007-07-16, 2007-12-06, 2010-07-17
12	2006	2007-01-24, 2007-12-03, 2007-12-09, 2007-12-13, 2007-12-20
01	2007	2006-12-30, 2006-12-31, 2021-01-18
02	2007	2014-05-27, 2021-01-18
03	2007	2009-05-03, 2009-06-03, 2010-03-16, 2014-05-27, 2014-11-25, 2035-08-28
04	2007	2008-04-01, 2009-04-12, 2009-05-03, 2009-06-03, 2010-03-16, 2014-05-27, 2014-11-25, 2021-01-18, 2021-03-31, 2035-08-28
05	2007	2008-05-30, 2009-02-16, 2009-05-03, 2009-06-03, 2010-03-16, 2012-04-30, 2015-05-07, 2021-05-11, 2030-05-04, 2035-08-28
06	2007	2008-05-30, 2008-06-21, 2009-05-03, 2009-06-03, 2010-03-16, 2012-04-30, 2013-06-18, 2017-06-03, 2027-06-09, 2030-05-04, 2035-08-28
07	2007	2008-05-03, 2008-05-30, 2008-06-21, 2008-07-01, 2009-05-03, 2009-06-03, 2012-04-30, 2013-06-18, 2035-08-28
08	2007	2008-08-09, 2009-08-12, 1999-11-13, 2008-05-03, 2008-05-30, 2008-08-09, 2008-08-17, 2009-05-03, 2009-06-03, 2009-08-12, 2010-08-10, 2012-04-30, 2020-08-01, 2021-01-18, 2027-07-31, 2035-08-28

09	2007	1985-06-11, 2008-08-09, 2008-09-15, 2009-08-12, 2020-08-01
10	2007	2008-08-09, 2009-08-12, 2011-01-01
11	2007	2009-08-12, 2020-08-01
12	2007	2009-08-12, 2020-08-01
01	2008	2009-08-12
02	2008	2009-08-12
03	2008	2009-08-12
04	2008	2004-08-26, 2009-08-12
05	2008	2001-01-01, 2009-08-12
06	2008	2007-01-01, 2009-08-12
07	2008	2009-08-12
08	2008	2009-08-12
09	2008	2009-08-12
10	2008	2007-10-26, 2009-08-12, 2009-10-29, 2015-07-19
11	2008	2009-08-12
12	2008	2009-01-01, 2009-08-12
01	2009	2001-12-31
02	2009	2001-12-31
03	2009
04	2009
05	2009
06	2009

07	2009	2028-09-23
08	2009
09	2009	2006-01-03
10	2009
11	2009	2004-11-01, 2006-12-26
12	2009	2010-01-01
01	2010	2009-10-07, 2009-11-30
02	2010	2003-12-31
03	2010	2008-12-31
04	2010	2008-01-02
05	2010	1970-01-02, 2001-12-31, 2009-10-07
06	2010	2003-01-01, 2009-01-01, 2009-12-21
07	2010	2001-01-01, 2001-07-12
08	2010	2002-01-01, 2003-01-01, 2007-12-31
09	2010	2002-01-01, 2003-01-12, 2008-12-31
10	2010	2000-01-01, 2000-01-02, 2006-08-15, 2008-12-31
11	2010	2001-12-31, 2002-01-03, 2009-08-17
12	2010	2001-12-31, 2002-12-30, 2009-03-11
01	2011	2002-01-01, 2002-01-23, 2010-01-16, 2010-09-05, 2010-10-15, 2010-11-19, 2010-11-22, 2010-11-24, 2012-01-17 (also many days in 12/2010)

Trac:
falsedata

Original output of my script (collecting wrong date entries from descriptors)

Replying to qbi:

My script also threw away double date entries. So there are cases where a date occurs on my than one day. So you'll find sometimes the same days on entries over weeks or even moths. One case is 2009-08-12. You can find it on each month between August 2007 and December 2009 (18 times). Other "special" dates are:

Correction: It should read "between August 2007 and December 2008".

Trac:
router-platform-date-V1.bz2

Listing of router, platform and dates where router publishes wrong history data

I've found some more information:

Replying to qbi:

I've found some more information:

The first router version is 0.1.0.0-alpha-cvs.
So far I only found Linux and Windows systems:
- Linux i586 and i686
- Windows 2000 SP4 server and workstation
- Windows Me
- Windows Longhorn
- Windows XP SP1 and SP2
- Windows 2003
When a date occurs multiple times, read- and write-history have exactly the same value (in seconds since 1970). For instance there are nearly 150 entries with the below dates:

router cedartor 85.216.69.79 9001 0 0
platform Tor 0.1.1.23 on Linux i686
published 2006-10-27 15:28:49
2007-01-24 09:44:26
2007-01-24 09:44:26

When read- and write-history date are different, you only find one entry. Sometimes one of those entries is a correct date and sometimes both differ.

I have some suspicion that (part of) the bug is located in rephist.c:1259. If the date on a local computer is set to some point in the future, then AFAIS it returns this date. This happens until the real date is reached.

Furthermore we should check if the value of b is really NULL. If not this should also give some nasty effects. I'll look into that later and comment some more.:)

Replying to qbi:

I have some suspicion that (part of) the bug is located in rephist.c:1259. If the date on a local computer is set to some point in the future, then AFAIS it returns this date. This happens until the real date is reached.

I'm not sure which date you mean and how we would end up in the situation you describe. Can you elaborate?

But I wonder if we can assume that there is no big time difference between starting Tor and calling add_obs(). Except if you mean that single clock jumps while running Tor lead to this bug. So, when starting Tor we should notice if our current system time differs much from the time stored in our state file. I wonder if Tor writes apparently wrong timestamps to its state and doesn't validate these timestamps when loading the state.

Furthermore we should check if the value of b is really NULL. If not this should also give some nasty effects. I'll look into that later and comment some more.:)

Which line? Do you mean NULL or not NULL?

Trac:
Points: N/A to N/A

I collected all V1 directory information in one document. It is to large to upload it here. So you find it at V1-dirinfo.bz2. This document contains all "broken" descriptors.

Trac:
Actualpoints: N/A to N/A

Trac:
V1-dirinfo.bz2

Replying to qbi:

Update:

02	2011	2000-01-01, 2008-09-11, 2009-08-11, 2010-09-05, 2010-10-15, 2010-11-19, 2010-11-24

As I suspected the dirreq-(read|write)-history lines are also affected by this bug. I found several lines with wrong dates so far.

Replying to qbi:

As I suspected the dirreq-(read|write)-history lines are also affected by this bug. I found several lines with wrong dates so far.

Interesting! This makes sense, because I reused the byte history code for dirreq histories. Can you paste the affected relay descriptor identifiers here?

I wonder if we reached the point where looking at descriptor archives won't tell us why there's a bug. There seems to be no specific Tor version or platform that has this problem and that we could blame.

How about we analyze the code that implements the dirreq histories and try to find the bug here? If we find the bug in the code that's touched by dirreq histories, we very likely also find the bug in general byte histories. Dirreq histories have been implemented in this commit.

The router PeterbjornxRelay reports 2011-01-26.

I took another look at extra-info descriptors containing corrupt write-history lines, and I found an interesting pattern that I can't explain.

A lot of the obviously wrong write-history timestamps still match the 900-second period of the write history in the previous or subsequent descriptor.

Here are some examples (fingerprint, descriptor publication time, write-history timestamp, *** for corrupt timestamp):

0A8B196D53B02CA93818B0258249C04A5CB60E39,2011-11-18 03:15:42,2010-11-18 03:09:43 *** 
0A8B196D53B02CA93818B0258249C04A5CB60E39,2011-11-18 03:16:43,2011-11-18 03:09:43

3A64376144842654940E09C4D8CC3A5307536C1B,2011-11-09 14:13:20,2004-12-31 23:01:45 *** 
3A64376144842654940E09C4D8CC3A5307536C1B,2011-11-09 14:17:24,2011-11-09 14:16:45

40257810D70D006D0B240D02727ED0C9636A45FA,2012-01-05 16:33:12,2008-07-18 00:21:50 *** 
40257810D70D006D0B240D02727ED0C9636A45FA,2012-01-05 16:34:13,2012-01-05 16:21:50

4373CC0CFF1BAC0B91C61F6E1BF8998C8B6DE4EE,2011-12-19 10:26:13,2011-12-19 10:22:38
4373CC0CFF1BAC0B91C61F6E1BF8998C8B6DE4EE,2011-12-19 14:50:20,2011-12-13 09:52:38 *** 
4373CC0CFF1BAC0B91C61F6E1BF8998C8B6DE4EE,2011-12-19 14:51:20,2011-12-19 14:37:38

4C2D42B29791347A9F94E52AB5B54CA1EF2A3F5F,2012-04-08 09:16:02,2002-01-01 03:16:49 *** 
4C2D42B29791347A9F94E52AB5B54CA1EF2A3F5F,2012-04-08 09:17:03,2012-04-08 09:16:49

4C2ED83F581503D2DE73EE5DB11D5DAC20F7986C,2011-10-26 21:47:47,2011-10-26 21:45:25
4C2ED83F581503D2DE73EE5DB11D5DAC20F7986C,2011-10-26 22:25:03,1970-01-01 00:00:25 *** 
4C2ED83F581503D2DE73EE5DB11D5DAC20F7986C,2011-10-26 22:26:36,2011-10-26 22:15:25

50C07527F812190CD2719125DB27F5A35E775536,2011-10-17 16:23:15,2010-10-17 18:05:19 *** 
50C07527F812190CD2719125DB27F5A35E775536,2011-10-17 16:27:51,2011-10-17 16:20:19

56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 16:27:39,2011-11-22 16:14:41
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:09:22,2011-11-18 20:59:41 *** 
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:10:20,2011-11-22 17:59:41
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:24:28,2011-11-18 20:59:41 *** 
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:25:32,2011-11-22 18:14:41

72F69E6B3575BA258758E5B3FC749D01950210B4,2011-11-19 16:45:23,1999-12-31 23:30:40 *** 
72F69E6B3575BA258758E5B3FC749D01950210B4,2011-11-19 16:46:22,2011-11-19 16:45:40

8875C1AA54E9DA0C2F79AE7E21F03218BC5086F7,2012-02-12 22:45:22,2012-02-12 22:35:47
8875C1AA54E9DA0C2F79AE7E21F03218BC5086F7,2012-02-27 01:12:02,2012-02-04 02:35:47 *** 
8875C1AA54E9DA0C2F79AE7E21F03218BC5086F7,2012-02-27 01:13:03,2012-02-27 01:05:47

8DE5D6A67DD16A9C6EABB7EFC69BB81B6E26FCB6,2012-02-07 18:58:01,2005-04-06 02:21:18 *** 
8DE5D6A67DD16A9C6EABB7EFC69BB81B6E26FCB6,2012-02-07 18:58:09,2012-02-07 18:51:18

95C8F4A0C6262FD4B4CAF7D43F446026D35B8C86,2011-12-14 15:34:15,2011-05-14 15:19:09 *** 
95C8F4A0C6262FD4B4CAF7D43F446026D35B8C86,2011-12-14 15:35:15,2011-12-14 15:34:09

97DAAF5C00A4B1E28ED0E9306C59D50E9C6DFC77,2012-01-03 22:32:10,2000-01-01 07:16:13 *** 
97DAAF5C00A4B1E28ED0E9306C59D50E9C6DFC77,2012-01-03 22:33:22,2012-01-03 22:31:13

9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2011-12-11 18:37:55,1970-01-01 00:15:42 *** 
9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2011-12-11 18:39:39,2011-12-11 18:30:42

9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2012-03-26 09:58:20,1970-01-02 21:30:41 *** 
9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2012-03-26 10:00:29,2012-03-26 09:45:41

A153374361E53C21C673485F65A43E33CAA252D7,2012-02-19 17:58:29,2000-01-01 00:23:51 *** 
A153374361E53C21C673485F65A43E33CAA252D7,2012-02-19 17:59:30,2012-02-19 17:53:51

B47ED35EAD03C6D594A2A0ED1DD807EE5FF69F41,2011-11-10 00:03:16,2010-09-16 23:01:00 *** 
B47ED35EAD03C6D594A2A0ED1DD807EE5FF69F41,2011-11-10 09:20:28,2011-11-10 09:16:00

E163C7D1A1C187EF946EA06732398D1CDC0C9186,2011-10-12 14:03:49,2006-02-09 22:05:28 *** 
E163C7D1A1C187EF946EA06732398D1CDC0C9186,2011-10-12 14:05:51,2011-10-12 14:05:28

That means that corrupt timestamps aren't entirely random.

My first thought was that there are single corrupt bits. But that's unlikely/impossible if we look at the binary representation of, say, the first two dates above:

2010-11-18 03:09:43 = 01001100 11100100 10011000 11110111
2011-11-18 03:09:43 = 01001110 11000101 11001100 01110111

So, there's something else changing the "2011" to "2010" after formatting the timestamp string. I'm running out of ideas what that could be.

And why are quite a few of the relays publishing another descriptor 1 to 5 minutes after the one containing a corrupt timestamp? (Note that directory authorities must have accepted these descriptors, or we wouldn't have them in the archives.)

Trac:
Milestone: Tor: 0.2.2.x-final to Tor: unspecified

Trac:
Keywords: N/A deleted, tor-relay added

Trac:
Component: Tor Relay to Tor

Trac:
Sponsor: N/A to N/A
Severity: N/A to Normal
Reviewer: N/A to N/A
Keywords: tor-relay deleted, tor-relay metrics extrainfo bandwidth added

I don't think I'll find the time to hunt down this possible bug any further. And maybe we'll be using PrivCount for bandwidth statistics in the future anyway. I'm unassigning this ticket from myself, to have a more accurate view on what tickets I'm actually working on.

Trac:
Status: new to assigned
Owner: karsten to N/A

Trac:
Status: assigned to new

#29019 (moved) may fix this bug, as a side-effect of checking for TestingTorNetwork period length transitions.

Trac:
Parent: N/A to #29019 (moved)

mentioned in issue #29019 (moved)

moved to tpo/core/tor#1926 (closed)

Extra-info descriptors have corrupt {write,read}-history lines

Child items ...

Activity