While parsing extra-info descriptors for the dir bytes analysis, I found that some of them have weird write-history or read-history lines with obviously wrong dates. I wonder what leads to these false reports and if there are more errors like this in extra-info descriptors.
I found 2 relays running 0.2.1.26 on Windows and 1 relay running 0.2.1.21 on Linux. I admit that's not much. If this doesn't help, I can look at the 2009 archives, too, to find more cases.
I had a closer look at the archives and found more mysteries. I think there's a bug in the byte counting code or in the state loading code. But staring at the code for two hours didn't help me find the bug. Maybe someone else is more lucky?
Here's my analysis so far:
Affected versions range from 0.2.0.23-rc to 0.2.2.13-alpha, so it's an old bug. The bug might even be older than 0.2.0.x when relays wrote their history lines to their server descriptors which I didn't look at.
Both relays on Windows and Linux are affected. I didn't find other platforms, but I expect the bug to affect all platforms.
Reported write-history and read-history lines have dates both in the past (see attached file past) and in the future (see attached file future.bz2). I found timestamps 1970-01-02 and 1985-06-11 as well as 2020-08-01 and 2028-09-23.
For timestamps in the future, relays keep reporting the same history lines over and over again. We have a relay reporting the same bandwidth history ending on 2009-08-12 for two years between 2007-08-15 and 2009-07-11.
How about this theory for dates in the future: You ran your relay with the wrong time (the one in the future), and wrote a state file. Then you corrected your clock. Then Tor loads the clock, and doesn't discard that strange data, but rather keeps reporting it until either your state file is erased or the incorrect time has been reached
I looked at all available data and tried to extract wrong dates. The table below lists my findings. Basically I extracted the (read|write)-history dates from the data and threw away the correct year. So you'll find sometimes entries from December in January entries. Those are in my opinion not wrong dates.
My script also threw away double date entries. So there are cases where a date occurs on my than one day. So you'll find sometimes the same days on entries over weeks or even moths. One case is 2009-08-12. You can find it on each month between August 2007 and December 2009 (18 times). Other "special" dates are:
2035-08-28, 2009-06-03, 2009-05-03 (6 entries in the table below)
2001-12-31 (5 entries in the table below)
2021-01-18, 2020-08-01, 2012-04-30, 2010-03-16, 2008-12-31, 2008-08-09, 2008-05-30 (4 entries in the table below)
2014-05-27, 2007-01-24, 2002-01-01 (3 entries in the table below)
I have now a script running which extracts all descriptors with wrong dates. I'll attach it when it is finished.
My script also threw away double date entries. So there are cases where a date occurs on my than one day. So you'll find sometimes the same days on entries over weeks or even moths. One case is 2009-08-12. You can find it on each month between August 2007 and December 2009 (18 times). Other "special" dates are:
Correction: It should read "between August 2007 and December 2008".
When a date occurs multiple times, read- and write-history have exactly the same value (in seconds since 1970). For instance there are nearly 150 entries with the below dates:
router cedartor 85.216.69.79 9001 0 0platform Tor 0.1.1.23 on Linux i686published 2006-10-27 15:28:492007-01-24 09:44:262007-01-24 09:44:26
When read- and write-history date are different, you only find one entry. Sometimes one of those entries is a correct date and sometimes both differ.
I have some suspicion that (part of) the bug is located in rephist.c:1259. If the date on a local computer is set to some point in the future, then AFAIS it returns this date. This happens until the real date is reached.
Furthermore we should check if the value of b is really NULL. If not this should also give some nasty effects. I'll look into that later and comment some more.:)
I have some suspicion that (part of) the bug is located in rephist.c:1259. If the date on a local computer is set to some point in the future, then AFAIS it returns this date. This happens until the real date is reached.
I'm not sure which date you mean and how we would end up in the situation you describe. Can you elaborate?
But I wonder if we can assume that there is no big time difference between starting Tor and calling add_obs(). Except if you mean that single clock jumps while running Tor lead to this bug. So, when starting Tor we should notice if our current system time differs much from the time stored in our state file. I wonder if Tor writes apparently wrong timestamps to its state and doesn't validate these timestamps when loading the state.
Furthermore we should check if the value of b is really NULL. If not this should also give some nasty effects. I'll look into that later and comment some more.:)
I collected all V1 directory information in one document. It is to large to upload it here. So you find it at V1-dirinfo.bz2. This document contains all "broken" descriptors.
As I suspected the dirreq-(read|write)-history lines are also affected by this bug. I found several lines with wrong dates so far.
Interesting! This makes sense, because I reused the byte history code for dirreq histories. Can you paste the affected relay descriptor identifiers here?
I wonder if we reached the point where looking at descriptor archives won't tell us why there's a bug. There seems to be no specific Tor version or platform that has this problem and that we could blame.
How about we analyze the code that implements the dirreq histories and try to find the bug here? If we find the bug in the code that's touched by dirreq histories, we very likely also find the bug in general byte histories. Dirreq histories have been implemented in this commit.
That means that corrupt timestamps aren't entirely random.
My first thought was that there are single corrupt bits. But that's unlikely/impossible if we look at the binary representation of, say, the first two dates above:
So, there's something else changing the "2011" to "2010" after formatting the timestamp string. I'm running out of ideas what that could be.
And why are quite a few of the relays publishing another descriptor 1 to 5 minutes after the one containing a corrupt timestamp? (Note that directory authorities must have accepted these descriptors, or we wouldn't have them in the archives.)
I don't think I'll find the time to hunt down this possible bug any further. And maybe we'll be using PrivCount for bandwidth statistics in the future anyway. I'm unassigning this ticket from myself, to have a more accurate view on what tickets I'm actually working on.
Trac: Status: new to assigned Owner: karsten toN/A