08:57 <+GeKo> not random, always the first day of the month at about 00:00:00
08:57 <+GeKo> ah, that's the "bug", okay
08:57 <@armacake> right. they're supposed to see how long it took them to use it all last time, and then assume they'll take the same interval this time,
08:57 <@armacake> and then randomly place that interval in the month
08:58 <@armacake> so it will likely skew early, but it should not be 00:00:00, unless it took them approximately the whole month to use it last time
09:06 <@armacake> see the file comment at the top of src/feature/hibernate/hibernate.c for how it's supposed to behave
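The accounting behaviour armacake describes can be sketched as below. This is a hypothetical Python illustration, not tor's C code; `pick_wakeup_time` and its parameters are made up for the example, and the authoritative description is the file comment in src/feature/hibernate/hibernate.c. It shows both the early skew (wake-up times always fall in the first part of the period) and why a relay whose budget lasted roughly the whole previous month wakes at 00:00:00 on the 1st.

```python
import random
from datetime import datetime, timedelta


def pick_wakeup_time(period_start, period_end, last_time_to_exhaust, rng=random):
    """Hypothetical sketch of the accounting behaviour described above.

    Assume this period will take as long to exhaust as the last one did,
    then place that interval uniformly at random inside the period so it
    still ends before period_end.
    """
    period_length = period_end - period_start
    if last_time_to_exhaust >= period_length:
        # Last time the budget lasted (about) the whole period, so there is
        # no room to shift the interval: wake up right at the start.
        return period_start
    slack = period_length - last_time_to_exhaust
    # timedelta * float yields a timedelta; rng.random() is in [0.0, 1.0).
    return period_start + slack * rng.random()


if __name__ == "__main__":
    start = datetime(2021, 8, 1)
    end = datetime(2021, 9, 1)
    # A relay that used its whole budget in 3 days last month.
    print(pick_wakeup_time(start, end, timedelta(days=3)))
    # A relay that needed the whole month wakes at 2021-08-01 00:00:00.
    print(pick_wakeup_time(start, end, timedelta(days=31)))
```

Because the wake-up time is drawn from [start, end - last_time_to_exhaust], relays that needed most of the month last time are pushed towards the very beginning of the next one.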
This might actually be a tor bug. It's worth investigating those relays a bit first, though, to look for patterns or the underlying problem, and then filing a ticket to get things fixed on tor's side if it really is a bug.
Which exact "rate limits" would operators change? Are they published in some descriptor?
If they are, we could check them; if not, we should probably ask the operators.
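At least part of this seems to be public. According to Tor's directory protocol specification (dir-spec), a server descriptor carries a `bandwidth` line with the average, burst, and observed bandwidth in bytes per second, and an optional `hibernating` line. A minimal parsing sketch, assuming those line formats; the function name and the returned keys are hypothetical:

```python
def parse_rate_fields(descriptor_text):
    """Extract advertised bandwidth figures and the hibernation flag.

    Assumes the dir-spec server descriptor lines:
      bandwidth <avg> <burst> <observed>   (bytes per second)
      hibernating <0|1>
    """
    fields = {
        "bandwidth_avg": None,
        "bandwidth_burst": None,
        "bandwidth_observed": None,
        "hibernating": False,  # the line is optional; absent means not hibernating
    }
    for line in descriptor_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "bandwidth" and len(parts) >= 4:
            avg, burst, observed = (int(p) for p in parts[1:4])
            fields.update(bandwidth_avg=avg, bandwidth_burst=burst,
                          bandwidth_observed=observed)
        elif parts[0] == "hibernating" and len(parts) >= 2:
            fields["hibernating"] = parts[1] == "1"
    return fields
```

Whether the accounting limits themselves show up anywhere is a separate question; if they don't, asking operators remains the fallback.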
Regarding my last comment in that issue: now that the issue is solved, it is no longer true that it reports fewer relays than Torflow, or fewer over time. But it is still true that more new relays are reported at the beginning of the month; see the graphs at https://consensus-health.torproject.org/graphs.html
Hi, I stumbled across this issue in June while plotting some graphs of the number of Tor relays (the new relays joining at the beginning of the month are clearly visible in them). Check out the Running flag, for example:
I tried to figure out why this happens, and while I was unable to fully confirm what is going on, I think I can add some information:
This behavior started in July 2020 (since then there has been an unusual spike between 00:00 and 01:00 on the first of every month; these spikes did not exist before).
By comparing the archived consensuses from 00:00 and 01:00 on the first of every month, I was able to confirm that those spikes are regularly caused by the same relays. To be more precise, I found more than 150 relays that reliably drop out of the network over time and show up again precisely at the start of a new month.
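The comparison step can be sketched as a set difference between two consensuses. The author's own tooling isn't shown; this hypothetical sketch assumes raw consensus documents in the dir-spec format, where each relay's `r` line holds its identity as base64 with the padding stripped:

```python
import base64


def consensus_fingerprints(consensus_text):
    """Return the hex fingerprints of all relays listed in a consensus.

    Assumes the dir-spec "r" line layout:
      r <nickname> <identity> <digest> <date> <time> <IP> <ORPort> <DirPort>
    """
    fingerprints = set()
    for line in consensus_text.splitlines():
        if not line.startswith("r "):
            continue
        identity = line.split()[2]
        padded = identity + "=" * (-len(identity) % 4)  # restore base64 padding
        fingerprints.add(base64.b64decode(padded).hex().upper())
    return fingerprints


def joined_between(before_text, after_text):
    """Relays listed in the later consensus but not in the earlier one."""
    return consensus_fingerprints(after_text) - consensus_fingerprints(before_text)
```

Running `joined_between` on the 00:00 and 01:00 consensuses of several months and intersecting the results is one way to find the relays that recur.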
I used Onionoo to gather some more data about those relays, to see whether anything about them would give me a clue as to why they behave the way they do. I noticed that a large share of these relays have been deployed since March 2020 (unsurprising, since the behavior only started in July), and that they are mostly clustered around two cloud hosting providers: Hetzner and Linode. Admittedly, both providers are popular for Tor relays, but that they account for such a large share while hosters like OVH don't show up at all seems odd (unless that is due to a different reverse DNS policy at OVH).
Especially interesting, in my opinion, is the "first_seen" information for those spiking relays. When I sorted the list of relays by their first_seen date, they also sorted themselves by hosting provider. For example, between 04-29-2020 and 05-10-2020, 32 spiking relays were deployed at Hetzner (and not a single one anywhere else). Then, a week later, between 05-18-2020 and 05-26-2020, 22 spiking relays were deployed at Linode, and again no spiking relays showed up anywhere else. The same pattern repeats during June and July 2020.
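The sort-then-look-for-runs observation can be sketched like this. It assumes each relay record carries a `first_seen` string in the "YYYY-MM-DD hh:mm:ss" form Onionoo uses, plus a hypothetical `provider` label derived beforehand (for example from the AS name or reverse DNS):

```python
from itertools import groupby


def provider_runs(relays):
    """Sort relays by first_seen and collapse consecutive same-provider entries.

    Returns (provider, count, first_date, last_date) tuples in chronological
    order. ISO-style timestamps sort correctly as plain strings.
    """
    ordered = sorted(relays, key=lambda r: r["first_seen"])
    runs = []
    for provider, group in groupby(ordered, key=lambda r: r["provider"]):
        group = list(group)
        runs.append((provider, len(group),
                     group[0]["first_seen"][:10], group[-1]["first_seen"][:10]))
    return runs
```

Long runs with no other provider interleaved, like the Hetzner and Linode blocks described above, would show up as a few large tuples rather than many small ones.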
If you want to take a look at my data yourself, I have uploaded it here. The data ends on July 31, so the spikes from August 1 are not included yet. I have highlighted the patterns where I think the appearance of relays is suspicious. The second column (count) tells you in how many months the relay showed up between 00:00 and 01:00.
Unfortunately, I still have no idea why those relays rejoin the network exactly on the first of the month, but maybe my findings so far can help you figure it out. If there is anything I can do to help (like adding more columns with fields from Onionoo to my relay list), just ask; I am really curious myself where this leads.
@TTH: Thanks for the investigation, really appreciated!
This behavior started in July 2020 (since then there has been an unusual spike between 00:00 and 01:00 on the first of every month; these spikes did not exist before).
I am not sure whether that's actually correct. If you look at data throughout 2018, for instance, you see a similar spike on the 1st of every month. It is not as high as after July 2020, but that in turn is likely due to all the relays you mentioned joining around that time. Thus, those relays seem to make the underlying issue much more visible rather than causing it.
If I see this right, there is no contact info for reaching out to the larger clusters, right? Hrm. Maybe we could ask the operator of 50F1D32B208A2D97F124882137C0A3BDB74EDD09 whether he has a clue, given that only that relay seems to show the same "spike-y" behavior, while the other two in the family, also at Hetzner(!), behave differently.
Hi,
To be honest, I haven't checked anything before 2020 yet (downloading and parsing that much consensus data overwhelms my laptop; I'll have to run it overnight).
For comparison, here are the consensus size changes since 01-2020. I had used a fairly arbitrary threshold of +150 relays for considering a change a relevant spike, which led me to July 1.
2020-01-01-01-00-00: Relay count changed from 6312 to 6328 (+16)
2020-02-01-01-00-00: Relay count changed from 6294 to 6337 (+43)
2020-03-01-01-00-00: Relay count changed from 6446 to 6470 (+24)
2020-05-01-01-00-00: Relay count changed from 6661 to 6694 (+33)
2020-06-01-01-00-00: Relay count changed from 6160 to 6264 (+104)
2020-07-01-01-00-00: Relay count changed from 6225 to 6385 (+160)
2020-08-01-01-00-00: Relay count changed from 6414 to 6758 (+344)
2020-09-01-01-00-00: Relay count changed from 6411 to 6803 (+392)
2020-10-01-01-00-00: Relay count changed from 6541 to 6942 (+401)
2020-11-01-01-00-00: Relay count changed from 6377 to 6795 (+418)
2020-12-01-01-00-00: Relay count changed from 6593 to 6996 (+403)
2021-01-01-01-00-00: Relay count changed from 6829 to 7175 (+346)
2021-02-01-01-00-00: Relay count changed from 6907 to 7251 (+344)
2021-03-01-01-00-00: Relay count changed from 6744 to 7086 (+342)
2021-04-01-01-00-00: Relay count changed from 6555 to 6918 (+363)
2021-05-01-01-00-00: Relay count changed from 6528 to 6775 (+247)
2021-06-01-01-00-00: Relay count changed from 6574 to 6825 (+251)
2021-07-01-01-00-00: Relay count changed from 6587 to 6830 (+243)
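The thresholding step can be sketched as follows. The author's own script isn't shown, so this is a hypothetical reconstruction that parses lines in the format above and keeps the months whose 00:00 to 01:00 jump reaches +150:

```python
import re

# Matches e.g. "2020-07-01-01-00-00: Relay count changed from 6225 to 6385 (+160)".
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2})-01-00-00: Relay count changed from (\d+) to (\d+)"
)


def monthly_deltas(report):
    """Return (date, after - before) for each month listed in the report."""
    return [(date, int(after) - int(before))
            for date, before, after in LINE_RE.findall(report)]


def spikes(report, threshold=150):
    """Months whose jump reaches the (admittedly arbitrary) threshold."""
    return [(date, delta) for date, delta in monthly_deltas(report)
            if delta >= threshold]
```

On the list above, the first month to pass the threshold is 2020-07-01 (+160); 2020-06-01 (+104) stays below it.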
I'll let you know once I have data on how the monthly relay spikes behaved before 2020. What I can already tell you is that some relays have been displaying this behavior for years and most of them do provide contact info, so contacting them might be a good idea. If it helps, I can prepare a list of relays that have been displaying this behavior before the spike in 2020, but I feel like contacting them should be done from an @torproject.org mail address.
Sounds good. And, yes, we can take care of contacting folks, thanks!
Well, it turns out that parsing more than 5 years of consensus data is not an overnight job but a weekend job. Here are the consensus size changes since 2015:
2015-01-01-01-00-00: Relay count changed from 6755 to 6770 (+15)
2015-02-01-01-00-00: Relay count changed from 6934 to 6939 (+5)
2015-03-01-01-00-00: Relay count changed from 6804 to 6803 (-1)
2015-04-01-01-00-00: Relay count changed from 6496 to 6516 (+20)
2015-05-01-01-00-00: Relay count changed from 6396 to 6424 (+28)
2015-06-01-01-00-00: Relay count changed from 6523 to 6520 (-3)
2015-07-01-01-00-00: Relay count changed from 6561 to 6476 (-85)
2015-08-01-01-00-00: Relay count changed from 6287 to 6300 (+13)
2015-09-01-01-00-00: Relay count changed from 6235 to 6267 (+32)
2015-10-01-01-00-00: Relay count changed from 6525 to 6523 (-2)
2015-11-01-01-00-00: Relay count changed from 6735 to 6745 (+10)
2015-12-01-01-00-00: Relay count changed from 6711 to 6749 (+38)
2016-01-01-01-00-00: Relay count changed from 6942 to 6975 (+33)
2016-03-01-01-00-00: Relay count changed from 7146 to 7154 (+8)
2016-04-01-01-00-00: Relay count changed from 7111 to 7124 (+13)
2016-05-01-01-00-00: Relay count changed from 7039 to 7038 (-1)
2016-06-01-01-00-00: Relay count changed from 6993 to 7003 (+10)
2016-07-01-01-00-00: Relay count changed from 7044 to 7062 (+18)
2016-08-01-01-00-00: Relay count changed from 6938 to 6923 (-15)
2016-09-01-01-00-00: Relay count changed from 6990 to 7004 (+14)
2016-10-01-01-00-00: Relay count changed from 7192 to 7196 (+4)
2016-11-01-01-00-00: Relay count changed from 7077 to 7093 (+16)
2016-12-01-01-00-00: Relay count changed from 7178 to 7168 (-10)
2017-01-01-01-00-00: Relay count changed from 7085 to 7096 (+11)
2017-02-01-01-00-00: Relay count changed from 7192 to 7213 (+21)
2017-03-01-01-00-00: Relay count changed from 7244 to 7260 (+16)
2017-04-01-01-00-00: Relay count changed from 7286 to 7305 (+19)
2017-05-01-01-00-00: Relay count changed from 7214 to 7251 (+37)
2017-07-01-01-00-00: Relay count changed from 7069 to 7108 (+39)
2017-08-01-01-00-00: Relay count changed from 6840 to 6883 (+43)
2017-09-01-01-00-00: Relay count changed from 6850 to 6883 (+33)
2017-10-01-01-00-00: Relay count changed from 6609 to 6659 (+50)
2017-11-01-01-00-00: Relay count changed from 6432 to 6491 (+59)
2017-12-01-01-00-00: Relay count changed from 6553 to 6596 (+43)
2018-01-01-01-00-00: Relay count changed from 6162 to 6231 (+69)
2018-02-01-01-00-00: Relay count changed from 6039 to 6099 (+60)
2018-03-01-01-00-00: Relay count changed from 5890 to 5952 (+62)
2018-04-01-01-00-00: Relay count changed from 6287 to 6344 (+57)
2018-05-01-01-00-00: Relay count changed from 6392 to 6443 (+51)
2018-06-01-01-00-00: Relay count changed from 6330 to 6375 (+45)
2018-07-01-01-00-00: Relay count changed from 6365 to 6409 (+44)
2018-08-01-01-00-00: Relay count changed from 6331 to 6359 (+28)
2018-09-01-01-00-00: Relay count changed from 6288 to 6334 (+46)
2018-10-01-01-00-00: Relay count changed from 6331 to 6369 (+38)
2018-11-01-01-00-00: Relay count changed from 6341 to 6382 (+41)
2018-12-01-01-00-00: Relay count changed from 6379 to 6397 (+18)
2019-01-01-01-00-00: Relay count changed from 6105 to 6172 (+67)
2019-02-01-01-00-00: Relay count changed from 6528 to 6556 (+28)
2019-03-01-01-00-00: Relay count changed from 6469 to 6500 (+31)
2019-04-01-01-00-00: Relay count changed from 6534 to 6570 (+36)
2019-05-01-01-00-00: Relay count changed from 6565 to 6605 (+40)
2019-06-01-01-00-00: Relay count changed from 6523 to 6537 (+14)
2019-07-01-01-00-00: Relay count changed from 6355 to 6384 (+29)
2019-08-01-01-00-00: Relay count changed from 6440 to 6454 (+14)
2019-09-01-01-00-00: Relay count changed from 6514 to 6536 (+22)
2019-10-01-01-00-00: Relay count changed from 6405 to 6440 (+35)
2019-11-01-01-00-00: Relay count changed from 6013 to 6008 (-5)
2019-12-01-01-00-00: Relay count changed from 6149 to 6155 (+6)
2020-01-01-01-00-00: Relay count changed from 6312 to 6328 (+16)
2020-02-01-01-00-00: Relay count changed from 6294 to 6337 (+43)
2020-03-01-01-00-00: Relay count changed from 6446 to 6470 (+24)
2020-05-01-01-00-00: Relay count changed from 6661 to 6694 (+33)
2020-06-01-01-00-00: Relay count changed from 6160 to 6264 (+104)
2020-07-01-01-00-00: Relay count changed from 6225 to 6385 (+160)
2020-08-01-01-00-00: Relay count changed from 6414 to 6758 (+344)
2020-09-01-01-00-00: Relay count changed from 6411 to 6803 (+392)
2020-10-01-01-00-00: Relay count changed from 6541 to 6942 (+401)
2020-11-01-01-00-00: Relay count changed from 6377 to 6795 (+418)
2020-12-01-01-00-00: Relay count changed from 6593 to 6996 (+403)
2021-01-01-01-00-00: Relay count changed from 6829 to 7175 (+346)
2021-02-01-01-00-00: Relay count changed from 6907 to 7251 (+344)
2021-03-01-01-00-00: Relay count changed from 6744 to 7086 (+342)
2021-04-01-01-00-00: Relay count changed from 6555 to 6918 (+363)
2021-05-01-01-00-00: Relay count changed from 6528 to 6775 (+247)
2021-06-01-01-00-00: Relay count changed from 6574 to 6825 (+251)
2021-07-01-01-00-00: Relay count changed from 6587 to 6830 (+243)
2021-08-01-01-00-00: Relay count changed from 6670 to 6914 (+244)
This confirms that in 2017 and 2018 there was always an increase at the start of the month. My guess is that a few relays have been exhibiting this behavior for a long time, but they were only noticed once their number increased significantly in 2020. (Note that some months are missing from this list because the consensus for 00:00 or 01:00 was missing.)
My record-holding relay is 384B51C97D98F34893B37107E74C70295D2C10E7; it has rejoined the network 60 times at the start of a month. I am not sure what the best criterion is for deciding whether a relay operator should be contacted, so here is just a short list of relays that were still active in August, provide contact information, run on Linux, and have regularly shown the monthly spike behavior (more than 3 times out of 4):
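The claim can be checked directly against the numbers listed above. A minimal sketch, with the monthly changes for 2015 to 2019 copied from that list (months missing from the archive are simply absent):

```python
# Monthly 00:00 -> 01:00 relay count changes, copied from the list above.
# Months missing from the archive (2016-02, 2017-06) are simply absent.
DELTAS = {
    2015: [15, 5, -1, 20, 28, -3, -85, 13, 32, -2, 10, 38],
    2016: [33, 8, 13, -1, 10, 18, -15, 14, 4, 16, -10],
    2017: [11, 21, 16, 19, 37, 39, 43, 33, 50, 59, 43],
    2018: [69, 60, 62, 57, 51, 45, 44, 28, 46, 38, 41, 18],
    2019: [67, 28, 31, 36, 40, 14, 29, 14, 22, 35, -5, 6],
}


def always_increased(deltas_by_year):
    """Map each year to True if every recorded month shows a net gain."""
    return {year: all(d > 0 for d in deltas)
            for year, deltas in deltas_by_year.items()}


if __name__ == "__main__":
    print(always_increased(DELTAS))
    # {2015: False, 2016: False, 2017: True, 2018: True, 2019: False}
```

2019 breaks the pattern only once, in 2019-11 (-5).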
384B51C97D98F34893B37107E74C70295D2C10E7: 60 times since 2015-01
2AB0B91CCF12664D5D95083A6A7B871918C8CF9C: 41 times since 2015-09
756C34BBDC7716A5FE838507D8B5FD1CF24F303B: 18 times since 2019-12
A37B77EB9C22F136A6E42E099997EA172C89FE42: 15 times since 2020-04
A024E5003A0C2D4748863B901F1CDFDF89284774: 15 times since 2020-05
1A0D10A4FE62D38097C0E7182638F3EAB1958E28: 14 times since 2020-06
2EEC859FB97C5BDB9518E2049C31B1C9E33123E8: 10 times since 2020-09
7355A4A20BA4C083C921464302108C121BAC7EF2: 15 times since 2020-01
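The informal selection criteria can be sketched as a filter. The `RelayStats` record is hypothetical: it assumes the parsed consensuses and Onionoo data have already been summarized per relay, that the Onionoo-style `platform` string mentions the OS, and that "3 times out of 4" is measured against the months in which a spike could have been observed:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelayStats:
    # Hypothetical per-relay summary built from consensuses and Onionoo.
    fingerprint: str
    running_in_august: bool
    contact: Optional[str]
    platform: str          # e.g. "Tor 0.4.6.7 on Linux"
    spike_months: int      # months in which the relay rejoined between 00:00 and 01:00
    observed_months: int   # months in which such a spike could have been observed


def worth_contacting(r, min_ratio=0.75):
    """Active, reachable Linux relays that spike in more than 3 of 4 months."""
    return (
        r.running_in_august
        and bool(r.contact)
        and "linux" in r.platform.lower()
        and r.observed_months > 0
        and r.spike_months / r.observed_months > min_ratio
    )
```

The strict `>` means a relay that spiked in exactly 3 of 4 months is left out, matching "more than 3 times out of 4".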
In spiking_relays_since_2015.ods you will again find my spreadsheet with the parsed information. This time I included their exit policies and, most relevantly, the months in which they spiked. That makes it easier to distinguish relays that spike occasionally over a long period of time from relays that spike reliably during a shorter period.
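One way to make that distinction from the list of spike months is the longest run of consecutive spiking months. A minimal sketch, assuming months are given as hypothetical "YYYY-MM" strings:

```python
def longest_monthly_streak(months):
    """Length of the longest run of consecutive calendar months.

    `months` holds "YYYY-MM" strings for the months in which a relay spiked.
    """
    # Map each month to a running index so December -> January counts as consecutive.
    indices = sorted({int(m[:4]) * 12 + int(m[5:7]) - 1 for m in months})
    best = current = 0
    previous = None
    for idx in indices:
        current = current + 1 if previous is not None and idx == previous + 1 else 1
        best = max(best, current)
        previous = idx
    return best
```

A relay that spikes reliably for a few months gets a long streak relative to its total count, while an occasional long-term spiker gets many short streaks. Months missing from the consensus archive (e.g. 2020-04) would break a streak here, so a real analysis might want to treat them as neutral.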
select relays_published, as_name, count(*) as c,
       sum(guard_probability*100), sum(middle_probability*100)
from long_term_relays
where running=1
  and as_name IN ('DigitalOcean, LLC', 'Hetzner Online GmbH', 'Linode, LLC', 'OVH SAS', 'Cogent Communications')
  and nickname = 'Unnamed'
  and contact is null
  and `exit`=0
  and or_port=443
  and dir_port IN (0, 8443)
  and relays_published > '2020-06-01'
group by relays_published, as_name
order by 1
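The row filter in that query amounts to a profile of the suspected cluster: unnamed, no contact, non-exit, ORPort 443, DirPort 0 or 8443, at a handful of hosters, published after 2020-06-01. A hypothetical Python restatement of just the `where` clause (not the grouping), assuming a record is a dict keyed by the query's column names and that `relays_published` is stored as a `datetime`:

```python
from datetime import datetime

SUSPECT_AS_NAMES = {
    "DigitalOcean, LLC", "Hetzner Online GmbH", "Linode, LLC",
    "OVH SAS", "Cogent Communications",
}
CUTOFF = datetime(2020, 6, 1)


def matches_cluster_profile(relay):
    """True if a long_term_relays-style record passes the query's where clause."""
    return (
        relay["running"] == 1
        and relay["as_name"] in SUSPECT_AS_NAMES
        and relay["nickname"] == "Unnamed"
        and relay["contact"] is None       # "contact is null"
        and relay["exit"] == 0
        and relay["or_port"] == 443
        and relay["dir_port"] in (0, 8443)
        and relay["relays_published"] > CUTOFF
    )
```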
FWIW: we recently kicked out a bunch of relays that are likely responsible for the spikes we see here. So expect a smaller spike on 12/01. I doubt, though, that it will be fully gone next month.
While I was right about that, I think we are done here now, given that the spike seems to be gone: