This was causing my relay to crash every two days until I figured it out. At a minimum Tor should warn about memory consumption when this option (CellStatistics) is enabled. IMO the present behavior is broken and the feature should be redesigned or eliminated.
Marking for backport, let's get this done soon, because it makes the DDoS worse.
Trac:
Parent: N/A to #24806 (moved)
Severity: Normal to Major
Priority: Medium to High
Keywords: N/A deleted; must-fix-before-033-stable, 029-backport, ddos, 031-backport added
Milestone: N/A to Tor: 0.3.3.x-final
Points: N/A to 1
Ok, my tor relay, now that it's not under valgrind, has been slowly growing in memory.
It could be actual leaks that only happen when the relay is fast enough to do something. It could be memory fragmentation. Or it could be that we're simply keeping a whole lot of extra stuff in memory.
```diff
diff --git a/src/or/rephist.c b/src/or/rephist.c
index 15fb674..bb59b6b 100644
--- a/src/or/rephist.c
+++ b/src/or/rephist.c
@@ -2390,6 +2390,8 @@ rep_hist_add_buffer_stats(double mean_num_cells_in_queue,
   if (!circuits_for_buffer_stats)
     circuits_for_buffer_stats = smartlist_new();
   smartlist_add(circuits_for_buffer_stats, stats);
+  log_info(LD_HIST, "circuits_for_buffer_stats now size %d",
+           smartlist_len(circuits_for_buffer_stats));
 }
 
 /** Remember cell statistics for circuit <b>circ</b> at time
```
Every time a circuit closes, we malloc a small stats record and append it to this smartlist. And we only do anything with the contents of the smartlist after 24 hours.
During this overload period, my relay handles many hundreds of millions of circuits in a day. Keeping anything about each one of them is going to be too much.
What do we actually use this option for? Some experiment we did in the distant past? It sounds like "recommend against enabling it" is the first step. And then either redesigning it to accomplish whatever goals remain, or stripping it out.
I am cc'ing Karsten because I bet he can help us with the 'distant past' part.
```
commit b493a2ccb97e00f4fe3acb5c59c941c2babaeebb
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Sun Jul 5 19:53:25 2009 +0200

    If configured, write cell statistics to disk periodically.
```
> And then either redesigning it to accomplish whatever goals remain, or stripping it out.
Instead of deciles, a combination of variance, skew and kurtosis could be used (à la Commons Math's AbstractStorelessUnivariateStatistic).
I'd like to take a closer look and also discuss this at the metrics team meeting on Thursday. Copying iwakeh and robgjansen who I think might have an opinion on this, too.
@arma, CellStatistics still needs to be explicitly enabled in the torrc. Did the relay that grew to 6 GB of RAM under valgrind have it enabled? If not, something else is causing massive memory usage (again...)
These stats are useful for traffic modeling. I think a fine plan is to phase out cell statistics, with the goal of replacing them in the long term with more useful privacy-preserving measurements using the new PrivCount protocol when it's ready, such as bytes-per-stream and streams-per-circuit.
Data point about RAM usage:
I've had the CellStatistics option enabled on my fast relays for a couple of years now. I've noticed that RAM usage on any of the relays would occasionally increase above 2 GiB (even when I temporarily experimented with MaxMemInQueues 512 MB), but I never linked it to CellStatistics. I've now disabled CellStatistics because RAM has recently become a bit of a problem for other reasons.
Do we still care about cell statistics?
Theoretically, cell statistics can help us understand how well our test networks model the conditions of the real Tor network, which would be useful for test networks that actually try to faithfully model the conditions of the real Tor network (like Shadow). For example, we could run a bunch of clients in the test network that are initiating stream transfers according to some understanding of client usage, and then we collect the cell stats on the relays in the test network and compare them to the stats from the public network. How close we are provides a measure of fidelity. I actually did this in the original Shadow paper.
However, circuit-level cell information by itself is not the greatest for informing how the clients in the test network should generate traffic in the first place. For that, we need a combination of stream-level and circuit-level information. From a client modeling perspective, I need to know how many streams each client should create, how much to download on each stream, how many pauses to add and how long to pause, etc. Circuits and cells are a second-level effect of the client model.
The privacy implications are also worth considering (some concerns raised here, and moved here). I think individual relay stats are much more specific than what we need; for modeling purposes, we lose very little utility by using aggregate network results, but they are much safer.