Shadow was initially designed for Tor simulation, and Tor is still the primary use case. However, Shadow is really a general-purpose network simulator: these days you can plug pretty much any unmodified Linux binary in and Shadow will simulate the network parts for you, meaning you can run any sort of distributed network. I think hosting on GitHub avoids perpetuating the misconception that it's only for simulating Tor.
(People have used Shadow for running Mixnets, distributed consensus protocols, P2P networks, Ethereum networks, etc.)
+1
FWIW, our in-progress PT (proteus) wants to configure per-bridge protocol semantics (large protocol spec objects), and HTTP CONNECT seems like a much more flexible method for accomplishing that.
Happy to see that Shadow helped to expose this bug! (A Shadow-based fuzzing tool is an interesting direction for future funding opportunities...)
FWIW, we are working on a future PT as part of Sponsor 28 (RACE) for which support for longer-length parameters would be very useful and would simplify deployment.
Another thing that we should make sure not to ignore when calibrating is the combined relay throughput: for every second in the simulation, sum the total number of bits forwarded by all relays during that second. This gives you a CDF over the Gbit/s values that we can then compare to a version of the relay bandwidth history graph on Tor metrics, to make sure that overall Tor network throughput matches too.
I can't tell from those client graphs how close the Tor network throughput is now, but moving to the "multiple Tor processes on each Shadow virtual host" approach as a way to constrain client throughput will also likely reduce the overall Tor network throughput.
Ideally, the client throughput and the Tor network throughput will sync up at the same time :)
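To make the combined-throughput computation concrete, here's a minimal sketch; the input layout (per-relay maps of simulation second to bytes forwarded) is hypothetical and just illustrates the sum-then-CDF idea:

```python
from collections import defaultdict

def combined_throughput_cdf(relay_bytes):
    """relay_bytes: {relay_id: {second: bytes_forwarded}} (hypothetical layout).
    Sums bits forwarded by all relays in each second and returns the sorted
    per-second totals in Gbit/s; plotting i/len(vals) vs vals[i] gives the CDF."""
    totals = defaultdict(int)
    for per_second in relay_bytes.values():
        for second, nbytes in per_second.items():
            totals[second] += nbytes * 8  # bytes -> bits
    return sorted(bits / 1e9 for bits in totals.values())
```

This is the distribution you would compare against the relay bandwidth history graph on Tor metrics.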
I don't think https://github.com/shadow/shadow-plugin-tor/issues/63 is relevant - we use tornettools now which does not do geoip lookups during the network generation process. Instead, servers are assigned to random cities in countries where Tor users are located.
Centralizing the list is a refactoring job from the perspective of tornettools, and I would be happy to accept that merge request.
The goal of tornettools is to produce a "standard" network but otherwise be as simple as possible. The perf clients are part of the standard network because they mimic what torperf (now onionperf) does - this way we can directly compare the perf benchmarks in shadow with the perf benchmarks in live Tor.
While there may be other client behaviors that one might want to test depending on your research problem, I think those should remain separate and external to tornettools unless/until we get better data from Tor itself to warrant including them in the standard network. Writing a script to add different client types to a config exported by tornettools would be relatively straightforward.
I guess the part where I have doubts is regarding the outliers. I see that utilization can go above 1. Does this mean nodes were "allowed" to use more bandwidth than what was advertised?
Using more bandwidth than the advertised bandwidth does not seem like a bug to me, because the advertised bandwidth is not a 100% accurate representation of the true bandwidth available to a relay. Generally, the advertised bandwidth under-estimates the true available upstream capacity.
For example, suppose a fresh new relay on a 1 Gbit/s link thinks that it only has 10 Mbit/s, because it hasn't been used that much yet to drive its bandwidth usage higher to make it realize that it can actually send >> 10 Mbit/s. It would only advertise 10 Mbit/s. But then some clients could choose that relay and initiate high volume transfers that exceed 10 Mbit/s, thereby causing your utilization calculation to go above 1.
I am confused though about why the bandwidth history is computed as the sum of read and write histories; is that how the advertised bandwidth is computed too? I was expecting max or maybe mean, but the most important thing is that we want bandwidth history and advertised bandwidth to be consistent.
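For reference, a sketch of the utilization computation being discussed (names are hypothetical, and whether summing read and write is the right convention is exactly the open question above):

```python
def utilization(read_bytes, written_bytes, advertised_bytes):
    """Relay utilization over some interval: bandwidth history (here taken
    as read + written, per the discussion above) divided by the advertised
    bandwidth over the same interval. A value > 1.0 just means the relay
    moved more traffic than it had advertised, e.g. a fresh relay on a
    fast link that hasn't yet observed its true capacity."""
    return (read_bytes + written_bytes) / advertised_bytes

# A relay advertising 10 Mbit/s (1.25 MB over one second) that actually
# reads and writes 1.875 MB each in that second has utilization 3.0.
```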
For those not up to speed (no pun intended):
Since the PeerFlow paper, we've taken a step back and designed a system that we think is more reasonable for Tor's threat model and operational environment, called FlashFlow (yes, too many "flows" being thrown around). FlashFlow is basically a system to run my flooding experiment, but distributed across a set of bandwidth authorities with well-defined (and more intelligent) algorithms for coordination, measurement of new relays, etc.
We have a research paper explaining the system and its security and performance implications, Tor proposal 316 detailing how Tor could adopt it, and @pastly implemented it specifically for Tor deployment. We had originally intended sbws to be a temporary band-aid and designed FlashFlow to be the future.
Whenever a Markov client wants to create a stream, it connects to a server chosen randomly from among all servers. We do not match individual clients with individual servers. So I think creating some "bottleneck servers" as you describe should work for your test case.
- Could you double-check my understanding above, and let me know if you have any other ideas or insights?
Your understanding matches mine. I agree with the plan to do the test by adding a new client type with a static behavior, and the existing perf client model should suffice.
- Any objection to adding a "packet loss" attribute to hosts in the shadow simulation?
Yeah, I am against this. You might recall that we discussed host-specific packet loss rates back when Steve was implementing the new network graph code in Rust and we decided that it doesn't really make sense to model that characteristic at the host level. The reason is that our hosts already do drop packets locally when buffers are full, and adding more packet loss on top of that will compose in non-intuitive ways.
The thing that you are really trying to model is a crappy upstream network link. The easiest way to do this is to simply modify the network graph, which I believe is easier/faster than modifying Shadow.
I think you're using the complete graph version of the graph, so you just need to either choose a network node where you will place your host and then modify the packet loss on all of its outgoing edges, or copy the chosen network node and then modify the packet loss on all of the copied edges.
If you're using the incomplete graph, then you should be able to just add a single network node and a single link with high packet loss, and then let Shadow do the path calculations internally. In fact, you could do this strategy on the complete graph too and then convert it to an incomplete graph.
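As a concrete sketch of the complete-graph approach, here's how one might bump the packet loss on all outgoing edges of a chosen network node using only the standard library. The GraphML below is heavily simplified (a real Shadow network graph carries more attributes, and the key naming here is an assumption):

```python
import xml.etree.ElementTree as ET

# Minimal illustrative GraphML: one key, two nodes, two directed edges.
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="packet_loss" for="edge" attr.name="packet_loss" attr.type="float"/>
  <graph edgedefault="directed">
    <node id="city0"/><node id="city1"/>
    <edge source="city0" target="city1"><data key="packet_loss">0.0</data></edge>
    <edge source="city1" target="city0"><data key="packet_loss">0.0</data></edge>
  </graph>
</graphml>"""

NS = "{http://graphml.graphdrawing.org/xmlns}"

def set_outgoing_loss(root, node_id, loss):
    """Set the packet_loss attribute on every edge leaving node_id."""
    for edge in root.iter(f"{NS}edge"):
        if edge.get("source") == node_id:
            for data in edge.findall(f"{NS}data"):
                if data.get("key") == "packet_loss":
                    data.text = str(loss)

root = ET.fromstring(GRAPHML)
set_outgoing_loss(root, "city0", 0.05)  # 5% loss on city0's outgoing edges
```

The copy-the-node variant is the same idea, except you first duplicate the chosen node and its edges and only modify the copies.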
I suppose it would be easy enough to add in options and have tornettools do this for you too, since that might be easier for server descriptors (which iirc are sorted by relay hash not by day like the consensus files).
tornettools does not impose any limitations on the modeling time period. You just give it a path to a directory containing server descriptors, and a path to a directory containing consensus files. You can put whichever server descriptors and consensus files you want in those directories.
It just so happens that these files are archived by month, so all of the examples show us using a month worth of data. If you want a shorter time period, delete the data from the days you don't want, or copy the data from the days you do want into some other directories that are then passed to tornettools.
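For consensus files specifically, a small sketch of the copy-the-days-you-want approach; the filename convention (CollecTor-style names beginning with the date, like 2021-01-01-00-00-00-consensus) is an assumption you should check against your own archive:

```python
import shutil
from pathlib import Path

def copy_days(src, dst, days):
    """Copy only the files from src whose names start with a wanted date
    prefix, e.g. days = {"2021-01-01", "2021-01-02"}. The resulting dst
    directory can then be passed to tornettools."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for f in Path(src).iterdir():
        if any(f.name.startswith(d) for d in days):
            shutil.copy2(f, dst / f.name)
```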
We should be careful to add only graphs that we think are generally useful to tornettools to decrease maintenance burden. I used to plot many more graphs than the ones I ported to tornettools, but cut it down to only the ones I think are useful to most people to avoid information overload and "graph fatigue".
That being said, I think the utilization graphs are generally useful. Although relay utilization is not a user-focused metric, it helps us understand the state and conditions of the simulation Tor network and can help us make adjustments if necessary.
take the bandwidth history from extra-info, and divide it by the advertised bandwidth from the descriptor.
We need to be careful here. A relay's advertised bandwidth often underestimates its available capacity, especially in under-loaded networks (as we demonstrated in our speed test experiments and paper on the subject). To be accurate, a relay needs to sustain peak load for at least 10 seconds, which may not happen.
bandwidth capacity: In live Tor, we can get a more accurate estimate of bandwidth capacity for a relay by taking the maximum advertised bandwidth it has published over the last N months. Ideally, the last N months will have included a period during which we ran a speed test, so that the advertised bandwidth published during a speed test gets included in the maximum calculation. This is in fact what we do when building the Tor network for Shadow in the `tornettools stage ...` step, where the maximum advertised bandwidth becomes the relay's bandwidth capacity in `shadow.config.yaml`. Please do not use the advertised bandwidth from inside a Shadow network, since that will never be more accurate than the bandwidth listed in `shadow.config.yaml`.
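The maximum-over-N-months estimate is simple to express; a sketch with a hypothetical history layout:

```python
def bandwidth_capacity(advertised_history):
    """advertised_history: iterable of (date_str, advertised_bytes_per_sec)
    samples published by one relay over the last N months. Taking the max
    means a sample published during a speed test dominates the values
    published while the relay was under-loaded."""
    return max(bw for _, bw in advertised_history)

# A relay that mostly advertised ~10 Mbit/s but hit ~100 Mbit/s during a
# speed test gets the speed-test value as its capacity estimate.
```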
bandwidth usage: Shadow has more fine-grained information available than live Tor, so I generally prefer using data from Shadow whenever possible. In the case of bandwidth usage, we already parse out the number of bytes written and read by every relay for every second during the simulation; this data is available in the `oniontrace.analysis.json.xz` file after running the `tornettools parse ...` step. This data is already used to plot the relay throughput graph.
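If you want to work with that data directly, something like the following works; note that the nested key layout used here is a guess for illustration, so inspect your own analysis file for the real schema:

```python
import json
import lzma

def load_analysis(path):
    """Decompress and parse an xz-compressed JSON analysis file."""
    with lzma.open(path, "rt") as f:
        return json.load(f)

def relay_goodput(analysis, relay):
    """Per-second bytes read + written for one relay. The "data"/
    "bytes_read"/"bytes_written" keys are hypothetical placeholders."""
    node = analysis["data"][relay]
    read = node["bytes_read"]       # {second: bytes}, assumed
    written = node["bytes_written"] # {second: bytes}, assumed
    secs = set(read) | set(written)
    return {s: read.get(s, 0) + written.get(s, 0) for s in secs}
```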
It's worth noting that, when comparing Shadow to live Tor, the time period that you are simulating in Shadow (chosen in the `tornettools stage ...` step) should be the same period for which data is taken from the live network for comparison to Shadow. For example, you don't want to set up your Shadow network based on the state of the Tor network from January 2021 but then compare Shadow utilization rates with live Tor utilization rates from December 2021. This is not a problem when comparing Shadow to itself (e.g., when trying out different CC algs).
This got long and I don't understand the epoll-based graphs well enough to know how generally useful they would be. My knee-jerk reaction is that most people won't understand these, and so they should be kept separate from tornettools. If you want better feedback, I would ask to see some example graphs so I could get a better sense of the information.
The `--tor_metrics_path` option helps you understand where your shadow network might be different from the public network. And then you have to decide whether you want to tweak the tornettools params or whether it's unimportant for your research question.
For example, one easy thing to tweak would be to increase `--load_scale` if your “Tor Relay Goodput” line is too low compared to Tor (e.g., because our Markov models are getting older and clients send more traffic now than they used to). This is probably the most important option to check and consider tweaking, because it will make sure your Shadow network is more accurately “congested”.
If your shadow network util is too low relative to Tor, then many performance problems disappear. For example, your download times for the perf clients probably won’t have long tails anymore like Tor's do.
Let's see if I can help.
The short answer: our traffic models come from Tor measurements taken in 2018, not from Tor metrics data. They include measurements of traffic going in both directions on streams.
Tor metrics data does not include the level of detail that we were able to capture in our Markov models. Yes, the Markov models are a few years old now and I would like to update them (maybe when the Speed Test experiments are done?), but they are still a much, much, much better representation of Tor traffic than the old "download a bunch of 50 KiB/1 MiB/5 MiB files" TorPerf behavior that we used to use.
Longer answer:
When running a Tor traffic generation client, the first thing it does is create a circuit; then it needs to know how many streams to create on that circuit, and then what the traffic patterns should look like on each stream. We use Markov models, which were created with real Tor traffic measurements, to make these decisions.
For the Markov modeling work in 2018, I ran some exit relays and used privacy-preserving measurement techniques (PrivCount) to measure and produce stream and packet Markov models. Later in the Never Enough paper we created a circuit Markov model.
Circuit model: we measured the average total number of circuits in Tor in a 10-minute period, and converted that static count to a simple Markov model in which circuit inter-arrival times follow an exponential distribution with mean μ/count microseconds (i.e., rate count/μ), where μ = 6·10^8 is the number of microseconds in 10 minutes. (The circuit count is scaled based on your network size and load settings in tornettools.)
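A minimal sketch of sampling circuit creation times under that model (function and parameter names are mine, not tornettools'):

```python
import random

MU = 6 * 10**8  # microseconds in 10 minutes

def next_circuit_delay_us(count, rng=random):
    """Delay until the next circuit creation, in microseconds. With an
    average of `count` circuits per 10-minute window, inter-arrival times
    are exponential with mean MU/count (rate count/MU)."""
    return rng.expovariate(count / MU)
```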
Stream model: we measured this directly in Tor using exit relays and our Markov model iterative measurement approach.
Packet model: we measured this directly in Tor using exit relays and our Markov model iterative measurement approach.
The Markov tgen clients in Shadow use the `IsolateSOCKSAuth` feature with a distinct SOCKS username/password for each "session" (tgen doesn't know about Tor, so we call it a session, but the way we use SOCKS and Tor's SOCKS options means that each session in tgen corresponds to a unique circuit in Tor).
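For context on why this works: Tor keys circuit isolation on the SOCKS credentials, which travel in an RFC 1929 username/password auth message during the SOCKS5 handshake. A sketch of building that message (tgen's actual implementation is in C and differs):

```python
def socks5_userpass_auth(username, password):
    """Build an RFC 1929 username/password auth message:
    version (0x01), ULEN, UNAME, PLEN, PASSWD. With IsolateSOCKSAuth,
    Tor places streams carrying different credentials on different
    circuits, so a fresh pair per session yields a fresh circuit."""
    u, p = username.encode(), password.encode()
    return bytes([0x01, len(u)]) + u + bytes([len(p)]) + p
```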
So that's it: each tgen client now knows when to create a new circuit, when to create a new stream on the circuit, and when to create a new packet on the stream. We can pack more and more circuits onto a single tgen process to simulate multiple users in parallel, depending on how you configure the tornettools options.
(If you want a more precise technical explanation of how traffic is being generated, the best thing to read is the "Traffic Generation" part in Section 3.2.2 in the Never Enough paper.)
Is there still a longer-term plan to transition to the FlashFlow measurement scheme? I'm a bit confused about whether this is dead, since the MR is closed. In the current state, I think we lose the code if the source branch is deleted; is that what we want if nobody currently has time to spend on it?
Chiming in to voice my support here. We currently have some shell magic to correct the permissions on the hs directory so that our Tor test cases can run correctly in testing Tor networks, but we have found it to be fragile (and it can lead to a non-intuitive situation for users).