Commit
f23fca5e
authored
2 years ago
by
Nick Mathewson
Merge branch 'testing-docs' into 'main'
New documents to checkpoint my work on
#329
and
#87
See merge request
!407
parents
6f9094e2
331da627
1 merge request
!407
New documents to checkpoint my work on #329 and #87
Showing
2 changed files
doc/testing/HowToBreak.md
+192
-0
192 additions, 0 deletions
doc/testing/HowToBreak.md
doc/testing/Profiling.md
+152
-0
152 additions, 0 deletions
doc/testing/Profiling.md
with
344 additions
and
0 deletions
doc/testing/HowToBreak.md
0 → 100644
+
192
−
0
# Simulating failures in Arti
This document explains how to simulate different kinds of bootstrapping and
network failures in Arti.
The main reason for simulating failures is to ensure that Arti's
behavior is "generally reasonable" when the network is down or
misbehaving, when the local host is set up in a confusing way, etc.
Here "generally reasonable" should mean that we aren't making a huge
number of connections to the network or wasting a huge amount of
bandwidth. Similarly, we shouldn't be using huge amounts of CPU, or
filling up the logs at level `info` or higher.
It's an extra benefit if we can ensure that our bootstrap reporting
mechanisms give us accurate feedback in these cases, and diagnose the
problem accurately.
Most of the examples here will use the `arti-testing` tool. Some will
also use a small Chutney network. In either case, you'll need an
explicit client configuration, since `arti-testing` doesn't want you to
use the default; I'll assume you've put its location in `${ARTI_CONF}`.
Note that you shouldn't _need_ to use chutney in these cases if Arti is
in fact well-behaved. However, it's courteous to do so if you think
there might be problems in Arti's behavior: you wouldn't want to flood
the real network.
I'll be assuming that you have a Linux environment.
## What to look at
The output from `arti-testing` will tell you whether bootstrapping
succeeded or failed. If bootstrapping is not expected to succeed, try
adding `--timeout ${DELAY} --expect timeout` to indicate that the
operation isn't supposed to succeed, and should eventually time out.
If bootstrapping or connecting succeeds when it shouldn't, then the test
was wrong: we were trying to make success impossible, but somehow it
succeeded anyway.
When we're done, `arti-testing` will tell us some statistics about TCP
connections and log messages. Here is an example of a not-too-bad
attempt to bootstrap over 30 seconds:
```
TCP stats: TcpCount { n_connect_attempt: 1, n_connect_ok: 1, n_accept: 0, n_bytes_send: 17223, n_bytes_recv: 59092 }
Total events: Trace: 159, Debug: 14, Info: 16, Warn: 8, Error: 0
```
And here's an example of obviously problematic behavior over a similar
period:
```
Timeout occurred [as expected]
TCP stats: TcpCount { n_connect_attempt: 1220, n_connect_ok: 1220, n_accept: 0, n_bytes_send: 1394460, n_bytes_recv: 4267636 }
Total events: Trace: 13431, Debug: 2088, Info: 2383, Warn: 15, Error: 0
```
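If you run many of these tests, it can help to pull the counters out of the statistics line instead of reading them by eye. Here is a minimal sketch, assuming bash and the line format shown in the samples above; `tcp_stat`, `check_tcp_stats`, and the threshold of 50 are illustrative choices, not part of `arti-testing`.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and the threshold are assumptions made for
# illustration. The line format is copied from the sample output above.

# tcp_stat FIELD: read log text on stdin and print the value of FIELD
# from the first "TCP stats:" line.
tcp_stat() {
    sed -n "s/^TCP stats:.*[{ ]$1: \([0-9][0-9]*\).*/\1/p" | head -n 1
}

MAX_CONNECT_ATTEMPTS=50   # arbitrary; tune for the scenario you are testing

# check_tcp_stats: read log text on stdin and return nonzero if the run
# made more connection attempts than the threshold.
check_tcp_stats() {
    local attempts
    attempts=$(tcp_stat n_connect_attempt)
    if [ -z "$attempts" ]; then
        echo "no TCP stats line found" >&2
        return 2
    fi
    if [ "$attempts" -gt "$MAX_CONNECT_ATTEMPTS" ]; then
        echo "too many connection attempts: $attempts" >&2
        return 1
    fi
    echo "connection attempts OK: $attempts"
}
```

You could then pipe a run through it, for example `arti-testing bootstrap -c ${ARTI_CONF} --timeout 30 2>&1 | check_tcp_stats`. The `2>&1` is there because I am not assuming which stream the statistics are written to.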
## Failures related to time
These require the `faketime` tool.
### System clock set wrong, no directory cached
Start with an empty cache. Optionally, start with an empty state file.
Then run:
`faketime "${WHEN}" arti-testing bootstrap -c ${ARTI_CONF} --timeout 30`
Try this with different values of `WHEN`:
* '4 hours ago'
* '1 day ago'
* '1 month ago'
* '1 day'
* '1 month'
* '1 year'
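The values above can be driven from a loop. This is a minimal sketch, assuming bash; `run`, `DRY_RUN`, and `clock_skew_tests` are hypothetical helper names, and the default `ARTI_CONF` path is a placeholder. By default it only prints the commands it would run. The `$when` argument is quoted because the values contain spaces.

```shell
#!/usr/bin/env bash
# Sketch only: run(), DRY_RUN and clock_skew_tests() are illustrative
# helpers, not part of arti-testing or faketime.

ARTI_CONF=${ARTI_CONF:-/path/to/arti.toml}   # placeholder path
DRY_RUN=${DRY_RUN:-1}                        # set to 0 to really run

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+'; printf " '%s'" "$@"; printf '\n'
    else
        "$@"
    fi
}

clock_skew_tests() {
    local when
    for when in '4 hours ago' '1 day ago' '1 month ago' \
                '1 day' '1 month' '1 year'; do
        # Empty the cache here before each run. Where the cache lives
        # depends on your configuration, so that step is not shown.
        run faketime "$when" arti-testing bootstrap -c "$ARTI_CONF" --timeout 30
    done
}

clock_skew_tests
```

Run it with `DRY_RUN=0` once the printed commands look right.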
### System clock set wrong, live directory cached
Start with an empty cache. Optionally, start with an empty state file.
Then run:
`arti-testing bootstrap -c ${ARTI_CONF}`
This should succeed. Now run:
```
faketime "${WHEN}" arti-testing connect -c ${ARTI_CONF} \
    --target www.torproject.org:80 \
    --timeout 30 --retry 0
```
Try this with different values of `WHEN` as above. This simulates a
case where we previously bootstrapped with a reasonably live directory,
but we wound up with a wrong clock when we restarted.
### System clock set wrong, obsolete directory cached
You can simulate this with a directory that you made before, then
copied into your cache directory. Use `faketime` to set the current
time to a point at which the directory was valid, or recently valid.
Note that this test won't work well with chutney, since chutney
directory lifetimes are very short.
TODO: Describe better ways to do this.
## Failures related to the network
The `arti-testing` tool can simulate multiple kinds of errors:
* connections fail immediately (or after a little while)
  (`--tcp-failure error --tcp-failure-delay 1`)
* connections time out and never succeed (`--tcp-failure timeout`)
* connections succeed, but drop all data and say
  nothing (`--tcp-failure blackhole`)
You can arrange for these failures to start in the bootstrap phase
(`--tcp-failure-stage bootstrap`) or in the connect stage
(`--tcp-failure-stage connect`).
With these options, you can simulate different kinds of failures by
starting with an empty directory cache (and optionally empty state).
The bootstrap phase failures correspond to failures on your fallback
directories; the connect-phase failures correspond to failures on the
live network.
(TODO: There's an issue here where if you have open connections to the
fallbacks, the TCP-failure code won't yet make them start failing when
you connect to the network. As a workaround, bootstrap in a separate
`arti-testing` call, then connect with TCP failures enabled.)
Here's an example of failing during bootstrapping. (Clear your cache
first.)
`arti-testing bootstrap -c ${ARTI_CONF} --timeout 30 --tcp-failure error`
Here's an example of failing after bootstrapping. (Clear your cache
before the first command.)
```
# This one should succeed
arti-testing bootstrap -c ${ARTI_CONF}
# This will fail.
arti-testing connect -c ${ARTI_CONF} \
--target www.torproject.org:80 \
--timeout 30 --retry 0 \
--tcp-failure blackhole
```
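To cover all three failure kinds during bootstrapping in one go, a loop like the following may be handy. It is a sketch under the same assumptions as the examples above: bash, a placeholder `ARTI_CONF` default, and hypothetical helpers `run`, `DRY_RUN`, and `bootstrap_failure_tests`. The `arti-testing` options are the ones listed earlier in this section. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: run(), DRY_RUN and bootstrap_failure_tests() are
# illustrative helpers, not part of arti-testing.

ARTI_CONF=${ARTI_CONF:-/path/to/arti.toml}   # placeholder path
DRY_RUN=${DRY_RUN:-1}                        # set to 0 to really run

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+'; printf " '%s'" "$@"; printf '\n'
    else
        "$@"
    fi
}

bootstrap_failure_tests() {
    local kind
    for kind in error timeout blackhole; do
        # Clear the cache here before each run (location depends on
        # your configuration, so that step is not shown).
        run arti-testing bootstrap -c "$ARTI_CONF" --timeout 30 \
            --tcp-failure "$kind" --tcp-failure-stage bootstrap
    done
}

bootstrap_failure_tests
```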
## Partial network blocking
You can make the above network failures conditional, to simulate
different kinds of broken local networks. Try `--tcp-failure-on v4` to
simulate a network where IPv4 connections fail (that is, an IPv6-only
network), or `--tcp-failure-on non443` to simulate a network that
blocks everything but HTTPS.
(These won't work with chutney networks, since a typical chutney
network's relays are all on IPv4 with high ports.)
## Network identity mismatch
One way to get an interesting set of failures is to mix-and-match the
`arti.toml` files from two different chutney networks. You can find older
chutney networks in subdirectories of `${CHUTNEY_PATH}/net/` other than
`nodes`.
If you use an older set of fallback directories, you'll simulate the
case where the client can't actually connect to any fallback
directories because its beliefs about their identities are all wrong.
If you keep the running set of fallback directories, but use the older
set of authorities, you'll simulate the case where the client fetches a
directory, but doesn't believe in any authorities that signed it.
(For both of these cases, start with an empty cache and use the
`arti-testing bootstrap` command.)
# TODO
arti-testing:
- Ability to clear cache and/or state.
- Fresh client for connecting.
- Ability to close after a little while.
- Directory munger.
doc/testing/Profiling.md
0 → 100644
+
152
−
0
# Arti profiling methodology
This document describes basic tools for profiling Arti's CPU and memory
usage. Not all of these tools will make sense for every situation, and
we may want to switch them in the future. The main reason for recording
them here is so that we don't have to re-learn how to use them the next
time we need to do a big round of profiling tests.
## Building for profiling
When you're testing with `cargo build --release`, use
`CARGO_PROFILE_RELEASE_DEBUG=true` to include extra debugging
information for better output.
## Profiling tools
Here I'll talk about a few tools for measuring CPU usage, memory usage,
and the like. For now, I'll assume you're on a reasonably modern Linux
environment: if you aren't, you'll have to do some stuff differently.
I'll talk about particular scenarios to profile in the next major
section.
### cargo flamegraph
[cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) is a
pretty quick-and-easy event profiling visualization tool. It produces
nice SVG flamegraphs in a variety of pretty colors. As with all
flamegraphs, these are better for visualization than detailed
drill-down. On Linux, `cargo-flamegraph` uses
[`perf`](https://perf.wiki.kernel.org/index.php/Main_Page) under the
hood.
To install, make sure you have a working version of `perf` installed.
Then run `cargo install flamegraph`.
Basic usage:
```
flamegraph {command}
```
Output: `flamegraph.svg`
Also consider using the `--reverse` flag, to reverse the stack and see the
lowest-level functions that get the most use.
### tcmalloc and pprof
This can generate usage graphs showing who allocated your memory when.
(It can get a bit confusing in Rust.)
```
HEAPPROFILE=/tmp/heap.hprof \
LD_PRELOAD=/usr/lib64/libtcmalloc_and_profiler.so \
{command}
```
```
pprof --pdf --inuse_space {binary} /tmp/heap.hprof > heap.pdf
```
You might need a longer timeout with this one; it's nontrivial.
### valgrind --massif
This tool can also generate usage graphs like pprof above.
`valgrind --tool=massif {command}`
It will generate a file called `massif.out.PID`. You can view it with the
`ms_print` tool (included with valgrind) or the `massif-visualizer` tool
(installed separately, highly recommended).
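Because every run writes a new `massif.out.PID`, a directory tends to fill up with several of them. The following is a small sketch, assuming bash and file names without newlines, that picks the most recently modified one; `latest_massif_out` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: latest_massif_out() is an illustrative helper.

# latest_massif_out [DIR]: print the most recently modified massif.out.*
# file in DIR (default: current directory), or nothing if there is none.
latest_massif_out() {
    local dir=${1:-.}
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/massif.out.* 2>/dev/null | head -n 1
}
```

For example: `ms_print "$(latest_massif_out)" | less`.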
## Some commands to profile
These should generally run against a chutney network whenever possible;
the `ARTI_CONF` envvar should be set to
e.g. `$(pwd)/chutney/net/nodes/arti.toml`.
### Bootstrapping a directory
`arti-testing bootstrap -c ${ARTI_CONF}`
(This test bootstraps only. It might make sense to do this one on the
real network, since its data is more complex. You need to start with an
empty set of state files for this to test bootstrapping instead of
loading.)
### Large number of circuits, focusing on circuit construction
Bootstrap outside of benchmarking, then run:
`arti-bench -u 1 -d 1 -s 100 -C 20 -p 1 -c ${ARTI_CONF}`
(100 samples, 20 circuits per sample, 1 stream per circuit, only 1 byte
to upload or download.)
Note that this test won't necessarily tell you so much about _path
construction_, since path construction on a large real network with
different weights, policies, and families is more complex than on a
chutney network.
(This just times out with chutney; the directory changes too fast, I think.)
### Running offline
Also
* Bootstrapping failure conditional
* Going offline
* Primary guards go down after bootstrap

(See `HowToBreak.md`)
### Data transfer
`arti-bench -s 20 -C 1 -p 1 {...}`
(No parallelism, 10 MB up and down.)
### Data transfer with many circuits
`arti-bench -s 1 -C 64 -p 1 -c ${ARTI_CONF}`
(Circuit parallelism only, 10 MB up and down.)
### Data transfer with many streams
`arti-bench -s 1 -C 1 -p 64 -c ${ARTI_CONF}`
(Stream parallelism only, 10 MB up and down.)
### Huge number of simultaneous connection attempts
`arti-bench -s 1 -C 16 -p 16 -c ${ARTI_CONF}`
(stream and circuit parallelism)
# TODO
arti-bench:
- take a target address as a string.
- Allow -p 0 to build a circuit only?
- Some way to build a path only?

Extract chutney boilerplate.

arti-testing:
- ability to make connections aggressively simultaneous