Commit 76517ac6 authored by Karsten Loesing

Merge branch 'acute-task-33260-4' into develop

Parents: c8275b25 f063b636
# Changes in version 0.8 - 2020-09-16

 - Add a new `onionperf filter` mode that takes an OnionPerf analysis
   results file or directory as input, applies filters, and produces
   new OnionPerf analysis results file(s) as output. Bump the analysis
   version number to 4.0 to include additional filter metadata defined
   in a `filters` field and an optional `filtered_out` field per Tor
   circuit. Implements #33260.
# Changes in version 0.7 - 2020-09-01

 - Add `onionperf measure --drop-guards` parameter to use and drop
@@ -16,6 +16,7 @@
+ [Troubleshooting](#troubleshooting)
* [Analysis](#analysis)
+ [Analyzing measurement results](#analyzing-measurement-results)
+ [Filtering measurement results](#filtering-measurement-results)
+ [Visualizing measurement results](#visualizing-measurement-results)
+ [Interpreting the PDF output format](#interpreting-the-pdf-output-format)
+ [Interpreting the CSV output format](#interpreting-the-csv-output-format)
@@ -252,6 +253,26 @@ OnionPerf's `analyze` mode has several command-line parameters for customizing t
onionperf analyze --help
```
### Filtering measurement results
The `filter` subcommand can be used to filter out measurement results based on given criteria. This subcommand is typically used in combination with the `visualize` subcommand. The workflow is to apply one or more filters and then visualize only those measurements with an existing mapping between TGen transfers/streams and Tor streams/circuits.
Currently, OnionPerf measurement results can be filtered based on Tor relay fingerprints found in Tor circuits, although support for filtering based on Tor streams and/or TGen transfers/streams may be added in the future.
The `filter` mode takes a list of fingerprints and one or more existing analysis files as inputs and outputs new analysis files with the same contents as the input analysis files plus annotations on those Tor circuits that have been filtered out. If a directory of analysis files is given to `-i`, the structure and filenames of that directory are preserved under the path specified with `-o`.
For example, the analysis file produced above can be filtered with the following command, which retains only those Tor circuits with fingerprints contained in the file `fingerprints.txt`:
```shell
onionperf filter -i onionperf.analysis.json.xz -o filtered.onionperf.analysis.json.xz --include-fingerprints fingerprints.txt
```
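The fingerprints file contains one relay fingerprint per line: 40 hexadecimal characters, optionally prefixed with `$` as in consensus documents, matched case-insensitively. A minimal sketch of that parsing rule, using the same regular expression as the filtering code (the fingerprint values below are arbitrary examples):

```python
import re

# One fingerprint per line: an optional "$" prefix followed by 40 hex
# characters; matches are normalized to upper case, other lines are ignored.
fingerprint_pattern = re.compile(r"\$?([0-9a-fA-F]{40})")

lines = [
    "$9695DFC35FFEB861329B9F1AB04C46397020CE31",  # "$"-prefixed form
    "847b1f850344d7876491a54892f904934e4eb85d",   # bare lower-case form
    "not a fingerprint",                          # ignored
]

fingerprints = []
for line in lines:
    match = fingerprint_pattern.match(line)
    if match:
        fingerprints.append(match.group(1).upper())
```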
OnionPerf's `filter` command usage can be inspected with:
```shell
onionperf filter --help
```
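The annotations written by `filter` can be pictured with a small sketch of the resulting analysis structure. The `filters` block and the per-circuit `filtered_out` flag are the fields introduced in this version; the node name, circuit ids, and file path below are hypothetical examples:

```python
# Sketch of a version-4.0 analysis file after filtering (illustrative
# data only; "op-example", the circuit ids, and the filepath are made up).
analysis = {
    "type": "onionperf",
    "version": "4.0",
    "filters": {
        "tor/circuits": [
            {"name": "include_fingerprints", "filepath": "fingerprints.txt"},
        ],
    },
    "data": {
        "op-example": {
            "tor": {
                "circuits": {
                    "42": {"filtered_out": True},  # annotated, not visualized
                    "43": {},                      # kept
                },
            },
        },
    },
}

# Filtered-out circuits remain in the file; downstream steps simply skip them.
kept = [cid for cid, circuit in analysis["data"]["op-example"]["tor"]["circuits"].items()
        if "filtered_out" not in circuit]
```

Note that filtering annotates rather than deletes, so the output file still contains every circuit from the input.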
### Visualizing measurement results
Step two in the analysis is to process analysis files with OnionPerf's `visualize` mode, which produces CSV and PDF files as output.
@@ -267,6 +288,8 @@ As a result, two files are written to the current working directory:
- `onionperf.viz.$datetime.csv` contains visualized data in a CSV file format; and
- `onionperf.viz.$datetime.pdf` contains visualizations in a PDF file format.
For analysis files containing Tor circuit filters, only measurements with an existing mapping between TGen transfers/streams and Tor streams/circuits that have not been marked as `filtered_out` are visualized.
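This selection rule can be sketched as a small predicate (a simplified restatement of the behavior described above, not the project's code verbatim; the function name is hypothetical):

```python
# A measurement is visualized only if either no tor/circuits filters were
# applied to the analysis, or its mapped Tor circuit exists and has not
# been marked filtered_out.
def should_visualize(analysis_json, tor_circuit):
    filters = analysis_json.get("filters", {}).get("tor/circuits")
    if not filters:
        return True       # unfiltered analysis: keep every measurement
    if tor_circuit is None:
        return False      # filtered analysis: a circuit mapping is required
    return "filtered_out" not in tor_circuit

# Example analysis metadata with one circuit filter applied.
analysis_json = {"filters": {"tor/circuits": [{"name": "include_fingerprints"}]}}
```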
Similar to the other modes, OnionPerf's `visualize` mode has command-line parameters for customizing the visualization step:
```shell
@@ -24,7 +24,7 @@ class OPAnalysis(Analysis):
def __init__(self, nickname=None, ip_address=None):
super().__init__(nickname, ip_address)
-self.json_db = {'type': 'onionperf', 'version': '3.0', 'data': {}}
+self.json_db = {'type': 'onionperf', 'version': '4.0', 'data': {}}
self.torctl_filepaths = []
def add_torctl_file(self, filepath):
@@ -62,8 +62,7 @@ class OPAnalysis(Analysis):
self.json_db['data'][self.nickname]["tgen"].pop("stream_summary")
self.did_analysis = True
-def save(self, filename=None, output_prefix=os.getcwd(), do_compress=True, date_prefix=None):
+def save(self, filename=None, output_prefix=os.getcwd(), do_compress=True, date_prefix=None, sort_keys=True):
if filename is None:
base_filename = "onionperf.analysis.json.xz"
if date_prefix is not None:
@@ -80,7 +79,7 @@ class OPAnalysis(Analysis):
logging.info("saving analysis results to {0}".format(filepath))
outf = util.FileWritable(filepath, do_compress=do_compress)
-json.dump(self.json_db, outf, sort_keys=True, separators=(',', ': '), indent=2)
+json.dump(self.json_db, outf, sort_keys=sort_keys, separators=(',', ': '), indent=2)
outf.close()
logging.info("done!")
@@ -98,6 +97,15 @@ class OPAnalysis(Analysis):
except:
return None
def get_tor_circuits(self, node):
try:
return self.json_db['data'][node]['tor']['circuits']
except:
return None
def set_tor_circuits(self, node, tor_circuits):
self.json_db['data'][node]['tor']['circuits'] = tor_circuits
def get_tor_streams(self, node):
try:
return self.json_db['data'][node]['tor']['streams']
@@ -125,7 +133,7 @@ class OPAnalysis(Analysis):
if 'type' not in db or 'version' not in db:
logging.warning("'type' or 'version' not present in database")
return None
-elif db['type'] != 'onionperf' or str(db['version']) >= '4.':
+elif db['type'] != 'onionperf' or str(db['version']) >= '5.':
logging.warning("type or version not supported (type={0}, version={1})".format(db['type'], db['version']))
return None
else:
'''
OnionPerf
Authored by Rob Jansen, 2015
Copyright 2015-2020 The Tor Project
See LICENSE for licensing information
'''
import re
from onionperf.analysis import OPAnalysis
class Filtering(object):
def __init__(self):
self.fingerprints_to_include = None
self.fingerprints_to_exclude = None
self.fingerprint_pattern = re.compile(r"\$?([0-9a-fA-F]{40})")
def include_fingerprints(self, path):
self.fingerprints_to_include = []
self.fingerprints_to_include_path = path
with open(path, 'rt') as f:
for line in f:
fingerprint_match = self.fingerprint_pattern.match(line)
if fingerprint_match:
fingerprint = fingerprint_match.group(1).upper()
self.fingerprints_to_include.append(fingerprint)
def exclude_fingerprints(self, path):
self.fingerprints_to_exclude = []
self.fingerprints_to_exclude_path = path
with open(path, 'rt') as f:
for line in f:
fingerprint_match = self.fingerprint_pattern.match(line)
if fingerprint_match:
fingerprint = fingerprint_match.group(1).upper()
self.fingerprints_to_exclude.append(fingerprint)
def filter_tor_circuits(self, analysis):
if self.fingerprints_to_include is None and self.fingerprints_to_exclude is None:
return
filters = analysis.json_db.setdefault("filters", {})
tor_circuits_filters = filters.setdefault("tor/circuits", [])
if self.fingerprints_to_include:
tor_circuits_filters.append({"name": "include_fingerprints", "filepath": self.fingerprints_to_include_path })
if self.fingerprints_to_exclude:
tor_circuits_filters.append({"name": "exclude_fingerprints", "filepath": self.fingerprints_to_exclude_path })
for source in analysis.get_nodes():
tor_circuits = analysis.get_tor_circuits(source)
filtered_circuit_ids = []
for circuit_id, tor_circuit in tor_circuits.items():
keep = False
if "path" in tor_circuit:
path = tor_circuit["path"]
keep = True
for long_name, _ in path:
fingerprint_match = self.fingerprint_pattern.match(long_name)
if fingerprint_match:
fingerprint = fingerprint_match.group(1).upper()
if self.fingerprints_to_include is not None and fingerprint not in self.fingerprints_to_include:
keep = False
break
if self.fingerprints_to_exclude is not None and fingerprint in self.fingerprints_to_exclude:
keep = False
break
if not keep:
tor_circuits[circuit_id]["filtered_out"] = True
tor_circuits[circuit_id] = dict(sorted(tor_circuit.items()))
def apply_filters(self, input_path, output_dir, output_file):
analysis = OPAnalysis.load(filename=input_path)
self.filter_tor_circuits(analysis)
analysis.json_db["version"] = '4.0'
analysis.json_db = dict(sorted(analysis.json_db.items()))
analysis.save(filename=output_file, output_prefix=output_dir, sort_keys=False)
@@ -74,6 +74,23 @@ Stats files in the default Torperf format can also be exported.
HELP_ANALYZE = """
Analyze Tor and TGen output
"""
DESC_FILTER = """
Takes an OnionPerf analysis results file or directory as input, applies filters,
and produces new OnionPerf analysis results file(s) as output.
The `filter` subcommand is typically used in combination with the `visualize`
subcommand. The workflow is to filter out any TGen transfers/streams or Tor
streams/circuits that are not supposed to be visualized and then visualize only
those measurements with an existing mapping between TGen transfers/streams and
Tor streams/circuits.
This subcommand only filters individual objects and leaves summaries unchanged.
"""
HELP_FILTER = """
Filter OnionPerf analysis results
"""
DESC_VISUALIZE = """
Loads an OnionPerf json file, e.g., one produced with the `analyze` subcommand,
and plots various interesting performance metrics to PDF files.
@@ -305,6 +322,37 @@ files generated by this script will be written""",
action="store", dest="date_prefix",
default=None)
# filter
filter_parser = sub_parser.add_parser('filter', description=DESC_FILTER, help=HELP_FILTER,
formatter_class=my_formatter_class)
filter_parser.set_defaults(func=filter, formatter_class=my_formatter_class)
filter_parser.add_argument('-i', '--input',
help="""a file or directory PATH from which OnionPerf analysis results
files are read""",
metavar="PATH", required=True,
action="store", dest="input")
filter_parser.add_argument('--include-fingerprints',
help="""include only Tor circuits with known circuit path and with all
relays being contained in the fingerprints file located at
PATH""",
metavar="PATH", action="store", dest="include_fingerprints",
default=None)
filter_parser.add_argument('--exclude-fingerprints',
help="""exclude Tor circuits without known circuit path or with any
relay being contained in the fingerprints file located at
PATH""",
metavar="PATH", action="store", dest="exclude_fingerprints",
default=None)
filter_parser.add_argument('-o', '--output',
help="""a file or directory PATH where filtered output OnionPerf
analysis results files are written""",
metavar="PATH", required=True,
action="store", dest="output")
# visualize
visualize_parser = sub_parser.add_parser('visualize', description=DESC_VISUALIZE, help=HELP_VISUALIZE,
formatter_class=my_formatter_class)
@@ -434,6 +482,31 @@ def analyze(args):
else:
logging.error("Given paths were an unrecognized mix of file and directory paths, nothing will be analyzed")
def filter(args):
from onionperf.filtering import Filtering
input_path = os.path.abspath(os.path.expanduser(args.input))
if not os.path.exists(input_path):
raise argparse.ArgumentTypeError("input path '%s' does not exist" % args.input)
output_path = os.path.abspath(os.path.expanduser(args.output))
if os.path.exists(output_path):
raise argparse.ArgumentTypeError("output path '%s' already exists" % args.output)
filtering = Filtering()
if args.include_fingerprints is not None:
filtering.include_fingerprints(args.include_fingerprints)
if args.exclude_fingerprints is not None:
filtering.exclude_fingerprints(args.exclude_fingerprints)
if os.path.isfile(input_path):
output_dir, output_file = os.path.split(output_path)
filtering.apply_filters(input_path=input_path, output_dir=output_dir, output_file=output_file)
else:
from onionperf import reprocessing
analyses = reprocessing.collect_logs(input_path, '*onionperf.analysis.*')
for analysis in analyses:
full_output_path = os.path.join(output_path, os.path.relpath(analysis, input_path))
output_dir, output_file = os.path.split(full_output_path)
filtering.apply_filters(input_path=analysis, output_dir=output_dir, output_file=output_file)
def visualize(args):
from onionperf.visualization import TGenVisualization
from onionperf.analysis import OPAnalysis
@@ -62,6 +62,7 @@ class TGenVisualization(Visualization):
if "source" in tor_stream and ":" in tor_stream["source"]:
source_port = tor_stream["source"].split(":")[1]
tor_streams_by_source_port.setdefault(source_port, []).append(tor_stream)
tor_circuits = analysis.get_tor_circuits(client)
tgen_streams = analysis.get_tgen_streams(client)
tgen_transfers = analysis.get_tgen_transfers(client)
while tgen_streams or tgen_transfers:
@@ -122,20 +123,34 @@ class TGenVisualization(Visualization):
unix_ts_end = transfer_data["unix_ts_end"]
if "unix_ts_start" in transfer_data:
stream["start"] = datetime.datetime.utcfromtimestamp(transfer_data["unix_ts_start"])
tor_stream = None
tor_circuit = None
if source_port and unix_ts_end:
for s in tor_streams_by_source_port[source_port]:
if abs(unix_ts_end - s["unix_ts_end"]) < 150.0:
tor_stream = s
break
if tor_stream and "circuit_id" in tor_stream:
circuit_id = tor_stream["circuit_id"]
if str(circuit_id) in tor_circuits:
tor_circuit = tor_circuits[circuit_id]
if error_code:
if error_code == "PROXY":
error_code_parts = ["TOR"]
else:
error_code_parts = ["TGEN", error_code]
-if source_port and unix_ts_end:
-    for tor_stream in tor_streams_by_source_port[source_port]:
-        if abs(unix_ts_end - tor_stream["unix_ts_end"]) < 150.0:
-            if "failure_reason_local" in tor_stream:
-                error_code_parts.append(tor_stream["failure_reason_local"])
-            if "failure_reason_remote" in tor_stream:
-                error_code_parts.append(tor_stream["failure_reason_remote"])
+if tor_stream:
+    if "failure_reason_local" in tor_stream:
+        error_code_parts.append(tor_stream["failure_reason_local"])
+    if "failure_reason_remote" in tor_stream:
+        error_code_parts.append(tor_stream["failure_reason_remote"])
stream["error_code"] = "/".join(error_code_parts)
-streams.append(stream)
+if "filters" in analysis.json_db.keys() and analysis.json_db["filters"]["tor/circuits"]:
+    if tor_circuit and "filtered_out" not in tor_circuit.keys():
+        streams.append(stream)
+else:
+    streams.append(stream)
self.data = pd.DataFrame.from_records(streams, index="id")
def __plot_firstbyte_ecdf(self):