Commit 479ca34d authored by Karsten Loesing

Add report on cells in circuit queues.

parent 7ca14119
+30 −2
@@ -306,7 +306,7 @@ $ java -cp bin/:lib/* org.torproject.metrics.entry.ParseEntryStats


6  Exit port statistics
======================
=======================

Put the exit-stats file coming from one exit node containing one or more
days of measurements with possibly different exit policies in a directory
@@ -330,3 +330,31 @@ The four parameters are:
3. a comma-separated list of identities of the measuring relay, and
4. the output directory.


7  Cell statistics
==================

Put the buffer-stats files in the directory data/buffer/, renaming them to
the relay names that should appear in the output graphs.

$ mkdir data/
$ mkdir data/buffer/

Compile the evaluation application:

$ javac -d bin/ -cp src/:lib/* src/org/torproject/metrics/buffer/*.java

Run the evaluation application:

$ java -cp bin/:lib/* org.torproject.metrics.buffer.EvaluateCellStats
  data/buffer/ out/buffer/

Run the R script:

$ R --no-save -q < scripts/buffer/cellstats.R

Compile the PDF using pdflatex:

$ cd report/buffer/
$ pdflatex torperf.tex       # Run twice to update references
+78.5 KiB

File added.


+196 −0
\documentclass{article}
\usepackage{graphicx}
\usepackage{graphics}
\usepackage{color}
\usepackage{booktabs}
\usepackage{multirow}
\begin{document}
\title{Analysis of Circuit Queues in Tor}
\author{Karsten Loesing}
\maketitle

\section{Background on queues and buffers in Tor}

Whenever a relay receives a cell, this cell is appended to the outbound
queue of the corresponding circuit.
As soon as there is room in the outgoing buffer of the connection to the
next relay in the circuit, the cell is removed from the queue and the cell
body is written to that buffer.
The bytes reside in the outgoing connection buffers until they can be sent
over the TLS socket to the previous or next relay in the client's circuit.
Multiple circuits can share the same connection, so that the cells of
multiple queues can be waiting to be written to a single outgoing buffer.

This analysis focuses on circuits and the cells that are kept in their
queues, rather than on connections to other relays and bytes waiting in
their buffers.
Possible changes to circuit queues include shortening the queues to reduce
cell latency and changing how cells that compete for the same connection
are scheduled.
Future analyses might also focus more on connections.

The graphs in this report are based on statistics gathered by a few relays
starting on July 20, 2009.
Table~\ref{tab:relays} lists the relays including their bandwidth
configurations and exit policies.

\begin{table}
\centering
\caption{Relays measuring cell statistics}
\label{tab:relays}
\vspace{0.5cm}
\begin{tabular}{lccccl}
\toprule
 & \multicolumn{3}{c}{Bandwidth (KiB/s)} & & \\
\cmidrule(lr){2-4}
Nickname & Rate & Burst & MaxAdv & Exit & Operator\\
\midrule
echelon1 & 400 & 1024 & -- & no & Karsten Loesing\\
echelon2 & 400 & 1024 & -- & no & Karsten Loesing\\
ephemer2 & 90 & -- & -- & no & Steven J. Murdoch\\
gabelmoo & 1024 & 1250 & 500 & no & Karsten Loesing\\
hamsterrad & 100 & 500 & -- & no & Karsten Loesing\\
moria1 & -- & -- & 10 & no &  Roger Dingledine\\
nottheNSA & 100 & 200 & -- & yes & Andrew Lewman\\
TorTeamHelp & 750 & 1500 & -- & yes & Jonathan Rippstein\\
vallenator & 2100 & 4000 & -- & no & Hans Schnehl\\
\bottomrule
\end{tabular}
\end{table}

\section{Format of measured data}

The data underlying the graphs in this report has the following format:

\begin{verbatim}
written 2009-07-25 00:24:27 (86400 s)
processed-cells 5978,223,56,22,9,5,5,4,3,1
queued-cells 5.40,0.60,0.03,0.01,0.00,0.00,0.00,0.00,0.00,0.01
time-in-queue 1919,1838,127,116,124,99,47,58,111,447
number-of-circuits-per-share 15351
\end{verbatim}

Every 24 hours, these five lines are appended to a local file. The
\texttt{written} line contains the end time of the measurement interval as
well as the interval length in seconds.
The \texttt{processed-cells} line contains the mean number of cells
processed per circuit for each decile of circuits, in descending order from
the loudest to the quietest circuits.
The \texttt{queued-cells} line denotes the mean number of cells contained
in queues of the circuit deciles.
The \texttt{time-in-queue} line contains the mean time that cells spend in
circuit queues in milliseconds.
Finally, \texttt{number-of-circuits-per-share} states how many circuits are
included in every decile.

\section{Total number of circuits}

The first quantity that can be visualized is the number of circuits, which
can be derived from the \texttt{number-of-circuits-per-share} line.
Figure~\ref{fig:totalnumbers} shows the number of circuits over time
periods of 24 hours.
The graph shows that the number of circuits varies from relay to relay
depending on their bandwidth and exit policy.
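
As a worked example, and assuming that each of the ten deciles contains
exactly the number of circuits stated in the
\texttt{number-of-circuits-per-share} line, the sample data shown above
corresponds to a total of
\[
10 \times 15351 = 153510
\]
circuits in that 24-hour interval.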

\begin{figure}
\centering
\includegraphics[width=\textwidth]{total-cells}
\caption{Total number of circuits per day}
\label{fig:totalnumbers}
\end{figure}

\section{Classification of circuits by processed cells}

The total number of circuits does not reflect how loud or quiet these
circuits are.
Therefore, circuits are classified by the number of cells that they have
processed as written in the \texttt{processed-cells} line.
Figure~\ref{fig:processed} shows the classification result:
the loudest 10\% of all circuits have processed more than 5,000 cells on
average, the quietest 10\% only a single cell.
The monotonic decrease in this graph follows directly from the definition:
deciles are ordered from the circuits that processed the most cells to
those that processed the fewest.
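
To illustrate how skewed this distribution can be, consider the sample
\texttt{processed-cells} line shown earlier.
Assuming that the ten values are per-circuit means and that all deciles
contain the same number of circuits, the loudest decile accounts for a
share of
\[
\frac{5978}{5978 + 223 + 56 + 22 + 9 + 5 + 5 + 4 + 3 + 1}
  = \frac{5978}{6306} \approx 0.948
\]
of all processed cells, that is, roughly 95\% in this one sample.
This value covers a single measurement interval only and is not
necessarily representative of other relays or days.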

\begin{figure}
\centering
\includegraphics[width=\textwidth]{processed-cells}
\caption{Mean number of processed cells per circuit}
\label{fig:processed}
\end{figure}

Two lines in the graph stand out.
The loudest 10\% of the circuits on \texttt{TorTeamHelp} and
\texttt{moria1} have processed far fewer cells than the loudest circuits on
all other relays.
Possible explanations are that \texttt{TorTeamHelp} is running a patch that
reduces the circuit window size from 1000 cells to 100 cells and that
\texttt{moria1} is a directory authority advertising a bandwidth of only
10~KiB/s.

\section{Queued cells by circuit loudness}

Even though it is interesting to learn about the classification of circuits
by their loudness, there is not much to be changed or improved.
Whether or not a circuit is loud depends only on the usage behavior.
It is questionable if this behavior can be changed (for example by
discouraging high-volume applications), and if so, only in the long term.

The mean number of queued cells as a function of circuit loudness is a more
interesting metric here.
These numbers are contained in the \texttt{queued-cells} line.
Figure~\ref{fig:queued} shows these numbers for measurement intervals of
24 hours.
The loudest 10\% of all circuits have up to 12 cells in their queues on
average, the next-loudest 10\% only up to 1, and so on.
The intuition is that louder circuits have (and should have) their cells
queued for a bit while quieter circuits have their cells delivered quickly
rather than waiting in queues.

\begin{figure}
\centering
\includegraphics[width=\textwidth]{queued-cells}
\caption{Mean number of cells in queues}
\label{fig:queued}
\end{figure}

Interestingly, both \texttt{TorTeamHelp} and \texttt{vallenator} have
almost 0 cells waiting in circuit queues, even for the loudest 10\% of the
circuits.

\section{Mean time cells spend in queue}

The mean time that cells spend in circuit queues is probably the most
important metric here.
This waiting time directly contributes to the latency that users
experience.
It would be good to have a short latency for the quieter circuits.
Figure~\ref{fig:timeinqueueavg} shows the mean time of cells by circuit
deciles.
The general trend is that the quieter the circuits are, the less time cells
spend in queues.
Again, \texttt{TorTeamHelp} and \texttt{vallenator} exhibit very low
waiting times.

\begin{figure}
\centering
\includegraphics[width=\textwidth]{time-in-queue}
\caption{Mean time that cells spend in circuit queues}
\label{fig:timeinqueueavg}
\end{figure}

\section{Conclusion}

This analysis has shown a few characteristics of circuit queues in Tor.
Results include total numbers of circuits, a classification of
circuits by the number of processed cells, the mean number of cells in
queues, and the mean time that cells spend in circuit queues.
If possible, the waiting times for the quieter circuits should be reduced,
even at the cost of increasing the waiting time for the loudest circuits.
The rationale is that the loudest circuits do not use Tor for low-latency
applications anyway, but for high-volume applications.
The result would be that Tor becomes more attractive for interactive
applications.
The statistics also provide a baseline against which the effectiveness of
future design changes can be measured.

\end{document}
+8.54 KiB

File added.

