Improving Snowflake Proxy Performance by Adjusting Copy Buffer Size
TL;DR: The current implementation uses a 32K copy buffer per direction, for a total of 64K of buffers per connection, but each read/write is less than 2K according to my measurements.
Background
The Snowflake proxy uses a particularly hot function, copyLoop
(proxy/lib/snowflake.go), to proxy data between a Tor relay and a connected
client. This is currently done using the io.Copy function to shuttle all
incoming data in both directions.
Looking at the io.Copy implementation, it internally calls io.CopyBuffer,
which defaults to a 32K buffer when none is supplied (I checked, and the
current implementation allocates 32K every time).
Since snowflake-proxy is intended to be run in a very distributed manner, on
as many machines as possible, minimizing the CPU and memory footprint of each
proxied connection would be ideal, as would maximizing throughput for
clients.
Hypothesis
There might exist a buffer size X that is better suited for use in copyLoop
than 32K.
Testing
Using tcpdump
Assuming you run snowflake-proxy with -ephemeral-ports-range 50000:51000,
you can capture the proxied UDP packets using
sudo tcpdump -i <interface> udp portrange 50000-51000
which will report a length value for each captured packet. A good starting
value for X could then be slightly larger than the largest captured packet,
assuming one packet is copied at a time.
Experimentally, I found the largest packet to be 1265 bytes, which would make
X = 2K a possible starting point.
Printing actual read sizes
The following snippet was added in proxy/lib/snowflake.go:
// Taken straight from the standard library's io.copyBuffer
func copyBuffer(dst io.Writer, src io.Reader, buf []byte) (written int64, err error) {
// If the reader has a WriteTo method, use it to do the copy.
// Avoids an allocation and a copy.
if wt, ok := src.(io.WriterTo); ok {
return wt.WriteTo(dst)
}
// Similarly, if the writer has a ReadFrom method, use it to do the copy.
if rt, ok := dst.(io.ReaderFrom); ok {
return rt.ReadFrom(src)
}
if buf == nil {
size := 32 * 1024
if l, ok := src.(*io.LimitedReader); ok && int64(size) > l.N {
if l.N < 1 {
size = 1
} else {
size = int(l.N)
}
}
buf = make([]byte, size)
}
for {
nr, er := src.Read(buf)
if nr > 0 {
log.Printf("Read: %d", nr) // THIS IS THE ONLY DIFFERENCE FROM io.CopyBuffer
nw, ew := dst.Write(buf[0:nr])
if nw < 0 || nr < nw {
nw = 0
if ew == nil {
ew = errors.New("invalid write result")
}
}
written += int64(nw)
if ew != nil {
err = ew
break
}
if nr != nw {
err = io.ErrShortWrite
break
}
}
if er != nil {
if er != io.EOF {
err = er
}
break
}
}
return written, err
}
and copyLoop was amended to use this instead of io.Copy.
The Read: BYTES lines were saved to a file using this command:
./proxy -verbose -ephemeral-ports-range 50000:50010 2>&1 >/dev/null | awk '/Read: / { print $4 }' | tee read_sizes.txt
I got the result:
min: 8 max: 1402 median: 1402 average: 910.305
Suggested buffer size: 2K. Current buffer size: 32768 (32K, experimentally verified).
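For reproducibility, the summary statistics above can be recomputed from read_sizes.txt with a short program (a sketch; it assumes the file holds one integer per line, and the stats helper is my own, not part of the snowflake tooling):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strconv"
)

// stats returns the min, max, median, and average of a non-empty sample.
func stats(sizes []int) (min, max, median int, avg float64) {
	sorted := append([]int(nil), sizes...)
	sort.Ints(sorted)
	sum := 0
	for _, n := range sorted {
		sum += n
	}
	return sorted[0], sorted[len(sorted)-1], sorted[len(sorted)/2],
		float64(sum) / float64(len(sorted))
}

func main() {
	f, err := os.Open("read_sizes.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()
	var sizes []int
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if n, err := strconv.Atoi(sc.Text()); err == nil {
			sizes = append(sizes, n)
		}
	}
	if len(sizes) == 0 {
		return
	}
	mn, mx, med, avg := stats(sizes)
	fmt.Printf("min: %d max: %d median: %d average: %.3f\n", mn, mx, med, avg)
}
```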
Using a Snowflake proxy in Tor Browser with Wireshark
I also captured the traffic with Wireshark while using the proxy from Tor Browser, and concluded that all packets sent were < 2K.
Conclusion
As per the commit, I suggest changing the buffer size to 2K. Some things I have not been able to answer:
- Does this make a big impact on performance?
- Are there any unforeseen consequences? What happens if a packet is > 2K? (I think the Go standard library simply spreads it over multiple reads, but someone please confirm.)