
Improving Snowflake Proxy Performance by Adjusting Copy Buffer Size

obble requested to merge obble/snowflake:proxy-change-copyloop-buffer into main

TL;DR: The current implementation uses a 32K buffer per copy direction, i.e. 64K of buffers per connection, but according to my measurements each read/write is less than 2K.

Background

The Snowflake proxy uses a particularly hot function, copyLoop (proxy/lib/snowflake.go), to proxy data between a Tor relay and a connected client. This is currently done using the io.Copy function to copy all incoming data in both directions.

Looking at the io.Copy implementation, io.Copy internally uses io.CopyBuffer, which in turn defaults to a 32K buffer for copying data (I checked, and the current implementation allocates 32K every time).
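
For reference, io.CopyBuffer lets the caller supply the buffer explicitly, so the size can be changed without reimplementing the copy loop. A minimal sketch of that mechanism (the 2K size and the function name here are mine, not the commit's):

// copyWithSmallBuffer copies src to dst using a caller-chosen buffer size
// instead of io.Copy's internal 32K default. Note that io.CopyBuffer still
// bypasses buf entirely when src implements io.WriterTo or dst implements
// io.ReaderFrom.
func copyWithSmallBuffer(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 2*1024) // candidate value for X
	return io.CopyBuffer(dst, src, buf)
}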

Since snowflake-proxy is intended to run in a highly distributed manner, on as many machines as possible, minimizing the CPU and memory footprint of each proxied connection would be ideal, as would maximizing throughput for clients.

Hypothesis

There might exist a buffer size X that is more suitable for usage in copyLoop than 32K.

Testing

Using tcpdump

Assuming you use -ephemeral-ports-range 50000:51000 for snowflake-proxy, you can capture the UDP packets being proxied using

sudo tcpdump -i <interface> udp portrange 50000-51000

which will provide a length value for each packet captured. A good starting value for X could then be slightly larger than the largest captured packet, assuming one packet is copied at a time.

Experimentally I found this value to be 1265 bytes, which would make X = 2K a possible starting point.

Printing actual read sizes

The following snippet was added in proxy/lib/snowflake.go:

// Taken straight from the standard library's io.copyBuffer
func copyBuffer(dst io.Writer, src io.Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.
	if wt, ok := src.(io.WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(io.ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
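	// Note: when either fast path above is taken, the Read logging below
	// never runs, so these measurements only cover the buffered path.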
	if buf == nil {
		size := 32 * 1024
		if l, ok := src.(*io.LimitedReader); ok && int64(size) > l.N {
			if l.N < 1 {
				size = 1
			} else {
				size = int(l.N)
			}
		}
		buf = make([]byte, size)
	}
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			log.Printf("Read: %d", nr) // THIS IS THE ONLY DIFFERENCE FROM io.CopyBuffer
			nw, ew := dst.Write(buf[0:nr])
			if nw < 0 || nr < nw {
				nw = 0
				if ew == nil {
					ew = errors.New("invalid write result")
				}
			}
			written += int64(nw)
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = io.ErrShortWrite
				break
			}
		}
		if er != nil {
			if er != io.EOF {
				err = er
			}
			break
		}
	}
	return written, err
}

and copyLoop was amended to use this instead of io.Copy.
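
The substitution itself is only at the call sites; a sketch with placeholder variable names (the actual copyLoop arguments are not reproduced here):

// before:
//     io.Copy(dst, src)
// after (a nil buf keeps the default 32K allocation, now with logging):
copyBuffer(dst, src, nil)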

The logged Read: <bytes> lines were saved to a file using this command

./proxy -verbose -ephemeral-ports-range 50000:50010 2>&1 >/dev/null | awk '/Read: / { print $4 }' | tee read_sizes.txt

I got the result:

min: 8 max: 1402 median: 1402 average: 910.305
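
These statistics can be recomputed from read_sizes.txt with a short Go program along these lines (a sketch; it assumes one integer per line, as produced by the awk command above):

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"sort"
	"strconv"
)

func main() {
	f, err := os.Open("read_sizes.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Collect one read size per line, skipping malformed lines.
	var sizes []int
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if n, err := strconv.Atoi(sc.Text()); err == nil {
			sizes = append(sizes, n)
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
	if len(sizes) == 0 {
		log.Fatal("no samples in read_sizes.txt")
	}

	sort.Ints(sizes)
	sum := 0
	for _, n := range sizes {
		sum += n
	}
	fmt.Printf("min: %d max: %d median: %d average: %.3f\n",
		sizes[0], sizes[len(sizes)-1], sizes[len(sizes)/2],
		float64(sum)/float64(len(sizes)))
}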

Suggested buffer size: 2K
Current buffer size: 32768 (32K, experimentally verified)

Using a Snowflake proxy in Tor Browser with Wireshark

I also used Wireshark, and concluded that all packets sent were < 2K.

Conclusion

As per the commit, I suggest changing the buffer size to 2K. Some things I have not been able to answer:

  1. Does this have a big impact on performance?
  2. Are there any unforeseen consequences? What happens if a packet is > 2K? (I think the Go standard library just splits the packet, but someone please confirm; see the sketch after this list.)
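
For the stream-oriented case, question 2 can at least be probed with a standalone test: writing a single 4K payload into a stream and reading it back through a 2K buffer yields two reads with no data lost, since a byte stream has no packet boundaries to preserve. This is a minimal sketch using an in-memory net.Pipe, not the Snowflake code, and it says nothing about the packet-oriented WebRTC side:

package main

import (
	"fmt"
	"io"
	"net"
)

func main() {
	// net.Pipe gives an in-memory, stream-oriented connection pair.
	src, dst := net.Pipe()

	go func() {
		// Write a single 4K "packet" in one call.
		payload := make([]byte, 4096)
		if _, err := src.Write(payload); err != nil {
			panic(err)
		}
		src.Close()
	}()

	// Read it back through a 2K buffer, as copyLoop would with X = 2K.
	buf := make([]byte, 2048)
	var reads []int
	for {
		n, err := dst.Read(buf)
		if n > 0 {
			reads = append(reads, n)
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
	fmt.Println("read sizes:", reads) // prints: read sizes: [2048 2048]
}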
