Use a sync.Pool to reduce allocation in QueuePacketConn (!145) · Merge requests · The Tor Project / Anti-censorship / Pluggable Transports / Snowflake

David Fifield requested to merge dcf/snowflake:queuepacketconn-pool into main Apr 05, 2023

This is the continuation of a thread that started at #40260 (comment 2885524). #40260 (closed) undid an erroneous performance optimization; however that optimization was tried for a reason, and we talked about possible correct alternatives.

This merge request uses a sync.Pool in QueuePacketConn as a source of byte buffers that are stored in queues, rather than allocating fresh buffers with make and letting each one be garbage collected.

The most notable change relative to the patch proposed in #40260 (comment 2885539) is that buffers are automatically returned to the pool in more situations: when a packet is silently dropped in QueueIncoming and WriteTo, and with the packets whose contents are returned by ReadFrom.

Besides that, 97c93001 is a fix for benchmark interference from TestQueuePacketConnWriteToKCP noted at #40260 (comment 2885832). The test started a goroutine that was supposed to stop when the test function returned, but a bug meant the goroutine kept running and consumed resources during the benchmark. 6bae31f0 reduces memory allocation during BenchmarkQueueIncoming and BenchmarkWriteTo, which is no longer necessary since #40260 (closed) went back to permitting the caller to reuse its passed-in buffer.

I went with the model of having a sync.Pool be an instance variable of QueuePacketConn. That makes it easier to control the MTU as a parameter to NewQueuePacketConn. It does forego the possibility of some sharing of a single pool over multiple QueuePacketConn (in practice we do use multiple QueuePacketConn, 4 currently). My intuition says it does not matter much.

These are the QueuePacketConn benchmarks at 97c93001 (i.e., without the sync.Pool change but with the benchmark fixes):

snowflake/common/turbotunnel$ go test -bench 'Benchmark(QueueIncoming|WriteTo)' -benchmem -benchtime 10s
goos: linux
goarch: amd64
pkg: git.torproject.org/pluggable-transports/snowflake.git/v2/common/turbotunnel
cpu: Intel(R) Core(TM) i5 CPU         680  @ 3.60GHz
BenchmarkQueueIncoming-4        42610378               299.1 ns/op           514 B/op          1 allocs/op
BenchmarkWriteTo-4              22618404               463.1 ns/op           516 B/op          1 allocs/op

The results at c097d5f3 (i.e., with the sync.Pool change):

snowflake/common/turbotunnel$ go test -bench 'Benchmark(QueueIncoming|WriteTo)' -benchmem -benchtime 10s
goos: linux
goarch: amd64
pkg: git.torproject.org/pluggable-transports/snowflake.git/v2/common/turbotunnel
cpu: Intel(R) Core(TM) i5 CPU         680  @ 3.60GHz
BenchmarkQueueIncoming-4        77492412               187.0 ns/op            28 B/op          1 allocs/op
BenchmarkWriteTo-4              29897580               362.6 ns/op            36 B/op          1 allocs/op

Use a sync.Pool to reduce allocation in QueuePacketConn

Merge request reports