Increase kcp and smux window and buffer sizes
This should increase the maximum amount of in-flight data and hopefully improve the performance of Snowflake, especially for clients that are geographically distant from proxies and the server.
See https://lists.torproject.org/pipermail/anti-censorship-team/2021-July/000185.html for a justification of this change.
requested review from @dcf
It also occurs to me that the kcp.MaxWindowSize has to be at least as big as the `MaxStreamBuffer` size to notice any improvements; otherwise that will be the limiting factor on the amount of in-flight data. Right now this is set to 64 KB for both the client and the server.
I believe `UDPSession.SetWindowSize` is in units of packets, not bytes. The current values of 65535 should be plenty. Does it still work with the current values? The kcptun defaults are 128 and 512.

In the project I posted about (an AMP cache tunnel), I'm using `SetWindowSize(1024, 1024)`. Actually, I have measurements testing that change separately from increasing the smux buffers:

| direct | thru cache | condition (cumulative) |
|--------|------------|------------------------|
| 329.0  | 260.5      | baseline |
| 477.4  | 416.9      | `SetWindowSize(32, 32)` → `(1024, 1024)` |
| 1388.3 | 1405.8     | `MaxStreamBuffer = 64*1024` → `1*1024*1024`, `MaxReceiveBuffer = 4*1024*1024` → `16*1024*1024` (server only) |

Ah, I think you're right. Hmm, I wonder why I wasn't getting any noticeable difference in download speed by changing the `MaxStreamBuffer` then (because, if I understand correctly, changing it at the client should be enough to affect downloads as opposed to uploads). I'll take a closer look at this again.
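For concreteness, here is a minimal sketch of where these two knobs live when layering smux over a `kcp.UDPSession`. It assumes the `xtaci/kcp-go` v5 and `xtaci/smux` APIs and is not the literal Snowflake client code; the values are the ones discussed in this thread, not necessarily what ends up committed. `SetWindowSize` counts packets, while `MaxReceiveBuffer` and `MaxStreamBuffer` are byte counts.

```go
package example

import (
	"net"

	"github.com/xtaci/kcp-go/v5"
	"github.com/xtaci/smux"
)

// dialSession is an illustrative helper: open a KCP session over an existing
// net.PacketConn, then run smux on top of it.
func dialSession(pconn net.PacketConn, addr net.Addr) (*smux.Session, error) {
	conn, err := kcp.NewConn2(addr, nil, 0, 0, pconn)
	if err != nil {
		return nil, err
	}
	conn.SetStreamMode(true)
	// Send and receive windows, measured in packets, not bytes.
	conn.SetWindowSize(1024, 1024)

	smuxConfig := smux.DefaultConfig()
	smuxConfig.Version = 2
	smuxConfig.MaxReceiveBuffer = 4 * 1024 * 1024 // per-session receive buffer, in bytes
	smuxConfig.MaxStreamBuffer = 1 * 1024 * 1024  // per-stream receive buffer, in bytes
	return smux.Client(conn, smuxConfig)
}
```

With a typical MTU of roughly 1200–1500 bytes, a window of 1024 packets corresponds to over 1 MB of in-flight data, so with these example values the smux stream buffer, not the KCP window, is the binding limit.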
In experimenting with increasing the buffer sizes in dnstt, I found that another parameter that matters for tuning is the turbotunnel `queueSize`, which is the size of internal buffering for `QueuePacketConn`, currently 32. If it's too small on the sending side, packets get dropped inside `QueuePacketConn` and have to be retransmitted later. You can check whether drops are happening with the following patch. For me, there were drops in sending on the server side.

```diff
diff --git a/common/turbotunnel/queuepacketconn.go b/common/turbotunnel/queuepacketconn.go
index 14a9833..ede3433 100644
--- a/common/turbotunnel/queuepacketconn.go
+++ b/common/turbotunnel/queuepacketconn.go
@@ -1,6 +1,7 @@
 package turbotunnel
 
 import (
+	"log"
 	"net"
 	"sync"
 	"sync/atomic"
@@ -57,6 +58,7 @@ func (c *QueuePacketConn) QueueIncoming(p []byte, addr net.Addr) {
 	select {
 	case c.recvQueue <- taggedPacket{buf, addr}:
 	default:
+		log.Printf("drop incoming %d", len(buf))
 		// Drop the incoming packet if the receive queue is full.
 	}
 }
@@ -99,6 +101,7 @@ func (c *QueuePacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
 		return len(buf), nil
 	default:
 		// Drop the outgoing packet if the send queue is full.
+		log.Printf("drop outgoing %d", len(buf))
 		return len(buf), nil
 	}
 }
```
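To make the role of `queueSize` concrete, here is a small self-contained sketch of the same bounded-channel pattern, with illustrative names rather than the actual turbotunnel source: packets are handed off with a non-blocking send into a fixed-capacity channel, so once the queue is full, further packets are silently dropped and KCP has to retransmit them later.

```go
package main

import (
	"fmt"
	"net"
)

// queueSize bounds how many packets can sit in the queue at once; turbotunnel
// currently uses 32, which this thread proposes to raise.
const queueSize = 32

type taggedPacket struct {
	buf  []byte
	addr net.Addr
}

func main() {
	sendQueue := make(chan taggedPacket, queueSize)

	// enqueue mimics the non-blocking send in QueuePacketConn.WriteTo: if the
	// channel is full, the packet is dropped instead of blocking the caller.
	enqueue := func(p []byte, addr net.Addr) {
		buf := append([]byte(nil), p...) // copy so the caller may reuse p
		select {
		case sendQueue <- taggedPacket{buf, addr}:
		default:
			fmt.Printf("drop outgoing %d\n", len(buf))
		}
	}

	addr := &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 9999}
	for i := 0; i < queueSize+8; i++ {
		enqueue(make([]byte, 1200), addr) // the last 8 sends are dropped
	}
	fmt.Printf("queued %d packets\n", len(sendQueue))
}
```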
In some quick tests, I started to get faster results in dnstt when I tried

- `queueSize = 2048` (was 32)
- `SetWindowSize(1024, 1024)` (was 32)
- `MaxStreamBuffer = 4 * 1024 * 1024` (was 65536)

It was only a little slower with `SetWindowSize(128, 128)`. I imagine there could be drawbacks to an overly large `queueSize` or `SetWindowSize`: when a packet needs to be retransmitted it goes to the end of the queue. For reference, the settings I was using when I first noticed the potential of increasing `MaxStreamBuffer` in the AMP cache tunnel were:

- `queueSize = 256`
- `SetWindowSize(1024, 1024)`
- `MaxStreamBuffer = 1 * 1024 * 1024`
added 7 commits

- 1470b25b...e3d376ca - 6 commits from branch `tpo/anti-censorship/pluggable-transports:main`
- b5b5334f - Increase smux and QueuePacketConn buffer sizes
I spent some time on Friday and yesterday trying out the new Shadow release to see if it would work with Go yet. Turns out there's still some work that needs to be done to support Go binaries, but I'm hopeful that this will be useful for doing more in-depth experiments with Snowflake.
In the nearer term, I don't see an issue with increasing the `queueSize` and `MaxStreamBuffer` as suggested here at the server and trying it out. I just pushed an update to the merge request.

Okay, let's go ahead with this.
With a large `queueSize`, let's watch out for bufferbloat effects. What may happen is that when a connection is interrupted, the buffer fills up with packets that cannot be sent yet, possibly including retransmissions of packets already in the buffer. All these old stale packets will need to be flushed out of the buffer before anything new can be sent. I was thinking of instituting some kind of random early detection drops into `turbotunnel.QueuePacketConn` to mitigate these kinds of effects.

I just merged and deployed this. Thanks for pointing out the potential bufferbloat issue. If I understand this correctly, two ways of monitoring this would be:
- in packet captures for clients that have interrupted connections, we'll see a lot of duplicate retransmissions following the outage and a longer than usual recovery time
- increased memory usage at the server
These sound difficult to actively monitor; we might just need to run a few intentional experiments to determine whether this is an issue.
The random early detection is interesting, thanks for the link!
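For illustration only, a random-early-detection-style drop policy for the send queue might look something like the sketch below. Nothing like this exists in turbotunnel yet; the threshold and the linear drop curve are invented for the example. The idea is to start dropping a growing fraction of packets once the queue passes a threshold, instead of only dropping when it is completely full, so stale retransmissions are less likely to saturate the buffer after an outage.

```go
package main

import (
	"fmt"
	"math/rand"
)

const queueSize = 2048

// shouldDrop implements a simple RED-like curve: no early drops below half
// occupancy, then a drop probability that rises linearly to 1 as the queue
// approaches its capacity. The threshold and shape are arbitrary choices here.
func shouldDrop(queueLen int) bool {
	const minThresh = queueSize / 2
	if queueLen < minThresh {
		return false
	}
	p := float64(queueLen-minThresh) / float64(queueSize-minThresh)
	return rand.Float64() < p
}

func main() {
	sendQueue := make(chan []byte, queueSize)
	dropped := 0
	for i := 0; i < queueSize; i++ {
		if shouldDrop(len(sendQueue)) {
			dropped++ // early drop while the queue still has room
			continue
		}
		select {
		case sendQueue <- make([]byte, 1200):
		default:
			dropped++ // hard drop when the queue is actually full
		}
	}
	fmt.Printf("queued %d, dropped %d\n", len(sendQueue), dropped)
}
```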