Specialize ClientMap for ClientID

The ClientMap.SendQueue function is the first actual Snowflake function that appears in a CPU profile of snowflake-server:

snowflake-server.2023-06-28T10_47_29Z.cpu.prof

$ go tool pprof -text 'snowflake-server.2023-06-28T10:47:29Z.cpu.prof'
File: snowflake-server.20230628.c3e2f91b.prof
Type: cpu
Time: Jun 28, 2023 at 10:47am (UTC)
Duration: 1hrs, Total samples = 63696.55s (1769.24%)
Showing nodes accounting for 52749.47s, 82.81% of 63696.55s total
Dropped 1622 nodes (cum <= 318.48s)
      flat  flat%   sum%        cum   cum%
 17749.19s 27.87% 27.87%  17749.19s 27.87%  runtime/internal/syscall.Syscall6
  3813.60s  5.99% 33.85%   3813.60s  5.99%  runtime.epollwait
  2380.75s  3.74% 37.59%   8172.14s 12.83%  runtime.selectgo
  1915.76s  3.01% 40.60%   1915.76s  3.01%  runtime.memmove
  1770.22s  2.78% 43.38%   2580.39s  4.05%  runtime.lock2
  1269.98s  1.99% 45.37%   1341.07s  2.11%  runtime.casgstatus
  1046.43s  1.64% 47.01%   1291.14s  2.03%  runtime.unlock2
   998.18s  1.57% 48.58%   2658.96s  4.17%  runtime.sellock
   905.56s  1.42% 50.00%   3656.57s  5.74%  runtime.mallocgc
   867.74s  1.36% 51.36%    871.58s  1.37%  runtime.(*waitq).dequeue (inline)
   826.43s  1.30% 52.66%    826.43s  1.30%  crypto/aes.gcmAesEnc
   789.16s  1.24% 53.90%   2709.70s  4.25%  github.com/xtaci/kcp-go/v5.(*KCP).flush
   617.51s  0.97% 54.87%   7628.15s 11.98%  runtime.schedule
   572.20s   0.9% 55.77%   2542.47s  3.99%  gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/v2/common/turbotunnel.(*ClientMap).SendQueue
   538.09s  0.84% 56.61%    538.09s  0.84%  runtime.futex
   513.24s  0.81% 57.42%    513.24s  0.81%  runtime.procyield
   445.32s   0.7% 58.12%    466.23s  0.73%  runtime.findObject

By itself it accounts for only about 1% of total CPU time (0.9% flat), or about 4% cumulative once you count the mutex lock/unlock and the operations on the inner map, but it's still a hot spot.

In the branch https://gitlab.torproject.org/dcf/snowflake/-/commits/sendqueue-clientid/ I have a change to make ClientMap assume it is always dealing with a ClientID, not just any net.Addr. The change makes ClientMap.SendQueue faster in BenchmarkSendQueue.