
Add timeout for proxy webRTCConn

Merged Cecylia Bocovich requested to merge cohosh/snowflake:issue/40220 into main

Activity

  • added 1 commit


  • requested review from @shelikhoo

  • Contributor

    Three minutes after starting, I already see:

    2022/10/19 06:03:03 Closed connection due to inactivity

    I think some connections may become inactive after long periods of activity.
    But there are no statistics (Traffic throughput (up|down)...) after Closed connection due to inactivity.
    I'm not sure whether stats are needed there; just a thought.

  • Contributor

    Another strange effect: lots of connections with similar durations:

    2022/10/19 06:12:20 Traffic throughput (up|down): 20 KB|10 KB -- (56 OnMessages, 365 Sends, over 510 seconds)
    2022/10/19 06:29:38 Traffic throughput (up|down): 1 MB|16 KB -- (81 OnMessages, 1164 Sends, over 510 seconds)
    2022/10/19 06:45:02 Traffic throughput (up|down): 7 MB|659 KB -- (944 OnMessages, 6205 Sends, over 510 seconds)
    2022/10/19 07:18:16 Traffic throughput (up|down): 39 KB|9 KB -- (38 OnMessages, 482 Sends, over 511 seconds)
    2022/10/19 07:19:39 Traffic throughput (up|down): 66 KB|34 KB -- (114 OnMessages, 926 Sends, over 510 seconds)
    2022/10/19 07:23:55 Traffic throughput (up|down): 175 KB|10 KB -- (49 OnMessages, 1081 Sends, over 510 seconds)
    2022/10/19 08:51:58 Traffic throughput (up|down): 1 MB|16 KB -- (70 OnMessages, 1307 Sends, over 511 seconds)
    2022/10/19 09:27:54 Traffic throughput (up|down): 3 MB|266 KB -- (294 OnMessages, 3254 Sends, over 510 seconds)
    2022/10/19 09:37:06 Traffic throughput (up|down): 31 KB|11 KB -- (78 OnMessages, 653 Sends, over 510 seconds)
    2022/10/19 09:57:44 Traffic throughput (up|down): 86 KB|20 KB -- (61 OnMessages, 915 Sends, over 510 seconds)
    2022/10/19 10:01:05 Traffic throughput (up|down): 2 MB|16 KB -- (74 OnMessages, 1928 Sends, over 512 seconds)
    2022/10/19 10:07:28 Traffic throughput (up|down): 39 KB|7 KB -- (49 OnMessages, 209 Sends, over 510 seconds)
    2022/10/19 10:08:30 Traffic throughput (up|down): 8 MB|898 KB -- (1630 OnMessages, 6753 Sends, over 510 seconds)

    There are too many of them for it to be a coincidence.

  • added 1 commit

    • 1bc54948 - Terminate timeoutLoop when conn is closed


    • Okay, I think this is ready for review now. I ran it for 8 hours, and it seems to keep the number of open connections relatively low by cleaning up old ones.

      I got a few connections that never sent any bytes:

      2022/10/19 19:53:49 Traffic throughput (up|down): 0 B|0 B -- (0 OnMessages, 0 Sends, over 30 seconds)

      But also a lot of connections that lasted a long time:

      2022/10/19 22:21:28 Traffic throughput (up|down): 69 KB|54 KB -- (1879 OnMessages, 1844 Sends, over 8616 seconds)

      The one thing here I'm a bit nervous about is the call to c.activity <- struct{}{} in Write() blocking or slowing things down. I set the buffer size to 100, and Go channels are pretty fast, but I'd appreciate some feedback there.
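
      For reference, a non-blocking send would take that worry off the table entirely. Here is a sketch of that alternative, using the activity channel as declared in the diff below; it is just an illustration, not necessarily what the code should do:

      // Inside Write(): notify the timeout loop without ever blocking.
      // If the buffer is full, a signal is already pending, so dropping
      // this one loses nothing.
      select {
      case c.activity <- struct{}{}:
      default:
      }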

    • Contributor

      The problem with 510-512 second connections still persists for me.
      Did you search for "over 510 seconds" in your logs?
      If it doesn't appear there, then this problem may be related to UDP packet loss,
      and in that case it would be better to test with losses too.

      but I'd appreciate some feedback there

      I don't know Go, so I won't be able to answer that question.

      Edited by Vort
    • $ grep "over 51" proxy.log | wc -l
      3
      $ grep "Traffic throughput" proxy.log | wc -l
      52

      It only happened for 3 out of 52 connections, which doesn't seem unusual.

    • Contributor

      For me it's 11%, and all of them were not just closed, but closed due to inactivity.
      I wouldn't expect the number of non-standard connections to be that large.
      More likely, the 30s timeout is too low for some specific cases.
      I don't know which cases, however.

      $ grep -c "over 51[012]" log_crop2.txt && grep -c "Traffic throughput" log_crop2.txt
      52
      459
    • More likely, the 30s timeout is too low for some specific cases.
      I don't know which cases, however.

      Even if someone is running a really old client, their connection will be shut down on the client side after an inactivity period even shorter than this; it's currently set to time out at 20 seconds. Since Snowflake can recover fairly easily from disconnections between clients and proxies, I'd prefer that a connection slow enough to trigger this be discarded in favor of a new connection anyway.

      For me it's 11%, and all of them were not just closed, but closed due to inactivity.
      I wouldn't expect the number of non-standard connections to be that large.

      This might be worth looking into more, but I don't want to block this feature because of it. It's hard to tell whether it's a proxy problem, your specific network conditions, or some non-standard clients using the network differently than we do.

    • Contributor

      This might be worth looking into more, but I don't want to block this feature because of it.

      OK, I agree with that. Should I create a separate issue about this problem?

    • Yeah, it might be a while until someone from the team can help out, but it will be nice to have some record of it. In the meantime, if you find out more, you can add your own notes there.

    • Contributor

      Done: #40222

  • Cecylia Bocovich marked this merge request as ready


  • Vort mentioned in issue #40222


  • Vort mentioned in issue #40211 (closed)


func newWebRTCConn(pc *webrtc.PeerConnection, dc *webrtc.DataChannel, pr *io.PipeReader, eventLogger event.SnowflakeEventReceiver) *webRTCConn {
	conn := &webRTCConn{pc: pc, dc: dc, pr: pr, eventLogger: eventLogger}
	conn.bytesLogger = newBytesSyncLogger()
	conn.activity = make(chan struct{}, 100)
	conn.closed = make(chan struct{})
	conn.inactivityTimeout = 30 * time.Second
	go conn.timeoutLoop()
	return conn
}

func (c *webRTCConn) timeoutLoop() {
	for {
		select {
		case <-time.After(c.inactivityTimeout):
  • From the webRTCConn struct, with the new timeout fields:

    	bytesLogger bytesLogger
    	eventLogger event.SnowflakeEventReceiver

    	inactivityTimeout time.Duration
    	activity          chan struct{}
    	closed            chan struct{}
  • I think this merge request is almost good to go; I just have one small nitpick suggestion about the possibility of reusing the timer for performance reasons.
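
    For illustration, a loop that reuses a single timer instead of allocating one with time.After on every iteration could look like the sketch below. It assumes the inactivityTimeout, activity, and closed fields from the diff above and a Close() method, and reuses the "Closed connection due to inactivity" log line seen earlier; the actual commit may differ (it also moves shutdown signaling to a context).

    func (c *webRTCConn) timeoutLoop() {
    	timer := time.NewTimer(c.inactivityTimeout)
    	defer timer.Stop()
    	for {
    		select {
    		case <-timer.C:
    			// No activity within the timeout window: close the conn.
    			c.Close()
    			log.Println("Closed connection due to inactivity")
    			return
    		case <-c.activity:
    			// Stop, drain if necessary, then reset, rather than
    			// allocating a fresh timer for every burst of activity.
    			if !timer.Stop() {
    				<-timer.C
    			}
    			timer.Reset(c.inactivityTimeout)
    		case <-c.closed:
    			// The connection was closed elsewhere; stop the loop.
    			return
    		}
    	}
    }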

  • Vort mentioned in issue #40154 (closed)


  • Cecylia Bocovich added 22 commits


    • 1bc54948...90d1a567 - 18 commits from branch tpo/anti-censorship/pluggable-transports:main
    • 6007d5e0 - Refactor creation of webRTCConn in proxy
    • 5c23fcf1 - Add timeout for webRTCConn
    • b010de5a - Terminate timeoutLoop when conn is closed
    • b547d449 - Refactor timeout loop to use a context and reuse timers


  • I've made some changes and am ready for another review when you have time.

  • shelikhoo approved this merge request


  • Thanks for working on this! I think this merge request is good to go!

    • Contributor

      With the recent increase in the number of connections (40-50 simultaneously), it is harder to track whether all of them are in use.
      But since I still see Closed connection due to inactivity in the logs, the latest commit is most likely good.

      Edited by Vort
    • Thanks!
