Offloading KCP-related processing from the server to the proxy
Currently, to the best of my knowledge, the proxy forwards all the data it receives to the server, where packet loss and connection instability are compensated for.
@arma asked whether it would be possible to offload the packet-loss compensation (KCP) to the proxy, thus reducing the traffic between the proxy and the server in order to improve connection speed. I am unsure whether this is possible, so I am opening this ticket to start a public discussion that includes @dcf.
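For concreteness, here is a minimal sketch of the two behaviours being compared, assuming plain `net.Conn`/UDP stand-ins for the real WebRTC and WebSocket transports (the addresses, ports, and shard counts below are hypothetical illustrations, not snowflake's actual code or configuration):

```go
package main

import (
	"io"
	"net"

	kcp "github.com/xtaci/kcp-go/v5"
)

// Current behaviour: the proxy blindly copies bytes in both directions, so the
// client and the server run KCP end to end and every retransmission or
// redundant packet also crosses the proxy <-> server link.
func relayBlindly(client, server net.Conn) {
	go io.Copy(server, client) // client -> server
	io.Copy(client, server)    // server -> client
}

// Proposed behaviour: the proxy terminates KCP itself, so loss recovery is
// confined to the client <-> proxy hop and only the recovered byte stream is
// forwarded to the server.
func terminateKCPAtProxy(serverAddr string) error {
	// 10 data + 3 parity FEC shards, no encryption: illustrative values only.
	ln, err := kcp.ListenWithOptions(":7300", nil, 10, 3)
	if err != nil {
		return err
	}
	for {
		sess, err := ln.AcceptKCP()
		if err != nil {
			return err
		}
		server, err := net.Dial("tcp", serverAddr)
		if err != nil {
			sess.Close()
			continue
		}
		go relayBlindly(sess, server)
	}
}

func main() {
	_ = terminateKCPAtProxy("snowflake-server.example:443")
}
```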
The original chat log is as follows:
[7:17:46 pm] <+nickm> so i might have a hard time seeing whatever.
[7:36:23 pm] <+nickm> (is hetzner on fire, or is this the threatened upgrade to a new debian version?)
[7:41:20 pm] -*- ahf dont know
[7:47:00 pm] <+meejah> the chances of _two_ fires in one year are low, right?? ;)
[7:53:22 pm] <shelikhoo> there is a tool that can reduce the impact of packet loss: https://github.com/xtaci/kcptun
[7:53:53 pm] <shelikhoo> the KCP part of this tunnel software is already used in snowflake
[7:54:30 pm] <shelikhoo> it will send packets aggressively, so packet loss are overpowered
[7:55:33 pm] <shelikhoo> I use this for my traffic between home network and network egress to compensate for network quality issue with local ISP
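(To illustrate what "send packets aggressively" means in kcp-go, the library that both kcptun and snowflake's turbo tunnel build on, here is a minimal sketch using the library's documented "fastest" profile and an arbitrary test address; these are not snowflake's actual settings.)

```go
package main

import kcp "github.com/xtaci/kcp-go/v5"

// tuneAggressively applies kcp-go's documented "fastest" profile: retransmit
// early and keep sending through loss instead of backing off like TCP.
func tuneAggressively(sess *kcp.UDPSession) {
	// nodelay=1, 10 ms internal update interval, fast resend after 2
	// duplicate ACKs, congestion control disabled.
	sess.SetNoDelay(1, 10, 2, 1)
	// Large send/receive windows so a single lost packet does not stall
	// everything queued behind it for long.
	sess.SetWindowSize(1024, 1024)
}

func main() {
	// (0, 0) disables kcp-go's built-in Reed-Solomon FEC; see the FEC
	// sketch further down the log for that part.
	sess, err := kcp.DialWithOptions("198.51.100.1:7300", nil, 0, 0)
	if err != nil {
		panic(err)
	}
	defer sess.Close()
	tuneAggressively(sess)
}
```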
[7:58:17 pm] <+armadev> shelikhoo: hey, speaking of kcp and snowflake, and now also speaking of tcp and bbr
[7:58:27 pm] <+armadev> if there is packet loss between the snowflake user and the snowflake volunteer,
[7:58:35 pm] <+armadev> like say one of them is inside china and one of them outside,
[7:58:56 pm] <+armadev> how does snowflake handle this? in the obfs4 case we found that being more aggressive at tcp helped a lot
[7:59:13 pm] <+armadev> ( https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/65 )
[7:59:14 pm] [zwiebelbot] tor:tpo/anti-censorship/team#65: S96 dynamic IP obfs4 bridge performance insufficiency - https://bugs.torproject.org/tpo/anti-censorship/team/65 - [Open]
[7:59:35 pm] <+armadev> is there some similar change we should consider for snowflake? or is the kcp part supposed to already handle that?
[8:00:07 pm] <shelikhoo> I think snowflake has built-in KCP, but test results from a vantage point show that some of the time the bootstrap didn't finish
[8:00:22 pm] <shelikhoo> in our vantage point in China
[8:01:24 pm] <shelikhoo> let's say in the most recent test there was 1 out of 10 times where it got stuck at 50%
[8:01:43 pm] <+armadev> right. i am wondering if kcp, between client and bridge, is too big a loop
[8:02:22 pm] <shelikhoo> I didn't get the idea behind "is too big a loop"
[8:02:24 pm] <+armadev> (client -> volunteer -> bridge (and back))
[8:02:31 pm] <shelikhoo> oh yes
[8:02:49 pm] <shelikhoo> and KCP create a lot of traffic
[8:02:53 pm] <shelikhoo> which means
[8:03:13 pm] <shelikhoo> (volunteer -> bridge) will be slower
[8:03:16 pm] <shelikhoo> and
[8:03:32 pm] <shelikhoo> bridge will need to process more traffic
[8:04:09 pm] <+armadev> how is webrtc (dtls) at handling packet loss?
[8:04:27 pm] <+armadev> like, how much are we relying on kcp here because the other layers are failing us
[8:04:52 pm] <shelikhoo> however we are unable to move this KCP processing to volunteer side, since that would require rework of turbo tunnel....
[8:05:08 pm] <shelikhoo> webrtc handle packet loss with SCTP
[8:05:12 pm] <shelikhoo> not DTLS
[8:05:24 pm] <shelikhoo> https://github.com/pion/sctp
[8:05:33 pm] <shelikhoo> but it is toooooooo conservative
[8:05:53 pm] <shelikhoo> so won't work at all when there is packet loss
[8:06:17 pm] <shelikhoo> so very slow when there is constant packet loss
[8:06:26 pm] <+armadev> ok so that is a good candidate as The Problem
[8:07:23 pm] <shelikhoo> I think the task in improving snowflake speed is assigned to cecylia ....
[8:07:44 pm] <shelikhoo> But I also have some experience in getting around this issue
[8:07:47 pm] <shelikhoo> as well
[8:08:13 pm] <+armadev> yep. i am not worried that we will steal her task and accidentally finish it :)
[8:09:17 pm] <shelikhoo> it is actually a quite difficult task, the way I was trying to solve it in my own research is with forward error correction
[8:10:14 pm] <shelikhoo> like reed solomon
[8:10:20 pm] <shelikhoo> reed solomon
[8:10:28 pm] <shelikhoo> or Fountain code
[8:11:36 pm] <shelikhoo> so instead of retransmit data when things are lost like tcp
[8:11:46 pm] <shelikhoo> or send a few copy of data like kcp
[8:12:02 pm] <shelikhoo> send original data and a few reconstruction shard
[8:12:47 pm] <+armadev> year. this is an entire research field. i imagine the theory is pretty easy, but if the reality is that packet loss isn't uniformly-at-random, the theory starts to fall apart
[8:13:15 pm] <shelikhoo> so if in a given block the number of lost packets is lower than the number of reconstruction shards, then it will not need retransmission
[8:13:48 pm] <shelikhoo> and in my own project, there is packet dispatch pattern control
[8:14:20 pm] <shelikhoo> so reconstruction shards are not all sent at the same time as the data itself
[8:15:15 pm] <shelikhoo> instead, different packets are dynamically scheduled to be sent interlaced
[8:15:50 pm] <shelikhoo> so burst loss and constant loss cases are all dealt with in a best-effort way
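(A minimal sketch of the "original data plus reconstruction shards" idea described above, using the klauspost/reedsolomon library; the shard counts are arbitrary examples, and this is not the project or dispatch scheduler mentioned in the log.)

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const dataShards, parityShards = 4, 2
	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		panic(err)
	}

	// Split a block of data into 4 data shards and compute 2 parity shards.
	shards, err := enc.Split(bytes.Repeat([]byte("example payload "), 64))
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	// Simulate losing two packets (one data shard, one parity shard): no
	// retransmission is needed as long as at most parityShards shards per
	// block go missing.
	shards[1], shards[4] = nil, nil
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	ok, _ := enc.Verify(shards)
	fmt.Println("reconstructed:", ok)
}
```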
[8:18:53 pm] <+armadev> you should learn about... what's it called.. 'network coding'
[8:19:49 pm] <shelikhoo> yes! added to todo list
[8:20:57 pm] <+armadev> all of these things are fun in theory but the people who work on them rarely actually interact with the real world. that makes it tough. :)
[8:22:16 pm] <shelikhoo> Isn't this the kind of thing that is in mobile phones' basebands, also known as 4G/5G?
[8:22:55 pm] <shelikhoo> so they kind of need to face reality
[8:25:50 pm] <+armadev> i don't know. good question. i would also wonder if the type/pattern of packet loss they see is different from what snowflake sees.
[8:26:04 pm] <+armadev> they probably get transient radio interference etc, which is different from congestion
[8:28:31 pm] <shelikhoo> Yes, that could be true. This is a good question that I can't answer right now. But when cecylia actually begins the work on this part, I would be happy to join the discussion about transfer performance (and sad if not invited....).
[8:30:07 pm] <+armadev> please grab the backlog here in case you want to use it then
[8:30:24 pm] <+armadev> and bringing it back to snowflake: wait what, we use sctp, not dtls? is that because we use the data channel and not the media channel?
[8:30:50 pm] <shelikhoo> dtls is encryption
[8:31:00 pm] <shelikhoo> sctp is packet -> stream
[8:31:12 pm] <+armadev> oh. it's dtls on the outside, sctp inside, and yet something else inside that probably?
[8:31:31 pm] <shelikhoo> then turbo tunnel inside
[8:31:38 pm] <+armadev> so the answer to "how does dtls handle packet loss" is "some of the packets don't arrive, that's how it handles it"
[8:31:42 pm] <shelikhoo> turbo tunnel includes a layer of kcp
[8:32:47 pm] <shelikhoo> DTLS will propagate packet loss to the user
[8:32:59 pm] <+armadev> does webrtc always use sctp?
[8:33:27 pm] <shelikhoo> yes, but sctp support reliable and unreliable traffic
[8:33:41 pm] <shelikhoo> so it can either propagate packet loss
[8:33:52 pm] <shelikhoo> or deal with it itself
[8:34:30 pm] <shelikhoo> we are asking it to propagate packet loss and deal with it at turbo tunnel's kcp
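(For reference, this is where the reliable-vs-unreliable knob lives in pion/webrtc: a data channel created with `Ordered: false` and `MaxRetransmits: 0` makes SCTP drop lost messages instead of retransmitting them, leaving recovery to the KCP layer inside turbo tunnel. The option values below are illustrative; snowflake's actual data-channel settings may differ.)

```go
package main

import "github.com/pion/webrtc/v3"

// newDataChannel creates a data channel that propagates packet loss upward:
// SCTP neither reorders nor retransmits, so the layer above sees the loss.
func newDataChannel(pc *webrtc.PeerConnection) (*webrtc.DataChannel, error) {
	ordered := false                // do not stall waiting for earlier messages
	maxRetransmits := uint16(0)     // give up immediately instead of retransmitting
	return pc.CreateDataChannel("sketch", &webrtc.DataChannelInit{
		Ordered:        &ordered,
		MaxRetransmits: &maxRetransmits,
	})
}

func main() {
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		panic(err)
	}
	defer pc.Close()
	if _, err := newDataChannel(pc); err != nil {
		panic(err)
	}
}
```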
[8:35:06 pm] <+armadev> oh interesting. so we could try to get it to fix its packet loss at the client -> volunteer layer too.
[8:35:27 pm] <+armadev> and then we'd have two layers fighting each other, but maybe it's stll a win. fun.
[8:35:40 pm] <shelikhoo> that would require some rework of turbo tunnel i guess
[8:36:20 pm] <+armadev> not necessarily. we could let turbotunnel keep doing what it is doing, for example to handle when you change snowflakes
[8:38:22 pm] <shelikhoo> I am unsure about that.... We can discuss this in a ticket.... I can create a ticket to discuss this with dcf around....
[8:42:12 pm] <+armadev> yep. i don't have any answers. just yet more possible ways to combine cpomponents.
[8:42:38 pm] <shelikhoo> yes...
[8:42:43 pm] <+armadev> (experiencing my own packet loss here, which makes typos, sorry)
[8:43:12 pm] <shelikhoo> (no problem~)