sched: In KIST, the extra_space kernel value needs to be allowed to be negative
When updating the TCP socket information, KIST computes a limit on the number of bytes we are allowed to write on the socket of the given active channel.
tcp_space tells us how much TCP buffer space we have in the kernel for this socket. The computation is below; the comment in update_socket_info_impl() explains the reasoning in more detail:
tcp_space = (ent->cwnd - ent->unacked) * (int64_t)(ent->mss);
After that, we compute some extra_space to give the kernel a bit more data, so that when the ACKs come back for the packets sitting in the tcp_space, it can take some from that extra space and doesn't have to wait on the scheduler to feed more data. Here is how it is computed:
extra_space = clamp_double_to_int64((ent->cwnd * (int64_t)ent->mss) * sock_buf_size_factor) - ent->notsent;
It uses the notsent value, which is the size of the kernel queue holding data that has not been sent yet. The data in there is not reflected in the unacked value because it hasn't been sent on the wire yet.
That queue can be large, sometimes bigger than the tcp_space we computed above, because the congestion window moves over time and the kernel can move as much as it wants from the congestion window into the output queue; that is the TCP stack black magic. One minute cwnd = 10 and the next it is 67.
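To make the sign flip concrete, here is a minimal sketch in C of the two computations above. The sock_sample_t struct, the sample_* function names and the numbers are hypothetical and are not taken from the tor source; a plain cast stands in for clamp_double_to_int64().

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-socket values KIST reads from the
 * kernel; field names mirror the ones used in the formulas above. */
typedef struct {
  uint32_t cwnd;     /* congestion window, in packets */
  uint32_t unacked;  /* packets sent but not yet ACKed */
  uint32_t mss;      /* maximum segment size, in bytes */
  uint32_t notsent;  /* bytes queued in the kernel, not yet on the wire */
} sock_sample_t;

/* Room left in the congestion window, in bytes. Operands are widened to
 * int64_t first so the subtraction cannot wrap around (an assumption of
 * this sketch, not a claim about the real code). */
static int64_t
sample_tcp_space(const sock_sample_t *s)
{
  return ((int64_t)s->cwnd - (int64_t)s->unacked) * (int64_t)s->mss;
}

/* Extra headroom, in bytes. The result is negative whenever notsent is
 * larger than cwnd * mss * factor. The cast does no clamping, which is
 * fine only for small illustrative values like the ones used here. */
static int64_t
sample_extra_space(const sock_sample_t *s, double factor)
{
  double scaled = (double)((int64_t)s->cwnd * (int64_t)s->mss) * factor;
  return (int64_t)scaled - (int64_t)s->notsent;
}
```

With mss = 1460, unacked = 4, notsent = 50000 and a factor of 1.0, a cwnd of 67 gives tcp_space = 91980 and extra_space = 47820, while a cwnd of 10 gives tcp_space = 8760 and extra_space = -35400.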
extra_space becomes negative when notsent is bigger than the current congestion window; this means that the regular tcp_space needs to shrink down. Right now, we only add extra_space if it is positive, but in reality the current TCP space needs to account for the notsent size as well.
Bottom line: if tcp_space + extra_space ends up < 0, the allowed limit needs to be 0 and not what
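A rough sketch in C of the difference between the two behaviors described in this message, assuming tcp_space and extra_space have already been computed as signed 64-bit byte counts. Both function names are hypothetical and this is not the actual patch:

```c
#include <stdint.h>

/* Behavior described above as the current one: extra_space only
 * contributes when it is positive, so a large notsent queue never
 * reduces the limit below tcp_space. */
static int64_t
limit_ignoring_negative(int64_t tcp_space, int64_t extra_space)
{
  if (extra_space > 0)
    return tcp_space + extra_space;
  return tcp_space;
}

/* Behavior argued for here: always add extra_space, even when it is
 * negative, and floor the result at 0 so the scheduler writes nothing
 * while the kernel queue is already over-full. */
static int64_t
limit_allowing_negative(int64_t tcp_space, int64_t extra_space)
{
  int64_t limit = tcp_space + extra_space;
  return limit < 0 ? 0 : limit;
}
```

For example, with tcp_space = 8760 and extra_space = -35400, the first function still allows 8760 bytes, while the second allows 0.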
We found this issue while looking at very loaded relays that kept putting data in the outbuf while the connection socket was not ready to write. We realized that the notsent queue was huge, yet KIST kept allowing more bytes to be written over and over again, filling the outbuf, and thus memory, at a rapid rate.