sched: In KIST, the extra_space kernel value needs to be allowed to be negative
KIST, when updating the TCP socket information, computes a limit of bytes we are allowed to write on the socket of the given active channel.
First, tcp_space tells us how much TCP buffer space we have in the kernel for this socket. The computation is below; the comment in update_socket_info_impl() explains the reasoning in more detail:
tcp_space = (ent->cwnd - ent->unacked) * (int64_t)(ent->mss);
After that, we compute some extra_space to give the kernel a bit more data: when the ACKs come back for the packets sitting in tcp_space, it can take data from that extra space right away and doesn't have to wait for the scheduler to feed it more. Here is how it is computed:
extra_space =
  clamp_double_to_int64(
    (ent->cwnd * (int64_t)ent->mss) * sock_buf_size_factor) - ent->notsent;
It uses the notsent value, which is the size of the kernel queue holding data that has not been sent yet. That data is not reflected in the unacked value because it hasn't gone out on the wire.
That queue can be large, sometimes bigger than the tcp_space we computed above, because the congestion window moves over time and the kernel can move as much as it wants from the congestion window into the output queue; that is the TCP stack's black magic. One minute cwnd = 10 and the next it is 67.
If extra_space becomes negative because notsent is bigger than the current congestion window, this means that the regular tcp_space needs to shrink. Right now, we only add extra_space if it is positive, but in reality the current TCP space needs to take the notsent size into account as well.
Bottom line: if tcp_space + extra_space ends up < 0, the allowed limit needs to be 0 and not whatever tcp_space is.
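To make the intended behavior concrete, here is a minimal sketch of the limit computation with a negative extra_space allowed. It is an illustration, not the actual patch: sketch_ent_t and sketch_write_limit() are hypothetical names, the guard on cwnd >= unacked is an assumption, and a plain cast stands in for clamp_double_to_int64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-socket fields used above. */
typedef struct {
  uint32_t cwnd;    /* congestion window, in packets */
  uint32_t unacked; /* packets sent but not yet ACKed */
  uint32_t mss;     /* maximum segment size, in bytes */
  uint32_t notsent; /* bytes queued in the kernel but not yet sent */
} sketch_ent_t;

/* Return how many bytes we may write to the socket, never less than 0. */
static int64_t
sketch_write_limit(const sketch_ent_t *ent, double sock_buf_size_factor)
{
  assert(ent != NULL);

  /* Assumption: treat tcp_space as 0 if unacked exceeds cwnd. */
  int64_t tcp_space = 0;
  if (ent->cwnd >= ent->unacked) {
    tcp_space = ((int64_t)ent->cwnd - (int64_t)ent->unacked) *
                (int64_t)ent->mss;
  }

  /* A plain cast replaces clamp_double_to_int64() here; that is safe for
   * 32-bit inputs and a small factor. extra_space may be negative. */
  int64_t extra_space =
    (int64_t)((double)ent->cwnd * (double)ent->mss * sock_buf_size_factor) -
    (int64_t)ent->notsent;

  /* A large notsent queue shrinks the limit, down to 0 at most. */
  int64_t limit = tcp_space + extra_space;
  return limit < 0 ? 0 : limit;
}
```

For example, with cwnd = 10, unacked = 4, mss = 1460 and a factor of 1.0, a notsent of 20000 bytes yields a limit of 3360 bytes, while a notsent of 100000 bytes yields 0.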
We found this issue while looking at very loaded relays that kept putting data in the outbuf while the connection socket was not ready to write. We realized that the notsent queue was huge, yet KIST kept allowing more bytes to be written over and over again, filling the outbuf, and thus memory, at a rapid rate.