TLS reads and writes are not effectively buffered
Summary
This issue comes from the investigation behind #786 (closed), a code cleanup ticket about auto-flushing copiers. Those copiers are one tool in a toolbox of I/O buffering strategies, so it raises questions about how the tool is being used, for what, and how effectively. I tried to get some idea of our end-to-end buffering behavior in Arti using strace, and the trace shows some areas for improvement.
Steps to reproduce:
- Release build,
cargo build --release
- Trace Arti to a file, no string dumps,
strace -s 0 -o trace.log -f target/release/arti -o application.permit_debugging=true proxy
- Generate some load.. how about downloading a complex application over HTTP2,
curl -x socks5h://localhost:9150 -v https://www.youtube.com/ | less
What is the current bug behavior?
There are several places where we may potentially have buffering issues, I'm looking at them individually in this test.
The SOCKS end might be fine. Reads are buffered into 1024-byte chunks, a little small in general for user/kernel buffers but okay in practice for HTTP. I'd recommend increasing this to 8K or so by default.
The very small reads I'm seeing (5 bytes) are actually on the TLS end. The size and context would explain this as a TLS record header read. Normally this would be buffered, we are missing a buffering layer here. TLS is reading using syscall buffer sizes that match the TLS records.
Writes on the SOCKS end are never larger than 1 cell, even when larger buffers are being processed on the input side of the stream.
Trace excerpts below.
What is the expected behavior?
Ideally every wakeup from epoll_wait should result in some number of full (8K or so) buffers exchanged per socket plus no more than one under-full buffer per socket.
The TLS connection definitely needs to be buffered, and ideally we would have a way to flush multiple cells at once to the kernel when multiple cells are processed per wakeup.
Environment
- Version: git main
- Operating system: debian sid
Relevant logs and/or screenshots:
FD 15 is TLS, 16 is SOCKS.
2386747 epoll_wait(3, [...], 1024, 664) = 1
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 531, 0, NULL, NULL) = 531
2386747 recvfrom(15, 0x7fc090213833, 5, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
2386747 sendto(16, ""..., 31, MSG_NOSIGNAL, NULL, 0) = 31
2386747 epoll_wait(3, [...], 1024, 524) = 1
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 4065
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 3679, 0, NULL, NULL) = 3679
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 3825
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 164, MSG_NOSIGNAL, NULL, 0) = 164
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 epoll_wait(3, [...], 1024, 481) = 1
2386747 recvfrom(15, ""..., 240, 0, NULL, NULL) = 240
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 81, 0, NULL, NULL) = 81
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 531, 0, NULL, NULL) = 531
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 581
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 310, MSG_NOSIGNAL, NULL, 0) = 310
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 11, MSG_NOSIGNAL, NULL, 0) = 11
2386747 epoll_wait(3, [...], 1024, 469) = 1
2386747 recvfrom(15, ""..., 3484, 0, NULL, NULL) = 2896
2386747 epoll_wait(3, [], 1024, 0) = 0
2386747 epoll_wait(3, [...], 1024, 363) = 1
2386747 recvfrom(15, ""..., 588, 0, NULL, NULL) = 588
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 4065
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 4065
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 1751, 0, NULL, NULL) = 1751
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 1045, 0, NULL, NULL) = 1045
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 4065
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 81, 0, NULL, NULL) = 81
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 4065
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 1109, 0, NULL, NULL) = 1109
2386747 recvfrom(15, ""..., 5, 0, NULL, NULL) = 5
2386747 recvfrom(15, ""..., 4065, 0, NULL, NULL) = 2051
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 310, MSG_NOSIGNAL, NULL, 0) = 310
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 404, MSG_NOSIGNAL, NULL, 0) = 404
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
2386747 sendto(16, ""..., 498, MSG_NOSIGNAL, NULL, 0) = 498
Possible fixes:
The missing buffers on TLS can be fixed by adding buffers presumably, but we need to be very careful about the end-to-end flush behavior in order to balance latency and overhead. Write buffering may need to be written with KIST in mind, but read buffering should be straightforwardly implementable.