tls: `buf_read_from_tls()` is exiting its loop too early
This is a bit of a deep dive so you've been warned.
## Context

The `buf_read_from_tls()` function is called when a connection is ready to read data from a TLS socket into its inbuf (`buf_t`).

That function loops over the total length of data we are allowed to consume (a value affected by bandwidth limitation). In the general case, that is 16k bytes, or 32 cells, which is the hard-coded maximum a connection may read per mainloop round. The parameter is `at_most`, and it is the maximum amount of data we are asking to read.
## Wow OpenSSL

Within the read loop, we call `read_to_chunk_tls()`, which essentially boils down to calling `SSL_read(at_most)` (with tor using OpenSSL). It is given the `at_most` bytes value, and thus the expectation is that we'll get that amount of bytes back from the TLS socket.

Now, it turns out that `SSL_read()` only returns a single TLS record and will buffer the rest, up to `at_most`, if that limit is higher than a record size (in our case a record size is 4096 bytes).

See the man page for a better explanation. I might not have fully understood it, but experimental data shows that we get 4k bytes every time: https://www.openssl.org/docs/man1.0.2/man3/SSL_read.html
And thus the takeaway here is that `read_to_chunk_tls()` called with the upper limit of `16k` bytes will always return only `4k` bytes.
## The Bug

Putting everything together, let's look at the reading loop in `buf_read_from_tls()` that calls `read_to_chunk_tls()`; see if you can spot the problem:

```c
while (at_most > total_read) {
  [...]
  r = read_to_chunk_tls(buf, chunk, tls, readlen);
  if (r < 0)
    return r; /* Error */
  tor_assert(total_read+r <= BUF_MAX_LEN);
  total_read += r;
  if ((size_t)r < readlen) /* eof, block, or no more to read. */
    break;
}
```
Notice the `if` condition at the end of the loop: it assumes that if the returned length is smaller than the one asked for, we've reached EOF (exactly like `read()` behaves). But in the case of `SSL_read()`, this is not true, and so we bail every time after reading only a fraction of `at_most`, that is, one single TLS record.
I was able to confirm this by adding custom `log_debug` calls:

```
Jun 19 13:53:13.551 [debug] buf_read_from_tls(): tls: read len: 14858, at_most: 14858, total: 0
Jun 19 13:53:13.551 [debug] tor_tls_read(): tls read: at_most 14858, read: 4048
```

You can see that the read is only `4048` bytes (the TLS record overhead is removed, hence less than 4096).
## Solution

The obvious solution here would be to remove that `if` condition and let tor consume up to `at_most` bytes.
I did run an experiment with this and was able to see my onion service consuming 32 cells per round instead of 8.
## Performance

Now, the naive thinking here would be: "Ok wow, if we make tor read 4 times more cells, will tor's cell relaying improve 4-fold?". In practice, probably not.

But this will obviously improve performance in some ways. Take the KIST scheduler, which runs every 10 msec: it will now be able to make a decision based on 4 times the number of cells it used to see. If the TCP window allows it, more cells will go on the wire. I also strongly believe EWMA priority will get better here, because circuits will see more cells, and thus the back-offs will get sharper per circuit instead of constantly hovering around at most 8-16 cells per round.
It remains unclear if this will trigger more congestion on the network, bufferbloat, or anything unexpected. Because of that, we might want to not backport this right away and instead watch for a progression from newer relays.
And of course, we need the shadow people to help us measure this quantitatively :).
Note: I was able to find this issue because I was measuring, with the tracing branch #32910 (closed), the onion service timings to handle lots of INTRO2 cells under DDoS, and noticed that we would never process more than 8 cells per mainloop round.