conn_read_callback is called on connections that are marked for closed
There is a bug causing busy loops in Libevent and infinite loops in the Shadow simulator. In my Shadow simulations, I have observed that conn_read_callback
is getting called on a connection that is marked for close.
This is similar to legacy/trac#5263 (moved), and as described there, I believe it is a bug for conn_read_callback
to be called on sockets that are marked for close.
The bug is problematic in Shadow when the marked connection also wants to flush, but attempting to write the outbuf inside conn_close_if_marked
fails because the write would block (because the kernel write buffer is already full). Because it still wants to flush, the connection does not get closed, but the connection remains readable causing Libevent to continue looping on conn_read_callback
until the socket buffer can actually write. This wastes CPU resources in Tor, and causes an infinite loop in Shadow because Shadow's discrete-event time does not advance during this loop.
I found the bug on 0.3.5.8, and it is probably present at least since then.
I think the conn should not be waiting for read events when it is marked. I'm not sure where in the code this assumption is first violated, but the following patch fixed the issue in my Shadow simulations:
diff --git a/src/core/mainloop/mainloop.c b/src/core/mainloop/mainloop.c
index 6e2b300..e82c77a 100644
--- a/src/core/mainloop/mainloop.c
+++ b/src/core/mainloop/mainloop.c
@@ -894,6 +894,21 @@ conn_read_callback(evutil_socket_t fd, short event, void *_conn)
}
assert_connection_ok(conn, time(NULL));
+ if (conn->marked_for_close && connection_is_reading(conn)) {
+ /* Libevent says we can read, but we are marked so we will never try
+ * to read again. We will try to close the connection below inside of
+ * close_closeable_connections(), but let's make sure not to cause
+ * Libevent to spin on conn_read_callback() while we wait for the
+ * socket to let us flush to it.*/
+ connection_stop_reading(conn);
+
+ /* Now, if we still have data to flush, then make sure Libevent tells
+ * us when the conn will allow us to write the bytes. */
+ if (connection_wants_to_flush(conn) && !connection_is_writing(conn)) {
+ connection_start_writing(conn);
+ }
+ }
+
if (smartlist_len(closeable_connection_lst))
close_closeable_connections();
}