fix(WebRTCPeer): avoid races for closed and lastReceive
The first data race occurs because concurrent goroutines are
intermixing reads and writes of WebRTCPeer.closed
.
The second one occurs because concurrent goroutines are intermixing
reads and writes of WebRTCPeer.lastReceive
.
I spotted both issues when integrating Snowflake inside OONI in https://github.com/ooni/probe-cli/pull/373.
Overview
Here's what this diff does:
-
fix the
closed
data race by making closed a int32 and by using thesync/atomic
package to read/write it; -
fix the
lastReceive
data race by using a mutex; -
the code under
client/remove-later
is apated from OONI and exercises Snowflake in a way that shows the races.
(It could probably have been possible to write a simpler test case but this was what I could manage to do in a limited time.
Also, inspecting the code convinced me that these races are real and not an artefact of how OONI's using Snowflake.)
Testing
The idea is, of course, to remove the remove-later
folder if this
pull request is accepted and all the required changes have been made,
but I figured it could be useful for others to see it/use it.
(I only tested macOS, but I do not see any reason why the races would not be detected also on Linux, Windows, etc.)
To test, you need to do this:
-
cd client
-
go run -race ./remove-later/testcase
-
run tor using the command line printed by the testcase when it starts running.
The command line would be something like:
tor DataDirectory remove-later/testdata UseBridges 1 ClientTransportPlugin 'snowflake socks5 127.0.0.1:50693' Bridge 'snowflake 192.0.2.3:1 2B280B23E1107BB62ABFC40DDCC8824814F80A72'
Open questions
(Perhaps, it may be worth it adding an integration test as part of
future work? It may be useful to run such a test with -race
at
every commit, so to give us the guarantee that there are no data races?)
I choose to turn closed
into an atomic int32 rather than using a
mutex to protect it, because it felt a bit weird to have other structs
lock the mutex protecting WebRTCPeer
from the outside.
Perhaps, an even cleaner patch would be to have a synchronized method
called (*WebRTCPeer) Closed() bool
and always use that?
Since I am not super familiar with the preferred way of structuring code in Snowflake, I opted for what I considered to be a simple diff.
(By turning closed
into a int32 I could easily find and change
all the places in which it was being used.)