Skip to content

fix(WebRTCPeer): avoid races for closed and lastReceive

sbs requested to merge sbs/snowflake:fix/race into main

The first data race occurs because concurrent goroutines are intermixing reads and writes of WebRTCPeer.closed.

The second one occurs because concurrent goroutines are intermixing reads and writes of WebRTCPeer.lastReceive.

I spotted both issues when integrating Snowflake inside OONI in https://github.com/ooni/probe-cli/pull/373.

Overview

Here's what this diff does:

  1. fix the closed data race by making closed a int32 and by using the sync/atomic package to read/write it;

  2. fix the lastReceive data race by using a mutex;

  3. the code under client/remove-later is apated from OONI and exercises Snowflake in a way that shows the races.

(It could probably have been possible to write a simpler test case but this was what I could manage to do in a limited time.

Also, inspecting the code convinced me that these races are real and not an artefact of how OONI's using Snowflake.)

Testing

The idea is, of course, to remove the remove-later folder if this pull request is accepted and all the required changes have been made, but I figured it could be useful for others to see it/use it.

(I only tested macOS, but I do not see any reason why the races would not be detected also on Linux, Windows, etc.)

To test, you need to do this:

  1. cd client

  2. go run -race ./remove-later/testcase

  3. run tor using the command line printed by the testcase when it starts running.

The command line would be something like:

tor DataDirectory remove-later/testdata UseBridges 1 ClientTransportPlugin 'snowflake socks5 127.0.0.1:50693' Bridge 'snowflake 192.0.2.3:1 2B280B23E1107BB62ABFC40DDCC8824814F80A72'

Open questions

(Perhaps, it may be worth it adding an integration test as part of future work? It may be useful to run such a test with -race at every commit, so to give us the guarantee that there are no data races?)

I choose to turn closed into an atomic int32 rather than using a mutex to protect it, because it felt a bit weird to have other structs lock the mutex protecting WebRTCPeer from the outside.

Perhaps, an even cleaner patch would be to have a synchronized method called (*WebRTCPeer) Closed() bool and always use that?

Since I am not super familiar with the preferred way of structuring code in Snowflake, I opted for what I considered to be a simple diff.

(By turning closed into a int32 I could easily find and change all the places in which it was being used.)

Merge request reports