
Find and address the root cause of client rendezvous errors

From our CDN77 client panel, we've recently seen a lot of 504 gateway timeout errors that occur 5-6 seconds after the client sends its offer to the broker. In response, we've updated the broker's ClientTimeout so that, rather than receiving a 5xx error, the client receives a 200 response whose error reason says that it took too long to get an answer from a matched proxy.

That still leaves the question of why it's taking up to 6 seconds to get a working answer from a proxy, and what we can do to speed this process up. We had a brief discussion of this in !498 (merged):

  • One possibility is that it's taking too long for the broker to match clients with proxies. Matching is a critical section, and the broker has to obtain a lock for each client poll. The same lock is also obtained when adding proxies to the pool and when exporting metrics or debug information.
  • Another is that the proxy is taking too long to respond with an answer once the client is matched. The most time-consuming part of this process is gathering ICE candidates for the answer.
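
One way to test the first possibility would be to measure how long callers wait for the lock. The following is only a sketch: `timedMutex` and `simulateContention` are hypothetical names that do not exist in the broker, and the real code would more likely export the wait time as a metric.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// timedMutex is a hypothetical wrapper (not part of the broker code) that
// records the longest time any caller spent waiting to acquire the lock.
type timedMutex struct {
	mu      sync.Mutex
	statsMu sync.Mutex // protects maxWait
	maxWait time.Duration
}

// Lock acquires the mutex and records how long the caller was blocked.
func (t *timedMutex) Lock() {
	start := time.Now()
	t.mu.Lock()
	waited := time.Since(start)

	t.statsMu.Lock()
	if waited > t.maxWait {
		t.maxWait = waited
	}
	t.statsMu.Unlock()
}

// Unlock releases the mutex.
func (t *timedMutex) Unlock() { t.mu.Unlock() }

// MaxWait returns the longest wait observed so far.
func (t *timedMutex) MaxWait() time.Duration {
	t.statsMu.Lock()
	defer t.statsMu.Unlock()
	return t.maxWait
}

// simulateContention holds the lock for the given duration while a second
// goroutine tries to acquire it, then returns the longest observed wait.
func simulateContention(hold time.Duration) time.Duration {
	var m timedMutex
	m.Lock()

	started := make(chan struct{})
	acquired := make(chan struct{})
	go func() {
		close(started)
		m.Lock() // blocks until the holder releases the lock
		m.Unlock()
		close(acquired)
	}()

	<-started
	time.Sleep(hold)
	m.Unlock()
	<-acquired
	return m.MaxWait()
}

func main() {
	fmt.Println("max lock wait:", simulateContention(50*time.Millisecond))
}
```

If the recorded waits stayed in the low milliseconds under real load, lock contention would be an unlikely explanation for delays of several seconds.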

The ClientTimeout is only applied after a client has been matched with a proxy, while it is waiting for the answer. So if !498 (merged) does reduce the number of 5xx errors, it suggests that it was the proxy answer step that was taking too long.
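
As a rough illustration of a timeout that only covers the wait for the answer, here is a minimal sketch. The `matchResult` type, the `awaitAnswer` function, and the error string are assumptions made for the example, not the broker's actual types or messages.

```go
package main

import (
	"fmt"
	"time"
)

// matchResult is a hypothetical stand-in for the broker's reply to a client.
type matchResult struct {
	Answer string // proxy answer; empty on failure
	Error  string // reason carried inside an otherwise successful response
}

// awaitAnswer waits for the matched proxy's answer for at most timeout.
// When the timer expires it returns an error reason, so the caller can still
// send a normal response instead of letting the request run into a gateway
// timeout further up the chain.
func awaitAnswer(answers <-chan string, timeout time.Duration) matchResult {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case a := <-answers:
		return matchResult{Answer: a}
	case <-timer.C:
		return matchResult{Error: "timed out waiting for answer"}
	}
}

func main() {
	// A proxy that has already answered.
	answers := make(chan string, 1)
	answers <- "sdp-answer"
	fmt.Println(awaitAnswer(answers, time.Second))

	// A proxy that never answers within the timeout.
	fmt.Println(awaitAnswer(make(chan string), 10*time.Millisecond))
}
```

Note that the timer in this sketch starts only once `awaitAnswer` is called, which mirrors the point above: time spent before the match is not covered by it.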

We have a relatively new timeout at the proxy to stop the ICE gathering process and send whatever candidates we have:

Should we update this value to allow proxies to return an answer before the new 5 second ClientTimeout?
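
For reference, the proxy-side behaviour described above (stop gathering at a deadline and answer with whatever candidates exist) can be sketched as follows. This is not the actual proxy code: the candidate channel is a hypothetical stand-in for a real ICE agent, and the timeout values are made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// gatherWithTimeout collects candidates until the channel is closed
// (gathering complete) or the timeout elapses, then returns what it has.
// The channel of strings is a hypothetical stand-in for a real ICE agent.
func gatherWithTimeout(candidates <-chan string, timeout time.Duration) []string {
	var got []string
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	for {
		select {
		case c, ok := <-candidates:
			if !ok {
				return got // gathering finished before the deadline
			}
			got = append(got, c)
		case <-timer.C:
			return got // deadline hit: answer with a partial candidate set
		}
	}
}

func main() {
	candidates := make(chan string)
	go func() {
		candidates <- "host"
		candidates <- "srflx"
		time.Sleep(200 * time.Millisecond) // a slow candidate that misses the deadline
		candidates <- "relay"
		close(candidates)
	}()
	fmt.Println(gatherWithTimeout(candidates, 50*time.Millisecond))
}
```

For the answer to reach the client in time, the value passed as `timeout` would need to be shorter than the 5 second ClientTimeout by at least the time it takes the proxy to deliver the answer to the broker.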

Edited by Cecylia Bocovich