hs: Multiple service issues
While investigating #21297 (moved), I found several issues. Unfortunately, I can't find the cause of #21297 (moved) yet but maybe someone will with that list of issues.
- Integer overflow in
rend_consider_services_intro_points()
. We have a safe guard against that except if theexpiring_nodes
list is not empty which is totally a possibility. This is bad because it will ultimately create a LOT of IP circuits and fail in unknown ways...
if (smartlist_len(service->expiring_nodes) == 0 &&
intro_nodes_len >= service->n_intro_points_wanted) {
continue;
}
/* Number of intro points we want to open which is the wanted amount
* minus the current amount of valid nodes. */
n_intro_points_to_open = service->n_intro_points_wanted - intro_nodes_len;
- In
rend_service_intro_has_opened()
, this if is subject to a bad "cast overflow" if we have a case that brings theexpiring_nodes
size bigger than the number of circuits. The following result is cast as anunsigned int
so for instance "5 - 6" is actually a BIG number.
if ((count_intro_point_circuits(service) -
smartlist_len(service->expiring_nodes)) >
service->n_intro_points_wanted) {
The consequence of this is that if we have a bug (maybe #21297 (moved)!!) that brings the size of expiring_nodes
to 6 and then we try to open 5 IP circuits for the service (3 + 2 extras), all 5 will always be closed and it will loop non stop.
- In
remove_invalid_intro_points()
, if we put an IP object in theretry_nodes
list and this IP also expires, it will be put in theexpiring_nodes
list of the service and removed from the service valid list. After this, we'll retry a circuit to that IP object but since it's not in the valid list anymore, therend_consider_services_intro_points()
will launch a bunch of other IP circuit(s) ignoring the one being retried. This is OK because the one we want to retry should actually expire BUT we could avoid a useless circuit being created since we know it will expire and potential bug of losing the reference to that circuit if we have issues with cleaning up ourexpiring_nodes
list.
My theory is that the solution to #21297 (moved) lies in a combination of the above bugs. Either we overflow with (2) or we have IP circuits that we've lost the reference to (1) making (2) always closing the 5 circuits we just opened and thus not having any valid IP object for the service. I don't think (1) is involved but you never know.