relay-search's "last seen" for a bridge counts times when the bridge isn't Running, and it shouldn't
A bridge operator is worried that their bridge is compromised or out of their control. They shut it off yesterday, yet relay-search shows its "last seen" time as about an hour ago and its downtime as about an hour.
I went looking for the actual data, and it contradicts what relay-search says. My first guess is that relay-search treats a bridge as "seen" whenever it is listed in the bridge status file. But a bridge that is listed without the Running flag should not, on that basis alone, bump the last-seen timestamp.
Because these details will become obsolete quickly, I'll capture them below.
The bridge actually went offline around 2022-01-28 19:00 UTC: it has the Running flag in the 18:57:53 status file but not in the 19:27:53 one:
```
$ grep -A1 goldengatebridge 20220*
20220128-185753-BA44A889E64B93FAA2B114E02C2A279A8555C533:r goldengatebridge OrjNjcCf2mlFU5tBW+rqZmiTAnQ RF7itOjavccZKJvkp+Ic7IpEuJQ 2022-01-28 16:00:43 10.63.101.232 61959 0
20220128-185753-BA44A889E64B93FAA2B114E02C2A279A8555C533-s Running Valid
--
20220128-192753-BA44A889E64B93FAA2B114E02C2A279A8555C533:r goldengatebridge OrjNjcCf2mlFU5tBW+rqZmiTAnQ RF7itOjavccZKJvkp+Ic7IpEuJQ 2022-01-28 16:00:43 10.63.101.232 61959 0
20220128-192753-BA44A889E64B93FAA2B114E02C2A279A8555C533-s Valid
```
(This is the public anonymized data from https://collector.torproject.org/, so the IP address and port are randomized, but the nickname and timestamps should be accurate.)
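To make the structure of these lines concrete, here is a minimal Python sketch that turns one bridge's "r" and "s" lines into a status time, a descriptor publication time and a set of flags. It assumes the r line layout shown above (nickname, identity, digest, publication date and time, IP, ORPort, DirPort) and that the timestamp embedded in the CollecTor filename is the status publication time. `Sighting` and `parse_sighting` are hypothetical names, not part of relay-search or Onionoo.

```python
from datetime import datetime
from typing import NamedTuple


class Sighting(NamedTuple):
    """One bridge's entry in one bridge status file (hypothetical type)."""
    status_time: datetime           # status publication time, from the filename
    nickname: str
    descriptor_published: datetime  # descriptor publication time, from the "r" line
    flags: frozenset                # flags from the "s" line


def parse_sighting(filename: str, r_line: str, s_line: str) -> Sighting:
    # Assumes CollecTor names like YYYYMMDD-HHMMSS-<authority fingerprint>.
    status_time = datetime.strptime(filename[:15], "%Y%m%d-%H%M%S")
    # r <nickname> <identity> <digest> <date> <time> <IP> <ORPort> <DirPort>
    r = r_line.split()
    if len(r) != 9 or r[0] != "r":
        raise ValueError(f"unexpected r line: {r_line!r}")
    published = datetime.strptime(f"{r[4]} {r[5]}", "%Y-%m-%d %H:%M:%S")
    # s [flag ...]; a bridge can be listed without the Running flag
    s = s_line.split()
    if not s or s[0] != "s":
        raise ValueError(f"unexpected s line: {s_line!r}")
    return Sighting(status_time, r[1], published, frozenset(s[1:]))


R_LINE = ("r goldengatebridge OrjNjcCf2mlFU5tBW+rqZmiTAnQ RF7itOjavccZKJvkp+Ic7IpEuJQ "
          "2022-01-28 16:00:43 10.63.101.232 61959 0")
AUTH = "BA44A889E64B93FAA2B114E02C2A279A8555C533"
before = parse_sighting(f"20220128-185753-{AUTH}", R_LINE, "s Running Valid")
after = parse_sighting(f"20220128-192753-{AUTH}", R_LINE, "s Valid")
print("Running" in before.flags, "Running" in after.flags)  # True False
```

Both entries carry the same descriptor publication time; only the flags change between the two status files.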
I looked at Serge's current set of bridge descriptors, and 2022-01-28 16:00:43 is indeed the publication time of the most recent descriptor.
Yet https://metrics.torproject.org/rs.html#details/3AB8CD8DC09FDA6945539B415BEAEA6668930274 lists Downtime (defined in the mouseover as "The time since this bridge as last seen online") as 1 hour, 14 minutes and 25 seconds, and Last Seen (defined in the mouseover as "The timestamp when the bridge was last seen in the consensus") as 2022-01-29 09:57:58. It is currently around 11:00 UTC, so these values suggest that relay-search takes "last seen" to be the time of the most recent bridge status file that lists the bridge at all. And goldengatebridge is indeed still listed, albeit without the Running flag, in the most recent status file:
```
$ grep -A1 goldengatebridge 20220129-095758-BA44A889E64B93FAA2B114E02C2A279A8555C533
r goldengatebridge OrjNjcCf2mlFU5tBW+rqZmiTAnQ RF7itOjavccZKJvkp+Ic7IpEuJQ 2022-01-28 16:00:43 10.63.101.232 61959 0
s Valid
```
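As a sanity check on that reading, here is the arithmetic in Python. It assumes relay-search computes downtime as the time the page was viewed minus "last seen". Under that assumption the two values on the details page imply the page was rendered at 11:12:23 UTC, which matches "around 11am". The same arithmetic gives the downtime the operator would expect, measured from the last status file in which the bridge still had the Running flag.

```python
from datetime import datetime, timedelta

# Values shown on the relay-search details page quoted above.
last_seen = datetime(2022, 1, 29, 9, 57, 58)
downtime = timedelta(hours=1, minutes=14, seconds=25)

# Assumption: downtime = (time the page was viewed) - last_seen.
viewed_at = last_seen + downtime
print(viewed_at)  # 2022-01-29 11:12:23

# Last status file in which goldengatebridge still had the Running flag.
last_running = datetime(2022, 1, 28, 18, 57, 53)
print(viewed_at - last_running)  # 16:14:30
```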
So the simplest fix would be to not count a bridge as existing or seen in a status file unless it has the Running flag there. Then "last seen" would actually be the time the bridge was last known to be Running.
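A minimal sketch contrasting the suspected current rule with this simplest fix, using only the three status files quoted above (the ones in between are omitted). The function names are hypothetical; this is not relay-search's or Onionoo's actual code.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# (status publication time, flags the bridge had in that status)
Sighting = Tuple[datetime, frozenset]


def last_seen_any_listing(sightings: Iterable[Sighting]) -> Optional[datetime]:
    """Suspected current behaviour: being listed at all counts as seen."""
    return max((t for t, _flags in sightings), default=None)


def last_seen_running(sightings: Iterable[Sighting]) -> Optional[datetime]:
    """Simplest fix: only listings with the Running flag count as seen."""
    return max((t for t, flags in sightings if "Running" in flags), default=None)


goldengatebridge = [
    (datetime(2022, 1, 28, 18, 57, 53), frozenset({"Running", "Valid"})),
    (datetime(2022, 1, 28, 19, 27, 53), frozenset({"Valid"})),
    (datetime(2022, 1, 29, 9, 57, 58), frozenset({"Valid"})),
]
print(last_seen_any_listing(goldengatebridge))  # 2022-01-29 09:57:58
print(last_seen_running(goldengatebridge))      # 2022-01-28 18:57:53
```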
But this simplest fix won't handle an obfs4 bridge that firewalls its ORPort well, e.g. the default Tor Browser obfs4 bridges: the bridge authority can't reach their ORPort, so they never get the Running flag. So I propose a slight adjustment to the above algorithm: also count the bridge as seen when the descriptor publication timestamp in its status entry is newer than any previous sighting. That is, if the bridge has published a new descriptor but hasn't been considered Running for a long time, then that descriptor's publication time is when the bridge was last seen, and the time to count downtime from. The descriptor time is easy to find in the status file, for example in the r lines quoted above. (But see Tor proposal 275 for how that might change in the future, although 275 may only apply to microdescriptor consensus formats, which aren't what the bridge authority produces.)
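The adjusted rule could look like the following Python sketch: "last seen" is the later of the last status with the Running flag and the newest descriptor publication time seen in any status. The `Sighting` type and `last_seen` function are hypothetical, and the timestamps for the firewalled bridge are made-up illustrative values, not real data.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class Sighting(NamedTuple):
    status_time: datetime           # publication time of the bridge status file
    descriptor_published: datetime  # descriptor publication time from the "r" line
    flags: frozenset                # flags from the "s" line


def last_seen(sightings: Iterable[Sighting]) -> Optional[datetime]:
    """Later of: the last status with Running, and the newest descriptor time."""
    last_running: Optional[datetime] = None
    newest_descriptor: Optional[datetime] = None
    for s in sightings:
        if "Running" in s.flags and (last_running is None or s.status_time > last_running):
            last_running = s.status_time
        if newest_descriptor is None or s.descriptor_published > newest_descriptor:
            newest_descriptor = s.descriptor_published
    return max((t for t in (last_running, newest_descriptor) if t is not None),
               default=None)


DESC = datetime(2022, 1, 28, 16, 0, 43)
goldengatebridge = [
    Sighting(datetime(2022, 1, 28, 18, 57, 53), DESC, frozenset({"Running", "Valid"})),
    Sighting(datetime(2022, 1, 28, 19, 27, 53), DESC, frozenset({"Valid"})),
    Sighting(datetime(2022, 1, 29, 9, 57, 58), DESC, frozenset({"Valid"})),
]
print(last_seen(goldengatebridge))  # 2022-01-28 18:57:53

# Hypothetical obfs4 bridge with a firewalled ORPort: never Running, but it
# keeps publishing descriptors (timestamps made up for illustration).
firewalled = [
    Sighting(datetime(2022, 1, 28, 10, 27, 53), datetime(2022, 1, 28, 10, 0, 0), frozenset({"Valid"})),
    Sighting(datetime(2022, 1, 28, 22, 27, 53), datetime(2022, 1, 28, 22, 0, 0), frozenset({"Valid"})),
]
print(last_seen(firewalled))  # 2022-01-28 22:00:00
```

Under the Running-only rule, the firewalled bridge would have no last-seen time at all; with the adjustment it is "seen" whenever it publishes a fresh descriptor.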
Lastly, see #33493 for a related relay-search ticket. That ticket is about helping users not worry about what their bridge flags are, whereas this ticket is about changing which flags are presented: they should be the ones from when the bridge was last Running (or from when it last published a new descriptor, if that's later).
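One way to pick which flags to present, again as a hypothetical Python sketch: use the flags from the last status in which the bridge was Running, unless a newer descriptor has been published since, in which case use the flags from the first status that lists that newer descriptor. That second choice is an assumption; this ticket doesn't say which status's flags should stand in for a descriptor publication. The `restarted` bridge and its timestamps are made up for illustration.

```python
from datetime import datetime
from typing import List, NamedTuple


class Sighting(NamedTuple):
    status_time: datetime           # publication time of the bridge status file
    descriptor_published: datetime  # descriptor publication time from the "r" line
    flags: frozenset                # flags from the "s" line


def flags_to_present(sightings: List[Sighting]) -> frozenset:
    """Flags from the sighting that defines "last seen" (hypothetical rule)."""
    if not sightings:
        return frozenset()
    last_running = max((s for s in sightings if "Running" in s.flags),
                       key=lambda s: s.status_time, default=None)
    newest_desc = max(s.descriptor_published for s in sightings)
    if last_running is not None and newest_desc <= last_running.status_time:
        return last_running.flags
    # Assumption: a descriptor's flags are those from the first status listing it.
    first_listing = min((s for s in sightings if s.descriptor_published == newest_desc),
                        key=lambda s: s.status_time)
    return first_listing.flags


DESC = datetime(2022, 1, 28, 16, 0, 43)
goldengatebridge = [
    Sighting(datetime(2022, 1, 28, 18, 57, 53), DESC, frozenset({"Running", "Valid"})),
    Sighting(datetime(2022, 1, 29, 9, 57, 58), DESC, frozenset({"Valid"})),
]
# Hypothetical bridge that stopped being Running but later published a new descriptor.
restarted = [
    Sighting(datetime(2022, 1, 28, 10, 27, 53), datetime(2022, 1, 28, 10, 0, 0),
             frozenset({"Running", "Valid"})),
    Sighting(datetime(2022, 1, 28, 15, 27, 53), datetime(2022, 1, 28, 15, 0, 0),
             frozenset({"Valid"})),
]
print(sorted(flags_to_present(goldengatebridge)))  # ['Running', 'Valid']
print(sorted(flags_to_present(restarted)))         # ['Valid']
```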