Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
T
torflow
  • Project overview
    • Project overview
    • Details
    • Activity
  • Issues 49
    • Issues 49
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
  • Operations
    • Operations
    • Incidents
    • Environments
  • Analytics
    • Analytics
    • CI / CD
    • Value Stream
  • Wiki
    • Wiki
  • Members
    • Members
  • Activity
  • Create a new issue
  • Jobs
  • Issue Boards
Collapse sidebar

GitLab is used only for code review, issue tracking and project management. Canonical locations for source code are still https://gitweb.torproject.org/ https://git.torproject.org/ and git-rw.torproject.org.

  • The Tor Project
  • Network Health
  • torflow
  • Issues
  • #26675

Closed
Open
Opened Jul 06, 2018 by starlight@starlight

Incorrect ancillory data presented on Torflow guard relay "update" vote lines

I realize little appetite exists for correcting Torflow issues at this juncture, but I came across an inexplicable behavior that is either my inability to unravel the code or a bug serious enough to merit fixing even at this stage of the new scanner development.

Investigating the scenario where the current measurement of a Guard relay is disregarded and instead the most recent voted measurement revised, I found this:

first archive appearance of new measurement vote

https://bwauth.ritter.vg/bwauth/bwscan.20180629-0645

1530271972
node_id=$4F0DB7E687FC7C0AE55C8F243DA8B0EB27FBF1F2 bw=11500 nick=Binnacle measured_at=1530271632 updated_at=1530271632 pid_error=0.34411159803 pid_error_sum=0.34411159803 pid_bw=11467418 pid_delta=-0.0875088625329 circ_fail=0.0 scanner=/scanner.5/scan-data/bws-51.6:52.4-done-2018-06-29-06:27:12

@type server-descriptor 1.0
router Binnacle 108.53.208.157 443 0 80
published 2018-06-29 00:29:07
fingerprint 4F0D B7E6 87FC 7C0A E55C 8F24 3DA8 B0EB 27FB F1F2
bandwidth 1073741824 1073741824 8531597

first subsequent no-measurement vote update

https://bwauth.ritter.vg/bwauth/bwscan.20180701-0545

1530441163
node_id=$4F0DB7E687FC7C0AE55C8F243DA8B0EB27FBF1F2 bw=20100 nick=Binnacle measured_at=1530271632 updated_at=1530439657 pid_error=0.34411159803 pid_error_sum=0.34411159803 pid_bw=11467418 pid_delta=-0.0875088625329 circ_fail=0.0 scanner=/scanner.4/scan-data/bws-38.6:39.4-done-2018-07-01-05:07:37

@type server-descriptor 1.0
router Binnacle 108.53.208.157 443 0 80
published 2018-06-30 12:30:08
fingerprint 4F0D B7E6 87FC 7C0A E55C 8F24 3DA8 B0EB 27FB F1F2
bandwidth 1073741824 1073741824 9458512

reading aggregate.py it appears this case his handled by lines 678-688 with kp=1 ki=0 and kd=0:

  # Don't use feedback here, but we might as well use our
  # new measurement against the previous vote.

  n.copy_vote(prev_votes.vote_map[n.idhex])

  if cs_junk.use_desc_bw:
    n.new_bw = n.get_pid_bw(prev_votes.vote_map[n.idhex],
                        cs_junk.K_p,
                        cs_junk.K_i,
                        cs_junk.K_d,
                        0.0, False)


  def get_pid_bw(self, prev_vote, kp, ki, kd, kidecay, update=True):
    if not update:
      return self.use_bw \
                  + kp*self.use_bw*self.pid_error \
                  + ki*self.use_bw*self.pid_error_sum \
                  + kd*self.use_bw*self.pid_delta

line 591
  n.use_bw = n.desc_bw

line 556
  prev_votes = VoteSet(argv[-1])
line 205
  try:
    self.pid_error = float(re.search("[\s]*pid_error=([\S]+)[\s]*", line).group(1))

  def copy_vote(self, vote):
    self.new_bw = vote.bw*1000 # Not set yet
    self.pid_bw = vote.pid_bw  # Not set yet
    self.pid_error_sum = vote.pid_error_sum # Not set yet
    self.pid_delta = vote.pid_delta # Not set yet

My reading indicates new_bw is calculated by applying the last vote pid_error to the current descriptor bandwidth. Comment appears misleading, but the logic seems sensible enough.

Problem is that 9458512 + 9458512 * 0.34411159803 is 12713,296 rather than the published value of 20100. If my understanding of the code is correct, the actual behavior here does not reflect the intended behavior and the result is egregiously incorrect. I suspect a bug, but the code is complex and in the absence of debugging a live Torflow instance my analysis could be wrong.

Perhaps Mike Perry could chime in here?

Assignee
Assign to
None
Milestone
None
Assign milestone
Time tracking
None
Due date
None
Reference: tpo/network-health/torflow#26675