• Writing a support article to help relay operators find out why their relay is overloaded: https://gitlab.torproject.org/tpo/web/support/-/merge_requests/43

  • CollecTor has been losing descriptors lately. The log output we get from cron usually looks like this:

    2021-08-31 01:06:23,948 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (3.438976158994976).
    2021-08-31 01:36:20,683 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (3.4092098247190425).
    2021-08-31 02:06:28,280 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (5.167635574090996).
    2021-08-31 02:36:23,190 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (3.7290451703005147).
    2021-08-31 03:06:33,765 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (5.106704060083311).
    2021-08-31 04:11:51,541 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (1.656720594542896).
    2021-08-31 04:38:08,730 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (1.537885181589836).
    2021-08-31 05:09:57,288 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (3.481630411955097).
    2021-08-31 05:36:36,219 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (3.392145080352952).
    2021-08-31 06:06:56,593 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (4.956642055839017).
    2021-08-31 06:36:42,566 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (4.897109387287149).
    2021-08-31 07:06:37,013 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (6.361479345088055).
    2021-08-31 07:36:47,209 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (6.232449438824111).
    2021-08-31 08:09:13,343 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (2.6230701887968575).
    2021-08-31 08:39:48,491 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (2.5338641227843692).
    2021-08-31 09:07:56,240 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (4.118589590659841).
    2021-08-31 09:36:55,507 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (3.9791630567124363).
    2021-08-31 10:06:40,755 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (6.6751591483240915).
    2021-08-31 10:36:57,627 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (6.556032588609297).
    2021-08-31 11:07:22,820 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (2.619131323106306).
    2021-08-31 11:38:46,890 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (2.024853775120285).
    2021-08-31 12:06:56,059 WARN o.t.m.c.r.ReferenceChecker:314 Missing too many referenced descriptors (3.0592812148284434).

    This is a common issue with CollecTor, as described in: https://gitlab.torproject.org/tpo/network-health/team/-/wikis/metrics/CollecTor#resolving-common-issues

    Nevertheless, we should have more verbose output so that we can tell whether a given warning is worth investigating or can be ignored.

    Edited by Hiro
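
Until more verbose output exists, a rough first triage of the warnings above could be scripted. The following is only a sketch: it assumes the cron mail contains lines in exactly the format pasted above, it does not interpret the number in parentheses (the log does not state its unit), and the `threshold` value and the function names are hypothetical and chosen purely for illustration.

```python
import re
from datetime import datetime
from statistics import mean

# Matches the ReferenceChecker warning lines as they appear in the cron output.
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} WARN "
    r"\S*ReferenceChecker:\d+ "
    r"Missing too many referenced descriptors \((?P<value>[0-9.]+)\)\.\s*$"
)


def parse_warnings(lines):
    """Return (timestamp, value) pairs for every matching warning line."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            entries.append((ts, float(match.group("value"))))
    return entries


def summarize(entries, threshold=5.0):
    """Summarize reported values; `threshold` is an arbitrary, assumed cut-off."""
    values = [value for _, value in entries]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        # Runs whose reported value exceeds the assumed threshold.
        "above_threshold": [
            (ts.isoformat(sep=" "), value)
            for ts, value in entries
            if value > threshold
        ],
    }


if __name__ == "__main__":
    sample = [
        "2021-08-31 02:06:28,280 WARN o.t.m.c.r.ReferenceChecker:314 "
        "Missing too many referenced descriptors (5.167635574090996).",
        "2021-08-31 04:11:51,541 WARN o.t.m.c.r.ReferenceChecker:314 "
        "Missing too many referenced descriptors (1.656720594542896).",
    ]
    print(summarize(parse_warnings(sample)))
```

Feeding it the full cron mail instead of `sample` would show at a glance how many runs reported the warning and which ones stand out, but it cannot say which descriptors are missing. That still needs better logging in CollecTor itself.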
  • Both Onionoo and metrics-web have I/O issues when analyzing descriptors. Open a ticket with TPA.

  • 1st Metrics OKR should be systems monitoring. We do not know when or why we are losing data.

    2nd Metrics OKR should be stability and scalability.

    3rd Metrics OKR should be restructuring part of our code base. Possible candidates are Onionoo and the website.

  • Added a table to map metrics VMs to services.

    https://gitlab.torproject.org/tpo/network-health/team/-/wikis/metrics/machines

    Some of these setups could perhaps be simplified. The haproxy/varnish Onionoo setup could probably be replaced by nginx.

  • Started https://gitlab.torproject.org/tpo/network-health/team/-/wikis/metrics/services-ops-docs (more info and investigation on the state of things are still needed).
