Onionprobe: a test/monitor tool for Onion Services

changed milestone to %Sponsor 123: Tor Secure Access Package for USAGM [First Phase]

Initial specs

Initial specs as described by @irl and @hiro on 2022-03-09 during a conversation in #tor-internal:

10:49 +<irl> i wondered if we had something smart that would take a
             list of onions to check and make sure that you can always fetch descriptors
             rather than just using cached descriptors etc
10:49 +<irl> we also need to know about "does the site have useful content?"
11:00 +<irl> you configure it with a set of onion addresses, and it
             goes in a loop testing one at a time continuously, each test would wipe out
             descriptor caches so you're always fetching fresh descriptors and then would
             fetch a set of paths from each onion. you export
             prometheus metrics for the connection to the onion service, and extra metrics
             per path on the status code for each path returned by the server.
11:01 +<irl> you could add in timing metrics wherever is appropriate
             there, using the existing blackbox_exporter timing metrics as a model.
11:01 +<irl> bonus: allow configuring a regex per path for what should
             be found in the returned content/headers.
12:23 +<hiro> if you use the prometheus exporter with python one could
              just use request and beautiful soup to check that the page is returning what
              one expects
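One possible reading of the loop described above, as a minimal sketch rather than the actual implementation. The names `Endpoint`, `PathCheck` and `probe_once` are hypothetical, and the `fetch` and `flush` callables are assumed hooks standing in for the real Tor-facing code (fetching a path through Tor, and wiping the descriptor cache for an address), which is not shown here:

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class PathCheck:
    path: str
    pattern: Optional[str] = None  # optional regex expected in the body


@dataclass
class Endpoint:
    address: str
    paths: List[PathCheck] = field(default_factory=list)


# Hypothetical hooks: the real tool would talk to Tor here.
Fetch = Callable[[str, str], Tuple[int, str]]  # (address, path) -> (status, body)
Flush = Callable[[str], None]                  # drop cached descriptors for address


def probe_once(endpoints: List[Endpoint], fetch: Fetch, flush: Flush) -> Dict:
    """Test every endpoint once, one at a time, and collect per-path results."""
    results = {}
    for endpoint in endpoints:
        # Always start from a fresh descriptor, never a cached one.
        flush(endpoint.address)
        for check in endpoint.paths:
            key = (endpoint.address, check.path)
            try:
                status, body = fetch(endpoint.address, check.path)
            except OSError:
                # Connection-level failure: no status code, nothing to match.
                results[key] = {"reachable": False, "status": 0, "matched": False}
                continue
            matched = (
                check.pattern is None
                or re.search(check.pattern, body) is not None
            )
            results[key] = {"reachable": True, "status": status, "matched": matched}
    return results
```

Passing the hooks in as arguments keeps the business logic testable without a running Tor process; the continuous loop would simply call `probe_once` repeatedly.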

Initial research

Some tools with similar application domains:

GitHub - systemli/prometheus-onion-service-exporter: Prometheus Exporter for Tor Onion Services
GitHub - s-rah/onionscan: OnionScan is a free and open source tool for investigating the Dark Web. (currently lacks v3 support).

Libraries that can be used or studied as references:

added Backlog label

HTTP status codes
Regex for content inside the page
Customisable by test path (not all our sites have content at the root)
Randomisation of timing, so that systemic errors cannot go undetected just because the tests happen to run at a lucky moment
Flushing descriptor caches so we're testing as if we're a fresh client (but let's not have to bootstrap every time if we can avoid it)
Page load latency
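The timing randomisation item could be sketched as below. This is only an illustration under assumptions: `randomised_schedule` and its `min_wait`/`max_wait` parameters are hypothetical names, and the choice of a reshuffled order plus a uniform random wait is one option among several.

```python
import random
from typing import Iterator, List, Optional, Tuple


def randomised_schedule(
    addresses: List[str],
    min_wait: float,
    max_wait: float,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[str, float]]:
    """Yield (address, seconds_to_wait) forever.

    Each round visits every address once in a freshly shuffled order,
    and each test is preceded by a random wait in [min_wait, max_wait].
    """
    if not addresses:
        raise ValueError("at least one address is required")
    if not 0 <= min_wait <= max_wait:
        raise ValueError("expected 0 <= min_wait <= max_wait")
    rng = rng or random.Random()
    while True:
        round_order = list(addresses)
        rng.shuffle(round_order)
        for address in round_order:
            yield address, rng.uniform(min_wait, max_wait)
```

A caller would sleep for the yielded number of seconds before testing the yielded address. Accepting an explicit `random.Random` instance makes the schedule reproducible in tests.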

To get the timings right, the tool should take care of the test frequency and just expose the metrics rather than having Prometheus scraping individual targets on Prometheus' schedule.

OK, so Sponsor 123 project management agreed that I can start on this tool, using a week of development time.

I'm sketching a simple tool to show you and make sure I got the idea right.

The first version will probably contain only the main loop, which checks the descriptors (with cache cleansing) and tests the connection. Prometheus integration will be sorted out later, once the business logic is working.

assigned to @rhatto

added Doing label

removed Backlog label

changed due date to March 19, 2022

changed time estimate to 20h

changed time estimate to 56h

added 24h of time spent

changed time estimate to 72h

The initial working prototype is here, but it might be moved under the network health umbrella.

Task estimation

Complexity: large (5 days)
Uncertainty: low (x1.1)
Reference

changed time estimate to 44h

Most of the initial specs are now implemented. @irl, could we discuss it in next week's meeting? I'd like to know whether the current code matches your needs and expectations, and what should be done next.

I'm also planning to present it at the March 23rd Demo Day to collect additional feedback and suggestions.

added Needs Review label

removed Doing label

marked this issue as related to #14 (closed)

marked this issue as related to #58 (moved)

marked this issue as related to #59 (moved)

closed

removed Needs Review label

New canonical repository.

made the issue confidential

made the issue visible to everyone

moved to onionprobe#1 (closed)
