Add a flag to disable Tor's DNS cache

Making a note here so I don't lose it from my local git checkout:

index 129f6209d7..3f6209c9ac 100644
--- a/src/feature/relay/dns.c
+++ b/src/feature/relay/dns.c
@@ -1291,6 +1291,8 @@ make_pending_resolve_cached(cached_resolve_t *resolve)
   * expire. See fd0bafb0dedc7e2 for a brief explanation of how this got that
   * way.  XXXXX we could do better!*/
 
+  /* right here is where we would return rather than adding the cached resolve back in */
+
   {
     cached_resolve_t *new_resolve = tor_memdup(resolve,
                                                sizeof(cached_resolve_t));

Considering this would be a security improvement from what I can read, I will tentatively label this for 048 so our next (or next-next) release can pick it up.

changed milestone to %Tor: 0.4.8.x-post-stable

added Security label

When doing this it would be great to make corresponding changes to the specification.

mentioned in issue arti#1448

I think this flag is worth adding to let the community experiment a bit.

Regarding big operators teaming up on DNS infra, we must remember not to create too big shared resolvers used exclusively by exits. With Tor's DNS cache disabled, the resolver operator (or anyone between the exit and resolver, barring transport security) gets juicy telemetry on what happens in the network.

So wait, your comment here seems to discourage disabling the cache... I'm not sure what you are suggesting here.

From what arma suggested earlier, we should avoid multiple in-flight requests for the same domain, but otherwise we should not cache. And you seemed OK with this?

Sorry if the comment came out confusing. Please let me clarify:

Adding support for a flag to disable the cache is a good idea. Please make sure there are big, flashy warnings associated with it.

As big exit operators, such as @tornth and friends, start collaborating on DNS resolver infrastructure, they must remember that while this may mitigate some attacks and even improve performance, as discussed in this issue (good), it also makes their DNS resolver infrastructure an increasingly juicy target (bad).

In a sense, when exit operators collaborate, it leads to centralization of the Tor network, which eventually becomes bad for the network. I don't know where to draw this line; I'm just confident that there is a line (somewhere).

OK, I see. So this is a related side note: big centralized DNS infra becomes a juicier target if the Tor cache is disabled, because that infra would then see more DNS queries. Thanks.

Yup exactly! If this whole thing with improving DNS proves to be successful, it would be great if multiple big Tor operators (or even some additional parties that don't run Tor at all) run these DNS servers so DNS stays decentralized on the Tor network. And then smaller exit relay operators (if they want to use the improved DNS setup without hosting it themselves) can use a nearby (in terms of latency) improved DNS server (or even load balance between multiple) to keep centralization at bay. What can be centralized is the creation of the mega preload list used by the DNS recursors.

But this is all in the future; first we'll put effort into building and testing the required infrastructure.

"What can be centralized is the creation of the mega preload list used by the DNS recursors."

+1, eventual work on maintaining such a list would be great!

I have a question about planning/timing for this proposal, so I figured I'd put it here :). If disabling Tor's internal cache is made possible, what approximate timeline should I plan for until it's part of Tor?

This improved DNS setup requires development, testing and (security) auditing on the DNS side of things. I'm planning on making use of my network and some NTH/Cyberology funds to work together with more people (researchers, specialists, hackers, programmers etc.) on this but before we start this project it's good to know what we can expect (and what not :).

If more work is needed here than a simple patch, then "when it is ready" is probably the only logical answer.

As this is a security issue, we'll get it into all our maintained releases.

I suspect it's a rather simple patch, like arma mentioned. Basically, when the flag is set, then "return rather than adding the cached resolve back in". The thing is: I can't program well (let alone in C), otherwise I would have made a concrete suggestion already.

Well, seems a bit more work is needed considering arma's suggestion of avoiding multiple requests at the same time?

In other words, I'm not sure we settled here on how to proceed?

I think I have a simple proposal that avoids multiple requests at the same time.

Note the flow in dns_found_answer() in src/feature/relay/dns.c:

/** Called on the OR side when the eventdns library tells us the outcome of a
 * single DNS resolve: remember the answer, and tell all pending connections
 * about the result of the lookup if the lookup is now done.  (<b>address</b>
 * is a NUL-terminated string containing the address to look up;
 * <b>query_type</b> is one of DNS_{IPv4_A,IPv6_AAAA,PTR}; <b>dns_answer</b>
 * is DNS_OK or one of DNS_ERR_*, <b>addr</b> is an IPv4 or IPv6 address if we
 * got one; <b>hostname</b> is a hostname for a PTR request if we got one, and
 * <b>ttl</b> is the time-to-live of this answer, in seconds.)
 */
static void
dns_found_answer(const char *address, uint8_t query_type,
                 int dns_answer,
                 const tor_addr_t *addr,
                 const char *hostname, uint32_t ttl)
{
  cached_resolve_t search;
  cached_resolve_t *resolve;

  assert_cache_ok();

  strlcpy(search.address, address, sizeof(search.address));

  resolve = HT_FIND(cache_map, &cache_root, &search);
  if (!resolve) {
    int is_test_addr = is_test_address(address);
    if (!is_test_addr)
      log_info(LD_EXIT,"Resolved unasked address %s; ignoring.",
               escaped_safe_str(address));
    return;
  }
  assert_resolve_ok(resolve);

  if (resolve->state != CACHE_STATE_PENDING) {
    /* XXXX Maybe update addr? or check addr for consistency? Or let
     * VALID replace FAILED? */
    int is_test_addr = is_test_address(address);
    if (!is_test_addr)
      log_notice(LD_EXIT,
                 "Resolved %s which was already resolved; ignoring",
                 escaped_safe_str(address));
    tor_assert(resolve->pending_connections == NULL);
    return;
  }

  cached_resolve_add_answer(resolve, query_type, dns_answer,
                            addr, hostname, ttl);

  if (cached_resolve_have_all_answers(resolve)) {
    inform_pending_connections(resolve);

    make_pending_resolve_cached(resolve);
  }
}

Basically, some checks and bookkeeping, ending with inform_pending_connections and make_pending_resolve_cached.

We want to keep inform_pending_connections, this is what @arma asked for.
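
The behaviour being preserved here (one upstream query per name, with every waiting connection informed when the answer arrives) can be sketched in isolation. This is a toy model under stated assumptions, not Tor's code: `pending_lookup_t`, `request_lookup`, `answer_arrived`, and the fixed-size table are hypothetical names and simplifications that stand in for the pending `cached_resolve_t` entries and `inform_pending_connections`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 8   /* toy limit on distinct in-flight names */
#define MAX_WAITERS 8   /* toy limit on connections per name */

/* Hypothetical, simplified stand-in for a pending cached_resolve_t. */
typedef struct {
  char address[256];
  int waiters[MAX_WAITERS];  /* ids of connections waiting on this name */
  int n_waiters;
  int in_use;
} pending_lookup_t;

static pending_lookup_t pending[MAX_PENDING];
static int upstream_queries = 0;  /* queries that reached the resolver */

/* Attach conn_id to a lookup of address. A new upstream query is counted
 * only when no lookup for that name is already in flight.
 * Returns 1 if a query was launched, 0 if we joined an existing one,
 * and -1 if the toy tables are full. */
static int
request_lookup(const char *address, int conn_id)
{
  pending_lookup_t *free_slot = NULL;
  assert(address != NULL);
  for (int i = 0; i < MAX_PENDING; i++) {
    pending_lookup_t *p = &pending[i];
    if (p->in_use && strcmp(p->address, address) == 0) {
      if (p->n_waiters == MAX_WAITERS)
        return -1;
      p->waiters[p->n_waiters++] = conn_id;  /* coalesce onto this lookup */
      return 0;
    }
    if (!p->in_use && free_slot == NULL)
      free_slot = p;
  }
  if (free_slot == NULL)
    return -1;
  snprintf(free_slot->address, sizeof(free_slot->address), "%s", address);
  free_slot->waiters[0] = conn_id;
  free_slot->n_waiters = 1;
  free_slot->in_use = 1;
  upstream_queries++;
  return 1;
}

/* The answer for address arrived: every waiter would be informed here, and
 * the entry is then forgotten instead of being kept as a cached result.
 * Returns the number of waiters that were informed. */
static int
answer_arrived(const char *address)
{
  assert(address != NULL);
  for (int i = 0; i < MAX_PENDING; i++) {
    pending_lookup_t *p = &pending[i];
    if (p->in_use && strcmp(p->address, address) == 0) {
      int informed = p->n_waiters;
      memset(p, 0, sizeof(*p));  /* no cached copy is left behind */
      return informed;
    }
  }
  return 0;
}
```

In this sketch, two connections asking for the same name while a lookup is pending produce a single upstream query, and a later request for that name (after the answer was delivered) triggers a fresh query because nothing was cached.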

The simple fix then goes into make_pending_resolve_cached:

/** Remove a pending cached_resolve_t from the hashtable, and add a
 * corresponding cached cached_resolve_t.
 *
 * This function is only necessary because of the perversity of our
 * cache timeout code; see inline comment for ideas on eliminating it.
 **/
static void
make_pending_resolve_cached(cached_resolve_t *resolve)
{
  cached_resolve_t *removed;

  resolve->state = CACHE_STATE_DONE;
  removed = HT_REMOVE(cache_map, &cache_root, resolve);
  if (removed != resolve) {
    log_err(LD_BUG, "The pending resolve we found wasn't removable from"
            " the cache. Tried to purge %s (%p); instead got %s (%p).",
            resolve->address, (void*)resolve,
            removed ? removed->address : "NULL", (void*)removed);
  }
  assert_resolve_ok(resolve);
  assert_cache_ok();
  /* The resolve will eventually just hit the time-out in the expiry queue and
   * expire. See fd0bafb0dedc7e2 for a brief explanation of how this got that
   * way.  XXXXX we could do better!*/

  {
    cached_resolve_t *new_resolve = tor_memdup(resolve,
                                               sizeof(cached_resolve_t));
    uint32_t ttl = UINT32_MAX;
    new_resolve->expire = 0; /* So that set_expiry won't croak. */
    if (resolve->res_status_hostname == RES_STATUS_DONE_OK)
      new_resolve->result_ptr.hostname =
        tor_strdup(resolve->result_ptr.hostname);

    new_resolve->state = CACHE_STATE_CACHED;

    assert_resolve_ok(new_resolve);
    HT_INSERT(cache_map, &cache_root, new_resolve);

    if ((resolve->res_status_ipv4 == RES_STATUS_DONE_OK ||
         resolve->res_status_ipv4 == RES_STATUS_DONE_ERR) &&
        resolve->ttl_ipv4 < ttl)
      ttl = resolve->ttl_ipv4;

    if ((resolve->res_status_ipv6 == RES_STATUS_DONE_OK ||
         resolve->res_status_ipv6 == RES_STATUS_DONE_ERR) &&
        resolve->ttl_ipv6 < ttl)
      ttl = resolve->ttl_ipv6;

    if ((resolve->res_status_hostname == RES_STATUS_DONE_OK ||
         resolve->res_status_hostname == RES_STATUS_DONE_ERR) &&
        resolve->ttl_hostname < ttl)
      ttl = resolve->ttl_hostname;

    set_expiry(new_resolve, time(NULL) + ttl);
  }

  assert_cache_ok();
}

Above, keep the remove logic at the start but skip the re-insert of the cached entry if the config flag is set to not cache. Done?
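
To make the proposal concrete, here is a minimal, self-contained sketch of the control flow: remove the pending entry as before, then return early when caching is disabled. It is a toy model, not a patch against Tor. The linked list stands in for `cache_map`, and `toy_resolve_t`, `toy_disable_dns_cache`, and the `toy_*` helpers are all hypothetical names; no such option name has been agreed on in this issue.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { STATE_PENDING, STATE_CACHED } toy_state_t;

/* Hypothetical, stripped-down stand-in for cached_resolve_t. */
typedef struct toy_resolve_t {
  char address[64];
  toy_state_t state;
  struct toy_resolve_t *next;
} toy_resolve_t;

static toy_resolve_t *toy_cache = NULL;  /* stand-in for cache_map */
static int toy_disable_dns_cache = 0;    /* hypothetical config flag */

static toy_resolve_t *
toy_find(const char *address)
{
  for (toy_resolve_t *r = toy_cache; r; r = r->next)
    if (strcmp(r->address, address) == 0)
      return r;
  return NULL;
}

static void
toy_insert(toy_resolve_t *r)
{
  r->next = toy_cache;
  toy_cache = r;
}

/* Unlink r from the list; returns r on success, NULL if it was not there. */
static toy_resolve_t *
toy_remove(toy_resolve_t *r)
{
  for (toy_resolve_t **pp = &toy_cache; *pp; pp = &(*pp)->next) {
    if (*pp == r) {
      *pp = r->next;
      r->next = NULL;
      return r;
    }
  }
  return NULL;
}

/* Create a pending entry for address and register it. */
static toy_resolve_t *
toy_add_pending(const char *address)
{
  toy_resolve_t *r = calloc(1, sizeof(*r));
  assert(r != NULL);
  snprintf(r->address, sizeof(r->address), "%s", address);
  r->state = STATE_PENDING;
  toy_insert(r);
  return r;
}

/* Same shape as make_pending_resolve_cached(): always remove the pending
 * entry, but skip the re-insert of a cached copy when the flag is set.
 * In this toy model the caller keeps ownership of resolve. */
static void
toy_make_pending_resolve_cached(toy_resolve_t *resolve)
{
  toy_resolve_t *removed = toy_remove(resolve);
  assert(removed == resolve);

  if (toy_disable_dns_cache)
    return;  /* return rather than adding the cached resolve back in */

  toy_resolve_t *copy = malloc(sizeof(*copy));
  assert(copy != NULL);
  memcpy(copy, resolve, sizeof(*copy));
  copy->state = STATE_CACHED;
  toy_insert(copy);
}
```

With the flag cleared, a lookup of the address after the call finds the cached copy; with the flag set, the lookup finds nothing, so the next request for that name starts a new resolve. The sketch does not model the expiry queue, which is where the real code's original entry eventually times out according to the inline comment, so a real patch would still need to be checked against that path.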

This is off the top of my head. I haven't checked it, but I think this works. Can someone with a proper C-tor setup help check?

assigned to @dgoulet

added Sponsor 112 S112-O3 labels

I hear @trinity-1686a is coming up with a branch of this one.

mentioned in merge request !823
