  • #27076

Closed (moved)
Created Aug 08, 2018 by Karsten Loesing@karsten

Reconfigure collector2.tp.o to do less

We have two CollecTor instances: collector.tp.o on colchicifolium and collector2.tp.o on corsicum. Reasons for having two instances instead of one are related to failure tolerance:

  1. Whenever collector.tp.o fails, it doesn't fetch consensuses and votes from the directory authorities, and those are only available for an hour. If collector.tp.o fails for a couple of hours, it can later fetch the missing descriptors from collector2.tp.o.
  2. While collector.tp.o is down, Onionoo can fetch relay descriptors from collector2.tp.o and continue to provide recent data.

However, I think we went a bit too far when configuring collector2.tp.o to also sync descriptors from collector.tp.o. It does that with bridge descriptors and sanitized web logs.

Here's how the two instances are currently configured:

collector.tp.o/colchicifolium:
RelaySources = Cache, Remote, Sync, Local
BridgeSources = Local
ExitlistSources = Remote
OnionPerfSources = Remote
WebstatsSources = Local

collector2.tp.o/corsicum:
RelaySources = Remote
BridgeSources = Sync
ExitlistSources = Remote
OnionPerfSources = Remote
WebstatsSources = Sync

The entries in question are the two "Sync" entries for collector2.tp.o: BridgeSources and WebstatsSources. I think we mainly put them in so that the respective sync code gets executed, too, and we would notice any issues with it.
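To make the difference between the two instances easy to spot, here is a minimal Python sketch that parses the source lists quoted above and reports which settings include a given source. The helper names `parse_sources` and `keys_using` are made up for this illustration; this is not how CollecTor itself reads its properties file.

```python
def parse_sources(text):
    """Parse 'Key = A, B' lines into {key: [sources]} (hypothetical helper)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return config


def keys_using(config, source):
    """Return the sorted setting names whose source list contains `source`."""
    return sorted(key for key, sources in config.items() if source in sources)


# Settings copied from the issue text above.
COLLECTOR = """
RelaySources = Cache, Remote, Sync, Local
BridgeSources = Local
ExitlistSources = Remote
OnionPerfSources = Remote
WebstatsSources = Local
"""

COLLECTOR2 = """
RelaySources = Remote
BridgeSources = Sync
ExitlistSources = Remote
OnionPerfSources = Remote
WebstatsSources = Sync
"""

if __name__ == "__main__":
    print("collector  Sync:", keys_using(parse_sources(COLLECTOR), "Sync"))
    print("collector2 Sync:", keys_using(parse_sources(COLLECTOR2), "Sync"))
```

Run against the settings above, this lists BridgeSources and WebstatsSources for collector2.tp.o, which are the two entries under discussion.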

I now believe that these entries are not helpful and potentially harmful, for two reasons:

  1. The sync mode of the bridgedescs module does not clean up the recent/ directory after placing descriptors there. The local mode would do that, but the sync mode does not. The effect is that bridge descriptors in recent/ pile up and fill up disk space. Even worse, Onionoo fetches everything contained in that directory, so bootstrapping a new Onionoo instance now downloads vast amounts of data.
  2. I don't yet know what happened in #27055 (moved), but it seems that simplifying the configuration of collector2.tp.o should make that issue at least less likely to happen again.
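For illustration, the kind of cleanup that the sync mode is missing could look roughly like the Python sketch below. It is not CollecTor's actual cleanup code (CollecTor is written in Java), and the function name `prune_recent`, the example path, and the 72-hour retention window are all assumptions made for this example.

```python
import os
import time


def prune_recent(directory, max_age_seconds, now=None):
    """Delete files under `directory` whose mtime is older than the cutoff.

    Hypothetical helper: returns the sorted list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return sorted(removed)


if __name__ == "__main__":
    # Assumed retention window of 72 hours; the path is only an example.
    recent_dir = "recent/bridge-descriptors"
    if os.path.isdir(recent_dir):
        for path in prune_recent(recent_dir, 72 * 3600):
            print("removed", path)
```

Without a step like this, every synced descriptor stays in recent/ indefinitely, which matches the disk usage and Onionoo bootstrapping symptoms described in the first point.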

I could imagine reconfiguring collector2.tp.o to only perform the following tasks:

collector2.tp.o/corsicum:
RelaySources = Remote
ExitlistSources = Remote

The effect would be that we'd still keep our failure tolerance properties and nothing more.

Does that make sense? Did I miss anything important here?
