Implement Distinct Descriptor Mode
For my master's thesis I implemented a Distinct Descriptor Mode (DDM) for v3 descriptors, and I wanted to finally share my results :)
The idea is that we can create several subdescriptors which all contain different introduction points and upload those to different HSDirs. For Onionbalance it is irrelevant which backend instances these intro points come from; it only cares about the intro points themselves and can insert them into different descriptors. We use the hsdir_spread_store and hsdir_n_replicas parameters to distribute the descriptors across several HSDirs. The limiting factor here is the hsdir_spread_fetch parameter, as clients by default only consider the first 3 HSDirs after the replica indices when fetching a new descriptor.
As of now, a descriptor is allowed to contain up to 20 intro points. When we choose hsdir_spread_store = 3 and hsdir_n_replicas = 2, Onionbalance receives 3 * 2 = 6 different HSDirs which can be used to upload the descriptors. In total, we can handle 6 * 20 = 120 introduction points (which translates to up to 120 possible backend instances, or 60 if we take 2 intro points from each instance).
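To make the arithmetic explicit, here is a minimal sketch (the constants are just the values discussed above, hard-coded for illustration, not Onionbalance's actual configuration names):

```python
# Capacity arithmetic from the paragraph above; values hard-coded for illustration.
HSDIR_SPREAD_STORE = 3    # HSDirs used per replica when storing
HSDIR_N_REPLICAS = 2      # number of replica indices
MAX_INTROS_PER_DESC = 20  # current per-descriptor intro point limit

usable_hsdirs = HSDIR_SPREAD_STORE * HSDIR_N_REPLICAS   # 6
max_intro_points = usable_hsdirs * MAX_INTROS_PER_DESC  # 120

print(max_intro_points)       # up to 120 backend instances with 1 intro point each
print(max_intro_points // 2)  # or 60 backend instances with 2 intro points each
```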
Example:
- number of instances: 35, number of intro points per instance: 2 ---> we need enough descriptors to fit 70 intro points
- Onionbalance creates 4 subdescriptors; since 70/4 = 17.5, two descriptors contain 18 intro points and two contain 17 (see the sketch after this list)
- we can use 6 HSDirs, which means that Onionbalance uploads every descriptor at least once (2 descriptors are uploaded twice)
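For illustration, here is one way the split from this example could be computed (a hypothetical sketch, not the actual Onionbalance code; `partition` is an invented helper):

```python
# Hypothetical helper: split a flat list of intro points (collected from
# all backend instances) into n_descriptors nearly equal subsets.
def partition(intro_points, n_descriptors):
    base, extra = divmod(len(intro_points), n_descriptors)
    subsets, start = [], 0
    for i in range(n_descriptors):
        size = base + (1 if i < extra else 0)  # first `extra` subsets get one more
        subsets.append(intro_points[start:start + size])
        start += size
    return subsets

subsets = partition(list(range(70)), 4)
print([len(s) for s in subsets])  # [18, 18, 17, 17]
```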
For every subdescriptor, a "first" and a "second" descriptor are created and uploaded (one for each replica). Regarding the update functionality of the subdescriptors, I kept the original implementation: the descriptors are updated every time a new consensus arrives or the set of introduction points changes. This also means that all subdescriptors are updated at the same time, although they might still be valid. Right now, all subdescriptors can only be created and updated together and cannot be managed separately. This is something that could be improved in the future.
I wrote some functional tests for the DDM in Onionbalance, but they only cover the general functionality and would probably need some extra work.
I tested the DDM both in Chutney and on the live network. For live testing I used Docker containers (each representing a frontend instance, a backend instance, or a client connecting to the Onion Service). I wrote a script to reproduce the Docker tests: https://github.com/nr24119/test_ddm. The backend instances were configured to each serve slightly different content (see Index.html). The clients automatically performed curl requests, and the response revealed which backend instance they connected to.
I tracked the Docker clients connecting to an Onion Service that was load-balanced across 20 backend instances. The results show that the requests are spread fairly evenly over all backend instances (ca. 2000 curl requests over a time frame of ca. 2.5 hrs).
I tested up to 60 backend instances and it worked very well!
Please let me know if you have any questions or remarks and I will try to answer them as soon as possible :)
Summary
The following changes were made:
Raise the previous maximum of 8 backend instances:
- new parameter to set the desired number of instances and the number of introduction points per instance (see params.py)
- adjust the config generator to allow a maximum number of instances (config-generator.py)
Calculate the number of needed (sub-)descriptors (service.py):
- estimate how big a descriptor is without any intro points (the maximum size of a descriptor is 50,000 bytes)
- determine how much space per descriptor can be used by the intro points
- calculate the number of descriptors needed to fit all intro points (see the sketch after this list)
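A sketch of that calculation (the overhead and per-intro-point byte sizes below are illustrative assumptions, not measured values):

```python
import math

MAX_DESC_BYTES = 50_000  # maximum size of a v3 descriptor

def descriptors_needed(n_intro_points, overhead_bytes, intro_point_bytes):
    """overhead_bytes: estimated size of a descriptor without intro points;
    intro_point_bytes: estimated encoded size of a single intro point."""
    intros_per_desc = (MAX_DESC_BYTES - overhead_bytes) // intro_point_bytes
    intros_per_desc = min(intros_per_desc, 20)  # hard per-descriptor limit
    return math.ceil(n_intro_points / intros_per_desc)

# e.g. 70 intro points, ~10 kB base descriptor, ~2 kB per encoded intro point
print(descriptors_needed(70, 10_000, 2_000))  # 4
```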
Create (sub-)descriptors (service.py):
- determine which intro point should go into which descriptor
- create the descriptors and temporarily store them
Upload all (sub-)descriptors (service.py):
- get the responsible HSDirs
- assign an HSDir node to every (sub-)descriptor and start uploading (see the sketch after this list)
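A sketch of how the assignment could work when there are more responsible HSDirs than subdescriptors (hypothetical names; this just reproduces the "uploaded twice" behaviour from the example above):

```python
from itertools import cycle

def assign_uploads(hsdirs, descriptors):
    # Pair each responsible HSDir with a subdescriptor; cycling repeats the
    # descriptor list so every HSDir receives one even if there are fewer
    # descriptors than HSDirs.
    return list(zip(hsdirs, cycle(descriptors)))

hsdirs = [f"hsdir{i}" for i in range(6)]
descriptors = ["desc_a", "desc_b", "desc_c", "desc_d"]
for hsdir, desc in assign_uploads(hsdirs, descriptors):
    print(hsdir, "<-", desc)
# desc_a and desc_b end up on two HSDirs each, matching the example above
```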
Logging:
- adjust the existing logging
- log all actions concerning the DDM (log level INFO)