Skip to content
Snippets Groups Projects

doc/CONFIG-NOTES.md: config proposal

Merged Ian Jackson requested to merge Diziet/arti:config into main
17 unresolved threads

This is my draft proposal to address #285 (closed). The text is identical to that in #285 (comment 2770848)

Merge request reports

Approval is optional

Merged by etaeta 3 years ago (Jan 26, 2022 5:00pm UTC)

Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
71 ```
72 (borrowing notation from `fehler` for clarity; actual code will be `-> Result`)
73
74 For embedders, `arti-client` should provide a method a bit like this
75 (name TBD)
76 ```
77 #]throws]
78 pub fn tor_and_caller_config_from_usual_files<T>() -> (TorClientCfg, T)
79 where T: Deserialize
80 {
81 #[derive(Deserialize)]
82 struct CombinedConfig {
83 #[flatten] tor: TorConfig,
84 #[flatten] rest: T,
85 }
86 ... load a CombinedCnfig from the usual Tor config files ...
  • Comment on lines +78 to +86

    I think this idea is interesting, though if we want to support it in the general case, we might to talk a bit about namespacing. (Does serde support multiple flattens like this? Can we re-serialize the combined files with this?)

  • Author Maintainer

    Sure it does, and you can re-serialize: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=56529f2371f3c982850784b54e3bb412

    And yes, there is a namespacing issue: you can't have sections that are provided by our code which have the same names as the caller's sections. serde's behaviour is to take the first when there are clashes. So the suggestion I am making causes our sections to override the caller's. That may not be right. But I think whichever way we choose for that we can address that with appropriate documentation.

  • Please register or sign in to reply
  • Nick Mathewson
    Nick Mathewson @nickm started a thread on commit ebda5a0a
  • 93 I suggest that `TorClientCfg` should be a non-exhaustive but otherwise transparent struct (i.e. with `pub` fields).
    94
    95 The "sections" in it (ie, the sub-struct names and types) are already public, since they are part of the config file format. Restructuring this would be a breaking change regardless of the Rust API. So the current scheme does not add any additional API abstraction - it merely demands boilerplate code.
    96
    97 ### `config` vs `figment`
    98
    99 Currently we use `config`. I looked at others briefly and rather more closely at `figment`. Certainly `figment is a lot richer. I think it might be able to produce better errors. I have to say though that when ended up using a library that used `figment` I found it a bit awkward and confusing.
    100
    101 For now I suggest retaining `config`. At least, the code that knows we're using it will be fairly self-contained.
    102
    103 ## Miscellaneous
    104
    105 * `TorClientCfg`. Suggest renaming to `TorClientConfig`. If we do things right, users won't normally need to type this name.
    106
    107 * Default config file location is currently `./arti-config.toml`. This should
    108 be changed (to XDG dirs probably)
    • Comment on lines +107 to +108

      I don't think that's what the default is now: it's currently in arti-config:

      /// Return a filename for the default user configuration file.
      pub fn default_config_file() -> Option<PathBuf> {
          CfgPath::new("${ARTI_CONFIG}/arti.toml".into()).path().ok()
      }

      So that should in theory use XDG dirs on XDG platforms.

    • Author Maintainer

      I think there are multiple defaults at different layers. I'm sure I found a ./arti-config.toml somewhere :-). IMO there should be one default config path and yes, XDG dirs by default on Unix.

    • It is a mystery: git grep arti-config.toml finds nothing for me. :shrug:

    • Contributor

      You might be thinking of this:

      eta@euston ~/s/arti ((a9be6c9e…)) [1]> rg config.toml
      crates/arti/src/main.rs
      127:    let dflt_config = arti_config::default_config_file().unwrap_or_else(|| "./config.toml".into());
      
    • Author Maintainer

      Yes, I think so. Well found.

    • Please register or sign in to reply
  • Nick Mathewson
  • Nick Mathewson
    Nick Mathewson @nickm started a thread on commit ebda5a0a
  • 42
    43 ### Discussion
    44
    45 To make our config loading code reuseable, in everything from a slightly extended version of `arti`, to some entirely different embedding of our Tor, code, we need to separate out:
    46
    47 * Code that knows about configuration sources (how to read a file, default locations of files, how to access the command line)
    48
    49 * Code that knows about configuration *settings*
    50
    51 (Separate in the sense that most entrypoints should be one of these or the other, and they should be brought together only in convenience methods.)
    52
    53 ### Proposal
    54
    55 All of the `FooConfig` structs should be made directly `Deserialize` and `Serialize` (subject to serde cargo feature of course). (See above re abolition of `FooConfigBuilder` as a separate public type.) This means that much of the boilerplate field-copying (and conversions) in arti-config can be deleted.
    56
    57 The knowledge of the default config file location(s) and sources should be (exposed) in `arti-client` (not `arti`), with the implication that we hope that most embedders will use it.
    • So, right now it's in arti-config, on the theory that arti-config is where you look if you want your application to support the same configuratoin format as arti, and not otherwise. It sounds like in this proposal you're lowering part of arti-config into arti-client, but I'm not sure exactly how much.

    • Author Maintainer

      Everything that knows about what's in TorClientCfg at least, I think, since we want all of that logic to be shared with everyone who uses arti-client.

      I think what is left in arti-config is mostly the cmdline handling, and maybe some of the logging. If we want a principled distinction I think it's probably that arti-config is "stuff that is specific to the arti command line program, including its logging config, and parsing of config from the command line". So reuseable by an embedder other than arti, but we don't much mind if embedders do this stuff differently, so it can be off in another crate.

      But, I am not wedded to the precise crate structure at this level. What I do think we want is that if an embedder has configurable Tor client, the easy path for them ends up with their Tor client configurable using the same files that arti uses. Whether the rest of their program's config is their own section in that config, or in a different file, or whatever, is something I don't think we need to have an opinion aobut.

    • Contributor

      What I do think we want is that if an embedder has configurable Tor client, the easy path for them ends up with their Tor client configurable using the same files that arti uses.

      It's worth pointing out that this is very much already the case!

    • Please register or sign in to reply
  • Nick Mathewson
    Nick Mathewson @nickm started a thread on commit ebda5a0a
  • 10
    11 We should make client construction using default config file locations easy (at leasat as easy as construction with hardcoded defaults). That will make it easy for embedders to provide user-tweakable config.
    12
    13 We should add some warnings to the docs for configuration settings which shrink anonmity sets. (Or possibly blanket warnings.) This should aim to discourage not only users but also embedders from making poor choices.
    14
    15 ## Validation (startup API simplification)
    16
    17 ### Discussion
    18
    19 Few callers will want to validate the configuration separately from using it to make a Tor client (or running copy of the arti executable). The separation between `FooConfigBuilder` (partially-validated[1] configuration read from file(s) and/or set up with builder pattern methods) and `FooConfig` (validated configuration for Foo) doubles the number of config structs in our API. ([1] I say "partially" because the types of individual fields are right, although their values may be inappropriate.)
    20
    21 This distinction prevents (typestate-style) a programmer from writing code which directly constructs a client from an only-partially-validated config, but a program which does this is not buggy (if we make the validation happen at construction time) - and actually providing that API is easier for the caller. And because the `FooConfig` is already partially validated, there is
    22
    23 ### Proposal
    24
    25 We rename `FooConfigBuilder` to `FooConfig`. The current `FooConfig` type will become a private `foo::ValidatedConfig` type. (The two types will usually still need to be separate because defaulting means the builder ought usually to contain many `Option`, whereas the operational code doesn't want to see those.)
  • Nick Mathewson
    Nick Mathewson @nickm started a thread on commit ebda5a0a
  • 16
    17 ### Discussion
    18
    19 Few callers will want to validate the configuration separately from using it to make a Tor client (or running copy of the arti executable). The separation between `FooConfigBuilder` (partially-validated[1] configuration read from file(s) and/or set up with builder pattern methods) and `FooConfig` (validated configuration for Foo) doubles the number of config structs in our API. ([1] I say "partially" because the types of individual fields are right, although their values may be inappropriate.)
    20
    21 This distinction prevents (typestate-style) a programmer from writing code which directly constructs a client from an only-partially-validated config, but a program which does this is not buggy (if we make the validation happen at construction time) - and actually providing that API is easier for the caller. And because the `FooConfig` is already partially validated, there is
    22
    23 ### Proposal
    24
    25 We rename `FooConfigBuilder` to `FooConfig`. The current `FooConfig` type will become a private `foo::ValidatedConfig` type. (The two types will usually still need to be separate because defaulting means the builder ought usually to contain many `Option`, whereas the operational code doesn't want to see those.)
    26
    27 Config validation errors will be reported during client construction.
    28
    29 If desired we could provide a `validate()` method for callers that want to double check a config without trying to instantiate it. Or maybe the "create without bootstrapping" entrypoint will be good for this.
    30
    31 The `Deserialize` impl on configs will be applied to the builder struct (now `FooConfig`) rather than the "built" struct, via `#[builder(derive(Deserialize))]`. (Defaults will not need to be specified to serde, since the fields of the builder are `Option`.)
    • I'm okay with this if it works and doesn't require changes to our current configuration file formats. (Fortunately we have a test for that)

    • Author Maintainer

      I think it shouldn't, because I think serde default and Option work near enough similarly. My hope is that it will bring the API and the config file closer.

      I guess there might be minor differences.

    • Please register or sign in to reply
  • Nick Mathewson
    Nick Mathewson @nickm started a thread on commit ebda5a0a
  • 24
    25 We rename `FooConfigBuilder` to `FooConfig`. The current `FooConfig` type will become a private `foo::ValidatedConfig` type. (The two types will usually still need to be separate because defaulting means the builder ought usually to contain many `Option`, whereas the operational code doesn't want to see those.)
    26
    27 Config validation errors will be reported during client construction.
    28
    29 If desired we could provide a `validate()` method for callers that want to double check a config without trying to instantiate it. Or maybe the "create without bootstrapping" entrypoint will be good for this.
    30
    31 The `Deserialize` impl on configs will be applied to the builder struct (now `FooConfig`) rather than the "built" struct, via `#[builder(derive(Deserialize))]`. (Defaults will not need to be specified to serde, since the fields of the builder are `Option`.)
    32
    33 Since the "built" config is now private, the builder-generated `build` method becomes private. If we want to do complicated defaulting (eg, defaulting that depends on other options) or the like, we may need to hand-roll that (or parts of it).
    34
    35 ## Visibility of `FooConfig` structs
    36
    37 We have decided that `tor-*` crates have much more unstable APIs than `arti-*`. Configuration is exposed via the config file format, and therefore a semantically equivalent view should be avialable via the APIs.
    38
    39 So each `FooConfig` needs to be re-exported by `arti-client`, and semver-breaking changes to those are semver breaks for arti.
    • I believe we do that now; I still think it's a good idea.

      (I assume when you say FooConfig you only mean the tor-* FooConfig structures, not the ones currently in arti-config.)

    • Author Maintainer

      I believe we do that now; I still think it's a good idea.

      (This is apropros re-exporting.) I confess I didn't check. I'm not sure if all the crates are handled the same way, either.

      (I assume when you say FooConfig you only mean the tor-* FooConfig structures, not the ones currently in arti-config.)

      Yes.

    • Please register or sign in to reply
  • Nick Mathewson
    Nick Mathewson @nickm started a thread on commit ebda5a0a
  • 80 {
    81 #[derive(Deserialize)]
    82 struct CombinedConfig {
    83 #[flatten] tor: TorConfig,
    84 #[flatten] rest: T,
    85 }
    86 ... load a CombinedCnfig from the usual Tor config files ...
    87 ```
    88
    89 On the principle of not having important knowledge only entangled in some more complicated machinery, there should probably be a ```fn usual_config_files() -> Vec<PathBuf>``` (or something).
    90
    91 ## Transparency of `TorClientCfg`
    92
    93 I suggest that `TorClientCfg` should be a non-exhaustive but otherwise transparent struct (i.e. with `pub` fields).
    94
    95 The "sections" in it (ie, the sub-struct names and types) are already public, since they are part of the config file format. Restructuring this would be a breaking change regardless of the Rust API. So the current scheme does not add any additional API abstraction - it merely demands boilerplate code.
    • I'm okay with this, though I wouldn't be okay with applying the same logic to the other FooConfig objects. I see below that we're in agreement about that.

      This does mean that we can't easily rename a configuration section without an API break.

    • Author Maintainer

      This does mean that we can't easily rename a configuration section without an API break.

      Yes, but doing that is a config file compat break too, so let's do that very rarely.

    • Please register or sign in to reply
  • Nick Mathewson
    Nick Mathewson @nickm started a thread on commit ebda5a0a
  • 95 The "sections" in it (ie, the sub-struct names and types) are already public, since they are part of the config file format. Restructuring this would be a breaking change regardless of the Rust API. So the current scheme does not add any additional API abstraction - it merely demands boilerplate code.
    96
    97 ### `config` vs `figment`
    98
    99 Currently we use `config`. I looked at others briefly and rather more closely at `figment`. Certainly `figment is a lot richer. I think it might be able to produce better errors. I have to say though that when ended up using a library that used `figment` I found it a bit awkward and confusing.
    100
    101 For now I suggest retaining `config`. At least, the code that knows we're using it will be fairly self-contained.
    102
    103 ## Miscellaneous
    104
    105 * `TorClientCfg`. Suggest renaming to `TorClientConfig`. If we do things right, users won't normally need to type this name.
    106
    107 * Default config file location is currently `./arti-config.toml`. This should
    108 be changed (to XDG dirs probably)
    109
    110 * `arti_defaults.toml`: this duplicates the default settings baked into the code. This is kind of inevitable if we want to supplyy an example, but there is a big risk of divergence. Either (a) there should be a test case that the defaults match the baked-in ones or (b) there should be no baked-in ones and instead things should be read from this file. Also, it is in danger of being taken as an example config file, which is not great if we ever want to change the defaults. Suggest we comment out every setting. (The test case or run-time defaulting will now need to use a mechanically-uncommented versions.)
    • I think that test-case for "(a)", or something like it, exists as arti_config::options::test::default_config.

      (Another purpose of having arti_defaults.toml is to make sure that the defaults in the code match what we think they are, and that the encoding matches what we think it is.)

      I'm okay with making things commented out by default.

    • Author Maintainer

      OK, cool. Thanks for the ref to the test case.

    • Please register or sign in to reply
    • I think I'm good with this design, modulo questions above, though of course I'd also like to see @eta's feedback.

      Three things to think about before proceeding:

      • Is there a way to do any of this incrementally as a series of smaller MRs? Ideally, we'd do the riskiest parts first, so that we can be sure that all of our ideas work out.
      • What example code should we provide? (I think doing some example code for this would be a valuable way to confirm that the API is ergonomic.)
      • How does this interact with the reconfigure() design?
    • Author Maintainer

      Is there a way to do any of this incrementally as a series of smaller MRs?

      There are a number of different things going on here and yes I think they can be split up. The most risky parts are (i) conflating the builder structs with what the API thinks of the config and (ii) the high-level changes about how the files are read. I think (i) can be tried out with one crate first without causing anything other than an API break. (ii) can be done separately from the rest.

      What example code should we provide? (I think doing some example code for this would be a valuable way to confirm that the API is ergonomic.)

      I'm hoping that arti will become even more trivial than it is already :-). arti then becomes the prototypical embedder.

      How does this interact with the reconfigure() design?

      I confess I didn't think about this properly but I don't think there are fundamental problems. I will go and do some more reading and update my proposal.

    • Please register or sign in to reply
  • Nick Mathewson requested review from @eta

    requested review from @eta

  • eta
    eta @eta started a thread on the diff
  • 1 # Configuration of tor-* and arti (API)
    2
    3 *Discussion proposal*
    4
    5 ## Overall philosophy
    6
    7 There should be one coherent schema/data model for our configuration. It should be extensible enough that the arti executable can reuse the machinery used in arti-client, and that other embedders can also use it.
    8
    9 The schema should be reified in serde implementations, our config reader, and builder pattern methods etc. - i.e., all these methods should be equivalent. There shouldn't be different names or semantics depending how the config is fed into our code.
    10
    11 We should make client construction using default config file locations easy (at leasat as easy as construction with hardcoded defaults). That will make it easy for embedders to provide user-tweakable config.
    • Contributor
      Suggested change
      11 We should make client construction using default config file locations easy (at leasat as easy as construction with hardcoded defaults). That will make it easy for embedders to provide user-tweakable config.
      11 We should make client construction using default config file locations easy (at least as easy as construction with hardcoded defaults). That will make it easy for embedders to provide user-tweakable config.
    • Author Maintainer

      This is just a rewrap ?

    • Please register or sign in to reply
  • eta
  • 6
    7 There should be one coherent schema/data model for our configuration. It should be extensible enough that the arti executable can reuse the machinery used in arti-client, and that other embedders can also use it.
    8
    9 The schema should be reified in serde implementations, our config reader, and builder pattern methods etc. - i.e., all these methods should be equivalent. There shouldn't be different names or semantics depending how the config is fed into our code.
    10
    11 We should make client construction using default config file locations easy (at leasat as easy as construction with hardcoded defaults). That will make it easy for embedders to provide user-tweakable config.
    12
    13 We should add some warnings to the docs for configuration settings which shrink anonmity sets. (Or possibly blanket warnings.) This should aim to discourage not only users but also embedders from making poor choices.
    14
    15 ## Validation (startup API simplification)
    16
    17 ### Discussion
    18
    19 Few callers will want to validate the configuration separately from using it to make a Tor client (or running copy of the arti executable). The separation between `FooConfigBuilder` (partially-validated[1] configuration read from file(s) and/or set up with builder pattern methods) and `FooConfig` (validated configuration for Foo) doubles the number of config structs in our API. ([1] I say "partially" because the types of individual fields are right, although their values may be inappropriate.)
    20
    21 This distinction prevents (typestate-style) a programmer from writing code which directly constructs a client from an only-partially-validated config, but a program which does this is not buggy (if we make the validation happen at construction time) - and actually providing that API is easier for the caller. And because the `FooConfig` is already partially validated, there is
  • eta
    eta @eta started a thread on the diff
  • 4
    5 ## Overall philosophy
    6
    7 There should be one coherent schema/data model for our configuration. It should be extensible enough that the arti executable can reuse the machinery used in arti-client, and that other embedders can also use it.
    8
    9 The schema should be reified in serde implementations, our config reader, and builder pattern methods etc. - i.e., all these methods should be equivalent. There shouldn't be different names or semantics depending how the config is fed into our code.
    10
    11 We should make client construction using default config file locations easy (at leasat as easy as construction with hardcoded defaults). That will make it easy for embedders to provide user-tweakable config.
    12
    13 We should add some warnings to the docs for configuration settings which shrink anonmity sets. (Or possibly blanket warnings.) This should aim to discourage not only users but also embedders from making poor choices.
    14
    15 ## Validation (startup API simplification)
    16
    17 ### Discussion
    18
    19 Few callers will want to validate the configuration separately from using it to make a Tor client (or running copy of the arti executable). The separation between `FooConfigBuilder` (partially-validated[1] configuration read from file(s) and/or set up with builder pattern methods) and `FooConfig` (validated configuration for Foo) doubles the number of config structs in our API. ([1] I say "partially" because the types of individual fields are right, although their values may be inappropriate.)
    • Contributor

      So I don't think having builders is actually bad per se, although I've never understood the need to "validate" configuration when using them (this is more of a complaint against derive_builder). An ideal design would reject invalid values when calling the setter method, instead of when calling build() at the end, i.e.:

      Foo::builder().field(...)?.build()

      instead of

      Foo::builder().field(...).build()?

      In a lot of cases, though, I don't think there even are invalid values (and we might be able to constrain those that exist using the type system, so they're catchable at compile time) — which would let us get rid of the need to do this at all.

    • One issue is that there are sometimes settings that are individually allowable, but not mutually permissible. For example, there are some settings that we should only allow you to reconfigure if you are on a nonstandard Tor network (not with the default authorities).

      (This isn't the case for most options, but it does come up.)

    • Author Maintainer

      not mutually permissible

      Right, so the thing with the builder methods necessarily has "invalid" states. I think that is OK. And given that, it is not necessary to completely validate the valid range (say) of every config parameter.

      I guess I'm saying validating individual values early is rather more work and probably doesn't buy very much. I'm not against it but I'm not convinced that it is worth making every setter method on the builder fallible.

    • Please register or sign in to reply
  • eta
    eta @eta started a thread on the diff
  • 15 ## Validation (startup API simplification)
    16
    17 ### Discussion
    18
    19 Few callers will want to validate the configuration separately from using it to make a Tor client (or running copy of the arti executable). The separation between `FooConfigBuilder` (partially-validated[1] configuration read from file(s) and/or set up with builder pattern methods) and `FooConfig` (validated configuration for Foo) doubles the number of config structs in our API. ([1] I say "partially" because the types of individual fields are right, although their values may be inappropriate.)
    20
    21 This distinction prevents (typestate-style) a programmer from writing code which directly constructs a client from an only-partially-validated config, but a program which does this is not buggy (if we make the validation happen at construction time) - and actually providing that API is easier for the caller. And because the `FooConfig` is already partially validated, there is
    22
    23 ### Proposal
    24
    25 We rename `FooConfigBuilder` to `FooConfig`. The current `FooConfig` type will become a private `foo::ValidatedConfig` type. (The two types will usually still need to be separate because defaulting means the builder ought usually to contain many `Option`, whereas the operational code doesn't want to see those.)
    26
    27 Config validation errors will be reported during client construction.
    28
    29 If desired we could provide a `validate()` method for callers that want to double check a config without trying to instantiate it. Or maybe the "create without bootstrapping" entrypoint will be good for this.
    30
  • eta
    eta @eta started a thread on the diff
  • 54
    55 All of the `FooConfig` structs should be made directly `Deserialize` and `Serialize` (subject to serde cargo feature of course). (See above re abolition of `FooConfigBuilder` as a separate public type.) This means that much of the boilerplate field-copying (and conversions) in arti-config can be deleted.
    56
    57 The knowledge of the default config file location(s) and sources should be (exposed) in `arti-client` (not `arti`), with the implication that we hope that most embedders will use it.
    58 (`tor-config` can continue to be the actual implementation of env vars, default path lookup, etc.)
    59
    60 Individual `tor-*` crates will retain their knowledge of their own configuration. arti-config will retain the the knowledge of executable-sepcific config settings (notably logging), and can be reused by shallow embedders if they like.
    61
    62 ### API suggestions
    63
    64 The most simple ones that just read a Tor config should look like this:
    65
    66 ```
    67 impl TorClientCfg {
    68 #[throws] pub fn from_files() -> Self;
    69 #[throws] pub fn builtin_defaults() -> Self;
    • Contributor

      why would this throw / be fallible?

    • Author Maintainer

      from_files definitely needs to be fallible. Perhaps the files can't be read because a fileserver is down or something.

      I guess builtin_defaults doesn't need to be fallible but I don't want to tempt people to call that instead just to avoid having to do error handling :-)

    • Please register or sign in to reply
  • eta
    eta @eta started a thread on the diff
  • 57 The knowledge of the default config file location(s) and sources should be (exposed) in `arti-client` (not `arti`), with the implication that we hope that most embedders will use it.
    58 (`tor-config` can continue to be the actual implementation of env vars, default path lookup, etc.)
    59
    60 Individual `tor-*` crates will retain their knowledge of their own configuration. arti-config will retain the the knowledge of executable-sepcific config settings (notably logging), and can be reused by shallow embedders if they like.
    61
    62 ### API suggestions
    63
    64 The most simple ones that just read a Tor config should look like this:
    65
    66 ```
    67 impl TorClientCfg {
    68 #[throws] pub fn from_files() -> Self;
    69 #[throws] pub fn builtin_defaults() -> Self;
    70 }
    71 ```
    72 (borrowing notation from `fehler` for clarity; actual code will be `-> Result`)
    • Contributor

      This isn't actually clearer whatsoever, IMO: now I have to go and look at what that means ("throwing" is not a very familiar concept in Rust-land). It'd be better if everything just used Result.

    • Author Maintainer

      OK, I was just writing a sketch. I will refrain from speaking in fehler around here in future.

    • Please register or sign in to reply
  • eta
    eta @eta started a thread on the diff
  • 63
    64 The most simple ones that just read a Tor config should look like this:
    65
    66 ```
    67 impl TorClientCfg {
    68 #[throws] pub fn from_files() -> Self;
    69 #[throws] pub fn builtin_defaults() -> Self;
    70 }
    71 ```
    72 (borrowing notation from `fehler` for clarity; actual code will be `-> Result`)
    73
    74 For embedders, `arti-client` should provide a method a bit like this
    75 (name TBD)
    76 ```
    77 #]throws]
    78 pub fn tor_and_caller_config_from_usual_files<T>() -> (TorClientCfg, T)
    • Contributor

      I'm really unsure about this whole "the caller will want to merge their config file format with ours and load them from the same file" idea: it strikes me that almost no serious embedder would want to use such an API.

      What might be more useful is an API that let you make a TorClientCfg from a serde_json::Value or similar.

    • Author Maintainer

      What might be more useful is an API that let you make a TorClientCfg from a serde_json::Value or similar.

      In my proposal TorClientCfg is Deserialize so you can indeed convert it (fallibly) from serde_json::Value.

      I'm really unsure about this whole "the caller will want to merge their config file format with ours and load them from the same file" idea: it strikes me that almost no serious embedder would want to use such an API.

      The arti executable itself wants it. Anyone who wants to do something "vaguely like arti" ought to be provided with the same facility. This is just a consequence of the general API principle that functionalities which people might want to use separately shouldn't be provided only via an API which welds them together.

      Eg, maybe you want a version of arti which is also a web reverse proxy for onion services, so it will be "just like arti but with this extra config stuff for the web routing"

    • Please register or sign in to reply
  • eta
    • Contributor

      I think the intent behind this design doc is generally very good, although I think my ideal vision for what config should look like in Arti differs a bit.

      Here are some general summary points I have of the proposal, as is (in addition to the more specific inline review comments):

      • I agree with @Diziet that it's confusing having separate unvalidated and validated config formats.
        • However, I think the way to solve this is to do away with the entire concept of an unvalidated / incorrect configuration, at the type level.
        • For example, the Deserialize implementation for a configuration type should fail if presented with values of the correct type that are otherwise incorrect.
        • This could be implemented in practice by having a bunch of newtype wrappers, or by using #[serde(deserialize_with)].
      • I'm not so sure how great the derive_builder crate is. Currently, I'm leaning towards the idea that replacing it with a bunch of manual boilerplate might be a good thing in the long run.
        • One specific problem: the fact that build() is fallible, instead of the individual setter methods being fallible. (See the relevant review thread about this.)
        • More generally, though, having that crate codegen a bunch of builder stuff for us that's hard for us to modify / tweak to our specific needs seems like a pretty large downside, especially given we care about API stability / design.
        • Autogenerating documentation also doesn't lead to the best results, either.

      As for how I'd propose improving things myself: I think it could be a nice idea to get rid of the builder pattern and just move to regular old setter methods — instead of starting out with a blank slate and constructing things from there, users would start with a default config and call set_xyz to change things.

      In addition, it might be nice to have a "merge configurations" function (as in, merge A with B, preferring A's settings if they both differ from the defaults, otherwise use the one that isn't the default), which might even replace a lot of what the config crate does for us.

      • However, I think the way to solve this is to do away with the entire concept of an unvalidated / incorrect configuration, at the type level.

      This might be possible, though I'm not sure.

      (It won't do away with the need for validation at reconfiguration time, though, since not all transitions are going to be supported.)

      • I'm not so sure how great the derive_builder crate is. Currently, I'm leaning towards the idea that replacing it with a bunch of manual boilerplate might be a good thing in the long run.

      I'd lean towards this as a "long run" idea rather than a "right now" idea. One of the challenges with manual boilerplate is that it's easier to mess it up: setting the wrong option, or having the wrong return type, for instance.

      Another possibility is to improve derive_builder to meet our needs, by adding support for manual documentation overrides and infallible build() methods. (I think it might already support fallible setters, but I could be wrong there.)

      In addition, it might be nice to have a "merge configurations" function (as in, merge A with B, preferring A's settings if they both differ from the defaults, otherwise use the one that isn't the default), which might even replace a lot of what the config crate does for us.

      It's actually important that we distinguish "has the default value" from "is explicitly set to the default value" here: If we want to apply configuration A on top of configuration B, then it matters whether A has left option X unset, or whether A has explicitly set it to the default value.

    • Please register or sign in to reply
  • Ian Jackson added 1 commit

    added 1 commit

    • 6ba3686a - CONFIG-NOTES: Suggestions from MR review

    Compare with previous version

  • Author Maintainer

    do away with the entire concept of an unvalidated / incorrect configuration, at the type level

    I think this is the most fundamental point here. I agree that making invalid states unrepresentable is generally a good idea.

    But, here, I don't think it is practical or adviseable.

    Firstly, it's not practical because of the point @nickm makes about combinations of settings possibly being invalid. If invalid combinations are unrepresentable then either setter methods would have to be called in a particular order (and, depending, there might not even be a possible order), or we would have to provide multi-setter methods, possibly in multiple combinations.

    Secondly, it is actually useful to be able to represent a possibly-invalid configuration. That allows a program to read a configuration from disk and fix it up programmatically, for example. I think it will be easier to produce good error messages if we can have an in-memory representation of what everything is and where it came from - and if we do more of the validation in our own bespoke code, rather than in a Deserialize callback.

    Stepping back: the principle of making invalid states unrepresentable is great when the invalidity might come from a bug (including a failure to check something). It allows the compiler to spot these bugs and render them statically impossible. But the scheme I propose already has separate types for unvalidated and validated config, so this is already the case within our own code. But at the API level, config data is not a complex data structure that is constructed programatically: it's very data-like, and often comes in from the outside. There is nothing wrong with having an ability to represent unvalidated data if you can be sure not to use it by mistake.

    improve derive_builder

    It's true that my proposal doesn't automatically give us good documentation for our config. The result will be tolerable, but it will leave people inferring the necessary TOML syntax by extrapolating from setter method names and types. Really we want a bespoke doc which describes the TOML format, but which is generated from the doc comments we write in-code (or some other single source). derive_builder can't do that right now. We might need to find or write a tool for this. But I don't think this is on a critical path for 0.1.0. Unlike Deserialize, the use of derive_builder doesn't become part of our API.

    As for the build method being fallible or not: my suggestion is to hide this method entirely. Users would pass in a hopefully-valid TorClientCfg and it would be converted internally to a known-valid representation, using whatever combination of autogenerated and handwritten code is most convenient. Right now that would be a combination of the autogenerated build plus hand-rolled validation.

  • Ian Jackson added 1 commit

    added 1 commit

    • 85ed27f6 - CONFIG-NOTES: finish a sentence

    Compare with previous version

  • eta approved this merge request

    approved this merge request

  • merged

  • eta mentioned in commit 4aa442de

    mentioned in commit 4aa442de

  • Please register or sign in to reply
    Loading