Over the years folks have put all sorts of information into the contact information field. When working on arti relay we should be more strict than that. I think we should only allow an email address and ignore all the other stuff people try to put there. We have recently seen scammers that abuse that field for lack of any other usable one and I think with arti relay it's time to prevent that from happening in the future.
EDIT (05/30/2023): However, to be clear, this is not the only reason for this ticket. See e.g.: #870 (comment 2906733).
I have no strong opinion on whether we should allow some kind of obfuscation mechanisms like [@] instead of a plain @ but it could be something to consider.
If ContactInfo gets more restrictions, I think it should still accept more than just an email. Many operators have migrated from a simple email to something more structured like nusenu's ContactInfo-Information-Sharing-Specification, at least to get an AROI.
I think an email is all we need to get into contact with relay operators. If we need more fields or want to have more of them we should create new ones and not overload ContactInfo.
I think the benefit of obfuscation here is an illusion. When you look at internet databases such as RIPE's whois database, they all contain email addresses. People have spam filters in place to take care of spam received from scraped data.
I think being able to just copy a list of email addresses when you want to reach people is a good idea. Additionally, having a properly formatted email would allow us to automate email validation for relays and potentially later assign flags based on criteria such as the given relay having a valid, reachable email, similar to how many websites handle registration today.
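A minimal sketch of what the purely syntactic part of such a check could look like, in Python for illustration only. The name `is_plain_email` and the deliberately conservative pattern are assumptions, not anything taken from Arti or the spec, and checking that the address is actually reachable would need a separate confirmation step that this does not attempt.

```python
import re

# Hypothetical, deliberately conservative pattern: one local part, one "@",
# and a dotted domain. It is not a full RFC 5322 parser.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_plain_email(value: str) -> bool:
    """Return True only if the whole field is a single plain email address."""
    return _EMAIL_RE.fullmatch(value.strip()) is not None
```

With a check like this, a field holding only `tor@sebastianhahn.net` would pass, while a field that mixes in a PGP fingerprint, a name, or an obfuscated `[@]` would be rejected.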
I think a change like this might belong in a spec proposal? IOW, you're planning to change a freeform text field into a field with MUST semantics.
But when you say...
> We have recently seen scammers that abuse that field for lack of any other usable one
Adding a new field to a descriptor is a one-line patch to the C codebase, and it won't get harder to do in Arti. Is the problem here that the contact field in particular gets imported into some public webpage for SEO purposes, or is there something else going on?
I agree with @nickm that adding a new EmailAddress torrc option would be better than changing the semantics of ContactInfo, since it avoids introducing a backward-incompatible breaking change.
@nusenu I think you misunderstand the context of this ticket. The suggestion here is that Arti's upcoming configuration format enforces an email address as contact information when we begin the work on Arti relays -- we are not changing anything to C Tor's torrc here.
Since we will eventually have to transition C Tor nodes to Arti nodes, it would be good to clean up some of the mess that poor config examples have made of ContactInfo in C Tor over the years. As shown below, not even the directory authorities have a common structure here.
Having computer readable email addresses would eventually allow us to, for example, require valid email addresses to receive certain flags in the network -- this is a net win for the safety of the Tor network.
In addition to that, it would make it a whole lot easier for our bad relays people and network health folks to reach out in bulk to relay operators.
If we need to add other semantically different concepts to our consensus documents, we can surely do that, but I fail to see why they belong as contact information.
> The suggestion here is that Arti's upcoming configuration format enforces an email address as contact information when we begin the work on Arti relays -- we are not changing anything to C Tor's torrc here.
@ahf It is clear to me that this issue #870 is arti specific, it was not clear to me that the proposal tpo/community/relays#71 (moved) is also arti specific.
Does that mean that descriptor lines in C tor and future arti tor are not consistent in their semantics?
> Having computer readable email addresses would eventually allow us to, for example, require valid email addresses to receive certain flags in the network -- this is a net win for the safety of the Tor network.
Once syntactically valid email addresses are a requirement, would the Tor Project verify these addresses on an ongoing basis? And at what frequency?
> @ahf It is clear to me that this issue #870 is arti specific, it was not clear to me that the proposal tpo/community/relays#71 (moved) is also arti specific.
Let's keep this ticket here for the work that needs to be done in arti and tpo/community/relays#71 (moved) for the discussion that helps to shape the proposal.
That said the proposal is not necessarily arti-specific while this ticket is. We might file a ticket later on for getting a similar change into C Tor if we think that would be needed.
> Does that mean that descriptor lines in C tor and future arti tor are not consistent in their semantics?
That's not clear yet. I could see a future where we make the change proposed in this ticket solely in arti, after C Tor relays are gone (EOL), or one where we adapt C Tor as well.
It is properly a spec change, but I don't think it is something we need to act on in C Tor. Generally, the UX around the ContactInfo field in torrc's examples is currently quite awkward.
For gabelmoo it is actually "4096R/261C5FBE77285F88FB0C343266C8C2D7C5AA446D Sebastian Hahn tor@sebastianhahn.net - 12NbRAjAG5U3LLWETSF7fSTcdaz32Mu5CN" which is pgp, name, email, btc address - there used to be a project which allowed tipping relay ops, and this was a nice way to do it. I think it's completely obsolete today, but it was cool to allow this while it was a thing.
If there is any redesign here, could there potentially be a design where the email address can be submitted to dirauths and submitted to a network-health team but not be widely available? Perhaps through a mandatory extra-info descriptor style thing. A big reason not to put the mail address there is spam, so maybe that's worth it somehow?
I do not know if it's worthwhile, but I am entirely open to the idea. My impression was that the email is occasionally used for social purposes too, where non-TPO people are involved.
I am using my personal email spelled out entirely in our consensus documents, and I do not believe that it has negatively impacted the amount of spam I receive, but it is probably something to think about here. That is a larger project than what is described here though, and probably something the Network Health Team and relay community could figure out.
Please do not make the email address private and only accessible to the tor project.
We found it annoying to be unable to contact fellow relay operators in the past. This was especially an issue during the DDoS when we wanted to reach a few relay operators that had strange connection rates to our exit relays.
@applied_privacy Yeah, I would think it being visible is ideal for everybody as well - especially given how "cheap" it is to create an alias to filter on.
Leaving the old ContactInfo as it is and a new EmailAddress in Arti seems to be the most practicable.
Because of: (and make it mandatory)
For the skeptics it should be mentioned: A working email address doesn't have to be an authenticated address. (So it's ok to use: riseup.net, systemli.org, cock.li, ...)
The people who complain about too much spam, I think they also use the email address in ContactInfo somewhere else. My daily spam abuse ratio is: 500-1000 auto-abuse from fail2ban and .*copyright-notice.com to 1-2 spam mail. My abuse@ and admin@ address is non-obfuscated in ContactInfo.
As long as the Tor Project introduces a replacement freeform field for AROIs etc. before locking down ContactInfo, this seems fine.
This is currently not planned and I doubt we'll get back to such a free form field, see the description of this issue for some reasoning. (Let's keep general remarks and discussion for tpo/community/relays#71 (moved) and leave this ticket just for the arti implementation)
> We have recently seen scammers that abuse that field for lack of any other usable one
>
> Adding a new field to a descriptor is a one-line patch to the C codebase, and it won't get harder to do in Arti. Is the problem here that the contact field in particular gets imported into some public webpage for SEO purposes, or is there something else going on?
We believe we have cryptocurrency scammers in the network that make use of the contact information field and I got told they explicitly needed that free form field because all the other ones had a too strict requirement for their purposes. We should raise the bar here considerably (and an email address is all we need as a means of contacting operators anyway).
So, it's not so much about adding a new field to a descriptor as about locking down the old one. You can think of it as making ContactInfo behave the way the man page says it behaves. :) (With the slight change that it has to be set even if someone is running just one relay.)
> Adding a new field to a descriptor is a one-line patch to the C codebase, and it won't get harder to do in Arti. Is the problem here that the contact field in particular gets imported into some public webpage for SEO purposes, or is there something else going on?
>
> We believe we have cryptocurrency scammers in the network that make use of the contact information field and I got told they explicitly needed that free form field because all the other ones had a too strict requirement for their purposes. We should raise the bar here considerably (and an email address is all we need as a means of contacting operators anyway).
I meant to talk about the fact that the spec explicitly allows unrecognized fields. Do you have a plan if the scammers start adding an X-Additional-Information: header? Or is there some reason why ContactInfo is particularly problematic but a user-added additional information field wouldn't be? (I've got some guesses here, but they don't make a lot of sense.)
If they're deriving money from their scams, they can presumably find a programmer who knows enough C or Rust to add a field to a router descriptor.
Now, I guess we could forbid all unrecognized fields, but that's a much bigger change than any proposed here so far, and it would make a lot of our current forward-compatibility practices hard to continue. (And for what it's worth, there are a few other ways that free-form information could be added without violating the current specs or adding new fields.)
I have not thought about what happens if scammers start adding additional fields/headers. I am not too worried about that part, though, as I think the proposed contact information change raises the bar considerably and we could probably deal with other creative means of trying to scam via the Tor network during our bad relay work.
That said, a thing I forgot to mention in the original description: this ticket is motivated not solely by scammers showing up more or less recently but, to a large extent, by obstacles in our day-to-day workflow. More and more work (be it EOL outreach, bad relay work, or other network health topics) relies on reaching out to operators via email. Because ContactInfo is a free-form field where anyone can put in basically anything, that is way harder than it should be: folks end up writing custom parsers for the field, which still fail for a non-negligible fraction of entries, and those then have to be deciphered by hand. IMO there should be no need for us to go to such great lengths just to, e.g., reach out to operators to get issues with their relays fixed.
I agree, making it easier to get the email addresses is going to be an improvement. I wrote this parser some time ago, but it fails for many cases: https://gitlab.torproject.org/-/snippets/175
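To illustrate why this kind of parsing stays fragile, here is a rough best-effort extractor in Python. It is not the linked snippet; `extract_emails` and the short list of obfuscation spellings it undoes are assumptions made up for this example.

```python
import re

# Hypothetical, conservative address pattern (not a full RFC 5322 parser).
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# A few common obfuscation spellings; real ContactInfo values use many more.
_AT_VARIANTS = re.compile(r"\s*(?:\[at\]|\(at\)|\[@\]|\(@\))\s*", re.IGNORECASE)
_DOT_VARIANTS = re.compile(r"\s*(?:\[dot\]|\(dot\))\s*", re.IGNORECASE)


def extract_emails(contact_info: str) -> list[str]:
    """Best-effort: undo known obfuscations, then collect address-like tokens."""
    text = _AT_VARIANTS.sub("@", contact_info)
    text = _DOT_VARIANTS.sub(".", text)
    return _EMAIL_RE.findall(text)
```

This handles a value like the gabelmoo one (fingerprint, name, email, BTC address) and spellings such as `abuse [at] example [dot] org`, but returns nothing for `someone at example dot org`, which is exactly the kind of entry that ends up being deciphered by hand.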
I think the contactinfo should be an email address, and if we care about obfuscation that should be done in the metrics side.
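As a sketch of what display-side obfuscation could mean, assuming the descriptor carries a plain address and only the rendering layer rewrites it (the function name and the `[@]` style are illustrative choices, not an existing metrics feature):

```python
def obfuscate_for_display(email: str) -> str:
    """Rewrite a plain address for rendering; the stored value stays untouched."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        # Not an address at all: return it unchanged rather than guess.
        return email
    return f"{local}[@]{domain}"
```

The point of doing it this way is that tooling keeps a machine-readable address while web pages that scrapers see show the obfuscated form.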