We just finished the GSoC project to re-write Tor Weather and we would like to deploy the result now with weather.torproject.org as the respective domain.
@sarthikg: could you make a list of things we'd need to have available on a new machine/in a new VM to have Tor Weather running properly? And maybe some spec that VM/machine? Thanks!
In terms of the machine, there are no specific requirements for cores & RAM. We will, though, need a database to persist some data. It would be great if the database could be a separate instance altogether, which should ease maintenance and scalability in the future.
For the database, anything above 15 GB should be good to go.
Apart from this, one major requirement is sending emails. This can be provided either by opening communications on port 25, or through a token from one of the email service providers.
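For the provider-token route, a minimal sketch of how the app might submit mail over an authenticated relay (the hostname, port, credentials, and sender address below are placeholders, not real infrastructure):

```python
import smtplib
from email.message import EmailMessage

def build_notification(recipient: str, subject: str, body: str) -> EmailMessage:
    # assemble the notification mail; the sender address is an assumption
    msg = EmailMessage()
    msg["From"] = "weather@torproject.org"
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_notification(msg: EmailMessage) -> None:
    # submit via an authenticated relay on the submission port (587);
    # host and token are placeholders for whatever provider is chosen
    with smtplib.SMTP("smtp.example.net", 587) as smtp:
        smtp.starttls()
        smtp.login("weather", "PROVIDER_API_TOKEN")
        smtp.send_message(msg)
```

The same `send_notification` shape would work against port 25 on a local relay, minus the `login` call.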
I think running the DB on a separate instance would be a bit overkill. Moving it out to a separate VM in the future shouldn't take too much effort.
Using a transactional mail server instead of directly sending from the machine would be best, especially considering our recent mail deliverability problems.
> I think running the DB on a separate instance would be a bit overkill. Moving it out to a separate VM in the future shouldn't take too much effort.

Agreed.

> Using a transactional mail server instead of directly sending from the machine would be best, especially considering our recent mail deliverability problems.
Somehow I missed the train where "transactional mail" was invented, so
maybe someone can fill me up on that?
Typically right now we don't submit mail from other servers to the
submission.tpo server. So:
> I wonder whether it would be possible to hook the mail delivery into
> our submission.tpo system [...]
... the answer to that is "no, we can't relay mail through
submission.tpo from another server, it's specifically designed for
end-users".
Now a server can do one of two things:

1. send mail on its own (e.g. GitLab, CiviCRM, RT, and others do this)
2. relay mail through the central server (eugeni, every other server does this, that's the default configuration)
The reason why standalone mail delivery was enabled on servers in case 1
above is to isolate problems. We were having deliverability issues on
eugeni (case 2) and we wanted to solve the problem one step at a
time....
> so we have an actual chance of getting our mail delivered (+ not landing in spam folders).
Well that's the name of the game isn't it. :)
What email are we talking about here? internal mail? or gettor-like
"email the world" kind of problem?
...
On 2022-09-20 20:34:53, kezzle wrote:
--
Antoine Beaupré
torproject.org system administration
> Somehow I missed the train where "transactional mail" was invented, so maybe someone can fill me up on that?
from mailchimp:
> A transactional email is an email that is sent to an individual recipient following a commercial transaction or specific action performed by that person, such as a purchase in your connected store or a password reset request
> does this replace an existing service or is it an entirely new thing?
It does not replace anything currently running but it replaces something we ran a couple of years ago. It might still be an entirely new thing as far as you are concerned. :)
> Using a transactional mail server instead of directly sending from the machine would be best, especially considering our recent mail deliverability problems.

> I wonder whether it would be possible to hook the mail delivery into our submission.tpo system so we have an actual chance of getting our mail delivered (+ not landing in spam folders).

> Well that's the name of the game isn't it. :)

> What email are we talking about here? internal mail? or gettor-like "email the world" kind of problem?
I guess we mean the latter: we need a system that can get notifications to real users in case X happens.
Which reminds me of another part of this new service: it'll contain a database with user information (email addresses). I am not sure if TPA has a procedure for setting that part up (given that it is potentially PII) and safeguarding. Either way it might be something we want to take into account.
We do not, as far as I know, have any procedure regarding this other
than "do not do it". In other words, we consider the only safe way to
handle PII is to not hold on to it at all.
There are exceptions: RT holds user email addresses, and CiviCRM as
well. The latter is protected by a double-layered, ungodly mess of
middleware, ipsec, redis and php duct tape, an approach which I might
not recommend in this case.
Why do you need to keep user's email addresses anyways? :)
> We do not, as far as I know, have any procedure regarding this other than "do not do it". In other words, we consider the only safe way to handle PII is to not hold on to it at all.
Fair enough. I like that sentiment. :)
> There are exceptions: RT holds user email addresses, and CiviCRM as well. The latter is protected by a double-layered, ungodly mess of middleware, ipsec, redis and php duct tape, an approach which I might not recommend in this case.

> Why do you need to keep user's email addresses anyways? :)
Well, if you have a better idea how to email users about their relays being down without keeping email addresses they used for subscribing to the service, I am all ears. :) @sarthikg had the idea of splitting up the db and application so they do not sit on the same server, which seems like a good idea to me. I guess that complicates the setup and I am sorry for that, but I guess that gives us the option to have a safer deployment, which is a thing we want.
> Well, if you have a better idea how to email users about their relays being down without keeping email addresses they used for subscribing to the service, I am all ears. :)
I can think of many options.
Users could hold a token and use it to check themselves when a problem
occurs, on a dashboard.
We could support web hook notifications, or other kinds of notifications
that do not include some PII token.
I don't have any idea involving email notification without actually
having the user's email addresses, of course.
> @sarthikg had the idea of splitting up the db and application so they do not sit on the same server, which seems like a good idea to me. I guess that complicates the setup and I am sorry for that, but I guess that gives us the option to have a safer deployment, which is a thing we want.
I'm not sure just splitting the DB out gives us much benefit. If the
frontend is compromised, it has access to email addresses anyways,
right?
One split that could be worthwhile would be to have a monitoring
service that fires alerts using an opaque token, and a separate backend
that would map that token into an email address, to reduce the attack
surface area. Maybe the backend could even encrypt the PII in the
database with said token, for example.
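The monitoring/backend split described above could be sketched like this (a toy illustration, not the actual Weather codebase; all class and method names are made up):

```python
import secrets

class NotifierBackend:
    """Holds the only token -> email mapping; this is the PII boundary."""
    def __init__(self):
        self._emails = {}  # opaque token -> email address

    def subscribe(self, email: str) -> str:
        # hand an opaque token back; only this token leaves the backend
        token = secrets.token_urlsafe(16)
        self._emails[token] = email
        return token

    def notify(self, token: str, message: str) -> bool:
        email = self._emails.get(token)
        if email is None:
            return False
        # here we would hand (email, message) off to the mail relay
        return True

class Monitor:
    """Watches relays; only ever sees opaque tokens, never addresses."""
    def __init__(self, backend: NotifierBackend):
        self._backend = backend
        self._watch = {}  # relay fingerprint -> token

    def register(self, fingerprint: str, token: str) -> None:
        self._watch[fingerprint] = token

    def relay_down(self, fingerprint: str) -> bool:
        token = self._watch.get(fingerprint)
        return bool(token) and self._backend.notify(
            token, f"relay {fingerprint} appears down")
```

A compromise of `Monitor` then leaks tokens but no addresses; encrypting each address under a key derived from its token would tighten this further.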
But that's getting into the weeds of "TPA is telling you how to do your
job" here.. not sure it's a good direction to take, especially if you
have a finished product you want to ship. :)
I was mostly curious about how things are setup and why you need PII. If
you do need it, let's just do this and hope for the best, and document
in the service page (we will have a service page, right?) that we do
have PII. :)
No block on my end, in any case.
A.
...
On 2022-09-23 08:37:53, Georg Koppen wrote:
To be fair, during development I didn't think through the safeguarding of PII. But as this conversation has progressed, I think we might have to rethink the architecture of the application to give major consideration to data security.
I might be wrong here, but I think the approach suggested by @anarcat regarding email tokenisation is good, though it should only add obscurity compared with encrypting the data on the host server itself.
@anarcat, storing email addresses is the backbone of tor-weather, as it's a service built around notifying people in case some anomaly is detected with their relays. The only alternative is a dashboard where they can log in to check, which, if you think about it, is just something they can do today.
Though I'll do some research on how we can safeguard the PII & discuss it with @gk, following which I'll get back here with the finalised plan.
typically, the DB would be in /var... the OS installs on that first 10G
and you bind mount stuff out of /var into /srv, which is that 50G. but
yes, that looks okay.
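The layout described here might look like the following in practice (paths are the usual Debian defaults; treat this as a sketch, not a tested runbook):

```shell
# stop the service before moving its data out of /var
systemctl stop postgresql

# move the data onto the large /srv volume, then bind-mount it back
mkdir -p /srv/postgresql
mv /var/lib/postgresql/* /srv/postgresql/
echo '/srv/postgresql /var/lib/postgresql none bind 0 0' >> /etc/fstab
mount /var/lib/postgresql

systemctl start postgresql
```

The application keeps using its normal `/var/lib/postgresql` path while the bytes actually live on the bigger disk.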
...
On 2022-10-14 18:47:47, kezzle wrote:
kezzle commented:
@anarcat it sounds like this machine doesn't need a /srv disk, just a /var for the DB. how do these parameters look for the new instance?
@anarcat what would be the right nagios hostgroups for this? based on the other hosts i've got computers, syslog-ng-hosts, hassrvfs, nginx-hosts, nginx-https-hosts
we've deployed a Debian stable VM. that ships with:
Python 3.9.2
PostgreSQL 13.8.0
Python 3.11 and PostgreSQL 15 were released just in the last month; the odds of us deploying those are pretty thin at this point.
That said, we're looking at upgrading the fleet to bookworm within early 2023. We might be able to upgrade you as a guinea pig, if you're interested. That would mean (currently):
Python 3.10.6
PostgreSQL 15.0
I would assume that Python 3.11 will enter bookworm shortly as well.
> Nginx (I think this should be already there)
I do not believe Nginx was installed already. Do you have specific
version requirements for those as well? Debian stable has 1.18.0 and
bookworm (testing) has 1.22.0.
> These are the base requirements, we might have some additional dependencies in the future though...
Let us know.
> Is it possible to use Docker for deployments?
> That way, we won't require anything on the system apart from the docker server, and probably some network configurations.
Our policy on this is not super solid at this point. Typically, we don't
use Docker outside of special deployments managed by TPA. The current
exceptions are GitLab CI, BTCPayserver and Dangerzone, all services
directly managed by us, and where Docker is in use because, really,
nothing else would work.
So I would rather stick with a regular deployment if you could live with
our rather stringent requirements for now. :)
Thanks, and sorry for the trouble,
...
On 2022-10-29 12:27:56, sarthikg (@sarthikg) wrote:
... so maybe this is something we could consider, but keep in mind
it's a significant effort on our end. I think at this point I would
probably rather keep it a simple SSH-based deployment unless you have
a very strong incentive to deploy this through GitLab.
Also, be aware that I'm considering, long term, using things like
kubernetes or at least some container-based deployment system through
GitLab to replace this classic "you get a VM and a shell" deployment
approach. This is, for now, just at the "thought in my head" step, far
from any proposal or actual work.
So, for now, yeah, probably just SSH...
...
On 2022-10-31 14:46:30, sarthikg wrote:
Ahh, got it! Just had one question in my mind coming from this: where do we have our servers? AWS, GCP, Azure maybe? I think all of them have CLIs available with which we can automate the deployments using some sort of access token or IAM users.
In case we use their CLIs, even automatic deployments directly from GitLab should not be that big of an effort, since security is mostly offloaded if we do things the "right way". But if the instance does not belong to one of these giants, I think manual deployments should be the way to go.
> Ahh, got it! Just had one question in my mind coming from this: where do we have our servers? AWS, GCP, Azure maybe? I think all of them have CLIs available with which we can automate the deployments using some sort of access token or IAM users.
We basically run our own private cloud, which are (currently) hosted at
multiple locations including Hetzner, Sunet, and Cymru.
> In case we use their CLIs, even automatic deployments directly from GitLab should not be that big of an effort, since security is mostly offloaded if we do things the "right way". But if the instance does not belong to one of these giants, I think manual deployments should be the way to go.
I suspect you underestimate the amount of legacy stuff we are dealing
with here. :) Happy to expand if you're really curious, but this is
getting pretty out of scope.
Manual deploys it should be, I think...
a.
...
On 2022-10-31 15:23:35, sarthikg wrote:
@sarthikg if you send me an nginx config, i'll add it to the machine. i think that's one of the last things i need for this, and then it's double-checking that everything looks right!
for this, maybe look at how rdsys or probe telemetry was set up. the
latter, in particular, was set up with an nginx proxy.
also, if we need just a http(s) proxy, we would typically use Apache for
that, because we use that everywhere, with exceptions (like probe
telemetry, because of websocket).
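For reference, the kind of minimal Apache reverse-proxy vhost this implies might look like the following (the backend port and certificate paths are assumptions, and `mod_ssl` plus `mod_proxy_http` would need to be enabled):

```apache
<VirtualHost *:443>
    ServerName weather.torproject.org

    SSLEngine on
    # certificate paths are placeholders
    SSLCertificateFile    /etc/ssl/example/weather.crt
    SSLCertificateKeyFile /etc/ssl/example/weather.key

    # forward everything to the application, assumed to listen locally on 8000
    ProxyPreserveHost On
    ProxyPass        / http://127.0.0.1:8000/
    ProxyPassReverse / http://127.0.0.1:8000/
</VirtualHost>
```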
Yeah, I haven't worked with Apache before. But since it seems the only reason you used Nginx was a websocket requirement, and we don't particularly have a use-case for that, I think Apache should be good to go!
@kez it looks like you forgot to change the parentHost, purpose, etc. in the LDAP entry:
```
anarcat@curie:~$ ssh weather-01.torproject.org
Linux weather-01 5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64
This device is for authorized users only.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Welcome to weather-01.torproject.org, used for the following services:
 XXX
 This host is also accessible under the following aliases:
 XXX
 This virtual server runs on the physical host .
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
```
i removed the nginx nagios checks from weather-01. i added them without really thinking about whether they were needed, and they're being extremely noisy right now. i'll add them back in later if nginx ends up getting installed on the box.