migration of gettor into rdsys

Feel free to rewrite/reorganize the tasks of this ticket. I have no idea of the moving parts on the TPA side and I might be missing many tasks.

mentioned in issue #40692 (closed)

when do you need that actual checklist to be done exactly?

changed due date to June 20, 2022

added Backlog label

changed due date to June 30, 2022

added lifecycle label

mentioned in issue #40795 (closed)

I see June 27th is hackweek. I'll confirm it; I'm hoping to be ready by June 22, but it is OK (and maybe more realistic) to leave it for the week of July 3-7.

mentioned in issue #40669 (closed)

marked this issue as related to #40669 (closed)

so in other words, we should do this by june 22nd, or during that week, or ready for the first week of july?

that seems like, in any case, we should set this up in the week of june 22nd if we want to have a hackweek ourselves. ;)

Yes, I'll do my best to get everything ready for the week of the 22nd. Otherwise it is not a big deal to do it after hackweek; we should enjoy hackweek and not work on this during that week.

The problem is that you can't move the email itself until everything is ready to receive it on the rdsys side. But maybe you can set up the IMAP server and other moving pieces before the email itself is moved to it.

assigned to @anarcat

added Next label and removed Backlog label

changed due date to June 22, 2022

we're heading into vacations at TPA here, so I think it will be best if i do the first part of the setup ASAP, so i'll get started on this the coming week.

> have a smtp server to send email with gettor@torproject.org email address. There is already a working email server in polyanthum used by bridgedb for bridges@torproject.org. Doesn't need to be in the same machine, rdsys has support to do plain auth.

we can just reuse the regular SMTP server on polyanthum here, right?

> have a metrics endpoint for prometheus metrics. https://bridges.torproject.org/rdsys-gettor-metrics? pointed to localhost:7700/metrics

did you already start working on this in the prometheus-alerts repository, or is that something new? and can you handle the prometheus part? (i understand we'll need to deal with the apache part, and i wonder if we shouldn't create a gettor.tpo alias here...)

also: could we run this on a different VM altogether, ie. rebuild a gettor-02? or does that need to be on polyanthum absolutely? i'd love to split out that busy server...

thanks!

> we can just reuse the regular SMTP server on polyanthum here, right?

Right

> have a metrics endpoint for prometheus metrics. https://bridges.torproject.org/rdsys-gettor-metrics? pointed to localhost:7700/metrics

> did you already start working on this in the prometheus-alerts repository, or is that something new?

This is new, and not included in prometheus-alerts (for now it is an MR, tpo/anti-censorship/rdsys!43 (merged)). The previous prometheus modification was for the telegram bot.

> and can you handle the prometheus part?

Not sure what the prometheus part implies. But I don't have rights to modify prometheus.yml, and that will be needed as this exporter will be in a different path.

> (i understand we'll need to deal with the apache part, and i wonder if we shouldn't create a gettor.tpo alias here...)

Be careful, as gettor.tpo is a static website hosted on a different server; the gettor migration does not affect that website.

> also: could we run this on a different VM altogether, ie. rebuild a gettor-02? or does that need to be on polyanthum absolutely? i'd love to split out that busy server...

Yes, we could run it on a different server. rdsys is designed as separate processes that communicate with each other over HTTP. Up to now we have been deploying all rdsys daemons in a single VM and user (right now there are 3: backend, moat and telegram), but we could split them.

The rdsys backend needs to live in polyanthum, as that is where the bridge descriptors live. But all the distributors could live in other VMs or users. We'll need to expose the backend API HTTP endpoint to those VMs. I would like that endpoint not to be publicly reachable over the internet: it has authentication tokens, but it would be nice if not everything breaks if those leak or an issue in our validation is found. I believe this is easy to solve in apache with an allow list, or we could set up some kind of tunnel.

I'm happy to do the deployment in gettor-02 (rdsys-gettor-01 can also work as a name) if you prefer that. I don't think gettor needs anything above the normal VM setup (2GB RAM, 20GB in /srv, ...).

[...]

> and can you handle the prometheus part?

> Not sure what the prometheus part implies. But I don't have rights to modify prometheus.yml, and that will be needed as this exporter will be in a different path.

I meant the patch on prometheus-alerts.

> (i understand we'll need to deal with the apache part, and i wonder if we shouldn't create a gettor.tpo alias here...)

> Be careful, as gettor.tpo is a static website hosted on a different server; the gettor migration does not affect that website.

Oh. Right.

> also: could we run this on a different VM altogether, ie. rebuild a gettor-02? or does that need to be on polyanthum absolutely? i'd love to split out that busy server...

> Yes, we could run it on a different server. rdsys is designed as separate processes that communicate with each other over HTTP. Up to now we have been deploying all rdsys daemons in a single VM and user (right now there are 3: backend, moat and telegram), but we could split them.

> The rdsys backend needs to live in polyanthum, as that is where the bridge descriptors live. But all the distributors could live in other VMs or users. We'll need to expose the backend API HTTP endpoint to those VMs. I would like that endpoint not to be publicly reachable over the internet: it has authentication tokens, but it would be nice if not everything breaks if those leak or an issue in our validation is found. I believe this is easy to solve in apache with an allow list, or we could set up some kind of tunnel.

> I'm happy to do the deployment in gettor-02 (rdsys-gettor-01 can also work as a name) if you prefer that. I don't think gettor needs anything above the normal VM setup (2GB RAM, 20GB in /srv, ...).

Is there an incentive for you to split that up at all? security wise, say?

as for tunnels, i would typically rely on https + auth token. otherwise we can set up ipsec if you don't want to trust the CA cartel. and of course we can slap IP address allow lists on top of that...

rdsys-frontend-01?

could you host all the frontends there, removing more stuff from polyanthum, eventually?

thanks!

...

On 2022-06-20 14:46:23, meskio (@meskio) wrote:

-- Antoine Beaupré torproject.org system administration

> Is there an incentive for you to split that up at all? security wise, say?

Hmm, splitting services between VMs is usually a security improvement. But here the improvement is pretty marginal: right now any distributor with a token basically has access to all bridges anyway. And exposing the backend API to the network might actually be a bigger security problem than what we gain from splitting the services. But hopefully this is something we can improve in the future.

Anyway, separating the bridge descriptors onto a different machine might be a good idea.

> rdsys-frontend-01?

> could you host all the frontends there, removing more stuff from polyanthum, eventually?

Sounds good. For now let's do gettor, as the other frontends will require some changes. But once gettor is working in the new VM I'll figure out the changes needed to move the others, and we can do the full change in the coming months.

I guess it will make sense to create a user per distributor, so now we'll have a gettor user in rdsys-frontend-01.

I'll update the description of this issue adding the VM creation to it.

sweet thanks, i updated the ticket summary to clarify some implementation details so that others can pick that up as they wish.

i am not sure i'll have time to wrap this up before the hackweek, so i'm going to donate this back to the TPA pool, so that someone can handle this from the backlog. makes sense?

I see you changed the metrics URL. Be aware that this rdsys-frontend is going to host several metrics endpoints: right now we are talking about the gettor metrics, but there will be more. So we could use a different domain name for each distributor (https://rdsys-gettor.tpo/metrics, https://rdsys-telegram.tpo/metrics, ...) or we could use the path of the metrics (https://rdsys-frontend.tpo/gettor-metrics, https://rdsys-frontend.tpo/telegram-metrics, ...). But it doesn't make sense to host gettor metrics at https://rdsys-frontend.tpo/metrics.

> i am not sure i'll have time to wrap this up before the hackweek, so i'm going to donate this back to the TPA pool, so that someone can handle this from the backlog. makes sense?

Sounds good. I'll be happy to aim for the first week of July for this deployment; that also gives me some space to not rush things here.

meskio commented on a discussion: #40789 (comment 2815796)

> I see you changed the metrics URL. Be aware that this rdsys-frontend is going to host several metrics endpoints: right now we are talking about the gettor metrics, but there will be more. So we could use a different domain name for each distributor (https://rdsys-gettor.tpo/metrics, https://rdsys-telegram.tpo/metrics, ...) or we could use the path of the metrics (https://rdsys-frontend.tpo/gettor-metrics, https://rdsys-frontend.tpo/telegram-metrics, ...). But it doesn't make sense to host gettor metrics at https://rdsys-frontend.tpo/metrics.

I think i'd rather have one hostname per service so that we don't have to do path-based hacks. We have to intervene for the port redirection anyways, might as well do the full stack properly and then we can simplify Prometheus service discovery, as previously discussed.

> i am not sure i'll have time to wrap this up before the hackweek, so i'm going to donate this back to the TPA pool, so that someone can handle this from the backlog. makes sense?

> Sounds good. I'll be happy to aim for the first week of July for this deployment; that also gives me some space to not rush things here.

Okay, then you'll need to sync up with @lavamind or @kez because i'll be afk for the first two weeks of july. I'm sure they can handle this though.

...

On 2022-06-21 16:20:25, meskio (@meskio) wrote:

-- Antoine Beaupré torproject.org system administration

changed the description

unassigned @anarcat

We'll share binaries and a configuration repo between all distributors. I think it makes sense to set up a global account where we install all the shared stuff, plus one account per distributor so that each one actually runs under a separate user.

changed the description

assigned to @lavamind

changed due date to July 06, 2022

added Backlog label and removed Next label

added Next label and removed Backlog label

changed the description

I'm almost ready to do this migration. I'm doing the last testing of gettor and should be ready to do the deployment next week (I know it is taking me longer than expected).

How can we coordinate that? Can you start the deployment and leave the email address switch for the last moment? I guess the first thing I need is to have the new VM ready so I can start the deployment on my side.

I added a task to the list to expose the rdsys backend in polyanthum; let me know if it is not clear.

meskio commented:

> I'm almost ready to do this migration. I'm doing the last testing of gettor and should be ready to do the deployment next week (I know it is taking me longer than expected).

> How can we coordinate that? Can you start the deployment and leave the email address switch for the last moment? I guess the first thing I need is to have the new VM ready so I can start the deployment on my side.

It's getting a little late for this week (I'm about to head out), but I should be able to give you a VM early monday, remind me if i fail.

> I added a task to the list to expose the rdsys backend in polyanthum; let me know if it is not clear.

that looks ok, although....

> expose the rdsys-backend (localhost:7100/resources-stream) in apache from polyanthum to be reachable from rdsys-frontend-01 (and only from that host)

what do you mean here by "only from that host"? IP-level access control? or password + https? we can do either or both.

...

On 2022-07-22 18:23:38, meskio (@meskio) wrote:

-- Antoine Beaupré torproject.org system administration

> what do you mean here by "only from that host"? IP-level access control? or password + https? we can do either or both.

I think IP-level is fine. The backend uses token-based authentication; I will just feel better having a second layer of protection there.

I didn't say it explicitly, but it must be HTTPS so that the token and the communication content are not exposed to an observer.

Alright, I've set up a rdsys-backend.torproject.org HTTPS vhost on polyanthum for this purpose. Apache2 only allows connections from the frontend node, as required.

marked the checklist item create new rdsys-frontend-01 VM with rdsys and gettor users with sudo for all anti-censorship team. as completed

added Doing label and removed Next label

marked the checklist item expose the rdsys-backend (localhost:7100/resources-stream) in apache from polyanthum to be reachable from rdsys-frontend-01 (and only from that host) as completed

added Next label and removed Doing label

I see rdsys-frontend-01 is already created. rdsys and gettor users are there. Thanks for the work.

I can sudo into gettor, but not into rdsys it gives me this error:

meskio@rdsys-frontend-01:~$ sudo -u rdsys -s
Sorry, user meskio is not allowed to execute '/bin/bash' as rdsys on rdsys-frontend-01.torproject.org.

So I guess I'm missing rights there.

The gettor user doesn't have a home folder created, I will need a place to put files for it (same for rdsys). Can you create one? Or give me a space in /srv/ where I can have files for each user?

Hi @meskio, I added you to the rdsys group in LDAP and created home directories for both gettor and rdsys. HTH

Nice, I see now that both of them have a home folder and a /srv/*.torproject.org folder. But all of them are owned by root and I can't access them. Do you mind making each user own its /srv/*.torproject.org folder and its contents?

Done!

assigned to @anarcat and unassigned @lavamind

i'll try to finish this one this week. it seems like all that's left is to setup the IMAP server and some forwarding. ie.

did i miss anything?

I think that is all, yes.

changed due date to August 18, 2022

marked the checklist item on rdsys-frontend-01, setup a dovecot imap-only mailbox (like gitlab and civicrm) where gettor@torproject.org emails arrive. (gettor@torproject.org emails are currently arriving to gettor-01 and is being sent to gettor over a postfix pipe script), that implies: as completed

marked the checklist item have a smtp server to send email with gettor@torproject.org email address. ~~Doesn't need to be in the same machine, rdsys has support to do plain auth.~~ should just be localhost delivery, make rdsys-frontend-01 a "mailhost" in puppet as completed

changed the description

there's now an IMAP server on rdsys-frontend-01, which receives email directed at gettor@rdsys-frontend.torproject.org. the username of the IMAP account is gettor@rdsys-frontend.torproject.org as well, and the password is on the server, in /home/rdsys/pass. please destroy after reading, or, alternatively, we could populate that thing with puppet so it's kept up to date (if you're going to keep it in cleartext anyways, might as well have it written down reliably).
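One way to confirm the mailbox is reachable from the rdsys side is a small IMAP check. This is only a sketch under assumptions: the helper names are made up for illustration, and whether the server expects TLS on the standard port on localhost is a guess, so the connection object is passed in rather than created here.

```python
import imaplib  # only needed by the caller that opens the real connection
from pathlib import Path


def read_password(path):
    """Return the password stored in a one-line file, without surrounding whitespace."""
    return Path(path).read_text().strip()


def count_unseen(conn, user, password, mailbox="INBOX"):
    """Log in on an imaplib-style connection and return the number of unseen messages."""
    conn.login(user, password)
    try:
        # Open the mailbox read-only so the check does not mark anything as seen.
        status, _ = conn.select(mailbox, readonly=True)
        if status != "OK":
            raise RuntimeError(f"cannot select {mailbox}")
        status, data = conn.search(None, "UNSEEN")
        if status != "OK":
            raise RuntimeError("search failed")
        # data holds one bytes string of space-separated message numbers.
        return len(data[0].split())
    finally:
        conn.logout()


# Example call (assumption: IMAP over TLS on localhost, adjust to the real setup):
#   conn = imaplib.IMAP4_SSL("localhost")
#   user = "gettor@rdsys-frontend.torproject.org"
#   print(count_unseen(conn, user, read_password("/home/rdsys/pass")))
```

Passing the connection in keeps the TLS/port decision out of the helper, so the same function works whichever way the dovecot side ends up configured.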

the next step is to make sure everything works on your side. then i think the next steps are:

- change the gettor@torproject.org forward on eugeni
- have a metrics endpoint for prometheus metrics. https://rdsys-frontend.torproject.org/metrics pointed to localhost:7700/metrics
- remove gettor-01 machine as is not used anymore, needs coordination with anti-censorship team

note that the second bullet point there doesn't seem to work right now:

root@rdsys-frontend-01:/home/rdsys# curl localhost:7700/metrics
curl: (7) Failed to connect to localhost port 7700: Connection refused

so i haven't done that part yet.
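The same check as the curl call above can be scripted, for example to tell "nothing is listening" apart from "exporter is up but exposes nothing". A minimal sketch, assuming the exporter serves the usual Prometheus text format; the function names are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Return the metrics body as text, or None if the exporter is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP errors all end up here.
        return None


def metric_names(body):
    """Extract the set of metric names from Prometheus text exposition format."""
    names = set()
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{labels} value [timestamp]
        head = line.split("{", 1)[0].split(None, 1)
        if head:
            names.add(head[0])
    return names


# Example call against the local exporter port mentioned above:
#   body = fetch_metrics("http://localhost:7700/metrics")
#   print("down" if body is None else sorted(metric_names(body)))
```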

> have a metrics endpoint for prometheus metrics. https://rdsys-frontend.torproject.org/metrics pointed to localhost:7700/metrics

also, it seems like we don't have apache or nginx running on that box at all right now... i wonder if it's worth the trouble of creating a whole web server just to forward that URL... maybe we could use something clever with systemd socket activation instead?

> have a metrics endpoint for prometheus metrics. https://rdsys-frontend.torproject.org/metrics pointed to localhost:7700/metrics

> also, it seems like we don't have apache or nginx running on that box at all right now... i wonder if it's worth the trouble of creating a whole web server just to forward that URL... maybe we could use something clever with systemd socket activation instead?

That might be a solution for now. But afaik the intention with rdsys-frontend is to host many other rdsys frontends, and many of them will expose prometheus metrics (or other HTTP stuff) on a different port. So in the long run we'll need nginx or apache there to act as a reverse proxy.

Thinking about it now, it should not be called https://rdsys-frontend.torproject.org/metrics but https://rdsys-frontend.torproject.org/gettor/metrics, as we'll have other metrics coming from different services in that VM.

> That might be a solution for now. But afaik the intention with rdsys-frontend is to host many other rdsys frontends, and many of them will expose prometheus metrics (or other HTTP stuff) on a different port. So in the long run we'll need nginx or apache there to act as a reverse proxy.

Ah yes, that makes sense.

> Thinking about it now, it should not be called https://rdsys-frontend.torproject.org/metrics but https://rdsys-frontend.torproject.org/gettor/metrics, as we'll have other metrics coming from different services in that VM.

Hmm... how about we point gettor.torproject.org there? the whole point of this is to avoid the annoying suffix change (e.g. /gettor/metrics instead of plain /metrics). or maybe rdsys-gettor.tpo since gettor.tpo is the main website?

> how about we point gettor.torproject.org there? the whole point of this is to avoid the annoying suffix change (e.g. /gettor/metrics instead of plain /metrics). or maybe rdsys-gettor.tpo since gettor.tpo is the main website?

gettor.tpo is a website and I don't know where it is hosted or how it is built. I guess I could learn, but if I can avoid adding one extra task to this migration I'll be happier. rdsys-gettor.tpo sounds good to me.

> gettor.tpo is a website and I don't know where it is hosted or how it is built. I guess I could learn, but if I can avoid adding one extra task to this migration I'll be happier. rdsys-gettor.tpo sounds good to me.

will set that up now.

the https://rdsys-gettor.torproject.org/metrics endpoint works, insofar as it is exposed to the network, but only to the prometheus2 server. it doesn't return metrics yet because port 7700 is not open locally, but i guess you'll fix this eventually?

i'll go ahead and add it to the scrape targets.

https://rdsys-gettor.torproject.org/metrics is now being scraped.

changed due date to August 24, 2022

gettor@rdsys-frontend-01:~$ XDG_RUNTIME_DIR=/run/user/$(id -u) systemctl --user
Failed to connect to bus: No such file or directory

Looks like the gettor user doesn't have rights to access systemd on rdsys-frontend-01. Can I get those rights enabled?

> Looks like the gettor user doesn't have rights to access systemd on rdsys-frontend-01. Can I get those rights enabled?

it looks like lingering wasn't enabled for that user, i have enabled that which requires a reboot (or possibly some reload but i can't be bothered to figure out the specifics right now), scheduled for 10 minutes from now.
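For reference, lingering is normally turned on with `loginctl enable-linger <user>`, and systemd-logind records it as one file per user under `/var/lib/systemd/linger`. The sketch below only checks that state plus the expected location of the user bus socket (`$XDG_RUNTIME_DIR/bus`); the helper names are made up for illustration and nothing here changes system configuration.

```python
from pathlib import Path

# systemd-logind keeps one marker file per lingering user in this directory.
LINGER_DIR = Path("/var/lib/systemd/linger")
# Per-user runtime directories (XDG_RUNTIME_DIR) live under this directory.
RUN_USER_DIR = Path("/run/user")


def linger_enabled(user, linger_dir=LINGER_DIR):
    """Return True if a lingering marker file exists for the given user name."""
    return (Path(linger_dir) / user).exists()


def user_bus_path(uid, run_user_dir=RUN_USER_DIR):
    """Return the path where the user's D-Bus socket is expected to be."""
    return Path(run_user_dir) / str(uid) / "bus"


def user_bus_present(uid, run_user_dir=RUN_USER_DIR):
    """Return True if the user bus socket path exists, i.e. systemctl --user can connect."""
    return user_bus_path(uid, run_user_dir).exists()
```

A "Failed to connect to bus: No such file or directory" error like the one above is what you would expect when `user_bus_present` is false for that uid.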

> gettor@rdsys-frontend-01:~$ XDG_RUNTIME_DIR=/run/user/$(id -u) systemctl --user

this is now fixed.

Yep, it works nicely now. Thanks.

I just noticed I made a mistake: I asked to configure apache in polyanthum to expose /resources-stream, while the path is /resource-stream (without the plural). Can you remove the s in the apache configuration so I can reach the right place? Sorry for the mistake.

> I just noticed I made a mistake: I asked to configure apache in polyanthum to expose /resources-stream, while the path is /resource-stream (without the plural). Can you remove the s in the apache configuration so I can reach the right place? Sorry for the mistake.

that is now done.

It works, but there is a 30s timeout. The frontend maintains a constant connection there. Can we raise or disable this timeout for that domain name?

I created an issue for this: #40876 (closed)
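To illustrate why a 30s proxy timeout hurts here, this is roughly what a client holding a long-lived streaming connection looks like. It is a sketch only: the `Authorization: Bearer` header format and the line-oriented framing are assumptions for illustration, not a description of the actual rdsys protocol.

```python
import urllib.request


def build_stream_request(url, token):
    """Build a GET request carrying the API token (header format is an assumption)."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def read_stream(resp, handle, max_lines=None):
    """Pass each non-empty line of a long-lived response to handle().

    Returns the number of lines handled. The loop ends when the peer (or a
    proxy in between) closes the connection, or after max_lines lines.
    """
    count = 0
    for raw in resp:
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        handle(line)
        count += 1
        if max_lines is not None and count >= max_lines:
            break
    return count


# Example call (the token value is a placeholder):
#   req = build_stream_request("https://rdsys-backend.torproject.org/resource-stream", "TOKEN")
#   with urllib.request.urlopen(req, timeout=None) as resp:
#       read_stream(resp, print)
```

With `timeout=None` the client itself never gives up on a quiet connection, so if the loop still ends after about 30 seconds of silence, the cut is coming from the proxy in front of the backend.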

BTW imap and smtp servers work fine. I'm testing them and getting and sending emails.

> BTW imap and smtp servers work fine. I'm testing them and getting and sending emails.

okay well, i've checked every item on my list but this. i don't see a metrics exporter on :7700/metrics so I'm not 100% sure the service is running right now.

i will therefore hold off until you give me confirmation that everything is okay before flipping the switch on the gettor@ email. once that's done, i'll start the retirement procedure for gettor-01 and this ticket will finally be done!

sorry for the delays.

The service was off, but it is on now and the metrics seem to work fine: https://rdsys-gettor.torproject.org/metrics

changed the description

mentioned in commit prometheus-alerts@59f346c4

marked the checklist item have a metrics endpoint for prometheus metrics. ~~https://rdsys-frontend.torproject.org/metrics~~ rdsys-gettor.torproject.org/metrics pointed to localhost:7700/metrics as completed

changed the description

added Doing label and removed Next label
