We want to set up a staging server for rdsys that will be automatically deployed from the CI on each commit. We'll need a new VM for it. Many things will be similar to polyanthium, but we might not need to separate services by user (it might be easier to deploy if we have everything under one user).
We will not need much disk space, CPU or RAM; whatever defaults you use nowadays will be enough for us.
What we need there:
an account that the CI can SSH into automatically to set everything up (let's postpone this)
We'll also need everybody from anti-censorship to be able to sudo into that account.
an email account that can send and receive email over IMAP and SMTP. Maybe gettor-tst@torproject.org?
I don't have a strong opinion on the port numbers, the domain names or the email address; I'm just putting some proposals here, but I'm happy to adapt to whatever you think makes sense.
I'm going to be mostly AFK until September; it would be great if we manage to get this VM working so I can start setting it up once I'm back. I'll try to keep an eye on this issue while I'm AFK so you are not blocked, but it might sometimes take days.
an account that we can ssh automatically from the CI to setup everything. We'll also need everybody from anti-censorship to be able to sudo into that account.
so both CI and users will need access to this. what's the story for "oops, jane destroyed the test server", you rebuild it?
in other words, how's deployment from scratch done here? is that something you do in CI or something we help you do in Puppet, or a mix?
tst looks weird to me, can we make that test or staging, since, well, it's a staging server?
I don't think we'll connect them to the prometheus server
why not?
otherwise, we're a bit crammed right now but i'll try to squeeze this in August, no promises. but i'm happy you're looking at staging, i think it's a great idea!
@kez can you share your experience of setting up a dev server for donate? how would you do it here?
in general, this gets us closer to doing "continuous deployments" in GitLab, which is, in theory, designed for this. we don't have much experience with that, but @lavamind did work extensively on environments for the static site deployments, so we at least have that running in prod. it's not a long-running server process however (unless you count apache, but that's managed by us/puppet, not GitLab)...
would help tremendously if you have existing examples of other projects deploying such staging servers, otherwise we'll look into it of course.
so both CI and users will need access to this. what's the story for "oops, jane destroyed the test server", you rebuild it?
in other words, how's deployment from scratch done here? is that something you do in CI or something we help you do in Puppet, or a mix?
I don't have it all sorted out in my head yet, but my first idea is to do it from the CI. Basically, the CI deletes some folders and replaces binaries, configuration and things like that. The idea would be to have a script for that; I was hoping not to get into Puppet.
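A minimal Python sketch of the kind of script described above, assuming the CI job has already built the rdsys binaries and rendered the configuration into local directories. The managed directory names (`bin`, `conf`, `data`) and the staging layout are hypothetical, not the real rdsys layout, and restarting the services afterwards is left out.

```python
import shutil
from pathlib import Path

# Hypothetical directories under the staging root that the CI owns and
# wipes on every deployment ("data" holds state left by manual tests).
MANAGED_DIRS = ("bin", "conf", "data")


def clean_deploy(build_dir: Path, config_src: Path, deploy_root: Path) -> list[Path]:
    """Wipe the managed directories and repopulate them from the CI build.

    Returns the list of installed files.
    """
    for name in MANAGED_DIRS:
        target = deploy_root / name
        if target.exists():
            shutil.rmtree(target)  # drop leftovers from manual experiments
        target.mkdir(parents=True)

    installed = []
    for binary in sorted(build_dir.iterdir()):
        if binary.is_file():
            dest = deploy_root / "bin" / binary.name
            shutil.copy2(binary, dest)  # copy2 also copies permission bits
            installed.append(dest)
    for conf in sorted(config_src.iterdir()):
        if conf.is_file():
            dest = deploy_root / "conf" / conf.name
            shutil.copy2(conf, dest)
            installed.append(dest)
    return installed
```

Because every run starts from empty directories, nothing from a previous manual test can survive a deployment; the trade-off is that any state worth keeping has to live outside the managed directories.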
But I'm happy to hear ideas on how to do it.
tst looks weird to me, can we make that test or staging, since, well, it's a staging server?
I'm ok with that, let's go for test. staging is too long.
I don't think we'll connect them to the prometheus server
why not?
It would require us to redesign our dashboards a bit to handle the staging data, but maybe.
would help tremendously if you have existing examples of other projects deploying such staging servers, otherwise we'll look into it of course.
I agree, I'll poke around to see if I can find something.
What are you suggesting here? To set up rdsys as a CI service? AFAIK CI services are not meant to keep running after the CI job has finished, nor for us to access and modify them. We might want to do that at some point for integration tests, but for now we want a server with the latest version of rdsys where we can try things during development.
In my head the staging server is loosely connected to integration tests in the CI, and I want to use this work to think about how to do integration tests and see if we can reuse some pieces for both.
What are you suggesting here? To set up rdsys as a CI service?
not the prod version, but i was under the impression that would work for the staging version...
In my head the staging server is loosely connected to integration tests in the CI, and I want to use this work to think about how to do integration tests and see if we can reuse some pieces for both.
Maybe my mental model of what you're proposing is unclear. Could you make a diagram or a longer-form explanation of how things would work?
Maybe my mental model of what you're proposing is unclear. Could you make a diagram or a longer-form explanation of how things would work?
The idea is to have an rdsys setup with fake bridges where we can test the new features we are developing. I thought it might be handy if that setup were automatically deployed on every commit to main, so our messy tests get deleted and we have a clean system with the latest code.
But I have to admit I don't have a clear plan here.
I guess the normal steps in a workflow with my proposal would be:
1. a commit to main triggers the CI to do a clean deployment on our staging server
2. I start working on a new feature
3. I deploy my work-in-progress feature on the staging server to test it
4. I finish the development and create a merge request to rdsys
5. My merge request gets merged into main and the CI cleans up my mess on the staging server, leaving it in a clean working state (back to 1.)
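A hedged sketch of how a CI job could decide when to reset the staging server under this workflow. It reads GitLab's predefined variables `CI_PIPELINE_SOURCE`, `CI_COMMIT_BRANCH` and `CI_DEFAULT_BRANCH`; the `"clean-deploy"` action name is made up for illustration. Only the first and last steps involve CI; the work-in-progress deployments in between are manual.

```python
from typing import Mapping, Optional


def staging_action(env: Mapping[str, str]) -> Optional[str]:
    """Decide what a CI pipeline should do to the staging server.

    `env` is the job's environment (e.g. os.environ). The returned action
    name is hypothetical; None means "leave the staging server alone".
    """
    source = env.get("CI_PIPELINE_SOURCE")
    branch = env.get("CI_COMMIT_BRANCH")  # unset in merge request pipelines
    default_branch = env.get("CI_DEFAULT_BRANCH", "main")

    if source == "push" and branch == default_branch:
        # A commit landed on main: wipe manual experiments and redeploy
        # main from scratch.
        return "clean-deploy"
    # Feature branches and merge requests never touch staging; work-in-progress
    # deployments are done by hand.
    return None
```

In a real `.gitlab-ci.yml` the same condition would more likely be a `rules:` clause such as `if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH`, leaving the script to do only the deployment.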
okay, so that's a good first draft. let's push this a little bit.
1. a commit to main triggers the CI to do a clean deployment on our staging server
that's the same as step 5 below, right?
2. I start working on a new feature
3. I deploy my work-in-progress feature on the staging server to test it
4. I finish the development and create a merge request to rdsys
5. My merge request gets merged into main and the CI cleans up my mess on the staging server, leaving it in a clean working state (back to 1.)
okay, i find this really confusing. why does CI mess with your setup at all in this case? it seems you would have a CI job running only on the protected branch here (main) and all it would do would be to trash your dev environment to rebuild it with the stuff from main, and then... do nothing?
how do you intend on doing the other steps above? like when you "work on a new feature" and "deploy [on] the staging server", what does that mean concretely? you deploy with git? rsync? copy-paste?
i ask, because typically what we would do is that merge requests would create MR-specific "environments" and those get deployed on their own to wherever we choose. this could be a new VM, a vhost in a VM, a prefix on a vhost in a VM, or could be a container image, it can be anything really.
so i find the above process a bit confusing because it mixes up manual deployments and automated deployments on the same host. i think that's error prone and bound to create problems, for example permissions problems between files managed by you and the ones deployed by CI.
i would much rather have all of this deployed by CI, including pipelines running from your merge request. that might require a bit more thought on how we actually design this thing, but it seems like we need to think this through anyway...
how do you develop this locally right now? maybe we can take inspiration from that?
We had a discussion about that on IRC. I need to rethink the CI setup, but we'll move ahead with setting up the staging server and see whether we connect it to the CI later.
server bootstrapped and now in DNS, @meskio you should have shell access to rdsys-frontend-test-02.torproject.org, can you check? you can sync the host keys from another existing TPO host, or use DNSSEC.
right now there's a rdsys account in the rdsys group that you can all sudo into. we could make another account that's also part of the group for CI, but i'm still unclear on how you expect CI to work in the first place...
then again, maybe i can just setup all the pieces and you play with them as you see fit after!
now i feel like i made a terrible mistake in naming that server like rdsys-frontend, it's the backend, isn't it? so i should probably rip all that out and name it rdsys-backend-01?
I see now I was trigger-happy and wrote a comment about it before reading this one. Yes, it would be better to rename it. But it's not going to be only the backend; the distributors will live there too. Also, this is going to be a staging/test server, so let's not call it backend. Maybe call it rdsys-test-01.
okay, i think i'm stuck now. i need to be clearer on what this thing is, whether it's the frontend or the backend, to make sure i name things correctly.
I don't think it makes sense to name this rdsys-frontend-...; it's a staging server for the whole of rdsys, not just the frontends, and we'll test the backend there too. Can we just call it rdsys-test-01 (or whatever number you want)?
anarcat@angela:tsa-misc$ ./retire -H rdsys-frontend-test-02.torproject.org retire-all --parent-host=dal-node-01.torproject.org -v
starting tasks at 2023-08-30 16:11:19.845657+00:00
checking for ganeti master on host dal-node-01.torproject.org
ganeti node detected with master dal-node-01.torproject.org
checking on dal-node-01.torproject.org if instance rdsys-frontend-test-02.torproject.org is running
stopping instance rdsys-frontend-test-02.torproject.org on dal-node-01.torproject.org
Waiting for job 102720 for rdsys-frontend-test-02.torproject.org ...
scheduling rdsys-frontend-test-02.torproject.org instance removal on host dal-node-01.torproject.org
scheduling gnt-instance remove --force rdsys-frontend-test-02.torproject.org to run on dal-node-01.torproject.org in 7 days
warning: commands will be executed using /bin/sh
job 7 at Wed Sep 6 16:11:00 2023
scheduling rdsys-frontend-test-02.torproject.org backup disks removal on host bungei.torproject.org and director bacula-director-01.torproject.org
checking for path "/srv/backups/bacula/rdsys-frontend-test-02.torproject.org/" on bungei.torproject.org
scheduling rm -rf "/srv/backups/bacula/rdsys-frontend-test-02.torproject.org/" to run on bungei.torproject.org in 30 days
warning: commands will be executed using /bin/sh
job 108 at Fri Sep 29 16:11:00 2023
checking for path "/srv/backups/pg/rdsys-frontend-test-02/" on bungei.torproject.org
path /srv/backups/pg/rdsys-frontend-test-02/ not found: [Errno 2] No such file
scheduling echo delete client=rdsys-frontend-test-02.torproject.org-fd yes | bconsole to run on bacula-director-01.torproject.org in 30 days
warning: commands will be executed using /bin/sh
job 59 at Fri Sep 29 16:11:00 2023
Notice: Revoked certificate with serial 185
Notice: Removing file Puppet::SSL::Certificate rdsys-frontend-test-02.torproject.org at '/var/lib/puppet/ssl/ca/signed/rdsys-frontend-test-02.torproject.org.pem'
rdsys-frontend-test-02.torproject.org
Submitted 'deactivate node' for rdsys-frontend-test-02.torproject.org with UUID df1e580d-e663-4442-8571-c67cd40bdeb6
completed tasks, elasped: 0:00:26.924690 (user 2.43 system 0.07 chlduser 0.04 chldsystem 0.01 RSS 57.6 MB)
anarcat@angela:tsa-misc$
puppet bootstrapping, next step is to make a puppet recipe for this host to do the forwards and everything, but within the hour, @meskio and ACT should have access to rdsys-test-01.torproject.org already.
prometheus exporters to be exposed, I don't think we'll connect them to the prometheus server, but will be useful to be able to reach them for tests:
Okay hold on here. I was about to open ports 7100, 7600, 7700 and 7800 to the big bad internet right there, but right above that we proxy at least port 7100 (and what's up with the other ports), so which one is it? do you need that stuff proxied or not? :) It's okay to have it both proxied and not, but i just want to make double-sure i don't expose stuff that shouldn't be exposed.
That is a good catch. I think it's fine to expose them, as it will be a testing machine with nothing private on it.
But that makes me realize that either I stop reusing ports for API+metrics or we define a different way to share Prometheus exporters. I'll chime in on #41280 about that.
oh, that's interesting. i would have figured we'd have a way to automatically do this, but it seems we haven't implemented this yet.
for now i've dropped a plain-text version in ~meskio/rdsys-mail-password.txt on rdsys-test-01. i am happy to write a plain-text version owned by the right user in the right format somewhere else if that's better for you, as for now that's a static copy that won't change if we rotate the password (or when we deploy this to prod)...
anarcat marked the checklist item prometheus exporters to be exposed, I don't think we'll connect them to the prometheus server, but will be useful to be able to reach them for tests: as completed
anarcat marked the checklist item backend bridges-tst.torproject.org:7100/metrics as completed
anarcat marked the checklist item telegram bridges-tst.torproject.org:7600/metrics as completed
anarcat marked the checklist item gettor-distributor bridges-tst.torproject.org:7700/metrics as completed
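A small Python sketch for checking that the exposed exporters answer, using the ports from the checklist items above. It assumes plain HTTP on each port (no TLS) and takes the hostname as a parameter, since the final name was still under discussion; port 7800 is left out because the thread does not say which service uses it.

```python
import urllib.request

# Ports from the checklist; services that share a port between their API and
# /metrics are probed the same way.
METRICS_PORTS = {"backend": 7100, "telegram": 7600, "gettor-distributor": 7700}


def probe_metrics(host: str, ports: dict[str, int], timeout: float = 5.0) -> dict[str, bool]:
    """Report, per service, whether http://<host>:<port>/metrics returns HTTP 200.

    Assumes plain HTTP. Refused connections, timeouts and error responses
    such as 404 (urlopen raises HTTPError for those) count as failures.
    """
    results = {}
    for service, port in ports.items():
        url = f"http://{host}:{port}/metrics"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[service] = resp.status == 200
        except OSError:
            # URLError and HTTPError are OSError subclasses, as are socket
            # timeouts and refused connections.
            results[service] = False
    return results
```

For example, `probe_metrics("rdsys-test-01.torproject.org", METRICS_PORTS)` would return one boolean per service; the hostname there is only an illustration.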