In order to experiment with Review Apps deployed to the static mirror system, I'd like to add two options to tpa-rsync-static-update-wrapper.sh.
The first would be to allow a CI job to deploy to a subdirectory of the static mirror component DocumentRoot. For example, if the CI job passes a second argument of syncreview-new-blog-post (new-blog-post being the name of a MR) then the wrapper, via exec rrsync -wo "/srv/static-gitlab-shim/$SITE_URL/$REMOTE_SUBDIR" would rsync the artifacts to a /syncreview-new-blog-post subdirectory on the static mirror, below the DocumentRoot.
Secondly, we'd need to be able to call that same wrapper to delete such subdirectories when the associated Review Apps are no longer needed. For example, if we re-use the second-argument idea above, then something like deletereview-new-blog-post would trigger a rm -rf on that new-blog-post subdirectory.
If the increase in complexity is too problematic for that poor wrapper, maybe instead of trying to shove everything in a single script, we could just create a new, dedicated one, on a different user account on static-gitlab-shim.tpo?
We can have environment-specific variables, yes. Currently in the blog project I use that because I need two sets of SITE_URL and STATIC_GITLAB_SHIM_SSH_PRIVATE_KEY variables to deploy to two different static mirror components, like so: https://gitlab.torproject.org/tpo/web/blog/-/settings/ci_cd
But that's what environments are basically, just scopes. AIUI the url: key is only used for UI display purposes, it can't tell GitLab CI how to deploy to that URL. That's entirely the purview of CI jobs.
So yes, static-shim already is partly environment agnostic. What we need is the ability to rsync only part of the contents of a given static component's files, because currently, like GitLab Pages, it's all-or-nothing.
I should add that not only can CI/CD variables be scoped per-environment, they can be scoped dynamically per-environment. When I wrote above that we can name our review environments review/$CI_COMMIT_REF_NAME, I meant that we can define an SSH key CI/CD variable with an environment scope of review/*, which will be accessible to any environment whose name matches that pattern.
So yes, static-shim already is partly environment agnostic. What we need is the ability to rsync only part of the contents of a given static component's files, because currently, like GitLab Pages, it's all-or-nothing.
so i started writing a long reply explaining why that's not possible, linking to the design docs of both the staticsync and static shim... and now i'm not sure.
i think you may be onto something, and this might be worth a try. i would worry about:
breaking the general case: what happens when there's no magic argument?
chroot escapes: right now we do not accept any user-provided argument in the shim wrapper, if we just pass along the argument, an attacker could escape their sandbox and start writing outside of our predefined root. we could sanitize the string, but that's risky business, especially in bash
stray environments: what happens if somehow we forget to disable an environment? won't this leave stray files around?
subdir breakage: one of the problems we had with multiple environments in gitlab pages is that we had to hack at lektor to make it happy to run in a subdir, was that fixed at all?
naming: what would that URL actually be, say, for the blog? we already have blog.tpo, blog-staging.tpo... blog-review.tpo/$REF?
but yes, that could just work! :) maybe i could figure out something tomorrow.
breaking the general case: what happens when there's no magic argument?
The idea is that without the magic argument, it would function as it does right now, which is to rsync the whole DocumentRoot.
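A minimal sketch of that fallback, in Python since the shim was later rewritten in that language. The function name `target_dir` is hypothetical; the base path is the one from the rrsync invocation quoted in the issue description. This is not the actual shim code, and it deliberately does no validation of the subdirectory, which would have to happen before calling it.

```python
import os

# Root taken from the rrsync invocation quoted in this issue.
BASE = "/srv/static-gitlab-shim"


def target_dir(site_url, remote_subdir=None):
    """Return the directory rsync should be confined to.

    With no subdirectory argument (None or empty string) this is the
    component DocumentRoot, i.e. the current behaviour of the wrapper.
    The caller is assumed to have validated remote_subdir already.
    """
    root = os.path.join(BASE, site_url)
    if not remote_subdir:
        return root
    return os.path.join(root, remote_subdir)
```

With this shape, existing deploy jobs that pass no extra argument keep syncing the whole DocumentRoot, and only jobs that opt in get a subdirectory.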
chroot escapes: right now we do not accept any user-provided argument in the shim wrapper, if we just pass along the argument, an attacker could escape their sandbox and start writing outside of our predefined root. we could sanitize the string, but that's risky business, especially in bash
That's a serious concern for sure. I'm not certain of the best way we could address it besides accepting only a narrow set of characters in that argument (e.g. [a-z0-9_-]).
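As a rough illustration of that allow-list approach, here is a sketch in Python. The name `is_safe_subdir` and the 63-character cap are assumptions for the example, not something the shim is known to implement. The cap mirrors the length GitLab uses for `CI_COMMIT_REF_SLUG`, which is already lowercased and restricted to `a-z`, `0-9` and `-`, so it would pass this check.

```python
import re

# Allow-list: lowercase letters, digits, underscore and hyphen only.
# No dots and no slashes, so "..", "a/b" and absolute paths cannot match.
SAFE_SUBDIR = re.compile(r"[a-z0-9_-]+")

# Assumed upper bound on the name length (hypothetical).
MAX_LEN = 63


def is_safe_subdir(name):
    """Return True only if name consists entirely of allowed characters."""
    if len(name) > MAX_LEN:
        return False
    # fullmatch() requires the whole string to match, so a trailing
    # newline or any other stray character makes the check fail.
    return SAFE_SUBDIR.fullmatch(name) is not None
```

Rejecting anything outside the allow-list, rather than trying to strip or escape dangerous characters, keeps the check small enough to audit.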
stray environments: what happens if somehow we forget to disable an environment? won't this leave stray files around?
We can set environments to auto-stop after a certain time period. Incidentally, we can also auto-delete environments when the associated branch is deleted.
subdir breakage: one of the problems we had with multiple environments in gitlab pages is that we had to hack at lektor to make it happy to run in a subdir, was that fixed at all?
For Review Apps I think we can live with some level of breakage. For example, I remember there were issues with URLs in the generated RSS/Atom feeds, but I don't think it's a deal-breaker if those bugs pop up in Review Apps. Actually, I was even thinking we might disable some non-essential blog features in Review Apps to speed up builds, like disabling feeds.
naming: what would that URL actually be, say, for the blog? we already have blog.tpo, blog-staging.tpo... blog-review.tpo/$REF?
I think we should start like this, yes. But it would be overkill to have one static mirror component per project so at some point review.tpo/
I'm really not convinced the artifacts preview approach is the right one for reviews. In this scenario, a new environment, with a different name and URL, is created for every set of CI build artifacts, and therefore for every commit pushed to a branch.
What we want to do rather, and this is the concept behind Review Apps, AIUI, is tie a single review environment to a specific merge request, which may see several builds/CI runs in its lifetime as the MR is developed and fine-tuned.
progress update: i rewrote the shim in python and added unit tests in preparation for this change, but i'm not sure how to actually do it. the arguments to the script are hardcoded in the authorized_keys file through puppet right now, i am not sure how to pass parameters through without starting to parse the rsync commandline...
What this issue is about is specifically Review Apps, where we would set it up so that we'd have CI deploy builds made from Merge Requests, using dynamic environments.
another progress update: duh. rrsync already accepts subdirectories, so we just need to change the deploy template to make a different variable for the SUBDIR and REMOTE_SUBDIR.
i still keep the python rewrite because we need an easy way to flush out old environments, and i actually implemented that (the delete-environment command), so that's something we'll have to use too.
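For illustration only, a command like delete-environment could look roughly like the sketch below. This is not the actual implementation from the rewrite: the function name, its signature and the allowed character set are assumptions made for the example.

```python
import os
import re
import shutil

# Assumed allow-list for environment names (hypothetical).
SAFE_NAME = re.compile(r"[a-z0-9_-]+")


def delete_environment(document_root, name):
    """Remove the review subdirectory `name` directly below document_root.

    Returns True if a directory was removed, False if there was nothing
    to remove. Raises ValueError if the name is unsafe or resolves to a
    location outside document_root.
    """
    if SAFE_NAME.fullmatch(name) is None:
        raise ValueError(f"refusing unsafe environment name: {name!r}")
    root = os.path.realpath(document_root)
    target = os.path.realpath(os.path.join(root, name))
    # Defence in depth: after resolving symlinks, the target must still
    # be a direct child of the DocumentRoot.
    if os.path.dirname(target) != root:
        raise ValueError(f"{name!r} resolves outside of {root}")
    if not os.path.isdir(target):
        return False
    shutil.rmtree(target)
    return True
```

Returning False for a missing directory makes the command safe to run twice, which matters if an environment's stop job gets retried.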
@lavamind has started working on a review.tpo site where review apps will get deployed. the next step here is to deploy the python rewrite to replace the .sh script and test the above theory about REMOTE_SUBDIR just working already, guh.
The first would be to allow a CI job to deploy to a subdirectory of the static mirror component DocumentRoot. For example, if the CI job passes a second argument of syncreview-new-blog-post (new-blog-post being the name of a MR) then the wrapper, via exec rrsync -wo "/srv/static-gitlab-shim/$SITE_URL/$REMOTE_SUBDIR" would rsync the artifacts to a /syncreview-new-blog-post subdirectory on the static mirror, below the DocumentRoot.
this is done, in the sense that we can already provide subdirectories to rsync. in fact, it was already possible to do so. we just hadn't realized it. don't forget to use --mkpath.
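To make the --mkpath reminder concrete, here is a small sketch that builds the rsync command line a deploy job might run. The helper name `rsync_command` and the remote `deploy@static-shim.example` are placeholders, not the real deploy template or host. `--mkpath` (available in rsync 3.2.3 and later) asks rsync to create missing destination path components, which a review subdirectory needs on its first deploy.

```python
def rsync_command(local_dir, remote, remote_subdir=""):
    """Build an rsync argv that syncs local_dir into remote[:remote_subdir].

    The remote path is relative because rrsync already confines the
    session to the component directory on the server side.
    """
    # Trailing slash on the source: copy the directory contents,
    # not the directory itself.
    source = local_dir.rstrip("/") + "/"
    if remote_subdir:
        dest = f"{remote}:{remote_subdir}/"
    else:
        dest = f"{remote}:"
    return ["rsync", "--archive", "--mkpath", source, dest]


if __name__ == "__main__":
    # Hypothetical remote; prints the command instead of running it.
    print(" ".join(rsync_command("public", "deploy@static-shim.example",
                                 "review-new-blog-post")))
```

Building the command as a list rather than a shell string avoids quoting problems if it is later passed to `subprocess.run`.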
Secondly, we'd need to be able to call that same wrapper to delete such subdirectories when the associated Review Apps are no longer needed. For example, if we re-use the second-argument idea above, then something like deletereview-new-blog-post would trigger a rm -rf on that new-blog-post subdirectory.