Our deb.torproject.org debs are now built on Gitlab (and possibly our RPMs too?). I believe the deb.torproject.org packages come from https://gitlab.torproject.org/tpo/core/debian/tor/-/pipelines, but @weasel will have to confirm. I am less sure where the RPMs come from. @kushal will have to let us know.
We should confirm that these packages are reproducible. If they are based on the Debian build system, I believe they should be. And tor.git itself might be reproducible by default. However, @weasel was not sure if this was the case, and @ahf and I were not either.
It would be a sad day if a Gitlab 0day got someone the whole Tor network. Reproducible builds prevent this possibility, and also allow us to check for it after the fact.
So I am creating this ticket to check reproducibility, and then fix any issues.
I think the best way forward is to export the gitlab runner script into a standard docker container and check whether the sha256sums of the resulting debs match the ones at https://deb.torproject.org/torproject.org/pool/main/t/tor/, or even just do a build on a random debian machine. But there may be other ways. I know @jnewsome is good at spinning up ephemeral runners, so that could be another route too, but that does not fully eliminate Gitlab from the picture.
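For concreteness, the final comparison could be as simple as something like the following (the version string and output directory are placeholders for whichever package we actually rebuild):
$ VERSION='0.4.6.10-1~d11.bullseye+1'   # placeholder: set to the package version being checked
$ curl -sO "https://deb.torproject.org/torproject.org/pool/main/t/tor/tor_${VERSION}_amd64.deb"
$ sha256sum "tor_${VERSION}_amd64.deb" our-rebuild/"tor_${VERSION}_amd64.deb"   # the two hashes should match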
If the test build's sha256sums do not match against the ones in https://deb.torproject.org/torproject.org/pool/main/t/tor/, we may want to use @boklm's RBM tool in our gitlab tor.git build runners, to make it reproducible. (RBM is what Tor Browser now uses, to produce reproducible Tor Browser releases.)
170522 20:45:24 ahf: weasel: yo yo, is the packages for debian stable on deb.torproject.org build reproducibly?
200522 13:21:09 weasel: ahf: I haven't heard that it isn't, at least
200522 14:42:46 ahf: weasel: cool! i just couldn't figure out what was up and down with the gitlab CI runners. you still build things locally, right?
200522 14:51:35 weasel: ahf: the most recent backports uploads I built manually
200522 14:52:08 weasel: (but I don't upload binaries to debian.org anyway -- unless in the weird corner cases where that is required)
200522 14:52:22 ahf: ya
200522 14:52:26 ahf: but this was for deb.torproject.org i meant
200522 14:52:28 ahf: not debian itself
200522 14:52:54 weasel: the deb.tpo binaries usually come from gitlab
200522 14:53:21 ahf: ah-ha, so they are probably *not* reproducible here?
200522 14:53:30 weasel: unsure
200522 14:53:34 ahf: ack, thank you!
200522 14:53:34 weasel: why wouldn't they?
200522 14:53:51 ahf: i do not know, it was my impression there was a lot of infrastructure around the goal of building things reproducibly
200522 14:54:02 ahf: but maybe the debian tooling takes care of that
200522 14:54:26 ahf: my biggest experience with this stuff is via TBB's nice "rbm" tool that does all the hard work for me (well, it's the team that does the hard work, but i can just run their stuff)
200522 14:54:27 weasel: I haven't played with it a lot
200522 14:54:51 weasel: but I suspect that given the right metainfo, one could build a second set of binaries that match the ones on deb.tpo
200522 14:55:02 ahf: ya
200522 14:55:04 weasel: now, i'm not sure we actually collect the right metainfo
200522 14:55:29 ahf: does debian.org have some fancy queue system that builds the binary packages from scm automatically?
200522 14:55:35 weasel: no
200522 14:55:52 weasel: things get built from source packages
200522 14:56:37 weasel: how you get to those is a different beast. plenty of methods
200522 14:57:41 ahf: ack
200522 14:59:36 ahf: thanks for the info!
The registration token comes from the Settings -> CI/CD page in the gitlab UI.
Otoh a compromised gitlab could insert a - patch < backdoor.diff line into the script that gets sent to the runner. Ideally we'd figure out a way to audit what script the runner actually executes to rule out this sort of attack.
Happy to take a shot at reproducing that way from a runner on my desktop if you like. We may need to edit the workflow to allow dynamically choosing a runner tag (which we can in turn use to pick a specific runner).
Happy to take a shot at reproducing that way from a runner on my desktop if you like. We may need to edit the workflow to allow dynamically choosing a runner tag (which we can in turn use to pick a specific runner).
How easy is it to use this process to extract a script that can be run on a regular docker container, or on a bare metal machine? Is that a feasible way to quickly rule out gitlab from this process?
How easy is it to use this process to extract a script that can be run on a regular docker container, or on a bare metal machine? Is that a feasible way to quickly rule out gitlab from this process?
I don't know of a way to do that.
It looks like .gitlab-ci.yml already delegates most of the work to ci-driver.sh. I think it shouldn't be too difficult to arrange to invoke that script directly via Docker.
I think the main thing missing from ci-driver.sh is installing system packages. Btw afaict that itself is not done in a reproducible way - it just gets the latest version of the debian packages. I think we'd need to find some way to ensure we get the exact same versions when reproducing. It'd be nice if apt-get provided a way to say "install the versions that were the latest at time X" but I don't see such a feature. Maybe in the build we could generate a manifest of exactly which versions get installed, and optionally take and use such a manifest (when reproducing)?
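A rough sketch of that manifest idea (the file name is made up, and exactly where it hooks into ci-driver.sh is TBD):
# in the CI job, right after the build dependencies are installed:
dpkg-query -W -f='${binary:Package}=${Version}\n' > package-manifest.txt   # save this as a job artifact
# when reproducing, install exactly those versions instead of "latest":
xargs -a package-manifest.txt apt-get install -y --allow-downgrades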
Happy to take a shot at reproducing that way from a runner on my desktop if you like. We may need to edit the workflow to allow dynamically choosing a runner tag (which we can in turn use to pick a specific runner).
How easy is it to use this process to extract a script that can be run on a regular docker container, or on a bare metal machine? Is that a feasible way to quickly rule out gitlab from this process?
If there is a script like this, then I think it would be quite easy to add something to tor-browser-build.git where a make tor-deb-match would build a package inside a container (using the same system we use for building Tor Browser), and compare it with the package on deb.torproject.org.
I think the main thing missing from ci-driver.sh is installing system packages. Btw afaict that itself is not done in a reproducible way - it just gets the latest version of the debian packages. I think we'd need to find some way to ensure we get the exact same versions when reproducing. It'd be nice if apt-get provided a way to say "install the versions that were the latest at time X" but I don't see such a feature. Maybe in the build we could generate a manifest of exactly which versions get installed, and optionally take and use such a manifest (when reproducing)?
Unfortunately it's not easy to always get exactly the same package versions installed, since old versions of the packages are removed from mirrors when there is a new Debian point release.
It would be possible to use snapshot.debian.org to get packages from a certain date, but it's rate limited so doesn't work very well.
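For reference, pinning a build chroot to snapshot.debian.org would look roughly like this in sources.list (the timestamp below is illustrative):
deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20220509T000000Z/ bullseye main
deb [check-valid-until=no] https://snapshot.debian.org/archive/debian-security/20220509T000000Z/ bullseye-security main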
However, not having exactly the same packages is usually not a problem since most updates don't affect build outputs. It can be a problem when there is a gcc update, but it doesn't happen very often in stable releases.
For Tor Browser we have this ticket (which is not fixed yet): tor-browser-build#28102
Maybe the gitlab workflow could include all downloaded deb packages (which are signed?) as an additional artifact, and those could be verified and used when reproducing?
Maybe the gitlab workflow could include all downloaded deb packages (which are signed?) as an additional artifact, and those could be verified and used when reproducing?
Possibly at a much later step. It is likely not necessary for this purpose. In my experience with the Tor Browser gitian build system, even years ago it was never the case that package updates influenced things. In fact, for something like C-Tor, I bet it will only happen if the entire base system and toolchain upgrades to a new version, or if there is an emergency compiler bugfix.
If it is still looking tricky to reproduce the build in a standalone setup, let's just try the ephemeral Gitlab runner idea you had in #40615 (comment 2805974) and re-run the job on a new runner. That will eliminate much of this uncertainty if that matches, and if it does not match, we also will have someplace concrete to start.
It looks like I have the capability to add a runner - I went ahead and added one.
I'd have to change the CI script a bit to allow overriding the runner tag, so that I could force the x86 build to happen on my runner.
I'll probably try just using Docker directly first; I think it should be pretty easy. If it's not I'll send a PR with the changes needed to the CI script.
So I hacked up reproduce.sh. It's basically a copy-paste of the gitlab yml (debian/.debian-ci.yml), fed through Docker. It's currently hard-coded to build branch debian-0.4.6 for amd64. It's meant to be executed from a checkout of the debian/tor repo. It mounts the current directory in the docker containers, creating the directories source-packages and binary-packages.
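The per-job docker invocation is roughly of this shape (the image name and the inner commands here are illustrative placeholders, not the exact ones in reproduce.sh):
$ docker run --rm -v "$PWD:/work" -w /work debian:bullseye bash -c '
    apt-get update &&
    apt-get -y install build-essential devscripts equivs &&
    mk-build-deps -i -r -t "apt-get -y --no-install-recommends" debian/control &&
    dpkg-buildpackage -us -uc -b'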
Taking a step back, the build_source artifacts probably also ought to match the generated source-packages, but those differ as well. We probably need to make this step reproducible first.
Trying again from e6683856 (the version of this repo that the job I'm trying to reproduce ran at).
That seems to have fixed at least some of the differences in the source artifacts. There are some different paths, and different paths inside the patches, but those might not affect the final output. (I haven't checked thoroughly whether there are other differences)
Diffing the buildinfo files in what's ultimately built, though, it looks like we are ending up with different versions of libssl and libxml; i.e. it looks like they were updated in debian in the meantime.
Cleaned up the script a little bit; maybe I should check this in somewhere? reproduce.sh. I fixed the image for the build step to use debian-bullseye instead of debian-stable, but afaik they're equivalent, and I still got the diff above.
I also unpacked the debs and found that the tor binaries are the same, though I'm a bit surprised that's the case given the library version differences. Details in the README.
So I think we've ~passed our sanity check that it's possible (though maybe we should check more than just the tor binary itself?). We should figure out whether this reproduce + validate workflow is something we want to maintain, and how.
Poked around a bit more. It looks like the remaining differences are primarily timestamps. If we want the whole .deb to be reproducible, I think we need to run with a frozen clock (e.g. using faketime) and then use the same time when reproducing.
$ for f in `(cd artifacts-38207/ && find usr -type f)`; do diff artifacts-38207/$f reproduce-38207/$f; done
Binary files artifacts-38207/usr/share/doc/tor/changelog.Debian.gz and reproduce-38207/usr/share/doc/tor/changelog.Debian.gz differ
Binary files artifacts-38207/usr/share/man/man8/tor-instance-create.8.gz and reproduce-38207/usr/share/man/man8/tor-instance-create.8.gz differ
$ diff <(gunzip -c artifacts-38207/usr/share/doc/tor/changelog.Debian.gz) <(gunzip -c reproduce-38207/usr/share/doc/tor/changelog.Debian.gz)
1c1
< tor (0.4.6.10-dev-20220509T161328Z-1~d11.bullseye+1) tor-nightly-0.4.6.x-bullseye; urgency=medium
---
> tor (0.4.6.10-dev-20220526T191917Z-1~d11.bullseye+1) tor-nightly-0.4.6.x-bullseye; urgency=medium
5c5
< -- Peter Palfrader <build@runner-y9surhn-project-1218-concurrent-0> Mon, 09 May 2022 16:14:16 +0000
---
> -- Peter Palfrader <build@2b183bfec45f> Thu, 26 May 2022 19:20:07 +0000
7c7
< tor (0.4.6.10-dev-20220509T161328Z-1) tor-nightly-0.4.6.x; urgency=medium
---
> tor (0.4.6.10-dev-20220526T191917Z-1) tor-nightly-0.4.6.x; urgency=medium
9c9
< * Automated build of tor-nightly at 20220509T161328Z, git revision
---
> * Automated build of tor-nightly at 20220526T191917Z, git revision
12c12
< -- Peter Palfrader <build@runner-y9surhn-project-1218-concurrent-0> Mon, 09 May 2022 16:14:12 +0000
---
> -- Peter Palfrader <build@2b183bfec45f> Thu, 26 May 2022 19:20:04 +0000
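For reference, a frozen-clock rebuild could look roughly like this (hypothetical invocation, reusing the timestamp of the original build from the diff above):
$ faketime '2022-05-09 16:13:28' dpkg-buildpackage -us -uc -b   # faketime intercepts clock calls via LD_PRELOAD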
first off, if you're poking around artifacts to try to figure out why they differ, i strongly encourage you to try out the "diffoscope" tool written by the reproducible builds people.
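for example, something like this (the paths are illustrative placeholders):
$ diffoscope --html diff.html path/to/original.deb path/to/rebuilt.deb   # writes a browsable report of every nested difference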
the other thing you have there is that yes, there's something that injects a new changelog entry in your build system. i'm not sure what that is, but that is of course bound to generate a diff.
that thing should respect the SOURCE_DATE_EPOCH variable so that it can be run reproducibly, without having to mess with LD_PRELOAD (which is what faketime does, i believe).
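a minimal sketch of the idea, assuming the nightly version suffix and the injected changelog date both get derived from one pinned epoch instead of "now" (not the exact commands the CI script uses):
export SOURCE_DATE_EPOCH=$(git log -1 --format=%ct)        # e.g. pin to the commit date
suffix=$(date -u -d "@$SOURCE_DATE_EPOCH" +%Y%m%dT%H%M%SZ) # the 20220509T161328Z-style version suffix
entry_date=$(date -u -R -d "@$SOURCE_DATE_EPOCH")          # RFC 2822 date for the injected changelog entry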
first off, if you're poking around artifacts to try to figure out why they differ, i strongly encourage you to try out the "diffoscope" tool written by the reproducible builds people.
Nifty! The build-date is also in the name of the deb files, which is presumably preventing diffoscope from drilling down further:
that thing should respect the SOURCE_DATE_EPOCH variable so that it can be run reproducibly, without having to mess with LD_PRELOAD (which is what faketime does, i believe).
It sure does. Agreed, SOURCE_DATE_EPOCH sounds like a more principled way to do it.
first off, if you're poking around artifacts to try to figure out why they differ, i strongly encourage you to try out the "diffoscope" tool written by the reproducible builds people.
Nifty! The build-date is also in the name of the deb files, which is presumably preventing diffoscope from drilling down further:
a lot of changes have been made to Debian's build infrastructure to make those kinds of things "just work", so when you build a debian package "the right way", you actually benefit from a lot of those.
it would actually be important to more clearly state what the concern is: have there been non-reproducible builds found in the wild? how are you rebuilding the debian packages (and why...)?
in general, the reproducible-builds.org website is a great resource for all of this, and they can actually help us if we need to figure out some of those things. from what i remember there's paid consultants working on this stuff over there now...
oh, and if I might add, the RBM tool is quite a beast, from what I remember. we shouldn't need this for little-t tor, really.
(and rust is a whole different reproducibility problem, i'll just assume that problem space doesn't exist for now. :p)
The first step is just to check the build outside gitlab (or even in gitlab). For simple things, the toolchain has been updated not to include timestamps, etc. Reproducibility only requires RBM if there's additional stuff like zips, scripting, etc that re-inserts things like timestamps, filesystem ordering, etc.
We should not over-complicate this before doing a simple test. And if that test fails, then we should try using the tools we have, if we can.
It uses a local .gitlab-ci.yml, so I think the user doesn't need any special permission to the repo (e.g. to register a runner). Compared to registering and using a local runner this also removes the attack vector of a compromised gitlab server injecting backdoored commands into the script generated from .gitlab-ci.yml.
We'd still need a wrapper script to "orchestrate" the jobs since it only runs a single job, and we may need to inject some environment variables, but at least the repro script wouldn't need to duplicate the shell snippets in the gitlab yml. (We could alternatively move those shell snippets out to external shell scripts in the repo, but the extra layer of indirection would make the yml itself a bit more opaque)
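For reference, running a single job locally with it looks roughly like this; gitlab-runner exec reads the local .gitlab-ci.yml and does not contact the GitLab server:
$ gitlab-runner exec docker build_source   # extra environment variables may need to be injected (e.g. via --env)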
That looks like a feature... request, not an actual shipped feature. :)
But it's an interesting idea.
It uses a local .gitlab-ci.yml, so I think the user doesn't need any special permission to the repo (e.g. to register a runner). Compared to registering and using a local runner this also removes the attack vector of a compromised gitlab server injecting backdoored commands into the script generated from .gitlab-ci.yml.
... or compromised code. Yes, that's really interesting!
We'd still need a wrapper script to "orchestrate" the jobs since it only runs a single job, and we may need to inject some environment variables, but at least the repro script wouldn't need to duplicate the shell snippets in the gitlab yml. (We could alternatively move those shell snippets out to external shell scripts in the repo, but the extra layer of indirection would make the yml itself a bit more opaque)
I tend to keep as little business logic in the YML as i can, exactly for that reason. If the YML is simple, then you can reproduce the build in a simple docker container, and it's pretty close to the environment provided by the GitLab runner.
At least in theory. :) In practice, there's a ton of stuff that happens during the runner setup, including environment variables, config files and so on, which are obviously hard to reproduce...
That looks like a feature... request, not an actual shipped feature. :) But it's an interesting idea.
It's a request for a v2 of the feature that's more maintainable and has more features :)
I tend to keep as little business logic in the YML as i can, exactly for that reason. If the YML is simple, then you can reproduce the build in a simple docker container, and it's pretty close to the environment provided by the GitLab runner.
At least in theory. :) In practice, there's a ton of stuff that happens during the runner setup, including environment variables, config files and so on, which are obviously hard to reproduce...
Yeah, that's the road I was starting to go down. Then I thought maybe it'd be easier to just write a script that parses the yml into scripts, then thought but maybe gitlab already exposes that somewhere... so here we are :). Using this might be a nice shortcut, but yeah if it doesn't work out we can fall back to moving more of the logic out of yml and into scripts invoked by the yml...
That looks like a feature... request, not an actual shipped feature. :) But it's an interesting idea.
It's a request for a v2 of the feature that's more maintainable and has more features :)
Oh nice. okay, then I definitely need to look more into this and probably document this in our wiki.
I tend to keep as little business logic in the YML as i can, exactly for that reason. If the YML is simple, then you can reproduce the build in a simple docker container, and it's pretty close to the environment provided by the GitLab runner.
At least in theory. :) In practice, there's a ton of stuff that happens during the runner setup, including environment variables, config files and so on, which are obviously hard to reproduce...
Yeah, that's the road I was starting to go down. Then I thought maybe it'd be easier to just write a script that parses the yml into scripts, then thought but maybe gitlab already exposes that somewhere... so here we are :). Using this might be a nice shortcut, but yeah if it doesn't work out we can fall back to moving more of the logic out of yml and into scripts invoked by the yml...
hehe... good thing you stopped in time.
do you have a link to the upstream docs on this?
It successfully runs the build_source job and grabs the artifacts, but I got stuck at trying to execute just one of the parameterized build jobs (build_binary: [debian, bullseye, amd64, amd64, -slim]). Maybe there's a way to do it that I don't see... Otherwise maybe just doing it the "boring" way of moving everything out of yml into helper scripts s.t. it can be used with a thin docker wrapper is the way to go :)
The script could still stand to be generalized some more (e.g. currently hardcoded to build for debian-bullseye; we should also be able to point it at alternative repos for the tor and/or build source to cross-validate), but you should be able to just edit the pipeline_id and it should automagically build at the same branch and commit as that pipeline did.
One thing I mentioned there that I hadn't mentioned here was that the planned next version of gitlab-runner exec interacts with the GitLab instance, and it's unclear exactly what the GitLab instance will provide and whether it will be auditable. I just raised this concern on the gitlab issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797#note_1003160535
(Added 0.4.7 milestone because we should revisit this every point release going forward, and try to reduce the set of things that differs in the build, as well as improve the end-to-end authentication from release tag to package).
There have been some deep compromises of infrastructure over the past few months, and some gitlab security issues. I am also getting a bit nervous about the adversarial activity we've seen lately on the network, and about the fact that it is known that this ticket is just sitting here, unfunded :/.
Updated the script to accommodate changes in the pipeline (new parameters in the build matrix), and to make it easier to reproduce different configs. I checked the ubuntu-xenial-amd64 build as well with similar results.
Exactly which build to reproduce can now be changed by editing job_params
Can I build debs other than Debian 11 from a Debian 11 VM, with different docker images? Does the script do this?
Yup, the script uses Docker, and pulls the same image name as was used in the build.
Also, for the diffoscope, do you download the artifacts for the job manually, or does your script handle it?
I think you picked the wrong pipeline? This builds 0.4.8.0 for me, not 0.4.7.13. It looks like your diffoscope output also is comparing 0.4.8.0.
The correct pipeline for 0.4.7.13 seems to be 62395.
Additionally, I had to do some fiddling to get it to find the right job. At least for arm64, the arguments for the job name change order from what you had in your refactoring commit. I had to revert to the hardcoded commit and do:
But now, I am still hitting a tag checkout issue in the build container. It seems like it is taking too much of the debian package version into the tag name, plus some other weird stuff? It is trying to do a git checkout of maint-tor-0.4.7.13-1, but the right tag is just tor-0.4.7.13 (or debian-tor-0.4.7.13-1).
Not sure how to fix this. Hardcoding pipeline_branch and commit doesn't work. The bad tag seems to be generated by the build script itself.
I think you picked the wrong pipeline? This builds 0.4.8.0 for me, not 0.4.7.13. It looks like your diffoscope output also is comparing 0.4.8.0.
The correct pipeline for 0.4.7.13 seems to be 62395.
Ah ok. For documentation/future-reference, how did you go about identifying that pipeline id btw?
Additionally, I had to do some fiddling to get it to find the right job. At least for arm64, the arguments for the job name change order from what you had in your refactoring commit.
Ugh, yeah I wasn't sure whether I could count on the order being consistent or not. I'll see if I can make that a bit more robust.
But now, I am still hitting a tag checkout issue in the build container. It seems like it is taking too much of the debian package version into the tag name, plus some other weird stuff? It is trying to do a git checkout of maint-tor-0.4.7.13-1, but the right tag is just tor-0.4.7.13 (or debian-tor-0.4.7.13-1).
It was trying to run the wrong job. I updated the script to run build_source-release and build_binary-release.
I also needed to set CI_COMMIT_TAG, which those jobs depend on.
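(e.g. something like the following, though exactly how reproduce.sh picks it up may differ:)
$ CI_COMMIT_TAG=debian-tor-0.4.7.13-1 ./reproduce.sh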
It gets a bit further now but is failing in the pristine-tar command:
+ pristine-tar checkout tor-0.4.7.13.tar.gz
fatal: ambiguous argument 'cb08422f82dbd92fc6dac9a6962d81afd8a93377^{tree}': unknown revision or path not in the working tree.
fatal: ambiguous argument 'cb08422f82dbd92fc6dac9a6962d81afd8a93377^{tree}': unknown revision or path not in the working tree.
I had issues like this due to the use of --depth for shallow checkouts. I removed those from your python in my branch, but there may also be some uses of them in the build scripts themselves?
I don't see a commit with that hash in any of the relevant repos. I think it's looking for it in the debian tor repo (gitlab.torproject.org/tpo/core/debian/tor.git). I don't see it there, nor in the tor repo (gitlab.torproject.org/tpo/core/tor.git).
I spot checked the hashes in a couple of the older files and couldn't find those either. Probably I'm not understanding what it's actually trying to do here...
For historical reference, I reproduced 0.4.7.13 on arm64 successfully. diffoscope differences are similar to earlier: just file list metadata and the package info replacement strings.
I also had to comment out the env line from subprocess.Popen(), because my $DOCKER_HOST is empty (local docker). Else it failed:
+ #env={'DOCKER_HOST': os.getenv('DOCKER_HOST')},
It does concern me a bit that the build script that is run comes from gitlab itself, which could be tainted. Perhaps there could be a step to save/review it, and optionally use a local cached copy?
It does concern me a bit that the build script that is run comes from gitlab itself, which could be tainted. Perhaps there could be a step to save/review it, and optionally use a local cached copy?
The cloned repo is in reproduce-*/pipeline_src. The scripts can be inspected there. Or if we want to cross-validate with another clone of that repo, I think we just need to validate that the given tag has the same hash in that checkout as in the other repo. e.g. git rev-parse debian-tor-0.4.7.13-1^ gives the answer for me in that directory vs my working checkout of that repo.
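e.g. a sketch of that cross-check (the second path is a placeholder for an independent clone):
$ git -C reproduce-*/pipeline_src rev-parse 'debian-tor-0.4.7.13-1^{commit}'
$ git -C ~/independent-clone/debian-tor rev-parse 'debian-tor-0.4.7.13-1^{commit}'
# the two hashes should be identical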
Hi. Thanks for the mention @mikeperry. I feel I'm coming into the middle of this rather, and I'm not familiar with the context. I skimread this thread.
My starting point is that I would like to use the Reproducible Builds project's notion of reproducibility, which involves producing the exact same deliverables, bit-for-bit identical. So there should be no question of anyone reviewing things in diffoscope. (Or rather, having to review something in diffoscope means "our build is not reproducible, and we are debugging it".)
And ideally actual reproduction would be done routinely, on systems sharing as little as possible in terms of non-source inputs.
It seems we're using a roughly normal-Debian-package build system; there's a debian/rules. So we ought to be using Debian's official reproduction tooling, and a normal Debian build rune (dpkg-buildpackage, sbuild, or something), not some ad-hoc script. And we could use Debian's tooling for producing the repro environment, rather than a docker image. After all, subverting the build environment is a way to subvert the binaries.
So. I think "the Tor binaries on deb.torproject.org are reproducible" means this:
I should be able to get the tor source code from the signed tag, and run debrebuild, and get the same binaries. I haven't tried to do this. Is it likely to work? I think from reading this thread that the answer is "no".
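For the record, a sketch of what that check could look like, starting from the .buildinfo the original build produced (file names are illustrative, and the exact debrebuild options may need adjusting):
$ debrebuild --buildresult=./rebuild --builder=sbuild tor_0.4.7.13-1~d11.bullseye+1_amd64.buildinfo
$ diffoscope ./rebuild/tor_0.4.7.13-1~d11.bullseye+1_amd64.deb /path/to/published/tor_0.4.7.13-1~d11.bullseye+1_amd64.deb   # should report no differences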
It may seem like this is an extreme notion of reproducibility, but it is usually achievable in practice in my experience. Recently, with my personal hat on, I built .debs for the Xen Hypervisor on my laptop and uploaded them to Debian experimental (for foolish reasons to do with broken Debian processes), and these turned out to be identical to the binaries one of my collaborators built from the same git commit, and we hadn't even bothered with a tool like debrepro.
I completely agree on all points, Ian. The main issue we're running into right now is that we migrated the package building into gitlab a couple years back but did not make use of the Debian reproducibility tooling for this. So we have had to check this stuff manually with diffoscope. This ticket is mostly filled with that kind of triage, to make sure we weren't pwnt after weird shit happened on gitlab.
It is also my opinion that doing this right is not hard. We just haven't had someone familiar with the debian reproducibility tooling to do it. We also got in managerial disagreements about getting funding for this, hiring/contracting someone, etc.
However, this is one of those situations where trying to get funding would literally cost more than the activity itself, by several orders of magnitude. Unfortunately, this also means that these kinds of simple yet crucial things tend to never get done, for exactly this reason :/.