In legacy/trac#28325 (moved) we are working on enabling go.mod support for our Go-based projects. While that will be an improvement in that we no longer need to work around a new default, it does not answer the question of how we can easily update all the dependencies of a project, and the dependencies of those dependencies. That is the scope of this ticket.

Besides legacy/trac#28325 (moved), we should keep in mind how we want to handle Go modules in the future. Right now we have one project per module in tor-browser-build. Maybe that's not a smart setup given the growing number of projects we have. Either way, the solution for this ticket will have an impact on that, so we should take the question of how to handle the module-project relationship into account.
For the dependency update path forward we have a bunch of possible options, some already mentioned in legacy/trac#28325 (moved) (in no particular order; a rough command-line sketch of the first two follows the list):
1. Use `go mod vendor` to vendor in the dependencies and then build with `-mod=vendor` to use the vendor folder with the dependencies.
2. Use `go mod download` to fetch dependencies into the cache and then point `GOPROXY` at the cached files during the build (that feels similar to what we do for our Gradle dependencies right now).
3. Update all the modules ourselves and put them into a tarball somewhere which then gets used during the build, similar to how we handle our Rust crates.
There might be more than those three.
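For concreteness, here is what options 1 and 2 roughly boil down to on the command line. This is a minimal sketch under assumed paths and a plain `go build`, not our actual rbm setup:

```sh
# Option 1: vendor the dependencies into the source tree, then build
# offline from the vendor/ directory.
go mod vendor                    # writes all dependencies into ./vendor
go build -mod=vendor ./...       # uses ./vendor instead of the network

# Option 2: pre-fill the module cache, then serve it back to the build
# as a file-based module proxy so the build itself stays offline.
# (Assumes a single-entry GOPATH; the cache lives under $GOPATH/pkg/mod.)
go mod download
GOPROXY="file://$(go env GOPATH)/pkg/mod/cache/download" go build ./...
```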
boklm had a nice idea of restructuring our Go projects so that we'd have only one go-module project and could list all needed dependencies directly in the respective project under input_files (see: comment:42:ticket:28942). That might be orthogonal to 1) and 3) but might actually help with 2); not sure.
One thing to keep in mind here is that the go module system pulls in a lot more dependencies than we actually need. This happens regardless of which of the 3 options above we go with, since they all use the go module logic to figure out what is needed.
See legacy/trac#33761 (moved), where we recently removed a bunch of unnecessary dependencies from Snowflake. From a quick investigation, these extra dependencies come from at least the following places (a sketch for tracking them down follows the list):
- The dependencies are only needed for running tests.
- They are marked with `// indirect` in the go.mod file, meaning they are tentative dependencies: dependencies of dependencies that don't have a go.mod file, are missing from their go.mod file, or are the result of an upgrade or downgrade (see here).
- The dependencies are required only for parts of the code built behind an optional build constraint. This is what we ran into with Snowflake, where the quic dependencies (which are numerous) are only needed for a part of pion-webrtc that we don't use and that isn't built by default.
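The stock `go` tooling already helps with tracking down where a suspicious dependency comes from; the module path below is just an illustrative example, not a claim about Snowflake's current go.mod:

```sh
# Why is this module in our graph? If the answer is
# "(main module does not need module ...)", it is likely only pulled in
# by tests, build-constrained code, or an indirect requirement.
go mod why -m github.com/lucas-clemente/quic-go

# Which module requires which: grep the full requirement graph.
go mod graph | grep quic-go
```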
I think it's worth attempting to exclude Go module dependencies that are not needed. See the discussion on legacy/trac#33761 (moved) and legacy/trac#33745 (moved) for why we want to remove the quic dependencies for Snowflake specifically. However, this would require extra processing scripts for whichever of the three options above we decide to go with.
The script for (2) could use some more work beyond excluding unnecessary dependencies, like automatically mapping versions to git hashes (sketched below) and integration into however we decide to structure the rbm projects (e.g., boklm's input_files idea).
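For the version-to-hash mapping, something along these lines could be a starting point. This is only a sketch: real pseudo-versions come in a few more shapes (pre-release bases, +incompatible suffixes), vanity import paths don't map one-to-one to git URLs, and the example module and tag are assumptions:

```sh
#!/bin/sh
# Pseudo-versions embed a commit hash prefix as their last field,
# e.g. v0.0.0-20191109021931-daa7c04131f5 -> daa7c04131f5.
grep -Eo '[^[:space:]]+ v[0-9]+(\.[0-9]+)+-[0-9]{14}-[0-9a-f]{12}' go.mod |
while read -r module version; do
    echo "$module ${version##*-}"    # print "module hash-prefix"
done

# Tagged versions can be resolved against the upstream repository,
# assuming the module path doubles as the git URL (example values).
git ls-remote "https://github.com/pion/webrtc" "refs/tags/v2.2.4"
```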
> Use `go mod vendor` to vendor in the dependencies and then build with `-mod=vendor` to use the vendor folder with the dependencies.

How would this work? Would we have to pull from a separate snowflake branch that has this vendor folder checked in? If we're going to pull all the dependencies at once, I'd rather do something like option (3), since it sounds like there's already a workflow present for something similar. Maintaining the vendor directory sounds tricky.
> I think it's worth attempting to exclude Go module dependencies that are not needed.

My thought now is that if we go with options (1) or (3) this might not matter so much. Since the unneeded dependencies aren't used to build the binary, they don't contribute to binary size. It was more a pain point from a maintenance and rbm project blowup perspective: it added to the size of the rbm repository and increased build time. But if we're doing (1) or (3), those aren't concerns anymore, if I'm understanding correctly.
> > Use `go mod vendor` to vendor in the dependencies and then build with `-mod=vendor` to use the vendor folder with the dependencies.
>
> How would this work? Would we have to pull from a separate snowflake branch that has this vendor folder checked in? If we're going to pull all the dependencies at once, I'd rather do something like option (3), since it sounds like there's already a workflow present for something similar. Maintaining the vendor directory sounds tricky.

I think this can be done by adding a go_mod_vendor step, which will use a container with network enabled and a snowflake source tarball (from the same git clone) to run `go mod vendor` and generate a tarball that is then used as input_files for the snowflake build.

With this we will be running `go mod vendor` and creating a snowflake-go-mod-tarball-$git_hash.tar.xz tarball each time the snowflake commit changes. However, the tarball will probably not change with every commit, so as an alternative we could name it snowflake-go-mod-tarball-$expected_sha256sum.tar.xz (where $expected_sha256sum is the expected checksum of the tarball, assuming building it is reproducible) to avoid regenerating it when it is not expected to change.
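For illustration, the body of such a step could boil down to something like the following. This is a sketch only; the actual rbm step definition and the reproducibility details would still need to be worked out, and the file names just follow the scheme above:

```sh
#!/bin/sh
# Runs inside the network-enabled container, with the snowflake source
# tarball from the git clone as input.
tar -xf "snowflake-$git_hash.tar.gz"
(cd "snowflake-$git_hash" && go mod vendor)   # fetches modules, writes ./vendor

# Pack only the vendor directory, with deterministic tar options so the
# tarball gets a stable sha256sum across rebuilds (assumes GNU tar).
tar --sort=name --mtime=@0 --owner=0 --group=0 --numeric-owner \
    -cJf "snowflake-go-mod-tarball-$git_hash.tar.xz" \
    "snowflake-$git_hash/vendor"
sha256sum "snowflake-go-mod-tarball-$git_hash.tar.xz"
```

With the $expected_sha256sum naming variant, the step would additionally compare the sha256sum output against the recorded value and skip regeneration when a matching tarball already exists.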
> > I think it's worth attempting to exclude Go module dependencies that are not needed.
>
> My thought now is that if we go with options (1) or (3) this might not matter so much. Since the unneeded dependencies aren't used to build the binary, they don't contribute to binary size. It was more a pain point from a maintenance and rbm project blowup perspective: it added to the size of the rbm repository and increased build time. But if we're doing (1) or (3), those aren't concerns anymore, if I'm understanding correctly.

Yes, I think unneeded dependencies aren't a big concern if we're doing (1) or (3).
> > > Use `go mod vendor` to vendor in the dependencies and then build with `-mod=vendor` to use the vendor folder with the dependencies.
> >
> > How would this work? Would we have to pull from a separate snowflake branch that has this vendor folder checked in? If we're going to pull all the dependencies at once, I'd rather do something like option (3), since it sounds like there's already a workflow present for something similar. Maintaining the vendor directory sounds tricky.
>
> I think this can be done by adding a go_mod_vendor step, which will use a container with network enabled and a snowflake source tarball (from the same git clone) to run `go mod vendor` and generate a tarball that is then used as input_files for the snowflake build.

That's one approach, yes. I had more the option in mind of doing it like we handle our Rust crates: one would update all the modules and then put them into a .tar.bz2 file somewhere which then gets used during the build. I don't like the idea of just using whatever `go mod vendor` gives us automatically for each build, but it seems you have addressed that with your PoC. Right now we'd have duplicated repos, though, due to legacy/trac#33988 (moved), right?

Okay, that is safeguarded with a sha256sum we calculate before using the whole input; that's good. I still feel a bit uneasy about allowing network access while building X, because you should not need network access when building. :) But maybe one could see it more like fetching resources, which we'd need to do anyway for building.

Another thing the `go mod vendor` version does not give us is easy transparency regarding dependencies and what is used. However we construct the fetching of dependencies, you usually end up with a .tar.xz blob and that's it, while the current setup (and boklm's improved one) makes it easier to see the updated repo changes and spot-check things.
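One way to win back some of that transparency, sketched under the tarball naming scheme discussed above (OLD and NEW stand in for whatever hashes we end up using): extract two generated tarballs and diff them, since `go mod vendor` writes a vendor/modules.txt manifest listing every vendored module and version:

```sh
# Compare the vendored dependencies of two builds instead of trusting
# an opaque blob.
mkdir old new
tar -xJf snowflake-go-mod-tarball-OLD.tar.xz -C old
tar -xJf snowflake-go-mod-tarball-NEW.tar.xz -C new

# First the module/version manifest, then a full spot-check of sources.
diff -u old/*/vendor/modules.txt new/*/vendor/modules.txt
diff -r old new | less
```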
> > > > Use `go mod vendor` to vendor in the dependencies and then build with `-mod=vendor` to use the vendor folder with the dependencies.
> > >
> > > How would this work? Would we have to pull from a separate snowflake branch that has this vendor folder checked in? If we're going to pull all the dependencies at once, I'd rather do something like option (3), since it sounds like there's already a workflow present for something similar. Maintaining the vendor directory sounds tricky.
> >
> > I think this can be done by adding a go_mod_vendor step, which will use a container with network enabled and a snowflake source tarball (from the same git clone) to run `go mod vendor` and generate a tarball that is then used as input_files for the snowflake build.
>
> That's one approach, yes. I had more the option in mind of doing it like we handle our Rust crates: one would update all the modules and then put them into a .tar.bz2 file somewhere which then gets used during the build. I don't like the idea of just using whatever `go mod vendor` gives us automatically for each build, but it seems you have addressed that with your PoC. Right now we'd have duplicated repos, though, due to legacy/trac#33988 (moved), right?

Those would be separate steps from the same project, so they would use the same git clone. legacy/trac#33988 (moved) is about different projects using the same repo.