investigate using httpdirfs (mounting remote directories as read-only) instead of requests
What if we used https://dist.torproject.org/ instead of fiddling around with the Tor-provided API endpoint for obtaining the download links to new releases and caching downloads temporarily on a local filesystem?
-> Original discussion: tpo/tpa/team#40683 (comment 2789962) (@lavamind)
- The interface of the bot is built dynamically on top of the response that the API sends (https://aus1.torproject.org/torbrowser/update_3/release/downloads.json) and is designed so that changes to the available platforms and languages are applied instantly. The API has a certain structure, which abstracts away the part where we would have to sort the binaries by platform and language. It is possible to dig everything up from the directory listings directly, but is it worth it?
- With the locale and the operating system provided, we obtain a download link, which we also use to define the name of the downloaded file. Perhaps we could use that file name to dig the file up from a read-only "mount" of dist.torproject.org while retaining most of the requests functionality. Otherwise, if we were to scrap requests completely, we would have to determine all of the available locales and platforms by processing all of the file names (some of which differ slightly; see the spelling in the macOS and the Windows binaries). Wouldn't that be more unstable?
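To make the second point more concrete, here is a minimal Python sketch of how a download link obtained from the API could be translated into a path on a read-only mount. The mount point `/mnt/torbrowser`, the helper name `mounted_path`, and the assumption that the link points somewhere under `/torbrowser/` on dist.torproject.org are all hypothetical and only serve as an illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical local mount point for https://dist.torproject.org/torbrowser/
MOUNT_POINT = PurePosixPath("/mnt/torbrowser")
# Remote path prefix that the mount is assumed to cover.
REMOTE_ROOT = PurePosixPath("/torbrowser")


def mounted_path(download_url: str) -> PurePosixPath:
    """Translate a download link from the API into a path under the mount."""
    remote = PurePosixPath(urlparse(download_url).path)
    # relative_to() raises ValueError if the link points outside REMOTE_ROOT,
    # so a link to an unexpected location fails loudly instead of silently.
    return MOUNT_POINT / remote.relative_to(REMOTE_ROOT)
```

In this sketch the API still decides which file belongs to which platform and locale; the mount would only replace the download step.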
I used the API instead of "guessing" the file names (i.e. presuming that the files will always follow a specific naming convention) and requesting them directly from dist.torproject.org. Reading the directory listing directly makes me think that I would be making the same presumption, only applied in reverse: instead of building a file name from a platform and a locale, I would be extracting the platform and the locale from a file name.
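As a rough illustration of that concern, this is what extracting the platform and the locale from file names might look like in Python. The patterns and the sample file names are assumptions made for this sketch, not an authoritative description of the naming convention; each platform needs its own pattern because the names are spelled differently.

```python
import re
from typing import NamedTuple, Optional


class Release(NamedTuple):
    platform: str
    version: str
    locale: str


# One pattern per platform. These are illustrative assumptions and would
# have to be kept in sync with whatever naming the release process uses.
PATTERNS = [
    re.compile(r"^tor-browser-(?P<platform>linux(?:32|64))-(?P<version>[\d.]+)_(?P<locale>[\w-]+)\.tar\.xz$"),
    re.compile(r"^torbrowser-install-(?P<platform>win64)-(?P<version>[\d.]+)_(?P<locale>[\w-]+)\.exe$"),
    re.compile(r"^TorBrowser-(?P<version>[\d.]+)-(?P<platform>osx64)_(?P<locale>[\w-]+)\.dmg$"),
]


def parse_file_name(file_name: str) -> Optional[Release]:
    """Return the release details encoded in a file name, or None if unknown."""
    for pattern in PATTERNS:
        match = pattern.match(file_name)
        if match:
            return Release(match["platform"], match["version"], match["locale"])
    # Any file that does not follow an expected convention is silently lost.
    return None
```

A file whose name does not match any of the patterns simply disappears from the interface, which is the kind of instability the API currently protects us from.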
httpdirfs seems like a fancy tool that I really want to use because it straight up looks cool and abstracts away a lot of things that are unpleasant to deal with. Implementing it is a big effort, but it could potentially be useful. However, the question is whether there are any concrete advantages to using it.
For example, we could probably just read-only mount https://dist.torproject.org/torbrowser/
I am presuming that this is a remote directory and that there is no huge advantage to using it, performance-wise, other than an interface that is easy to manage. The question is, is it?
It [httpdirfs] even supports caching downloaded files so if the file was retrieved previously, it doesn't have to fetch it from the remote filesystem again and again.
Files that were retrieved previously are already cached on Telegram's servers. A local cache could potentially be useful for an E2EE implementation, but that is not planned and is definitely not a concern right now. The files either need a local copy or they do not, and in this case they do not. This means that the best strategy, especially under a storage constraint, is to get rid of them as soon as possible. The more important question here is when and how we can get rid of them. Does httpdirfs provide any sort of fine-tuned control over that?
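For reference, the "get rid of them as soon as possible" strategy can be sketched in plain Python without httpdirfs. This is only a minimal sketch: the helper name `short_lived_download` is hypothetical, and `payload` stands in for the body of a response that has already been fetched.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def short_lived_download(file_name: str, payload: bytes) -> Iterator[Path]:
    """Keep a downloaded file on disk only for the duration of the block."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / file_name
        path.write_bytes(payload)
        # The caller uploads the file while the block is active.
        yield path
    # Leaving the block removes the temporary directory and the file in it,
    # even if the upload raised an exception.
```

The upload would happen inside the `with short_lived_download(...) as path:` block, so the file never outlives the request that needed it. Whatever httpdirfs offers for its cache would have to be at least this predictable.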
Related issue: https://gitlab.torproject.org/tpo/anti-censorship/gettor-project/onionsproutsbot/-/issues/5