OnionSproutsBot is almost ready to be deployed in production. It will be maintained by the anti-censorship team. Can we have a place to deploy it?
Should we get a new user in the gettor VM, or deploy it in another VM? The rest of gettor is going to be integrated into rdsys in the near future, so OnionSproutsBot will be the last piece of gettor with a life of its own. It might make sense to create a new VM for it so you can retire the existing gettor VM (running buster) once we move gettor to rdsys.
It is a Python program with few dependencies; I guess we can deploy it in a virtualenv, but if you have better ideas I'm happy to hear them.
I forgot to mention that OnionSproutsBot will not require any open external ports, Apache, or an IP address or public domain pointing to it. It connects to the Telegram API and doesn't expose any directly visible service; it just needs internet access.
Sidenote: I would say there is a minimum space requirement of around 5-10 GB of storage, all the way up to ~140 GB (for every single binary in the latest version, in every language and for every available platform) to mitigate a hypothetical attack vector. Most of that space will not be used 99.99% of the time.
It would make sense to make a new VM if gettor will be deprecated eventually. Can it talk to rdsys over the network securely? (i.e. does it need to be on polyanthum?)
This bot doesn't need to talk to rdsys or any other polyanthum service. It is pretty much autonomous: it only talks to the Telegram API and fetches the Tor Browser downloads JSON and tarballs (to update the binaries it provides when there is a new version).
onionsprouts-01? gettor-02? gettor-telegram-01?
I don't have a strong preference; reading the RFC, I guess any of those would make sense, so go with whatever is clearest in your naming convention.
core memory (standard is 8GB)
8GB is plenty; I'm pretty sure we can live with 4GB.
virtual CPU cores (standard is 2)
1 should be enough; it's async Python, so the main process is going to be single-threaded.
SSD disk space (standard is 10G system, 20G /srv)
That should be enough.
HDD disk space (standard is "none", although we also have SAS-only VM if you don't need fast storage)
none is good
any networking limitations / requirements (e.g. open ports, etc)
No open ports or public IP address needed; being able to make outgoing connections to port 443 will be enough.
I designed the bot so that the files it downloads are stored in the temporary directory of the operating system it's running on. In this case, it would be better if as much space as possible went towards /tmp instead. I can make accommodations if needed, but any urgent problems should go away with a simple reboot, without further adjustments, on any mainstream Linux system (presumably Debian), and the thing I described at the beginning of the thread regarding a maximum of 120 GB isn't as urgent.
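For illustration, a minimal sketch of what "store downloads in the OS temporary directory" could look like; the subdirectory name and the helper itself are assumptions for this example, not the bot's actual code:

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

def download_to_tmp(url: str) -> Path:
    """Fetch a package into the OS temporary directory (/tmp on Debian)."""
    target_dir = Path(tempfile.gettempdir()) / "onionsprouts"  # assumed name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / url.rsplit("/", 1)[-1]
    with urlopen(url) as response, open(target, "wb") as out:
        while chunk := response.read(64 * 1024):  # stream in 64 KiB chunks
            out.write(chunk)
    return target
```

Keeping everything under the system temp directory means a reboot clears any leftovers even if the bot never gets around to deleting them.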
We should not need 120GB if we delete each file after it is uploaded to Telegram. Tor Browser binaries are ~80MB, so in 20GB you could have ~250 downloads going at the same time; that should be enough, but maybe I'm missing something here.
I'm curious to know why this service would need several hundred copies of the same binary: is it because each one needs to be encrypted for individual recipients?
@lavamind Well, it's not the same binary, and as things are right now there's no E2EE; more on that later. The bot is able to provide downloads for every locale and every operating system (Linux 32-bit, Linux 64-bit, etc.). The files are stored in a temporary directory right now so that they can be removed later (the bot doesn't do this itself yet, but implementing it is trivial). However, someone could hypothetically request every single binary that the bot is able to provide, particularly shortly after an upgrade.
There's no E2EE; that may change in the future and it's something I would like to work on, but not now. Not using it (with the downside that Telegram can supposedly know what you requested, although if you try to initiate an E2EE chat it will know that as well) allows people to spread awareness about the bot within the platform easily by forwarding messages.
I'm wondering, then, if the bot really needs to maintain a directory archive separate from dist.torproject.org. For example, we could probably just mount https://dist.torproject.org/torbrowser/ read-only inside that machine so that it wouldn't have to deal with adding/removing binaries at all, and new versions would be available immediately as they get released. What do you think?
Specifically, I was thinking we could use https://packages.debian.org/bullseye/httpdirfs to mount the directory listing as a filesystem. It even supports caching downloaded files, so if a file was retrieved previously it doesn't have to be fetched from the remote server again and again.
I'd also add, for the record, that a complete Tor Browser release, all locales and operating systems, is currently around 33G. So unless you want users to be able to request old or alpha versions, we shouldn't need storage space much beyond this.
I could make it so that it would prioritize looking in a specific directory before it goes on to download a file online, and use the online option as a fallback. However, my implementation here is tightly knit to that API to the core. I had actually been thinking about this problem because of a similar implementation for Signal that will end up requiring the files to be stored. I barely have time as it is, so all of these new details for the project, which I have been sporadically working on over the pandemic, make me think that the development progress is around ~35%-ish instead of the 75-80% I had in the back of my head (but this is about the third time this has happened and the final product keeps getting better, so that's fine). I'll open a new issue for this on my repo and ask for more details there.
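A rough sketch of that "local directory first, download as a fallback" idea, assuming a hypothetical mount point and reusing the download_to_tmp helper from the earlier sketch:

```python
from pathlib import Path

# Assumed read-only mount point for dist.torproject.org (e.g. via httpdirfs).
DIST_MIRROR = Path("/srv/dist-mirror/torbrowser")

def locate_package(relative_path: str) -> Path:
    """Prefer the locally mounted copy; fall back to downloading it."""
    local = DIST_MIRROR / relative_path
    if local.is_file():
        return local  # served straight from the mount, nothing to clean up
    # Fallback: fetch into the OS temp dir, as in the earlier sketch.
    return download_to_tmp(
        f"https://dist.torproject.org/torbrowser/{relative_path}"
    )
```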
I'd also add, for the record, that a complete Tor Browser release, all locales and operating systems, is currently around 33G.
Thanks for correcting me, that makes much more sense; I'm not sure where my calculations went wrong. An earlier version added a timestamp to the local copy, so it's very likely that I downloaded multiple copies of everything.
Actually, I'll disclose this: there's an edge case where all of the available binaries may be requested shortly after an update. Ensuring the integrity of binaries was on my list anyway, but if someone does that, countering it gets tricky, even though I intend for the bot to work with as few resources as possible. I have had a couple of designs for a "priority queue" in the back of my head that could counter this, but since I have nothing to show for it at the moment, I am acting on the (most likely incorrect) assumption that it's impossible to make this work.
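One possible shape for such a priority queue, purely as a sketch; the scoring rule and the serve coroutine are made up for illustration:

```python
import asyncio

# Sketch: serve light requests first, so one user asking for every single
# platform and locale cannot starve everyone else.
queue: asyncio.PriorityQueue = asyncio.PriorityQueue()

async def enqueue(requests_so_far: int, package: str) -> None:
    # Lower value = higher priority: users with many recent requests sink back.
    await queue.put((requests_so_far, package))

async def worker(serve) -> None:
    # `serve` is a hypothetical coroutine that downloads and uploads a package.
    while True:
        _, package = await queue.get()
        try:
            await serve(package)
        finally:
            queue.task_done()
```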
This question will not have an impact on anything, I'm just asking out of curiosity: in the event of an emergency, can the space be temporarily bumped up? (I am already planning to implement safety nets so that nobody will have to intervene, since leaving the bot running in the background with close to no maintenance whatsoever is a primary goal of my project, but still.)
The anti-censorship people will also be able to delete the files if needed, but we are not around 24/7; at some point we should add it to our monitoring system so we are alerted if something fails.
the development progress is around ~35%...-ish, instead of the 75%-80% that I had in the back of my head
I'm moving this request to Backlog for now. Please let me know once you think the development is far enough along to warrant working on a production deployment.
Hi, just to clarify: I initially thought that this feature was something I was obliged to do, but after further investigation I have come to realize that this approach is far from productive right now. Apart from one minor bug that I have to fix, it's ready to go.
I wouldn't like to be a burden on your limited resources, unless there's a way for someone to dynamically adjust the partition hosting the files, using something like LVM, in the event of an emergency or increased demand.
If the bot runs out of computational resources under heavy demand, async will take care of it and simply respond to requests whenever it has the capacity to do so (asyncio multiplexes many requests on a single thread, so it degrades gracefully rather than crashing). The bot can handle connection disruptions, and if a download/upload gets completely interrupted, it will carry on later. The only plausible time the hypothetical risk of running out of disk space could materialize is around a browser update, and after a file is uploaded once, it can be safely removed because it will have been cached on Telegram's servers. I have stress-tested the bot on a gigabit connection under strained resources, and tested it with medium-sized groups of people with 10 GB of available storage. It's fine; the worst that has happened is that it responds more slowly, or an error pops up and the user has to try again (or later).
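A minimal sketch of the kind of cap that keeps resource use bounded regardless of demand; the limit and the serve coroutine are assumptions for illustration:

```python
import asyncio

# A cap on simultaneous transfers: extra requests simply wait for a free slot
# instead of exhausting disk space or bandwidth. The limit of 5 is arbitrary.
MAX_CONCURRENT_TRANSFERS = asyncio.Semaphore(5)

async def handle_request(package: str, serve) -> None:
    # `serve` is a hypothetical coroutine doing the actual download/upload.
    async with MAX_CONCURRENT_TRANSFERS:
        await serve(package)
```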
As far as I have been able to tell, there's no urgent cause for concern whatsoever; I am just very worried about a theoretical attack vector, because the design of my bot has some very minor flaws underneath that I willingly chose to ignore to get the job done. I will probably not be sure whether that would actually be a problem, let alone one that calls the sustainability of the project into question, until this gets stress-tested. I doubt it, but I am not 100% confident. If the space has no other use right now, I would propose just putting the bot on it, monitoring how much space it actually consumes in a real-world scenario, and reducing the allocation accordingly.
In a bad scenario, the bot will have to be temporarily brought down and the storage space emptied out manually. In the worst-case scenario, the SQLite database storing the IDs for cached versions of the uploaded files may have to be modified (or just completely nuked for convenience) and the bot will have to start over as if there were a new update to the browser.
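For context, a cache like that can be as small as a single table mapping packages to the file IDs Telegram returns after an upload; this schema is illustrative, not the bot's actual one:

```python
import sqlite3

# Illustrative schema: once a package has been uploaded, Telegram returns a
# file_id that can be re-sent without re-uploading the binary.
con = sqlite3.connect("file_cache.db")
con.execute(
    """CREATE TABLE IF NOT EXISTS uploads (
           package TEXT PRIMARY KEY,  -- e.g. a Tor Browser tarball name
           version TEXT NOT NULL,
           file_id TEXT NOT NULL      -- Telegram's ID for the cached upload
       )"""
)
con.commit()

# "Nuking" the cache after a new release just means clearing this table,
# after which the bot re-uploads packages on demand:
# con.execute("DELETE FROM uploads")
```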
There's only one line that I have temporarily toggled so that temporary files don't get removed after they are uploaded; I did that because I was looking for a way to experiment with some UI for making signature verification easier for the user, but I can fix it within a couple of minutes:
Jeez, I just had to fix the indentation; writing this entire block of text took me longer than making files delete themselves after an upload. So the risk from me trying to boost efficiency by taking on as many concurrent uploads as possible is minimal. That risk now depends on how many threads are available, the available disk space, and whether any of the available downloads was updated recently. I am absolutely not worried about it; it's not worth discussing any further unless it somehow becomes an actual problem later.
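The fix amounts to something along these lines, sketched here with a hypothetical upload coroutine rather than the bot's real code:

```python
from pathlib import Path

async def send_package(chat_id: int, path: Path, upload) -> None:
    # `upload` is a hypothetical coroutine that sends the file to Telegram.
    try:
        await upload(chat_id, path)
    finally:
        # Remove the local copy afterwards: once an upload has succeeded,
        # Telegram keeps its own cached copy (the file_id), and on failure
        # the package can simply be downloaded again later.
        path.unlink(missing_ok=True)
```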
@n0toose great, thanks for the changes. How much space do you think we should give it now? Will we be fine with the TPA standard 20GB? Should we add a bit more just in case: 40? 100?
I propose that we just proceed with 100GB on the iSCSI HDD storage cluster. It won't be as fast as SSD storage, but if I understand correctly it should be fine considering the workload of this service, and at least we have plenty of disk space there.
We started the installation process but we hit some snags with the installer. It hasn't been tested extensively with the new iSCSI storage backend and still needs a few tweaks. Hopefully next week we'll have the machine deployed.
@meskio I've deployed telegram-bot-01.torproject.org. It has a telegrambot role account from which I assume a Python virtualenv will be set up and the daemon will run. Please refer to the procedure for deploying systemd user services to set that up.
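For reference, a systemd user service under the telegrambot role account could look roughly like the sketch below; the unit name, virtualenv path, and module name are assumptions, not the actual deployment:

```ini
# ~telegrambot/.config/systemd/user/onionsproutsbot.service (hypothetical)
[Unit]
Description=OnionSproutsBot Telegram bot

[Service]
# Assumed virtualenv location and entry point
ExecStart=/srv/telegrambot/venv/bin/python -m onionsproutsbot
Restart=on-failure

[Install]
WantedBy=default.target
```

Lingering would presumably also need to be enabled for the role account (loginctl enable-linger telegrambot) so the service starts at boot without an interactive login.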