We would need to spin up a new machine with PostgreSQL; however, if I'm reading things correctly, our main cluster gnt-fsn is getting a little crowded. How much memory do you think this machine would need @hiro?
I can test with 10GB to start and then we will see how much we grow in data over time. At the beginning I will be deleting and recreating tables many times (although I already have a general schema of things - https://gitlab.torproject.org/hiro/timescaledb-docker).
:) ie. if you need a TLS connection, we can probably make one. i don't think we've ever set that up, but surely that can be done. i wonder if we can do TLS auth too... but i guess i'll mark that as "psql port needs to be open with TLS" kind of thing?
i keep wondering if we want to trust psql tls or set up ipsec for that kind of stuff... thoughts?
I will leave that to you. I just would like to run the code on my laptop and connect to the db, so that when I run a module on collector I can tell collector to connect to this DB... but yeah with auth please!
--
Antoine Beaupré
torproject.org system administration
if you don't have a public IP (which we should really call a "static IP", or at least "rarely changing"), then we'd have to constantly update the firewall rules, which is not going to be practical. how often does your IP change anyways?
could bouncing through an SSH jump host be an acceptable compromise?
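for illustration, here is a minimal sketch of what the jump-host route could look like from a laptop. it only assembles an OpenSSH command line (`-J` for the jump host, `-L` for the local port forward, `-N` to skip running a remote command). the host names and the local port are placeholders, not real machines, and it assumes SSH access to the database host itself.

```python
def tunnel_command(jump_host, db_host, local_port=5432, remote_port=5432):
    """Build an ssh argv that forwards a local port to PostgreSQL on db_host.

    All host names passed in are hypothetical placeholders.
    """
    return [
        "ssh",
        "-N",                      # no remote command, only forward ports
        "-J", jump_host,           # bounce through the jump host first
        "-L", f"{local_port}:localhost:{remote_port}",  # local -> db_host's localhost
        db_host,
    ]
```

with something like `tunnel_command("jump.example.org", "db.example.org", local_port=15432)` running under `subprocess.run`, a local client would then connect to `localhost:15432` instead of the database's public address, so no firewall rule has to follow a changing IP.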
i don't want to overly complicate your life here. if you are comfortable with postgresql listening publicly on the network (at least during dev?) I'm fine with it too, provided we have a working TLS setup (which I still need to research).
let's create the VM first and see where we go next?
postgresql 13 setup, struggling to make the backup server connect to it.
also, it seems like it does have TLS set up, but it's with the "autocert" stuff, so you'll probably have to mess around with your local psql thingie to use the right CA for that to work.
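as a rough sketch of the "use the right CA" part: libpq clients (psql included) accept a connection URI with the `sslmode` and `sslrootcert` parameters, so one way to do it is to point `sslrootcert` at a local copy of the CA certificate and ask for `verify-full`. the helper below only builds such a URI; the host, database, user and CA path are made-up placeholders, and where the "autocert" CA file actually lives is not something this sketch knows.

```python
from urllib.parse import quote, urlencode


def build_dsn(host, dbname, user, ca_file, port=5432):
    """Return a libpq connection URI that verifies the server against ca_file.

    Every argument here is a hypothetical placeholder value.
    """
    params = urlencode(
        {
            "sslmode": "verify-full",   # check both the CA chain and the host name
            "sslrootcert": ca_file,     # local copy of the CA certificate
        },
        safe="/",
        quote_via=quote,
    )
    return f"postgresql://{quote(user)}@{host}:{port}/{quote(dbname)}?{params}"
```

the resulting string can be passed as the single argument to `psql`, or to any driver that takes a libpq-style URI.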
i figured out the setup procedure for backups. i think.
the only thing missing here is you, basically; i haven't looked at setting up timescaledb at all, nor at creating new accounts for your project. i think i should file an RFP bug to see if it's possible at all to get a Debian package going for timescale, in the long run... in the short term, i think we may want to build from source?
in any case, please document your work from here on, and let me know if you need help!
i looked a little more into filing an RFP for TimescaleDB, and I found that it's not actually licensed under an official, OSI-approved free software license. a part of timescale is licensed under Apache-2.0, and that's fine, but take a look at their LICENSE file:
All source code should have information at the beginning of its respective file
which specifies its licensing information.
Outside of the "tsl" directory, source code in a given file is licensed
under the Apache License Version 2.0, unless otherwise noted (e.g., an
Apache-compatible license).
Within the "tsl" folder, source code in a given file is licensed under the
Timescale License, unless otherwise noted.
When built, separate shared object files are generated for the Apache-licensed
source code and the Timescale-licensed source code. The shared object binaries
that contain -tsl in their name are licensed under the Timescale License.
okay, so what's in that tsl/ folder? there you have another LICENSE file which is a custom license written specifically (presumably by lawyers, which I am not) for timescaleDB:
I haven't read the entirety of it, but it's pretty clear to me that this cannot be packaged in Debian at all, ever, under that license. Just clause 2.2 (prohibiting use in "software-as-a-service") breaks clause 6 of the Debian free software guidelines (AKA the "DFSG"). Those guidelines are also the basis for the Open Source Initiative's formal Open Source Definition.
The OSI hasn't actually made a formal decision on those types of licenses, because MongoDB retracted their application, but OSI actually made a statement on that license explicitly saying that it's not "open source".
Looping back to the timescaleDB license specifically, even assuming we might be willing to bypass the commitment to opensource, there is still a sensitive legal issue with the license itself in the way it's formulated. The core of the license (section 2. "GRANT OF LICENSES") states this:
(a) Internal Use. A license to copy, compile, install, and use the Timescale Software and Derivative Works solely for Your own internal business purposes in a manner that does not expose or give access to, directly or indirectly (e.g., via a wrapper), the Timescale Data Definition Interfaces or the Timescale Data Manipulation Interfaces to any person or entity other than You or Your employees and Contractors working on Your behalf.
So right there, we can't just use it for "internal use", because we are likely to expose derivative works (or "Timescale Data Definition Interfaces or the Timescale Data Manipulation Interfaces", which, frankly, i can't be bothered to look up right now) to the public. So that use is out.
The next possible use is 2.1 "(b) Value Added Products or Services". I have read and re-read that paragraph a few times now:
(b) Value Added Products or Services. A license (i) to copy, compile, install, and use the Timescale Software, Derivative Works, or parts thereof to develop and maintain Your Value Added Products or Services, (ii) to utilize (in the case of services) copies of the Timescale Software, Derivative Works, or parts thereof solely as incorporated into or utilized with Your Value Added Products or Services, and (iii) to distribute (in the case of products that are distributed to Your customers) copies of the Timescale Software binaries or of Derivative Works solely in binary form, and both solely as incorporated into or utilized with Your Value Added Products or Services; provided that (1) You notify Your customers that use of such Timescale Software or Derivative Works is subject to this TSL Agreement and You provide to each such customer a copy of the most current version of this TSL Agreement or a URL from which the most current version of this TSL Agreement may be obtained, and (2) the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects, such as through use of the Timescale Data Definition Interfaces, in a Timescale Database utilized by such Value Added Products or Services.
this is a nightmare of compliance. i think we could make this work: we'd have to compile and install the program (without modification! otherwise we trip clause (d) Derivative works and that trips (a) Internal use) and run it publicly. But we'd have to really make sure that we have (1) a URL where people can fetch the license and (2) the customer is somehow prohibited from messing with the database.
At this point, if you want to use this, i think the only safe way forward is to hire a lawyer to read this through. In any case, I'll need to stop reading that horror because i sense a headache coming up.
In general, I must say this is another reason why I'm wary of using stuff that's not packaged in Debian for our projects. We have a very stringent copyright review policy and when something is in debian (main), you can trust that it's free software and you won't have to hire a lawyer to know whether you're even allowed to look at the code (let alone download, compile and run it). In this case we might be allowed to run it, but I am really not sure we want to waste any time going through the legal review (or risks) involved here.
i should also state, for the record, that I am not a lawyer and the above cannot, therefore, serve as legal advice.
Is there an alternative backend we could use here? A quick search in debian yields victoriametrics which is packaged in debian and offers a Prometheus-compatible interface, but with timescaledb-like scalability...
Apparently, I heard there is a way to package Timescale to skip the
problematic files somehow. This is what the Guix people did, so
presumably you don't need to ask permission as much as tread very
carefully, basically not using any "tsl" stuff.
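One way to sanity-check the "no tsl stuff" part after a build, going only by the LICENSE text quoted earlier (shared objects with `-tsl` in their name are under the Timescale License), is to list what actually landed in the PostgreSQL library directory. This is a rough sketch, not a compliance tool: the function name is made up, and the directory to scan is an assumption (on most installs `pg_config --pkglibdir` should print it).

```python
from pathlib import Path


def find_tsl_libraries(lib_dir):
    """Return names of files in lib_dir that look like Timescale-licensed objects.

    Relies only on the naming rule from the LICENSE file: binaries that
    contain "-tsl" in their name are under the Timescale License.
    """
    return sorted(
        entry.name
        for entry in Path(lib_dir).iterdir()
        if entry.is_file() and "-tsl" in entry.name
    )
```

An empty list would suggest only the Apache-licensed objects were installed; anything else names the files to look at more closely.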
You might also want to look at this table, what you want to use is the
"apache 2" version:
Would it be ok if I try to get in touch with timescale and ask them for permission or something?
As long as you won’t provide TimescaleDB as a Hosted / Managed Product you’re good to go. No need to acquire a license. The license is only to prevent hosting companies (such as AWS, Microsoft, Digital Ocean, …) from offering TimescaleDB as their own managed service.
I mean we discussed timescaledb vs victoria-metrics already (tpo/network-health/metrics/collector#40012 (comment 2775514)) I need to have some of the data in a relational db and I was hoping to leverage some of the features offered by timescaledb plus the convenience to have everything in the same DB.
If we can't install the timescaledb plugin I'll still need postgres, maybe we can do everything with postgresql, who knows
I mean we discussed timescaledb vs victoria-metrics already (tpo/network-health/metrics/collector#40012 (comment 2775514)) I need to have some of the data in a relational db and I was hoping to leverage some of the features offered by timescaledb plus the convenience to have everything in the same DB.
Right, that makes sense.
If we can't install the timescaledb plugin I'll still need postgres, maybe we can do everything with postgresql, who knows
yeah, maybe rebuilding from scratch will work better? :)
i think this ticket can be closed, as far as TPA's concerned... @hiro do you want to move it to metrics, or open a new ticket there to track the timescaledb stuff?
do let us know how things go, and ideally link to this ticket so we can keep a trace of what's up. :) you never know, maybe we'll need to set up timescaledb for TPA too! ;)