Slow queries on meronense.
Around the beginning of November I noticed the update script on meronense was taking longer than before. Since the end of September the script was running twice a day every 12h, but I had to switch it back to once per day around the 2nd week of November.
Then approximately a week ago I noticed the update script was taking more than 24h to finish (tpo/network-health/metrics/website#40073 (closed)) so I had to remove the cronjob, and start it manually, since otherwise two consecutive runs can overwrite the same data. It took approx between 24 and 48 hours to finish, after that I monitored it for a few days by starting it manually, and I have now reinstated the cronjob to run every 24h.
I was wondering what might have caused the issue in the first place and if is there a way that we can safely have the script run again twice a day.
Regarding the first questions I have a few theories. Either some routine job (like a vacuum) had some issue and made everything super slow, or maybe a postgresql update killed a run and then the next run took longer than the usual.
Regarding the second question, I could make a prometheus alert to let me know if a run is taking longer than Nh so that I can avoid data being overwritten. At the same time I was wondering if we can check how often we run routine maintenance like vaccum and such and maybe see if we can space those at a time when the update script isn't running.
I have also read postgresql15 does autovacuum by default now (https://www.postgresql.org/docs/current/sql-vacuum.html) so maybe we don't have to run it again when we do the backup?