disk on weather-01 is almost full
Activity
The partition is currently mostly used by postgresql's disk storage, and the WAL in particular is occupying 5.4 GB, which is the vast majority of pg storage.
According to the graphs, disk usage started growing linearly on September 23rd.
there's probably something that's been preventing postgres from committing its WAL and then getting rid of old entries.
this is interesting:
postgres=# SELECT slot_name,
             pg_wal_lsn_diff( pg_current_wal_lsn(), restart_lsn ) AS bytes_behind,
             active,
             wal_status
           FROM pg_replication_slots
           WHERE wal_status <> 'lost'
           ORDER BY restart_lsn;
 slot_name | bytes_behind | active | wal_status
-----------+--------------+--------+------------
 barman    |   5811047248 | f      | extended
(1 row)
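maybe something related to barman is holding the WAL back? the slot is inactive (active = f) and its restart_lsn is about 5.8 billion bytes behind the current WAL position, which matches the size of the WAL on disk.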
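For future reference, here is a minimal sketch of how the rows returned by that query could be screened automatically. It is not something that was run during this incident: the `Slot` tuple, the `stale_slots` helper and the 1 GiB threshold are all assumptions made up for illustration, and the rows are expected to be fetched separately with the query above.

```python
from typing import Iterable, List, NamedTuple


class Slot(NamedTuple):
    """One row of the pg_replication_slots query above (hypothetical type)."""
    slot_name: str
    bytes_behind: int
    active: bool
    wal_status: str


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def stale_slots(slots: Iterable[Slot], threshold_bytes: int = 1024 ** 3) -> List[Slot]:
    """Return inactive slots retaining more WAL than threshold_bytes, worst first.

    The 1 GiB default is an arbitrary value chosen for this sketch.
    """
    flagged = [s for s in slots if not s.active and s.bytes_behind > threshold_bytes]
    return sorted(flagged, key=lambda s: s.bytes_behind, reverse=True)


if __name__ == "__main__":
    # Row copied from the query output in this incident.
    rows = [Slot("barman", 5811047248, False, "extended")]
    for slot in stale_slots(rows):
        print(f"{slot.slot_name}: {human_size(slot.bytes_behind)} of WAL retained, slot inactive")
```

An inactive slot with a large `bytes_behind` is the pattern seen here: nothing is consuming from the slot, so postgres keeps every WAL segment since `restart_lsn`.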
cleared out the replication slot related to barman:
postgres=# select pg_drop_replication_slot('barman');
but the disk usage still hasn't gone down. the checkpoint interval seems to be set to 15mins (note that this is not the stock default: postgres ships with checkpoint_timeout at 5min), so maybe I need to wait until postgres performs a checkpoint
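While waiting for the checkpoint, one way to confirm that the WAL is actually shrinking is to measure the `pg_wal` directory directly rather than the whole partition. A rough sketch follows. The `dir_size_bytes` helper is hypothetical, the path to `pg_wal` depends on the install and is not stated in this incident, and it needs to run as a user that can read the data directory.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files directly inside path.

    Subdirectories (such as archive_status inside pg_wal) are skipped,
    since WAL segments live at the top level of the directory.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Calling it on the cluster's `pg_wal` directory before and after the checkpoint should show the drop once postgres removes or recycles the segments the slot was holding.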
nice, postgres finally cleaned up its WAL and disk usage is now back down to 37%. so the problem is solved.
FYI the query in the comment above for finding the replication slot was found on: https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/
I don't see a mention of the problem and/or solution in our documentation -- most probably because we don't use replication anywhere, that's a technique that barman is adding to our setup -- so I'll add a tiny something before closing this incident.
- lelutin mentioned in commit wiki-replica@1ae70da4
- lelutin closed
- lelutin changed the incident status to Resolved by closing the incident