OOM issue on meronense after upgrade
Noticed metrics.tpo is not getting all its updates since PostgreSQL was upgraded to v13.
I started the script manually: https://gitlab.torproject.org/tpo/network-health/metrics/metrics-bin/-/blob/main/website/run-web.sh
and found that the job was being killed:
```
[308908.109696] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/session-4020.scope,task=java,pid=375579,uid=1512
[308908.109723] Out of memory: Killed process 375579 (java) total-vm:14411748kB, anon-rss:7917568kB, file-rss:0kB, shmem-rss:32kB, UID:1512 pgtables:23120kB oom_score_adj:0
```
cc: @gk
Activity
I was reading about: https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM
I wonder if setting that parameter can help finish the job, which will indeed run more slowly, but at least it will complete.
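For reference, a minimal sketch of what setting it could look like (the value is purely illustrative, and this assumes work_mem was raised above the 4MB default on this cluster; work_mem applies per sort/hash operation and per backend, so lowering it trades memory for temp-file spills):

```sh
# sketch: set an explicit (lower) per-operation memory cap cluster-wide and reload
sudo -u postgres psql -c "ALTER SYSTEM SET work_mem = '32MB';"
sudo -u postgres psql -c "SELECT pg_reload_conf();"
sudo -u postgres psql -c "SHOW work_mem;"
```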
- anarcat changed issue type to incident
i don't have instrumentation to watch per-process memory usage on meronense the same way i have it on materculae, so i'll set that up to have the data at least. related: materculae is still having those problems in #40815 (closed).
- anarcat marked this issue as related to #40815 (closed)
instrumentation is set up, samples are trickling in to this new grafana dashboard:
https://grafana.torproject.org/d/LbhyBYq7k/per-process-memory-usage?orgId=1&var-instance=meronense.torproject.org&var-process=All&var-min_size=2000000000&from=now-24h&to=now
java and apache are currently the only two using more than 2GB of memory.
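for the record, the same check can be done ad hoc on the host with plain ps, independent of the exporter (a sketch):

```sh
# list processes with a resident set over 2GB (RSS from ps is in kB)
ps -eo rss=,comm= --sort=-rss | awk '$1 > 2*1024*1024 {printf "%.1f GB\t%s\n", $1/1024/1024, $2}'
```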
also, i was worried about meronense being impacted by the VACUUM jobs running (see #40809 (closed) for details on that). it seems the VACUUM operation has finished though; here are the running queries:
```
 datid |   datname   |  pid   |          query_start          |       age       | state  |     wait_event      | query
-------+-------------+--------+-------------------------------+-----------------+--------+---------------------+--------------------------------------------------------------------
       |             | 466770 |                               |                 |        | AutoVacuumMain      |
       |             | 466773 |                               |                 |        | LogicalLauncherMain |
 16987 | postgres    | 466805 | 2022-06-27 15:36:17.412411+00 | 00:00:52.56964  | idle   | ClientRead          | SELECT * FROM pg_stat_bgwriter;
 16401 | ipv6servers | 488259 | 2022-06-27 10:59:00.841413+00 | 04:38:09.140638 | active |                     | SELECT valid_after_date, server, guard_relay, exit_relay, announced_ipv6, exiting_ipv6_relay, reachable_ipv6_relay, server_count_sum_avg, advertised_bandwidth_bytes_sum_avg FROM ipv6servers
 16401 | ipv6servers | 508167 | 2022-06-27 15:07:48.585706+00 | 00:29:21.396345 | active |                     | SELECT valid_after_date, server, guard_relay, exit_relay, announced_ipv6, exiting_ipv6_relay, reachable_ipv6_relay, server_count_sum_avg, advertised_bandwidth_bytes_sum_avg FROM ipv6servers
 16987 | postgres    | 511248 | 2022-06-27 15:37:09.982051+00 | 00:00:00        | active |                     | SELECT datid,datname,pid,query_start,now()-query_start as AGE,state,wait_event,query FROM pg_stat_activity;
       |             | 466768 |                               |                 |        | BgWriterMain        |
       |             | 466767 |                               |                 |        | CheckpointerMain    |
       |             | 466769 |                               |                 |        | WalWriterMain       |
(9 rows)
```
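for future reference, the query used to produce that listing (wrapped in a shell call):

```sh
sudo -u postgres psql -c "
  SELECT datid, datname, pid, query_start, now() - query_start AS age,
         state, wait_event, query
  FROM pg_stat_activity;"
```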
- anarcat mentioned in issue #40809 (closed)
- anarcat added Doing Metrics PostgreSQL labels
- anarcat changed the severity to High - S2
- anarcat marked this issue as related to #40809 (closed)
so just for the record, here's the upgrade timeline, to keep things in perspective:
- 2022-06-20: ganeti host upgraded to backports (#40689 (closed))
- 2022-06-21: ganeti host upgraded to bullseye
- 2022-06-22: meronense bullseye upgrade (#40692 (closed))
- 2022-06-23: meronense postgres 13 upgrade (#40809 (closed))
- 2022-06-24: first OOM (this ticket)
- 2022-06-26: second and third OOMs
I looked at all the OOMs (`zgrep "Out of memory" /var/log/kern.log*`) and in all cases the process table dump doesn't show any postgresql process eating a significant amount of memory. so i think we should discard the idea that PostgreSQL is eating all the memory here: this is a different scenario than the one on materculae, where PostgreSQL is clearly using a lot of memory. we will be able to confirm this with the per-process stats of course, but it would probably be showing up in the OOM logs already... here's an example:
```
Jun 26 14:48:32 meronense/meronense kernel: VM Thread invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jun 26 14:48:32 meronense/meronense kernel: CPU: 2 PID: 638 Comm: VM Thread Not tainted 5.10.0-15-amd64 #1 Debian 5.10.120-1
Jun 26 14:48:32 meronense/meronense kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Jun 26 14:48:32 meronense/meronense kernel: Call Trace:
Jun 26 14:48:32 meronense/meronense kernel: dump_stack+0x6b/0x83
Jun 26 14:48:32 meronense/meronense kernel: dump_header+0x4a/0x1f0
Jun 26 14:48:32 meronense/meronense kernel: oom_kill_process.cold+0xb/0x10
Jun 26 14:48:32 meronense/meronense kernel: out_of_memory+0x1bd/0x4e0
Jun 26 14:48:32 meronense/meronense kernel: __alloc_pages_slowpath.constprop.0+0xb8c/0xc60
Jun 26 14:48:32 meronense/meronense kernel: __alloc_pages_nodemask+0x2da/0x310
Jun 26 14:48:32 meronense/meronense kernel: pagecache_get_page+0x16d/0x380
Jun 26 14:48:32 meronense/meronense kernel: filemap_fault+0x69e/0x900
Jun 26 14:48:32 meronense/meronense kernel: ? xas_load+0x5/0x70
Jun 26 14:48:32 meronense/meronense kernel: ext4_filemap_fault+0x2d/0x40 [ext4]
Jun 26 14:48:32 meronense/meronense kernel: __do_fault+0x37/0x170
Jun 26 14:48:32 meronense/meronense kernel: handle_mm_fault+0x11e7/0x1bf0
Jun 26 14:48:32 meronense/meronense kernel: ? timerqueue_add+0x96/0xb0
Jun 26 14:48:32 meronense/meronense kernel: do_user_addr_fault+0x1b8/0x3f0
Jun 26 14:48:32 meronense/meronense kernel: ? switch_fpu_return+0x40/0xb0
Jun 26 14:48:32 meronense/meronense kernel: exc_page_fault+0x78/0x160
Jun 26 14:48:32 meronense/meronense kernel: ? asm_exc_page_fault+0x8/0x30
Jun 26 14:48:32 meronense/meronense kernel: asm_exc_page_fault+0x1e/0x30
Jun 26 14:48:32 meronense/meronense kernel: RIP: 0033:0x7fca617824f0
Jun 26 14:48:32 meronense/meronense kernel: Code: Unable to access opcode bytes at RIP 0x7fca617824c6.
Jun 26 14:48:32 meronense/meronense kernel: RSP: 002b:00007fca5c66bad8 EFLAGS: 00010202
Jun 26 14:48:32 meronense/meronense kernel: RAX: 0000000000000001 RBX: 00007fca58012870 RCX: 0000000000000000
Jun 26 14:48:32 meronense/meronense kernel: RDX: 00000000000003e8 RSI: 0000000000000001 RDI: 00007fca614acfc8
Jun 26 14:48:32 meronense/meronense kernel: RBP: 00007fca5c66bb20 R08: 0000000000000000 R09: 0000000000084cc2
Jun 26 14:48:32 meronense/meronense kernel: R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000000
Jun 26 14:48:32 meronense/meronense kernel: R13: 00000000000003e8 R14: 0000000000000001 R15: 0000000000000001
Jun 26 14:48:32 meronense/meronense kernel: Mem-Info:
Jun 26 14:48:32 meronense/meronense kernel: active_anon:311708 inactive_anon:4720952 isolated_anon:0 active_file:52 inactive_file:77 isolated_file:0 unevictable:0 dirty:0 writeback:2 slab_reclaimable:17023 slab_unreclaimable:12319 mapped:7333 shmem:10611 pagetables:13413 bounce:0 free:38228 free_pcp:32 free_cma:0
Jun 26 14:48:32 meronense/meronense kernel: Node 0 active_anon:1246832kB inactive_anon:18883808kB active_file:208kB inactive_file:308kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29332kB dirty:0kB writeback:8kB shmem:42444kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10321920kB writeback_tmp:0kB kernel_stack:5072kB all_unreclaimable? no
Jun 26 14:48:32 meronense/meronense kernel: Node 0 DMA free:15844kB min:52kB low:64kB high:76kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 26 14:48:32 meronense/meronense kernel: lowmem_reserve[]: 0 2962 19966 19966 19966
Jun 26 14:48:32 meronense/meronense kernel: Node 0 DMA32 free:77540kB min:10016kB low:13048kB high:16080kB reserved_highatomic:0KB active_anon:245524kB inactive_anon:2722368kB active_file:64kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129200kB managed:3063664kB mlocked:0kB pagetables:7228kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 26 14:48:32 meronense/meronense kernel: lowmem_reserve[]: 0 0 17004 17004 17004
Jun 26 14:48:32 meronense/meronense kernel: Node 0 Normal free:59528kB min:59560kB low:76972kB high:94384kB reserved_highatomic:0KB active_anon:1001308kB inactive_anon:16161220kB active_file:84kB inactive_file:144kB unevictable:0kB writepending:8kB present:17825792kB managed:17417656kB mlocked:0kB pagetables:46424kB bounce:0kB free_pcp:128kB local_pcp:8kB free_cma:0kB
Jun 26 14:48:32 meronense/meronense kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 26 14:48:32 meronense/meronense kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15844kB
Jun 26 14:48:32 meronense/meronense kernel: Node 0 DMA32: 79*4kB (UME) 229*8kB (UME) 451*16kB (UME) 310*32kB (UME) 149*64kB (UE) 101*128kB (UE) 57*256kB (UME) 22*512kB (UE) 10*1024kB (ME) 0*2048kB 0*4096kB = 77844kB
Jun 26 14:48:32 meronense/meronense kernel: Node 0 Normal: 665*4kB (UME) 699*8kB (UME) 853*16kB (UME) 420*32kB (UME) 183*64kB (UME) 68*128kB (UME) 15*256kB (ME) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 60108kB
Jun 26 14:48:32 meronense/meronense kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 26 14:48:32 meronense/meronense kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 26 14:48:32 meronense/meronense kernel: 87775 total pagecache pages
Jun 26 14:48:32 meronense/meronense kernel: 76973 pages in swap cache
Jun 26 14:48:32 meronense/meronense kernel: Swap cache stats: add 5017735, delete 4940820, find 63636932/63833339
Jun 26 14:48:32 meronense/meronense kernel: Free swap  = 0kB
Jun 26 14:48:32 meronense/meronense kernel: Total swap = 4194300kB
Jun 26 14:48:32 meronense/meronense kernel: 5242746 pages RAM
Jun 26 14:48:32 meronense/meronense kernel: 0 pages HighMem/MovableOnly
Jun 26 14:48:32 meronense/meronense kernel: 118439 pages reserved
Jun 26 14:48:32 meronense/meronense kernel: 0 pages hwpoisoned
Jun 26 14:48:32 meronense/meronense kernel: Tasks state (memory values in pages):
Jun 26 14:48:32 meronense/meronense kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 26 14:48:32 meronense/meronense kernel: [    252]     0   252    40718      598   335872      159          -250 systemd-journal
Jun 26 14:48:32 meronense/meronense kernel: [    278]     0   278     5818      204    73728       65         -1000 systemd-udevd
Jun 26 14:48:32 meronense/meronense kernel: [    543]     0   543      588       15    40960        3             0 acpid
Jun 26 14:48:32 meronense/meronense kernel: [    548]   106   548     2474      138    57344       74          -900 dbus-daemon
Jun 26 14:48:32 meronense/meronense kernel: [    555]   999   555   344752     4016   200704      361             0 prometheus-apac
Jun 26 14:48:32 meronense/meronense kernel: [    558]   999   558   327564     4841   241664      273             0 prometheus-node
Jun 26 14:48:32 meronense/meronense kernel: [    559]   999   559   308092     4304   208896     1131             0 prometheus-post
Jun 26 14:48:32 meronense/meronense kernel: [    560]     0   560     3950      136    69632      147             0 systemd-logind
Jun 26 14:48:32 meronense/meronense kernel: [    563]     0   563     1384       55    45056        8             0 atd
Jun 26 14:48:32 meronense/meronense kernel: [    567]     0   567     2611        0    65536      159             0 cron
Jun 26 14:48:32 meronense/meronense kernel: [    571]   118   571     1877       41    53248       90             0 ulogd
Jun 26 14:48:32 meronense/meronense kernel: [    597]  1512   597      605        0    40960       21             0 sh
Jun 26 14:48:32 meronense/meronense kernel: [    620]  1512   620      605        0    45056       20             0 start-web.sh
Jun 26 14:48:32 meronense/meronense kernel: [    625]  1512   625  2239669   125755  2088960    65915             0 java
Jun 26 14:48:32 meronense/meronense kernel: [    682]  1512   682    50433      546   229376    15770             0 Rserve
Jun 26 14:48:32 meronense/meronense kernel: [    738]     0   738   103296     4967   294912     2340             0 syslog-ng
Jun 26 14:48:32 meronense/meronense kernel: [    740]     0   740     6628      497    90112     1294             0 unattended-upgr
Jun 26 14:48:32 meronense/meronense kernel: [    742]     0   742    98238     1586   122880     1474             0 fail2ban-server
Jun 26 14:48:32 meronense/meronense kernel: [    744]     0   744     1587       47    49152        7             0 agetty
Jun 26 14:48:32 meronense/meronense kernel: [    749]     0   749     3314       81    61440      154         -1000 sshd
Jun 26 14:48:32 meronense/meronense kernel: [    750]   107   750    19078       47    65536      149             0 ntpd
Jun 26 14:48:32 meronense/meronense kernel: [    752]  2164   752     3782      126    65536      167             0 systemd
Jun 26 14:48:32 meronense/meronense kernel: [    761]   105   761     7419     1433    94208     1027             0 unbound
Jun 26 14:48:32 meronense/meronense kernel: [    764]  2164   764    42153      391    98304      323             0 (sd-pam)
Jun 26 14:48:32 meronense/meronense kernel: [    775]   111   775     2522       14    65536      168          -500 nrpe
Jun 26 14:48:32 meronense/meronense kernel: [    777]     0   777     1206       42    49152        7             0 agetty
Jun 26 14:48:32 meronense/meronense kernel: [    890]   119   890    16227     5052   176128     2880             0 tor
Jun 26 14:48:32 meronense/meronense kernel: [    893]     0   893     3804      480    73728      313             0 apache2
Jun 26 14:48:32 meronense/meronense kernel: [   1073]     0  1073    10017      100    69632       71             0 master
Jun 26 14:48:32 meronense/meronense kernel: [   1075]   108  1075    10053      114    77824       62             0 qmgr
Jun 26 14:48:32 meronense/meronense kernel: [   1078]   110  1078    43089      145   225280      653             0 bacula-fd
Jun 26 14:48:32 meronense/meronense kernel: [   1238]     0  1238     3782      159    61440      134             0 systemd
Jun 26 14:48:32 meronense/meronense kernel: [   1239]     0  1239    41675      342    94208      373             0 (sd-pam)
Jun 26 14:48:32 meronense/meronense kernel: [   3094]     0  3094     2849       38    53248      189             0 ssh
Jun 26 14:48:32 meronense/meronense kernel: [  82216]   108 82216    10976      130    77824      132             0 tlsmgr
Jun 26 14:48:32 meronense/meronense kernel: [ 171824]     0 171824     1896       59    57344       48             0 cron
Jun 26 14:48:32 meronense/meronense kernel: [ 174724]     0 174724     2544      318    57344      435             0 screen
Jun 26 14:48:32 meronense/meronense kernel: [ 174725]     0 174725     1784      329    49152       93             0 bash
Jun 26 14:48:32 meronense/meronense kernel: [ 174727]     0 174727     2832      168    65536       18             0 sudo
Jun 26 14:48:32 meronense/meronense kernel: [ 174728]   112 174728     4912      304    77824       39             0 psql
Jun 26 14:48:32 meronense/meronense kernel: [ 342574]    33 342574     3420      219    65536      380             0 apache2
Jun 26 14:48:32 meronense/meronense kernel: [ 375288]     0 375288     4032      316    69632       10             0 sshd
Jun 26 14:48:32 meronense/meronense kernel: [ 375297]     0 375297     1811        2    45056      457             0 bash
Jun 26 14:48:32 meronense/meronense kernel: [ 375352]   112 375352    24248      960   131072      321          -900 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375355]   112 375355    24303     7023   176128      322             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375356]   112 375356    24285     6885   176128      330             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375357]   112 375357    24248      421   118784      330             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375358]   112 375358    24458      691   176128      379             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375359]   112 375359    16141      158   110592      325             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375360]   112 375360    16317      283   114688      338             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375361]   112 375361    24383      325   131072      382             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 375423]     0 375423     2832        2    57344      185             0 sudo
Jun 26 14:48:32 meronense/meronense kernel: [ 375424]  1512 375424     2326       17    53248      475             0 bash
Jun 26 14:48:32 meronense/meronense kernel: [ 375440]   112 375440    25335     2356   180224      381             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 378899]     0 378899     2612       24    61440      157             0 cron
Jun 26 14:48:32 meronense/meronense kernel: [ 378904]  1512 378904      605        0    45056       20             0 sh
Jun 26 14:48:32 meronense/meronense kernel: [ 378907]  1512 378907     1412        0    53248       66             0 run-web.sh
Jun 26 14:48:32 meronense/meronense kernel: [ 378916]  1512 378916  3602809  1978504 23691264   930141             0 java
Jun 26 14:48:32 meronense/meronense kernel: [ 387232]   112 387232    62424    17670   421888     1755             0 postgres
Jun 26 14:48:32 meronense/meronense kernel: [ 400260]    33 400260   619649     8154   434176      234             0 apache2
Jun 26 14:48:32 meronense/meronense kernel: [ 403823]     0 403823     1778        2    57344      439             0 bash
Jun 26 14:48:32 meronense/meronense kernel: [ 403888]     0 403888     2832        2    65536      185             0 sudo
Jun 26 14:48:32 meronense/meronense kernel: [ 403889]  1512 403889     2293        1    61440      452             0 bash
Jun 26 14:48:32 meronense/meronense kernel: [ 403943]  1512 403943     1412        0    45056       67             0 run-web.sh
Jun 26 14:48:32 meronense/meronense kernel: [ 403945]  1512 403945  3537530  2786622 22831104     1865             0 java
Jun 26 14:48:32 meronense/meronense kernel: [ 404231]   108 404231    10025      165    73728        0             0 pickup
Jun 26 14:48:32 meronense/meronense kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cron.service,task=java,pid=378916,uid=1512
Jun 26 14:48:32 meronense/meronense kernel: Out of memory: Killed process 378916 (java) total-vm:14411236kB, anon-rss:7913984kB, file-rss:0kB, shmem-rss:32kB, UID:1512 pgtables:23136kB oom_score_adj:0
```
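and a quick-and-dirty way to rank those task dumps by RSS, for future reference (a sketch that assumes the kern.log format shown above; the rss column is counted in 4kB pages and sits 5th from the end of each task line):

```sh
# extract the OOM task-dump lines and sort them by resident set size
zgrep -hE 'kernel: \[ *[0-9]+\] ' /var/log/kern.log* \
  | awk '{printf "%8.1f MB  %s\n", $(NF-4)*4/1024, $NF}' \
  | sort -rn | head -20
```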
there are a few possible courses of action here:
- we assign less memory to the JVM, and try again the coming night
- we give more memory to the VM again (last bumps: 12→16GB in december, 16→20GB in february)
- we rollback to PostgreSQL 11 and we reload the data
- we upgrade to bookworm and PostgreSQL 14
- we downgrade the kernel to 4.19
This covers basically the entire stack but the KVM part.
- anarcat marked this issue as related to #40482 (closed)
also, this is the previous incident with memory on meronense: #40482 (closed)
my two cents on those:
> - we assign less memory to the JVM, and try again the coming night
> - we give more memory to the VM again (last bumps: 12→16GB in december, 16→20GB in february)
> - we rollback to PostgreSQL 11 and we reload the data
> - we upgrade to bookworm and PostgreSQL 14
> - we downgrade the kernel to 4.19
i'm tempted to try (1) (less memory to the JVM) for now. i do not think postgresql is at fault here, because it doesn't show up as a significant consumer in the OOM process tables. last time we "fixed" this by throwing more RAM at meronense (in #40482 (closed)).
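to make (1) concrete, a hypothetical sketch of what that would look like in the cron wrapper (the jar name is a placeholder; the real invocation lives in run-web.sh and may already set other flags):

```sh
# hypothetical: cap the updater's heap explicitly; without -Xmx, recent JDKs
# default to 25% of RAM, and the OOM dumps above show the java job at
# roughly 7-8GB of anonymous RSS
java -Xms1g -Xmx4g -jar website.jar   # jar name is a placeholder
```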
another data point: the node exporter gives us a counter that increments when the kernel invokes the oom killer (see the sketch after the list).
it was invoked on:
- 2021-09-20
- 2021-09-26
- 10 times in october 2021
- 2021-11-21
- 2021-11-30
- 2022-05-06
- 2022-05-12 (twice)
- 2022-06-24
- 2022-06-26 (twice)
so this is not a new phenomenon and might not be related to the upgrade at all.
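for the record, a sketch of where that counter comes from and how to query it (the PromQL label names are assumptions, adjust to the actual setup):

```sh
# the raw kernel counter the node exporter exposes as node_vmstat_oom_kill
# (available on kernels >= 4.13):
grep oom_kill /proc/vmstat
# hedged PromQL sketch to spot the days the killer fired:
#   increase(node_vmstat_oom_kill{instance=~"meronense.*"}[1d]) > 0
```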
- Hiro mentioned in issue tpo/network-health/metrics/website#40059 (closed)
/cc @gk
- Hiro changed the description
- anarcat mentioned in issue #40815 (closed)
- Hiro marked this issue as related to tpo/network-health/metrics/website#40059 (closed)
so I'm not sure what was fixed or how, but in that other ticket (tpo/network-health/metrics/website#40059 (closed)), @hiro said:
> I think if we look at the memory graph, we aren't in a bad shape at this point. I think we have to optimize the DB operations now so that it doesn't take more than 24 hours to run the update.
so let's close this. in any case, we have instrumentation on meronense now that shows PostgreSQL is not the culprit, so to a certain extent, there's not much TPA can do to solve this here...
there's also a separate ticket to track performance problems with postgresql which might come back here in another form, but let's followup in tpo/network-health/metrics/website#40060 (closed) for now.
I have changed the garbage collector algorithm (tpo/network-health/metrics/website#40059 (comment 2817836))
https://gitlab.torproject.org/tpo/network-health/metrics/metrics-bin/-/blob/main/website/run-web.sh
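For reference, a quick way to check which collector a given JVM is picking by default (a hypothetical check; the flags actually changed are in the linked comment, not here):

```sh
# print the resolved GC flags of the default JVM configuration
java -XX:+PrintFlagsFinal -version 2>/dev/null \
  | grep -E 'Use(Serial|Parallel|G1|Z|Shenandoah).*GC'
```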
- anarcat closed
- anarcat changed the incident status to Resolved by closing the incident
@lavamind found that the JIT could be the culprit here; i'm going to try turning it off here as well, see #40815 (comment 2823184) for a deeper discussion.
- anarcat reopened
i just did this runtime change:
```
root@meronense:~# sudo -u postgres psql
could not change directory to "/root": Permission denied
psql (13.7 (Debian 13.7-0+deb11u1))
Type "help" for help.
postgres=# show jit;
 jit
-----
 on
(1 row)

postgres=# set jit to off;
SET
postgres=# show jit;
 jit
-----
 off
(1 row)
```
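note that a plain SET only lasts for that psql session, so for the nightly job's own connections the setting presumably also needs to be made persistent, along these lines (a sketch; on this host it may well be done through puppet/postgresql.conf instead):

```sh
# make jit=off cluster-wide and persistent, then reload and verify
sudo -u postgres psql -c "ALTER SYSTEM SET jit = off;"
sudo -u postgres psql -c "SELECT pg_reload_conf();"
sudo -u postgres psql -c "SHOW jit;"
```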
- anarcat changed due date to July 27, 2022
let's leave this one lying there for a couple of days to see the effect.
we probably won't see a change in memory usage, because meronense doesn't have that problem anymore, but just in case, here are the last 7 days of per-app usage:
https://grafana.torproject.org/d/LbhyBYq7k/per-process-memory-usage?orgId=1&var-instance=meronense.torproject.org&var-process=java&var-process=postgres&var-min_size=2000000&from=now-7d&to=now
https://grafana.torproject.org/d/Z7T7Cfemz/node-exporter-full?orgId=1&var-job=node&var-node=meronense.torproject.org&var-port=9100&from=now-7d&to=now&viewPanel=78&refresh=1m
and CPU usage, which may have a more noticeable impact:
https://grafana.torproject.org/d/Z7T7Cfemz/node-exporter-full?orgId=1&var-job=node&var-node=meronense.torproject.org&var-port=9100&from=now-7d&to=now&refresh=1m&viewPanel=77
hmm... i'm not sure this is due to the jit change though. the jit change was done about 24h ago. if you look at the last 3 days graph:
https://grafana.torproject.org/d/Z7T7Cfemz/node-exporter-full?orgId=1&var-job=node&var-node=meronense.torproject.org&var-port=9100&from=now-3d&to=now&refresh=1m&viewPanel=77
... it doesn't seem like the jit change had any impact: it was taking ~4h before, and it's still taking about ~4h. am i missing something?
> am i missing something?
i don't think i am: it's quite possible the issue on meronense was different from the one on materculae, at least as far as the modified code goes. in other words: it's possible the JIT was causing memory issues only with the version of the code we were running before @hiro did all that work to clean it up.
so i'm going to go out on a limb here and pretend that the jit change won't change anything here, and that this ticket doesn't deserve further scrutiny. things are still resolved in any case.
- anarcat closed
- anarcat changed the incident status to Resolved by closing the incident
- anarcat mentioned in commit wiki-replica@44a50bcd
- anarcat marked this incident as related to #41515 (closed)
- anarcat mentioned in incident #41515 (closed)