Issue #40427
Incident created Sep 30, 2021 by anarcat (@anarcat, Owner)

dsa-check-libs triggers the OOM-killer on shadow/chi-node-14

nagios is unhappy about services being down on chi-node-14:

10:03:44 <nsa> tor-nagios: [chi-node-14] process - ntpd is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, command name ntpd, args /usr/sbin/ntpd -p /var/run/ntpd.pid
10:04:44 <nsa> tor-nagios: [chi-node-14] process - postfix - master is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name master, args /usr/lib/postfix/sbin/master

according to systemd, at least ntpd was terminated with a SIGKILL, which, according to dmesg, is the OOM-killer's fault:

[775967.370568] Out of memory: Killed process 2027654 (ntpd) total-vm:78480kB, anon-rss:224kB, file-rss:0kB, shmem-rss:0kB, UID:108 pgtables:68kB oom_score_adj:0
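
the kill line can be picked apart mechanically to see how small the victim was. a minimal sketch in Python, assuming the exact line layout shown above (the regex and the `parse_oom_kill` name are made up for illustration, and other kernel versions may format this line differently):

```python
import re

# Pattern for the "Killed process" line as it appears in the dmesg output above.
# Assumption: only this layout is handled; this is not a general dmesg parser.
OOM_KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, command and memory figures from an OOM kill line, or None."""
    match = OOM_KILL_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "pid": int(fields["pid"]),
        "comm": fields["comm"],
        "total_vm_kb": int(fields["total_vm_kb"]),
        "anon_rss_kb": int(fields["anon_rss_kb"]),
    }


# The line quoted above, split only for readability.
line = (
    "[775967.370568] Out of memory: Killed process 2027654 (ntpd) "
    "total-vm:78480kB, anon-rss:224kB, file-rss:0kB, shmem-rss:0kB, "
    "UID:108 pgtables:68kB oom_score_adj:0"
)
print(parse_oom_kill(line))
```

for the line above this reports ntpd with 224 kB of anonymous memory, which fits ntpd being a victim of the memory pressure rather than its cause.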

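at first i thought it was shadow eating up all memory, but it's actually behaving pretty well (2.5g resident in the top output below). what's eating all the memory is the dsa-check-libs processes, or more specifically... lsof! there are five lsof processes at roughly 140g to 170g resident each: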

top - 14:06:31 up 8 days, 23:47,  1 user,  load average: 161.16, 136.99, 141.52
Tasks: 2123 total,  11 running, 2112 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.4 us, 39.9 sy,  0.0 ni, 36.9 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 1546797.+total, 134029.6 free, 1410797.+used,   1970.0 buff/cache
MiB Swap:  30720.0 total,   3407.8 free,  27312.2 used. 129521.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                        
3338378 root      20   0  222.4g   2.5g 752744 S  3796   0.2 258:56.77 shadow                                                                                         
3258869 root      20   0   58.0g  57.8g   2200 R 100.0   3.8  11:42.23 dsa-check-libs                                                                                 
3273070 root      20   0   46.6g  46.3g   1228 R 100.0   3.1  12:50.24 dsa-check-libs                                                                                 
3287055 root      20   0   34.1g  33.4g   1268 R 100.0   2.2  14:09.52 dsa-check-libs                                                                                 
3302215 root      20   0   22.7g  22.6g   1236 R 100.0   1.5  15:37.86 dsa-check-libs                                                                                 
3316260 root      20   0   11.6g  11.6g    992 R 100.0   0.8  17:11.18 dsa-check-libs                                                                                 
3204203 root      20   0  174.0g 168.3g    284 R  99.3  11.1 504:46.41 lsof                                                                                           
3227095 root      20   0  157.3g 153.0g    312 R  99.3  10.1 447:42.66 lsof                                                                                           
3219968 root      20   0  169.8g 169.4g    288 R  98.4  11.2 478:26.18 lsof                                                                                           
3223533 root      20   0  163.7g 163.2g    300 R  97.4  10.8 463:12.19 lsof                                                                                           
3230838 root      20   0  151.0g 141.6g    332 D  85.3   9.4 433:40.74 lsof                                                                                           
3244850 root      20   0   71.8g  70.8g   2412 R  59.5   4.7   7:21.62 dsa-check-libs                                                                                 
    504 root      20   0       0      0      0 S  28.4   0.0  13:32.20 kcompactd0                                                                                     
3219967 root      20   0   71.8g  71.1g    320 S  20.3   4.7   7:25.55 dsa-check-libs                                                                                 
3227094 root      20   0   71.2g  69.1g    288 S  19.9   4.6   7:31.91 dsa-check-libs                                                                                 
3223530 root      20   0   68.4g  68.3g    320 S  19.3   4.5   7:02.61 dsa-check-libs                                                                                 
3204202 root      20   0   56.2g  55.3g    332 S  18.3   3.7   6:17.70 dsa-check-libs                                                                                 
3230837 root      20   0   69.2g  67.5g    280 S  16.3   4.5   7:12.65 dsa-check-libs                                                                                 
   4816 root      20   0 5922180  17728      0 S   2.0   0.0  36:54.13 containerd                                                                                     
3314923 root      20   0       0      0      0 I   2.0   0.0   0:21.53 kworker/u162:14-kcryptd/253:0                                                                  
3306182 root      20   0       0      0      0 I   1.6   0.0   0:21.63 kworker/u162:7-kcryptd/253:0                                                                   
3332128 root      20   0       0      0      0 I   1.6   0.0   0:02.36 kworker/u162:37-kcryptd/253:0                                                                  
3353566 root      20   0   12416   5948   3040 R   1.6   0.0   0:01.10 top                                                                                            
3322573 root      20   0       0      0      0 I   1.3   0.0   0:19.27 kworker/u162:19-kcryptd/253:0                                                                  
3330624 root      20   0       0      0      0 I   1.3   0.0   0:11.93 kworker/u162:24-kcryptd/253:0                                                                  
3353609 root      20   0       0      0      0 I   1.3   0.0   0:00.30 kworker/u162:2-kcryptd/253:0                                                                   
3328556 root      20   0       0      0      0 I   1.0   0.0   0:15.72 kworker/u162:28-kcryptd/253:0                                                                  
3348164 root      20   0  328704  29212  28576 S   1.0   0.0   0:03.55 tor                                                                                            
3326231 root      20   0       0      0      0 I   0.7   0.0   0:15.25 kworker/u162:21-kcryptd/253:0                                                                  
3348166 root      20   0  329084  28872  28236 S   0.7   0.0   0:04.14 tor                                                                                            
3349381 root      20   0  307164   6492   5856 S   0.7   0.0   0:00.18 tor                                                                                            
    215 root      20   0       0      0      0 S   0.3   0.0   3:33.53 ksoftirqd/40                                                                                   
   2883 root      20   0       0      0      0 S   0.3   0.0  30:47.42 dmcrypt_write/2                                                                                
   2920 root       0 -20       0      0      0 I   0.3   0.0   0:03.06 kworker/59:1H-kblockd                                                                          
   3322 root       0 -20       0      0      0 I   0.3   0.0   0:02.79 kworker/61:1H-kblockd                                                                          
   3330 root       0 -20       0      0      0 I   0.3   0.0   0:03.09 kworker/63:1H-kblockd                                                                          
   3391 root       0 -20       0      0      0 I   0.3   0.0   0:03.30 kworker/55:1H-kblockd                                                                          
3322041 root      20   0       0      0      0 I   0.3   0.0   0:08.65 kworker/u161:10-kcryptd/253:0                                                                  
3332025 root      20   0       0      0      0 I   0.3   0.0   0:03.59 kworker/0:1-events                                                                             
3338290 root      20   0 2267244  50872   3996 S   0.3   0.0   0:05.23 tornettools                                                                                    
3348191 root      20   0  307164   6396   5760 S   0.3   0.0   0:00.20 tor                                                                                            
3348281 root      20   0  307164   6364   5728 S   0.3   0.0   0:00.23 tor                                                                                            
3348295 root      20   0  307164   6544   5908 S   0.3   0.0   0:00.27 tor                                                                                            
3348495 root      20   0  307164   6484   5848 S   0.3   0.0   0:00.21 tor                                                                                            
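
to put a number on that, the RES column can be summed per command. a rough sketch in Python, assuming top's default task-area layout (twelve columns, bare numbers in KiB, `m`/`g`/`t` suffixes for scaled values); `parse_size_kib`, `rss_by_command` and `SAMPLE_ROWS` are hypothetical names, and the sample holds only a few rows copied from the output above:

```python
from collections import defaultdict

# A handful of task rows copied from the top output above.
SAMPLE_ROWS = """\
3338378 root      20   0  222.4g   2.5g 752744 S  3796   0.2 258:56.77 shadow
3258869 root      20   0   58.0g  57.8g   2200 R 100.0   3.8  11:42.23 dsa-check-libs
3273070 root      20   0   46.6g  46.3g   1228 R 100.0   3.1  12:50.24 dsa-check-libs
3204203 root      20   0  174.0g 168.3g    284 R  99.3  11.1 504:46.41 lsof
3227095 root      20   0  157.3g 153.0g    312 R  99.3  10.1 447:42.66 lsof
3353566 root      20   0   12416   5948   3040 R   1.6   0.0   0:01.10 top
"""


def parse_size_kib(field):
    """Convert a top memory field to KiB; bare numbers are taken as KiB."""
    units = {"m": 1024, "g": 1024**2, "t": 1024**3}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field)


def rss_by_command(top_rows):
    """Sum the RES column per command name, in GiB."""
    totals = defaultdict(float)
    for row in top_rows.splitlines():
        cols = row.split()
        # Task rows start with a numeric PID and have at least 12 columns.
        if len(cols) < 12 or not cols[0].isdigit():
            continue
        command = " ".join(cols[11:])
        totals[command] += parse_size_kib(cols[5]) / 1024**2
    return dict(totals)


totals = rss_by_command(SAMPLE_ROWS)
for command, gib in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{command:16} {gib:8.1f} GiB")
```

adding up the RES values of every row in the output above by hand gives roughly 795g for the five lsof processes and roughly 574g for the eleven dsa-check-libs processes, which together account for most of the 1410797 MiB reported as used. RES can count shared pages more than once, so treat these as approximate.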

this is weird and should be investigated: we're basically killing that box with our own monitoring.

/cc @jnewsome
