How to find out why a process was killed on the server

A few days ago I ran a job on our server that used 70% of the memory. When I logged in a day later to check on it, the job had been killed (the terminal showed "Killed").

My question is: can I find out what happened?

  1. Was my process killed because another user started a job that needed more than the remaining 30% of memory?
  2. Did an administrator kill it?

Is there a way to find out exactly what happened?

If a process consumes too much memory, the kernel's "Out of Memory" (OOM) killer will automatically kill the offending process. It sounds like this may have happened to your job. The kernel log should show any OOM killer action, so use the "dmesg" command to see what happened, e.g.

dmesg | less 

You will see OOM killer messages like the following:

 [ 54.125380] Out of memory: Kill process 8320 (stress-ng-brk) score 324 or sacrifice child
 [ 54.125382] Killed process 8320 (stress-ng-brk) total-vm:1309660kB, anon-rss:1287796kB, file-rss:76kB
 [ 54.522906] gmain invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
 [ 54.522908] gmain cpuset=accounts-daemon.service mems_allowed=0
 [ 54.522912] CPU: 6 PID: 1032 Comm: gmain Not tainted 4.4.0-0-generic #3-Ubuntu
 [ 54.522913] Hardware name: Intel Corporation Skylake Client platform/Skylake DT DDR4 RVP8, BIOS SKLSE2R1.R00.B089.B00.1506160228 06/16/2015
 [ 54.522914] 0000000000000000 000000002d879fe9 ffff88016d727a58 ffffffff813d8604
 [ 54.522915] ffff88016d727c50 ffff88016d727ac8 ffffffff8120272e 0000000000000015
 [ 54.522916] 0000000000000000 ffff880080ab3600 ffff880086725880 ffff88016d727ab8
 [ 54.522917] Call Trace:
 [ 54.522921] [] dump_stack+0x44/0x60
 [ 54.522924] [] dump_header+0x5a/0x1c5
 [ 54.522926] [] ? apparmor_capable+0xb8/0x120
 [ 54.522928] [] oom_kill_process+0x202/0x3b0
 [ 54.522929] [] out_of_memory+0x215/0x460
 [ 54.522931] [] __alloc_pages_nodemask+0x9b0/0xb40
 [ 54.522933] [] alloc_pages_current+0x8c/0x110
 [ 54.522934] [] __page_cache_alloc+0xb5/0xc0
 [ 54.522935] [] filemap_fault+0x14a/0x3f0
 [ 54.522937] [] __do_fault+0x50/0xe0
 [ 54.522938] [] handle_mm_fault+0xf92/0x1840
 [ 54.522939] [] ? eventfd_ctx_read+0x67/0x210
 [ 54.522941] [] __do_page_fault+0x197/0x400
 [ 54.522942] [] do_page_fault+0x22/0x30
 [ 54.522944] [] page_fault+0x28/0x30
 [ 54.522945] Mem-Info:
 [ 54.522947] active_anon:788399 inactive_anon:33532 isolated_anon:0 active_file:83 inactive_file:37 isolated_file:0 unevictable:1 dirty:10 writeback:0 unstable:0 slab_reclaimable:5166 slab_unreclaimable:13868 mapped:5646 shmem:9752 pagetables:4476 bounce:0 free:7576 free_pcp:0 free_cma:0
 [ 54.522948] Node 0 DMA free:15476kB min:28kB low:32kB high:40kB active_anon:144kB inactive_anon:216kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15888kB mlocked:0kB dirty:0kB writeback:0kB mapped:80kB shmem:80kB slab_reclaimable:0kB slab_unreclaimable:48kB kernel_stack:0kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
 [ 54.522951] lowmem_reserve[]: 0 2072 3862 3862
 [ 54.522952] Node 0 DMA32 free:11220kB min:4204kB low:5252kB high:6304kB active_anon:1711968kB inactive_anon:80964kB active_file:236kB inactive_file:100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2206296kB managed:2125964kB mlocked:0kB dirty:36kB writeback:0kB mapped:17948kB shmem:26240kB slab_reclaimable:8988kB slab_unreclaimable:26036kB kernel_stack:2656kB pagetables:9348kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3776 all_unreclaimable? yes
 [ 54.522955] lowmem_reserve[]: 0 0 1790 1790
 [ 54.522956] Node 0 Normal free:3608kB min:3628kB low:4532kB high:5440kB active_anon:1441484kB inactive_anon:52948kB active_file:96kB inactive_file:48kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:1900544kB managed:1833172kB mlocked:4kB dirty:4kB writeback:0kB mapped:4556kB shmem:12688kB slab_reclaimable:11676kB slab_unreclaimable:29388kB kernel_stack:2448kB pagetables:8552kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? yes
 [ 54.522958] lowmem_reserve[]: 0 0 0 0
 [ 54.522959] Node 0 DMA: 7*4kB (UME) 3*8kB (UM) 4*16kB (UME) 4*32kB (UME) 2*64kB (U) 4*128kB (UME) 1*256kB (E) 2*512kB (ME) 3*1024kB (UME) 1*2048kB (E) 2*4096kB (M) = 15476kB
 [ 54.522965] Node 0 DMA32: 118*4kB (UME) 36*8kB (UME) 62*16kB (UME) 94*32kB (UME) 34*64kB (UME) 24*128kB (UME) 5*256kB (UE) 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 11800kB
 [ 54.522969] Node 0 Normal: 151*4kB (UME) 39*8kB (UME) 77*16kB (UME) 38*32kB (UME) 9*64kB (ME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3940kB
 [ 54.522974] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
 [ 54.522974] Node 0 hugepages_total=256 hugepages_free=256 hugepages_surp=0 hugepages_size=2048kB
 [ 54.522975] 9932 total pagecache pages
 [ 54.522976] 0 pages in swap cache
 [ 54.522976] Swap cache stats: add 1831590, delete 1831590, find 5929/10969
 [ 54.522977] Free swap  = 0kB
 [ 54.522977] Total swap = 0kB
 [ 54.522978] 1030706 pages RAM
 [ 54.522978] 0 pages HighMem/MovableOnly
 [ 54.522979] 36950 pages reserved
 [ 54.522979] 0 pages cma reserved
 [ 54.522979] 0 pages hwpoisoned
 [ 54.522980] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
 [ 54.522986] [  285]     0   285    10173     1022      23       3        0             0 systemd-journal
 [ 54.522988] [  312]     0   312    11192      266      22       3        0         -1000 systemd-udevd
 [ 54.522989] [  623]   100   623    25590      569      20       4        6             0 systemd-timesyn
 [ 54.522990] [  823]     0   823     5859     1723      14       3        0             0 dhclient
 [ 54.522991] [  917]     0   917     7152       96      18       3        2             0 systemd-logind
 [ 54.522992] [  936]     0   936     6310      223      16       3        0             0 smartd
 [ 54.522993] [  943]     0   943   112847      523      72       3        9             0 NetworkManager
 [ 54.522993] [  952]     0   952    84334      421      68       4        0             0 ModemManager
 [ 54.522994] [  957]     0   957     4797       40      15       4        0             0 atd
 [ 54.522995] [  961]   115   961    93456      912      80       4        0             0 whoopsie
 [ 54.522996] [  963]     0   963     4865       65      13       3        0             0 irqbalance
 [ 54.522997] [  964]   104   964    65667      224      30       4        9             0 rsyslogd
 [ 54.522998] [  966]     0   966    23282       34      13       3        0             0 lxcfs
 [ 54.522999] [  971]   105   971    10926      318      26       3        8          -900 dbus-daemon
 [ 54.523000] [ 1008]     0  1008     9570       82      25       3        0             0 cgmanager
 [ 54.523001] [ 1016]     0  1016    70808      240      41       3        0             0 accounts-daemon
 [ 54.523002] [ 1019]     0  1019     1119       46       8       3        0             0 ondemand
 [ 54.523003] [ 1022]     0  1022     7233       68      20       3        0             0 cron
 [ 54.523004] [ 1028]   109  1028    11218       97      26       3        3             0 avahi-daemon
 [ 54.523005] [ 1030]     0  1030     1807       20      10       3        0             0 sleep
 [ 54.523006] [ 1037]   109  1037    11185       82      25       3        0             0 avahi-daemon
 [ 54.523007] [ 1047]     0  1047   141966     2188     156       4        3             0 libvirtd
 [ 54.523008] [ 1053]     0  1053    13902      163      33       3        0         -1000 sshd
 [ 54.523009] [ 1057]     0  1057    69683      586      40       3       12             0 polkitd
 [ 54.523010] [ 1072]     0  1072    10963      134      24       3        0             0 wpa_supplicant
 [ 54.523011] [ 1081]     0  1081    87582      696      39       3       23             0 lightdm
 [ 54.523012] [ 1088]     0  1088    99946     6138      97       3       15             0 Xorg
 [ 54.523012] [ 1111]     0  1111     1099       45       8       3        0             0 acpid
 [ 54.523013] [ 1125]     0  1125    56533      191      47       4       14             0 lightdm
 [ 54.523014] [ 1129]   114  1129    11957      850      27       3        0             0 systemd
 [ 54.523015] [ 1130]   114  1130    15825      501      33       3        0             0 (sd-pam)
 [ 54.523029] [ 1136]   114  1136    30728      108      26       4        0             0 gnome-keyring-d
 [ 54.523030] [ 1138]   114  1138     1119       20       8       3        0             0 lightdm-greeter
 [ 54.523031] [ 1143]   114  1143    10743      145      25       3       13             0 dbus-daemon
 [ 54.523032] [ 1144]   114  1144   227063     2039     170       4       17             0 unity-greeter
 [ 54.523032] [ 1146]   114  1146    84488      626      34       3        0             0 at-spi-bus-laun
 [ 54.523033] [ 1151]   114  1151    10680       97      27       4        0             0 dbus-daemon
 [ 54.523034] [ 1153]   114  1153    51706      157      37       3        3             0 at-spi2-registr
 [ 54.523035] [ 1159]   114  1159    68584      154      37       3        0             0 gvfsd
 [ 54.523036] [ 1164]   114  1164    85325      145      32       3        0             0 gvfsd-fuse
 [ 54.523037] [ 1174]   114  1174    44626      121      23       3        3             0 dconf-service
 [ 54.523038] [ 1197]     0  1197    20665      147      44       3        0             0 lightdm
 [ 54.523038] [ 1201]   114  1201    11465      160      27       3        0             0 upstart
 [ 54.523039] [ 1204]   114  1204   144936     1323     136       4        4             0 nm-applet
 [ 54.523040] [ 1206]   114  1206    88647      256      41       3       26             0 indicator-messa
 [ 54.523041] [ 1207]   114  1207    83323      127      31       3        0             0 indicator-bluet
 [ 54.523042] [ 1208]   114  1208   122044       98      37       4       12             0 indicator-power
 [ 54.523043] [ 1209]   114  1209   132868      439      75       3        0             0 indicator-datet
 [ 54.523044] [ 1210]   114  1210   140272     1504     127       4        1             0 indicator-keybo
 [ 54.523045] [ 1211]   114  1211   134142      426      68       4        8             0 indicator-sound
 [ 54.523045] [ 1212]   114  1212   189042      260      47       4        0             0 indicator-sessi
 [ 54.523046] [ 1218]   114  1218   117391      350      89       4        0             0 indicator-appli
 [ 54.523047] [ 1232]     0  1232     7973       81      20       3       11             0 bluetoothd
 [ 54.523048] [ 1238]   114  1238   152474     1084     129       3       15             0 unity-settings-
 [ 54.523049] [ 1261]   114  1261   104039      719      78       4        0             0 pulseaudio
 [ 54.523050] [ 1272]   120  1272    45874       77      24       3        1             0 rtkit-daemon
 [ 54.523051] [ 1293]     0  1293    68995      324      53       3       12             0 upowerd
 [ 54.523052] [ 1296]   114  1296    15493      366      33       3        0             0 gconfd-2
 [ 54.523053] [ 1342]   110  1342    75254     1170      49       3        0             0 colord
 [ 54.523054] [ 1429]   113  1429    12484       98      27       3        0             0 dnsmasq
 [ 54.523054] [ 1430]     0  1430    12477       94      27       3        0             0 dnsmasq
 [ 54.523055] [ 1514]     0  1514    22408      226      49       3        0             0 sshd
 [ 54.523056] [ 1570]  1000  1570    11958      853      26       3        0             0 systemd
 [ 54.523057] [ 1571]  1000  1571    15825      501      33       3        0             0 (sd-pam)
 [ 54.523058] [ 1631]  1000  1631    22408      244      46       3        0             0 sshd
 [ 54.523058] [ 1632]  1000  1632     5779      619      16       3        0             0 bash
 [ 54.523059] [ 1692]   118  1692    11320       77      25       3       14             0 kerneloops
 [ 54.523060] [ 1745]     0  1745     3964       41      13       3        0             0 agetty
 [ 54.523061] [ 1768]   125  1768    13192       98      27       3        0             0 dnsmasq
 [ 54.523062] [ 2276]   126  2276    32160      388      58       3        0             0 exim4
 [ 54.523062] [ 8310]  1000  8310     5508      661      14       3        0             0 stress-ng
 [ 54.523063] [ 8311]  1000  8311     5508       49      13       3        0             0 stress-ng-brk
 [ 54.523064] [ 8312]  1000  8312     5508       46      13       3        0             0 stress-ng-brk
 [ 54.523065] [ 8313]  1000  8313     5508       46      13       3        0             0 stress-ng-brk
 [ 54.523065] [ 8314]  1000  8314     5508       46      13       3        0             0 stress-ng-brk
 [ 54.523066] [ 8321]  1000  8321   365871   360407     717       4        0             0 stress-ng-brk
 [ 54.523067] [ 8322]  1000  8322   239424   233959     470       3        0             0 stress-ng-brk
 [ 54.523068] [ 8323]  1000  8323   143599   138152     283       3        0             0 stress-ng-brk
 [ 54.523069] [ 8324]  1000  8324    54613    49145     109       3        0             0 stress-ng-brk
 [ 54.523070] Out of memory: Kill process 8321 (stress-ng-brk) score 363 or sacrifice child
 [ 54.523072] Killed process 8321 (stress-ng-brk) total-vm:1463484kB, anon-rss:1441628kB, file-rss:0kB

However, these messages may have been rotated out of the kernel ring buffer by now, so you may also need to check the kernel log files, /var/log/kern.log*.
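To pull just the relevant lines out of those logs, a small grep helper is handy. This is a minimal sketch (the "find_oom_kills" name is my own, not a standard tool), demonstrated on an embedded sample so it runs anywhere; in practice you would point it at /var/log/kern.log* or pipe dmesg output into grep directly.

```shell
#!/bin/sh
# Hypothetical helper: filter a kernel log file for OOM-killer activity.
find_oom_kills() {
    grep -iE "out of memory|oom-killer|killed process" "$1"
}

# Self-contained demo on a sample file (lines taken from a dmesg OOM dump):
cat > /tmp/kern.sample <<'EOF'
[ 54.522906] gmain invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[ 54.522945] Mem-Info:
[ 54.523070] Out of memory: Kill process 8321 (stress-ng-brk) score 363 or sacrifice child
[ 54.523072] Killed process 8321 (stress-ng-brk) total-vm:1463484kB, anon-rss:1441628kB, file-rss:0kB
EOF

find_oom_kills /tmp/kern.sample   # prints the three OOM-related lines

# Real usage (paths and read permissions vary by distro):
#   find_oom_kills /var/log/kern.log
#   dmesg | grep -iE "out of memory|oom-killer|killed process"
```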

Linux's default virtual-memory setting is to overcommit memory. That means the kernel will let processes allocate more memory than is actually available, allowing a process to memory-map large regions, because normally not every page in an allocation gets used. However, sometimes processes do read/write all of the overcommitted pages, the kernel cannot back them with enough physical memory plus swap, and so the OOM killer tries to find the best candidate offending process and kills it.
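On Linux you can inspect the overcommit policy and how overcommitted the system currently is via /proc:

```shell
# 0 = heuristic overcommit (the default described above),
# 1 = always overcommit, 2 = refuse allocations beyond the commit limit
cat /proc/sys/vm/overcommit_memory

# CommitLimit vs. Committed_AS: the ceiling vs. how much has been promised
grep -E "^(CommitLimit|Committed_AS)" /proc/meminfo

# Each process has an OOM score; a higher score makes it a more likely
# victim when the OOM killer has to pick a candidate
cat /proc/self/oom_score
```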

So if you want to capture the kernel log the moment this happens, you can wrap your job in a bash script like the following:

 #!/bin/bash
 your_job_here
 ret=$?
 #
 # returns > 127 are a SIGNAL
 #
 if [ $ret -gt 127 ]; then
     sig=$((ret - 128))
     echo "Got SIGNAL $sig"
     if [ $sig -eq $(kill -l SIGKILL) ]; then
         echo "process was killed with SIGKILL"
         dmesg > $HOME/dmesg-kill.log
     fi
 fi

Note: "your_job_here" is the program/job you want to run. The script checks the program's return code to see whether it was killed with SIGKILL; if it was, it immediately dumps dmesg into a file named dmesg-kill.log in your home directory.
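The wrapper works because of a shell convention: when a command dies from signal N, the shell reports exit status 128 + N, so SIGKILL (signal 9) shows up as 137. A quick way to see this for yourself:

```shell
#!/bin/bash
# Kill a background job with SIGKILL and inspect the status the shell reports.
sleep 30 &
pid=$!
kill -KILL "$pid"
wait "$pid"        # wait's return value is the job's exit status
status=$?
sig=$((status - 128))
echo "exit status $status -> signal $sig"   # exit status 137 -> signal 9
```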

Hope that helps.