* [PVE-User] A less aggressive OOM?
From: Marco Gaiarin
Date: 2025-07-07 9:26 UTC
To: pve-user

We have upgraded a set of clusters from PVE6 to PVE8, and we have found that
with the newer kernels the OOM killer is a bit more 'aggressive' and sometimes
kills a VM.

The nodes have plenty of RAM (64GB; there are 2-3 VMs, each with 8GB of RAM),
the VMs have the QEMU agent installed and ballooning enabled, but still an OOM
kill sometimes happens. Clearly, if the OOM killer hits the main VM that hosts
the local DNS, we get into trouble.

I've looked in the PVE wiki, but found nothing. Is there some way to relax the
OOM killer, or to control its behaviour?

The nodes have no swap, so probably the best thing to do (but the hardest
one ;-) is to set up some swap with a lower swappiness, but I'm seeking
feedback first.

Thanks.
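One generic knob for controlling the OOM killer's choice of victim is the
per-process oom_score_adj: lowering it makes a process less attractive to
kill. A minimal sketch for shielding the DNS VM's kvm process (the VMID is
hypothetical; the pidfile path is the one qemu-server normally writes on PVE,
verify it on your node; the setting must be re-applied after each VM start):

    # oom_score_adj ranges from -1000 (never killed) to +1000 (killed first)
    VMID=100                                     # hypothetical VMID of the DNS VM
    PID=$(cat /var/run/qemu-server/${VMID}.pid)  # pidfile written by qemu-server
    echo -500 > /proc/${PID}/oom_score_adj       # bias the OOM killer away from it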
* Re: [PVE-User] A less aggressive OOM?
From: Victor Rodriguez
Date: 2025-07-07 21:39 UTC
To: Proxmox VE user list, Marco Gaiarin

Hi,

I would start by analyzing the memory status at the time of the OOM. There
should be some lines in the journal/syslog where the kernel writes what the
memory looked like, and from them you can figure out why it had to kill a
process.

It makes little sense that OOM triggers on 64GB hosts with just 24GB
configured in VMs and, probably, less real usage. IMHO it's not the VMs that
fill your memory up to the point of OOM, but some other process, ZFS ARC,
maybe even some memory leak. Maybe some process is producing severe memory
fragmentation.

Regards,

On 7/7/25 11:26, Marco Gaiarin wrote:
> [...]
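Pulling the kernel's OOM report out of the journal can look like this (a
minimal sketch; the time window is an example and should be adjusted to the
incident being investigated):

    # kernel messages only, filtered for the OOM killer's summary lines
    journalctl -k | grep -iE 'oom-killer|out of memory|oom_reaper'
    # full Mem-Info dump around a known incident
    journalctl -k --since '2025-07-04 19:55' --until '2025-07-04 20:05'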
* Re: [PVE-User] A less aggressive OOM?
From: Marco Gaiarin
Date: 2025-07-08 16:31 UTC
To: Victor Rodriguez, Roland; +Cc: Proxmox VE user list

Mandi! Victor Rodriguez
  In chel di` si favelave...

> I would start by analyzing the memory status at the time of the OOM. There
> should be some lines in the journal/syslog where the kernel writes what the
> memory looked like, and from them you can figure out why it had to kill a
> process.

This is the full OOM log:

Jul 4 20:00:12 pppve1 kernel: [3375931.660119] kvm invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Jul 4 20:00:12 pppve1 kernel: [3375931.669158] CPU: 1 PID: 4088 Comm: kvm Tainted: P O 6.8.12-10-pve #1
Jul 4 20:00:12 pppve1 kernel: [3375931.677778] Hardware name: Dell Inc. PowerEdge T440/021KCD, BIOS 2.24.0 04/02/2025
Jul 4 20:00:12 pppve1 kernel: [3375931.686211] Call Trace:
Jul 4 20:00:12 pppve1 kernel: [3375931.689504]  <TASK>
Jul 4 20:00:12 pppve1 kernel: [3375931.692428]  dump_stack_lvl+0x76/0xa0
Jul 4 20:00:12 pppve1 kernel: [3375931.696915]  dump_stack+0x10/0x20
Jul 4 20:00:12 pppve1 kernel: [3375931.701057]  dump_header+0x47/0x1f0
Jul 4 20:00:12 pppve1 kernel: [3375931.705358]  oom_kill_process+0x110/0x240
Jul 4 20:00:12 pppve1 kernel: [3375931.710169]  out_of_memory+0x26e/0x560
Jul 4 20:00:12 pppve1 kernel: [3375931.714707]  __alloc_pages+0x10ce/0x1320
Jul 4 20:00:12 pppve1 kernel: [3375931.719422]  alloc_pages_mpol+0x91/0x1f0
Jul 4 20:00:12 pppve1 kernel: [3375931.724136]  alloc_pages+0x54/0xb0
Jul 4 20:00:12 pppve1 kernel: [3375931.728320]  __get_free_pages+0x11/0x50
Jul 4 20:00:12 pppve1 kernel: [3375931.732938]  __pollwait+0x9e/0xe0
Jul 4 20:00:12 pppve1 kernel: [3375931.737015]  eventfd_poll+0x2c/0x70
Jul 4 20:00:12 pppve1 kernel: [3375931.741261]  do_sys_poll+0x2f4/0x610
Jul 4 20:00:12 pppve1 kernel: [3375931.745587]  ? __pfx___pollwait+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.750332]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.754900]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.759463]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.764011]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.768617]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.773165]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.777688]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.782156]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.786622]  ? __pfx_pollwake+0x10/0x10
Jul 4 20:00:12 pppve1 kernel: [3375931.791111]  __x64_sys_ppoll+0xde/0x170
Jul 4 20:00:12 pppve1 kernel: [3375931.795656]  x64_sys_call+0x1818/0x2480
Jul 4 20:00:12 pppve1 kernel: [3375931.800193]  do_syscall_64+0x81/0x170
Jul 4 20:00:12 pppve1 kernel: [3375931.804485]  ? __x64_sys_ppoll+0xf2/0x170
Jul 4 20:00:12 pppve1 kernel: [3375931.809100]  ? syscall_exit_to_user_mode+0x86/0x260
Jul 4 20:00:12 pppve1 kernel: [3375931.814566]  ? do_syscall_64+0x8d/0x170
Jul 4 20:00:12 pppve1 kernel: [3375931.818979]  ? syscall_exit_to_user_mode+0x86/0x260
Jul 4 20:00:12 pppve1 kernel: [3375931.824425]  ? do_syscall_64+0x8d/0x170
Jul 4 20:00:12 pppve1 kernel: [3375931.828825]  ? clear_bhb_loop+0x15/0x70
Jul 4 20:00:12 pppve1 kernel: [3375931.833211]  ? clear_bhb_loop+0x15/0x70
Jul 4 20:00:12 pppve1 kernel: [3375931.837579]  ? clear_bhb_loop+0x15/0x70
Jul 4 20:00:12 pppve1 kernel: [3375931.841928]  entry_SYSCALL_64_after_hwframe+0x78/0x80
Jul 4 20:00:12 pppve1 kernel: [3375931.847482] RIP: 0033:0x765bb1ce8316
Jul 4 20:00:12 pppve1 kernel: [3375931.851577] Code: 7c 24 08 e8 2c 95 f8 ff 4c 8b 54 24 18 48 8b 74 24 10 41 b8 08 00 00 00 41 89 c1 48 8b 7c 24 08 4c 89 e2 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 cf 89 44 24 08 e8 76 95 f8 ff 8b 44
Jul 4 20:00:12 pppve1 kernel: [3375931.871194] RSP: 002b:00007fff2d39ea20 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
Jul 4 20:00:12 pppve1 kernel: [3375931.879298] RAX: ffffffffffffffda RBX: 00006045d3e68470 RCX: 0000765bb1ce8316
Jul 4 20:00:12 pppve1 kernel: [3375931.886963] RDX: 00007fff2d39ea40 RSI: 0000000000000010 RDI: 00006045d4de5f20
Jul 4 20:00:12 pppve1 kernel: [3375931.894630] RBP: 00007fff2d39eaac R08: 0000000000000008 R09: 0000000000000000
Jul 4 20:00:12 pppve1 kernel: [3375931.902299] R10: 0000000000000000 R11: 0000000000000293 R12: 00007fff2d39ea40
Jul 4 20:00:12 pppve1 kernel: [3375931.909951] R13: 00006045d3e68470 R14: 00006045b014d570 R15: 00007fff2d39eab0
Jul 4 20:00:12 pppve1 kernel: [3375931.917656]  </TASK>
Jul 4 20:00:12 pppve1 kernel: [3375931.920515] Mem-Info:
Jul 4 20:00:12 pppve1 kernel: [3375931.923465] active_anon:4467063 inactive_anon:2449638 isolated_anon:0
Jul 4 20:00:12 pppve1 kernel: [3375931.923465]  active_file:611 inactive_file:303 isolated_file:0
Jul 4 20:00:12 pppve1 kernel: [3375931.923465]  unevictable:39551 dirty:83 writeback:237
Jul 4 20:00:12 pppve1 kernel: [3375931.923465]  slab_reclaimable:434580 slab_unreclaimable:1792355
Jul 4 20:00:12 pppve1 kernel: [3375931.923465]  mapped:571491 shmem:581427 pagetables:26365
Jul 4 20:00:12 pppve1 kernel: [3375931.923465]  sec_pagetables:11751 bounce:0
Jul 4 20:00:12 pppve1 kernel: [3375931.923465]  kernel_misc_reclaimable:0
Jul 4 20:00:12 pppve1 kernel: [3375931.923465]  free:234516 free_pcp:5874 free_cma:0
Jul 4 20:00:12 pppve1 kernel: [3375931.969518] Node 0 active_anon:17033436kB inactive_anon:10633368kB active_file:64kB inactive_file:3196kB unevictable:158204kB isolated(anon):0kB isolated(file):0kB mapped:2285988kB dirty:356kB writeback:948kB shmem:2325708kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:866304kB writeback_tmp:0kB kernel_stack:11520kB pagetables:105460kB sec_pagetables:47004kB all_unreclaimable? no
Jul 4 20:00:12 pppve1 kernel: [3375932.004977] Node 0 DMA free:11264kB boost:0kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jul 4 20:00:12 pppve1 kernel: [3375932.032646] lowmem_reserve[]: 0 1527 63844 63844 63844
Jul 4 20:00:12 pppve1 kernel: [3375932.038675] Node 0 DMA32 free:252428kB boost:0kB min:1616kB low:3176kB high:4736kB reserved_highatomic:2048KB active_anon:310080kB inactive_anon:986436kB active_file:216kB inactive_file:0kB unevictable:0kB writepending:0kB present:1690624kB managed:1623508kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jul 4 20:00:12 pppve1 kernel: [3375932.069110] lowmem_reserve[]: 0 0 62317 62317 62317
Jul 4 20:00:12 pppve1 kernel: [3375932.074979] Node 0 Normal free:814396kB boost:290356kB min:356304kB low:420116kB high:483928kB reserved_highatomic:346112KB active_anon:11258684kB inactive_anon:15111580kB active_file:0kB inactive_file:2316kB unevictable:158204kB writepending:1304kB present:65011712kB managed:63820796kB mlocked:155132kB bounce:0kB free_pcp:12728kB local_pcp:0kB free_cma:0kB
Jul 4 20:00:12 pppve1 kernel: [3375932.109188] lowmem_reserve[]: 0 0 0 0 0
Jul 4 20:00:12 pppve1 kernel: [3375932.114119] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Jul 4 20:00:12 pppve1 kernel: [3375932.127796] Node 0 DMA32: 5689*4kB (UMH) 1658*8kB (UMH) 381*16kB (UM) 114*32kB (UME) 97*64kB (UME) 123*128kB (UMEH) 87*256kB (MEH) 96*512kB (UMEH) 58*1024kB (UME) 5*2048kB (UME) 11*4096kB (ME) = 253828kB
Jul 4 20:00:12 pppve1 kernel: [3375932.148050] Node 0 Normal: 16080*4kB (UMEH) 36886*8kB (UMEH) 22890*16kB (UMEH) 4687*32kB (MEH) 159*64kB (UMEH) 10*128kB (UE) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 887088kB
Jul 4 20:00:12 pppve1 kernel: [3375932.165899] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jul 4 20:00:12 pppve1 kernel: [3375932.175876] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 4 20:00:12 pppve1 kernel: [3375932.185569] 586677 total pagecache pages
Jul 4 20:00:12 pppve1 kernel: [3375932.190737] 0 pages in swap cache
Jul 4 20:00:12 pppve1 kernel: [3375932.195285] Free swap = 0kB
Jul 4 20:00:12 pppve1 kernel: [3375932.199404] Total swap = 0kB
Jul 4 20:00:12 pppve1 kernel: [3375932.203513] 16679583 pages RAM
Jul 4 20:00:12 pppve1 kernel: [3375932.207787] 0 pages HighMem/MovableOnly
Jul 4 20:00:12 pppve1 kernel: [3375932.212819] 314667 pages reserved
Jul 4 20:00:12 pppve1 kernel: [3375932.217321] 0 pages hwpoisoned
Jul 4 20:00:12 pppve1 kernel: [3375932.221525] Tasks state (memory values in pages):
Jul 4 20:00:12 pppve1 kernel: [3375932.227400] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
Jul 4 20:00:12 pppve1 kernel: [3375932.239680] [ 1959] 106 1959 1971 544 96 448 0 61440 0 0 rpcbind
Jul 4 20:00:12 pppve1 kernel: [3375932.251672] [ 1982] 104 1982 2350 672 160 512 0 57344 0 -900 dbus-daemon
Jul 4 20:00:12 pppve1 kernel: [3375932.264020] [ 1991] 0 1991 1767 275 83 192 0 57344 0 0 ksmtuned
Jul 4 20:00:12 pppve1 kernel: [3375932.276094] [ 1995] 0 1995 69541 480 64 416 0 86016 0 0 pve-lxc-syscall
Jul 4 20:00:12 pppve1 kernel: [3375932.289686] [ 2002] 0 2002 1330 384 32 352 0 53248 0 0 qmeventd
Jul 4 20:00:12 pppve1 kernel: [3375932.301811] [ 2003] 0 2003 55449 727 247 480 0 86016 0 0 rsyslogd
Jul 4 20:00:12 pppve1 kernel: [3375932.313919] [ 2004] 0 2004 3008 928 448 480 0 69632 0 0 smartd
Jul 4 20:00:12 pppve1 kernel: [3375932.325834] [ 2009] 0 2009 6386 992 224 768 0 77824 0 0 systemd-logind
Jul 4 20:00:12 pppve1 kernel: [3375932.339438] [ 2010] 0 2010 584 256 0 256 0 40960 0 -1000 watchdog-mux
Jul 4 20:00:12 pppve1 kernel: [3375932.352936] [ 2021] 0 2021 60174 928 256 672 0 90112 0 0 zed
Jul 4 20:00:12 pppve1 kernel: [3375932.364626] [ 2136] 0 2136 75573 256 64 192 0 86016 0 -1000 lxcfs
Jul 4 20:00:12 pppve1 kernel: [3375932.376485] [ 2397] 0 2397 2208 480 64 416 0 61440 0 0 lxc-monitord
Jul 4 20:00:12 pppve1 kernel: [3375932.389169] [ 2421] 0 2421 40673 454 70 384 0 73728 0 0 apcupsd
Jul 4 20:00:12 pppve1 kernel: [3375932.400685] [ 2426] 0 2426 3338 428 172 256 0 69632 0 0 iscsid
Jul 4 20:00:12 pppve1 kernel: [3375932.412121] [ 2427] 0 2427 3464 3343 431 2912 0 77824 0 -17 iscsid
Jul 4 20:00:12 pppve1 kernel: [3375932.423754] [ 2433] 0 2433 3860 1792 320 1472 0 77824 0 -1000 sshd
Jul 4 20:00:12 pppve1 kernel: [3375932.435208] [ 2461] 0 2461 189627 2688 1344 1344 0 155648 0 0 dsm_ism_srvmgrd
Jul 4 20:00:12 pppve1 kernel: [3375932.448290] [ 2490] 113 2490 4721 750 142 608 0 61440 0 0 chronyd
Jul 4 20:00:12 pppve1 kernel: [3375932.459988] [ 2492] 113 2492 2639 502 118 384 0 61440 0 0 chronyd
Jul 4 20:00:12 pppve1 kernel: [3375932.471684] [ 2531] 0 2531 1469 448 32 416 0 49152 0 0 agetty
Jul 4 20:00:12 pppve1 kernel: [3375932.483269] [ 2555] 0 2555 126545 673 244 429 0 147456 0 0 rrdcached
Jul 4 20:00:12 pppve1 kernel: [3375932.483275] [ 2582] 0 2582 155008 15334 3093 864 11377 434176 0 0 pmxcfs
Jul 4 20:00:12 pppve1 kernel: [3375932.506653] [ 2654] 0 2654 10667 614 134 480 0 77824 0 0 master
Jul 4 20:00:12 pppve1 kernel: [3375932.517986] [ 2656] 107 2656 10812 704 160 544 0 73728 0 0 qmgr
Jul 4 20:00:12 pppve1 kernel: [3375932.529118] [ 2661] 0 2661 139892 41669 28417 2980 10272 405504 0 0 corosync
Jul 4 20:00:12 pppve1 kernel: [3375932.540553] [ 2662] 0 2662 1653 576 32 544 0 53248 0 0 cron
Jul 4 20:00:12 pppve1 kernel: [3375932.551657] [ 2664] 0 2664 1621 480 96 384 0 57344 0 0 proxmox-firewal
Jul 4 20:00:12 pppve1 kernel: [3375932.564093] [ 3164] 0 3164 83332 26227 25203 768 256 360448 0 0 pve-firewall
Jul 4 20:00:12 pppve1 kernel: [3375932.576192] [ 3233] 0 3233 85947 28810 27242 1216 352 385024 0 0 pvestatd
Jul 4 20:00:12 pppve1 kernel: [3375932.587638] [ 3417] 0 3417 93674 36011 35531 480 0 438272 0 0 pvedaemon
Jul 4 20:00:12 pppve1 kernel: [3375932.599167] [ 3421] 0 3421 95913 37068 35884 1120 64 454656 0 0 pvedaemon worke
Jul 4 20:00:12 pppve1 kernel: [3375932.611536] [ 3424] 0 3424 96072 36972 35852 1088 32 454656 0 0 pvedaemon worke
Jul 4 20:00:12 pppve1 kernel: [3375932.623977] [ 3426] 0 3426 96167 37068 35948 1056 64 458752 0 0 pvedaemon worke
Jul 4 20:00:12 pppve1 kernel: [3375932.636698] [ 3558] 0 3558 90342 29540 28676 608 256 385024 0 0 pve-ha-crm
Jul 4 20:00:12 pppve1 kernel: [3375932.648477] [ 3948] 33 3948 94022 37705 35849 1856 0 471040 0 0 pveproxy
Jul 4 20:00:12 pppve1 kernel: [3375932.660083] [ 3954] 33 3954 21688 14368 12736 1632 0 221184 0 0 spiceproxy
Jul 4 20:00:12 pppve1 kernel: [3375932.671862] [ 3956] 0 3956 90222 29321 28521 544 256 397312 0 0 pve-ha-lrm
Jul 4 20:00:12 pppve1 kernel: [3375932.683484] [ 3994] 0 3994 1290140 706601 705993 608 0 6389760 0 0 kvm
Jul 4 20:00:12 pppve1 kernel: [3375932.694551] [ 4088] 0 4088 1271416 1040767 1040223 544 0 8994816 0 0 kvm
Jul 4 20:00:12 pppve1 kernel: [3375932.705624] [ 4160] 0 4160 89394 30149 29541 608 0 380928 0 0 pvescheduler
Jul 4 20:00:12 pppve1 kernel: [3375932.717864] [ 4710] 0 4710 1375 480 32 448 0 57344 0 0 agetty
Jul 4 20:00:12 pppve1 kernel: [3375932.729183] [ 5531] 0 5531 993913 567351 566647 704 0 5611520 0 0 kvm
Jul 4 20:00:12 pppve1 kernel: [3375932.740212] [ 6368] 0 6368 5512483 4229046 4228342 704 0 34951168 0 0 kvm
Jul 4 20:00:12 pppve1 kernel: [3375932.751255] [ 9796] 0 9796 1941 768 64 704 0 57344 0 0 lxc-start
Jul 4 20:00:12 pppve1 kernel: [3375932.762840] [ 9808] 100000 9808 3875 160 32 128 0 77824 0 0 init
Jul 4 20:00:12 pppve1 kernel: [3375932.774063] [ 11447] 100000 11447 9272 192 64 128 0 118784 0 0 rpcbind
Jul 4 20:00:12 pppve1 kernel: [3375932.785534] [ 11620] 100000 11620 45718 240 112 128 0 126976 0 0 rsyslogd
Jul 4 20:00:12 pppve1 kernel: [3375932.797241] [ 11673] 100000 11673 4758 195 35 160 0 81920 0 0 atd
Jul 4 20:00:12 pppve1 kernel: [3375932.808516] [ 11748] 100000 11748 6878 228 36 192 0 98304 0 0 cron
Jul 4 20:00:12 pppve1 kernel: [3375932.819868] [ 11759] 100102 11759 10533 257 65 192 0 122880 0 0 dbus-daemon
Jul 4 20:00:12 pppve1 kernel: [3375932.832328] [ 11765] 100000 11765 13797 315 155 160 0 143360 0 0 sshd
Jul 4 20:00:12 pppve1 kernel: [3375932.843547] [ 11989] 100104 11989 565602 19744 288 160 19296 372736 0 0 postgres
Jul 4 20:00:12 pppve1 kernel: [3375932.855266] [ 12169] 100104 12169 565938 537254 678 192 536384 4517888 0 0 postgres
Jul 4 20:00:12 pppve1 kernel: [3375932.866950] [ 12170] 100104 12170 565859 199654 550 224 198880 4296704 0 0 postgres
Jul 4 20:00:12 pppve1 kernel: [3375932.878525] [ 12171] 100104 12171 565859 4710 358 224 4128 241664 0 0 postgres
Jul 4 20:00:12 pppve1 kernel: [3375932.890252] [ 12172] 100104 12172 565962 7654 518 192 6944 827392 0 0 postgres
Jul 4 20:00:12 pppve1 kernel: [3375932.901845] [ 12173] 100104 12173 20982 742 518 224 0 200704 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375932.913421] [ 13520] 100000 13520 9045 192 128 64 0 114688 0 0 master
Jul 4 20:00:13 pppve1 kernel: [3375932.924809] [ 13536] 100100 13536 9601 320 128 192 0 126976 0 0 qmgr
Jul 4 20:00:13 pppve1 kernel: [3375932.936088] [ 13547] 100000 13547 3168 192 32 160 0 73728 0 0 getty
Jul 4 20:00:13 pppve1 kernel: [3375932.947424] [ 13548] 100000 13548 3168 160 32 128 0 73728 0 0 getty
Jul 4 20:00:13 pppve1 kernel: [3375932.958761] [1302486] 0 1302486 1941 768 96 672 0 53248 0 0 lxc-start
Jul 4 20:00:13 pppve1 kernel: [3375932.970490] [1302506] 100000 1302506 2115 128 32 96 0 65536 0 0 init
Jul 4 20:00:13 pppve1 kernel: [3375932.981999] [1302829] 100001 1302829 2081 128 0 128 0 61440 0 0 portmap
Jul 4 20:00:13 pppve1 kernel: [3375932.993763] [1302902] 100000 1302902 27413 160 64 96 0 122880 0 0 rsyslogd
Jul 4 20:00:13 pppve1 kernel: [3375933.005719] [1302953] 100000 1302953 117996 1654 1366 227 61 450560 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.017459] [1302989] 100000 1302989 4736 97 33 64 0 81920 0 0 atd
Jul 4 20:00:13 pppve1 kernel: [3375933.028905] [1303004] 100104 1303004 5843 64 32 32 0 94208 0 0 dbus-daemon
Jul 4 20:00:13 pppve1 kernel: [3375933.041272] [1303030] 100000 1303030 12322 334 110 224 0 139264 0 0 sshd
Jul 4 20:00:13 pppve1 kernel: [3375933.052755] [1303048] 100000 1303048 5664 64 32 32 0 94208 0 0 cron
Jul 4 20:00:13 pppve1 kernel: [3375933.064220] [1303255] 100000 1303255 9322 224 96 128 0 118784 0 0 master
Jul 4 20:00:13 pppve1 kernel: [3375933.075896] [1303284] 100101 1303284 9878 352 128 224 0 122880 0 0 qmgr
Jul 4 20:00:13 pppve1 kernel: [3375933.087405] [1303285] 100000 1303285 1509 32 0 32 0 61440 0 0 getty
Jul 4 20:00:13 pppve1 kernel: [3375933.099008] [1303286] 100000 1303286 1509 64 0 64 0 61440 0 0 getty
Jul 4 20:00:13 pppve1 kernel: [3375933.110571] [1420994] 33 1420994 21749 13271 12759 512 0 204800 0 0 spiceproxy work
Jul 4 20:00:13 pppve1 kernel: [3375933.123378] [1421001] 33 1421001 94055 37044 35892 1152 0 434176 0 0 pveproxy worker
Jul 4 20:00:13 pppve1 kernel: [3375933.136284] [1421002] 33 1421002 94055 36980 35860 1120 0 434176 0 0 pveproxy worker
Jul 4 20:00:13 pppve1 kernel: [3375933.149173] [1421003] 33 1421003 94055 37044 35892 1152 0 434176 0 0 pveproxy worker
Jul 4 20:00:13 pppve1 kernel: [3375933.162040] [2316827] 0 2316827 6820 1088 224 864 0 69632 0 -1000 systemd-udevd
Jul 4 20:00:13 pppve1 kernel: [3375933.174778] [2316923] 0 2316923 51282 2240 224 2016 0 438272 0 -250 systemd-journal
Jul 4 20:00:13 pppve1 kernel: [3375933.187768] [3148356] 0 3148356 32681 21120 19232 1888 0 249856 0 0 glpi-agent (tag
Jul 4 20:00:13 pppve1 kernel: [3375933.200481] [3053571] 0 3053571 19798 480 32 448 0 57344 0 0 pvefw-logger
Jul 4 20:00:13 pppve1 kernel: [3375933.212970] [3498513] 100033 3498513 119792 7207 2632 223 4352 516096 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.224713] [3498820] 100104 3498820 575918 235975 9351 160 226464 3424256 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.236579] [3500997] 100033 3500997 119889 7202 2594 192 4416 524288 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.248240] [3501657] 100104 3501657 571325 199025 6001 160 192864 2945024 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.260100] [3502514] 100033 3502514 119119 5907 2004 191 3712 503808 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.271772] [3503679] 100104 3503679 575295 211508 6612 192 204704 2953216 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.283619] [3515234] 100033 3515234 119042 6568 1960 192 4416 503808 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.295362] [3515420] 100104 3515420 569839 97579 4491 160 92928 2293760 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.307155] [3520282] 100033 3520282 119129 5416 2056 192 3168 495616 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.318923] [3520287] 100033 3520287 119015 5709 1894 167 3648 503808 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.330805] [3520288] 100033 3520288 119876 5961 2729 224 3008 507904 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.342648] [3521057] 100104 3521057 573824 46069 8341 128 37600 1830912 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.354567] [3521067] 100104 3521067 574768 99734 7446 96 92192 2134016 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.366512] [3521301] 100104 3521301 569500 174722 4194 160 170368 2482176 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.378484] [3532810] 100033 3532810 118740 4127 1727 160 2240 479232 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.390140] [3532933] 100033 3532933 118971 5064 1864 160 3040 503808 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.401854] [3534151] 100104 3534151 567344 168822 1686 160 166976 2408448 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.413852] [3535832] 100104 3535832 569005 41042 2578 128 38336 1150976 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.425919] [3550993] 100033 3550993 118029 1768 1544 224 0 425984 0 0 apache2
Jul 4 20:00:13 pppve1 kernel: [3375933.437868] [3560475] 107 3560475 10767 928 160 768 0 77824 0 0 pickup
Jul 4 20:00:13 pppve1 kernel: [3375933.449513] [3563017] 100101 3563017 9838 256 96 160 0 122880 0 0 pickup
Jul 4 20:00:13 pppve1 kernel: [3375933.461255] [3575085] 100100 3575085 9561 288 128 160 0 118784 0 0 pickup
Jul 4 20:00:13 pppve1 kernel: [3375933.473119] [3579986] 0 3579986 1367 384 0 384 0 49152 0 0 sleep
Jul 4 20:00:13 pppve1 kernel: [3375933.484646] [3579996] 100104 3579996 566249 5031 615 128 4288 450560 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.496645] [3580020] 0 3580020 91269 30310 29606 704 0 409600 0 0 pvescheduler
Jul 4 20:00:13 pppve1 kernel: [3375933.509585] [3580041] 0 3580041 5005 1920 640 1280 0 81920 0 100 systemd
Jul 4 20:00:13 pppve1 kernel: [3375933.521297] [3580044] 0 3580044 42685 1538 1218 320 0 102400 0 100 (sd-pam)
Jul 4 20:00:13 pppve1 kernel: [3375933.533226] [3580125] 100104 3580125 566119 5607 583 704 4320 446464 0 0 postgres
Jul 4 20:00:13 pppve1 kernel: [3375933.545245] [3580193] 0 3580193 4403 2368 384 1984 0 81920 0 0 sshd
Jul 4 20:00:13 pppve1 kernel: [3375933.556849] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=qemu.slice,mems_allowed=0,global_oom,task_memcg=/qemu.slice/121.scope,task=kvm,pid=6368,uid=0
Jul 4 20:00:13 pppve1 kernel: [3375933.573133] Out of memory: Killed process 6368 (kvm) total-vm:22049932kB, anon-rss:16913368kB, file-rss:2944kB, shmem-rss:0kB, UID:0 pgtables:34132kB oom_score_adj:0
Jul 4 20:00:15 pppve1 kernel: [3375935.378441] zd16: p1 p2 p3 < p5 p6 >
Jul 4 20:00:16 pppve1 kernel: [3375936.735383] oom_reaper: reaped process 6368 (kvm), now anon-rss:0kB, file-rss:32kB, shmem-rss:0kB
Jul 4 20:01:11 pppve1 kernel: [3375991.767379] vmbr0: port 5(tap121i0) entered disabled state
Jul 4 20:01:11 pppve1 kernel: [3375991.778143] tap121i0 (unregistering): left allmulticast mode
Jul 4 20:01:11 pppve1 kernel: [3375991.785976] vmbr0: port 5(tap121i0) entered disabled state
Jul 4 20:01:11 pppve1 kernel: [3375991.791555] zd128: p1
Jul 4 20:01:13 pppve1 kernel: [3375993.594688] zd176: p1 p2 >

> It makes little sense that OOM triggers on 64GB hosts with just 24GB
> configured in VMs and, probably, less real usage. IMHO it's not the VMs that
> fill your memory up to the point of OOM, but some other process, ZFS ARC,
> maybe even some memory leak. Maybe some process is producing severe memory
> fragmentation.

I can confirm that the server was doing some heavy I/O (a backup), but AFAIK
nothing more.

Mandi! Roland

> it's a little bit weird that OOM kicks in with VMs <32GB RAM when you have 64GB
> take a closer look why this happens, i.e. why OOM thinks there is ram pressure

Effectively the server was running:

+ vm 100, 2GB
+ vm 120, 4GB
+ vm 121, 16GB
+ vm 127, 4GB
+ lxc 124, 2GB
+ lxc 125, 4GB

so exactly 32GB of RAM. But most of the VMs/LXCs barely arrived at half of
their allocated RAM...

Thanks.
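That configured total can be re-derived on the node itself; a quick sketch
(it assumes every guest config carries an explicit 'memory:' line, which is
the usual case; guests relying on the built-in default are counted as 0):

    # Sum the configured RAM (in MiB) of all VMs and containers on this node.
    total=0
    for id in $(qm list | awk 'NR>1 {print $1}'); do
      mem=$(qm config "$id" | awk '/^memory:/ {print $2}')
      total=$(( total + ${mem:-0} ))
    done
    for id in $(pct list | awk 'NR>1 {print $1}'); do
      mem=$(pct config "$id" | awk '/^memory:/ {print $2}')
      total=$(( total + ${mem:-0} ))
    done
    echo "configured guest RAM: ${total} MiB"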
* Re: [PVE-User] A less aggressive OOM?
From: Victor Rodriguez
Date: 2025-07-10 8:56 UTC
To: Proxmox VE user list, Marco Gaiarin

Hi,

I checked the OOM log and for me the conclusion is clear (disclaimer: the
numbers might not be exact):

- You had around 26.7G of memory used by processes, plus 2.3G of shared
  memory:

    active_anon:17033436kB
    inactive_anon:10633368kB
    shmem:2325708kB
    mapped:2285988kB
    unevictable:158204kB

- It seems you are also using ZFS (some zd* disks appear in the log), and
  given that you were doing backups at the time of the OOM, I will suppose
  that your ARC size is set to 50% of the host's memory (check with
  arc_summary), so another 32G of used memory. ARC is reclaimable by the
  host, but usually ZFS does not return that memory fast enough, especially
  during heavy use of the ARC (i.e. reading for a backup), so you can't
  really count on that memory.

- Memory was quite fragmented and only small pages were available:

    Node 0 Normal:
    16080*4kB
    36886*8kB
    22890*16kB
    4687*32kB
    159*64kB
    10*128kB
    0*256kB
    0*512kB
    0*1024kB
    0*2048kB
    0*4096kB

Conclusions:

You had 32+26.7+2.3 ≃ 61G of used memory, with the ~3G still available being
small blocks that can't be used for the typically large allocations that VMs
do. Your host had no choice but to trigger OOM.

What I would do:

- Lower the ARC size [1]; a rough sketch follows below.
- Add some swap (never place it on a ZFS disk!). Even some ZRAM could help.
- Lower your VMs' memory: the total, the minimum memory (balloon), or both.
  Check that the VirtIO drivers + balloon driver are installed and working,
  so the host can reclaim memory from the guests.
- Get more RAM :)
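For the first point, a rough sketch of capping the ARC (the 8 GiB figure is
just an example; the module parameter and modprobe file are the ones the wiki
page in [1] describes):

    # runtime change; writes below the current zfs_arc_min are ignored,
    # so lower zfs_arc_min first if necessary
    echo "$(( 8 * 1024 * 1024 * 1024 ))" > /sys/module/zfs/parameters/zfs_arc_max

    # persist the limit across reboots
    echo "options zfs zfs_arc_max=$(( 8 * 1024 * 1024 * 1024 ))" >> /etc/modprobe.d/zfs.conf
    update-initramfs -u    # needed when root is on ZFS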
Regards

[1] https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage

On 7/8/25 18:31, Marco Gaiarin wrote:
> [...]

-- 
SOLTECSIS SOLUCIONES TECNOLOGICAS, S.L.
Víctor Rodríguez Cortés
Teléfono: 966 446 046
vrodriguez@soltecsis.com
www.soltecsis.com
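On the ballooning point: whether the balloon driver is actually responding
can be checked from the node. A sketch (VMID 121 is the VM from the log; the
'ballooninfo' section name assumes a reasonably current qemu-server):

    # 'actual' tracking below 'max_mem' indicates the guest's balloon
    # driver is alive and the host can reclaim memory from it
    qm status 121 --verbose | grep -A 10 ballooninfo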
* Re: [PVE-User] A less aggressive OOM?
From: Roland <devzero@web.de>
Date: Thu, 10 Jul 2025 11:08:31 +0200
To: Proxmox VE user list, Victor Rodriguez, Marco Gaiarin

If OOM kicks in because half of the RAM is being used for caches/buffers, I
would blame the OOM killer or ZFS for that.

The problem should be resolved at the ZFS or memory-management level. Why
kill processes instead of reclaiming ARC? I think that's totally wrong
behaviour.

I will watch out for an appropriate ZFS GitHub issue, or we should consider
opening one.

roland
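OpenZFS does expose a knob for exactly this: zfs_arc_shrinker_limit caps how
many pages one invocation of the kernel's shrinker may evict from the ARC,
which is a common reason the ARC lags behind sudden memory pressure. A sketch
(the parameter and its default are as documented for OpenZFS 2.x in zfs(4);
verify against your installed version before relying on it):

    cat /sys/module/zfs/parameters/zfs_arc_shrinker_limit     # default: 10000 pages
    echo 0 > /sys/module/zfs/parameters/zfs_arc_shrinker_limit  # 0 removes the cap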
On 10.07.25 at 10:56, Victor Rodriguez wrote:
> [...]
1995] 0 >> 1995 69541 480 64 416 0 86016 >> 0 0 pve-lxc-syscall >> Jul 4 20:00:12 pppve1 kernel: [3375932.289686] [ 2002] 0 >> 2002 1330 384 32 352 0 53248 >> 0 0 qmeventd >> Jul 4 20:00:12 pppve1 kernel: [3375932.301811] [ 2003] 0 >> 2003 55449 727 247 480 0 86016 >> 0 0 rsyslogd >> Jul 4 20:00:12 pppve1 kernel: [3375932.313919] [ 2004] 0 >> 2004 3008 928 448 480 0 69632 >> 0 0 smartd >> Jul 4 20:00:12 pppve1 kernel: [3375932.325834] [ 2009] 0 >> 2009 6386 992 224 768 0 77824 >> 0 0 systemd-logind >> Jul 4 20:00:12 pppve1 kernel: [3375932.339438] [ 2010] 0 >> 2010 584 256 0 256 0 40960 >> 0 -1000 watchdog-mux >> Jul 4 20:00:12 pppve1 kernel: [3375932.352936] [ 2021] 0 >> 2021 60174 928 256 672 0 90112 >> 0 0 zed >> Jul 4 20:00:12 pppve1 kernel: [3375932.364626] [ 2136] 0 >> 2136 75573 256 64 192 0 86016 >> 0 -1000 lxcfs >> Jul 4 20:00:12 pppve1 kernel: [3375932.376485] [ 2397] 0 >> 2397 2208 480 64 416 0 61440 >> 0 0 lxc-monitord >> Jul 4 20:00:12 pppve1 kernel: [3375932.389169] [ 2421] 0 >> 2421 40673 454 70 384 0 73728 >> 0 0 apcupsd >> Jul 4 20:00:12 pppve1 kernel: [3375932.400685] [ 2426] 0 >> 2426 3338 428 172 256 0 69632 >> 0 0 iscsid >> Jul 4 20:00:12 pppve1 kernel: [3375932.412121] [ 2427] 0 >> 2427 3464 3343 431 2912 0 77824 >> 0 -17 iscsid >> Jul 4 20:00:12 pppve1 kernel: [3375932.423754] [ 2433] 0 >> 2433 3860 1792 320 1472 0 77824 >> 0 -1000 sshd >> Jul 4 20:00:12 pppve1 kernel: [3375932.435208] [ 2461] 0 >> 2461 189627 2688 1344 1344 0 155648 >> 0 0 dsm_ism_srvmgrd >> Jul 4 20:00:12 pppve1 kernel: [3375932.448290] [ 2490] 113 >> 2490 4721 750 142 608 0 61440 >> 0 0 chronyd >> Jul 4 20:00:12 pppve1 kernel: [3375932.459988] [ 2492] 113 >> 2492 2639 502 118 384 0 61440 >> 0 0 chronyd >> Jul 4 20:00:12 pppve1 kernel: [3375932.471684] [ 2531] 0 >> 2531 1469 448 32 416 0 49152 >> 0 0 agetty >> Jul 4 20:00:12 pppve1 kernel: [3375932.483269] [ 2555] 0 >> 2555 126545 673 244 429 0 147456 >> 0 0 rrdcached >> Jul 4 20:00:12 pppve1 kernel: [3375932.483275] [ 2582] 0 >> 2582 155008 15334 3093 864 11377 434176 >> 0 0 pmxcfs >> Jul 4 20:00:12 pppve1 kernel: [3375932.506653] [ 2654] 0 >> 2654 10667 614 134 480 0 77824 >> 0 0 master >> Jul 4 20:00:12 pppve1 kernel: [3375932.517986] [ 2656] 107 >> 2656 10812 704 160 544 0 73728 >> 0 0 qmgr >> Jul 4 20:00:12 pppve1 kernel: [3375932.529118] [ 2661] 0 >> 2661 139892 41669 28417 2980 10272 405504 >> 0 0 corosync >> Jul 4 20:00:12 pppve1 kernel: [3375932.540553] [ 2662] 0 >> 2662 1653 576 32 544 0 53248 >> 0 0 cron >> Jul 4 20:00:12 pppve1 kernel: [3375932.551657] [ 2664] 0 >> 2664 1621 480 96 384 0 57344 >> 0 0 proxmox-firewal >> Jul 4 20:00:12 pppve1 kernel: [3375932.564093] [ 3164] 0 >> 3164 83332 26227 25203 768 256 360448 >> 0 0 pve-firewall >> Jul 4 20:00:12 pppve1 kernel: [3375932.576192] [ 3233] 0 >> 3233 85947 28810 27242 1216 352 385024 >> 0 0 pvestatd >> Jul 4 20:00:12 pppve1 kernel: [3375932.587638] [ 3417] 0 >> 3417 93674 36011 35531 480 0 438272 >> 0 0 pvedaemon >> Jul 4 20:00:12 pppve1 kernel: [3375932.599167] [ 3421] 0 >> 3421 95913 37068 35884 1120 64 454656 >> 0 0 pvedaemon worke >> Jul 4 20:00:12 pppve1 kernel: [3375932.611536] [ 3424] 0 >> 3424 96072 36972 35852 1088 32 454656 >> 0 0 pvedaemon worke >> Jul 4 20:00:12 pppve1 kernel: [3375932.623977] [ 3426] 0 >> 3426 96167 37068 35948 1056 64 458752 >> 0 0 pvedaemon worke >> Jul 4 20:00:12 pppve1 kernel: [3375932.636698] [ 3558] 0 >> 3558 90342 29540 28676 608 256 385024 >> 0 0 pve-ha-crm >> Jul 4 20:00:12 pppve1 kernel: [3375932.648477] [ 3948] 33 >> 3948 94022 37705 35849 1856 
0 471040 >> 0 0 pveproxy >> Jul 4 20:00:12 pppve1 kernel: [3375932.660083] [ 3954] 33 >> 3954 21688 14368 12736 1632 0 221184 >> 0 0 spiceproxy >> Jul 4 20:00:12 pppve1 kernel: [3375932.671862] [ 3956] 0 >> 3956 90222 29321 28521 544 256 397312 >> 0 0 pve-ha-lrm >> Jul 4 20:00:12 pppve1 kernel: [3375932.683484] [ 3994] 0 3994 >> 1290140 706601 705993 608 0 6389760 >> 0 0 kvm >> Jul 4 20:00:12 pppve1 kernel: [3375932.694551] [ 4088] 0 4088 >> 1271416 1040767 1040223 544 0 8994816 >> 0 0 kvm >> Jul 4 20:00:12 pppve1 kernel: [3375932.705624] [ 4160] 0 >> 4160 89394 30149 29541 608 0 380928 >> 0 0 pvescheduler >> Jul 4 20:00:12 pppve1 kernel: [3375932.717864] [ 4710] 0 >> 4710 1375 480 32 448 0 57344 >> 0 0 agetty >> Jul 4 20:00:12 pppve1 kernel: [3375932.729183] [ 5531] 0 >> 5531 993913 567351 566647 704 0 5611520 >> 0 0 kvm >> Jul 4 20:00:12 pppve1 kernel: [3375932.740212] [ 6368] 0 6368 >> 5512483 4229046 4228342 704 0 34951168 >> 0 0 kvm >> Jul 4 20:00:12 pppve1 kernel: [3375932.751255] [ 9796] 0 >> 9796 1941 768 64 704 0 57344 >> 0 0 lxc-start >> Jul 4 20:00:12 pppve1 kernel: [3375932.762840] [ 9808] 100000 >> 9808 3875 160 32 128 0 77824 >> 0 0 init >> Jul 4 20:00:12 pppve1 kernel: [3375932.774063] [ 11447] 100000 >> 11447 9272 192 64 128 0 118784 >> 0 0 rpcbind >> Jul 4 20:00:12 pppve1 kernel: [3375932.785534] [ 11620] 100000 >> 11620 45718 240 112 128 0 126976 >> 0 0 rsyslogd >> Jul 4 20:00:12 pppve1 kernel: [3375932.797241] [ 11673] 100000 >> 11673 4758 195 35 160 0 81920 >> 0 0 atd >> Jul 4 20:00:12 pppve1 kernel: [3375932.808516] [ 11748] 100000 >> 11748 6878 228 36 192 0 98304 >> 0 0 cron >> Jul 4 20:00:12 pppve1 kernel: [3375932.819868] [ 11759] 100102 >> 11759 10533 257 65 192 0 122880 >> 0 0 dbus-daemon >> Jul 4 20:00:12 pppve1 kernel: [3375932.832328] [ 11765] 100000 >> 11765 13797 315 155 160 0 143360 >> 0 0 sshd >> Jul 4 20:00:12 pppve1 kernel: [3375932.843547] [ 11989] 100104 >> 11989 565602 19744 288 160 19296 372736 >> 0 0 postgres >> Jul 4 20:00:12 pppve1 kernel: [3375932.855266] [ 12169] 100104 >> 12169 565938 537254 678 192 536384 4517888 >> 0 0 postgres >> Jul 4 20:00:12 pppve1 kernel: [3375932.866950] [ 12170] 100104 >> 12170 565859 199654 550 224 198880 4296704 >> 0 0 postgres >> Jul 4 20:00:12 pppve1 kernel: [3375932.878525] [ 12171] 100104 >> 12171 565859 4710 358 224 4128 241664 >> 0 0 postgres >> Jul 4 20:00:12 pppve1 kernel: [3375932.890252] [ 12172] 100104 >> 12172 565962 7654 518 192 6944 827392 >> 0 0 postgres >> Jul 4 20:00:12 pppve1 kernel: [3375932.901845] [ 12173] 100104 >> 12173 20982 742 518 224 0 200704 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375932.913421] [ 13520] 100000 >> 13520 9045 192 128 64 0 114688 >> 0 0 master >> Jul 4 20:00:13 pppve1 kernel: [3375932.924809] [ 13536] 100100 >> 13536 9601 320 128 192 0 126976 >> 0 0 qmgr >> Jul 4 20:00:13 pppve1 kernel: [3375932.936088] [ 13547] 100000 >> 13547 3168 192 32 160 0 73728 >> 0 0 getty >> Jul 4 20:00:13 pppve1 kernel: [3375932.947424] [ 13548] 100000 >> 13548 3168 160 32 128 0 73728 >> 0 0 getty >> Jul 4 20:00:13 pppve1 kernel: [3375932.958761] [1302486] 0 >> 1302486 1941 768 96 672 0 53248 >> 0 0 lxc-start >> Jul 4 20:00:13 pppve1 kernel: [3375932.970490] [1302506] 100000 >> 1302506 2115 128 32 96 0 65536 >> 0 0 init >> Jul 4 20:00:13 pppve1 kernel: [3375932.981999] [1302829] 100001 >> 1302829 2081 128 0 128 0 61440 >> 0 0 portmap >> Jul 4 20:00:13 pppve1 kernel: [3375932.993763] [1302902] 100000 >> 1302902 27413 160 64 96 0 122880 >> 0 0 rsyslogd >> Jul 4 20:00:13 pppve1 kernel: 
[3375933.005719] [1302953] 100000 >> 1302953 117996 1654 1366 227 61 450560 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.017459] [1302989] 100000 >> 1302989 4736 97 33 64 0 81920 >> 0 0 atd >> Jul 4 20:00:13 pppve1 kernel: [3375933.028905] [1303004] 100104 >> 1303004 5843 64 32 32 0 94208 >> 0 0 dbus-daemon >> Jul 4 20:00:13 pppve1 kernel: [3375933.041272] [1303030] 100000 >> 1303030 12322 334 110 224 0 139264 >> 0 0 sshd >> Jul 4 20:00:13 pppve1 kernel: [3375933.052755] [1303048] 100000 >> 1303048 5664 64 32 32 0 94208 >> 0 0 cron >> Jul 4 20:00:13 pppve1 kernel: [3375933.064220] [1303255] 100000 >> 1303255 9322 224 96 128 0 118784 >> 0 0 master >> Jul 4 20:00:13 pppve1 kernel: [3375933.075896] [1303284] 100101 >> 1303284 9878 352 128 224 0 122880 >> 0 0 qmgr >> Jul 4 20:00:13 pppve1 kernel: [3375933.087405] [1303285] 100000 >> 1303285 1509 32 0 32 0 61440 >> 0 0 getty >> Jul 4 20:00:13 pppve1 kernel: [3375933.099008] [1303286] 100000 >> 1303286 1509 64 0 64 0 61440 >> 0 0 getty >> Jul 4 20:00:13 pppve1 kernel: [3375933.110571] [1420994] 33 >> 1420994 21749 13271 12759 512 0 204800 >> 0 0 spiceproxy work >> Jul 4 20:00:13 pppve1 kernel: [3375933.123378] [1421001] 33 >> 1421001 94055 37044 35892 1152 0 434176 >> 0 0 pveproxy worker >> Jul 4 20:00:13 pppve1 kernel: [3375933.136284] [1421002] 33 >> 1421002 94055 36980 35860 1120 0 434176 >> 0 0 pveproxy worker >> Jul 4 20:00:13 pppve1 kernel: [3375933.149173] [1421003] 33 >> 1421003 94055 37044 35892 1152 0 434176 >> 0 0 pveproxy worker >> Jul 4 20:00:13 pppve1 kernel: [3375933.162040] [2316827] 0 >> 2316827 6820 1088 224 864 0 69632 >> 0 -1000 systemd-udevd >> Jul 4 20:00:13 pppve1 kernel: [3375933.174778] [2316923] 0 >> 2316923 51282 2240 224 2016 0 438272 >> 0 -250 systemd-journal >> Jul 4 20:00:13 pppve1 kernel: [3375933.187768] [3148356] 0 >> 3148356 32681 21120 19232 1888 0 249856 >> 0 0 glpi-agent (tag >> Jul 4 20:00:13 pppve1 kernel: [3375933.200481] [3053571] 0 >> 3053571 19798 480 32 448 0 57344 >> 0 0 pvefw-logger >> Jul 4 20:00:13 pppve1 kernel: [3375933.212970] [3498513] 100033 >> 3498513 119792 7207 2632 223 4352 516096 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.224713] [3498820] 100104 >> 3498820 575918 235975 9351 160 226464 3424256 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.236579] [3500997] 100033 >> 3500997 119889 7202 2594 192 4416 524288 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.248240] [3501657] 100104 >> 3501657 571325 199025 6001 160 192864 2945024 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.260100] [3502514] 100033 >> 3502514 119119 5907 2004 191 3712 503808 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.271772] [3503679] 100104 >> 3503679 575295 211508 6612 192 204704 2953216 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.283619] [3515234] 100033 >> 3515234 119042 6568 1960 192 4416 503808 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.295362] [3515420] 100104 >> 3515420 569839 97579 4491 160 92928 2293760 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.307155] [3520282] 100033 >> 3520282 119129 5416 2056 192 3168 495616 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.318923] [3520287] 100033 >> 3520287 119015 5709 1894 167 3648 503808 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.330805] [3520288] 100033 >> 3520288 119876 5961 2729 224 3008 507904 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.342648] [3521057] 100104 >> 3521057 573824 46069 8341 128 37600 1830912 >> 0 0 
postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.354567] [3521067] 100104 >> 3521067 574768 99734 7446 96 92192 2134016 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.366512] [3521301] 100104 >> 3521301 569500 174722 4194 160 170368 2482176 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.378484] [3532810] 100033 >> 3532810 118740 4127 1727 160 2240 479232 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.390140] [3532933] 100033 >> 3532933 118971 5064 1864 160 3040 503808 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.401854] [3534151] 100104 >> 3534151 567344 168822 1686 160 166976 2408448 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.413852] [3535832] 100104 >> 3535832 569005 41042 2578 128 38336 1150976 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.425919] [3550993] 100033 >> 3550993 118029 1768 1544 224 0 425984 >> 0 0 apache2 >> Jul 4 20:00:13 pppve1 kernel: [3375933.437868] [3560475] 107 >> 3560475 10767 928 160 768 0 77824 >> 0 0 pickup >> Jul 4 20:00:13 pppve1 kernel: [3375933.449513] [3563017] 100101 >> 3563017 9838 256 96 160 0 122880 >> 0 0 pickup >> Jul 4 20:00:13 pppve1 kernel: [3375933.461255] [3575085] 100100 >> 3575085 9561 288 128 160 0 118784 >> 0 0 pickup >> Jul 4 20:00:13 pppve1 kernel: [3375933.473119] [3579986] 0 >> 3579986 1367 384 0 384 0 49152 >> 0 0 sleep >> Jul 4 20:00:13 pppve1 kernel: [3375933.484646] [3579996] 100104 >> 3579996 566249 5031 615 128 4288 450560 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.496645] [3580020] 0 >> 3580020 91269 30310 29606 704 0 409600 >> 0 0 pvescheduler >> Jul 4 20:00:13 pppve1 kernel: [3375933.509585] [3580041] 0 >> 3580041 5005 1920 640 1280 0 81920 >> 0 100 systemd >> Jul 4 20:00:13 pppve1 kernel: [3375933.521297] [3580044] 0 >> 3580044 42685 1538 1218 320 0 102400 >> 0 100 (sd-pam) >> Jul 4 20:00:13 pppve1 kernel: [3375933.533226] [3580125] 100104 >> 3580125 566119 5607 583 704 4320 446464 >> 0 0 postgres >> Jul 4 20:00:13 pppve1 kernel: [3375933.545245] [3580193] 0 >> 3580193 4403 2368 384 1984 0 81920 >> 0 0 sshd >> Jul 4 20:00:13 pppve1 kernel: [3375933.556849] >> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=qemu.slice,mems_allowed=0,global_oom,task_memcg=/qemu.slice/121.scope,task=kvm,pid=6368,uid=0 >> Jul 4 20:00:13 pppve1 kernel: [3375933.573133] Out of memory: Killed >> process 6368 (kvm) total-vm:22049932kB, anon-rss:16913368kB, >> file-rss:2944kB, shmem-rss:0kB, UID:0 pgtables:34132kB oom_score_adj:0 >> Jul 4 20:00:15 pppve1 kernel: [3375935.378441] zd16: p1 p2 p3 < p5 >> p6 > >> Jul 4 20:00:16 pppve1 kernel: [3375936.735383] oom_reaper: reaped >> process 6368 (kvm), now anon-rss:0kB, file-rss:32kB, shmem-rss:0kB >> Jul 4 20:01:11 pppve1 kernel: [3375991.767379] vmbr0: port >> 5(tap121i0) entered disabled state >> Jul 4 20:01:11 pppve1 kernel: [3375991.778143] tap121i0 >> (unregistering): left allmulticast mode >> Jul 4 20:01:11 pppve1 kernel: [3375991.785976] vmbr0: port >> 5(tap121i0) entered disabled state >> Jul 4 20:01:11 pppve1 kernel: [3375991.791555] zd128: p1 >> Jul 4 20:01:13 pppve1 kernel: [3375993.594688] zd176: p1 p2 >>
>>
>>> It makes little sense that OOM triggers on 64GB hosts with just 24GB configured in VMs and, probably, less real usage. IMHO it's not the VMs that fill your memory up to the point of OOM, but some other process, ZFS ARC, maybe even some memory leak. Maybe some process is producing severe memory fragmentation.
>> I can confirm that the server was doing some heavy I/O (a backup), but AFAIK nothing more.
>>
>> Hi! Roland
>>
>>> it's a little bit weird that OOM kicks in with VMs <32GB RAM when you have 64GB
>>> take a closer look at why this happens, i.e. why OOM thinks there is RAM pressure
>> effectively the server was running:
>> + vm 100, 2GB
>> + vm 120, 4GB
>> + vm 121, 16GB
>> + vm 127, 4GB
>> + lxc 124, 2GB
>> + lxc 125, 4GB
>>
>> so exactly 32GB of RAM. But most of the VMs/LXCs barely reached half of their allocated RAM...
>>
>> Thanks.
[-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 10+ messages in thread
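Victor's accounting can be checked straight from the Mem-Info block and the buddy free lists quoted above. A quick sketch in plain shell arithmetic (values in kB, copied from the log):

  # anonymous + shared memory at the moment of the kill (kB -> GiB)
  echo $(( (17033436 + 10633368 + 2325708) / 1024 / 1024 ))
  # prints 28, roughly matching the ~26.7G + 2.3G figures above

  # free memory in the Normal zone, summed over the per-order free lists (kB)
  echo $(( 16080*4 + 36886*8 + 22890*16 + 4687*32 + 159*64 + 10*128 ))
  # prints 887088, exactly what the log reports

  # note: the largest free block is 128 kB; every order from 256 kB upward
  # is empty, which is what "fragmented" means here

Add an ARC sized near its old 50% default and essentially all of the 64GB is spoken for, which is exactly the conclusion drawn above.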
* Re: [PVE-User] A less aggressive OOM? 2025-07-10 9:08 ` Roland via pve-user @ 2025-07-10 14:49 ` dorsy via pve-user 2025-07-10 16:11 ` Roland via pve-user ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: dorsy via pve-user @ 2025-07-10 14:49 UTC (permalink / raw) To: pve-user; +Cc: dorsy [-- Attachment #1: Type: message/rfc822, Size: 7780 bytes --] From: dorsy <dorsyka@yahoo.com> To: pve-user@lists.proxmox.com Subject: Re: [PVE-User] A less aggressive OOM? Date: Thu, 10 Jul 2025 16:49:34 +0200 Message-ID: <caa07d1c-6898-434a-85f6-274b1511ed06@yahoo.com>

On 7/10/2025 11:08 AM, Roland via pve-user wrote:
> if OOM kicks in because half of the RAM is being used for caches/buffers, I would blame the OOM killer or ZFS for that. The problem should be resolved at the ZFS or memory management level.

Absolutely not! You are responsible for giving ZFS its limits, as described in the Proxmox documentation here:
https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage

[-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 10+ messages in thread
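For reference, the limit that wiki page describes comes down to a single module parameter. A minimal sketch, run as root (the 8 GiB cap is an arbitrary example, not a sizing recommendation for this host):

  # current ARC ceiling (c_max) and actual size, in bytes
  grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

  # cap the ARC at 8 GiB immediately, at runtime
  echo $(( 8 * 1024 * 1024 * 1024 )) > /sys/module/zfs/parameters/zfs_arc_max

  # and persistently across reboots
  echo "options zfs zfs_arc_max=$(( 8 * 1024 * 1024 * 1024 ))" >> /etc/modprobe.d/zfs.conf
  update-initramfs -u   # needed when the root filesystem is on ZFS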
* Re: [PVE-User] A less aggressive OOM? 2025-07-10 14:49 ` dorsy via pve-user @ 2025-07-10 16:11 ` Roland via pve-user [not found] ` <98ace9cf-a47f-40cd-8796-6bec3558ebb0@web.de> 2025-07-13 14:28 ` Marco Gaiarin 2 siblings, 0 replies; 10+ messages in thread From: Roland via pve-user @ 2025-07-10 16:11 UTC (permalink / raw) To: Proxmox VE user list; +Cc: Roland [-- Attachment #1: Type: message/rfc822, Size: 10695 bytes --] From: Roland <devzero@web.de> To: Proxmox VE user list <pve-user@lists.proxmox.com> Subject: Re: [PVE-User] A less aggressive OOM? Date: Thu, 10 Jul 2025 18:11:41 +0200 Message-ID: <98ace9cf-a47f-40cd-8796-6bec3558ebb0@web.de>

IMHO, killing processes because the ARC uses too much RAM that can't be reclaimed fast enough is a failure in overall memory coordination.

we can set ZFS limits as a workaround, yes - but ZFS and the OOM killer are to blame !!!

1. ZFS should free up memory faster, as memory is also freed from buffers/caches

2. the OOM killer should put pressure on the ARC, or try to reclaim pages from it first instead of killing kvm processes. maybe the OOM killer could be made ARC-aware!?

roland

On 10.07.25 at 16:49, dorsy via pve-user wrote:
> On 7/10/2025 11:08 AM, Roland via pve-user wrote:
> if OOM kicks in because half of the RAM is being used for caches/buffers, I would blame the OOM killer or ZFS for that. The problem should be resolved at the ZFS or memory management level.
>
> Absolutely not! You are responsible for giving ZFS its limits, as described in the Proxmox documentation here:
> https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage

[-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 10+ messages in thread
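Whatever one thinks of the blame, OpenZFS does expose a knob for the reclaim behaviour complained about here: zfs_arc_shrinker_limit caps how many pages the kernel's shrinker may take from the ARC per invocation (historically defaulting to 10000), which is one reason the ARC yields memory so slowly under sudden pressure. A hedged sketch, assuming an OpenZFS release that exposes the parameter (newer releases have reworked this area, so check your version's documentation):

  # how much may the kernel shrinker reclaim from the ARC per call?
  cat /sys/module/zfs/parameters/zfs_arc_shrinker_limit

  # 0 = no limit: let memory pressure shrink the ARC freely
  echo 0 > /sys/module/zfs/parameters/zfs_arc_shrinker_limit
  echo "options zfs zfs_arc_shrinker_limit=0" >> /etc/modprobe.d/zfs.conf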
* Re: [PVE-User] A less aggressive OOM? [not found] ` <98ace9cf-a47f-40cd-8796-6bec3558ebb0@web.de> @ 2025-07-10 16:15 ` dorsy via pve-user 0 siblings, 0 replies; 10+ messages in thread From: dorsy via pve-user @ 2025-07-10 16:15 UTC (permalink / raw) To: Proxmox VE user list; +Cc: dorsy [-- Attachment #1: Type: message/rfc822, Size: 9458 bytes --] From: dorsy <dorsyka@yahoo.com> To: Proxmox VE user list <pve-user@lists.proxmox.com> Subject: Re: [PVE-User] A less aggressive OOM? Date: Thu, 10 Jul 2025 18:15:55 +0200 Message-ID: <2a7aed75-5870-4e23-994a-6f78bbe9dd85@yahoo.com>

You did overcommit memory by not setting appropriate ZFS limits, so the OOM killer saved your machine from hanging in a memory-constrained situation. It's as simple as that. Read The Fine Manual!

On 7/10/2025 6:11 PM, Roland wrote:
> IMHO, killing processes because the ARC uses too much RAM that can't be reclaimed fast enough is a failure in overall memory coordination.
>
> we can set ZFS limits as a workaround, yes - but ZFS and the OOM killer are to blame !!!
>
> 1. ZFS should free up memory faster, as memory is also freed from buffers/caches
>
> 2. the OOM killer should put pressure on the ARC, or try to reclaim pages from it first instead of killing kvm processes. maybe the OOM killer could be made ARC-aware!?
>
> roland

--
Best regards,
László Dorotovics
system administrator
IKRON Fejlesztő és Szolgáltató Kft.
Registered office: 6721 Szeged, Szilágyi utca 5-1.

[-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 10+ messages in thread
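Limits aside, the original pain point (the DNS VM being picked as the victim) can also be mitigated by biasing the kernel's choice of victim with oom_score_adj. A sketch under stated assumptions: VMID 121 stands in for the critical guest, /run/qemu-server/<vmid>.pid is where qemu-server normally records the KVM pid, and the value does not survive a VM restart, so a PVE hookscript would have to reapply it:

  VMID=121                                  # hypothetical: the critical DNS VM
  PID=$(cat /run/qemu-server/${VMID}.pid)
  # negative values make the kernel prefer other victims;
  # -1000 exempts the process entirely (risky: something else dies instead)
  echo -500 > /proc/${PID}/oom_score_adj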
* Re: [PVE-User] A less aggressive OOM? 2025-07-10 14:49 ` dorsy via pve-user 2025-07-10 16:11 ` Roland via pve-user [not found] ` <98ace9cf-a47f-40cd-8796-6bec3558ebb0@web.de> @ 2025-07-13 14:28 ` Marco Gaiarin 2 siblings, 0 replies; 10+ messages in thread From: Marco Gaiarin @ 2025-07-13 14:28 UTC (permalink / raw) To: dorsy via pve-user; +Cc: pve-user

Hi! dorsy via pve-user, on that day you wrote...

Thanks to all, particularly to Victor for the wonderful analysis, which led me to understand the OOM dump a bit better...

>> if OOM kicks in because half of the RAM is being used for caches/buffers, I would blame the OOM killer or ZFS for that. The problem should be resolved at the ZFS or memory management level.
> Absolutely not! You are responsible for giving ZFS its limits, as described in the Proxmox documentation here:
> https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage

I'm a bit on Roland's side on this. The ARC is a (indeed, complex) buffer/cache, so it seems reasonable that, if I need to sacrifice something, it is better to sacrifice cache than a VM.

Anyway, if I understood correctly, the ZFS default was to size the ARC at 50% of RAM; since PVE 8.1, PVE changes that default to 10% (for new installations); there's also a 'rule of thumb' for sizing the ARC, so 10% is somewhat of a 'starting point'.

On some servers I can easily set up swap (I have a disk for the L2ARC, so I can simply detach it, repartition a bit, and reattach it as L2ARC plus swap). Clearly, I'll set swappiness to 1, so it is used only when strictly needed.

Thanks to all!

-- _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 10+ messages in thread
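Marco's swap plan translates to something like the following; the device name is hypothetical, and per the earlier warning the swap space must live on a plain partition, never on a zvol:

  # hypothetical: /dev/sdb2 freed by shrinking the L2ARC partition
  mkswap /dev/sdb2
  swapon /dev/sdb2
  echo '/dev/sdb2 none swap sw 0 0' >> /etc/fstab

  # swap only under real memory pressure
  echo 'vm.swappiness = 1' > /etc/sysctl.d/99-swappiness.conf
  sysctl --system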
* Re: [PVE-User] A less aggressive OOM? 2025-07-07 9:26 [PVE-User] A less aggressive OOM? Marco Gaiarin 2025-07-07 21:39 ` Victor Rodriguez @ 2025-07-08 12:05 ` Roland via pve-user 1 sibling, 0 replies; 10+ messages in thread From: Roland via pve-user @ 2025-07-08 12:05 UTC (permalink / raw) To: Proxmox VE user list; +Cc: Roland [-- Attachment #1: Type: message/rfc822, Size: 9044 bytes --] From: Roland <devzero@web.de> To: Proxmox VE user list <pve-user@lists.proxmox.com> Subject: Re: [PVE-User] A less aggressive OOM? Date: Tue, 8 Jul 2025 14:05:54 +0200 Message-ID: <6b15b452-0fc6-41ee-a1f7-34cd7943ab38@web.de>

hi,

it's a little bit weird that OOM kicks in with VMs using <32GB of RAM when you have 64GB. Take a closer look at why this happens, i.e. why OOM thinks there is RAM pressure.

roland

On 07.07.25 at 11:26, Marco Gaiarin wrote:
> We have upgraded a set of clusters from PVE6 to PVE8, and we have found that in newer kernels, OOM is a bit more 'aggressive' and sometimes kills a VM.
>
> Nodes have plenty of RAM (64GB, VMs are 2-3, each 8GB RAM), VMs have the qemu agent installed and ballooning enabled, but still OOM sometimes happens. Clearly, if OOM kills the main VM that runs the local DNS, we get some trouble.
>
> I've looked in the PVE wiki, but found nothing. Is there some way to relax OOM, or control its behaviour?
>
> On the nodes there's no swap, so probably the best thing to do (but the hardest one ;-) is to set up some swap with a lower swappiness, but I'm seeking feedback.
>
> Thanks.

[-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ pve-user mailing list pve-user@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user ^ permalink raw reply [flat|nested] 10+ messages in thread
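Roland's "take a closer look" boils down to a few read-only commands that show where the memory went and how fragmented it is, the same data the thread later dug out of the OOM dump:

  # past OOM events and the kernel's accompanying memory report
  journalctl -k | grep -iE 'oom|out of memory'

  # headline numbers: free vs. actually available
  grep -E 'MemTotal|MemFree|MemAvailable' /proc/meminfo

  # per-order free block counts per zone (fragmentation at a glance)
  cat /proc/buddyinfo

  # how much of the "missing" memory is ZFS ARC
  grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

None of this changes the kernel's decisions by itself, but it usually answers "why did OOM think there was pressure" before the next kill happens.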