From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Mar 2026 10:35:34 +0100
From: "Daniel Kral"
To: "DERUMIER, Alexandre" , "pve-devel@lists.proxmox.com"
Subject: Re: [RFC PATCH-SERIES many 00/36] dynamic scheduler + load rebalancer
References: <20260217141437.584852-1-d.kral@proxmox.com>
 <5a2c063e18f8b052e22af94ecd5db74748082019.camel@groupe-cyllene.com>
In-Reply-To: <5a2c063e18f8b052e22af94ecd5db74748082019.camel@groupe-cyllene.com>
List-Id: Proxmox VE development discussion

On Thu Mar 12, 2026 at 5:24 PM CET, Alexandre DERUMIER wrote:
> Hi,
>
> thanks for working on this!
>
> I see another possible re-balancing case: when the pressure of a VM is
> too high (mostly on CPU), e.g. 5-10% of pressure (should be
> configurable).
>
> When a host has a lot of cores, balancing based on the host CPU average
> does not always work well, because you can have some cores at 100% and
> others with low usage (you can have a global host CPU average of
> 60-70%, for example, but some VMs with CPU pressure).
>
> It could be great to have some kind of trigger to migrate VMs with too
> much pressure (reusing TOPSIS, for example).

Hi Alexandre,

thanks for the input! Using pressure stall information to reduce
resource contention and improve resource-usage efficiency is indeed a
goal! There are some nuances to properly integrating it into the load
balancing decisions though. I want to give some notes on the current
design decisions, which I'll include in the documentation to make them
more accessible.

The current implementation focuses on the stabilizing part of the load
balancing system. That is, the system reaches an equilibrium as the
general resource usage between the cluster nodes is evenly balanced.
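As a purely illustrative toy sketch of what "reaching an equilibrium"
means here (names, the single scalar usage metric and the selection
heuristic are my own assumptions for illustration, not the series'
actual implementation):

```python
# Toy sketch: move one guest from the most to the least loaded node,
# but only if that reduces the usage spread across the cluster.
# A real balancer additionally needs hysteresis so that guests are
# not moved back and forth all the time.

def usage_spread(usages):
    """Spread between the most and least loaded node (0 = balanced)."""
    return max(usages.values()) - min(usages.values())

def suggest_migration(node_usage, guest_usage):
    """Suggest (guest, source, target) if a single migration brings the
    cluster closer to equilibrium, else None."""
    src = max(node_usage, key=node_usage.get)
    dst = min(node_usage, key=node_usage.get)
    best, best_spread = None, usage_spread(node_usage)
    for guest, load in guest_usage.get(src, {}).items():
        after = dict(node_usage)
        after[src] -= load
        after[dst] += load
        spread = usage_spread(after)
        if spread < best_spread:
            best, best_spread = guest, spread
    return (best, src, dst) if best else None

nodes = {"node1": 0.80, "node2": 0.30, "node3": 0.55}
guests = {"node1": {"vm100": 0.20, "vm101": 0.40}}
print(suggest_migration(nodes, guests))  # ('vm100', 'node1', 'node2')
```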
This reduces the likelihood of individual nodes reaching 100% CPU
and/or memory usage, which would certainly lock up the guests on those
nodes. We cannot prevent nodes from reaching 100% entirely, of course,
as that would mean the cluster itself is using more resources than it
can handle and its hardware should be expanded.

The important part is that the CPU and memory usage of guests is
usually more reproducible across different nodes. That is, we can more
or less assume that the absolute usage will be the same on another node
(not currently taking things like KSM into account). This is not the
case with pressure stall information, as it depends on the host and the
processes running there. We cannot predict that the pressure on one
node or for one guest can be reduced by moving it to another node.

The rough idea might be that either the HA Manager (through the PSI
information broadcast over the pmxcfs) or the LRMs (by locally polling
the PSI files themselves and signaling the information to the HA
Manager) could give more clues about where guests should go. This needs
more care and thought though, as it's important that the system
stabilizes and doesn't move guests around all the time.

Hope this gives some more insight into why this design was chosen. It's
important to have the core system ready in such a shape that it can be
improved on later ;). I'd be happy about tests, design critique and
reviews, of course!

> Another improvement could be to filter candidate target nodes based on
> resource availability (if the target node has fewer cores than the VM,
> or doesn't have the storage or the network, for example).

These should certainly be easier-to-implement cases too. I want to
include the option to include/exclude guests in the load balancing
scheme in one of the next revisions or a follow-up in general. For now
the series implements load balancing for the HA stack only, so the
assumption is that the HA resources have all necessary resources
available on the nodes.
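As an aside on the PSI polling mentioned above, a rough hypothetical
sketch of how the kernel's pressure files could be read and used as a
trigger (the parsing, path and the 5% threshold are illustrative
assumptions from this discussion, not PVE code):

```python
# Hypothetical sketch: parse /proc/pressure/{cpu,memory,io} content
# and flag sustained CPU pressure above a configurable threshold,
# similar to the trigger Alexandre proposed.

def parse_psi(text):
    """Parse PSI file content, e.g.
    'some avg10=7.52 avg60=3.10 avg300=1.00 total=123456',
    into {'some': {'avg10': 7.52, ...}, 'full': {...}}."""
    psi = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()
        psi[kind] = {k: float(v) for k, v in (p.split("=") for p in pairs)}
    return psi

def cpu_under_pressure(psi, threshold=5.0):
    """True if the 10-second CPU pressure average exceeds the
    (configurable) threshold in percent."""
    return psi["some"]["avg10"] > threshold

# In a real LRM this would come from open("/proc/pressure/cpu").read():
sample = (
    "some avg10=7.52 avg60=3.10 avg300=1.00 total=123456\n"
    "full avg10=0.80 avg60=0.20 avg300=0.05 total=2345\n"
)
print(cpu_under_pressure(parse_psi(sample)))  # True for this sample
```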
If there are nodes where this is not the case, these should be excluded
with HA node affinity rules. The goal is to expand this to the whole
cluster in the end, of course, but that needs some more adaptations and
should certainly be handled in another patch series.

Daniel