Message-ID: <8d61d388-ce16-41b7-a655-123c1ac45d87@proxmox.com>
Date: Mon, 13 Dec 2021 10:02:36 +0100
To: Proxmox VE development discussion, Alexandre Derumier
References: <20211213074316.2565139-1-aderumier@odiso.com>
From: Thomas Lamprecht
In-Reply-To: <20211213074316.2565139-1-aderumier@odiso.com>
Subject: Re: [pve-devel] [PATCH
 pve-ha-manager 0/3] POC/RFC: ressource aware HA manager

Hi,

On 13.12.21 08:43, Alexandre Derumier wrote:
> Hi,
>
> this is a proof of concept to implement resource-aware HA.

Nice! I'll try to give it a quick look now so that I do not stall you too
much on this long-wished-for feature.

> The current implementation is really basic,
> simply balancing the number of services on each node.
>
> I had some real production cases where a node fails and the restarted VMs
> impact other nodes because of too much CPU/RAM usage.
>
> This new implementation uses best-fit heuristic vector packing with constraint support.
>
>
> - We compute node memory/CPU and VM memory/CPU average stats over the last 20 min.
>
> For each resource:
> - First, we order the services in pending recovery state by memory, then by CPU usage.
>   Memory is more important here, because a VM can't start if the target node doesn't have enough memory.

Agreed.

> - Then, we check the constraints of the possible target nodes (storage available, node has enough CPU/RAM, node has enough cores, ...).
>   (This could be extended with other constraints like VM affinity/anti-affinity, CPU compatibility, ...)
>
> - Then we compute a node weight as the Euclidean distance between the VM usage vector and the node's available-resources vector (both CPU/RAM).
>   Then we choose the first node with the lowest Euclidean distance weight.
>   (Ex: if the VM uses 1 GB RAM/1% CPU, node1 has 2 GB RAM/2% CPU, and node2 has 4 GB RAM/4% CPU, node1 will be chosen because it is the nearest to the VM usage.)

Sounds like an OK approach to me, I had something relatively similar in mind.

>
> - We add the recovered VM's CPU/RAM to the target node stats. (This is only a best-effort estimation, as the VM start is async on the target LRM and could fail, ...)
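
To make the selection steps quoted above concrete, here is a minimal sketch. It is written in Python purely for illustration (the patch itself touches Perl modules such as src/PVE/HA/Manager.pm, and its code is not shown in this mail). The function names `select_node` and `recover`, the dict layout, and the tie-break by node name are all assumptions of this sketch, not the patch's API. Memory and CPU are left in the raw units of the example (GB and percent); a real implementation would probably want to normalize them first.

```python
import math

def select_node(service, nodes):
    """Return the name of the best-fit node for `service`, or None if none fits.

    `service` and each value in `nodes` are dicts with "mem" and "cpu" keys;
    for nodes these are the *available* resources (hypothetical layout).
    """
    best_name, best_dist = None, math.inf
    for name in sorted(nodes):  # sorted only for a deterministic tie-break
        free = nodes[name]
        # Constraint check: the node must have enough free memory and CPU.
        if free["mem"] < service["mem"] or free["cpu"] < service["cpu"]:
            continue
        # Euclidean distance between free resources and service usage.
        dist = math.hypot(free["mem"] - service["mem"],
                          free["cpu"] - service["cpu"])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

def recover(services, nodes):
    """Place services, highest memory first, then highest CPU.

    Mutates `nodes` so that each placement reduces the target's free
    resources (the best-effort estimation mentioned in the quoted text).
    """
    order = sorted(services,
                   key=lambda s: (-services[s]["mem"], -services[s]["cpu"], s))
    placement = {}
    for sid in order:
        svc = services[sid]
        target = select_node(svc, nodes)
        placement[sid] = target
        if target is not None:
            nodes[target]["mem"] -= svc["mem"]
            nodes[target]["cpu"] -= svc["cpu"]
    return placement
```

With the numbers from the quoted example, `select_node({"mem": 1, "cpu": 1}, {"node1": {"mem": 2, "cpu": 2}, "node2": {"mem": 4, "cpu": 4}})` picks node1, since its free resources are closest to what the VM uses.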
>
>
> I have kept the HA group node priority and the other ordering,
> so this doesn't break the current tests

That is great, the regression tests from HA are among the best we have to test
and simulate behavior, so keeping those unchanged can give quite a bit of
confidence in any implementation. Albeit with your change it's mostly because
it's sidestepping the balancer, as no usage is there? IMO it would be good to
have most tests such that they can get affected by the balancer, at least if
we make it opt-out.

> and we can easily add an option at the datacenter level to enable/disable it

As a starter we could also only do the compute-node-by-resource-usage on
recovery and first start transition, as especially for the latter it's quite
important to get the service recovered to a node with a low(er) load to avoid
a domino effect. Doing re-computation for started VMs would then be easy to
add once we're sure the algorithm works out.

But yeah, for some admins it would surely be welcome to make it configurable,
like:

[ ] move to lowest used node on start and recovery of service
[ ] auto-balance started services periodically

>
> It could be easy to implement later some kind of VM auto-migration when a node uses too much CPU/RAM,
> reusing the same node selection algorithm.
>
> I have added a basic test, I'll add more tests later if this patch series is OK for you.

I'd add commands to sim_hardware_cmd for simulating CPU/memory increase, it's
nicer to have that controllable by the cmd list.

For the test system it could also be interesting if we can annotate the
services with some basic resource usage, e.g. memory and core count, and
possibly also some low (0.33), mid (0.66) and high (1.0) load factor (that is
controllable by command); that could help to simulate reality while keeping it
somewhat simple.

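As a rough illustration of the annotation idea above, a sketch in Python (illustration only, not the Perl of the simulator). The `service-load <sid> <level>` command syntax, the function names, and the dict layout are hypothetical; nothing like them exists in sim_hardware_cmd today. The sketch also assumes that the load factor scales only CPU usage while memory stays at its annotated value, which is just one possible reading.

```python
# Load factors as suggested in the text: low, mid and high.
LOAD_FACTORS = {"low": 0.33, "mid": 0.66, "high": 1.0}

def simulated_usage(service):
    """Derive simulated usage from a service annotated with mem and cores.

    Assumption of this sketch: the load factor applies to CPU only,
    and a service without an explicit load level counts as "low".
    """
    factor = LOAD_FACTORS[service.get("load", "low")]
    return {"mem": service["mem"], "cpu": service["cores"] * factor}

def apply_load_cmd(services, cmd):
    """Handle a hypothetical 'service-load <sid> <level>' cmdlist entry."""
    name, sid, level = cmd.split()
    if name != "service-load" or level not in LOAD_FACTORS:
        raise ValueError(f"unknown command: {cmd}")
    services[sid]["load"] = level
```

A test could then start with `{"vm:101": {"mem": 2048, "cores": 4}}`, issue `service-load vm:101 high` from the cmd list, and expect the balancer to see the service's CPU usage rise from 1.32 to 4.0 cores.
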
> Some good literature about heuristics:
>
> Microsoft Hyper-V implementation:
>  - http://kunaltalwar.org/papers/VBPacking.pdf
>  - https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/virtualization.pdf
> Variable size vector bin packing heuristics:
>  - https://hal.archives-ouvertes.fr/hal-00868016v2/document
>
>
> Alexandre Derumier (3):
>   add ressource awareness manager
>   tests: add support for ressources
>   add test-basic0
>
>  src/PVE/HA/Env.pm                    |  24 +++
>  src/PVE/HA/Env/PVE2.pm               |  90 ++++++++++
>  src/PVE/HA/Manager.pm                | 246 ++++++++++++++++++++++++++-
>  src/PVE/HA/Sim/Hardware.pm           |  61 +++++++
>  src/PVE/HA/Sim/TestEnv.pm            |  36 ++++
>  src/test/test-basic0/README          |   1 +
>  src/test/test-basic0/cmdlist         |   4 +
>  src/test/test-basic0/hardware_status |   5 +
>  src/test/test-basic0/log.expect      |  52 ++++++
>  src/test/test-basic0/manager_status  |   1 +
>  src/test/test-basic0/node_stats      |   5 +
>  src/test/test-basic0/service_config  |   5 +
>  src/test/test-basic0/service_stats   |   5 +
>  13 files changed, 528 insertions(+), 7 deletions(-)
>  create mode 100644 src/test/test-basic0/README
>  create mode 100644 src/test/test-basic0/cmdlist
>  create mode 100644 src/test/test-basic0/hardware_status
>  create mode 100644 src/test/test-basic0/log.expect
>  create mode 100644 src/test/test-basic0/manager_status
>  create mode 100644 src/test/test-basic0/node_stats
>  create mode 100644 src/test/test-basic0/service_config
>  create mode 100644 src/test/test-basic0/service_stats
>