From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Thu, 10 Nov 2022 15:37:39 +0100
Message-Id: <20221110143800.98047-1-f.ebner@proxmox.com>
Subject: [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager
Right now, the online node usage calculation for the HA manager only considers the number of active services on each node. This patch series allows switching to a 'static' scheduler mode, where static usage information from the nodes and guest configurations is used instead.

This also includes the remaining cgroup/cpuunits-related patches, because the broadcast static node information also includes the node's cgroup mode.

With this version, the effect is limited to choosing nodes during recovery, but the plan is to extend this. As a next step, it would be nice to also use it for startup, but AFAICT the issue is that node selection only happens after the state has already been set to 'started', and I think select_service_node() doesn't currently know whether a service has been newly started. I haven't looked into it in much detail, though.

One idea for turning this into a balancer:

1. (Optionally) sort all services by badness (needs a new backend function).
2. Iterate over the services, scoring the nodes for each one and adding its usage to the chosen node after each iteration. The current node can be kept if its score doesn't differ too much from the best node's score.
3. Record the chosen nodes and migrate the services accordingly.

Unit tests for ha-manager itself are also still missing.

Almost all of the series is preparatory infrastructure, but the hope is that much of it can be reused for balancers and dynamic scheduling in the future.

The proxmox-resource-scheduling Rust crate implements the TOPSIS algorithm first suggested by Alexandre. It also models the static node and service usages in PVE and allows scoring the nodes on which a new or recovered service could be started. This is done by simulating the start on each node and comparing the alternatives, using the average and highest CPU and memory usage as criteria.
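As an illustration of the TOPSIS scoring described above, here is a minimal sketch under a simplified usage model: each node has a CPU count and memory size plus the static CPU and memory already assigned to it, and a service contributes its configured CPU and memory. The names (NodeUsage, ServiceUsage, score_nodes) and the weights are hypothetical placeholders, not the API or values of proxmox-resource-scheduling. All four criteria are treated as costs, so lower usage is better.

```rust
/// Hypothetical static usage of one node (not the crate's real type).
#[derive(Clone, Debug)]
pub struct NodeUsage {
    pub name: String,
    pub maxcpu: f64, // number of CPUs
    pub cpu: f64,    // CPUs already assigned to services on this node
    pub maxmem: f64, // total memory
    pub mem: f64,    // memory already assigned (same unit as maxmem)
}

/// Hypothetical static requirements of the service to place.
#[derive(Clone, Copy, Debug)]
pub struct ServiceUsage {
    pub maxcpu: f64,
    pub maxmem: f64,
}

/// Placeholder weights for: average CPU, highest CPU, average memory,
/// highest memory. Memory is weighted more heavily than CPU.
const WEIGHTS: [f64; 4] = [1.0, 2.0, 5.0, 10.0];

/// Cluster-wide criteria if `service` were started on node `target`.
fn criteria_if_started_on(nodes: &[NodeUsage], target: usize, service: ServiceUsage) -> [f64; 4] {
    let mut cpu_sum = 0.0_f64;
    let mut cpu_max = 0.0_f64;
    let mut mem_sum = 0.0_f64;
    let mut mem_max = 0.0_f64;
    for (i, node) in nodes.iter().enumerate() {
        // Only the simulated target node gets the service's usage added.
        let (add_cpu, add_mem) = if i == target { (service.maxcpu, service.maxmem) } else { (0.0, 0.0) };
        let cpu = (node.cpu + add_cpu) / node.maxcpu;
        let mem = (node.mem + add_mem) / node.maxmem;
        cpu_sum += cpu;
        mem_sum += mem;
        cpu_max = cpu_max.max(cpu);
        mem_max = mem_max.max(mem);
    }
    let n = nodes.len() as f64;
    [cpu_sum / n, cpu_max, mem_sum / n, mem_max]
}

/// Score every node with TOPSIS; a higher score means a better placement.
pub fn score_nodes(nodes: &[NodeUsage], service: ServiceUsage) -> Vec<(String, f64)> {
    // Decision matrix: one row (alternative) per candidate node.
    let matrix: Vec<[f64; 4]> = (0..nodes.len())
        .map(|i| criteria_if_started_on(nodes, i, service))
        .collect();

    // Vector-normalize each criterion column, then apply normalized weights.
    let weight_sum: f64 = WEIGHTS.iter().sum();
    let mut weighted = matrix.clone();
    for j in 0..4 {
        let norm = matrix.iter().map(|row| row[j] * row[j]).sum::<f64>().sqrt();
        for row in weighted.iter_mut() {
            let normalized = if norm > 0.0 { row[j] / norm } else { 0.0 };
            row[j] = normalized * WEIGHTS[j] / weight_sum;
        }
    }

    // For cost criteria the ideal best is the column minimum, the worst the maximum.
    let mut best = [f64::INFINITY; 4];
    let mut worst = [f64::NEG_INFINITY; 4];
    for row in &weighted {
        for j in 0..4 {
            best[j] = best[j].min(row[j]);
            worst[j] = worst[j].max(row[j]);
        }
    }

    // Euclidean distance between an alternative and an ideal point.
    let dist = |row: &[f64; 4], ideal: &[f64; 4]| -> f64 {
        row.iter().zip(ideal).map(|(a, b)| (a - b).powi(2)).sum::<f64>().sqrt()
    };

    nodes
        .iter()
        .zip(&weighted)
        .map(|(node, row)| {
            let d_best = dist(row, &best);
            let d_worst = dist(row, &worst);
            let denom = d_best + d_worst;
            // If all alternatives are identical, no node is preferred.
            let score = if denom > 0.0 { d_worst / denom } else { 0.0 };
            (node.name.clone(), score)
        })
        .collect()
}
```

For example, with two equally sized nodes where one already has most of its memory assigned, starting the service on the other node yields a lower highest-memory value, so that node receives the higher score.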
Memory is weighted much more heavily than CPU, as it is the more limited resource. I did not (yet) implement the criteria weighting process from AHP (also suggested by Alexandre), which computes averaged weights and a bias score from a table of pairwise weights between criteria. The downside is that one needs to guess n(n-1)/2 weights instead of n; the upside is that each weight only has to be judged pairwise rather than relative to all the others. But this can still be done in the future if we want.

In proxmox-perl-rs, a class is provided for interfacing from Perl.

In pve-manager, the static node information is broadcast whenever it is outdated. The unrelated (but touching the same code) cgroup/cpuunits patches are also included there.

In pve-cluster, a new crs (cluster-resource-scheduler) option is added, initially with a mode setting for HA.

In pve-ha-manager, the online node usage calculation is factored out into a 'Usage' plugin system, so that the new static mode can be added without much clutter. If not all nodes provide static service information, we fall back to the 'basic' mode. If only the scoring fails (which /should/ be rather unlikely), there is currently no real fallback implemented (the '|| $a cmp $b' in select_service_node() destroys the random hash-key order again ;)). We could change it to stay random or, better, also track the service count in Usage::Static and use that.

Dependency bumps needed:
  proxmox-perl-rs depends on proxmox-resource-scheduling
  pve-ha-manager (build-)depends on proxmox-perl-rs

Of course, the new feature is only usable with updated pve-manager and pve-cluster, but there is no hard dependency.
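To illustrate the pve-ha-manager fallback behaviour described above, here is a minimal sketch: the static mode is only used when it is configured and every node has reported static information, and if scoring fails, candidates are ordered by their service count (the idea suggested for Usage::Static) with a deterministic name tie-break. Everything here (Mode, effective_mode, order_nodes) is hypothetical and written in Rust for illustration; the actual logic lives in the Perl Usage plugins and select_service_node().

```rust
use std::cmp::Ordering;
use std::collections::{HashMap, HashSet};

/// Hypothetical scheduler modes mirroring 'basic' and 'static'.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Basic,
    Static,
}

/// Use 'static' only if it is configured and every node reported static
/// information; otherwise fall back to 'basic'.
pub fn effective_mode(configured: Mode, nodes: &[&str], reported: &HashSet<&str>) -> Mode {
    if configured == Mode::Static && nodes.iter().all(|n| reported.contains(n)) {
        Mode::Static
    } else {
        Mode::Basic
    }
}

/// Order candidate nodes, best first.
/// `candidates`: (node name, number of services currently on it).
/// `scores`: node scores if scoring succeeded, `None` if it failed.
pub fn order_nodes(candidates: &[(&str, usize)], scores: Option<&HashMap<&str, f64>>) -> Vec<String> {
    let mut sorted = candidates.to_vec();
    sorted.sort_by(|a, b| {
        let primary = match scores {
            // Higher score first; nodes without a score sort last.
            Some(s) => {
                let sa = s.get(a.0).copied().unwrap_or(f64::NEG_INFINITY);
                let sb = s.get(b.0).copied().unwrap_or(f64::NEG_INFINITY);
                sb.partial_cmp(&sa).unwrap_or(Ordering::Equal)
            }
            // Scoring failed: fall back to fewer services first.
            None => a.1.cmp(&b.1),
        };
        // Deterministic tie-break by node name.
        primary.then_with(|| a.0.cmp(b.0))
    });
    sorted.into_iter().map(|(name, _)| name.to_string()).collect()
}
```

Falling back to the service count keeps the choice load-aware even without scores, whereas a pure name tie-break would always favour the same node.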
proxmox-resource-scheduling:

Fiona Ebner (3):
  initial commit
  add pve_static module
  add Debian packaging

proxmox-perl-rs:

Fiona Ebner (2):
  pve-rs: add resource scheduling module
  add basic test for resource scheduling

 Makefile                                 |   1 +
 pve-rs/Cargo.toml                        |   1 +
 pve-rs/src/lib.rs                        |   1 +
 pve-rs/src/resource_scheduling/mod.rs    |   1 +
 pve-rs/src/resource_scheduling/static.rs | 116 +++++++++++++++++++++++
 pve-rs/test/Makefile                     |   4 +
 pve-rs/test/README                       |   2 +
 pve-rs/test/resource_scheduling.pl       |  70 ++++++++++++++
 8 files changed, 196 insertions(+)
 create mode 100644 pve-rs/src/resource_scheduling/mod.rs
 create mode 100644 pve-rs/src/resource_scheduling/static.rs
 create mode 100644 pve-rs/test/Makefile
 create mode 100644 pve-rs/test/README
 create mode 100755 pve-rs/test/resource_scheduling.pl

pve-manager:

Fiona Ebner (3):
  pvestatd: broadcast static node information
  cluster resources: add cgroup-mode to node properties
  ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version

 PVE/API2/Cluster.pm                | 13 +++++++++++++
 PVE/Service/pvestatd.pm            | 25 ++++++++++++++++++++++++
 www/manager6/lxc/CreateWizard.js   |  8 ++++++++
 www/manager6/lxc/ResourceEdit.js   | 31 +++++++++++++++++++++++++-----
 www/manager6/lxc/Resources.js      |  8 +++++++-
 www/manager6/qemu/CreateWizard.js  |  8 ++++++++
 www/manager6/qemu/HardwareView.js  |  8 +++++++-
 www/manager6/qemu/ProcessorEdit.js | 31 +++++++++++++++++++++++-------
 8 files changed, 118 insertions(+), 14 deletions(-)

pve-cluster:

Fiona Ebner (1):
  datacenter config: add cluster resource scheduling (crs) options

 data/PVE/DataCenterConfig.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

pve-ha-manager:

Fiona Ebner (11):
  env: add get_static_node_stats() method
  resources: add get_static_stats() method
  add Usage base plugin and Usage::Basic plugin
  manager: select service node: add $sid to parameters
  manager: online node usage: switch to Usage::Basic plugin
  usage: add Usage::Static plugin
  env: add get_crs_settings() method
  manager: set resource scheduler mode upon init
  manager: use static resource scheduler when configured
  manager: avoid scoring nodes if maintenance fallback node is valid
  manager: avoid scoring nodes when not trying next and current node is valid

 debian/pve-ha-manager.install |   3 +
 src/PVE/HA/Env.pm             |  13 ++++
 src/PVE/HA/Env/PVE2.pm        |  29 +++++++++
 src/PVE/HA/Makefile           |   3 +-
 src/PVE/HA/Manager.pm         |  77 ++++++++++++++---------
 src/PVE/HA/Resources.pm       |   5 ++
 src/PVE/HA/Resources/PVECT.pm |  11 ++++
 src/PVE/HA/Resources/PVEVM.pm |  14 +++++
 src/PVE/HA/Sim/Env.pm         |   9 +++
 src/PVE/HA/Sim/TestEnv.pm     |   6 ++
 src/PVE/HA/Usage.pm           |  50 +++++++++++++++
 src/PVE/HA/Usage/Basic.pm     |  52 ++++++++++++++++
 src/PVE/HA/Usage/Makefile     |   6 ++
 src/PVE/HA/Usage/Static.pm    | 114 ++++++++++++++++++++++++++++++++++
 src/test/test_failover1.pl    |  21 ++++---
 15 files changed, 374 insertions(+), 39 deletions(-)
 create mode 100644 src/PVE/HA/Usage.pm
 create mode 100644 src/PVE/HA/Usage/Basic.pm
 create mode 100644 src/PVE/HA/Usage/Makefile
 create mode 100644 src/PVE/HA/Usage/Static.pm

pve-docs:

Fiona Ebner (1):
  ha: add section about scheduler modes

 ha-manager.adoc | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

-- 
2.30.2