From: Daniel Kral
To: pve-devel@lists.proxmox.com
Date: Fri, 14 Nov 2025 11:06:12 +0100
Message-ID: <20251114100641.92919-1-d.kral@proxmox.com>
Subject: [pve-devel] [PATCH-SERIES ha-manager/perl-rs/proxmox v4 00/12] Granular online_node_usage accounting

v3: https://lore.proxmox.com/pve-devel/20251027164513.542678-1-d.kral@proxmox.com/
v2: https://lore.proxmox.com/pve-devel/20251020164540.517231-1-d.kral@proxmox.com/
v1: https://lore.proxmox.com/pve-devel/20250930142021.366529-1-d.kral@proxmox.com/

Changes since v3:
- rebase on master:
  - proxmox.git: 54299e63 (pve-api-types: bump version to 8.1.0-1, 2025-11-13)
  - proxmox-perl-rs.git: 9f59fe9 (pve-rs: bump version to 0.11.1, 2025-11-14)
  - pve-ha-manager.git: a9a99a9b (gitignore: ignore build directory, 2025-10-14)
- add proxmox.git patch to remove deprecated .cargo/config
- avoid auto-vivification in get_resource_affinity(...)
- do not change the signature of score_nodes_to_start_service; instead, make
  it generic so that it accepts nodes as &[T] where T: AsRef

Changes since v2:
- remove already applied qemu-server patch
- fix oversight in static service stats cache patch where get_vmlist() was
  misused

Changes since v1:
- rebased all patches on master
- improve interface for static cache (thanks @Fiona!)
- improve get_used_service_nodes() signature (thanks @Fiona!)
- move get_used_service_nodes() to PVE::HA::Usage (thanks @Fiona!)
- make service_nodes HashMap Value a HashSet (thanks @Fiona!)
- move static cache patch to the end of the series (its improvement shows
  best there)
- make add_service_usage() helper part of $online_node_usage
- various other style nits (thanks @Fiona!)
- dropped ha-manager patch #9

This is a follow-up on making the online_node_usage accounting more granular.

= Patches =

Build-dependency and dependency bump for pve-ha-manager needed!
Build-dependency bump for pve-rs needed!
Versioned breaks for pve-ha-manager in pve-rs needed!
See pve-rs #1 and ha-manager #6 for more information.

proxmox patch #1        only cleanup
proxmox patch #2        necessary for pve-rs patch
pve-rs patch #1         allow removing service usage
pve-rs patch #2         small refactor for test cases
ha-manager patch #1-#3  remove redundant $online_node_usage updates
ha-manager patch #4-#5  some decoupling and refactoring
ha-manager patch #6-#7  set up $online_node_usage only once per round and make
                        changes granular in between
ha-manager patch #8     implement static cache and use
                        PVE::Cluster::get_guest_config_properties(...)
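
The granular accounting idea above (pve-rs patch #1 "allow removing service usage" and ha-manager patches #6-#7) can be pictured with a small Rust sketch. It is not the pve-rs binding: the types `NodeUsage`, `ServiceUsage` and `OnlineNodeUsage` and the method `remove_service_usage` are hypothetical, and `add_service_usage_to_node` only borrows its name from the patch titles. Each service remembers the set of nodes it is counted on (mirroring the service_nodes HashMap with HashSet values from the v1 changes), so a single service's contribution can be subtracted again without recomputing every node's usage.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical static usage of a node: the sum of all service demands counted on it.
#[derive(Debug, Default, Clone, PartialEq)]
struct NodeUsage {
    cpu: f64,
    memory: u64,
}

/// Hypothetical static demands of a single HA service.
#[derive(Debug, Clone, Copy)]
struct ServiceUsage {
    cpu: f64,
    memory: u64,
}

/// Hypothetical granular accounting: usage is updated per service instead of
/// being recomputed from scratch after every change.
#[derive(Default)]
struct OnlineNodeUsage {
    nodes: HashMap<String, NodeUsage>,
    /// Service id -> nodes the service is currently counted on
    /// (e.g. both source and target while it migrates).
    service_nodes: HashMap<String, HashSet<String>>,
    /// Demands recorded when a service was added (assumed identical on every
    /// node), so that removal subtracts exactly the same amount.
    service_usage: HashMap<String, ServiceUsage>,
}

impl OnlineNodeUsage {
    fn add_node(&mut self, node: &str) {
        self.nodes.entry(node.to_string()).or_default();
    }

    fn add_service_usage_to_node(
        &mut self,
        node: &str,
        sid: &str,
        usage: ServiceUsage,
    ) -> Result<(), String> {
        let node_usage = self
            .nodes
            .get_mut(node)
            .ok_or_else(|| format!("node '{node}' is not online"))?;
        // The HashSet makes adding the same service to the same node idempotent.
        if !self
            .service_nodes
            .entry(sid.to_string())
            .or_default()
            .insert(node.to_string())
        {
            return Ok(());
        }
        node_usage.cpu += usage.cpu;
        node_usage.memory += usage.memory;
        self.service_usage.insert(sid.to_string(), usage);
        Ok(())
    }

    /// Subtract a service's demands from every node it is counted on.
    fn remove_service_usage(&mut self, sid: &str) {
        let Some(usage) = self.service_usage.remove(sid) else {
            return;
        };
        for node in self.service_nodes.remove(sid).unwrap_or_default() {
            if let Some(node_usage) = self.nodes.get_mut(&node) {
                node_usage.cpu -= usage.cpu;
                node_usage.memory = node_usage.memory.saturating_sub(usage.memory);
            }
        }
    }
}

fn main() {
    let mut usage = OnlineNodeUsage::default();
    for node in ["node1", "node2", "node3"] {
        usage.add_node(node);
    }

    let vm100 = ServiceUsage { cpu: 2.0, memory: 4096 };
    let vm101 = ServiceUsage { cpu: 1.0, memory: 1024 };

    usage.add_service_usage_to_node("node1", "vm:100", vm100).unwrap();
    // vm:101 is migrating, so it counts against both source and target node.
    usage.add_service_usage_to_node("node1", "vm:101", vm101).unwrap();
    usage.add_service_usage_to_node("node2", "vm:101", vm101).unwrap();

    // When vm:101 changes state, only its own contribution is removed.
    usage.remove_service_usage("vm:101");

    println!("{:?}", usage.nodes["node1"]); // NodeUsage { cpu: 2.0, memory: 4096 }
    println!("{:?}", usage.nodes["node2"]); // NodeUsage { cpu: 0.0, memory: 0 }
}
```

The design choice illustrated here is that a state change costs work proportional to the nodes one service touches, rather than a full pass over all services as a per-round recomputation would.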
= Benchmarks =

Here are some benchmarks with a 3-node cluster, the static load scheduler, and
rebalance_on_request_start set, in a virtualized environment, where all HA
resources are added and started at once in the first manage(...) call. The
columns show the HA resource count and the rows show which patches were applied
(qm #1 = qemu-server patch #1).

Run-times for the first manage(...) call to rebalance HA resources:

                 300     3,000    10,002
master        19.9 s         -         -
#7            909 ms    10.0 s    33.5 s
#8            390 ms    3.83 s    13.4 s
#8 + qm #1    219 ms    1.92 s    7.11 s

The following small breakdown of the #8 + qm #1 benchmark with 10,002 HA
resources lists the top 10 most (exclusively) time-consuming functions. It
shows that:

- the patches from the HA rules follow-up [0] should improve the time for
  get_node_affinity (and therefore select_service_node), and
- there's definitely room to improve the call to get_current_memory(...) in
  get_derived_property(...) in qemu-server, which cascades to
  $change_service_state (add_service_usage / add_service_usage_to_node) and
  calls parse_property_string and check_prop.

This should still give a nice foundation for the upcoming dynamic load
information + load balancing series. Also, it should be pretty rare for the HA
Manager to handle 10,002 HA resource state changes, and it's still well within
the time limit.

+-------------------------------------------+------------+------------+
| Function                                  | Excl. time | Incl. time |
+-------------------------------------------+------------+------------+
| Sys::Syslog::syslog                       |     1.12 s |     2.45 s |
| Sys::Syslog::xlate                        |     404 ms |     533 ms |
| PVE::(...)NodeAffinity::get_node_affinity |     307 ms |     667 ms |
| PVE::HA::Manager::select_service_node     |     276 ms |     2.35 s |
| POSIX::strftime (xsub)                    |     242 ms |     242 ms |
| PVE::JSONSchema::parse_property_string    |     241 ms |     782 ms |
| PVE::JSONSchema::check_prop               |     214 ms |     488 ms |
| Sys::Syslog::CORE:syswrite (opcode)       |     201 ms |     201 ms |
| PVE::HA::Manager::$change_service_state   |     198 ms |     2.43 s |
| PVE::HA::Manager::manage                  |     189 ms |     7.11 s |
+-------------------------------------------+------------+------------+

[0] https://lore.proxmox.com/pve-devel/20250909083539.39675-1-d.kral@proxmox.com/

proxmox:

Daniel Kral (2):
  resource-scheduling: use workspace's .cargo/config.toml
  resource-scheduling: allow owned nodes slice for score_nodes_to_start_service

 proxmox-resource-scheduling/.cargo/config     |  5 -----
 proxmox-resource-scheduling/src/pve_static.rs | 13 ++++++++++---
 2 files changed, 10 insertions(+), 8 deletions(-)
 delete mode 100644 proxmox-resource-scheduling/.cargo/config


perl-rs:

Daniel Kral (2):
  pve-rs: resource_scheduling: allow granular usage changes
  test: resource_scheduling: use score_nodes helper to imitate HA Manager

 .../bindings/resource_scheduling_static.rs | 108 +++++++++++++++---
 pve-rs/test/resource_scheduling.pl         | 106 ++++++++++++-----
 2 files changed, 170 insertions(+), 44 deletions(-)


ha-manager:

Daniel Kral (8):
  manager: remove redundant recompute_online_node_usage from next_state_recovery
  manager: remove redundant add_service_usage_to_node from next_state_recovery
  manager: remove redundant add_service_usage_to_node from next_state_started
  rules: resource affinity: decouple get_resource_affinity helper from Usage class
  manager: make recompute_online_node_usage use add_service_usage helper
  usage: allow granular changes to Usage implementations
  manager: make online node usage computation granular
  implement static service stats cache

 src/PVE/HA/Env.pm                    | 12 ++++
 src/PVE/HA/Env/PVE2.pm               | 36 ++++++++++++
 src/PVE/HA/Manager.pm                | 82 +++++++--------------------
 src/PVE/HA/Resources/PVECT.pm        |  3 +-
 src/PVE/HA/Resources/PVEVM.pm        |  3 +-
 src/PVE/HA/Rules/ResourceAffinity.pm | 24 ++++----
 src/PVE/HA/Sim/Env.pm                | 12 ++++
 src/PVE/HA/Sim/Hardware.pm           | 21 +++++++
 src/PVE/HA/Sim/Resources.pm          |  3 +-
 src/PVE/HA/Usage.pm                  | 69 +++++++++++------
 src/PVE/HA/Usage/Basic.pm            | 35 +++++-------
 src/PVE/HA/Usage/Static.pm           | 43 ++++++---------
 src/test/test_failover1.pl           | 17 +++---
 13 files changed, 211 insertions(+), 149 deletions(-)


Summary over all repositories:
  17 files changed, 391 insertions(+), 201 deletions(-)

-- 
Generated by git-murpp 0.8.0