From: Daniel Kral
To: pve-devel@lists.proxmox.com
Date: Mon, 3 Nov 2025 11:19:44 +0100
Message-ID: <20251103102118.153666-1-d.kral@proxmox.com>
Subject: [pve-devel] [PATCH ha-manager v3 00/21] HA rules fixes + performance improvements + cleanup

v2: https://lore.proxmox.com/pve-devel/20250909083539.39675-1-d.kral@proxmox.com/
v1: https://lore.proxmox.com/pve-devel/20250821143705.256562-1-d.kral@proxmox.com/

This series is based on top of the granular accounting series [0].

[0] https://lore.proxmox.com/pve-devel/20251027164513.542678-1-d.kral@proxmox.com/

Changelog from v2 -> v3:

- rebased on top of master + granular accounting series [0], because
  [0] is already closer to being merged and, as both change some rules'
  interfaces, I had to choose one over the other
- fix and add clarifying notes to inter-rules feasibility checks
- remove usage of prototypes introduced by the original HA rules series,
  about which I had wrong assumptions
- some minor code changes (see per-patch notes):
  - use 5.36 for newly introduced module PVE::HA::Rules::Helpers
  - use my sub name {} over my $name = sub {} for new private
    subroutines / helpers
  - change POD signature of get_node_affinity(...) too

Ran a `git rebase master --exec 'make clean && make deb'` on the series
and tested the changes to the Status API manually.

I put the patches in decreasing priority:

PATCH 1      fix output of get_resource_info() about dependent resources
PATCH 2      fix retranslating ha rules on nodelist changes
PATCH 3-5    fix wrong assumption about positive resource affinity checks
PATCH 6-10   ha rules performance improvements + preparations
PATCH 11-12  make test cases use to_json(...) and add compiled configs
PATCH 13-21  various smaller cleanups related to HA rules

Ad PATCH 1, 2, 3-5:

The first few patches fix some wrong assumptions and bugs.

Ad PATCH 3-5:

During the initial HA rules implementation, quite a few changes were
made to how rules are checked and translated, which moved the merging
of positive resource affinity rules to the end of the pipeline.

Because this was not noticed clearly enough, two checks make a wrong
assumption about positive resource affinity rules: they assume that the
resource sets of positive resource affinity rules are already disjoint
from each other, even though that is only ensured at a later stage.

As it would be rather cumbersome to interleave checks/pruning infeasible
rules and rule transforms [0] (i.e. move the merging transform in
between the previous resource affinity checks and the inter-consistency
check; I tried that, but it looked rather ugly), the best method IMO was
to use the already existing helper to find these disjoint sets.
This also provides these checks with the correct positive resource
affinity rule ids to blame, without the extra logic that would have been
needed had checks and transforms been interleaved.

Ad PATCH 6-10:

These patches prepare and implement the compilation of HA rules when
they are used in the HA Manager. This improves the performance of
checking HA rules (significant for overall performance) and of applying
them (significant only for resources with rules).

Ad PATCH 11-21:

These patches are various smaller improvements related to the core HA
rules feature, which depend on the changes above and/or are easier to
apply in a single patch series (e.g. due to larger changes in a single
source file):

- make rules tests use to_json(...) instead of Data::Dumper
- make rules tests also output compiled configs to document changes
- synchronize how active LRMs are determined, including how this is
  done when checking health for the HA groups migration
- add notes about the why's and how's for feasibility checks

Daniel Kral (21):
  config: do not add ignored resources to dependent resources
  manager: retranslate rules if nodes are added or removed
  rules: factor out disjoint rules' resource set helper
  rules: resource affinity: inter-consistency check with merged positive rules
  rules: add merged positive resource affinity info in global checks
  rules: make rules sorting optional in foreach_rule helper
  rename rule's canonicalize stage to transform stage
  rules: make plugins register transformers instead of plugin_transform
  rules: node affinity: decouple get_node_affinity helper from Usage class
  compile ha rules to a more compact representation
  test: rules: use to_json instead of Data::Dumper for config output
  test: rules: add compiled config output to rules config test cases
  rules: node affinity: define node priority outside hash access
  move minimum version check helper to ha tools
  manager: move group migration cooldown variable into helper
  api: status: sync active service counting with lrm's helper
  manager: group migration: sync active service counting with lrm's helper
  factor out counting of active services into helper
  tree-wide: remove misused function prototype declaractions
  rules: fix documentation for inter-rules checker subroutines
  rules: add documentation about current feasibility check implementations

 debian/pve-ha-manager.install                 |   1 +
 src/PVE/API2/HA/Rules.pm                      |   1 +
 src/PVE/API2/HA/Status.pm                     |  17 +-
 src/PVE/HA/Config.pm                          |  18 +-
 src/PVE/HA/HashTools.pm                       |   6 +-
 src/PVE/HA/LRM.pm                             |  20 +-
 src/PVE/HA/Manager.pm                         |  89 ++--
 src/PVE/HA/NodeStatus.pm                      |  14 +
 src/PVE/HA/Rules.pm                           | 215 +++++++---
 src/PVE/HA/Rules/Helpers.pm                   |  77 ++++
 src/PVE/HA/Rules/Makefile                     |   2 +-
 src/PVE/HA/Rules/NodeAffinity.pm              |  90 ++--
 src/PVE/HA/Rules/ResourceAffinity.pm          | 220 +++++-----
 src/PVE/HA/Tools.pm                           |  52 +++
 ...efaults-for-node-affinity-rules.cfg.expect | 145 ++++---
 ...lts-for-resource-affinity-rules.cfg.expect |  87 ++--
 ...onsistent-node-resource-affinity-rules.cfg |  19 +
 ...nt-node-resource-affinity-rules.cfg.expect | 185 ++++++---
 .../inconsistent-resource-affinity-rules.cfg  |  15 +
 ...sistent-resource-affinity-rules.cfg.expect |  21 +-
 ...egative-resource-affinity-rules.cfg.expect |  73 ++--
 ...fective-resource-affinity-rules.cfg.expect |  18 +-
 ...egative-resource-affinity-rules.cfg.expect | 342 +++++++++------
 ...ositive-resource-affinity-rules.cfg.expect | 391 +++++++++++++-----
 ...egative-resource-affinity-rules.cfg.expect | 186 +++++----
 ...ositive-resource-affinity-rules.cfg.expect | 264 +++++++++---
 ...ty-with-resource-affinity-rules.cfg.expect | 114 +++--
 ...rce-refs-in-node-affinity-rules.cfg.expect | 200 ++++++---
 src/test/test_failover1.pl                    |  15 +-
 src/test/test_rules_config.pl                 |  14 +-
 30 files changed, 1904 insertions(+), 1007 deletions(-)
 create mode 100644 src/PVE/HA/Rules/Helpers.pm

-- 
2.47.3