public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

Right now, the online node usage calculation for the HA manager only
considers the number of active services on each node. This patch
series allows switching to a 'static' scheduler mode, where static
usage information from the nodes and guest configurations is used
instead.

With this version, the effect is limited to choosing nodes during
recovery or for migrations triggered by a shutdown policy, but the
plan is to extend this in the future.

As a next step, it would be nice to also have this for startup, but
AFAICT the issue is that the node selection only happens after the
state is already set to started, and I think select_service_node()
doesn't currently know if a service has been newly started. I haven't
looked into it in too much detail though.

One idea for getting a balancer out of it is to:
1. (optionally) sort all services by badness (needs new backend function)
2. iterate scoring the nodes for each service, adding the usage to the
   chosen node after each iteration. The current node can be kept if the
   score compared to the best node doesn't differ too much.
3. record the chosen nodes and migrate the services accordingly.
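
To make the three steps above more concrete, here is a rough Python
sketch (not part of this series, which is Perl). Everything in it is an
assumption for illustration: the Node/Service classes, a single scalar
usage value per service, "badness" approximated by sorting the largest
services first, and the tolerance parameter standing in for "doesn't
differ too much".

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float      # hypothetical static capacity, e.g. memory in GiB
    used: float = 0.0    # usage accumulated while planning


@dataclass
class Service:
    sid: str
    usage: float         # hypothetical static usage, same unit as capacity
    node: str            # node the service currently runs on


def score(node, extra):
    """Relative load of `node` if `extra` usage were added (lower is better)."""
    return (node.used + extra) / node.capacity


def plan_balance(services, nodes, tolerance=0.1):
    """Return {sid: target_node} for the services that should migrate."""
    by_name = {n.name: n for n in nodes}
    for n in nodes:
        n.used = 0.0
    migrations = {}
    # Step 1: largest services first, as a stand-in for sorting by badness.
    for svc in sorted(services, key=lambda s: s.usage, reverse=True):
        # Step 2: score every node for this service (ties broken by name).
        best = min(nodes, key=lambda n: (score(n, svc.usage), n.name))
        current = by_name[svc.node]
        # Keep the current node if it is nearly as good as the best one.
        if score(current, svc.usage) - score(best, svc.usage) <= tolerance:
            chosen = current
        else:
            chosen = best
        # Add the usage to the chosen node before scoring the next service.
        chosen.used += svc.usage
        # Step 3: record only the services that actually have to move.
        if chosen.name != svc.node:
            migrations[svc.sid] = chosen.name
    return migrations
```

With two equally sized nodes and three services all on the first node,
this plan would move one of the two large services and leave the rest
in place. The tolerance is what avoids migrating services for marginal
gains.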


The online node usage calculation is factored out into a 'Usage'
plugin system to ease adding the new static mode without much
clutter. If not all nodes provide static service information, we
fall back to the 'basic' mode. If only the scoring fails, the service
count is used as a fallback.
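
The two fallback levels described above could look roughly like the
following Python sketch. It is only an illustration, not the code from
Usage.pm, Usage/Basic.pm or Usage/Static.pm: the class and method
names, the make_usage() helper and the single "mem" value per node and
service are all assumptions.

```python
class BasicUsage:
    """'basic' mode: node usage is the number of active services."""

    def __init__(self, nodes):
        self.count = {n: 0 for n in nodes}

    def add_service(self, sid, node):
        self.count[node] += 1

    def score_nodes(self, sid):
        # Lower is better: fewer services already on the node.
        return dict(self.count)


class StaticUsage(BasicUsage):
    """'static' mode: score by configured usage (here just 'mem')."""

    def __init__(self, node_stats, service_stats):
        super().__init__(node_stats)
        missing = [n for n, s in node_stats.items() if s is None]
        if missing:
            raise ValueError(f"no static stats for nodes: {missing}")
        self.node_stats = node_stats
        self.service_stats = service_stats
        self.used = {n: 0.0 for n in node_stats}

    def add_service(self, sid, node):
        # Always keep the service count so it can serve as a fallback.
        super().add_service(sid, node)
        stats = self.service_stats.get(sid)
        if stats is not None:
            self.used[node] += stats["mem"]

    def score_nodes(self, sid):
        try:
            need = self.service_stats[sid]["mem"]
        except KeyError:
            # Scoring failed: fall back to the service count.
            return super().score_nodes(sid)
        return {
            n: (self.used[n] + need) / self.node_stats[n]["mem"]
            for n in self.used
        }


def make_usage(mode, node_stats, service_stats):
    """Pick the plugin; use 'basic' if static node stats are incomplete."""
    if mode == "static":
        try:
            return StaticUsage(node_stats, service_stats)
        except ValueError:
            pass  # not all nodes provide static information
    return BasicUsage(node_stats)
```

Letting the static variant inherit from the basic one keeps the service
count up to date in both modes, so the scoring fallback needs no extra
bookkeeping.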


Dependency bumps needed:
proxmox-ha-manager (build)depends on proxmox-perl-rs
The new feature is only usable with updated pve-manager and
pve-cluster of course, but there is no hard dependency.


Changes from v1:
    * Drop already applied patches.
    * Add tests for HA manager which also required properly adding
      relevant methods to the simulation environment.
    * Implement fallback for scoring in Usage/Static.pm.
    * Improve documentation and mention current limitation with many
      services.


ha-manager:

Fiona Ebner (15):
  env: add get_static_node_stats() method
  resources: add get_static_stats() method
  add Usage base plugin and Usage::Basic plugin
  manager: select service node: add $sid to parameters
  manager: online node usage: switch to Usage::Basic plugin
  usage: add Usage::Static plugin
  env: rename get_ha_settings to get_datacenter_settings
  env: datacenter config: include crs (cluster-resource-scheduling)
    setting
  manager: set resource scheduler mode upon init
  manager: use static resource scheduler when configured
  manager: avoid scoring nodes if maintenance fallback node is valid
  manager: avoid scoring nodes when not trying next and current node is
    valid
  usage: static: use service count on nodes as a fallback
  test: add tests for static resource scheduling
  resources: add missing PVE::Cluster use statements

 debian/pve-ha-manager.install                 |   3 +
 src/PVE/HA/Env.pm                             |  10 +-
 src/PVE/HA/Env/PVE2.pm                        |  27 ++-
 src/PVE/HA/LRM.pm                             |   4 +-
 src/PVE/HA/Makefile                           |   3 +-
 src/PVE/HA/Manager.pm                         |  79 +++++---
 src/PVE/HA/Resources.pm                       |   5 +
 src/PVE/HA/Resources/PVECT.pm                 |  13 ++
 src/PVE/HA/Resources/PVEVM.pm                 |  16 ++
 src/PVE/HA/Sim/Env.pm                         |  13 +-
 src/PVE/HA/Sim/Hardware.pm                    |  28 +++
 src/PVE/HA/Sim/Resources.pm                   |  10 +
 src/PVE/HA/Usage.pm                           |  50 +++++
 src/PVE/HA/Usage/Basic.pm                     |  52 ++++++
 src/PVE/HA/Usage/Makefile                     |   6 +
 src/PVE/HA/Usage/Static.pm                    | 120 ++++++++++++
 src/test/test-crs-static1/README              |   4 +
 src/test/test-crs-static1/cmdlist             |   4 +
 src/test/test-crs-static1/datacenter.cfg      |   6 +
 src/test/test-crs-static1/hardware_status     |   5 +
 src/test/test-crs-static1/log.expect          |  50 +++++
 src/test/test-crs-static1/manager_status      |   1 +
 src/test/test-crs-static1/service_config      |   3 +
 .../test-crs-static1/static_service_stats     |   3 +
 src/test/test-crs-static2/README              |   4 +
 src/test/test-crs-static2/cmdlist             |  20 ++
 src/test/test-crs-static2/datacenter.cfg      |   6 +
 src/test/test-crs-static2/groups              |   2 +
 src/test/test-crs-static2/hardware_status     |   7 +
 src/test/test-crs-static2/log.expect          | 171 ++++++++++++++++++
 src/test/test-crs-static2/manager_status      |   1 +
 src/test/test-crs-static2/service_config      |   3 +
 .../test-crs-static2/static_service_stats     |   3 +
 src/test/test-crs-static3/README              |   5 +
 src/test/test-crs-static3/cmdlist             |   4 +
 src/test/test-crs-static3/datacenter.cfg      |   9 +
 src/test/test-crs-static3/hardware_status     |   5 +
 src/test/test-crs-static3/log.expect          | 131 ++++++++++++++
 src/test/test-crs-static3/manager_status      |   1 +
 src/test/test-crs-static3/service_config      |  12 ++
 .../test-crs-static3/static_service_stats     |  12 ++
 src/test/test-crs-static4/README              |   6 +
 src/test/test-crs-static4/cmdlist             |   4 +
 src/test/test-crs-static4/datacenter.cfg      |   9 +
 src/test/test-crs-static4/hardware_status     |   5 +
 src/test/test-crs-static4/log.expect          | 149 +++++++++++++++
 src/test/test-crs-static4/manager_status      |   1 +
 src/test/test-crs-static4/service_config      |  12 ++
 .../test-crs-static4/static_service_stats     |  12 ++
 src/test/test-crs-static5/README              |   5 +
 src/test/test-crs-static5/cmdlist             |   4 +
 src/test/test-crs-static5/datacenter.cfg      |   9 +
 src/test/test-crs-static5/hardware_status     |   5 +
 src/test/test-crs-static5/log.expect          | 117 ++++++++++++
 src/test/test-crs-static5/manager_status      |   1 +
 src/test/test-crs-static5/service_config      |  10 +
 .../test-crs-static5/static_service_stats     |  11 ++
 src/test/test_failover1.pl                    |  21 ++-
 58 files changed, 1242 insertions(+), 50 deletions(-)
 create mode 100644 src/PVE/HA/Usage.pm
 create mode 100644 src/PVE/HA/Usage/Basic.pm
 create mode 100644 src/PVE/HA/Usage/Makefile
 create mode 100644 src/PVE/HA/Usage/Static.pm
 create mode 100644 src/test/test-crs-static1/README
 create mode 100644 src/test/test-crs-static1/cmdlist
 create mode 100644 src/test/test-crs-static1/datacenter.cfg
 create mode 100644 src/test/test-crs-static1/hardware_status
 create mode 100644 src/test/test-crs-static1/log.expect
 create mode 100644 src/test/test-crs-static1/manager_status
 create mode 100644 src/test/test-crs-static1/service_config
 create mode 100644 src/test/test-crs-static1/static_service_stats
 create mode 100644 src/test/test-crs-static2/README
 create mode 100644 src/test/test-crs-static2/cmdlist
 create mode 100644 src/test/test-crs-static2/datacenter.cfg
 create mode 100644 src/test/test-crs-static2/groups
 create mode 100644 src/test/test-crs-static2/hardware_status
 create mode 100644 src/test/test-crs-static2/log.expect
 create mode 100644 src/test/test-crs-static2/manager_status
 create mode 100644 src/test/test-crs-static2/service_config
 create mode 100644 src/test/test-crs-static2/static_service_stats
 create mode 100644 src/test/test-crs-static3/README
 create mode 100644 src/test/test-crs-static3/cmdlist
 create mode 100644 src/test/test-crs-static3/datacenter.cfg
 create mode 100644 src/test/test-crs-static3/hardware_status
 create mode 100644 src/test/test-crs-static3/log.expect
 create mode 100644 src/test/test-crs-static3/manager_status
 create mode 100644 src/test/test-crs-static3/service_config
 create mode 100644 src/test/test-crs-static3/static_service_stats
 create mode 100644 src/test/test-crs-static4/README
 create mode 100644 src/test/test-crs-static4/cmdlist
 create mode 100644 src/test/test-crs-static4/datacenter.cfg
 create mode 100644 src/test/test-crs-static4/hardware_status
 create mode 100644 src/test/test-crs-static4/log.expect
 create mode 100644 src/test/test-crs-static4/manager_status
 create mode 100644 src/test/test-crs-static4/service_config
 create mode 100644 src/test/test-crs-static4/static_service_stats
 create mode 100644 src/test/test-crs-static5/README
 create mode 100644 src/test/test-crs-static5/cmdlist
 create mode 100644 src/test/test-crs-static5/datacenter.cfg
 create mode 100644 src/test/test-crs-static5/hardware_status
 create mode 100644 src/test/test-crs-static5/log.expect
 create mode 100644 src/test/test-crs-static5/manager_status
 create mode 100644 src/test/test-crs-static5/service_config
 create mode 100644 src/test/test-crs-static5/static_service_stats


docs:

Fiona Ebner (2):
  ha: add section about scheduler modes
  ha: add warning against using 'static' mode with many services

 ha-manager.adoc | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

-- 
2.30.2






end of thread, other threads:[~2022-11-18 13:23 UTC | newest]

Thread overview: 21+ messages
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
2022-11-18  7:48   ` Fiona Ebner
2022-11-18 12:48     ` Thomas Lamprecht
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services Fiona Ebner
2022-11-18 13:23 ` [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Thomas Lamprecht
