public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH V3 pve-ha-manager 0/2] POC/RFC: ressource aware HA manager
@ 2021-12-21 15:13 Alexandre Derumier
  2021-12-21 15:13 ` [pve-devel] [PATCH V3 pve-ha-manager 1/2] add ressource awareness manager Alexandre Derumier
  2021-12-21 15:13 ` [pve-devel] [PATCH V3 pve-ha-manager 2/2] add test-basic0 Alexandre Derumier
  0 siblings, 2 replies; 3+ messages in thread
From: Alexandre Derumier @ 2021-12-21 15:13 UTC
  To: pve-devel

Hi,

This is a proof of concept for implementing resource-aware HA.

The current implementation is really basic:
it simply balances the number of services on each node.

I have had real production cases where a node failed and the VMs restarted
on other nodes impacted those nodes because of excessive CPU/RAM usage.
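
To illustrate the limitation described above, here is a minimal sketch of
count-based placement. It is not the actual pve-ha-manager code (which is
Perl); the function name, the tie-break by node name, and the sample numbers
are all assumptions made for illustration.

```python
def pick_by_service_count(service_counts):
    """Return the node with the fewest services (ties broken by node name).

    Hypothetical helper: it only looks at how many services each node runs.
    """
    return min(service_counts, key=lambda name: (service_counts[name], name))


# Number of HA services currently running on each node (made-up values).
counts = {"node1": 3, "node2": 2, "node3": 4}

# Made-up memory usage in percent; count-based selection never consults it.
mem_pct = {"node1": 40, "node2": 95, "node3": 55}

target = pick_by_service_count(counts)
```

With these sample values the node running the fewest services is also the one
with the least free memory, which is the kind of situation a resource-aware
selection is meant to avoid.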


Changelog v2:

- merge the main code and the Sim code into the same patch for now (I'll split them later)
- cleanup addressing all of Thomas's review comments (thanks again)
- add more comments in the code
- check storage for LXC too
- use maxmem for Windows VMs


Changelog v3:

- fix VM/CT config read (the node needs to be specified)
- fix storage_availability_check params
- classify nodes by low/medium/high usage thresholds for better balancing.
  We try to fill the nodes with lower usage first, until the threshold is reached
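
The low/medium/high classification could be sketched roughly as follows. This
is only an illustration under stated assumptions, not the code from the patch:
the threshold values, the use of max(cpu, mem) as the usage metric, and all
names (Node, classify, pick_node, place) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, in percent of node capacity.
LOW_THRESHOLD = 50
HIGH_THRESHOLD = 80


@dataclass
class Node:
    name: str
    cpu_pct: int  # current CPU usage in percent
    mem_pct: int  # current memory usage in percent


def usage(node):
    # Assumption: the most constrained resource decides the node's usage.
    return max(node.cpu_pct, node.mem_pct)


def classify(node):
    """Put a node into the low, medium, or high usage class."""
    u = usage(node)
    if u < LOW_THRESHOLD:
        return "low"
    if u < HIGH_THRESHOLD:
        return "medium"
    return "high"


def pick_node(nodes):
    """Prefer the lowest usage class, then the lowest usage, then the name."""
    rank = {"low": 0, "medium": 1, "high": 2}
    return min(nodes, key=lambda n: (rank[classify(n)], usage(n), n.name))


def place(nodes, cpu_pct, mem_pct):
    """Place a service on the chosen node and account for its resources."""
    target = pick_node(nodes)
    target.cpu_pct += cpu_pct
    target.mem_pct += mem_pct
    return target.name
```

For example, with nodes at (20, 30), (60, 40) and (90, 50) percent CPU/memory,
a service needing 10% CPU and 40% memory goes to the first node, which then
moves into the medium class, so the next service goes to the second node.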



I still need to add the missing storage availability test.

Alexandre Derumier (2):
  add ressource awareness manager
  add test-basic0

 src/PVE/HA/Env.pm                    |  33 ++++
 src/PVE/HA/Env/PVE2.pm               | 177 +++++++++++++++++
 src/PVE/HA/Manager.pm                | 274 ++++++++++++++++++++++++++-
 src/PVE/HA/Sim/Hardware.pm           |  61 ++++++
 src/PVE/HA/Sim/TestEnv.pm            |  50 ++++-
 src/test/test-basic0/README          |   1 +
 src/test/test-basic0/cmdlist         |   4 +
 src/test/test-basic0/hardware_status |   5 +
 src/test/test-basic0/log.expect      |  52 +++++
 src/test/test-basic0/manager_status  |   1 +
 src/test/test-basic0/node_stats      |   5 +
 src/test/test-basic0/service_config  |   5 +
 src/test/test-basic0/service_stats   |   5 +
 13 files changed, 664 insertions(+), 9 deletions(-)
 create mode 100644 src/test/test-basic0/README
 create mode 100644 src/test/test-basic0/cmdlist
 create mode 100644 src/test/test-basic0/hardware_status
 create mode 100644 src/test/test-basic0/log.expect
 create mode 100644 src/test/test-basic0/manager_status
 create mode 100644 src/test/test-basic0/node_stats
 create mode 100644 src/test/test-basic0/service_config
 create mode 100644 src/test/test-basic0/service_stats

-- 
2.30.2



