From: Alexandre Derumier <aderumier@odiso.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH pve-ha-manager 0/3] POC/RFC: ressource aware HA manager
Date: Mon, 13 Dec 2021 08:43:13 +0100
Message-ID: <20211213074316.2565139-1-aderumier@odiso.com>
Hi,
this is a proof of concept for a resource-aware HA manager.
The current implementation is very basic:
it simply balances the number of services across the nodes.
I have had real production cases where a node failed and the restarted VMs
impacted other nodes because of excessive CPU/RAM usage.
This new implementation uses a best-fit vector packing heuristic with constraint support.
- We compute the average memory/CPU stats of the nodes and of the VMs over the last 20 minutes.
For each resource:
- First, we order the services pending recovery by memory usage, then by CPU usage.
Memory is more important here, because a VM can't start if the target node doesn't have enough memory.
- Then, we check the constraints of the possible target nodes (storage available, node has enough CPU/RAM, node has enough cores, ...).
(This could be extended with other constraints such as VM affinity/anti-affinity, CPU compatibility, ...)
- Then, we compute a node weight as the Euclidean distance between the VM usage vector and the node's available resources vector (both CPU/RAM).
We choose the first node with the lowest Euclidean distance weight.
(Ex: if the VM uses 1 GB RAM/1% CPU, node1 has 2 GB RAM/2% CPU available, and node2 has 4 GB RAM/4% CPU available, node1 will be chosen because it is the nearest to the VM usage.)
- We add the recovered VM's CPU/RAM to the target node stats. (This is only a best-effort estimate, as the VM start is asynchronous on the target LRM and could fail, ...)
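
To illustrate the selection steps above, here is a minimal sketch in Python (not the Perl code of the patch). The names Node, Service, select_node and recover are hypothetical. It assumes services are sorted largest-first, that only free memory and CPU are checked as constraints, and that raw GB/percent values are compared without normalization, as in the example above.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mem: float  # available memory in GB (assumed unit)
    free_cpu: float  # available CPU in percent (assumed unit)


@dataclass
class Service:
    sid: str
    mem: float  # average memory usage in GB
    cpu: float  # average CPU usage in percent


def distance(svc, node):
    """Euclidean distance between service usage and node free resources."""
    return math.hypot(node.free_mem - svc.mem, node.free_cpu - svc.cpu)


def select_node(svc, nodes):
    """Return the first node with the lowest distance, or None if none fits."""
    # Constraint check: only memory and CPU here; storage, cores, etc. omitted.
    candidates = [n for n in nodes
                  if n.free_mem >= svc.mem and n.free_cpu >= svc.cpu]
    if not candidates:
        return None
    # min() keeps the first minimum in iteration order.
    return min(candidates, key=lambda n: distance(svc, n))


def recover(services, nodes):
    """Place each service and update the node stats as a best-effort estimate."""
    placement = {}
    # Assumption: largest memory first, CPU usage as the tie-breaker.
    ordered = sorted(services, key=lambda s: (s.mem, s.cpu), reverse=True)
    for svc in ordered:
        node = select_node(svc, nodes)
        placement[svc.sid] = node.name if node else None
        if node:
            # Account for the recovered service on the target node.
            node.free_mem -= svc.mem
            node.free_cpu -= svc.cpu
    return placement


# The example from the text: the VM uses 1 GB / 1% CPU.
nodes = [Node("node1", 2, 2), Node("node2", 4, 4)]
print(select_node(Service("vm:100", 1, 1), nodes).name)  # node1
```

Picking the smallest distance is what makes this a best-fit choice: the service goes to the node whose free resources are closest to its usage, which keeps larger nodes free for larger services.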
I have kept the HA group node priority and the other existing ordering criteria,
so this doesn't break the current tests, and we can easily add a datacenter option to enable/disable it.
It should be easy to later implement some kind of automatic VM migration when a node uses too much CPU/RAM,
reusing the same node selection algorithm.
I have added a basic test; I'll add more tests later if this patch series is OK for you.
Some good literature about heuristics:
Microsoft Hyper-V implementation:
- http://kunaltalwar.org/papers/VBPacking.pdf
- https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/virtualization.pdf
Variable size vector bin packing heuristics:
- https://hal.archives-ouvertes.fr/hal-00868016v2/document
Alexandre Derumier (3):
add ressource awareness manager
tests: add support for ressources
add test-basic0
src/PVE/HA/Env.pm | 24 +++
src/PVE/HA/Env/PVE2.pm | 90 ++++++++++
src/PVE/HA/Manager.pm | 246 ++++++++++++++++++++++++++-
src/PVE/HA/Sim/Hardware.pm | 61 +++++++
src/PVE/HA/Sim/TestEnv.pm | 36 ++++
src/test/test-basic0/README | 1 +
src/test/test-basic0/cmdlist | 4 +
src/test/test-basic0/hardware_status | 5 +
src/test/test-basic0/log.expect | 52 ++++++
src/test/test-basic0/manager_status | 1 +
src/test/test-basic0/node_stats | 5 +
src/test/test-basic0/service_config | 5 +
src/test/test-basic0/service_stats | 5 +
13 files changed, 528 insertions(+), 7 deletions(-)
create mode 100644 src/test/test-basic0/README
create mode 100644 src/test/test-basic0/cmdlist
create mode 100644 src/test/test-basic0/hardware_status
create mode 100644 src/test/test-basic0/log.expect
create mode 100644 src/test/test-basic0/manager_status
create mode 100644 src/test/test-basic0/node_stats
create mode 100644 src/test/test-basic0/service_config
create mode 100644 src/test/test-basic0/service_stats
--
2.30.2