From: Alexandre Derumier <aderumier@odiso.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH pve-ha-manager 0/3] POC/RFC: ressource aware HA manager
Date: Mon, 13 Dec 2021 08:43:13 +0100
Message-ID: <20211213074316.2565139-1-aderumier@odiso.com>
Hi,
this is a proof of concept implementing resource-aware HA.
The current implementation is really basic:
it simply balances the number of services on each node.
I have had real production cases where a node failed and the restarted VMs
impacted other nodes because of too much CPU/RAM usage.
This new implementation uses a best-fit vector packing heuristic with constraint support.
- We compute the average memory/CPU stats of the nodes and of the VMs over the last 20 minutes.
For each resource:
- First, we order the services pending recovery by memory, then by CPU usage.
Memory is more important here, because a VM can't start if the target node doesn't have enough memory.
- Then, we check the constraints of the possible target nodes (storage available, node has enough CPU/RAM, node has enough cores, ...).
(This could be extended with other constraints like VM affinity/anti-affinity, CPU compatibility, ...)
- Then we compute a node weight as the Euclidean distance between the VM usage vector and the node's available resources vector (CPU and RAM).
We choose the first node with the lowest Euclidean distance weight.
(Ex: if the VM uses 1 GB RAM/1% CPU, node1 has 2 GB RAM/2% CPU available, and node2 has 4 GB RAM/4% CPU, node1 is chosen because it is the nearest to the VM usage.)
- We add the recovered VM's CPU/RAM to the target node stats. (This is only a best-effort estimate, as the VM start is asynchronous on the target LRM and could fail, ...)
I have kept the HA group node priority and the other existing ordering,
so this doesn't break the current tests, and we can easily add an option at the datacenter level to enable/disable it.
It should be easy to later implement some kind of VM auto-migration when a node uses too much CPU/RAM,
reusing the same node selection algorithm.
I have added a basic test; I'll add more tests later if this patch series is OK for you.
Some good literature about heuristics:
Microsoft Hyper-V implementation:
- http://kunaltalwar.org/papers/VBPacking.pdf
- https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/virtualization.pdf
Variable size vector bin packing heuristics:
- https://hal.archives-ouvertes.fr/hal-00868016v2/document
Alexandre Derumier (3):
add ressource awareness manager
tests: add support for ressources
add test-basic0
src/PVE/HA/Env.pm | 24 +++
src/PVE/HA/Env/PVE2.pm | 90 ++++++++++
src/PVE/HA/Manager.pm | 246 ++++++++++++++++++++++++++-
src/PVE/HA/Sim/Hardware.pm | 61 +++++++
src/PVE/HA/Sim/TestEnv.pm | 36 ++++
src/test/test-basic0/README | 1 +
src/test/test-basic0/cmdlist | 4 +
src/test/test-basic0/hardware_status | 5 +
src/test/test-basic0/log.expect | 52 ++++++
src/test/test-basic0/manager_status | 1 +
src/test/test-basic0/node_stats | 5 +
src/test/test-basic0/service_config | 5 +
src/test/test-basic0/service_stats | 5 +
13 files changed, 528 insertions(+), 7 deletions(-)
create mode 100644 src/test/test-basic0/README
create mode 100644 src/test/test-basic0/cmdlist
create mode 100644 src/test/test-basic0/hardware_status
create mode 100644 src/test/test-basic0/log.expect
create mode 100644 src/test/test-basic0/manager_status
create mode 100644 src/test/test-basic0/node_stats
create mode 100644 src/test/test-basic0/service_config
create mode 100644 src/test/test-basic0/service_stats
--
2.30.2