From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Alexandre Derumier <aderumier@odiso.com>
Subject: Re: [pve-devel] [PATCH pve-ha-manager 0/3] POC/RFC: ressource aware HA manager
Date: Mon, 13 Dec 2021 10:02:36 +0100
Message-ID: <8d61d388-ce16-41b7-a655-123c1ac45d87@proxmox.com>
In-Reply-To: <20211213074316.2565139-1-aderumier@odiso.com>

Hi,

On 13.12.21 08:43, Alexandre Derumier wrote:
> Hi,
> 
> this is a proof of concept to implement resource-aware HA.

Nice! I'll try to give it a quick look now so that I do not stall you too much
on this long-wished-for feature.

> The current implementation is really basic,
> simply balancing the number of services on each node.
> 
> I had some real production cases where a node failed and the restarted VMs
> impacted other nodes because of too much CPU/RAM usage.
> 
> This new implementation uses a best-fit vector packing heuristic with constraint support.
> 
> 
> - We compute node memory/CPU and VM memory/CPU average stats over the last 20 minutes.
> 
> For each resource:
> - First, we order the services pending recovery by memory, then by CPU usage.
>   Memory is more important here, because a VM can't start if the target node doesn't have enough memory.

agreed
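
A minimal sketch of that ordering, in Python purely for illustration (the
patch itself is Perl). The service IDs and the `mem`/`cpu` field names are
made up here, and sorting largest-first is an assumption, as the quoted text
does not state the direction:

```python
# Hypothetical per-service average usage; names and values are illustrative only.
services = {
    "vm:101": {"mem": 2048, "cpu": 0.10},
    "vm:102": {"mem": 8192, "cpu": 0.05},
    "vm:103": {"mem": 2048, "cpu": 0.40},
}

def recovery_order(stats):
    """Return service IDs sorted by memory, then CPU usage.

    Assumption: largest first, so the hardest-to-place services are handled
    while the most free capacity is still available.
    """
    return sorted(
        stats,
        key=lambda sid: (stats[sid]["mem"], stats[sid]["cpu"]),
        reverse=True,
    )

order = recovery_order(services)
print(order)
```

Memory is the primary key, so CPU usage only breaks ties between services
with the same memory footprint.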

> 
> - Then, we check the constraints of possible target nodes (storage available, node has enough CPU/RAM, node has enough cores, ...).
>   (This could be extended with other constraints like VM affinity/anti-affinity, CPU compatibility, ...)
> 
> - Then we compute a node weight as the Euclidean distance between the CPU/RAM vectors of the VM usage and the node's available resources.
>   Then we choose the first node with the lowest Euclidean distance weight.
>   (Ex: if a VM uses 1 GB RAM/1% CPU, node1 has 2 GB RAM/2% CPU, and node2 has 4 GB RAM/4% CPU, node1 will be chosen because it's the nearest to the VM usage.)

Sounds like an OK approach to me, I had something relatively similar in mind.
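
To make the quoted example concrete, here is a minimal Python sketch of the
constraint check plus distance-based selection. It is not the code from the
patch: the function names, the `mem`/`cpu` fields and the reduced constraint
set (only free memory and CPU) are assumptions for illustration:

```python
import math

def fits(vm, node):
    """Constraint check: the node must have enough free memory and CPU."""
    return node["mem"] >= vm["mem"] and node["cpu"] >= vm["cpu"]

def distance(vm, node):
    """Euclidean distance between the VM usage and the node's free resources."""
    return math.hypot(node["mem"] - vm["mem"], node["cpu"] - vm["cpu"])

def select_node(vm, nodes):
    """Pick the fitting node whose free resources are nearest to the VM usage."""
    candidates = [name for name in sorted(nodes) if fits(vm, nodes[name])]
    if not candidates:
        return None
    # min() keeps the first candidate on ties, so the result is deterministic.
    return min(candidates, key=lambda name: distance(vm, nodes[name]))

# Numbers from the quoted example: memory in GB, CPU in percent.
vm = {"mem": 1, "cpu": 1}
nodes = {
    "node1": {"mem": 2, "cpu": 2},
    "node2": {"mem": 4, "cpu": 4},
}
chosen = select_node(vm, nodes)
print(chosen)  # node1
```

Note that the sketch mixes GB and percent in one vector; a real
implementation would presumably need to normalize both dimensions to a
comparable scale so that neither dominates the distance.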

> 
> - We add the recovered VM's CPU/RAM to the target node stats. (This is only a best-effort estimate, as the VM start is asynchronous on the target LRM and could fail, ...)
> 
> 
> I have kept the HA group node priority and the other ordering criteria,
> so this doesn't break the current tests.

That is great, the HA regression tests are among the best we have to
test and simulate behavior, so keeping those unchanged can give quite a bit
of confidence in any implementation. Although with your change, is that mostly
because they side-step the balancer, as no usage data is there?

IMO it would be good to have most tests such that they can get affected by
the balancer, at least if we make it opt-out

> and we can easily add an option at the datacenter level to enable/disable it.

As a start, we could also do the compute-node-by-resource-usage selection only
on recovery and first-start transitions, as especially for the latter it's quite
important to get the service recovered to a node with a low(er) load to avoid a
domino effect.

Doing re-computation then for started VMs would be easy to add once we're sure
the algorithm works out.

But yeah, for some admins it would surely be welcomed to make it configurable, like:

[ ] move to lowest used node on start and recovery of service
[ ] auto-balance started services periodically

> 
> It could be easy to later implement some kind of VM auto-migration when a node uses too much CPU/RAM,
> reusing the same node selection algorithm.
> 
> I have added a basic test; I'll add more tests later if this patch series is OK for you.

I'd add commands to sim_hardware_cmd for simulating cpu/memory increase,
it's nicer to have that controllable by the cmd list.

For the test system it could also be interesting if we can annotate the
services with some basic resource usage, e.g. memory and core count and possibly
also some low (0.33), mid (0.66) and high (1.0) load-factor (that is controllable
by command), that could help to simulate reality while keeping it somewhat simple.
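
As a rough illustration of such an annotation, a Python sketch follows (the
test system itself is Perl). The `memory`/`cores` keys and the rule that only
CPU usage scales with the load factor, while memory stays static, are
assumptions, not part of the patch:

```python
# Load-factor levels as suggested above; the level names are hypothetical.
LOAD_FACTORS = {"low": 0.33, "mid": 0.66, "high": 1.0}

def simulated_usage(service, level):
    """Estimate usage for an annotated service.

    Assumption: CPU usage is the configured core count times the load
    factor, and memory usage equals the configured memory.
    """
    factor = LOAD_FACTORS[level]
    return {"mem": service["memory"], "cpu": service["cores"] * factor}

svc = {"memory": 4096, "cores": 4}
usage = simulated_usage(svc, "mid")
print(usage)
```

A command in the cmd list could then simply switch a service between the
three levels to change the simulated load.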

> Some good literature about heuristics:
> 
> Microsoft Hyper-V implementation:
>  - http://kunaltalwar.org/papers/VBPacking.pdf
>  - https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/virtualization.pdf
> Variable size vector bin packing heuristics:
>  - https://hal.archives-ouvertes.fr/hal-00868016v2/document
> 
> 
> Alexandre Derumier (3):
>   add ressource awareness manager
>   tests: add support for ressources
>   add test-basic0
> 
>  src/PVE/HA/Env.pm                    |  24 +++
>  src/PVE/HA/Env/PVE2.pm               |  90 ++++++++++
>  src/PVE/HA/Manager.pm                | 246 ++++++++++++++++++++++++++-
>  src/PVE/HA/Sim/Hardware.pm           |  61 +++++++
>  src/PVE/HA/Sim/TestEnv.pm            |  36 ++++
>  src/test/test-basic0/README          |   1 +
>  src/test/test-basic0/cmdlist         |   4 +
>  src/test/test-basic0/hardware_status |   5 +
>  src/test/test-basic0/log.expect      |  52 ++++++
>  src/test/test-basic0/manager_status  |   1 +
>  src/test/test-basic0/node_stats      |   5 +
>  src/test/test-basic0/service_config  |   5 +
>  src/test/test-basic0/service_stats   |   5 +
>  13 files changed, 528 insertions(+), 7 deletions(-)
>  create mode 100644 src/test/test-basic0/README
>  create mode 100644 src/test/test-basic0/cmdlist
>  create mode 100644 src/test/test-basic0/hardware_status
>  create mode 100644 src/test/test-basic0/log.expect
>  create mode 100644 src/test/test-basic0/manager_status
>  create mode 100644 src/test/test-basic0/node_stats
>  create mode 100644 src/test/test-basic0/service_config
>  create mode 100644 src/test/test-basic0/service_stats
> 





