From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 3968C8456B
 for <pve-devel@lists.proxmox.com>; Mon, 13 Dec 2021 12:29:45 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 2E0F81728D
 for <pve-devel@lists.proxmox.com>; Mon, 13 Dec 2021 12:29:15 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id B4BBB1727D
 for <pve-devel@lists.proxmox.com>; Mon, 13 Dec 2021 12:29:13 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 813FE440F2;
 Mon, 13 Dec 2021 12:29:13 +0100 (CET)
Message-ID: <2a177ba5-02d3-42df-0cdf-1e6c181d9bcd@proxmox.com>
Date: Mon, 13 Dec 2021 12:29:12 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101
 Thunderbird/96.0
Content-Language: en-US
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 "DERUMIER, Alexandre" <Alexandre.DERUMIER@groupe-cyllene.com>,
 "aderumier@odiso.com" <aderumier@odiso.com>
References: <20211213074316.2565139-1-aderumier@odiso.com>
 <20211213074316.2565139-2-aderumier@odiso.com>
 <6402c0c7-20ac-82d5-ecf8-a0b20899e154@proxmox.com>
 <feeb375dcb6d494497359b67cd8f60657e31682f.camel@groupe-cyllene.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
In-Reply-To: <feeb375dcb6d494497359b67cd8f60657e31682f.camel@groupe-cyllene.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 2.125 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -4.093 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH pve-ha-manager 1/3] add ressource awareness
 manager
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 13 Dec 2021 11:29:45 -0000

Hi Alexandre,

On 13.12.21 11:58, DERUMIER, Alexandre wrote:
> Thanks for the fast review !
> 
> I'll rework the patches with your comments.
> Do you prefer some another commits patches for each of your remark for easier review ?  or can I send a v2 directly the changes already merged ?

A v2 with the fixes squashed in works good for me, if you also think the commit-split
I suggested is Ok and do that one great, else it's OK too.

> 
>> semi-related, Dietmar and I talked about adapting the RRD format and higher resolution
>> from PBS also for PVE, dropping the dependency on rrdcached, but PVE with the cluster
>> and broadcasting is quite a bit more complex compared to PBS, so the effort stalled.
>>
>> If we make more use of those metrics here it would add further coupling to the existing
>> interface so maybe we'd rethink that through - but nothing that needs to stall this
>> completely, changing RRD would be better done at a major release anyway where we
>> can handle this somewhat cleanly.
> 
> I think we had talked last year about single rrd file by metric, but I'm not sure it's the best way
> (with the number of files it could be generate).
> I would like to have another metrics in rrd, like pressure of cpu/mem, or true vm cgroup mem/cpu.
> I think we already stream them, but rrd part is not yet done.
> 

I'd like to benchmark and evaluate the in-memory key-value store of pmxcfs
to see if it's ok to further expand usage of that one, as for PSI info it could be
OK to only have the last one, as they are already averaged by the kernel,
so we could start out by just broadcasting the last one outside of RRD - but
as said, I do not know if that's really (better) scalable.

> 
> +    $stats->{cpu} = percentile($percentile, \@cpu) || 0;
> +    $stats->{mem} = percentile($percentile, \@mem) || 0;
> +    $stats->{maxmem} = percentile($percentile, \@maxmem) || 0;
> +    $stats->{maxcpu} = percentile($percentile, \@maxcpu) || 0;
> +    $stats->{totalcpu} = $stats->{cpu} * $stats->{maxcpu} * 100;
> 
>> what is cpu/maxcpu exactly, the multiplication of both strikes me as a bit weird?
> 
> maxcpu is the number of core,so totalcpu is the number of cores * current percent usage * 100.
> (This is to be able to compare vm/host with differents number of cores)
> But indeed, maybe I'm wrong with my maths ;) I need to check a little bit the ordering and compare code.
> 
> 

Ok, I was more confused because maxcpu and cpu had different units (count vs. load-percentage).

> 
> +        #### FILTERING NODES WITH HARD CONSTRAINTS (vm can't be started)
> +       next if !check_hard_constraints($haenv, $vmid, $cd, $nodename, $node_stats, $vm_stats, $storecfg, $group_members_prio);
> 
>> I mean, most of those should be covered by the grouping already?
> 
> do you mean: manually grouping by user ? if yes, the point is here to get it done auto (for storage avalability, number de host core,maybe vmbr availability,....), because this is a pain to setup.

Hmm, OK, groups can still make sense if the admin wants to parition the cluster,
e.g., if its not homogenous (e.g., hyper-converged with storage and compute nodes)
or the like.

> For example, currently, if you forgot it, a vm with a local storage could be migrated to another node by HA (with error). Then user don't known how to migrate it back to original node.

I mean, it's definitively a pain point that users new to our HA stack run into,
so IMO good to improve.

Maybe the suggested `check_service_constraints` should then rather be a
`get_runnable_nodes($sid)` method in Env, not to sure yet though, they're both
roughly the same.

> 
> But it's also checking dynamic values like available cpu/ram on host.
> 
> 

hmm, sounds like a third patch we could split out

1. commit: add basic constrained checking with storage and such automatically
2. commit: add resource/node usage tracking code-infrastructure
3. commit integrate that tracking into the constraints checker for dynamic
   target decisions
4. tests and such

As said, the split is not a must for me, but I think it could be a bit easier
to digest that way on review.

thx!