From: Daniel Kral <d.kral@proxmox.com>
To: f.gruenbichler@proxmox.com
Date: Thu, 19 Dec 2024 16:59:43 +0100
Message-Id: <20241219155943.182322-1-d.kral@proxmox.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20240416122054.733817-1-f.gruenbichler@proxmox.com>
References: <20240416122054.733817-1-f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v2 qemu-server/pve-container 0/19] pool
 resource limits
Cc: pve-devel@lists.proxmox.com

On 16/04/2024 14:20, Fabian Grünbichler wrote:
> high level description:
> 
> VM/CT vmstatus returns new fields for configured and running "usage"
> values, these are then broadcasted by pvestatd on each node via KV.
> 
> helpers in guest-common to check those limits
> pool API returns limits and usage, and allows setting the limits
> 
> qemu-server/pve-container try to check actions against those limits.
> 
> since the broadcast is async, there is always an opportunity to cheat by
> racing against the broadcast. this is unavoidable unless we want to
> serialize all usage affecting tasks across the cluster..
> 
> changelog since v1/RFC:
> - GUI can edit limits now
> - incorporated most feedback from Wolfgang and Dominik
> 
> potential follow-ups:
> - disk limits/usage
> - RRD/metrics support (or even switching entirely to RRD based
>   broadcasting instead of KV)
> - timeout of broadcasted data if node goes offline/stops broadcasting
> - better GUI
> - ... ?

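To summarize my understanding of the check logic before the notes below:
the helpers boil down to comparing a pool's configured limits against the
usage summed from the per-node KV broadcasts, plus whatever the pending
action would add. A rough sketch (names and data layout are made up for
illustration, not the series' actual helpers):

```perl
# Rough sketch only; hypothetical names, not the series' actual API.
# $limits: configured pool limits, e.g. { mem => ..., cpu => ... }
# $usage:  current usage summed from the per-node KV broadcasts
# $change: the additional usage the pending action would add
sub check_pool_limits_sketch {
    my ($limits, $usage, $change) = @_;

    for my $resource (sort keys %$limits) {
        my $limit = $limits->{$resource};
        next if !defined($limit); # no limit configured for this resource

        my $new = ($usage->{$resource} // 0) + ($change->{$resource} // 0);
        die "pool limit for '$resource' would be exceeded ($new > $limit)\n"
            if $new > $limit;
    }
}
```
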
I've tried my hand at this patch series. I've added a few notes on some
of the patches and inlined them directly. I hope the notes are helpful,
as the feature seems very promising, especially when an administrator
wants to allocate specific resources to different users' guests.

---

I've tested this patch series by creating a few resource pools,
adding/removing some VMs and containers, and trying out different ways
to start them (start on boot, file-restore, normal start) to see
whether they are correctly limited.

It worked fine for containers as far as I have tested, but I had some
trouble with VMs: their memory does not seem to be calculated correctly,
which I've mentioned inline. I've also tried the same with different
cores / sockets values, and that worked as expected: when a start would
have exceeded the limit for the CPU count, it failed as expected.

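For context, the configured CPU count that runs up against the limit is,
as far as I can tell, simply the product of the config's sockets and
cores values. A minimal sketch with example values:

```perl
# Sketch: how a VM's configured vCPU count is derived from its config
# ('sockets' and 'cores' are the qemu-server config keys; values are examples).
my $conf = { sockets => 2, cores => 4 };
my $cpus = ($conf->{sockets} || 1) * ($conf->{cores} || 1); # 8 vCPUs
```
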
Another thing I took a closer look at was hotplugging memory and CPU
cores, which also worked as expected, even though I wasn't sure how to
hit all of the new code paths, as I pointed out inline.

All in all, with the two bug fixes I mentioned inline, this patch series
looks good to me so far! I've added some notes about applying it below.

---

Since this patch series is a few months old, here's a quick overview of
the base commits I applied it on. Everything applied fine except for a
single conflict that was very easy to fix:


=== applying patches ===

01-01: pve-access-control
-------------------------

01 applied cleanly on 4465786 ("api, auth: fix two typos in user-visible description")


02-08: pve-container
--------------------

02 applied cleanly on a39c5b0 ("bump version to 5.2.2")

03 applied with fuzz:

Patch failed at 0002 status: add pool usage fields
patching file src/PVE/LXC.pm
Hunk #1 succeeded at 146 with fuzz 2 (offset 8 lines).
Hunk #2 succeeded at 208 (offset 36 lines).
Hunk #3 succeeded at 264 (offset 42 lines).
Hunk #4 succeeded at 272 (offset 42 lines).
Hunk #5 succeeded at 297 (offset 42 lines).

04-07 applied cleanly.

08 applied with fuzz:

Patch failed at 0007 update: handle pool limits
patching file src/PVE/API2/LXC/Config.pm
Hunk #1 succeeded at 208 with fuzz 2.


09-09: pve-guest-common
-----------------------

09 applied with fuzz:

Patch failed at 0001 helpers: add pool limit/usage helpers
patching file src/PVE/GuestHelpers.pm
Hunk #1 succeeded at 450 with fuzz 1 (offset 34 lines).


10-13: pve-manager
------------------

10 applied cleanly on ead665d5 ("ui: ceph pool: add columns for application")

11 applied partially with fuzz:

Patch failed at 0002 pvestatd: collect and broadcast pool usage
patching file PVE/Service/pvestatd.pm
Hunk #1 FAILED at 231.
Hunk #2 succeeded at 245 (offset 3 lines).
Hunk #3 succeeded at 283 with fuzz 1 (offset 5 lines).
Hunk #4 FAILED at 466.
Hunk #5 succeeded at 480 (offset 5 lines).
Hunk #6 succeeded at 592 with fuzz 1 (offset 11 lines).
Hunk #7 FAILED at 593.

Merged rejected changes of 11 to:

```diff
diff --git a/PVE/Service/pvestatd.pm b/PVE/Service/pvestatd.pm
index c1d5bda9..fac2c37a 100755
--- a/PVE/Service/pvestatd.pm
+++ b/PVE/Service/pvestatd.pm
@@ -234,7 +234,7 @@ sub auto_balloning {
 }
 
 sub update_qemu_status {
-    my ($status_cfg, $pull_txn) = @_;
+    my ($status_cfg, $pull_txn, $pool_membership, $pool_usage) = @_;
 
     my $ctime = time();
     my $vmstatus = PVE::QemuServer::vmstatus(undef, 1);
@@ -471,7 +471,7 @@ sub rebalance_lxc_containers {
 }
 
 sub update_lxc_status {
-    my ($status_cfg, $pull_txn) = @_;
+    my ($status_cfg, $pull_txn, $pool_membership, $pool_usage) = @_;
 
     my $ctime = time();
     my $vmstatus = PVE::LXC::vmstatus();
@@ -604,17 +604,23 @@ sub update_status {
     syslog('err', "node status update error: $err") if $err;
 
     eval {
-       update_qemu_status($status_cfg, $pull_txn);
+       update_qemu_status($status_cfg, $pull_txn, $pool_membership, $pool_usage);
     };
     $err = $@;
     syslog('err', "qemu status update error: $err") if $err;
 
     eval {
-       update_lxc_status($status_cfg, $pull_txn);
+       update_lxc_status($status_cfg, $pull_txn, $pool_membership, $pool_usage);
     };
     $err = $@;
     syslog('err', "lxc status update error: $err") if $err;
 
+    eval {
+       update_pool_usage($pool_usage);
+    };
+    $err = $@;
+    syslog('err', "pool usage status update error: $err") if $err;
+
     eval {
        rebalance_lxc_containers();
     };
```
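
For completeness: my understanding is that the update_pool_usage call
added above ends up broadcasting the summed per-pool usage via the
cluster-wide KV store, roughly along these lines (a sketch under that
assumption; the real sub ships with this patch):

```perl
use JSON qw(encode_json);
use PVE::Cluster;

# Sketch only; my assumption of what the broadcast step looks like, not the
# patch's actual code. broadcast_node_kv publishes a value under a key in the
# cluster-wide KV store, which is what the cover letter's "broadcasted ...
# via KV" refers to. The key name 'pool-usage' is made up for illustration.
sub update_pool_usage_sketch {
    my ($pool_usage) = @_;

    my $data = eval { encode_json($pool_usage) };
    if (my $err = $@) {
        warn "encoding pool usage failed: $err";
        return;
    }
    PVE::Cluster::broadcast_node_kv('pool-usage', $data);
}
```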

12-13 applied cleanly.


14-19: qemu-server
------------------

14 applied cleanly on 45fedd47 ("fix #5980: importdisk: fix spurious warning")

15 applied with warning:

Applying: vmstatus: add usage values for pool limits
.git/rebase-apply/patch:58: trailing whitespace.

warning: 1 line adds whitespace errors.

16 applied cleanly.

17 applied with warning:

Applying: update/hotplug: handle pool limits
.git/rebase-apply/patch:19: space before tab in indent.
                my $old = $usage->{cpu};
warning: 1 line adds whitespace errors.

18-19 applied cleanly.


