On 16/04/2024 14:20, Fabian Grünbichler wrote:
> high level description:
>
> VM/CT vmstatus returns new fields for configured and running "usage"
> values, these are then broadcasted by pvestatd on each node via KV.
>
> helpers in guest-common to check those limits
> pool API returns limits and usage, and allows setting the limits
>
> qemu-server/pve-container try to check actions against those limits.
>
> since the broadcast is async, there is always an opportunity to cheat by
> racing against the broadcast. this is unavoidable unless we want to
> serialize all usage affecting tasks across the cluster..
>
> changelog since v1/RFC:
> - GUI can edit limits now
> - incorporated most feedback from Wolfgang and Dominik
>
> potential follow-ups:
> - disk limits/usage
> - RRD/metrics support (or even switching entirely to RRD based
>   broadcasting instead of KV)
> - timeout of broadcasted data if node goes offline/stops broadcasting
> - better GUI
> - ... ?

I've tried my hand at this patch series. I've added a few notes around
some patches and inlined them directly. I hope the notes are helpful, as
the feature seems very promising, especially for administrators who want
to allocate specific resources to different users' guests.

---

I've tested this patch series by creating a few resource pools,
adding/removing some VMs and containers, and trying out different ways
to start them (start on boot, file-restore, normal start) to see whether
they are limited correctly.

It worked fine for containers as far as I have tested, but I ran into
some trouble with VMs, because their memory usage does not seem to be
calculated correctly; I've mentioned that inline.

I've also tried the same thing with different cores/sockets values, and
that worked as expected: whenever a start would have exceeded the pool's
CPU count limit, the start failed.

Another thing I took a closer look at was hotplugging memory and CPU
cores, which also worked as expected, even though I wasn't sure how to
hit all the new code paths correctly, as I pointed out inline.

All in all, with the two bug fixes I mentioned inline, this patch series
looks good to me so far! I've added some notes about applying below.

---

Since this patch series is a few months old, here's a quick overview of
the base commits I applied it on. Everything went fine except for a
single conflict that was very easy to fix:

=== applying patches ===

01-01: pve-access-control
-------------------------

01 applied cleanly on 4465786 ("api, auth: fix two typos in user-visible description")

02-08: pve-container
--------------------

02 applied cleanly on a39c5b0 ("bump version to 5.2.2")

03 applied with fuzz:

  Patch failed at 0002 status: add pool usage fields
  patching file src/PVE/LXC.pm
  Hunk #1 succeeded at 146 with fuzz 2 (offset 8 lines).
  Hunk #2 succeeded at 208 (offset 36 lines).
  Hunk #3 succeeded at 264 (offset 42 lines).
  Hunk #4 succeeded at 272 (offset 42 lines).
  Hunk #5 succeeded at 297 (offset 42 lines).

04-07 applied cleanly.

08 applied with fuzz:

  Patch failed at 0007 update: handle pool limits
  patching file src/PVE/API2/LXC/Config.pm
  Hunk #1 succeeded at 208 with fuzz 2.

09-09: pve-guest-common
-----------------------

09 applied with fuzz:

  Patch failed at 0001 helpers: add pool limit/usage helpers
  patching file src/PVE/GuestHelpers.pm
  Hunk #1 succeeded at 450 with fuzz 1 (offset 34 lines).
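Side note, since patch 09 is where the limit/usage helpers live: for my
tests I modelled the check roughly like the sketch below. All names and
hash layouts in it are made up for illustration (it is NOT the actual
code from src/PVE/GuestHelpers.pm), but it matches the behavior I
observed, i.e. an action fails if it would push the pool's usage over a
configured limit:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Illustrative sketch only -- names and hash layouts are made up, this
# is not the actual helper from patch 09 (src/PVE/GuestHelpers.pm).
#
# $limits - configured limits from the pool config
# $usage  - current pool usage, as broadcast by pvestatd via KV
# $change - additional usage a pending action (start, hotplug, ...) would add
sub check_pool_limits {
    my ($limits, $usage, $change) = @_;

    for my $resource (sort keys %$change) {
        my $limit = $limits->{$resource} // next; # no limit configured
        my $new = ($usage->{$resource} // 0) + $change->{$resource};
        die "pool limit reached: $resource ($new over limit $limit)\n"
            if $new > $limit;
    }
}

# e.g. starting a guest with 4 vCPUs while 6 of the pool's 8 are in use:
my $limits = { 'run-cpu' => 8, 'run-mem' => 8 * 1024**3 };
my $usage  = { 'run-cpu' => 6, 'run-mem' => 4 * 1024**3 };
my $change = { 'run-cpu' => 4, 'run-mem' => 2 * 1024**3 };

eval { check_pool_limits($limits, $usage, $change) };
print "start denied: $@" if $@;
```

This also illustrates why the check can only be best-effort, as the
cover letter notes: $usage here is the asynchronously broadcast
snapshot, so two racing starts on different nodes can both pass before
the next broadcast.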
10-13: pve-manager
------------------

10 applied cleanly on ead665d5 ("ui: ceph pool: add columns for application")

11 applied only partially, with fuzz and rejected hunks:

  Patch failed at 0002 pvestatd: collect and broadcast pool usage
  patching file PVE/Service/pvestatd.pm
  Hunk #1 FAILED at 231.
  Hunk #2 succeeded at 245 (offset 3 lines).
  Hunk #3 succeeded at 283 with fuzz 1 (offset 5 lines).
  Hunk #4 FAILED at 466.
  Hunk #5 succeeded at 480 (offset 5 lines).
  Hunk #6 succeeded at 592 with fuzz 1 (offset 11 lines).
  Hunk #7 FAILED at 593.

I merged the rejected changes of 11 manually, resulting in:

```gitdiff
diff --git a/PVE/Service/pvestatd.pm b/PVE/Service/pvestatd.pm
index c1d5bda9..fac2c37a 100755
--- a/PVE/Service/pvestatd.pm
+++ b/PVE/Service/pvestatd.pm
@@ -234,7 +234,7 @@ sub auto_balloning {
 }
 
 sub update_qemu_status {
-    my ($status_cfg, $pull_txn) = @_;
+    my ($status_cfg, $pull_txn, $pool_membership, $pool_usage) = @_;
 
     my $ctime = time();
     my $vmstatus = PVE::QemuServer::vmstatus(undef, 1);
@@ -471,7 +471,7 @@ sub rebalance_lxc_containers {
 }
 
 sub update_lxc_status {
-    my ($status_cfg, $pull_txn) = @_;
+    my ($status_cfg, $pull_txn, $pool_membership, $pool_usage) = @_;
 
     my $ctime = time();
     my $vmstatus = PVE::LXC::vmstatus();
@@ -604,17 +604,23 @@ sub update_status {
     syslog('err', "node status update error: $err") if $err;
 
     eval {
-        update_qemu_status($status_cfg, $pull_txn);
+        update_qemu_status($status_cfg, $pull_txn, $pool_membership, $pool_usage);
     };
     $err = $@;
     syslog('err', "qemu status update error: $err") if $err;
 
     eval {
-        update_lxc_status($status_cfg, $pull_txn);
+        update_lxc_status($status_cfg, $pull_txn, $pool_membership, $pool_usage);
     };
     $err = $@;
     syslog('err', "lxc status update error: $err") if $err;
 
+    eval {
+        update_pool_usage($pool_usage);
+    };
+    $err = $@;
+    syslog('err', "pool usage status update error: $err") if $err;
+
     eval {
         rebalance_lxc_containers();
     };
```

12-13 applied cleanly.

14-19: qemu-server
------------------

14 applied cleanly on 45fedd47 ("fix #5980: importdisk: fix spurious warning")

15 applied with a whitespace warning:

  Applying: vmstatus: add usage values for pool limits
  .git/rebase-apply/patch:58: trailing whitespace.
  warning: 1 line adds whitespace errors.

16 applied cleanly.

17 applied with a whitespace warning:

  Applying: update/hotplug: handle pool limits
  .git/rebase-apply/patch:19: space before tab in indent.
      my $old = $usage->{cpu};
  warning: 1 line adds whitespace errors.

18-19 applied cleanly.
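One more note on the hotplug path of patch 17 that I wasn't sure I
fully exercised: as far as I understand it, the interesting part is
that only the delta between the old and new vCPU count has to be
checked against the pool's running usage, not the full configured
value. A rough sketch of my understanding (made-up names again, not the
patch's actual code):

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Illustrative sketch only, not the actual code from patch 17: on CPU
# hotplug, only the increase over the currently running vCPU count has
# to fit under the pool's running CPU limit.
sub check_cpu_hotplug {
    my ($limits, $usage, $old_vcpus, $new_vcpus) = @_;

    my $delta = $new_vcpus - $old_vcpus;
    return if $delta <= 0; # unplugging never exceeds the limit

    my $limit = $limits->{cpu} // return; # no limit configured
    my $new = $usage->{cpu} + $delta;

    die "pool CPU limit reached: $new over limit $limit\n"
        if $new > $limit;
}

# e.g. hotplugging from 2 to 4 vCPUs while 7 of the pool's 8 are in use:
eval { check_cpu_hotplug({ cpu => 8 }, { cpu => 7 }, 2, 4) };
print "hotplug denied: $@" if $@;
```

The early return for unplugging is just my assumption; the actual patch
may handle shrinking differently, which is exactly the kind of code
path I wasn't sure how to trigger.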