* [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values
From: Fiona Ebner @ 2025-09-22 10:15 UTC
To: pve-devel

If disk read/write cannot be queried because of QMP timeout, they
should not be reported as 0, but the last value should be re-used.
Otherwise, the difference between that reported 0 and the next value,
when the stats are queried successfully, will show up as a huge spike
in the RRD graphs.

Invalidate the cache when there is no PID or the PID changed.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/QemuServer.pm | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index f7f85436..5940ec38 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -2670,6 +2670,12 @@ our $vmstatus_return_properties = {
 
 my $last_proc_pid_stat;
 
+# See bug #6207. If disk write/read cannot be queried because of QMP timeout, they should not be
+# reported as 0, but the last value should be re-used. Otherwise, the difference between that
+# reported 0 and the next value, when the stats are queried successfully, will show up as a huge
+# spike.
+my $last_disk_stats = {};
+
 # get VM status information
 # This must be fast and should not block ($full == false)
 # We only query KVM using QMP if $full == true (this can be slow)
@@ -2741,6 +2747,14 @@ sub vmstatus {
         $d->{tags} = $conf->{tags} if defined($conf->{tags});
 
         $res->{$vmid} = $d;
+
+        if (
+            $last_disk_stats->{$vmid}
+            && (!$d->{pid} || $d->{pid} != $last_disk_stats->{$vmid}->{pid})
+        ) {
+            delete($last_disk_stats->{$vmid});
+        }
+
     }
 
     my $netdev = PVE::ProcFSTools::read_proc_net_dev();
@@ -2815,6 +2829,8 @@ sub vmstatus {
 
     return $res if !$full;
 
+    my $disk_stats_present = {};
+
     my $qmpclient = PVE::QMPClient->new();
 
     my $ballooncb = sub {
@@ -2862,8 +2878,10 @@
             $res->{$vmid}->{blockstat}->{$drive_id} = $blockstat->{stats};
         }
 
-        $res->{$vmid}->{diskread} = $totalrdbytes;
-        $res->{$vmid}->{diskwrite} = $totalwrbytes;
+        $res->{$vmid}->{diskread} = $last_disk_stats->{$vmid}->{diskread} = $totalrdbytes;
+        $res->{$vmid}->{diskwrite} = $last_disk_stats->{$vmid}->{diskwrite} = $totalwrbytes;
+        $last_disk_stats->{$vmid}->{pid} = $res->{$vmid}->{pid};
+        $disk_stats_present->{$vmid} = 1;
     };
 
     my $machinecb = sub {
@@ -2925,7 +2943,13 @@
 
     foreach my $vmid (keys %$list) {
         next if $opt_vmid && ($vmid ne $opt_vmid);
+
         $res->{$vmid}->{qmpstatus} = $res->{$vmid}->{status} if !$res->{$vmid}->{qmpstatus};
+
+        if (!$disk_stats_present->{$vmid} && $last_disk_stats->{$vmid}) {
+            $res->{$vmid}->{diskread} = $last_disk_stats->{$vmid}->{diskread};
+            $res->{$vmid}->{diskwrite} = $last_disk_stats->{$vmid}->{diskwrite};
+        }
     }
 
     return $res;
-- 
2.47.3

* Re: [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values
From: Thomas Lamprecht @ 2025-09-22 17:26 UTC
To: Proxmox VE development discussion, Fiona Ebner

On 22.09.25 at 12:18, Fiona Ebner wrote:
> If disk read/write cannot be queried because of QMP timeout, they
> should not be reported as 0, but the last value should be re-used.
> Otherwise, the difference between that reported 0 and the next value,
> when the stats are queried successfully, will show up as a huge spike
> in the RRD graphs.

Fine with the idea in general, but this is effectively relevant for
pvestatd only, though?

As of now we would also cache in the API daemon, without ever using
this. Might not be _that_ much, so not really a problem of the amount,
but it feels a bit wrong to me w.r.t. "code place".

Does pvestatd have the necessary info, directly or indirectly through
the existence of some other vmstatus properties, to derive when it can
safely reuse the previous value?

Or maybe we could make this caching opt-in through some module flag
that only pvestatd sets? But I have not really thought that through,
so please take this with a grain of salt.

btw. what about QMP being "stuck" for a prolonged time, should we
stop using the previous value after a few times (or duration)?

* Re: [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values
From: Fiona Ebner @ 2025-09-25 8:27 UTC
To: Thomas Lamprecht, Proxmox VE development discussion

On 22.09.25 at 7:26 PM, Thomas Lamprecht wrote:
> On 22.09.25 at 12:18, Fiona Ebner wrote:
>> If disk read/write cannot be queried because of QMP timeout, they
>> should not be reported as 0, but the last value should be re-used.
>> Otherwise, the difference between that reported 0 and the next value,
>> when the stats are queried successfully, will show up as a huge spike
>> in the RRD graphs.
>
> Fine with the idea in general, but this is effectively relevant for
> pvestatd only, though?
>
> As of now we would also cache in the API daemon, without ever using
> this. Might not be _that_ much, so not really a problem of the amount,
> but it feels a bit wrong to me w.r.t. "code place".
>
> Does pvestatd have the necessary info, directly or indirectly through
> the existence of some other vmstatus properties, to derive when it can
> safely reuse the previous value?

It's safe (and sensible/required) if and only if there is no new value.
We could have the cache be only inside pvestatd, initialize the cache
with a value of 0 and properly report diskread/write values as undef if
we cannot get an actual value, and have that mean "re-use previous
value". (Aside: we cannot use 0 instead of undef to mean "re-use
previous value", because there are edge cases where a later 0 actually
means 0 again, for example, all disks unplugged.)

> Or maybe we could make this caching opt-in through some module flag
> that only pvestatd sets? But I have not really thought that through,
> so please take this with a grain of salt.
>
> btw. what about QMP being "stuck" for a prolonged time, should we
> stop using the previous value after a few times (or duration)?

What other value could we use? Since the graph looks at the differences
of reported values, the only reasonable value we can use if we cannot
get a new one is the previous one. No matter how long it takes to get a
new one, otherwise there will be that completely wrong spike again. Or
is there an N/A kind of value that we could use, where RRD/the graph
would be smart enough to know "I cannot calculate a difference now,
will have to wait for multiple good values"? Then I'd go for that
instead of the current approach.

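A minimal sketch of how that pvestatd-side variant could look, assuming
vmstatus() were changed to leave diskread/diskwrite undef on QMP timeout;
the helper name, the cache hash, and the call site are made up for
illustration and are not existing pvestatd code:

# Illustrative sketch only, not actual pvestatd code. Assumes vmstatus()
# reports diskread/diskwrite as undef when the QMP query timed out.
my $last_disk_stats = {}; # per-VM cache, lives only inside the status daemon

# hypothetical helper, called from update_qemu_status() for each VM's data $d
sub fill_disk_stats {
    my ($vmid, $d) = @_;

    for my $key (qw(diskread diskwrite)) {
        if (defined($d->{$key})) {
            # fresh value from QMP, remember it for the next round
            $last_disk_stats->{$vmid}->{$key} = $d->{$key};
        } else {
            # no new value (QMP timeout) -> re-use the cached one, default 0
            $d->{$key} = $last_disk_stats->{$vmid}->{$key} // 0;
        }
    }
}
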
* Re: [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values
From: Thomas Lamprecht @ 2025-09-25 8:52 UTC
To: Fiona Ebner, Proxmox VE development discussion

On 25.09.25 at 10:27, Fiona Ebner wrote:
> On 22.09.25 at 7:26 PM, Thomas Lamprecht wrote:
>> On 22.09.25 at 12:18, Fiona Ebner wrote:
>>> If disk read/write cannot be queried because of QMP timeout, they
>>> should not be reported as 0, but the last value should be re-used.
>>> Otherwise, the difference between that reported 0 and the next value,
>>> when the stats are queried successfully, will show up as a huge spike
>>> in the RRD graphs.
>>
>> Fine with the idea in general, but this is effectively relevant for
>> pvestatd only, though?
>>
>> As of now we would also cache in the API daemon, without ever using
>> this. Might not be _that_ much, so not really a problem of the amount,
>> but it feels a bit wrong to me w.r.t. "code place".
>>
>> Does pvestatd have the necessary info, directly or indirectly through
>> the existence of some other vmstatus properties, to derive when it can
>> safely reuse the previous value?
>
> It's safe (and sensible/required) if and only if there is no new value.
> We could have the cache be only inside pvestatd, initialize the cache
> with a value of 0 and properly report diskread/write values as undef if
> we cannot get an actual value, and have that mean "re-use previous
> value". (Aside: we cannot use 0 instead of undef to mean "re-use
> previous value", because there are edge cases where a later 0 actually
> means 0 again, for example, all disks unplugged.)

Yeah, it would have to be an invalid value like -1, but even that is
naturally not ideal; an explicit undefined or null value would be
better to signal what's happening.

>> Or maybe we could make this caching opt-in through some module flag
>> that only pvestatd sets? But I have not really thought that through,
>> so please take this with a grain of salt.
>>
>> btw. what about QMP being "stuck" for a prolonged time, should we
>> stop using the previous value after a few times (or duration)?
>
> What other value could we use? Since the graph looks at the differences
> of reported values, the only reasonable value we can use if we cannot
> get a new one is the previous one. No matter how long it takes to get a
> new one, otherwise there will be that completely wrong spike again. Or
> is there an N/A kind of value that we could use, where RRD/the graph
> would be smart enough to know "I cannot calculate a difference now,
> will have to wait for multiple good values"? Then I'd go for that
> instead of the current approach.

That should never be the problem of the metric-collecting entity, but
of the one interpreting or displaying the data, as otherwise this
creates a false impression of reality.

So the more I think about this, the more I'm sure that we won't do
anybody a favor in the mid/long term here with "faking it" in the
backend.

I'd need to look into RRD, but even if there wasn't a way there to
submit null-ish values, I'd rather see that as a further argument for
switching out RRD with the Rust-based proxmox-rrd crate, where we have
control over these things, compared to recording measurements that did
not happen.

That does not mean that doing this correctly in proxmox-rrd will be
trivial to do once we migrated (which is non-trivial on its own),
though.
There are also some ideas about switching to a rather different way to
encode metrics, using a more flexible format and stuff like delta
encoding, i.e. closer to how modern time-series DBs like InfluxDB do
it; Lukas signaled some interest in this work here. But that is
vaporware as of now, so no need to wait on that to happen now, I just
wanted to mention it so those ideas are not isolated too much.

But taking a step back, why is QMP even timing out here? Is this not
just reading some in-memory counters that QEMU has ready to go?

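For what it's worth, classic rrdtool does have a notion of unknown
samples: an update can pass the literal "U" for a data source, which is
stored as unknown (NaN) rather than as a real measurement, so graphs get
a gap instead of a bogus spike. A rough sketch via the RRDs Perl
binding; the file path and the two-data-source layout are illustrative
assumptions, not the actual PVE RRD schema or update path:

# Sketch only: feed an "unknown" sample instead of a fake 0 via the RRDs
# Perl binding. File path and data-source layout are made-up assumptions.
use RRDs;

# hypothetical RRD file with two data sources, diskread and diskwrite
my $rrdfile = '/tmp/example-vm-disk.rrd';

# "N" means now; "U" marks both data sources as unknown for this step
RRDs::update($rrdfile, 'N:U:U');

if (my $err = RRDs::error()) {
    warn "RRD update failed: $err\n";
}
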
* Re: [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values
From: Fiona Ebner @ 2025-09-25 9:28 UTC
To: Thomas Lamprecht, Proxmox VE development discussion

On 25.09.25 at 10:52 AM, Thomas Lamprecht wrote:
> On 25.09.25 at 10:27, Fiona Ebner wrote:
>> On 22.09.25 at 7:26 PM, Thomas Lamprecht wrote:
>>> On 22.09.25 at 12:18, Fiona Ebner wrote:
>>> Or maybe we could make this caching opt-in through some module flag
>>> that only pvestatd sets? But I have not really thought that through,
>>> so please take this with a grain of salt.
>>>
>>> btw. what about QMP being "stuck" for a prolonged time, should we
>>> stop using the previous value after a few times (or duration)?
>>
>> What other value could we use? Since the graph looks at the differences
>> of reported values, the only reasonable value we can use if we cannot
>> get a new one is the previous one. No matter how long it takes to get a
>> new one, otherwise there will be that completely wrong spike again. Or
>> is there an N/A kind of value that we could use, where RRD/the graph
>> would be smart enough to know "I cannot calculate a difference now,
>> will have to wait for multiple good values"? Then I'd go for that
>> instead of the current approach.
>
> That should never be the problem of the metric-collecting entity, but
> of the one interpreting or displaying the data, as otherwise this
> creates a false impression of reality.
>
> So the more I think about this, the more I'm sure that we won't do
> anybody a favor in the mid/long term here with "faking it" in the
> backend.

Very good point! I'll look into what happens when reporting an undef
value, because right now the interpreting entity cannot distinguish
between "0 because of no data" and "0, yes, I really mean this is the
actual value".

> I'd need to look into RRD, but even if there wasn't a way there to
> submit null-ish values, I'd rather see that as a further argument for
> switching out RRD with the Rust-based proxmox-rrd crate, where we have
> control over these things, compared to recording measurements that did
> not happen.
>
> That does not mean that doing this correctly in proxmox-rrd will be
> trivial to do once we migrated (which is non-trivial on its own),
> though. There are also some ideas about switching to a rather different
> way to encode metrics, using a more flexible format and stuff like
> delta encoding, i.e. closer to how modern time-series DBs like InfluxDB
> do it; Lukas signaled some interest in this work here. But that is
> vaporware as of now, so no need to wait on that to happen now, I just
> wanted to mention it so those ideas are not isolated too much.
>
> But taking a step back, why is QMP even timing out here? Is this not
> just reading some in-memory counters that QEMU has ready to go?

There can be another QMP operation going on blocking the request (e.g.
backup), or the QEMU main thread might be busy, or the system in
general might be under too much load to handle all of the QMP commands
to all the VMs in time. The report of this issue in enterprise support
has VMs that are not being backed up showing the spike during the
backup of other VMs. But it seems like there is potential for
improvement in how we do things.

We collect:

> my $statuscb = sub {
>     my ($vmid, $resp) = @_;
>
>     $qmpclient->queue_cmd($vmid, $blockstatscb, 'query-blockstats');
>     $qmpclient->queue_cmd($vmid, $machinecb, 'query-machines');
>     $qmpclient->queue_cmd($vmid, $versioncb, 'query-version');
>     # this fails if ballon driver is not loaded, so this must be
>     # the last command (following command are aborted if this fails).
>     $qmpclient->queue_cmd($vmid, $ballooncb, 'query-balloon');
>
>     my $status = 'unknown';
>     if (!defined($status = $resp->{'return'}->{status})) {
>         warn "unable to get VM status\n";
>         return;
>     }
>
>     $res->{$vmid}->{qmpstatus} = $resp->{'return'}->{status};
> };
>
> foreach my $vmid (keys %$list) {
>     next if $opt_vmid && ($vmid ne $opt_vmid);
>     next if !$res->{$vmid}->{pid}; # not running
>     $qmpclient->queue_cmd($vmid, $statuscb, 'query-status');
> }

Okay, good! We collect all commands so we can issue them in parallel.

> $qmpclient->queue_execute(undef, 2);

Here we only have the default timeout of 3 seconds (i.e. the undef
argument), maybe we should bump that to something like 5 seconds? Right
now, without having pvestatd parallelize update_qemu_status() with the
other update_xyz() operations, that might already be quite costly :/
But considering it's for all VMs, it might be fair?

> foreach my $vmid (keys %$list) {
>     next if $opt_vmid && ($vmid ne $opt_vmid);
>     next if !$res->{$vmid}->{pid}; #not running
>
>     # we can't use the $qmpclient since it might have already aborted on
>     # 'query-balloon', but this might also fail for older versions...
>     my $qemu_support = eval { mon_cmd($vmid, "query-proxmox-support") };
>     $res->{$vmid}->{'proxmox-support'} = $qemu_support // {};
> }

This, OTOH, seems just bad, querying the info one-by-one, each with its
own timeout. I'll look into whether this can be reworked to be part of
the queue (before 'query-balloon'). And/or we should be able to even
disable this for the status daemon; I think it doesn't use that info at
all.

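A rough illustration of the rework direction mentioned above, folding
query-proxmox-support into the existing queue; the callback name is
invented and this is an untested sketch, not the actual change. Note
that the existing comment hints the command can fail on older QEMU
versions, which would then abort later queued commands:

# Rough sketch, not a tested change: queue query-proxmox-support along
# with the other per-VM commands instead of issuing it one-by-one later.
# The callback name is made up; ordering matters, because commands queued
# after a failing one are aborted.
my $proxmoxsupportcb = sub {
    my ($vmid, $resp) = @_;
    $res->{$vmid}->{'proxmox-support'} = $resp->{'return'} // {};
};

my $statuscb = sub {
    my ($vmid, $resp) = @_;

    $qmpclient->queue_cmd($vmid, $blockstatscb, 'query-blockstats');
    $qmpclient->queue_cmd($vmid, $machinecb, 'query-machines');
    $qmpclient->queue_cmd($vmid, $versioncb, 'query-version');
    $qmpclient->queue_cmd($vmid, $proxmoxsupportcb, 'query-proxmox-support');
    # query-balloon stays last: it fails if the balloon driver is not
    # loaded, and commands queued after it would be aborted.
    $qmpclient->queue_cmd($vmid, $ballooncb, 'query-balloon');

    $res->{$vmid}->{qmpstatus} = $resp->{'return'}->{status} // 'unknown';
};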