public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values
@ 2025-09-22 10:15 Fiona Ebner
From: Fiona Ebner @ 2025-09-22 10:15 UTC (permalink / raw)
  To: pve-devel

If disk read/write cannot be queried because of QMP timeout, they
should not be reported as 0, but the last value should be re-used.
Otherwise, the difference between that reported 0 and the next value,
when the stats are queried successfully, will show up as a huge spike
in the RRD graphs.

Invalidate the cache when there is no PID or the PID changed.
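
For illustration only (the counter values below are made up): the RRD
graphs are based on the difference between consecutive samples of the
cumulative counters, so the jump from a reported 0 back to the real
counter value shows up as one huge delta.

    use strict;
    use warnings;

    # hypothetical diskread counter samples, one per status query; the
    # third query runs into a QMP timeout and gets reported as 0
    my @samples = (1_000_000, 1_100_000, 0, 1_300_000);

    for my $i (1 .. $#samples) {
        # without the fix, the last delta is the full counter value
        # (1.3 MB) instead of the actual ~200 KB read since the last
        # successful query
        printf("interval %d: delta = %d bytes\n", $i, $samples[$i] - $samples[$i - 1]);
    }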

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/QemuServer.pm | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index f7f85436..5940ec38 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -2670,6 +2670,12 @@ our $vmstatus_return_properties = {
 
 my $last_proc_pid_stat;
 
+# See bug #6207. If disk write/read cannot be queried because of QMP timeout, they should not be
+# reported as 0, but the last value should be re-used. Otherwise, the difference between that
+# reported 0 and the next value, when the stats are queried successfully, will show up as a huge
+# spike.
+my $last_disk_stats = {};
+
 # get VM status information
 # This must be fast and should not block ($full == false)
 # We only query KVM using QMP if $full == true (this can be slow)
@@ -2741,6 +2747,14 @@ sub vmstatus {
         $d->{tags} = $conf->{tags} if defined($conf->{tags});
 
         $res->{$vmid} = $d;
+
+        if (
+            $last_disk_stats->{$vmid}
+            && (!$d->{pid} || $d->{pid} != $last_disk_stats->{$vmid}->{pid})
+        ) {
+            delete($last_disk_stats->{$vmid});
+        }
+
     }
 
     my $netdev = PVE::ProcFSTools::read_proc_net_dev();
@@ -2815,6 +2829,8 @@ sub vmstatus {
 
     return $res if !$full;
 
+    my $disk_stats_present = {};
+
     my $qmpclient = PVE::QMPClient->new();
 
     my $ballooncb = sub {
@@ -2862,8 +2878,10 @@ sub vmstatus {
 
             $res->{$vmid}->{blockstat}->{$drive_id} = $blockstat->{stats};
         }
-        $res->{$vmid}->{diskread} = $totalrdbytes;
-        $res->{$vmid}->{diskwrite} = $totalwrbytes;
+        $res->{$vmid}->{diskread} = $last_disk_stats->{$vmid}->{diskread} = $totalrdbytes;
+        $res->{$vmid}->{diskwrite} = $last_disk_stats->{$vmid}->{diskwrite} = $totalwrbytes;
+        $last_disk_stats->{$vmid}->{pid} = $res->{$vmid}->{pid};
+        $disk_stats_present->{$vmid} = 1;
     };
 
     my $machinecb = sub {
@@ -2925,7 +2943,13 @@ sub vmstatus {
 
     foreach my $vmid (keys %$list) {
         next if $opt_vmid && ($vmid ne $opt_vmid);
+
         $res->{$vmid}->{qmpstatus} = $res->{$vmid}->{status} if !$res->{$vmid}->{qmpstatus};
+
+        if (!$disk_stats_present->{$vmid} && $last_disk_stats->{$vmid}) {
+            $res->{$vmid}->{diskread} = $last_disk_stats->{$vmid}->{diskread};
+            $res->{$vmid}->{diskwrite} = $last_disk_stats->{$vmid}->{diskwrite};
+        }
     }
 
     return $res;
-- 
2.47.3




* Re: [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values
  2025-09-22 10:15 [pve-devel] [PATCH qemu-server] fix #6207: vm status: cache last disk read/write values Fiona Ebner
@ 2025-09-22 17:26 ` Thomas Lamprecht
From: Thomas Lamprecht @ 2025-09-22 17:26 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

On 22.09.25 at 12:18, Fiona Ebner wrote:
> If disk read/write cannot be queried because of QMP timeout, they
> should not be reported as 0, but the last value should be re-used.
> Otherwise, the difference between that reported 0 and the next value,
> when the stats are queried successfully, will show up as a huge spike
> in the RRD graphs.

Fine with the idea in general, but this is effectively only relevant
for pvestatd, right?

As of now we would also cache in the API daemon, without ever using
it. Might not be _that_ much, so not really a problem in terms of
amount, but it feels a bit wrong to me w.r.t. "code place".

Does pvestatd have the necessary info, directly or indirectly through
the existence of some other vmstatus properties, to derive when it can
safely reuse the previous value?

Or maybe we could make this caching opt-in through some module flag
that only pvestatd sets? But I have not really thought that through,
so please take this with a grain of salt.

btw. what about QMP being "stuck" for a prolonged time? Should we
stop reusing the previous value after a few attempts (or after some
duration)?
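
E.g. (again only a sketch, the threshold is picked arbitrarily) the
cache entry could remember when the last successful query happened and
be dropped once it gets too old:

    my $max_disk_stats_age = 60;    # seconds, made-up value

    # on a successful blockstats query:
    #     $last_disk_stats->{$vmid}->{time} = time();

    # before falling back to the cached values:
    #     delete($last_disk_stats->{$vmid})
    #         if time() - $last_disk_stats->{$vmid}->{time} > $max_disk_stats_age;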


