From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server 07/11] agent: implement fsfreeze helper to better handle lost commands
Date: Mon, 5 May 2025 14:57:20 +0200 [thread overview]
Message-ID: <20250505125724.75620-8-f.ebner@proxmox.com> (raw)
In-Reply-To: <20250505125724.75620-1-f.ebner@proxmox.com>
As reported in the enterprise support, it can happen that a guest
agent command is read, but then the guest agent never sends an answer,
because the service in the guest is stopped/killed. For example, if a
guest reboot happens before the command can be successfully executed.
This is usually not problematic, but the fsfreeze-freeze command has a
timeout of 1 hour, so the guest agent socket would be blocked for that
amount of time, waiting on a command that is not being executed
anymore.
Use a lower timeout for the fsfreeze-freeze command, and issue an
fsfreeze-status command afterwards, which will return immediately if
the fsfreeze-freeze command already finished, and which will be queued
if not. This is used as a proxy to determine whether the
fsfreeze-freeze command is still running and to check whether it was
successful. Like this, the time the socket is blocked after a "lost
command" is at most 10 minutes.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
PVE/QMPClient.pm | 4 ++++
PVE/QemuConfig.pm | 5 ++--
PVE/QemuServer.pm | 3 ++-
PVE/QemuServer/Agent.pm | 51 ++++++++++++++++++++++++++++++++++++++++
PVE/VZDump/QemuServer.pm | 5 ++--
5 files changed, 63 insertions(+), 5 deletions(-)
diff --git a/PVE/QMPClient.pm b/PVE/QMPClient.pm
index 0f08e678..b3340885 100644
--- a/PVE/QMPClient.pm
+++ b/PVE/QMPClient.pm
@@ -110,6 +110,8 @@ sub cmd {
} elsif ($cmd->{execute} =~ m/^(eject|change)/) {
$timeout = 60; # note: cdrom mount command is slow
} elsif ($cmd->{execute} eq 'guest-fsfreeze-freeze') {
+ # consider using the guest_fsfreeze() helper in Agent.pm
+ #
# freeze syncs all guest FS, if we kill it it stays in an unfreezable
# locked state with high probability, so use an generous timeout
$timeout = 60*60; # 1 hour
@@ -146,6 +148,7 @@ sub cmd {
if (defined($queue_info->{error})) {
die "VM $vmid qmp command '$cmd->{execute}' failed - $queue_info->{error}" if !$noerr;
$result = { error => $queue_info->{error} };
+ $result->{'error-is-timeout'} = 1 if $queue_info->{'error-is-timeout'};
}
return $result;
@@ -467,6 +470,7 @@ sub mux_timeout {
if (my $queue_info = &$lookup_queue_info($self, $fh)) {
$queue_info->{error} = "got timeout\n";
+ $queue_info->{'error-is-timeout'} = 1;
$self->{mux}->inbuffer($fh, ''); # clear to avoid warnings
}
diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index 2609542c..e941f093 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -8,6 +8,7 @@ use Scalar::Util qw(blessed);
use PVE::AbstractConfig;
use PVE::INotify;
use PVE::JSONSchema;
+use PVE::QemuServer::Agent;
use PVE::QemuServer::CPUConfig;
use PVE::QemuServer::Drive;
use PVE::QemuServer::Helpers;
@@ -291,8 +292,8 @@ sub __snapshot_freeze {
eval { mon_cmd($vmid, "guest-fsfreeze-thaw"); };
warn "guest-fsfreeze-thaw problems - $@" if $@;
} else {
- eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); };
- warn "guest-fsfreeze-freeze problems - $@" if $@;
+ eval { PVE::QemuServer::Agent::guest_fsfreeze($vmid); };
+ warn $@ if $@;
}
}
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 577959a4..317c09f2 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -51,6 +51,7 @@ use PVE::Tools qw(run_command file_read_firstline file_get_contents dir_glob_for
use PVE::QMPClient;
use PVE::QemuConfig;
use PVE::QemuConfig::NoWrite;
+use PVE::QemuServer::Agent;
use PVE::QemuServer::Helpers qw(config_aware_timeout min_version kvm_user_version windows_version);
use PVE::QemuServer::Cloudinit;
use PVE::QemuServer::CGroup;
@@ -8242,7 +8243,7 @@ sub qemu_drive_mirror_monitor {
my $agent_running = $qga && qga_check_running($vmid);
if ($agent_running) {
print "freeze filesystem\n";
- eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); };
+ eval { PVE::QemuServer::Agent::guest_fsfreeze($vmid); };
warn $@ if $@;
} else {
print "suspend vm\n";
diff --git a/PVE/QemuServer/Agent.pm b/PVE/QemuServer/Agent.pm
index 41e615aa..ef36a6a8 100644
--- a/PVE/QemuServer/Agent.pm
+++ b/PVE/QemuServer/Agent.pm
@@ -119,4 +119,55 @@ sub qemu_exec_status {
return $res;
}
+# It can happen that a guest agent command is read, but then the guest agent never sends an answer,
+# because the service in the guest is stopped/killed. For example, if a guest reboot happens before
+# the command can be successfully executed. This is usually not problematic, but the fsfreeze-freeze
+# command has a timeout of 1 hour, so the guest agent socket would be blocked for that amount of
+# time, waiting on a command that is not being executed anymore.
+#
+# Use a lower timeout for the fsfreeze-freeze command, and issue an fsfreeze-status command
+# afterwards, which will return immediately if the fsfreeze-freeze command already finished, and
+# which will be queued if not. This is used as a proxy to determine whether the fsfreeze-freeze
+# command is still running and to check whether it was successful. Like this, the time the socket is
+# blocked after a "lost command" is at most 10 minutes.
+sub guest_fsfreeze {
+ my ($vmid) = @_;
+
+ my $timeout = 10 * 60;
+
+ my $result = eval {
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'guest-fsfreeze-freeze', timeout => $timeout);
+ };
+ if ($result && ref($result) eq 'HASH' && $result->{error}) {
+ my $error = $result->{error}->{desc} // 'unknown';
+ die "unable to freeze guest fs - $error\n";
+ } elsif (defined($result)) {
+ return; # command successful
+ }
+
+ my $status;
+ eval {
+ my ($i, $last_iteration) = (0, 5);
+ while ($i < $last_iteration && !defined($status)) {
+ print "still waiting on guest fs freeze\n";
+ $i++;
+
+ $status = PVE::QemuServer::Monitor::mon_cmd(
+ $vmid, 'guest-fsfreeze-status', timeout => $timeout, noerr => 1);
+
+ if ($status && ref($status) eq 'HASH' && $status->{'error-is-timeout'}) {
+ $status = undef;
+ } else {
+ check_agent_error($status, 'unknown error');
+ }
+ }
+ if (!defined($status)) {
+ die "timeout after " . ($timeout * ($last_iteration + 1) / 60) . " minutes\n";
+ }
+ };
+ die "querying status after freezing guest fs failed - $@" if $@;
+
+ die "unable to freeze guest fs - unexpected status '$status'\n" if $status ne 'frozen';
+}
+
1;
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 10514f75..b686da84 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -29,6 +29,7 @@ use PVE::Format qw(render_duration render_bytes);
use PVE::QemuConfig;
use PVE::QemuServer;
+use PVE::QemuServer::Agent;
use PVE::QemuServer::Drive qw(checked_volume_format);
use PVE::QemuServer::Helpers;
use PVE::QemuServer::Machine;
@@ -1069,10 +1070,10 @@ sub qga_fs_freeze {
}
$self->loginfo("issuing guest-agent 'fs-freeze' command");
- eval { mon_cmd($vmid, "guest-fsfreeze-freeze") };
+ eval { PVE::QemuServer::Agent::guest_fsfreeze($vmid); };
$self->logerr($@) if $@;
- return 1; # even on mon command error, ensure we always thaw again
+ return 1; # even on error, ensure we always thaw again
}
# only call if fs_freeze return 1
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
next prev parent reply other threads:[~2025-05-05 12:58 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-05 12:57 [pve-devel] [PATCH-SERIES qemu-server 00/11] better handle lost freeze command Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 01/11] agent: drop unused $noerr argument from helpers Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 02/11] api: agent: improve module imports Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 03/11] agent: code style: order module imports according to style guide Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 04/11] agent: avoid dependency on QemuConfig module Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 05/11] qmp client: remove erroneous comment Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 06/11] qmp client: add $noerr argument Fiona Ebner
2025-05-05 12:57 ` Fiona Ebner [this message]
2025-05-05 13:57 ` [pve-devel] [PATCH qemu-server 07/11] agent: implement fsfreeze helper to better handle lost commands Mira Limbeck
2025-05-05 14:01 ` Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 08/11] agent: avoid use of deprecated check_running() function Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 09/11] agent: move qga_check_running() helper to agent module Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 10/11] agent: prefer usage of get_qga_key() helper Fiona Ebner
2025-05-05 12:57 ` [pve-devel] [PATCH qemu-server 11/11] agent: move guest agent format and parsing to agent module Fiona Ebner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250505125724.75620-8-f.ebner@proxmox.com \
--to=f.ebner@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal