From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate001.proxmox.com (gate001.proxmox.com [IPv6:2a0f:8001:1:32::40]) by lore.proxmox.com (Postfix) with ESMTPS id 9149F1FF135 for ; Thu, 02 Jul 2026 12:57:27 +0200 (CEST) Received: from gate001.proxmox.com (localhost.localdomain [127.0.0.1]) by gate001.proxmox.com (Proxmox) with ESMTP id 86F60213EE; Thu, 02 Jul 2026 12:57:26 +0200 (CEST) From: Maximiliano Sandoval To: Jakob Klocker Subject: Re: [PATCH qemu-server v3 3/3] fix #5032: agent: sync guest time via QGA when the clock falls behind In-Reply-To: <20260702103201.164567-4-j.klocker@proxmox.com> (Jakob Klocker's message of "Thu, 2 Jul 2026 12:32:01 +0200") References: <20260702103201.164567-1-j.klocker@proxmox.com> <20260702103201.164567-4-j.klocker@proxmox.com> User-Agent: mu4e 1.12.9; emacs 30.1 Date: Thu, 02 Jul 2026 12:56:50 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1782989804082 X-SPAM-LEVEL: Spam detection results: 0 DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment (newer systems) SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: ZD4SGVKNMPIP6NDRJ2UFHSVZETWUI2E5 X-Message-ID-Hash: ZD4SGVKNMPIP6NDRJ2UFHSVZETWUI2E5 X-MailFrom: m.sandoval@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: pve-devel@lists.proxmox.com X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Jakob Klocker writes: Some comments below. One design comment: I am not sure if the functions should have an $override parameter, it could be argued than it is the job of the call-site to determine whether how to proceeded based on API+config parameters. disclaimer: I did not check v1 and v2's discussions. > Add a 'sync-time-on-resume' agent config property (default off) that, > when enabled, issues the guest-set-time QGA command to set the guest > clock to the host's current time after operations that leave the clock > behind: > - resuming from hibernation (suspend-to-disk) > - resuming from pause/suspend > - rolling back to a snapshot that includes RAM > - live migration > - taking a snapshot, during which the guest is briefly frozen > > The start, resume and rollback API endpoints gain a > 'sync-time-on-resume' parameter (e.g. 'qm start > --sync-time-on-resume 0') that overrides the configured agent value for > a single operation, forcing the sync on or off regardless of config. > > guest_set_time prints a confirmation line to stdout on success. During > live migration the resume step runs over the migration tunnel, which > also uses stdout, so an unconditional print corrupts the tunnel > protocol. vm_resume therefore takes a 'quiet' parameter to suppress the > print on the migration path; the confirmation is logged by the migration > code instead. > > Signed-off-by: Jakob Klocker > --- > changes since v1: > - adapt warning message in the resume-from-saved-state path > changes since v2: > - add sync on pause/resume > - add override options > - add print task > - adapt formating and naming > > src/PVE/API2/Qemu.pm | 46 ++++++++++++++++++++++++++--- > src/PVE/CLI/qm.pm | 2 +- > src/PVE/QemuConfig.pm | 12 ++++++-- > src/PVE/QemuMigrate.pm | 2 ++ > src/PVE/QemuServer.pm | 18 +++++++++++- > src/PVE/QemuServer/Agent.pm | 54 ++++++++++++++++++++++++++++++++++ > src/PVE/QemuServer/RunState.pm | 11 ++++++- > 7 files changed, 136 insertions(+), 9 deletions(-) > > diff --git a/src/PVE/API2/Qemu.pm b/src/PVE/API2/Qemu.pm > index 9023c344..f58151e8 100644 > --- a/src/PVE/API2/Qemu.pm > +++ b/src/PVE/API2/Qemu.pm > @@ -3510,6 +3510,14 @@ __PACKAGE__->register_method({ > . ' the migration. A value of 0 means that the host_mtu parameter is to be' > . ' avoided for the corresponding device.', > }, > + 'sync-time-on-resume' => { > + type => 'boolean', > + description => Please have both a `description` and a `verbose_description`. The verbose description should mention all cases where this can happen and if possible mention the exact QGA command used (guest-set-time). There is a more complete verbose_description at Agent.pm below, but imho all doc strings should be complete enough to be read on their own, unless they reference another doc. > + "Override the VM's configured 'sync-time-on-resume' agent setting for this" This is an agent config so: "The VM QGA's configured..." or similar. > + . " start only. Controls whether the guest clock is synchronized after the" > + . " RAM state is restored. The guest agent must be enabled.", > + optional => 1, > + }, > }, > }, > returns => { > @@ -3525,6 +3533,7 @@ __PACKAGE__->register_method({ > my $vmid = extract_param($param, 'vmid'); > my $timeout = extract_param($param, 'timeout'); > my $machine = extract_param($param, 'machine'); > + my $sync_time = extract_param($param, 'sync-time-on-resume'); > > my $get_root_param = sub { > my $value = extract_param($param, $_[0]); > @@ -3631,6 +3640,7 @@ __PACKAGE__->register_method({ > timeout => $timeout, > forcecpu => $force_cpu, > 'nets-host-mtu' => $nets_host_mtu, > + 'sync-time-on-resume' => $sync_time, > }; > > PVE::QemuServer::vm_start($storecfg, $vmid, $params, $migrate_opts); > @@ -4115,6 +4125,14 @@ __PACKAGE__->register_method({ > description => "Do not check whether the VM is running, used" > . " internally during migration. Only root may use this option.", > }, > + 'sync-time-on-resume' => { > + type => 'boolean', > + description => > + "Override the VM's configured 'sync-time-on-resume' agent setting for this" > + . " resume only. Controls whether the guest clock is synchronized after the" > + . " RAM state is restored. The guest agent must be enabled.", Same considerations here. > + optional => 1, > + }, > }, > }, > returns => { > @@ -4131,6 +4149,8 @@ __PACKAGE__->register_method({ > > my $vmid = extract_param($param, 'vmid'); > > + my $sync_time = extract_param($param, 'sync-time-on-resume'); > + > my $skiplock = extract_param($param, 'skiplock'); > raise_param_exc({ skiplock => "Only root may use this option." }) > if $skiplock && $authuser ne 'root@pam'; > @@ -4161,10 +4181,14 @@ __PACKAGE__->register_method({ > syslog('info', "resume VM $vmid: $upid\n"); > > if (!$to_disk_suspended) { > - PVE::QemuServer::RunState::vm_resume($vmid, $skiplock, $nocheck); > + PVE::QemuServer::RunState::vm_resume($vmid, $skiplock, $nocheck, $sync_time); > } else { > my $storecfg = PVE::Storage::config(); > - PVE::QemuServer::vm_start($storecfg, $vmid, { skiplock => $skiplock }); > + PVE::QemuServer::vm_start( > + $storecfg, > + $vmid, > + { skiplock => $skiplock, 'sync-time-on-resume' => $sync_time }, > + ); > } > > return; > @@ -6323,6 +6347,14 @@ __PACKAGE__->register_method({ > optional => 1, > default => 0, > }, > + 'sync-time-on-resume' => { > + type => 'boolean', > + description => > + "Override the VM's configured sync-time-on-resume agent setting for this" > + . " rollback only. Controls whether the guest clock is synchronized" > + . " after the RAM state is restored. The guest agent must be enabled.", > + optional => 1, Ditto. > + }, > }, > }, > returns => { > @@ -6342,6 +6374,8 @@ __PACKAGE__->register_method({ > > my $snapname = extract_param($param, 'snapname'); > > + my $sync_time = extract_param($param, 'sync-time-on-resume'); > + > # vm_start is invoked directly from the worker, so its own permissions > # predicate doesn't fire here - check VM.PowerMgmt up front so a user > # with only VM.Snapshot.Rollback can't power the VM on as a side effect. > @@ -6350,7 +6384,11 @@ __PACKAGE__->register_method({ > > my $realcmd = sub { > PVE::Cluster::log_msg('info', $authuser, "rollback snapshot VM $vmid: $snapname"); > - PVE::QemuConfig->snapshot_rollback($vmid, $snapname); > + PVE::QemuConfig->snapshot_rollback( > + $vmid, > + $snapname, > + { 'sync-time-on-resume' => $sync_time }, > + ); > > if ($param->{start} && !PVE::QemuServer::Helpers::vm_running_locally($vmid)) { > PVE::API2::Qemu->vm_start({ vmid => $vmid, node => $node }); > @@ -6915,7 +6953,7 @@ __PACKAGE__->register_method({ > }, > 'resume' => sub { > if (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) { > - PVE::QemuServer::RunState::vm_resume($state->{vmid}, 1, 1); > + PVE::QemuServer::RunState::vm_resume($state->{vmid}, 1, 1, undef, 1); > } else { > die "VM $state->{vmid} not running\n"; > } > diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm > index b903c1f1..25351f64 100755 > --- a/src/PVE/CLI/qm.pm > +++ b/src/PVE/CLI/qm.pm > @@ -473,7 +473,7 @@ __PACKAGE__->register_method({ > if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) { > # vm_resume with nocheck, since local node might not have processed config > # move/rename yet > - eval { PVE::QemuServer::RunState::vm_resume($vmid, 1, 1); }; > + eval { PVE::QemuServer::RunState::vm_resume($vmid, 1, 1, undef, 1); }; > if ($@) { > $tunnel_write->("ERR: resume failed - $@"); > } else { > diff --git a/src/PVE/QemuConfig.pm b/src/PVE/QemuConfig.pm > index 26f0fda2..e95a8c61 100644 > --- a/src/PVE/QemuConfig.pm > +++ b/src/PVE/QemuConfig.pm > @@ -383,6 +383,11 @@ sub __snapshot_create_vol_snapshots_hook { > next; > } > } > + my $conf = $class->load_config($vmid); > + if (PVE::QemuServer::Agent::should_sync_time_on_resume($conf->{agent})) { > + eval { PVE::QemuServer::Agent::guest_set_time($vmid); }; > + warn "could not sync guest time after snapshot - $@" if $@; > + } > } > } > } > @@ -444,8 +449,9 @@ sub __snapshot_rollback_hook { > my ($class, $vmid, $conf, $snap, $prepare, $data) = @_; > > if ($prepare) { > - # we save the machine of the current config > + # we save the machine and agent of the current config > $data->{oldmachine} = $conf->{machine}; > + $data->{agent} = $conf->{agent}; > } else { > # if we have a 'runningmachine' entry in the snapshot we use that > # for the forcemachine parameter, else we use the old logic > @@ -509,7 +515,7 @@ sub __snapshot_rollback_vm_stop { > } > > sub __snapshot_rollback_vm_start { > - my ($class, $vmid, $vmstate, $data) = @_; > + my ($class, $vmid, $vmstate, $data, $opts) = @_; > > my $storecfg = PVE::Storage::config(); > my $params = { > @@ -517,6 +523,8 @@ sub __snapshot_rollback_vm_start { > forcemachine => $data->{forcemachine}, > forcecpu => $data->{forcecpu}, > 'nets-host-mtu' => $data->{'nets-host-mtu'}, > + agent => $data->{agent}, > + 'sync-time-on-resume' => $opts->{'sync-time-on-resume'}, > }; > PVE::QemuServer::vm_start($storecfg, $vmid, $params); > } > diff --git a/src/PVE/QemuMigrate.pm b/src/PVE/QemuMigrate.pm > index 8da6f15d..5846578d 100644 > --- a/src/PVE/QemuMigrate.pm > +++ b/src/PVE/QemuMigrate.pm > @@ -1770,6 +1770,8 @@ sub phase3_cleanup { > $self->{errors} = 1; > } > } > + $self->log('info', "synced guest clock via guest agent") > + if PVE::QemuServer::Agent::should_sync_time_on_resume($conf->{agent}); > } > > if ( > diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm > index cdf66e89..e30e89e7 100644 > --- a/src/PVE/QemuServer.pm > +++ b/src/PVE/QemuServer.pm > @@ -5179,7 +5179,10 @@ sub vmconfig_update_agent { > > die "skip\n" if !$conf->{$opt}; > > - my $hotplug_options = { fstrim_cloned_disks => 1 }; > + my $hotplug_options = { > + fstrim_cloned_disks => 1, > + 'sync-time-on-resume' => 1, > + }; > > my $old_agent = parse_guest_agent($conf->{agent}); > my $agent = parse_guest_agent($value); > @@ -6006,6 +6009,19 @@ sub vm_start_nolock { > ); > } > > + # $resume = suspend-to-disk, $statefile = rollback with RAM. Rollback carries the snapshot's > + # agent setting in params; hibernate resume falls back to the live config > + if ($resume || ($statefile && !$migratedfrom)) { > + my $agent = $params->{agent} // $conf->{agent}; > + if (PVE::QemuServer::Agent::should_sync_time_on_resume( > + $agent, > + $params->{'sync-time-on-resume'}, > + )) { > + eval { PVE::QemuServer::Agent::guest_set_time($vmid); 1 }; > + warn "could not sync guest time after resume from saved state - $@" if $@; > + } > + } > + > return $res; > } > > diff --git a/src/PVE/QemuServer/Agent.pm b/src/PVE/QemuServer/Agent.pm > index be6df443..77cd2bd6 100644 > --- a/src/PVE/QemuServer/Agent.pm > +++ b/src/PVE/QemuServer/Agent.pm > @@ -4,6 +4,7 @@ use v5.36; > > use JSON; > use MIME::Base64 qw(decode_base64 encode_base64); > +use Time::HiRes qw(time); > > use PVE::JSONSchema; > > @@ -52,6 +53,24 @@ our $agent_fmt = { > optional => 1, > default => 1, > }, > + # FIXME: MAJOR VERSION: 10.0.0 - reconsider enabling this option as default Unless it was already discussed that this should be done, I would say this is a "TODO". > + 'sync-time-on-resume' => { > + description => "Synchronize the guest clock through QGA after operations that can leave" I think this should be "the guest's clock". Same in other places. > + . " the guest clock behind.", > + verbose_description => > + "Whether to issue the guest-set-time QEMU guest agent command to synchronize the" s/guest-set-time/'guest-set-time'/. > + . " guest clock to the host's current time after operations that can leave the guest" > + . " clock behind. This happens when resuming from hibernation, resuming from a paused" > + . " state, taking a snapshot, after a migration, and after rolling back to a snapshot" > + . " that includes RAM. In these cases the guest's clock may still reflect an earlier" > + . " time. The time is only synchronized when the QEMU Guest Agent option is enabled" > + . " in the guest's configuration and the agent is running inside of the guest. For" > + . " resume, hibernation resume and snapshot rollback, the synchronization can be" > + . " overridden per operation.", > + type => 'boolean', > + optional => 1, > + default => 0, > + }, > type => { > description => "Select the agent type", > type => 'string', > @@ -332,4 +351,39 @@ sub guest_fs_freeze_applicable($agent_str, $vmid, $logfunc = undef) { > return 1; > } > > +=head3 should_sync_time_on_resume > + > +Returns whether the guest's clock should be synchronized to the host's via the QEMU Guest Agent > +when the VM is resumed from saved state. Requires the guest agent to be enabled. An explicit > +C<$override> (from an API parameter) takes precedence over the C agent > +config property when defined; otherwise the config value is used (defaulting to off). Does B > +check whether the agent is actually running. > + > +=cut > + > +sub should_sync_time_on_resume($agent_str, $override = undef) { > + my $agent = parse_guest_agent($agent_str); > + return 0 if !$agent->{enabled}; > + return $override ? 1 : 0 if defined($override); > + return $agent->{'sync-time-on-resume'} // 0; > +} > + > +=head3 guest_set_time > + > +Sets the guest's clock to the current host time via the QEMU Guest Agent's > +C command. The host time is passed explicitly (as nanoseconds > +since the UNIX epoch) because the argument-less form reads the time from the > +guest's RTC, which isn't supported on all platforms (e.g. Windows); passing > +the value directly works regardless of guest OS. > + > +=cut > + > +sub guest_set_time($vmid, $quiet = undef) { > + my $time_ns = Time::HiRes::time() * 1_000_000_000; > + my $res = PVE::QemuServer::Monitor::mon_cmd($vmid, 'guest-set-time', time => int($time_ns)); > + check_agent_error($res, "unable to set guest time"); > + print "synced guest clock via guest agent\n" if !$quiet; > + return; Just an observation, this `return` seems unnecessary to me, however the thaw fn also has one. > +} > + > 1; > diff --git a/src/PVE/QemuServer/RunState.pm b/src/PVE/QemuServer/RunState.pm > index bbbcc88e..a72aed4d 100644 > --- a/src/PVE/QemuServer/RunState.pm > +++ b/src/PVE/QemuServer/RunState.pm > @@ -128,7 +128,7 @@ sub vm_suspend { > # location of the config file (source or target node) is not deterministic, > # since migration cannot wait for pmxcfs to process the rename > sub vm_resume { > - my ($vmid, $skiplock, $nocheck) = @_; > + my ($vmid, $skiplock, $nocheck, $sync_time, $quiet) = @_; > > PVE::QemuConfig->lock_config( > $vmid, > @@ -180,6 +180,15 @@ sub vm_resume { > if $resume_cmd eq 'cont'; > > mon_cmd($vmid, $resume_cmd); > + > + if ( > + $resume_cmd eq 'cont' > + && PVE::QemuServer::Agent::should_sync_time_on_resume($conf->{agent}, > + $sync_time) > + ) { > + eval { PVE::QemuServer::Agent::guest_set_time($vmid, $quiet); }; > + warn "could not sync guest time after resume - $@" if $@; > + } > }, > ); > } -- Maximiliano