Date: Fri, 27 Feb 2026 11:31:00 +0100
From: "Daniel Kral"
To: "Fiona Ebner"
Subject: Re: [PATCH qemu-server v2 06/14] migration: intra-cluster: check config can be parsed on target node
References: <20260225151931.176335-1-f.ebner@proxmox.com> <20260225151931.176335-7-f.ebner@proxmox.com>
In-Reply-To: <20260225151931.176335-7-f.ebner@proxmox.com>
List-Id: Proxmox VE development discussion

On Wed Feb 25, 2026 at 4:18 PM CET, Fiona Ebner wrote:
> diff --git a/src/PVE/API2/Qemu.pm b/src/PVE/API2/Qemu.pm
> index 1f0864f5..47466513 100644
> --- a/src/PVE/API2/Qemu.pm
> +++ b/src/PVE/API2/Qemu.pm
> @@ -5399,7 +5399,9 @@ __PACKAGE__->register_method({
>              force => {
>                  type => 'boolean',
>                  description =>
> -                    "Allow to migrate VMs which use local devices. Only root may use this option.",
> +                    "Allow to migrate VMs which use local devices and for intra-cluster migration,"
> +                    . " configuration options not understood by the target. Only root may use this"
> +                    . " option.",

HA-managed VMs are always migrated with force set, as it was assumed to
be used only for local devices at the time [0]. This might need some
adaptation so that LRM-initiated migrations won't cause problems for
the VMs that this patch series wants to fix.

[0] https://git.proxmox.com/?p=pve-ha-manager.git;a=blob;f=src/PVE/HA/Resources/PVEVM.pm;h=7586da84b7f19686b680d4e1434a17ffe1633d6d;hb=1a8d8bcef1934a43d37344caf965c082e55d451c#l116

As we might want to quickly know in the future which guests can be
moved to which nodes, e.g. for the load balancer to know which target
nodes to consider, I briefly considered whether it could also make
sense to have some config versioning, which is negotiated between the
source and target node (e.g. the qemu-server version on the source node
is lower than on the target node, so the VM can be migrated), but that
might be too strict, especially for guests that don't even use the new
config properties of the more recent qemu-server version.

But maybe these load-balancing decisions can also be more
coarse-grained than this more fine-grained check for config
compatibility and be implemented at a later time when it is actually
needed. What do you think?

>                  optional => 1,
>              },
>              migration_type => {
> diff --git a/src/PVE/QemuMigrate.pm b/src/PVE/QemuMigrate.pm
> index f7ec3227..901fe96d 100644
> --- a/src/PVE/QemuMigrate.pm
> +++ b/src/PVE/QemuMigrate.pm
> @@ -355,6 +355,33 @@ sub prepare {
>          my $cmd = [@{ $self->{rem_ssh} }, '/bin/true'];
>          eval { $self->cmd_quiet($cmd); };
>          die "Can't connect to destination address using public key\n" if $@;
> +
> +        if (!$self->{opts}->{force}) {
> +            # Fork a short-lived tunnel for checking the config. Later, the proper tunnel with SSH
> +            # forwaring info is forked.
> +            my $tunnel = $self->fork_tunnel();
> +            # Compared to remote migration, which also does volume activation, this only strictly
> +            # parses the config, so no large timeout is needed. Unfortunately, mtunnel did not
> +            # indicate that a command is unknown, but not reply at all, so the timeout must be very
> +            # low right now.
> +            # FIXME PVE 10 - bump timeout, the trade-off between delaying backwards migration and
> +            # giving config check more time should now be in favor of config checking
> +            eval {
> +                my $nodename = PVE::INotify::nodename();
> +                PVE::Tunnel::write_tunnel($tunnel, 3, "config $vmid $nodename");
> +            };
> +            if (my $err = $@) {
> +                chomp($err);
> +                # if there is no reply, assume target did not know the command yet
> +                if ($err =~ m/^no reply to command/) {
> +                    $self->log('info', "skipping strict configuration check (target too old?)");
> +                } else {
> +                    die "$err - use --force to migrate regardless\n";

Though unlikely (I couldn't hit `systemctl stop sshd` in time on the
target node within a few tries ^^), write_tunnel(...) might fail with
an $err that doesn't really explain why the migration failed. It might
be better to filter here, or to explicitly prepend that the strict
config check failed and then add the full error message?

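A minimal sketch of the prefixing suggested above, written as standalone
Perl so it can be run without a cluster. write_tunnel_stub() and
check_config_on_target() are hypothetical names that only stand in for
the PVE::Tunnel::write_tunnel() call in prepare(), and the wording of
the prefix is just one possibility:

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::Tunnel::write_tunnel(): it dies with the
# given message to simulate a failure, or returns quietly on success.
sub write_tunnel_stub {
    my ($failure) = @_;
    die "$failure\n" if defined($failure);
    return;
}

# Hypothetical helper mirroring the error handling in prepare().
# Returns 'ok' on success and 'skipped' if the target did not reply;
# any other error is re-thrown with a prefix naming the failed step.
sub check_config_on_target {
    my ($failure) = @_;

    eval { write_tunnel_stub($failure); };
    if (my $err = $@) {
        chomp($err);
        # same heuristic as in the patch: no reply means unknown command
        return 'skipped' if $err =~ m/^no reply to command/;
        die "strict config check on target node failed: $err"
            . " - use --force to migrate regardless\n";
    }
    return 'ok';
}
```

With something along these lines, a tunnel or SSH failure would still
abort the migration, but the task log would state that it happened
during the strict config check.
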
> +                }
> +            }
> +            eval { PVE::Tunnel::finish_tunnel($tunnel); };
> +            $self->log('warn', "failed to finish tunnel in prepare() - $@") if $@;
> +        }
>      }
>
>      return $running;
> diff --git a/src/test/MigrationTest/QemuMigrateMock.pm b/src/test/MigrationTest/QemuMigrateMock.pm
> index df8b575a..170634de 100644
> --- a/src/test/MigrationTest/QemuMigrateMock.pm
> +++ b/src/test/MigrationTest/QemuMigrateMock.pm
> @@ -65,6 +65,10 @@ $tunnel_module->mock(
>              my $vmid = $1;
>              die "resuming wrong VM '$vmid'\n" if $vmid ne $test_vmid;
>              return;
> +        } elsif ($command =~ m/^config (\d+) (\S+)$/) {
> +            my ($vmid, $node) = ($1, $2);
> +            die "check config for wrong VM '$vmid'\n" if $vmid ne $test_vmid;
> +            return;
>          }
>          die "write_tunnel (mocked) - implement me: $command\n";
>      },
> @@ -73,7 +77,12 @@ $tunnel_module->mock(
>  my $qemu_migrate_module = Test::MockModule->new("PVE::QemuMigrate");
>  $qemu_migrate_module->mock(
>      fork_tunnel => sub {
> -        die "fork_tunnel (mocked) - implement me\n"; # currently no call should lead here
> +        return {
> +            writer => "mocked",
> +            reader => "mocked",
> +            pid => 123456,
> +            version => 1,
> +        };
>      },
>      start_remote_tunnel => sub {
>          my ($self, $raddr, $rport, $ruri, $unix_socket_info) = @_;