From: Fabian Grünbichler
To: Proxmox VE development discussion, Thomas Lamprecht
Date: Tue, 16 Nov 2021 12:39:54 +0100
Message-Id: <1637061780.roho39wcf6.astroid@nora.none>
References: <20211116105215.1812508-1-f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server] migrate: skip tpmstate for NBD migration

On November 16, 2021 12:12 pm, Thomas Lamprecht wrote:
> On 16.11.21 11:52, Fabian Grünbichler wrote:
>> the tpmstate volume is not available in the VM directly, but we do
>> migrate the state volume via a storage migration anyway if necessary.
>>
>
> some context would be great to have in the commit message, iow. mentioning
> that QEMU is already migrating this as part of its memory/state migration.

I tried to get some understanding of how this works, and I don't think
that the stuff that Qemu copies as part of the TPM emulator state covers
everything that is in the state volume.
what happens is the following:
- our migration code finds a tpmstate volume, it gets migrated via
  storage_migrate if on local storage (and replicated if that is
  enabled)
- the VM is started on the remote node with the initial swtpm setup part
  skipped, since we already have a volume with state
- the RAM migration happens (and rest of state, including 'tpm emulator
  state')

so there is a window between storage_migrate/replication happening, and
the migration being finished where changes to the TPM state volume from
within the guest could potentially get lost (unless the state covered by
the migrate stream covers ALL the state inside the state volume, which I
don't think, but maybe I am mistaken on that front).

but this is irrespective of this patch, which just fixes the wrong
attempt of setting up an NBD server for the replicated tpm state volume.
even attaching the volume (like we do for backups) and setting up that
NBD server would not help, since changes to the state volume are not
tracked in the source VM on the block level, as Qemu doesn't access the
state volume directly, only swtpm does.

>
> Also, how is "migrate -> stop -> start" affected, is the TPM synced out to
> the (previously replicated?) disk on the target side during stop?

I am not sure I understand this question. nothing changes about the flow
of migration with this patch, except that where the migration would fall
apart previously if replication was enabled, it now works. the handling
of the state volume is unchanged / identical to a VM that is not
replicated.

in either case we only sync the state volume once, before starting the
VM on the target node, doing block mirror, and the ram/state migration.
swtpm probably syncs it whenever state-changing operations are issued
from within the VM - but that is not something that we can control when
shutting down the VM. AFAIU, the 'raw' state of the TPM is not even
available to Qemu directly, that's the whole point of the swtpm
component after all?

>
>> this code path was only triggered for replicated VMs with TPM.
>>
>> Signed-off-by: Fabian Grünbichler
>> ---
>>  PVE/QemuServer.pm | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
>> index 580af9e..76d45a2 100644
>> --- a/PVE/QemuServer.pm
>> +++ b/PVE/QemuServer.pm
>> @@ -5238,6 +5238,7 @@ sub vm_migrate_get_nbd_disks {
>> 	my ($ds, $drive) = @_;
>>
>> 	return if drive_is_cdrom($drive);
>> +	return if $ds eq 'tpmstate0';
>>
>> 	my $volid = $drive->{file};
>>
>
>
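
for reference, a minimal standalone sketch of the disk selection the
patch touches - this is not the real vm_migrate_get_nbd_disks /
foreach_volume code, the config layout and the get_nbd_disks /
drive_is_cdrom helpers below are simplified stand-ins - just to
illustrate why skipping the tpmstate0 key keeps the swtpm-managed
volume out of the NBD/block-mirror set while regular disks are still
picked up:

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical flattened VM config: drive key => drive hash
# (the real config comes from PVE::QemuConfig and is iterated via
# foreach_volume - this is just a stand-in for the sketch)
my $conf = {
    scsi0     => { file => 'local-lvm:vm-100-disk-0' },
    ide2      => { file => 'local:iso/debian-11.iso', media => 'cdrom' },
    tpmstate0 => { file => 'local-lvm:vm-100-disk-1' },
};

# simplified stand-in for the real drive_is_cdrom helper
sub drive_is_cdrom {
    my ($drive) = @_;
    return ($drive->{media} // '') eq 'cdrom';
}

# collect the volumes that should get an NBD export / block mirror
sub get_nbd_disks {
    my ($conf) = @_;
    my $disks = {};
    for my $ds (sort keys %$conf) {
        my $drive = $conf->{$ds};
        next if drive_is_cdrom($drive);
        # tpmstate is only written by swtpm, never through the QEMU
        # block layer, so mirroring it over NBD can't track guest-side
        # changes anyway - skip it here
        next if $ds eq 'tpmstate0';
        $disks->{$ds} = $drive->{file};
    }
    return $disks;
}

my $nbd = get_nbd_disks($conf);
print "$_ => $nbd->{$_}\n" for sort keys %$nbd;  # only scsi0 is left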