Date: Thu, 21 Apr 2022 11:15:22 +0200
From: Fabian Grünbichler
To: Fabian Ebner, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH qemu-server] migrate: keep VM paused after migration if it was before

On April 21, 2022 9:44 am, Fabian Ebner wrote:
> On 20.04.22 at 14:43, Fabian Grünbichler wrote:
>> On March 18, 2022 8:51 am, Fabian Ebner wrote:
>>> Also cannot issue a guest agent command in that case.
>>>
>>> Reported in the community forum:
>>> https://forum.proxmox.com/threads/106618
>>>
>>> Signed-off-by: Fabian Ebner
>>> ---
>>>
>>> Best viewed with -w.
>>>
>>>  PVE/QemuMigrate.pm | 54 ++++++++++++++++++++++++++--------------------
>>>  1 file changed, 31 insertions(+), 23 deletions(-)
>>
>> patch looks good to me - it might make sense to restructure the
>> conditionals a bit to log that resuming/fstrim was skipped, though, to
>> reduce confusion (the user that paused the VM and the user doing the
>> migration might not be the same entity after all)?
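
for illustration, the restructuring I have in mind would be roughly the
following in phase3_cleanup (just a sketch - the $self->{vm_was_paused}
flag name and the log wording are made up here, not taken from the
patch):

    if ($self->{vm_was_paused}) {
        # make the skip visible in the migration task log, so whoever
        # runs the migration isn't surprised by a paused target VM
        $self->log('info', "VM was paused before migration, skipping resume and guest-fstrim");
    } else {
        # resume the VM on the target node and, if requested and the
        # guest agent is available, issue a guest-fstrim afterwards
        ...;
    }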
>>
>> one other thing I noticed (pre-existing, but the changes here made me
>> look and my search came up short), inside phase2:
>>
>> - start block job(s) without autocompletion and wait for them to
>>   converge
>> - start RAM/state migration without autocompletion and wait for it to
>>   converge
>> X both source and target VMs are paused now with "identical" state,
>>   irrespective of the source being paused or not initially
>> - cancel block job(s) (to close NBD writer(s) so that switchover can
>>   proceed in phase3_cleanup)
>>
>> if something happens after X in phase2, we enter phase2_cleanup and
>> attempt to cancel the migration

> If migrate_cancel actually cancels the migration, the VM will be running
> on the source node again :)

see below - AFAIU migrate_cancel will only resume the VM if the
migration reached the point of being completed and the VM was running,
not paused, at that point.

> If migrate_cancel fails, resume might also fail?

possibly / probably, but at that point error handling is already
best-effort, so it doesn't hurt to try, I guess. the only way for
migrate_cancel to fail (according to the docs ;)) is if QMP itself is
not available..

> There is an edge case however:
> If migration actually finished, but we aborted because of e.g. too many
> query-migrate failures, then migrate_cancel will succeed (because there
> is no active migration) and the VM will be in post-migrate state on the
> source node. Here, resume would help.

this confused me, so I tried to look at the different code paths - a
transition from phase2 (after migration has started; any error before
that is irrelevant, since it couldn't have triggered a pause) to
phase2_cleanup with $err / $self->{errors} set can only happen if

A) migration reached 'completed' state and the VM transitioned to
   FINISH_MIGRATE, but the block-job completion code after the
   query-migrate loop died
B) query-migrate failed too often, VM state unknown
C) query-migrate returned no or an invalid status, VM state unknown
D) query-migrate returned a failed or cancelled status, VM state unknown
E) migration was interrupted

in case A (which was my original sequence of events), the VM is now
paused for sure (by the migration), and we only want to resume it (via
migrate_cancel?) if it wasn't previously paused. so does migrate_cancel
always trigger a resume? and if so, do we want to (re-)pause the VM
depending on whether it was originally paused?

in case B or C we don't know the state or what it means. we can attempt
migrate_cancel (which might resume the VM?), which might give us the
same situation as A.

in case D or E the VM might have been paused (and resumed again) by the
migration, or maybe not. so we might still want to do the same as in
situation A.

looking at the qemu source, it should keep track of whether the VM was
running before it was paused during migration completion
(MigrationState->vm_was_running).

this gets set when the migration converges (in migration_completion),
before the VM transitions to FINISH_MIGRATE (i.e., it is paused while
waiting for completion). from there on we have two options - either the
migration status switches to failed or to completed, we BREAK out of
the migration iteration (migration_iteration_run), and we end up in
migration_iteration_finish.

migration_iteration_finish will transition the VM to POSTMIGRATE if the
migration was completed. otherwise, if the migration got cancelled, it
will start the VM again if it was marked as previously running but
paused by the migration (see above); or, if it entered convergence but
didn't complete yet, it will be switched to POSTMIGRATE (without
vm_start). cancelling before the migration converged simply does
nothing except clean up the migration - the VM is still running or
paused depending on its original state; since the migration never
paused it, it also had no need to resume it.
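
as an aside, each of these outcomes should be directly observable over
QMP - a quick sketch using qemu-server's mon_cmd, purely for
illustration:

    use PVE::QemuServer::Monitor qw(mon_cmd);

    my $vmid = 100; # example ID of the source VM

    # on the source node, after the migration attempt failed or was
    # cancelled:
    my $status = mon_cmd($vmid, 'query-status');
    print "source VM: status=$status->{status} running=$status->{running}\n";
    # expectation per the above: 'postmigrate' for case A, and 'running'
    # or 'paused' (matching the original state) when the cancel happened
    # before convergence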
so there is case A, where we end up in POSTMIGRATE and want to return
to RUNNING or PAUSED, depending on the original state. and there is
case B, where cancelling before convergence is fine, and cancelling
after convergence but before completion resumes iff originally running
(fine), but ends up in POSTMIGRATE instead of PAUSED if not (wrong) -
with cases C, D and E being similar, but likely not resumed/completed.

but all of that would require some confirmation via tracing and
corresponding 'die' statements; possibly I didn't get the full picture
of how cancelling interacts with completion, for example ;)
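
if the above analysis holds, a best-effort reconciliation in
phase2_cleanup could then look roughly like this (again just a sketch -
$self->{vm_was_paused} is the same made-up flag as above, and whether
'cont' is actually safe at each of these points is exactly what would
need the tracing/confirmation first):

    # cancel the migration, best-effort
    eval { mon_cmd($vmid, 'migrate_cancel') };
    $self->log('err', "migrate_cancel failed: $@") if $@;

    # if the source VM got stuck in POSTMIGRATE (case A and friends),
    # resume it iff it was originally running
    my $status = eval { mon_cmd($vmid, 'query-status')->{status} };
    if (defined($status) && $status eq 'postmigrate' && !$self->{vm_was_paused}) {
        eval { mon_cmd($vmid, 'cont') };
        $self->log('err', "resuming source VM failed: $@") if $@;
    }
    # open question from above: a VM that was originally paused still
    # ends up in POSTMIGRATE here, instead of plain PAUSED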