* [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout
@ 2023-09-29 8:28 Alexandre Derumier
2023-09-29 8:28 ` [pve-devel] [PATCH qemu-server 1/2] nbd_stop: increase timeout to 25s Alexandre Derumier
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Alexandre Derumier @ 2023-09-29 8:28 UTC (permalink / raw)
To: pve-devel
Hi,
We had some sporadic nbd-stop errors when trying to migrate VMs with rbd storage + writeback between 2 different clusters:
(This is without my other targetcpu patch)
2023-09-28 16:20:39 ERROR: error - tunnel command '{"cmd":"nbdstop"}' failed - failed to handle 'nbdstop' command - VM 140 qmp command 'nbd-server-stop' failed - got timeout
2023-09-28 16:20:39 ERROR: migration finished with problems (duration 00:01:42)
I'm not sure, but it may be related to writeback, because it never happened with a freshly started VM, whereas VMs that have been running for some time can trigger it.
(Maybe nbd needs to flush pending data in the cache?)
Currently, the tunnel command has a 30s timeout, but the qmp command only has 5s.
Also, the tunnel v2 command isn't wrapped in an eval, so the migration aborts, keeping both source and target VM locked.
Unlocking the target VM and resuming it manually works, so it really seems to be a too-low timeout.
Alexandre Derumier (2):
  nbd_stop: increase timeout to 25s
  migration: add missing eval on nbdstop with tunnel v2.

 PVE/QemuMigrate.pm | 8 +++++++-
 PVE/QemuServer.pm  | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)
--
2.39.2
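The timeout relationship described in the cover letter can be illustrated with a small, self-contained sketch. It is not the qemu-server code: `run_with_timeout` and the simulated stop command are hypothetical, and all timings are scaled down by a factor of 50 so the demo finishes quickly. The 30s tunnel timeout is enforced by a different process in the real setup, so it appears here only in the comments.

```perl
use strict;
use warnings;
use Time::HiRes qw(alarm sleep);

# Hypothetical helper, not the qemu-server implementation: run $code and
# die with "got timeout" if it does not return within $timeout seconds.
sub run_with_timeout {
    my ($timeout, $code) = @_;
    my $res = eval {
        local $SIG{ALRM} = sub { die "got timeout\n" };
        alarm($timeout);
        my $r = $code->();
        alarm(0);
        $r;
    };
    my $err = $@;
    alarm(0);    # make sure no alarm stays pending after an error
    die $err if $err;
    return $res;
}

# Simulated stop command. Timings are scaled by 1/50: 0.2s stands for a
# stop that needs 10s, 0.1s for the 5s QMP timeout and 0.5s for the
# proposed 25s. Both stay below the 30s (0.6s scaled) tunnel timeout,
# which the calling side enforces separately.
my $slow_stop = sub { sleep(0.2); return 'stopped' };

for my $inner (0.1, 0.5) {
    my $res = eval { run_with_timeout($inner, $slow_stop) };
    my $status = defined($res) ? $res : "failed - $@";
    chomp $status;
    printf "inner timeout %.1fs: %s\n", $inner, $status;
}
```

With the short inner timeout the simulated stop is cut off even though the outer budget would have allowed it to finish, which is the situation the 25s value is meant to avoid.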
* [pve-devel] [PATCH qemu-server 1/2] nbd_stop: increase timeout to 25s
From: Alexandre Derumier @ 2023-09-29 8:28 UTC (permalink / raw)
To: pve-devel
---
PVE/QemuServer.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1b1ccf4..0259c0f 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8267,7 +8267,7 @@ sub generate_smbios1_uuid {
 sub nbd_stop {
     my ($vmid) = @_;
-    mon_cmd($vmid, 'nbd-server-stop');
+    mon_cmd($vmid, 'nbd-server-stop', timeout => 25);
 }
 sub create_reboot_request {
--
2.39.2
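As a rough illustration of how a per-call `timeout => 25` can override a default, here is a minimal sketch. `build_command` is a hypothetical wrapper and is not claimed to match how `mon_cmd` is implemented; the 5s default is the value mentioned in the cover letter.

```perl
use strict;
use warnings;

my $DEFAULT_TIMEOUT = 5;    # seconds; default mentioned in the cover letter

# Hypothetical wrapper, not the real mon_cmd: split an optional 'timeout'
# key out of the named parameters and treat the rest as command arguments.
sub build_command {
    my ($vmid, $execute, %params) = @_;
    my $timeout = delete($params{timeout}) // $DEFAULT_TIMEOUT;
    return {
        vmid      => $vmid,
        execute   => $execute,
        arguments => { %params },
        timeout   => $timeout,
    };
}

my $old = build_command(140, 'nbd-server-stop');
my $new = build_command(140, 'nbd-server-stop', timeout => 25);
print "old timeout: $old->{timeout}s, new timeout: $new->{timeout}s\n";
```

Keeping the timeout out of the command arguments means the value only affects how long the caller waits, not what is sent to the monitor.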
* [pve-devel] [PATCH qemu-server 2/2] migration: add missing eval on nbdstop with tunnel v2.
From: Alexandre Derumier @ 2023-09-29 8:28 UTC (permalink / raw)
To: pve-devel
This was already done for tunnel v1.
Avoid aborting the migration (and keeping both source and target VM locked) if an nbdstop error occurs:
2023-09-28 16:20:39 ERROR: error - tunnel command '{"cmd":"nbdstop"}' failed - failed to handle 'nbdstop' command - VM 140 qmp command 'nbd-server-stop' failed - got timeout
2023-09-28 16:20:39 ERROR: migration finished with problems (duration 00:01:42)
---
PVE/QemuMigrate.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index f41c61f..81880e5 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1475,7 +1475,13 @@ sub phase3_cleanup {
     $self->log('info', "stopping NBD storage migration server on target.");
     # stop nbd server on remote vm - requirement for resume since 2.9
     if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 2) {
-        PVE::Tunnel::write_tunnel($tunnel, 30, 'nbdstop');
+        eval {
+            PVE::Tunnel::write_tunnel($tunnel, 30, 'nbdstop');
+        };
+        if (my $err = $@) {
+            $self->log('err', $err);
+            $self->{errors} = 1;
+        }
     } else {
         my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
--
2.39.2
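The error-handling pattern in the patch above (catch the failure, log it, set an error flag, keep going) can be tried in isolation with the following sketch. `DemoMigration` and `guarded_step` are hypothetical names used only for illustration; they model just the log method and the `errors` flag that the patch touches, not the real migration object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the migration object. Only the two pieces the
# patch relies on are modelled: a log method and an 'errors' flag.
package DemoMigration;

sub new {
    my ($class) = @_;
    my $self = { errors => 0, messages => [] };
    return bless $self, $class;
}

sub log {
    my ($self, $level, $msg) = @_;
    chomp $msg;
    push @{ $self->{messages} }, "$level: $msg";
}

# Run one cleanup step. A failure is logged and remembered, but it does
# not propagate, so the steps after it still run.
sub guarded_step {
    my ($self, $step) = @_;
    eval { $step->() };
    if (my $err = $@) {
        $self->log('err', $err);
        $self->{errors} = 1;
    }
}

package main;

my $mig = DemoMigration->new();
$mig->guarded_step(sub { die "qmp command 'nbd-server-stop' failed - got timeout\n" });
$mig->guarded_step(sub { $mig->log('info', 'later cleanup step still ran') });

print "$_\n" for @{ $mig->{messages} };
print "errors flag: $mig->{errors}\n";
```

Without the eval, the first die would end the whole cleanup and the second step would never run, which corresponds to the locked source and target VMs described in the cover letter.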
* Re: [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout
From: Fiona Ebner @ 2023-09-29 11:57 UTC (permalink / raw)
To: Proxmox VE development discussion, Alexandre Derumier
On 29.09.23 at 10:28, Alexandre Derumier wrote:
> I'm not sure, but it may be related to writeback, because it never happened with a freshly started VM, whereas VMs that have been running for some time can trigger it.
> (Maybe nbd needs to flush pending data in the cache?)
>
It does drain the export's BlockBackend, i.e. wait for all pending IO
before detaching/closing the export. But we did cancel the mirror job,
which should actually wait for any in-flight IO already, so it's a bit
surprising. Maybe there's some cache interaction happening at an
inconvenient time, no idea ¯\_(ツ)_/¯
The other thing it does is closing the connection to the client, so
there is at least that IO interaction and a higher timeout makes sense.
>
> Alexandre Derumier (2):
> nbd_stop: increase timeout to 25s
> migration: add missing eval on nbdstop with tunnel v2.
>
Patches look good to me, so consider them
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
but they are missing your Signed-off-by trailer. Can you please add that?
* [pve-devel] applied: [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout
From: Thomas Lamprecht @ 2023-11-06 18:48 UTC (permalink / raw)
To: Proxmox VE development discussion, Alexandre Derumier
On 29/09/2023 at 10:28, Alexandre Derumier wrote:
> We had some sporadic nbd-stop error when
applied both patches, thanks!
Thread overview: 5+ messages
2023-09-29 8:28 [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Alexandre Derumier
2023-09-29 8:28 ` [pve-devel] [PATCH qemu-server 1/2] nbd_stop: increase timeout to 25s Alexandre Derumier
2023-09-29 8:28 ` [pve-devel] [PATCH qemu-server 2/2] migration: add missing eval on nbdstop with tunnel v2 Alexandre Derumier
2023-09-29 11:57 ` [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Fiona Ebner
2023-11-06 18:48 ` [pve-devel] applied: " Thomas Lamprecht