* [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout
@ 2023-09-29  8:28 Alexandre Derumier
  2023-09-29  8:28 ` [pve-devel] [PATCH qemu-server 1/2] nbd_stop: increase timeout to 25s Alexandre Derumier
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Alexandre Derumier @ 2023-09-29  8:28 UTC (permalink / raw)
  To: pve-devel

Hi,

We hit sporadic nbd-stop errors when migrating VMs with rbd storage + writeback between two different clusters:
(This is without my other targetcpu patch)


2023-09-28 16:20:39 ERROR: error - tunnel command '{"cmd":"nbdstop"}' failed - failed to handle 'nbdstop' command - VM 140 qmp command 'nbd-server-stop' failed - got timeout
2023-09-28 16:20:39 ERROR: migration finished with problems (duration 00:01:42)


I'm not sure, but it may be related to writeback, because it never happened with a freshly started VM, while VMs that have been running for some time can trigger it.
(Maybe nbd needs to flush pending data in the cache?)


Currently, the tunnel command has a 30s timeout, but the QMP command only has 5s.
Also, the tunnel v2 command is not wrapped in an eval, so the migration aborts and leaves both the source and target VM locked.
Unlocking the target VM and resuming it manually works, so the timeout really seems to be too low.
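
To illustrate why the missing eval matters, here is a minimal, self-contained sketch. It is not the actual migration code: `fake_nbdstop` and `cleanup` are hypothetical stand-ins, and the "unlock" step only represents whatever cleanup would normally follow the NBD stop.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tunnel call; it always fails the way
# the timeout in the log above does.
sub fake_nbdstop {
    die "VM 140 qmp command 'nbd-server-stop' failed - got timeout\n";
}

# Hypothetical cleanup routine: stop NBD first, then run the remaining steps.
sub cleanup {
    my ($guarded) = @_;
    my $state = { errors => 0, steps => [] };

    if ($guarded) {
        # Guarded: the error is recorded and cleanup continues.
        eval { fake_nbdstop() };
        if (my $err = $@) {
            push @{ $state->{steps} }, "logged: $err";
            $state->{errors} = 1;
        }
    } else {
        # Unguarded: the die propagates and nothing below runs.
        fake_nbdstop();
    }

    push @{ $state->{steps} }, 'unlock';
    return $state;
}

my $guarded       = cleanup(1);
my $unguarded     = eval { cleanup(0) };
my $unguarded_err = $@;

print "guarded run reached: $guarded->{steps}[-1]\n";
print "unguarded run aborted: $unguarded_err" if !defined $unguarded;
```

In the guarded case the failure is still flagged through `errors`, so the caller can report that the migration finished with problems while the later steps still run.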


Alexandre Derumier (2):
  nbd_stop: increase timeout to 25s
  migration: add missing eval on nbdstop with tunnel v2.

 PVE/QemuMigrate.pm | 8 +++++++-
 PVE/QemuServer.pm  | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.39.2





* [pve-devel] [PATCH qemu-server 1/2] nbd_stop: increase timeout to 25s
  2023-09-29  8:28 [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Alexandre Derumier
@ 2023-09-29  8:28 ` Alexandre Derumier
  2023-09-29  8:28 ` [pve-devel] [PATCH qemu-server 2/2] migration: add missing eval on nbdstop with tunnel v2 Alexandre Derumier
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Alexandre Derumier @ 2023-09-29  8:28 UTC (permalink / raw)
  To: pve-devel

---
 PVE/QemuServer.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1b1ccf4..0259c0f 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8267,7 +8267,7 @@ sub generate_smbios1_uuid {
 sub nbd_stop {
     my ($vmid) = @_;
 
-    mon_cmd($vmid, 'nbd-server-stop');
+    mon_cmd($vmid, 'nbd-server-stop', timeout => 25);
 }
 
 sub create_reboot_request {
-- 
2.39.2





* [pve-devel] [PATCH qemu-server 2/2] migration: add missing eval on nbdstop with tunnel v2.
  2023-09-29  8:28 [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Alexandre Derumier
  2023-09-29  8:28 ` [pve-devel] [PATCH qemu-server 1/2] nbd_stop: increase timeout to 25s Alexandre Derumier
@ 2023-09-29  8:28 ` Alexandre Derumier
  2023-09-29 11:57 ` [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Fiona Ebner
  2023-11-06 18:48 ` [pve-devel] applied: " Thomas Lamprecht
  3 siblings, 0 replies; 5+ messages in thread
From: Alexandre Derumier @ 2023-09-29  8:28 UTC (permalink / raw)
  To: pve-devel

It was already done in tunnel v1.

Avoid aborting the migration (and keeping both source and target VM locked) if an nbdstop error occurs:

2023-09-28 16:20:39 ERROR: error - tunnel command '{"cmd":"nbdstop"}' failed - failed to handle 'nbdstop' command - VM 140 qmp command 'nbd-server-stop' failed - got timeout
2023-09-28 16:20:39 ERROR: migration finished with problems (duration 00:01:42)
---
 PVE/QemuMigrate.pm | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index f41c61f..81880e5 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1475,7 +1475,13 @@ sub phase3_cleanup {
 	    $self->log('info', "stopping NBD storage migration server on target.");
 	    # stop nbd server on remote vm - requirement for resume since 2.9
 	    if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 2) {
-		PVE::Tunnel::write_tunnel($tunnel, 30, 'nbdstop');
+		eval {
+		    PVE::Tunnel::write_tunnel($tunnel, 30, 'nbdstop');
+		};
+		if (my $err = $@) {
+		    $self->log('err', $err);
+		    $self->{errors} = 1;
+		}
 	    } else {
 		my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
 
-- 
2.39.2





* Re: [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout
  2023-09-29  8:28 [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Alexandre Derumier
  2023-09-29  8:28 ` [pve-devel] [PATCH qemu-server 1/2] nbd_stop: increase timeout to 25s Alexandre Derumier
  2023-09-29  8:28 ` [pve-devel] [PATCH qemu-server 2/2] migration: add missing eval on nbdstop with tunnel v2 Alexandre Derumier
@ 2023-09-29 11:57 ` Fiona Ebner
  2023-11-06 18:48 ` [pve-devel] applied: " Thomas Lamprecht
  3 siblings, 0 replies; 5+ messages in thread
From: Fiona Ebner @ 2023-09-29 11:57 UTC (permalink / raw)
  To: Proxmox VE development discussion, Alexandre Derumier

On 29.09.23 at 10:28, Alexandre Derumier wrote:
> I'm not sure, but it may be related to writeback, because it never happened with a freshly started VM, while VMs that have been running for some time can trigger it.
> (Maybe nbd needs to flush pending data in the cache?)
> 

It does drain the export's BlockBackend, i.e. wait for all pending IO
before detaching/closing the export. But we did cancel the mirror job,
which should actually wait for any in-flight IO already, so it's a bit
surprising. Maybe there's some cache interaction happening at an
inconvenient time, no idea ¯\_(ツ)_/¯

The other thing it does is close the connection to the client, so
there is at least that IO interaction and a higher timeout makes sense.
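
As a rough illustration of how the two timeouts interact, the following sketch models which layer reports the outcome. It is a simplified model, not the real code path: `outcome` is a hypothetical helper, the durations are made-up values, and the idea that the inner QMP timeout should stay below the 30s tunnel timeout is an assumption drawn from the chosen value of 25s.

```perl
use strict;
use warnings;

# Hypothetical model: given how long the stop takes and the inner (QMP)
# and outer (tunnel) timeouts in seconds, say which layer reports first.
sub outcome {
    my ($duration, $inner, $outer) = @_;
    my $first_deadline = $inner < $outer ? $inner : $outer;
    return 'ok' if $duration <= $first_deadline;
    return $inner < $outer ? 'inner timeout' : 'outer timeout';
}

# Made-up duration of 10s: fails with the old 5s limit, passes with 25s.
my $old     = outcome(10, 5, 30);
my $new     = outcome(10, 25, 30);
# A stop slower than 25s still fails, but with the specific QMP error.
my $slow    = outcome(28, 25, 30);
# If the inner limit exceeded the outer one, the tunnel would give up first.
my $too_big = outcome(40, 35, 30);

print "$old, $new, $slow, $too_big\n";
```

Under this model, keeping the inner limit a few seconds below the outer one leaves room for the more specific QMP error to travel back through the tunnel.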

> 
> Alexandre Derumier (2):
>   nbd_stop: increase timeout to 25s
>   migration: add missing eval on nbdstop with tunnel v2.
> 

Patches look good to me, so consider them

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>

but they are missing your Signed-off-by trailer. Can you please add that?





* [pve-devel] applied: [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout
  2023-09-29  8:28 [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Alexandre Derumier
                   ` (2 preceding siblings ...)
  2023-09-29 11:57 ` [pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout Fiona Ebner
@ 2023-11-06 18:48 ` Thomas Lamprecht
  3 siblings, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2023-11-06 18:48 UTC (permalink / raw)
  To: Proxmox VE development discussion, Alexandre Derumier

On 29/09/2023 at 10:28, Alexandre Derumier wrote:
> We had some sporadic nbd-stop error when


applied both patches, thanks!




