From: Fabian Ebner
To: pve-devel@lists.proxmox.com, Fabian Grünbichler
Date: Tue, 30 Nov 2021 15:06:14 +0100
Subject: Re: [pve-devel] [PATCH v2 qemu-server++ 0/15] remote migration
Message-ID: <7a0968df-559b-81de-1df1-f912866b39d5@proxmox.com>
In-Reply-To: <20211111140721.3288364-1-f.gruenbichler@proxmox.com>
References: <20211111140721.3288364-1-f.gruenbichler@proxmox.com>

On 11.11.21 at 15:07, Fabian Grünbichler wrote:
> this series adds remote migration for VMs.
>
> both live and offline migration including NBD and storage-migrated disks
> should work.
>

Played around with it for a while. The biggest issue is that migration fails if there is no 'meta' property in the config. Most of the other things I'd wish for are improvements to error handling, but otherwise it seems to be in good shape!

The error "storage does not exist" is shown when the real issue is missing access rights. The same error also appears if access to /cluster/resources is missing or if the target node does not exist.

For the 'config' command, 'Sys.Modify' seems to be required:

failed to handle 'config' command - 403 Permission check failed (/, Sys.Modify)

but the command still creates an empty configuration file, which leads to

target_vmid: Guest with ID '5678' already exists on remote cluster

on the next attempt. It also already allocates the disks, but doesn't clean them up, because it gets the wrong lock (since the config is empty) and aborts the 'quit' command.

If the config is not recent enough to have a 'meta' property:

failed to handle 'config' command - unable to parse value of 'meta' - got undefined value

Same issue with disk+config cleanup as above.
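
A minimal sketch of one way to avoid leaving an empty config and allocated disks behind: register an undo action for each resource the moment it is created, so cleanup never depends on reading back a (possibly empty) config to find the right lock. It is written in Python purely for illustration; `TargetState` and `handle_config` are hypothetical names with a simplified flow and do not reflect qemu-server's actual (Perl) mtunnel code.

```python
from contextlib import ExitStack


class TargetState:
    """Hypothetical stand-in for the target node's config files and volumes."""

    def __init__(self):
        self.configs = set()
        self.volumes = set()


def handle_config(state, vmid, conf, volids):
    """Hypothetical target-side handler that undoes its own side effects on failure."""
    with ExitStack() as undo:
        # Reserve the VMID with a placeholder config and remember how to remove it.
        state.configs.add(vmid)
        undo.callback(state.configs.discard, vmid)

        # Allocate target volumes, registering a cleanup for each one right away.
        for volid in volids:
            state.volumes.add(volid)
            undo.callback(state.volumes.discard, volid)

        # Stand-in for any later failing step (permission check, config parsing, ...).
        if conf.get("meta") is None:
            raise ValueError("unable to parse value of 'meta' - got undefined value")

        # Success: detach the registered cleanups so they never run.
        undo.pop_all()
```

If the 'meta' check fails, ExitStack runs the registered callbacks in reverse order, removing the volume(s) and the placeholder config, and the original error still propagates to the caller.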

The local VM stays locked with 'migrate'. Is that how it should be? Also, the __migration__ snapshot stays around, resulting in an error when trying to migrate again.

For live migration I always got a (cosmetic?) "WS closed unexpectedly" error:

tunnel: -> sending command "quit" to remote
tunnel: <- got reply
tunnel: Tunnel to https://192.168.20.142:8006/api2/json/nodes/rob2/qemu/5678/mtunnelwebsocket? ticket=PVETUNNEL%3A&socket=%2Frun%2Fqemu-server%2F5678.mtunnel failed - WS closed unexpectedly
2021-11-30 13:49:39 migration finished successfully (duration 00:01:02)
UPID:pve701:0000D8AD:000CB782:61A61DA5:qmigrate:111:root@pam:

Fun fact: storages that don't appear in the explicit mapping are mapped to themselves (identity mapping). E.g., it's possible to migrate a VM that only has disks on storeA with --target-storage storeB:storeB (if storeA exists on the target, of course). However, specifying the identity mapping explicitly is prohibited.

When a target bridge is not present (should that be detected before starting the migration?), and likely for any other startup failure, the only error in the log is:

2021-11-30 14:43:10 ERROR: online migrate failure - error - tunnel command '{"cmd":"star
failed to handle 'start' command - start failed: QEMU exited with code 1

For non-remote migration we are more verbose in this case and log the QEMU output.

Can/should an interrupt be handled more gracefully, so that remote cleanup still happens?
^C
CMD websocket tunnel died: command 'proxmox-websocket-tunnel' failed: interrupted by signal
2021-11-30 14:39:07 ERROR: interrupted by signal
2021-11-30 14:39:07 aborting phase 1 - cleanup resources
2021-11-30 14:39:08 ERROR: writing to tunnel failed: broken pipe
2021-11-30 14:39:08 ERROR: migration aborted (duration 00:00:10): interrupted by signal

> besides lots of rebases, implemented todos and fixed issues the main
> difference to the previous RFC is that we no longer define remote
> entries in a config file, but just expect the caller/client to give us
> all the required information to connect to the remote cluster.
>
> new in v2: dropped parts already applied, incorporated Fabian's and
> Dominik's feedback (thanks!)
>
> overview over affected repos and changes, see individual patches for
> more details.
>
> proxmox-websocket-tunnel:
>
> new tunnel helper tool for forwarding commands and data over websocket
> connections, required by qemu-server on source side
>
> pve-access-control:
>
> new ticket type, required by qemu-server on target side
>
> pve-guest-common:
>
> handle remote migration (no SSH) in AbstractMigrate,
> required by qemu-server
>
> pve-storage:
>
> extend 'pvesm import' to allow import from UNIX socket, required on
> target node by qemu-server
>
> qemu-server:
>
> some refactoring, new mtunnel endpoints, new remote_migration endpoints
> TODO: handle pending changes and snapshots
> TODO: proper CLI for remote migration
> potential TODO: precond endpoint?
>
> pve-http-server:
>
> fix for handling unflushed proxy streams
>
> as usual, some of the patches are best viewed with '-w', especially in
> qemu-server..
>
> required dependencies are noted, qemu-server also requires a build-dep
> on patched pve-common since the required options/formats would be
> missing otherwise..
>
> proxmox-websocket-tunnel
>
> Fabian Grünbichler (4):
>   initial commit
>   add tunnel implementation
>   add fingerprint validation
>   add packaging
>
> pve-access-control
>
> Fabian Grünbichler (2):
>   tickets: add tunnel ticket
>   ticket: normalize path for verification
>
>  src/PVE/AccessControl.pm | 52 ++++++++++++++++++++++++++++++----------
>  1 file changed, 40 insertions(+), 12 deletions(-)
>
> pve-http-server
>
> Fabian Grünbichler (1):
>   webproxy: handle unflushed write buffer
>
>  src/PVE/APIServer/AnyEvent.pm | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> qemu-server
>
> Fabian Grünbichler (8):
>   refactor map_storage to map_id
>   schema: use pve-bridge-id
>   update_vm: allow simultaneous setting of boot-order and dev
>   nbd alloc helper: allow passing in explicit format
>   mtunnel: add API endpoints
>   migrate: refactor remote VM/tunnel start
>   migrate: add remote migration handling
>   api: add remote migrate endpoint
>
>  PVE/API2/Qemu.pm   | 826 ++++++++++++++++++++++++++++++++++++++++++++-
>  PVE/QemuMigrate.pm | 813 ++++++++++++++++++++++++++++++++++++--------
>  PVE/QemuServer.pm  |  80 +++--
>  debian/control     |   2 +
>  4 files changed, 1539 insertions(+), 182 deletions(-)
>
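
Regarding the interrupt question above: ^C delivers SIGINT to the whole foreground process group, which likely explains why the log shows proxmox-websocket-tunnel itself failing with "interrupted by signal", followed by "writing to tunnel failed: broken pipe". A minimal sketch in Python of starting the helper in its own session, so it survives ^C and the source can still send 'quit'. The `{"cmd": ...}` shape is taken from the log above; the one-command-per-line framing, the exit-on-EOF behavior, and the name `run_with_tunnel` are assumptions for illustration only.

```python
import json
import subprocess


def run_with_tunnel(cmd, steps):
    """Hypothetical sketch: keep the tunnel helper alive across ^C so 'quit' can still be sent.

    cmd   -- argv of a tunnel helper reading one JSON command per line (assumed framing)
    steps -- callable doing the migration work; receives send() to issue commands
    """
    # start_new_session=True runs the helper in its own session (and process group),
    # so the SIGINT the terminal delivers to the foreground group on ^C skips it.
    tunnel = subprocess.Popen(cmd, stdin=subprocess.PIPE, text=True,
                              start_new_session=True)

    def send(command):
        # Compact JSON, one object per line, e.g. {"cmd":"quit"}
        tunnel.stdin.write(json.dumps({"cmd": command}, separators=(",", ":")) + "\n")
        tunnel.stdin.flush()

    try:
        steps(send)
    except KeyboardInterrupt:
        # Python raises KeyboardInterrupt in the main thread on SIGINT; the helper is
        # still running, so the remote side can be told to clean up before re-raising.
        send("quit")
        raise
    finally:
        tunnel.stdin.close()  # EOF; assumes the helper exits when its input ends
        tunnel.wait()
```

The trade-off is that the helper no longer dies together with the terminal's process group, so the source must always shut it down itself, which the `finally` block does by closing its stdin.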