From: Denis Kanchev <denis.kanchev@storpool.com>
Date: Mon, 2 Jun 2025 16:23:27 +0300
To: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Wolfgang Bumiller <w.bumiller@proxmox.com>
Subject: Re: [pve-devel] PVE child process behavior question

We try to prevent a volume from being active on two nodes at the same time, as that may lead to data corruption, so we detach the volume from all nodes (except the target one) via our shared storage system. In the sub activate_volume(), our logic is to not detach the volume from other hosts in the case of a migration, because activate_volume() can also be called in other cases where detaching is necessary. But in this case, where the `qm start` process is killed, the migration is marked as failed, yet activate_volume() is still called on the destination host after migrate_cancel (we check whether the "lock" flag is set to 'migrate'). That is why I proposed killing the child processes when the parent one dies - it would prevent such cases. I am not sure whether passing an extra argument (marking the call as part of a migration) to activate_volume() would solve this issue as well.
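For reference, the "lock" check described above looks roughly like this - a simplified sketch, not our actual plugin code; the sp_* helpers are illustrative stand-ins for the shared-storage calls:

    use PVE::QemuConfig;

    # Simplified sketch of the activate_volume() logic described above.
    # sp_detach_from_other_nodes() and sp_attach_here() are illustrative
    # helpers, not real StorPool plugin functions.
    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        my ($vmid) = $volname =~ /vm-(\d+)-/;            # owning VM id from the volume name
        my $conf = PVE::QemuConfig->load_config($vmid);  # current VM config

        if (($conf->{lock} // '') eq 'migrate') {
            # live migration in progress: the volume must stay attached on
            # the source node, so do not detach it from other hosts
        } else {
            sp_detach_from_other_nodes($scfg, $volname); # illustrative helper
        }
        sp_attach_here($scfg, $volname);                 # illustrative helper
        return 1;
    }

As for the child-process proposal, the behavior I have in mind is what Linux offers via PR_SET_PDEATHSIG. A minimal sketch (Linux-only; syscall number 157 is prctl on x86_64, constants hardcoded for brevity):

    use strict;
    use warnings;

    use constant {
        SYS_prctl        => 157,   # prctl() syscall number on x86_64
        PR_SET_PDEATHSIG => 1,
        SIGKILL          => 9,
    };

    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        # child: ask the kernel to deliver SIGKILL the moment the parent
        # dies, which also covers the parent being OOM-killed
        syscall(SYS_prctl, PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0) != -1
            or die "prctl(PR_SET_PDEATHSIG) failed: $!";
        # ... worker code (the qm start task in our case) would run here ...
        exit 0;
    }
    waitpid($pid, 0);  # parent waits as usual

One caveat: the death signal fires when the forking thread exits, and it is cleared across exec of setuid binaries, so it would need some care where the workers are forked.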
PVE::Storage::Custom::StorPoolPlugin::activate_volume("PVE::Storage:= :Custom::StorPoolPlugin", "autotest__ec2_1", HASH(0x559cd06d88a0), "vm-101-disk-0-sp-z.b.df.raw", undef, HASH(0x559cd076b9a8)) called at /usr/share/perl5/PVE/Storage.pm line 1309 PVE::Storage::activate_volumes(HASH(0x559cc99d04e0), ARRAY(0x559cd0754558)) called at /usr/share/perl5/PVE/QemuServer.pm line 5823 PVE::QemuServer::vm_start_nolock(HASH(0x559cc99d04e0), 101, HASH(0x559cd0730ca0), HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/QemuServer.pm line 5592 PVE::QemuServer::__ANON__() called at /usr/share/perl5/PVE/AbstractConfig.pm line 299 PVE::AbstractConfig::__ANON__() called at /usr/share/perl5/PVE/Tools.pm line 259 eval {...} called at /usr/share/perl5/PVE/Tools.pm line 259 PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf", 10, 0, CODE(0x559ccf14b968)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 302 PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 322 PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 330 PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/QemuServer.pm line 5593 PVE::QemuServer::vm_start(HASH(0x559cc99d04e0), 101, HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3259 PVE::API2::Qemu::__ANON__("UPID:lab-dk-2:00296227:0ADF72E0:683DA11F:= qmstart:101:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 620 eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 611 PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=3DHASH(0x559cc= 99d0558), "qmstart", 101, "root\@pam", CODE(0x559cd06cc160)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3263 PVE::API2::Qemu::__ANON__(HASH(0x559cd0700df8)) called at /usr/share/perl5/PVE/RESTHandler.pm line 499 PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x559cd05deb98), HASH(0x559cd0700df8), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 985 eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 968 PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start", "vm_start", ARRAY(0x559cc99cfee0), ARRAY(0x559cd0745e98), HASH(0x559cd0745ef8), CODE(0x559cd07091f8), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 594 PVE::CLIHandler::__ANON__(ARRAY(0x559cc99d00c0), undef, CODE(0x559cd07091f8)) called at /usr/share/perl5/PVE/CLIHandler.pm line 673 PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at /usr/sbin/qm line 8 On Mon, Jun 2, 2025 at 2:42=E2=80=AFPM Fabian Gr=C3=BCnbichler < f.gruenbichler@proxmox.com> wrote: > > > Denis Kanchev <denis.kanchev@storpool.com> hat am 02.06.2025 11:18 CEST > geschrieben: > > > > > > My bad :) in terms of Proxmox it must be hand-overing the storage > control - the storage plugin function activate_volume() is called in our > case, which moves the storage to the new VM. > > So no data is moved across the nodes and only the volumes get > re-attached. > > Thanks for the plentiful information > > okay! > > so you basically special case this "volume is active on two nodes" case > which should only happen during a live migration, and that somehow runs > into an issue if the migration is aborted because there is some suspected > race somewhere? 
>
> as part of a live migration, the sequence should be:
>
> node A: migration starts
> node A: start request for target VM on node B (over SSH)
> node B: `qm start ..` is called
> node B: qm start will activate volumes
> node B: qm start returns
> node A: migration starts
> node A/B: some fatal error
> node A: cancel migration (via QMP/the source VM running on node A)
> node A: request to stop target VM on node B (over SSH)
> node B: `qm stop ..` called
> node B: qm stop will deactivate volumes
>
> I am not sure where another activate_volume call after node A has started
> the migration could happen? at that point, node A still has control over
> the VM (ID), so nothing in PVE should operate on it other than the
> selective calls made as part of the migration, which are basically only
> querying migration status and error handling at that point..
>
> it would still be good to know what actually got OOM-killed in your case..
> was it the `qm start`? was it the `kvm` process itself? something entirely
> else?
>
> if you can reproduce the issue, you could also add logging in
> activate_volume to find out the exact call path (e.g., log the call stack
> somewhere), maybe that helps find the exact scenario that you are seeing..
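As a side note on the logging suggestion: a call-stack dump like the trace at the top of this mail can be produced from inside activate_volume() with Carp. A minimal helper might look like this (the log path is illustrative):

    use Carp qw(longmess);
    use POSIX qw(strftime);

    # Append a message plus the full call stack to a log file.
    # The log path is illustrative.
    sub log_call_stack {
        my ($msg) = @_;
        my $ts = strftime('%Y-%m-%d %H:%M:%S', localtime);
        if (open(my $fh, '>>', '/var/log/sp-activate-trace.log')) {
            print $fh "$ts [$$] " . longmess($msg);
            close($fh);
        }
    }

    # e.g. inside activate_volume():
    # log_call_stack("activate_volume: storeid '$storeid', volname '$volname'");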