From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 2D6BB1FF191
	for <inbox@lore.proxmox.com>; Mon,  2 Jun 2025 15:23:58 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 9730533E51;
	Mon,  2 Jun 2025 15:24:14 +0200 (CEST)
References: <mailman.538.1747833190.394.pve-devel@lists.proxmox.com>
 <1283184248.17536.1747895442851@webmail.proxmox.com>
 <857cbd6c-6866-417d-a71f-f5b5297bf09c@storpool.com>
 <1349127939.17705.1747902137180@webmail.proxmox.com>
 <CAHXTzuk7tYRJV_j=88RWc3R3C7AkiEdFUXi88m5qwnDeYDEC+A@mail.gmail.com>
 <11746909.21389.1748414016786@webmail.proxmox.com>
 <CAHXTzumXeyJQQCj+45Hmy5qdU+BTFBYbHVgPy0u3VS-qS=_bDQ@mail.gmail.com>
 <1695649345.530.1748849837156@webmail.proxmox.com>
 <CAHXTzukAMG9050Ynn-KRSqhCz2Y0m6vnAQ7FEkCmEdQT3HapfQ@mail.gmail.com>
 <1233617227.683.1748854174885@webmail.proxmox.com>
 <CAHXTzu=AiNx0iTWFEUU2kdzx9-RopwLc7rqGui6f0Q=+Hy52=w@mail.gmail.com>
 <2141074266.768.1748864577569@webmail.proxmox.com>
In-Reply-To: <2141074266.768.1748864577569@webmail.proxmox.com>
Date: Mon, 2 Jun 2025 16:23:27 +0300
To: =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>
MIME-Version: 1.0
Message-ID: <mailman.167.1748870653.395.pve-devel@lists.proxmox.com>
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Post: <mailto:pve-devel@lists.proxmox.com>
From: Denis Kanchev via pve-devel <pve-devel@lists.proxmox.com>
Precedence: list
Cc: Denis Kanchev <denis.kanchev@storpool.com>,
 Wolfgang Bumiller <w.bumiller@proxmox.com>,
 Proxmox VE development discussion <pve-devel@lists.proxmox.com>
X-Mailman-Version: 2.1.29
X-BeenThere: pve-devel@lists.proxmox.com
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
Subject: Re: [pve-devel] PVE child process behavior question
Content-Type: multipart/mixed; boundary="===============0207705269311310791=="
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

--===============0207705269311310791==
Content-Type: message/rfc822
Content-Disposition: inline

Return-Path: <denis.kanchev@storpool.com>
X-Original-To: pve-devel@lists.proxmox.com
Delivered-To: pve-devel@lists.proxmox.com
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits))
	(No client certificate requested)
	by lists.proxmox.com (Postfix) with ESMTPS id 05C73CB074
	for <pve-devel@lists.proxmox.com>; Mon,  2 Jun 2025 15:24:13 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id D956E33E38
	for <pve-devel@lists.proxmox.com>; Mon,  2 Jun 2025 15:24:12 +0200 (CEST)
Received: from mail-yw1-x1143.google.com (mail-yw1-x1143.google.com [IPv6:2607:f8b0:4864:20::1143])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by firstgate.proxmox.com (Proxmox) with ESMTPS
	for <pve-devel@lists.proxmox.com>; Mon,  2 Jun 2025 15:24:11 +0200 (CEST)
Received: by mail-yw1-x1143.google.com with SMTP id 00721157ae682-70e447507a0so35379247b3.0
        for <pve-devel@lists.proxmox.com>; Mon, 02 Jun 2025 06:24:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=storpool.com; s=google; t=1748870644; x=1749475444; darn=lists.proxmox.com;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=bsZUVkW+uvkh+Cy920ab/ZJDj08oiwiohk0sDWI6Izk=;
        b=B/lG6Qc+iqSdjAV++EawCHXzn+BVkhzOmC/WsCNFdQG6FD5lYjasfmcXZZnGvyd2vq
         ITUF4PyKO6mBhTzNSL4uqx/65+tXSIj7InEmtb9hVmBJzDy3EE8pPlcAxHlmSBm8hjIy
         SVbQRbSHggw82ML+l0l9cZujlFZ4/nm0LhAzvBFe0eiVcpYlP3e+huZdoHUJ7OuBNA/9
         VWinchCJsz5o7+xWyqbL/YJVdcisyzqrwuEg0pwLIxZdw1XflntHsa4bgx7jrdIY/hEy
         VJV0FYBdAKzjoNMqJQ6/yGJh8ED3StRE48kUAwT99noJr9UWgh2upFJlutYN4hDVn+B6
         CuBA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1748870644; x=1749475444;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=bsZUVkW+uvkh+Cy920ab/ZJDj08oiwiohk0sDWI6Izk=;
        b=g0dJ1vHryD1YlmCX4875I7t5DW+DKmBx2/SJYrhl07J0b0XwYZUS94rmh1KY5fjfqN
         rJZSUV4lxmqzXoy9EwQ/5oxgxPxYsIpOXo9JkRA/P/rQRw8ci8ETbmEAzd6s+6U4PuQH
         rQYe06DGZ22mZw8c/8kSNj6hWR47g8XAgJ0CIBEfgE2VDn5Q7TEFnKbtRUMAKLUegpVZ
         DWIPpfdQ79kCVUR4mMhIVcIQlQa+ygB9maV7bAAE8M8rVwMiLH1kfAqz7OEJ6PGIN6GG
         Ve7hrGxXf86z/MyETh9Br3xmZndWoTyLPYX47QEte3AyutZuGkmSz5eZ7yjvmS54F32p
         fZmQ==
X-Gm-Message-State: AOJu0YziplupIim9Jjrd26SyfgO1GBSu4hHUkFCTf2+n2SvMedoB1C9a
	XBscKSxfn8iZb7ObmdzJ8O8jrWVCNTSqSlGh0BvcxPgcVcFweOdL3k+Bspy1/kfyUBKZn73ZDOf
	UkkvWNThPpDJ69+bwA7x56Y0lpciBVp2mtGkVWg/peA==
X-Gm-Gg: ASbGncvZ3/JqVTAqHLzjmKC8Kp8NxnYH8AW7v2jukzMt1lDwNSqmbRo8f7FMmHhEApV
	x9cWYE8coIH5AU5wQtiKAz3w1ZtZkkVRNytGPS1t8cZInuixJdc38CXZknPdyqBRvM9Vnm0CIcP
	hqd3EeMwZkmja2UhLFQrHfv47gXHuiK1RWwlteF6wKBOmg
X-Google-Smtp-Source: AGHT+IEv2Op8oGcEeLWbRbzMb4irkpgHqEoCYRKlzR41OksdWA+pyNkhiTbxFE1yCbck93a7X9pIWV8AXh6P0F3QzYw=
X-Received: by 2002:a05:690c:48c7:b0:70e:731f:d4c7 with SMTP id
 00721157ae682-7104f18b517mr151596957b3.8.1748870643929; Mon, 02 Jun 2025
 06:24:03 -0700 (PDT)
MIME-Version: 1.0
References: <mailman.538.1747833190.394.pve-devel@lists.proxmox.com>
 <1283184248.17536.1747895442851@webmail.proxmox.com> <857cbd6c-6866-417d-a71f-f5b5297bf09c@storpool.com>
 <1349127939.17705.1747902137180@webmail.proxmox.com> <CAHXTzuk7tYRJV_j=88RWc3R3C7AkiEdFUXi88m5qwnDeYDEC+A@mail.gmail.com>
 <11746909.21389.1748414016786@webmail.proxmox.com> <CAHXTzumXeyJQQCj+45Hmy5qdU+BTFBYbHVgPy0u3VS-qS=_bDQ@mail.gmail.com>
 <1695649345.530.1748849837156@webmail.proxmox.com> <CAHXTzukAMG9050Ynn-KRSqhCz2Y0m6vnAQ7FEkCmEdQT3HapfQ@mail.gmail.com>
 <1233617227.683.1748854174885@webmail.proxmox.com> <CAHXTzu=AiNx0iTWFEUU2kdzx9-RopwLc7rqGui6f0Q=+Hy52=w@mail.gmail.com>
 <2141074266.768.1748864577569@webmail.proxmox.com>
In-Reply-To: <2141074266.768.1748864577569@webmail.proxmox.com>
From: Denis Kanchev <denis.kanchev@storpool.com>
Date: Mon, 2 Jun 2025 16:23:27 +0300
X-Gm-Features: AX0GCFu3Fa6yXJ8aOOOm1aNgX4yrS1qSn0Gpp2Vrn0zOqOGgdSBU6c4XBXWoVuE
Message-ID: <CAHXTzu=qrZe2eEZro7qteR=fDjJQX13syfB9fs5VfFbG7Vy6vQ@mail.gmail.com>
Subject: Re: [pve-devel] PVE child process behavior question
To: =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Wolfgang Bumiller <w.bumiller@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DKIM_SIGNED               0.1 Message has a DKIM or DK signature, not necessarily valid
	DKIM_VALID               -0.1 Message has at least one valid DKIM or DK signature
	DKIM_VALID_AU            -0.1 Message has a valid DKIM or DK signature from author's domain
	DKIM_VALID_EF            -0.1 Message has a valid DKIM or DK signature from envelope-from domain
	DMARC_PASS               -0.1 DMARC pass policy
	HTML_MESSAGE            0.001 HTML included in message
	RCVD_IN_DNSWL_NONE     -0.0001 Sender listed at https://www.dnswl.org/, no trust
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29

We try to prevent a volume from being active on two nodes at once, as that
may lead to data corruption, so we detach the volume from all nodes (except
the target one) via our shared storage system.
In the sub activate_volume() our logic is to not detach the volume from
other hosts in case of a migration, because activate_volume() can also be
called in other cases, where detaching is necessary.
But in this case, where the qm start process is killed, the migration is
marked as failed and activate_volume() is still called on the destination
host after migration_cancel (we detect a migration by checking that the
"lock" flag is set to "migrate").
That's why I proposed that child processes be killed when the parent one
dies: it would prevent such cases.
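
As an illustration of that proposal (this is not PVE code): on Linux a
child process can ask the kernel to signal it when its parent exits, using
prctl(PR_SET_PDEATHSIG). Below is a minimal sketch, written in Python only
to keep it short and self-contained. die_with_parent() and spawn_worker()
are made-up names, and a Perl worker would have to reach prctl() by some
other route.

```python
import ctypes
import os
import signal
import sys

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>


def die_with_parent(sig=signal.SIGTERM):
    """Ask the kernel to send `sig` to this process when its parent exits.

    Hypothetical helper, Linux only. Returns True on success, False otherwise.
    """
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.prctl(PR_SET_PDEATHSIG, int(sig), 0, 0, 0) == 0


def spawn_worker(work):
    """Fork a worker that gets signalled if the forking process dies."""
    parent_pid = os.getpid()
    pid = os.fork()
    if pid != 0:
        return pid  # parent: hand back the child's PID
    status = 1
    try:
        die_with_parent()
        # Close the race where the parent died before prctl() took effect.
        if os.getppid() == parent_pid:
            work()
            status = 0
    finally:
        # Never fall back into the parent's code path from the child.
        os._exit(status)
```

Two caveats from prctl(2): the signal is tied to the exit of the thread
that created the child, and the setting is cleared in the child's own
fork() children, so every level of a worker tree would have to set it.
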
I am not sure whether passing an extra argument (marking the call as a
migration) to activate_volume() would solve this issue too.
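
To make the two variants concrete, here is a simplified model of the
decision described above. It is Python rather than the plugin's Perl, it
is not the actual StorPool plugin code, and nodes_to_detach() with its
parameters is a hypothetical name. Today the migration case is inferred
from the "lock" flag; the explicit `migration` argument stands for the
extra argument mentioned above.

```python
def nodes_to_detach(target_node, attached_nodes, vm_lock=None, migration=None):
    """Return the nodes a volume should be detached from before it is
    attached on target_node.

    Hypothetical model of the logic described in this mail:
    - vm_lock:   value of the VM config "lock" flag (what is checked today)
    - migration: proposed explicit flag; when given, it overrides the
                 guess based on vm_lock
    """
    if migration is None:
        # Current behaviour: infer the migration case from the lock flag.
        migration = (vm_lock == "migrate")
    if migration:
        # Live migration: the source node must keep its attachment.
        return []
    # Any other activation: make the target node the only one attached.
    return sorted(n for n in attached_nodes if n != target_node)


# Migration inferred from the lock: nothing is detached.
print(nodes_to_detach("node-b", {"node-a"}, vm_lock="migrate"))
# Plain start: every other node is detached.
print(nodes_to_detach("node-b", {"node-a", "node-b"}))
# Explicit flag wins over a lock that is still set to "migrate".
print(nodes_to_detach("node-b", {"node-a"}, vm_lock="migrate", migration=False))
```

Whether the caller could set such a flag reliably once the migration has
been cancelled is exactly the open question.
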
Here is a trace log of activate_volume() in case of migration.

2025-05-02 13:03:28.2222 [2712103] took 0.0006: activate_volume: storeid 'autotest__ec2_1', scfg {'type' => 'storpool','shared' => 1,'template' => 'autotest__ec2_1','extra-tags' => 'tier=high','content' => {'iso' => 1,'images' => 1}}, volname 'vm-101-disk-0-sp-z.b.df.raw', exclusive undef at /usr/share/perl5/PVE/Storage/Custom/StorPoolPlugin.pm line 1551.
       PVE::Storage::Custom::StorPoolPlugin::activate_volume("PVE::Storage::Custom::StorPoolPlugin", "autotest__ec2_1", HASH(0x559cd06d88a0), "vm-101-disk-0-sp-z.b.df.raw", undef, HASH(0x559cd076b9a8)) called at /usr/share/perl5/PVE/Storage.pm line 1309
       PVE::Storage::activate_volumes(HASH(0x559cc99d04e0), ARRAY(0x559cd0754558)) called at /usr/share/perl5/PVE/QemuServer.pm line 5823
       PVE::QemuServer::vm_start_nolock(HASH(0x559cc99d04e0), 101, HASH(0x559cd0730ca0), HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/QemuServer.pm line 5592
       PVE::QemuServer::__ANON__() called at /usr/share/perl5/PVE/AbstractConfig.pm line 299
       PVE::AbstractConfig::__ANON__() called at /usr/share/perl5/PVE/Tools.pm line 259
       eval {...} called at /usr/share/perl5/PVE/Tools.pm line 259
       PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf", 10, 0, CODE(0x559ccf14b968)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 302
       PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 322
       PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 330
       PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/QemuServer.pm line 5593
       PVE::QemuServer::vm_start(HASH(0x559cc99d04e0), 101, HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3259
       PVE::API2::Qemu::__ANON__("UPID:lab-dk-2:00296227:0ADF72E0:683DA11F:qmstart:101:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 620
       eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 611
       PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x559cc99d0558), "qmstart", 101, "root\@pam", CODE(0x559cd06cc160)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3263
       PVE::API2::Qemu::__ANON__(HASH(0x559cd0700df8)) called at /usr/share/perl5/PVE/RESTHandler.pm line 499
       PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x559cd05deb98), HASH(0x559cd0700df8), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 985
       eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 968
       PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start", "vm_start", ARRAY(0x559cc99cfee0), ARRAY(0x559cd0745e98), HASH(0x559cd0745ef8), CODE(0x559cd07091f8), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 594
       PVE::CLIHandler::__ANON__(ARRAY(0x559cc99d00c0), undef, CODE(0x559cd07091f8)) called at /usr/share/perl5/PVE/CLIHandler.pm line 673
       PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at /usr/sbin/qm line 8


On Mon, Jun 2, 2025 at 2:42 PM Fabian Grünbichler <f.gruenbichler@proxmox.com> wrote:

>
> > On 02.06.2025 11:18 CEST, Denis Kanchev <denis.kanchev@storpool.com> wrote:
> >
> >
> > My bad :) in terms of Proxmox it must be handing over the storage
> > control - the storage plugin function activate_volume() is called in our
> > case, which moves the storage to the new VM.
> > So no data is moved across the nodes and only the volumes get
> > re-attached.
> > Thanks for the plentiful information
>
> okay!
>
> so you basically special case this "volume is active on two nodes" case
> which should only happen during a live migration, and that somehow runs
> into an issue if the migration is aborted because there is some suspected
> race somewhere?
>
> as part of a live migration, the sequence should be:
>
> node A: migration starts
> node A: start request for target VM on node B (over SSH)
> node B: `qm start ..` is called
> node B: qm start will activate volumes
> node B: qm start returns
> node A: migration starts
> node A/B: some fatal error
> node A: cancel migration (via QMP/the source VM running on node A)
> node A: request to stop target VM on node B (over SSH)
> node B: `qm stop ..` called
> node B: qm stop will deactivate volumes
>
> I am not sure where another activate_volume call after node A has started
> the migration could happen? at that point, node A still has control over
> the VM (ID), so nothing in PVE should operate on it other than the
> selective calls made as part of the migration, which are basically only
> querying migration status and error handling at that point..
>
> it would still be good to know what actually got OOM-killed in your case..
> was it the `qm start`? was it the `kvm` process itself? something entirely
> else?
>
> if you can reproduce the issue, you could also add logging in
> activate_volume to find out the exact call path (e.g., log the call stack
> somewhere), maybe that helps find the exact scenario that you are seeing..
>
>

--===============0207705269311310791==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

--===============0207705269311310791==--