From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Denis Kanchev <denis.kanchev@storpool.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>, Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Date: Mon, 2 Jun 2025 13:42:57 +0200 (CEST)
Subject: Re: [pve-devel] PVE child process behavior question

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 11:18 CEST:
>
> My bad :) in terms of Proxmox it must be handing over the storage control - the storage plugin function activate_volume() is called in our case, which moves the storage to the new VM.
> So no data is moved across the nodes and only the volumes get re-attached.
> Thanks for the plentiful information
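for context, the hook described above is the storage plugin's activate_volume method; in a custom plugin it has roughly the following shape (a sketch only - the ExamplePlugin package name and the storpool_reattach_volume helper are made-up placeholders, not the actual StorPool code):

    package PVE::Storage::Custom::ExamplePlugin;  # placeholder name, not the real plugin

    use strict;
    use warnings;

    use base qw(PVE::Storage::Plugin);

    # PVE calls activate_volume before a volume is used on a node (e.g. from
    # `qm start` on the migration target); a shared-storage plugin typically
    # (re)attaches the volume to the local node here - no data gets copied
    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        storpool_reattach_volume($scfg, $volname, $snapname);

        return 1;
    }

    # made-up placeholder for the backend-specific hand-over logic
    sub storpool_reattach_volume {
        my ($scfg, $volname, $snapname) = @_;
        warn "would re-attach volume '$volname' to this node\n";
    }

    1;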
okay!

so you basically special-case this "volume is active on two nodes" situation, which should only happen during a live migration, and that somehow runs into an issue if the migration is aborted because there is some suspected race somewhere?

as part of a live migration, the sequence should be:

node A: migration starts
node A: start request for target VM on node B (over SSH)
node B: `qm start ..` is called
node B: qm start will activate volumes
node B: qm start returns
node A: migration starts (the actual transfer of VM state)
node A/B: some fatal error
node A: cancel migration (via QMP/the source VM running on node A)
node A: request to stop target VM on node B (over SSH)
node B: `qm stop ..` called
node B: qm stop will deactivate volumes

I am not sure where another activate_volume call after node A has started the migration could happen? at that point, node A still has control over the VM (ID), so nothing in PVE should operate on it other than the selective calls made as part of the migration, which are basically only querying migration status and error handling at that point..

it would still be good to know what actually got OOM-killed in your case.. was it the `qm start`? was it the `kvm` process itself? something entirely else?

if you can reproduce the issue, you could also add logging in activate_volume to find out the exact call path (e.g., log the call stack somewhere), maybe that helps find the exact scenario that you are seeing..
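something along these lines could work (a minimal sketch, assuming a plugin derived from PVE::Storage::Plugin; the method signature and where the output ends up may differ for your plugin):

    use Carp ();

    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # temporary debug aid: warn() output should show up on stderr / in the
        # task log, and Carp::longmess() appends the full call stack, so the
        # exact call path (migration code, qm start, something else entirely)
        # becomes visible
        warn Carp::longmess("activate_volume($storeid:$volname) called");

        # ... existing activation logic of the plugin ...
    }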