From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 6A2E91FF191
	for <inbox@lore.proxmox.com>; Mon,  2 Jun 2025 13:43:14 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 9D0DB315C7;
	Mon,  2 Jun 2025 13:43:30 +0200 (CEST)
Date: Mon, 2 Jun 2025 13:42:57 +0200 (CEST)
From: =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>
To: Denis Kanchev <denis.kanchev@storpool.com>
Message-ID: <2141074266.768.1748864577569@webmail.proxmox.com>
In-Reply-To: <CAHXTzu=AiNx0iTWFEUU2kdzx9-RopwLc7rqGui6f0Q=+Hy52=w@mail.gmail.com>
References: <mailman.538.1747833190.394.pve-devel@lists.proxmox.com>
 <1283184248.17536.1747895442851@webmail.proxmox.com>
 <857cbd6c-6866-417d-a71f-f5b5297bf09c@storpool.com>
 <1349127939.17705.1747902137180@webmail.proxmox.com>
 <CAHXTzuk7tYRJV_j=88RWc3R3C7AkiEdFUXi88m5qwnDeYDEC+A@mail.gmail.com>
 <11746909.21389.1748414016786@webmail.proxmox.com>
 <CAHXTzumXeyJQQCj+45Hmy5qdU+BTFBYbHVgPy0u3VS-qS=_bDQ@mail.gmail.com>
 <1695649345.530.1748849837156@webmail.proxmox.com>
 <CAHXTzukAMG9050Ynn-KRSqhCz2Y0m6vnAQ7FEkCmEdQT3HapfQ@mail.gmail.com>
 <1233617227.683.1748854174885@webmail.proxmox.com>
 <CAHXTzu=AiNx0iTWFEUU2kdzx9-RopwLc7rqGui6f0Q=+Hy52=w@mail.gmail.com>
MIME-Version: 1.0
X-Priority: 3
Importance: Normal
X-Mailer: Open-Xchange Mailer v7.10.6-Rev78
X-Originating-Client: open-xchange-appsuite
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.045 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] PVE child process behavior question
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>,
 Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>


> Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 at 11:18 CEST:
> 
> 
> My bad :) in Proxmox terms it must be handing over storage control - the storage plugin function activate_volume() is called in our case, which moves the storage to the new VM.
> So no data is moved across the nodes, only the volumes get re-attached.
> Thanks for the plentiful information

okay!

so you basically special-case this "volume is active on two nodes" situation, which should only happen during a live migration, and that somehow runs into an issue when the migration is aborted, because of a suspected race somewhere?

as part of a live migration, the sequence should be:

node A: migration starts
node A: start request for target VM on node B (over SSH)
node B: `qm start ..` is called
node B: qm start will activate volumes (see the sketch after this sequence)
node B: qm start returns
node A: actual migration of the VM state starts (via QMP)
node A/B: some fatal error
node A: cancel migration (via QMP/the source VM running on node A)
node A: request to stop target VM on node B (over SSH)
node B: `qm stop ..` called
node B: qm stop will deactivate volumes
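
to illustrate (a rough sketch, not the exact code path - storage/volume names are placeholders): the "activate volumes" / "deactivate volumes" steps above go through PVE::Storage, which then dispatches to your plugin's activate_volume/deactivate_volume for each volume:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use PVE::Storage;

    my $cfg    = PVE::Storage::config();       # parsed /etc/pve/storage.cfg
    my @volids = ('mystore:vm-100-disk-0');    # placeholder volume ID

    # roughly what `qm start` does for the VM's disks on node B:
    PVE::Storage::activate_volumes($cfg, \@volids);

    # ... VM runs / migration happens / gets aborted ...

    # roughly what `qm stop` does during cleanup on node B:
    PVE::Storage::deactivate_volumes($cfg, \@volids);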

I am not sure where another activate_volume call could happen after node A has started the migration? at that point, node A still has control over the VM (ID), so nothing in PVE should operate on it other than the select calls made as part of the migration, which at that point are basically just querying the migration status and error handling..

it would still be good to know what actually got OOM-killed in your case.. was it the `qm start`? was it the `kvm` process itself? something else entirely?

if you can reproduce the issue, you could also add logging in activate_volume to find out the exact call path (e.g., log the call stack somewhere) - maybe that helps pin down the exact scenario you are seeing..
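
a minimal sketch of that logging, assuming you patch your plugin directly (the package name is a placeholder, the method uses the standard plugin API signature, and the rest of the body is elided):

    package Hypothetical::StorPoolPlugin;

    use strict;
    use warnings;

    use Carp ();

    use base qw(PVE::Storage::Plugin);

    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # Carp::longmess() returns the message plus the full call stack;
        # warn() sends it to STDERR, so it ends up in the task log/journal
        warn Carp::longmess("activate_volume($storeid:$volname) called");

        # ... actual activation logic of the plugin continues here ...

        return 1;
    }

    1;

that way each activation shows up together with the code path that triggered it (qm start, migration code, something else), which should make the unexpected call visible.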


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel