Message-ID: <8fa7891a-0ed8-46b4-8006-456d307aaa1a@proxmox.com>
Date: Fri, 1 Dec 2023 10:57:14 +0100
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: pve-devel@lists.proxmox.com
References: <20230126083214.711099-1-f.weber@proxmox.com>
 <20230126083214.711099-3-f.weber@proxmox.com>
 <no64qbjndkwmnn62jmp5siavnwkluoalgmgiyefdxt3ibsz6h2@khuefkefgjw7>
From: Friedrich Weber <f.weber@proxmox.com>
In-Reply-To: <no64qbjndkwmnn62jmp5siavnwkluoalgmgiyefdxt3ibsz6h2@khuefkefgjw7>
Subject: Re: [pve-devel] [RFC container 2/4] fix #4474: lxc api: add
 overrule-shutdown parameter to stop endpoint

Thanks for looking into this!

On 17/11/2023 14:09, Wolfgang Bumiller wrote:
[...]
>>  	    return PVE::LXC::Config->lock_config($vmid, $lockcmd);
> 
> ^ Here we lock first, then fork the worker, then do `vm_stop` with the
> config lock inherited.
> 
> This means that creating multiple shutdown tasks before using one with
> override=true could cause the override task to cancel the *first* ongoing
> shutdown task, then move on to the `lock_config` call - in the meantime
> a second shutdown task acquires this very lock and performs another
> long-running shutdown, causing the `override` parameter to be
> ineffective.
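
For reference, the current shape as I read it is roughly the following
(names approximate, not the literal pve-container code):

    my $lockcmd = sub {
        my $realcmd = sub {
            # runs with the config lock inherited from the parent
            PVE::LXC::vm_stop($vmid, 1);
        };
        return $rpcenv->fork_worker('vzstop', $vmid, $authuser, $realcmd);
    };
    # the lock is taken *before* the worker is forked
    return PVE::LXC::Config->lock_config($vmid, $lockcmd);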

Just to make sure I understand correctly, the scenario is (please
correct me if I'm wrong):

* shutdown task #1 takes the lock and starts a long-running shutdown
* the stop API handler with override kills shutdown task #1, but has
not acquired the lock yet
* shutdown task #2 starts, acquires the lock and starts another
long-running shutdown
* the stop task waits for the lock => the override flag was ineffective
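
A self-contained toy of that interleaving (plain flock() standing in
for the config lock, sleep() forcing the ordering; nothing here is
actual PVE code):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);

    $| = 1;    # unbuffered output so the interleaving is visible
    my $lockfile = '/tmp/ct-config.lock';

    sub with_lock {
        my ($who, $code) = @_;
        open(my $fh, '>>', $lockfile) or die "open: $!";
        print "$who: waiting for config lock\n";
        flock($fh, LOCK_EX) or die "flock: $!";
        print "$who: got config lock\n";
        $code->();
        close($fh);    # releases the lock
    }

    # shutdown task #1: grabs the lock, then "shuts down" for a while
    my $t1 = fork() // die "fork: $!";
    if (!$t1) { with_lock('shutdown #1', sub { sleep 4 }); exit 0; }
    sleep 1;

    # shutdown task #2: queued behind #1 on the same lock
    my $t2 = fork() // die "fork: $!";
    if (!$t2) { with_lock('shutdown #2', sub { sleep 4 }); exit 0; }
    sleep 1;

    # stop with override: kills #1 first, takes the lock only afterwards...
    print "stop/override: killing shutdown #1\n";
    kill 'KILL', $t1;
    # ...so #2, already blocked in flock(), wins the lock and the stop
    # task ends up waiting behind another long-running shutdown anyway
    with_lock('stop/override', sub { print "stop/override: vm_stop\n" });
    waitpid($_, 0) for ($t1, $t2);

With these timings, the stop/override process should consistently get
the lock last, i.e. only after shutdown #2 has completed.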

> We should switch the ordering here: first fork the worker, then lock.
> (¹ And your new chunk would go into the worker as well)
> 
> Unless I'm missing something, but AFAICT the current ordering there is
> rather ... bad :-)

Would this actually prevent the scenario above? We cannot put my new
chunk into the locked section (because then it couldn't kill an active
shutdown task that has the lock), but if we put it into the worker
before the locked section, couldn't the same thing as above happen?
That is, the stop task with override kills active shutdown tasks but
doesn't hold the lock yet, a new shutdown task acquires the lock, makes
the stop task wait for it, and renders the override flag ineffective
just the same?
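
Concretely, I read the suggested ordering as roughly this (again
approximate; kill_active_shutdown_tasks() is just a placeholder for my
new chunk):

    my $realcmd = sub {
        # must stay outside the locked section: an active shutdown
        # task is itself holding the config lock
        kill_active_shutdown_tasks($vmid) if $overrule;

        # <-- window: a fresh shutdown task can grab the lock here

        PVE::LXC::Config->lock_config($vmid, sub {
            PVE::LXC::vm_stop($vmid, 1);
        });
    };
    return $rpcenv->fork_worker('vzstop', $vmid, $authuser, $realcmd);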