From: Friedrich Weber
To: Wolfgang Bumiller
Cc: pve-devel@lists.proxmox.com
Date: Tue, 2 Jan 2024 14:34:05 +0100
Subject: Re: [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint
References: <20230126083214.711099-1-f.weber@proxmox.com> <20230126083214.711099-3-f.weber@proxmox.com> <8fa7891a-0ed8-46b4-8006-456d307aaa1a@proxmox.com>
In-Reply-To: <8fa7891a-0ed8-46b4-8006-456d307aaa1a@proxmox.com>

On 01/12/2023 10:57, Friedrich Weber wrote:
> On 17/11/2023 14:09, Wolfgang Bumiller wrote:
> [...]
>>> return PVE::LXC::Config->lock_config($vmid, $lockcmd);
>>
>> ^ Here we lock first, then fork the worker, then do `vm_stop` with the
>> config lock inherited.
>>
>> This means that creating multiple shutdown tasks before using one with
>> override=true could cause the override task to cancel the *first* ongoing
>> shutdown task, then move on to the `lock_config` call - in the meantime
>> a second shutdown task acquires this very lock and performs another
>> long-running shutdown, causing the `override` parameter to be
>> ineffective.
>
> Just to make sure I understand correctly, the scenario is (please
> correct me if I'm wrong):
>
> * shutdown task #1 has the lock and starts long-running shutdown
> * stop API handler with override kills shutdown task #1, but does not
>   acquire the lock yet
> * shutdown task #2 starts, acquires the lock and starts long-running
>   shutdown
> * stop task waits for the lock => override flag was ineffective

Discussed this with Wolfgang off-list, posting here for completeness.

I suppose the scenario I sketched is technically possible, but unlikely
to occur in practice (the stop API handler will usually acquire the lock
before shutdown task #2 can).
Wolfgang actually sketched a slightly different scenario, which is
reproducible with containers pretty easily:

* shutdown task #1 has the lock and starts long-running shutdown
* API handler for shutdown task #2 waits for the lock (there is no task
  yet)
* API handler for stop task #3 (with overrule-shutdown) kills shutdown
  task #1, but does not acquire the lock yet
* API handler for shutdown task #2 acquires the lock and runs another
  long-running shutdown
* API handler for stop task #3 waits for the lock => overrule-shutdown
  flag was ineffective

As pointed out by Wolfgang, this happens because container shutdown
currently uses lock-then-fork. VM shutdown, on the other hand, uses
fork-then-lock, so the above can't happen: both shutdown tasks already
exist while they wait for the lock, so the stop task with
overrule-shutdown kills both of them.

In the next version I'll send a separate patch that switches the
ordering as suggested by Wolfgang.
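
For illustration, the difference between the two orderings can be played through in a small toy model. This is a sketch in Python, not PVE code: `ConfigLock`, `simulate` and the task names are hypothetical, and the model assumes that waiters get the lock in arrival order and that a stop with overrule-shutdown can only kill shutdown tasks that have already been forked.

```python
from collections import deque


class ConfigLock:
    """Toy config lock: one holder, waiters served in arrival order (assumption)."""

    def __init__(self):
        self.holder = None
        self.waiters = deque()

    def acquire(self, who):
        # Returns True if the lock was free, otherwise queues the caller.
        if self.holder is None:
            self.holder = who
            return True
        self.waiters.append(who)
        return False

    def release(self, who):
        # Called when `who` is killed: drop it as holder or as waiter.
        if self.holder == who:
            self.holder = self.waiters.popleft() if self.waiters else None
        elif who in self.waiters:
            self.waiters.remove(who)


def simulate(ordering):
    """Return (stop_got_lock, lock_holder) after the overrule-shutdown stop."""
    lock = ConfigLock()
    tasks = set()  # forked shutdown tasks, i.e. what the stop handler can kill

    def request_shutdown(name):
        if ordering == "fork-then-lock":
            tasks.add(name)         # task exists before it waits for the lock
            lock.acquire(name)
        elif lock.acquire(name):    # lock-then-fork: fork only once the lock is held
            tasks.add(name)

    request_shutdown("shutdown#1")  # gets the lock, long-running shutdown
    request_shutdown("shutdown#2")  # has to wait for the lock

    # Stop with overrule-shutdown: kill every shutdown task that exists.
    for name in sorted(tasks):
        tasks.discard(name)
        lock.release(name)

    # With lock-then-fork, the waiting handler now holds the lock and forks.
    if ordering == "lock-then-fork" and lock.holder is not None:
        tasks.add(lock.holder)

    stop_got_lock = lock.acquire("stop#3")
    return stop_got_lock, lock.holder


print(simulate("lock-then-fork"))   # (False, 'shutdown#2')
print(simulate("fork-then-lock"))   # (True, 'stop#3')
```

In the lock-then-fork run, the handler for shutdown #2 is invisible to the stop handler while it waits, so it inherits the lock and the stop has to wait behind a second long-running shutdown. In the fork-then-lock run both shutdown tasks are visible and get killed, so the stop acquires the lock right away.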