Date: Tue, 24 Feb 2026 11:06:13 +0100
Subject: Re: [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
From: Dominik Csapak <d.csapak@proxmox.com>
To: Fiona Ebner, pve-devel@lists.proxmox.com
In-Reply-To: <46fd4500-1410-4d9e-98ee-e47bdc42a820@proxmox.com>

On 2/24/26 10:49 AM, Fiona Ebner wrote:
> On 2/24/26 10:37 AM, Dominik Csapak wrote:
>> On 2/24/26 10:30 AM, Fiona Ebner wrote:
>>> On 2/23/26 4:50 PM, Fiona Ebner wrote:
>>>> On 2/23/26 11:56 AM, Dominik Csapak wrote:
>>>>> There are two ways a cleanup can be triggered:
>>>>>
>>>>> * When a guest is stopped/shut down via the API, 'vm_stop' calls
>>>>>   'vm_stop_cleanup'.
>>>>> * When the guest process disconnects from qmeventd, 'qm cleanup' is
>>>>>   called, which in turn also tries to call 'vm_stop_cleanup'.
>>>>>
>>>>> Both of these happen under a qemu config lock, so there is no direct
>>>>> race where they run out of order, but it could happen that the
>>>>> 'qm cleanup' call ran in addition, so cleanup was executed twice.
>>>>> This could be a problem when the shutdown was called with
>>>>> 'keepActive', which 'qm cleanup' knows nothing about and would
>>>>> ignore.
>>>>>
>>>>> Also, the post-stop hook might not be triggered when e.g. a stop-mode
>>>>> backup was done, since the hook was only run via 'qm cleanup', which
>>>>> would detect the guest running again and abort.
>>>>>
>>>>> To improve the situation, we move the exec_hookscript call to the end
>>>>> of vm_stop_cleanup. At this point we know the VM is stopped and we
>>>>> still hold the config lock.
>>>>>
>>>>> In _do_vm_stop (and in the one case for migration) a 'cleanup' flag
>>>>> is created that marks the VM as cleaned up by the API, so
>>>>> 'qm cleanup' should not do it again.
>>>>>
>>>>> On VM start, this flag is cleared.
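(For anyone skimming the thread: the flag handling described above boils
down to roughly the following sketch. The flag path and where exactly the
calls sit are placeholders for illustration, not taken verbatim from the
patch.)

    use strict;
    use warnings;

    my $vmid = 100; # example VMID
    my $flag = "/run/qemu-server/$vmid.cleanup-done"; # placeholder path

    # in _do_vm_stop() (and the one migration case), after vm_stop_cleanup() ran:
    open(my $fh, '>', $flag) or die "failed to create cleanup flag: $!\n";
    close($fh);

    # in 'qm cleanup' (triggered when the process disconnects from qmeventd):
    print "cleanup already handled by the API, skipping\n" if -e $flag;

    # in vm_start():
    unlink($flag);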
>>>
>>> It feels untidy to have something left after cleaning up, even if it's
>>> just the file indicating that cleanup was done. Maybe we can switch it
>>> around, see below:
>>>
>>>>> There is still a tiny race possible:
>>>>>
>>>>> A guest is stopped from within (or crashes) and the VM is started
>>>>> again via the API before 'qm cleanup' can run.
>>>>>
>>>>> This should be a very rare case though, and all operations via the
>>>>> API (reboot, shutdown+start, stop-mode backup, etc.) should work as
>>>>> intended.
>>>>
>>>> How difficult is it to trigger the race with an HA-managed VM?
>>>>
>>>>> Signed-off-by: Dominik Csapak
>>>>> ---
>>>>> I'm not sure how we could possibly eliminate the race I mentioned:
>>>>> * we can't block on start, because I don't think we can sensibly
>>>>>   decide between:
>>>>>   - the VM crashed or was powered off from within
>>>>>   - the VM was never started
>>>>>
>>>>> We could maybe leave a 'started' flag somewhere too and clear the
>>>>> cleanup flag also in 'qm cleanup'; then we would start the VM only
>>>>> when the cleanup flag is cleared (or have the cleanup flag carry
>>>>> multiple states, like 'started' and 'finished').
>>>
>>> We already have something very similar, namely the PID file. The issue
>>> is that the PID file is removed automatically by QEMU upon clean
>>> termination. For our use case we would need a second, more persistent
>>> file. Then we could solve the issue of duplicate cleanup and the issue
>>> of starting another instance before cleanup:
>>>
>>> 1. create a flag file at startup with an identifier for the QEMU
>>>    instance, a second manual PID file?
>>> 2. at cleanup, check the file:
>>>    a) if there is no such file, skip, somebody else already cleaned up
>>>       NOTE: we need to ensure that pre-existing instances are still
>>>       cleaned up. One possible way would be to create a flag file
>>>       during host startup and only use the new behavior when that is
>>>       present.
>>>    b) if the file exists, check if the QEMU instance is still around.
>>>       If it is, wait for the instance to be gone until hitting some
>>>       timeout. Once it's gone, do cleanup.
>>> 3. make sure to run the post-stop hook whenever we remove the file
>>> 4. if the file still exists at startup, cleanup was not done yet; wait
>>>    until some timeout and when hitting the timeout, either proceed with
>>>    start anyway or suggest running cleanup manually. The latter would
>>>    be safer, but also worse from a UX standpoint, since cleanup is
>>>    root-only.
>>>
>>> What do you think?
>>
>> Yes, that seems good to me, I'll play around with that and send a next
>> version.
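To make sure I read the proposal right, a minimal sketch of what I'd try.
The flag path, the helper names and the plain kill/sleep polling are all
placeholders here, not an actual implementation:

    use strict;
    use warnings;

    # all paths/names below are placeholders, not the real qemu-server layout
    sub instance_flag_path {
        my ($vmid) = @_;
        return "/run/qemu-server/$vmid.instance";
    }

    # 1. at VM start: record an identifier for this particular QEMU instance
    sub write_instance_flag {
        my ($vmid, $qemu_pid) = @_;
        open(my $fh, '>', instance_flag_path($vmid))
            or die "cannot write instance flag: $!\n";
        print $fh "$qemu_pid\n";
        close($fh);
    }

    # 2./3. at cleanup time (API stop or qmeventd-triggered 'qm cleanup')
    sub cleanup_instance {
        my ($vmid, $timeout) = @_;
        my $flag = instance_flag_path($vmid);

        # 2a) no flag -> somebody else already cleaned up, skip
        return if !-e $flag;

        open(my $fh, '<', $flag) or die "cannot read instance flag: $!\n";
        chomp(my $qemu_pid = <$fh>);
        close($fh);

        # 2b) wait (bounded) until that QEMU instance is actually gone
        for (1 .. $timeout) {
            last if !kill(0, $qemu_pid);
            sleep(1);
        }
        die "instance $qemu_pid still running after ${timeout}s\n"
            if kill(0, $qemu_pid);

        # ... vm_stop_cleanup() would run here ...

        # 3. run the post-stop hook whenever we remove the flag file
        # ... exec_hookscript(..., 'post-stop') would run here ...
        unlink($flag);

        # 4. vm_start() would wait (with timeout) as long as the flag file
        #    still exists before starting a new instance
    }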
> Ah, one more thing. With stop mode backup, we'd still run into the issue
> that the cleanup triggered by qmeventd might run into a newly started
> instance and then wait around for nothing. We already skip cleanup if we
> detect the 'rollback' lock, since we know rollback does its own cleanup.
> I think we can do the same for the backup lock (if shutdown was clean),
> since we know stop mode backup does its own cleanup too. And it might be
> better to do warn+return instead of die, since the situation is not
> really unexpected (the one for rollback could be adapted too).

Sure, makes sense, but I'd split that into a separate patch.
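Roughly along these lines, analogous to the existing rollback check (just a
sketch, the config access is from memory and untested):

    use strict;
    use warnings;
    use PVE::QemuConfig;

    # whether load_config() is the right accessor at that point in
    # 'qm cleanup' would still need checking
    sub backup_lock_skips_cleanup {
        my ($vmid) = @_;
        my $conf = PVE::QemuConfig->load_config($vmid);
        if (($conf->{lock} // '') eq 'backup') {
            warn "VM $vmid has a 'backup' lock - stop-mode backup does its own cleanup, skipping\n";
            return 1;
        }
        return 0;
    }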