From: Stefan Hanreich <s.hanreich@proxmox.com>
To: Hannes Laimer, pve-devel@lists.proxmox.com
Date: Fri, 6 Mar 2026 18:16:11 +0100
Subject: Re: [PATCH access-control/common 0/2] address problem with SIGCHLD handler being temporarily overwritten
In-Reply-To: <20260304134649.82272-1-h.laimer@proxmox.com>

Applied this today while developing my integration tests and haven't encountered issues w.r.t. tasks hanging since then on my test instances.
Consider this:

Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>

On 3/4/26 2:47 PM, Hannes Laimer wrote:
> Thanks a lot @Fabian and @Fiona for helping me debug this!
>
> The problem is that some libraries temporarily overwrite the SIGCHLD
> handler. If the library is called fast enough, this can lead to lost
> CHLD signals, which in turn prevents `worker_reaper` from being called in
> RESTEnvironment. So tasks won't get cleaned up until a different SIGCHLD
> arrives at the same `pvedaemon` process and triggers `worker_reaper`.
>
> As @Fabian mentioned in [1], a general rework of the task handling,
> potentially with `pidfd`s, would make a lot of sense.
>
> These two patches address the problem in the task handling structure as
> it currently is. They
> - run the PAM lib call in a fork, so signal handler changes the library
>   makes are isolated from our process
> - run `worker_reaper` periodically (every 5s) to catch any other potential
>   instances of this, since the same could happen with other libs, not
>   just PAM
>
> [1] https://lore.proxmox.com/pve-devel/1772617908.i4bmsyq0kp.astroid@yuna.none/T/#m7b0f3873be5755f330e288cfa50905744f225b2b
>
>
> pve-common:
>
> Hannes Laimer (1):
>   RESTEnvironment: periodically reap workers as SIGCHLD fallback
>
>  src/PVE/RESTEnvironment.pm | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
>
> pve-access-control:
>
> Hannes Laimer (1):
>   pam: fork for PAM authentication to isolate SIGCHLD handler
>
>  src/PVE/Auth/PAM.pm | 74 +++++++++++++++++++++++++-------------------
>  1 file changed, 42 insertions(+), 32 deletions(-)
>
>
> Summary over all repositories:
>   2 files changed, 51 insertions(+), 32 deletions(-)
>
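For readers who want to see the failure mode and the periodic fallback from the cover letter in isolation, here is a minimal sketch in Python. It is an illustration only: the actual patches are Perl and are not reproduced here, and `worker_reaper` below shares nothing with the RESTEnvironment code except its name. `running`, `finished`, `spawn_worker` and `demo_lost_sigchld` are hypothetical. The sketch assumes Linux semantics. While SIGCHLD is set to SIG_DFL (standing in for "a library replaced our handler"), the kernel discards the child's exit signal instead of queueing it, so restoring the handler does not replay it. Only the timer-driven call to the reaper collects the zombie.

```python
import os
import signal
import time

# Hypothetical task bookkeeping for this sketch only: PID -> task name for
# running workers, PID -> exit code once a worker has been reaped.
running: dict[int, str] = {}
finished: dict[int, int] = {}

REAP_INTERVAL = 5.0  # seconds, the period mentioned in the cover letter


def worker_reaper(*_args) -> None:
    """Reap all exited children without blocking.

    Idempotent: once no exited child is left, waitpid(-1, WNOHANG) returns
    pid 0 (or raises ChildProcessError if there are no children), so calling
    it from a timer when nothing happened costs a single syscall.
    """
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children at all
            return
        if pid == 0:  # children exist, but none has exited yet
            return
        running.pop(pid, None)
        finished[pid] = os.waitstatus_to_exitcode(status)


def spawn_worker(name: str) -> int:
    """Fork a hypothetical worker that exits immediately with status 0."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    running[pid] = name
    return pid


def demo_lost_sigchld(interval: float = REAP_INTERVAL) -> tuple[bool, bool]:
    """Return (reaped_while_handler_replaced, reaped_by_periodic_fallback)."""
    old_chld = signal.signal(signal.SIGCHLD, worker_reaper)
    old_alrm = signal.signal(signal.SIGALRM, worker_reaper)
    try:
        # Simulate a library temporarily replacing our SIGCHLD handler.
        ours = signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        pid = spawn_worker("demo-task")
        # Wait until the worker has exited, without reaping it (WNOWAIT).
        while os.waitid(os.P_PID, pid, os.WEXITED | os.WNOHANG | os.WNOWAIT) is None:
            time.sleep(0.01)
        signal.signal(signal.SIGCHLD, ours)  # the "library" restores our handler
        reaped_early = pid not in running  # the discarded SIGCHLD is not replayed

        # Fallback: run the same reaper periodically via SIGALRM.
        signal.setitimer(signal.ITIMER_REAL, interval, interval)
        deadline = time.monotonic() + 20 * interval
        while pid in running and time.monotonic() < deadline:
            time.sleep(interval / 10)
        return reaped_early, pid not in running
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old_alrm)
        signal.signal(signal.SIGCHLD, old_chld)


if __name__ == "__main__":
    # A short interval keeps the demo fast; the cover letter uses 5s.
    print(demo_lost_sigchld(0.1))
```

Because a non-blocking `waitpid(-1, WNOHANG)` loop is idempotent, it is safe to call the same reaper from both the signal handler and the timer. The trade-off is that a lost signal delays cleanup by up to one interval rather than indefinitely.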