* [PATCH pve-common 1/1] RESTEnvironment: periodically reap workers as SIGCHLD fallback
2026-03-04 13:46 [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten Hannes Laimer
@ 2026-03-04 13:46 ` Hannes Laimer
2026-03-04 13:46 ` [PATCH pve-access-control 1/1] pam: fork for PAM authentication to isolate SIGCHLD handler Hannes Laimer
2026-03-06 17:16 ` [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten Stefan Hanreich
2 siblings, 0 replies; 4+ messages in thread
From: Hannes Laimer @ 2026-03-04 13:46 UTC (permalink / raw)
To: pve-devel
Libraries may temporarily override $SIG{CHLD}, causing worker exit
signals to be lost. Poll worker_reaper every 5 seconds via an AnyEvent
timer to catch any missed signals in API server contexts where this can
be problematic.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
---
src/PVE/RESTEnvironment.pm | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index 81d7e29..cb44823 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -37,6 +37,7 @@ my $rest_env;
my $WORKER_PIDS;
my $WORKER_FLAG = 0;
+my $worker_reaper_timer;
my $log_task_result = sub {
my ($upid, $user, $status) = @_;
@@ -124,6 +125,14 @@ sub init {
}
};
+ # Periodically reap workers as a fallback in case a library call temporarily overrides
+ # $SIG{CHLD} and causes us to miss a signal. Only useful when an event loop is running.
+ $worker_reaper_timer = AnyEvent->timer(
+ after => 5,
+ interval => 5,
+ cb => sub { $worker_reaper->() },
+ );
+
# environment types
# cli ... command started fron command line
# pub ... access from public server (pveproxy)
--
2.47.3
^ permalink raw reply [flat|nested] 4+ messages in thread* [PATCH pve-access-control 1/1] pam: fork for PAM authentication to isolate SIGCHLD handler
2026-03-04 13:46 [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten Hannes Laimer
2026-03-04 13:46 ` [PATCH pve-common 1/1] RESTEnvironment: periodically reap workers as SIGCHLD fallback Hannes Laimer
@ 2026-03-04 13:46 ` Hannes Laimer
2026-03-06 17:16 ` [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten Stefan Hanreich
2 siblings, 0 replies; 4+ messages in thread
From: Hannes Laimer @ 2026-03-04 13:46 UTC (permalink / raw)
To: pve-devel
PAM modules can temporarily override $SIG{CHLD}, causing SIGCHLDs from
RESTEnvironment worker processes to be lost. Run the PAM interaction in
a subprocess via PVE::Tools::run_fork to contain any signal handler
manipulation to the child.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
---
src/PVE/Auth/PAM.pm | 74 +++++++++++++++++++++++++--------------------
1 file changed, 42 insertions(+), 32 deletions(-)
diff --git a/src/PVE/Auth/PAM.pm b/src/PVE/Auth/PAM.pm
index 3aacfc0..8586da5 100755
--- a/src/PVE/Auth/PAM.pm
+++ b/src/PVE/Auth/PAM.pm
@@ -27,45 +27,55 @@ sub authenticate_user {
# user (www-data) need to be able to read /etc/passwd /etc/shadow
die "no password\n" if !$password;
- my $pamh = Authen::PAM->new(
- 'proxmox-ve-auth',
- $username,
- sub {
- my @res;
- while (@_) {
- my $msg_type = shift;
- my $msg = shift;
- push @res, (0, $password);
- }
- push @res, 0;
- return @res;
- },
- );
-
- if (!ref($pamh)) {
- my $err = $pamh->pam_strerror($pamh);
- die "error during PAM init: $err";
+ # PAM modules may temporarily override $SIG{CHLD}, causing SIGCHLDs from
+ # RESTEnvironment workers to be lost. Running the PAM interaction in a fork
+ # isolates any such handler manipulation from the parent process.
+ my $client_ip;
+ if (my $rpcenv = PVE::RPCEnvironment::get()) {
+ $client_ip = $rpcenv->get_client_ip();
}
- if (my $rpcenv = PVE::RPCEnvironment::get()) {
- if (my $ip = $rpcenv->get_client_ip()) {
- $pamh->pam_set_item(PAM_RHOST(), $ip);
+ PVE::Tools::run_fork(sub {
+ my $pamh = Authen::PAM->new(
+ 'proxmox-ve-auth',
+ $username,
+ sub {
+ my @res;
+ while (@_) {
+ my $msg_type = shift;
+ my $msg = shift;
+ push @res, (0, $password);
+ }
+ push @res, 0;
+ return @res;
+ },
+ );
+
+ if (!ref($pamh)) {
+ my $err = $pamh->pam_strerror($pamh);
+ die "error during PAM init: $err";
}
- }
- my $res;
+ if ($client_ip) {
+ $pamh->pam_set_item(PAM_RHOST(), $client_ip);
+ }
- if (($res = $pamh->pam_authenticate(0)) != PAM_SUCCESS) {
- my $err = $pamh->pam_strerror($res);
- die "$err\n";
- }
+ my $res;
- if (($res = $pamh->pam_acct_mgmt(0)) != PAM_SUCCESS) {
- my $err = $pamh->pam_strerror($res);
- die "$err\n";
- }
+ if (($res = $pamh->pam_authenticate(0)) != PAM_SUCCESS) {
+ my $err = $pamh->pam_strerror($res);
+ die "$err\n";
+ }
+
+ if (($res = $pamh->pam_acct_mgmt(0)) != PAM_SUCCESS) {
+ my $err = $pamh->pam_strerror($res);
+ die "$err\n";
+ }
+
+ $pamh = 0; # call destructor
- $pamh = 0; # call destructor
+ return 1;
+ });
return 1;
}
--
2.47.3
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten
2026-03-04 13:46 [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten Hannes Laimer
2026-03-04 13:46 ` [PATCH pve-common 1/1] RESTEnvironment: periodically reap workers as SIGCHLD fallback Hannes Laimer
2026-03-04 13:46 ` [PATCH pve-access-control 1/1] pam: fork for PAM authentication to isolate SIGCHLD handler Hannes Laimer
@ 2026-03-06 17:16 ` Stefan Hanreich
2 siblings, 0 replies; 4+ messages in thread
From: Stefan Hanreich @ 2026-03-06 17:16 UTC (permalink / raw)
To: Hannes Laimer, pve-devel
Applied this today while developing my integration tests and haven't
encountered issues w.r.t tasks hanging since then on my test instances.
Consider this:
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
On 3/4/26 2:47 PM, Hannes Laimer wrote:
> Thanks a lot @Fabian and @Fiona for helping me debug this!
>
> The problem is that some libaries do overwrite the SIGCHLD handler
> temporarily, if the library is called fast enough this can lead to lost
> CHLD signals which in turn prevents `worker_reaper` from being called in
> RESTEnvironment. So tasks won't get cleaned-up until a different SIGCHLD
> arrives at the same `pvedeamon` process triggering `worker_reaper`.
>
> As @Fabian mentioned in [1] a general re-work of the task handling,
> potentially with `pidfd`s, would make a lot of sense.
>
> These two patches address the problem in the task handling structure as
> it currently is. They
> - run the PAM lib call in a fork, so signal handler changes the library
> does are isloated from our process
> - run `worker_reaper` periodically (5s) do catch any other potential
> instances of this, since it would be possible that the same happens
> with other libs, not just PAM
>
> [1] https://lore.proxmox.com/pve-devel/1772617908.i4bmsyq0kp.astroid@yuna.none/T/#m7b0f3873be5755f330e288cfa50905744f225b2b
>
>
> pve-common:
>
> Hannes Laimer (1):
> RESTEnvironment: periodically reap workers as SIGCHLD fallback
>
> src/PVE/RESTEnvironment.pm | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>
> pve-access-control:
>
> Hannes Laimer (1):
> pam: fork for PAM authentication to isolate SIGCHLD handler
>
> src/PVE/Auth/PAM.pm | 74 +++++++++++++++++++++++++--------------------
> 1 file changed, 42 insertions(+), 32 deletions(-)
>
>
> Summary over all repositories:
> 2 files changed, 51 insertions(+), 32 deletions(-)
>
^ permalink raw reply [flat|nested] 4+ messages in thread