all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Hannes Laimer <h.laimer@proxmox.com>, pve-devel@lists.proxmox.com
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Subject: Re: [PATCH pve-common 1/1] RESTEnvironment: periodically reap workers as SIGCHLD fallback
Date: Thu, 12 Mar 2026 10:48:33 +0100	[thread overview]
Message-ID: <1773305738.33dnx84kzu.astroid@yuna.none> (raw)
In-Reply-To: <20260304134649.82272-2-h.laimer@proxmox.com>

On March 4, 2026 2:46 pm, Hannes Laimer wrote:
> Libraries may temporarily override $SIG{CHLD}, causing worker exit
> signals to be lost. Poll worker_reaper every 5 seconds via an AnyEvent
> timer to catch any missed signals in API server contexts where this can
> be problematic.
> 
> Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
> ---
>  src/PVE/RESTEnvironment.pm | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
> index 81d7e29..cb44823 100644
> --- a/src/PVE/RESTEnvironment.pm
> +++ b/src/PVE/RESTEnvironment.pm
> @@ -37,6 +37,7 @@ my $rest_env;
>  
>  my $WORKER_PIDS;
>  my $WORKER_FLAG = 0;
> +my $worker_reaper_timer;
>  
>  my $log_task_result = sub {
>      my ($upid, $user, $status) = @_;
> @@ -124,6 +125,14 @@ sub init {
>          }
>      };
>  
> +    # Periodically reap workers as a fallback in case a library call temporarily overrides
> +    # $SIG{CHLD} and causes us to miss a signal. Only useful when an event loop is running.

I'd replace the "Only useful" with something like "Only runs", makes it more explicit.

AFAICT, this not only affects SIGCHLD being redirected, but also SIGCHLD
triggering while we are processing a previous SIGCHLD (if we checked
that particular PID already while iterating, but before it exited).

we could maybe consider to change worker_reaper to do

my $dead;
do {
  $dead = waitpid(-1, WNOHANG))
  if (my $info = $WORKER_PIDS->{$dead}) {
    # log and delete
  }
} while $dead > 0;

that way, it should be faster/cheaper and handle other (transitive,
non-worker) zombies that might have been missed by whatever forked them?

we do have quite a few places where we explicitly wait for a specific
child though, that need to be checked because they might break as a
result..

so in the end the best way to make this better is maybe to use pidfds
via AnyEvent if we have an event loop, to get notified and handle
workers exiting without relying on SIGCHLD at all..

for CLI usage, we already waitpid the forked worker directly anyway (so
should not have a zombie problem, and that flow could switch to a pidfd
as well).. there's two fork_worker calls that set the background flag,
but I don't even see how they'd be invoked in a CLI environment..

> +    $worker_reaper_timer = AnyEvent->timer(
> +        after => 5,
> +        interval => 5,
> +        cb => sub { $worker_reaper->() },

worker_reaper isn't that cheap atm, I don't think we want to run it
every 5s.. on a busy server with lots of tasks, this will do a syscall
per registered task worker..

since this is "just" the fallback for the rare case of "somehow we
missed our signal", doing it once every minute or so should be enough?

I think we can apply this with the timer interval set to a more sensible
value, and the comments slightly adapted..

> +    );
> +
>      # environment types
>      # cli  ... command started fron command line
>      # pub  ... access from public server (pveproxy)
> -- 
> 2.47.3
> 
> 
> 
> 
> 
> 




  reply	other threads:[~2026-03-12  9:49 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-04 13:46 [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten Hannes Laimer
2026-03-04 13:46 ` [PATCH pve-common 1/1] RESTEnvironment: periodically reap workers as SIGCHLD fallback Hannes Laimer
2026-03-12  9:48   ` Fabian Grünbichler [this message]
2026-03-04 13:46 ` [PATCH pve-access-control 1/1] pam: fork for PAM authentication to isolate SIGCHLD handler Hannes Laimer
2026-03-12  9:44   ` Fabian Grünbichler
2026-03-06 17:16 ` [PATCH access-control/common 0/2] address probblem with SIGCHLD handler being temporarily overwritten Stefan Hanreich
2026-03-11 14:56 ` Michael Köppl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1773305738.33dnx84kzu.astroid@yuna.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=h.laimer@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    --cc=w.bumiller@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal