From: Hannes Laimer <h.laimer@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH pve-common] REST environment: `waitpid` directly on SIGCHLD
Date: Thu, 26 Feb 2026 11:33:48 +0100 [thread overview]
Message-ID: <20260226103348.37202-1-h.laimer@proxmox.com> (raw)
If we are in an API server that uses AnyEvent, we currently postpone the
task cleanup because the `active_workers` call may lead to a problem
described in [1]. This, however, is only relevant to the
`log_task_result` part of the `worker_reaper` sub, since that calls
`active_workers`.
The problem with postponing `waitpid` is that our task status endpoint
checks the proc file for the PID, and without the `waitpid` call, the
process sticks around as a zombie (`Z`). This leads to the status being
reported as 'running' even though it is not.
If the API is called rapidly, this postponement is pushed back
potentially indefinitely. This is generally problematic, but
specifically for a use-case where a task is started and the `/status`
endpoint is polled to check for task completion. In that case, if the
polling is fast enough, it prevents the cleanup and therefore the
correct status reporting.
I could reproduce this pretty consistently with:
```perl
my $upid = self->client()->put('/some/task/starting/endpoint', {});
do {
$res = $self->client()->get("/nodes/localhost/tasks/$upid/status");
# sleep 1
} while ($res->{status} eq "running");
```
Adding a (long enough) sleep prevented the problem. With `0.5`, the task
was correctly reported as done about 30-60s after it actually finished;
with `0.1`, it took a couple of minutes. Without a `sleep`, tasks would
never (>5-10 mins) be reported as done.
[1] 6870afa4 ("RESTEnvironment: better SIGCHLD handling in AnyEvent event loop")
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
---
src/PVE/RESTEnvironment.pm | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index 81d7e29..4ed5c05 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -78,7 +78,16 @@ my $worker_reaper = sub {
if (defined($waitpid) && ($waitpid == $pid)) {
my $info = $WORKER_PIDS->{$pid};
if ($info && $info->{upid} && $info->{user}) {
- &$log_task_result($info->{upid}, $info->{user}, $?);
+ # when we're using AnyEvent, we have to postpone the call to log_task_result, otherwise it
+ # might interfere with running api calls
+ my $exit_code = $?;
+ if (defined($AnyEvent::MODEL)) {
+ AnyEvent::postpone {
+ $log_task_result->($info->{upid}, $info->{user}, $exit_code)
+ };
+ } else {
+ $log_task_result->($info->{upid}, $info->{user}, $exit_code);
+ }
}
delete($WORKER_PIDS->{$pid});
}
@@ -115,13 +124,7 @@ sub init {
if !$type || $type !~ m/^(cli|pub|priv|ha)$/;
$SIG{CHLD} = sub {
- # when we're using AnyEvent, we have to postpone the call to worker_reaper, otherwise it
- # might interfere with running api calls
- if (defined($AnyEvent::MODEL)) {
- AnyEvent::postpone { $worker_reaper->() };
- } else {
- $worker_reaper->();
- }
+ $worker_reaper->();
};
# environment types
--
2.47.3
reply other threads:[~2026-02-26 10:33 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260226103348.37202-1-h.laimer@proxmox.com \
--to=h.laimer@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.