* [PATCH pve-common] REST environment: `waitpid` directly on SIGCHLD
@ 2026-02-26 10:33 Hannes Laimer
0 siblings, 0 replies; only message in thread
From: Hannes Laimer @ 2026-02-26 10:33 UTC (permalink / raw)
To: pve-devel
If we are in an API server that uses AnyEvent, we currently postpone the
task cleanup because the `active_workers` call may lead to a problem
described in [1]. This, however, is only relevant to the
`log_task_result` part of the `worker_reaper` sub, since that calls
`active_workers`.
The problem with postponing `waitpid` is that our task status endpoint
checks the proc file for the PID, and without the `waitpid` call, the
process sticks around as a zombie (`Z`). This leads to the status being
reported as 'running' even though it is not.
If the API is called rapidly, this postponement is pushed back
potentially indefinitely. This is generally problematic, but
specifically for a use-case where a task is started and the `/status`
endpoint is polled to check for task completion. In that case, if the
polling is fast enough, it prevents the cleanup and therefore the
correct status reporting.
I could reproduce this pretty consistently with:
```perl
my $upid = self->client()->put('/some/task/starting/endpoint', {});
do {
$res = $self->client()->get("/nodes/localhost/tasks/$upid/status");
# sleep 1
} while ($res->{status} eq "running");
```
Adding a (long enough) sleep prevented the problem. With `0.5`, the task
was correctly reported as done about 30-60s after it actually finished;
with `0.1`, it took a couple of minutes. Without a `sleep`, tasks would
never (>5-10 mins) be reported as done.
[1] 6870afa4 ("RESTEnvironment: better SIGCHLD handling in AnyEvent event loop")
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
---
src/PVE/RESTEnvironment.pm | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index 81d7e29..4ed5c05 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -78,7 +78,16 @@ my $worker_reaper = sub {
if (defined($waitpid) && ($waitpid == $pid)) {
my $info = $WORKER_PIDS->{$pid};
if ($info && $info->{upid} && $info->{user}) {
- &$log_task_result($info->{upid}, $info->{user}, $?);
+ # when we're using AnyEvent, we have to postpone the call to log_task_result, otherwise it
+ # might interfere with running api calls
+ my $exit_code = $?;
+ if (defined($AnyEvent::MODEL)) {
+ AnyEvent::postpone {
+ $log_task_result->($info->{upid}, $info->{user}, $exit_code)
+ };
+ } else {
+ $log_task_result->($info->{upid}, $info->{user}, $exit_code);
+ }
}
delete($WORKER_PIDS->{$pid});
}
@@ -115,13 +124,7 @@ sub init {
if !$type || $type !~ m/^(cli|pub|priv|ha)$/;
$SIG{CHLD} = sub {
- # when we're using AnyEvent, we have to postpone the call to worker_reaper, otherwise it
- # might interfere with running api calls
- if (defined($AnyEvent::MODEL)) {
- AnyEvent::postpone { $worker_reaper->() };
- } else {
- $worker_reaper->();
- }
+ $worker_reaper->();
};
# environment types
--
2.47.3
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2026-02-26 10:33 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-26 10:33 [PATCH pve-common] REST environment: `waitpid` directly on SIGCHLD Hannes Laimer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox