* [PATCH pve-common] RESTEnvironment: fix possible race in `register_worker`
From: Hannes Laimer @ 2026-03-03 7:15 UTC
To: pve-devel
If the worker finishes right after we `waitpid` it but before we add it to
`WORKER_PIDS`, the `worker_reaper` won't `waitpid` it, because it only
iterates over `WORKER_PIDS`. So
- the clean-up triggered by SIGCHLD won't catch it, because that requires
  the PID to be in `WORKER_PIDS`
- and `register_worker` won't either, because the worker was still running
  when it `waitpid`'ed it

Moving the insertion into `WORKER_PIDS` before the `waitpid` solves
this by making sure the PID is
- always in the hash for `worker_reaper`
- and, if SIGCHLD should trigger `worker_reaper` before we add it to
  `WORKER_PIDS`, the `waitpid` in `register_worker` itself will catch
  it
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
---
src/PVE/RESTEnvironment.pm | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index 4ed5c05..4677687 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -99,17 +99,18 @@ my $register_worker = sub {
 
     return if !$pid;
 
-    # do not register if already finished
+    $WORKER_PIDS->{$pid} = {
+        user => $user,
+        upid => $upid,
+    };
+
+    # remove immediately if already finished
     my $waitpid = waitpid($pid, WNOHANG);
     if (defined($waitpid) && ($waitpid == $pid)) {
         delete($WORKER_PIDS->{$pid});
         return;
     }
 
-    $WORKER_PIDS->{$pid} = {
-        user => $user,
-        upid => $upid,
-    };
 };
 
 # initialize environment - must be called once at program startup
--
2.47.3
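
The ordering argument from the commit message can be tried out with a small standalone sketch. This is only an illustration under assumptions: `worker_reaper` and `register_worker` below are simplified stand-ins that reuse the names from the patch, and `demo`, the user name and the UPID string are hypothetical. It is not the code from RESTEnvironment.pm.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);
use Time::HiRes qw(usleep);

# Simplified stand-in for the worker table (assumption, not the real module state).
my $WORKER_PIDS = {};

# Reaps only PIDs that are registered, which is why an unregistered
# child that has exited would stay around as a zombie.
sub worker_reaper {
    for my $pid (keys %$WORKER_PIDS) {
        my $res = waitpid($pid, WNOHANG);
        delete $WORKER_PIDS->{$pid} if defined($res) && $res == $pid;
    }
    return;
}

# Patched ordering: insert first, then check whether the child already exited.
sub register_worker {
    my ($pid, $user, $upid) = @_;
    return if !$pid;

    $WORKER_PIDS->{$pid} = { user => $user, upid => $upid };

    my $res = waitpid($pid, WNOHANG);
    delete $WORKER_PIDS->{$pid} if defined($res) && $res == $pid;
    return;
}

# Hypothetical driver: fork a child that exits right away, register it,
# then poll the reaper until the entry is gone (at most about two seconds).
sub demo {
    local $SIG{CHLD} = 'DEFAULT'; # make sure the kernel does not auto-reap

    my $pid = fork();
    die "fork failed: $!\n" if !defined($pid);
    POSIX::_exit(0) if $pid == 0;

    register_worker($pid, 'root@pam', 'hypothetical-upid');

    for (1 .. 200) {
        last if !exists($WORKER_PIDS->{$pid});
        worker_reaper();
        usleep(10_000);
    }
    return $pid;
}
```

Whichever of the two `waitpid` calls sees the exited child first, the entry ends up removed and the child reaped, because the PID is in the table before either check can run.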
* Re: [PATCH pve-common] RESTEnvironment: fix possible race in `register_worker`
From: Fabian Grünbichler @ 2026-03-03 8:24 UTC
To: Hannes Laimer, pve-devel
On March 3, 2026 8:15 am, Hannes Laimer wrote:
> If the worker finishes right after we `waitpid` it but before we add it to
> `WORKER_PIDS`, the `worker_reaper` won't `waitpid` it, because it only
> iterates over `WORKER_PIDS`. So
it would be interesting to get more details on how this happens in practice
(with your reproducer)?
the sequence when forking a worker is:
- fork
- child executes some setup code
- child tells parent it is ready
- child waits for parent to tell it that it can continue
register_worker is called by the parent in between the last two steps
(after receiving the notification from the child, but before sending the
notification to the child), so why does the child disappear in between?
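
A minimal sketch of the handshake sequence listed above, with every pipe operation checked. Assumptions: `fork_with_handshake` and its `$work`/`$register` callbacks are hypothetical names made up for this illustration, and the two-pipe layout is a guess at the general shape, not the actual `fork_worker` implementation.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: $work runs in the child, $register is called in the
# parent while the child is still blocked waiting for the go-ahead.
sub fork_with_handshake {
    my ($work, $register) = @_;

    pipe(my $ready_r, my $ready_w) or die "pipe failed: $!\n"; # child -> parent
    pipe(my $go_r, my $go_w) or die "pipe failed: $!\n";       # parent -> child

    my $pid = fork();
    die "fork failed: $!\n" if !defined($pid);

    if ($pid == 0) {
        close($ready_r);
        close($go_w);

        # (child setup code would run here)

        # tell the parent we are ready
        defined(syswrite($ready_w, "r")) or POSIX::_exit(1);

        # wait for the parent to tell us we can continue
        my $n = sysread($go_r, my $buf, 1);
        POSIX::_exit(1) if !$n; # error or EOF: parent went away

        my $rc = eval { $work->(); 0 } // 1;
        POSIX::_exit($rc);
    }

    close($ready_w);
    close($go_r);

    my $n = sysread($ready_r, my $buf, 1);
    if (!$n) {
        # child failed during setup: reap it instead of registering it
        waitpid($pid, 0);
        die "worker failed during setup\n";
    }

    # the child is blocked in sysread() at this point
    $register->($pid);

    defined(syswrite($go_w, "g")) or die "unable to signal worker: $!\n";
    close($go_w);
    close($ready_r);

    return $pid;
}
```

With this shape the child can only vanish between the two notifications if it is killed from outside, and a child that fails before reporting ready is reaped and never registered.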
I think this might actually (also?) be missing error handling in
fork_worker? all the POSIX::close/read/write calls there don't check for
failure, which means we attempt to register a worker that has already
failed at that point?
and, somewhat tangentially related - should we switch this code over to
use pidfds and waitid to close PID reuse races?
> - the clean-up triggered by SIGCHLD won't catch it, because that requires
>   the PID to be in `WORKER_PIDS`
> - and `register_worker` won't either, because the worker was still running
>   when it `waitpid`'ed it
>
> Moving the insertion into `WORKER_PIDS` before the `waitpid` solves
> this by making sure the PID is
> - always in the hash for `worker_reaper`
> - and, if SIGCHLD should trigger `worker_reaper` before we add it to
>   `WORKER_PIDS`, the `waitpid` in `register_worker` itself will catch
>   it
>
> Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
> ---
> src/PVE/RESTEnvironment.pm | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
> index 4ed5c05..4677687 100644
> --- a/src/PVE/RESTEnvironment.pm
> +++ b/src/PVE/RESTEnvironment.pm
> @@ -99,17 +99,18 @@ my $register_worker = sub {
>
>      return if !$pid;
>
> -    # do not register if already finished
> +    $WORKER_PIDS->{$pid} = {
> +        user => $user,
> +        upid => $upid,
> +    };
> +
> +    # remove immediately if already finished
>      my $waitpid = waitpid($pid, WNOHANG);
>      if (defined($waitpid) && ($waitpid == $pid)) {
>          delete($WORKER_PIDS->{$pid});
>          return;
>      }
>
> -    $WORKER_PIDS->{$pid} = {
> -        user => $user,
> -        upid => $upid,
> -    };
>  };
>
>  # initialize environment - must be called once at program startup
> --
> 2.47.3
* Re: [PATCH pve-common] RESTEnvironment: fix possible race in `register_worker`
From: Hannes Laimer @ 2026-03-03 8:37 UTC
To: Fabian Grünbichler, pve-devel
On 2026-03-03 09:24, Fabian Grünbichler wrote:
> On March 3, 2026 8:15 am, Hannes Laimer wrote:
>> If the worker finishes right after we `waitpid` it but before we add it to
>> `WORKER_PIDS`, the `worker_reaper` won't `waitpid` it, because it only
>> iterates over `WORKER_PIDS`. So
>
> it would be interesting to get more details on how this happens in practice
> (with your reproducer)?
>
I do have a reproducer for task processes sticking around as zombies
when they are done, but this change unfortunately did not fix that. I
just noticed this while looking for the cause of the "original"
problem, so I guess this is not a problem in practice, because of the
tight timings? But technically it would be possible (I think).
> the sequence when forking a worker is:
> - fork
> - child executes some setup code
> - child tells parent it is ready
> - child waits for parent to tell it that it can continue
>
> register_worker is called by the parent in between the last two steps
> (after receiving the notification from the child, but before sending the
> notification to the child), so why does the child disappear in between?
>
> I think this might actually (also?) be missing error handling in
> fork_worker? all the POSIX::close/read/write calls there don't check for
> failure, which means we attempt to register a worker that has already
> failed at that point?
>
could be, but I don't think that should influence whether a `SIGCHLD` is
sent when the child is done? Because the handler for `SIGCHLD` in the
parent is never called...
I'll take a look at that, thanks for the pointer!
> and, somewhat tangentially related - should we switch this code over to
> use pidfds and waitid to close PID reuse races?
>
@Wolfgang also mentioned that, would probably make sense
>> - the clean-up triggered by SIGCHLD won't catch it, because that requires
>>   the PID to be in `WORKER_PIDS`
>> - and `register_worker` won't either, because the worker was still running
>>   when it `waitpid`'ed it
>>
>> Moving the insertion into `WORKER_PIDS` before the `waitpid` solves
>> this by making sure the PID is
>> - always in the hash for `worker_reaper`
>> - and, if SIGCHLD should trigger `worker_reaper` before we add it to
>>   `WORKER_PIDS`, the `waitpid` in `register_worker` itself will catch
>>   it
>>
>> Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
>> ---
>> src/PVE/RESTEnvironment.pm | 11 ++++++-----
>> 1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
>> index 4ed5c05..4677687 100644
>> --- a/src/PVE/RESTEnvironment.pm
>> +++ b/src/PVE/RESTEnvironment.pm
>> @@ -99,17 +99,18 @@ my $register_worker = sub {
>>
>>      return if !$pid;
>>
>> -    # do not register if already finished
>> +    $WORKER_PIDS->{$pid} = {
>> +        user => $user,
>> +        upid => $upid,
>> +    };
>> +
>> +    # remove immediately if already finished
>>      my $waitpid = waitpid($pid, WNOHANG);
>>      if (defined($waitpid) && ($waitpid == $pid)) {
>>          delete($WORKER_PIDS->{$pid});
>>          return;
>>      }
>>
>> -    $WORKER_PIDS->{$pid} = {
>> -        user => $user,
>> -        upid => $upid,
>> -    };
>>  };
>>
>>  # initialize environment - must be called once at program startup
>> --
>> 2.47.3
* Re: [PATCH pve-common] RESTEnvironment: fix possible race in `register_worker`
From: Fabian Grünbichler @ 2026-03-04 9:57 UTC
To: Hannes Laimer, pve-devel
On March 3, 2026 9:37 am, Hannes Laimer wrote:
> On 2026-03-03 09:24, Fabian Grünbichler wrote:
>> On March 3, 2026 8:15 am, Hannes Laimer wrote:
>>> If the worker finishes right after we `waitpid` it but before we add it to
>>> `WORKER_PIDS`, the `worker_reaper` won't `waitpid` it, because it only
>>> iterates over `WORKER_PIDS`. So
>>
>> it would be interesting to get more details on how this happens in practice
>> (with your reproducer)?
>>
>
> I do have a reproducer for task processes sticking around as zombies
> when they are done, but this change unfortunately did not fix that. I
> just noticed this while looking for the cause of the "original"
> problem, so I guess this is not a problem in practice, because of the
> tight timings? But technically it would be possible (I think).
so we figured that one out in the meantime.. and it is probably best to
fix that issue by revamping the worker tracking entirely, both to fix
the bug and to improve performance/reduce overhead.
I still think we want to improve the error handling during forking, and
that this patch here doesn't actually fix anything substantial other
than temporary zombies if the worker terminates during setup.
it shouldn't hurt either though..
>> the sequence when forking a worker is:
>> - fork
>> - child executes some setup code
>> - child tells parent it is ready
>> - child waits for parent to tell it that it can continue
>>
>> register_worker is called by the parent in between the last two steps
>> (after receiving the notification from the child, but before sending the
>> notification to the child), so why does the child disappear in between?
>>
>> I think this might actually (also?) be missing error handling in
>> fork_worker? all the POSIX::close/read/write calls there don't check for
>> failure, which means we attempt to register a worker that has already
>> failed at that point?
>>
>
> could be, but I don't think that should influence whether a `SIGCHLD` is
> sent when the child is done? Because the handler for `SIGCHLD` in the
> parent is never called...
> I'll take a look at that, thanks for the pointer!
>
>> and, somewhat tangentially related - should we switch this code over to
>> use pidfds and waitid to close PID reuse races?
>>
>
> @Wolfgang also mentioned that, would probably make sense