From: Hannes Laimer <h.laimer@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	pve-devel@lists.proxmox.com
Subject: Re: [PATCH pve-common] RESTEnvironment: fix possible race in `register_worker`
Date: Tue, 3 Mar 2026 09:37:41 +0100
Message-ID: <88a31fde-949b-4e3c-9561-f68c6cedc28b@proxmox.com>
In-Reply-To: <1772525404.stk1gnguby.astroid@yuna.none>

On 2026-03-03 09:24, Fabian Grünbichler wrote:
> On March 3, 2026 8:15 am, Hannes Laimer wrote:
>> If the worker finishes right after we `waitpid` but before we add it to
>> `WORKER_PIDS`, the `worker_reaper` won't `waitpid` it, because it
>> iterates over `WORKER_PIDS`. So
> 
> it would be interesting to get more details on how this happens in
> practice (with your reproducer)?
> 

I do have a reproducer for task processes that stick around as zombies
after they are done, but unfortunately this change did not fix that. I
only noticed this race while looking for the cause of the "original"
problem, so I guess it is not a problem in practice because of the
tight timing? But technically it should be possible (I think).

> the sequence when forking a worker is:
> - fork
> - child executes some setup code
> - child tells parent it is ready
> - child waits for the parent to tell it that it can continue
>
> register_worker is called by the parent in between the last two steps
> (after receiving the notification from the child, but before sending the
> notification to the child), so why does the child disappear in between?
> 
> I think this might actually (also?) be caused by missing error handling
> in fork_worker? None of the POSIX::close/read/write calls there check
> for failure, which means we might attempt to register a worker that has
> already failed at that point?
> 

Could be, but I don't think that should influence whether a `SIGCHLD` is
sent when the child is done? Because the `SIGCHLD` handler in the parent
is never called...
I'll take a look at that, thanks for the pointer!
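For reference, a ready/continue handshake like the one Fabian describes could check every step roughly as in the Perl sketch below. `fork_with_handshake` and its `$setup`/`$work`/`$on_ready` callbacks are hypothetical names made up for illustration; this is not the actual `fork_worker` code. The point is that a child which dies before reporting readiness gets reaped and never reaches the registration callback.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper, NOT the real PVE::RESTEnvironment::fork_worker:
# fork a child, wait for it to report that setup is done, run $on_ready
# in the parent (e.g. register_worker), then tell the child to continue.
# Every POSIX::pipe/read/write/close result is checked.
sub fork_with_handshake {
    my ($setup, $work, $on_ready) = @_;

    my ($ready_r, $ready_w) = POSIX::pipe() or die "pipe: $!\n";
    my ($go_r, $go_w) = POSIX::pipe() or die "pipe: $!\n";

    my $pid = fork() // die "fork: $!\n";

    if ($pid == 0) { # child
        POSIX::close($ready_r) or POSIX::_exit(1);
        POSIX::close($go_w) or POSIX::_exit(1);
        eval { $setup->(); 1 } or POSIX::_exit(1);

        # tell the parent we are ready
        my $wrote = POSIX::write($ready_w, 'R', 1);
        POSIX::_exit(1) if !defined($wrote) || $wrote != 1;

        # block until the parent lets us continue; EOF means it gave up
        my $buf = '';
        my $got = POSIX::read($go_r, $buf, 1);
        POSIX::_exit(1) if !defined($got) || $got != 1;

        POSIX::_exit($work->() // 0);
    }

    # parent: writing to a dead child must return EPIPE, not kill us
    local $SIG{PIPE} = 'IGNORE';
    POSIX::close($ready_w) or die "close: $!\n";
    POSIX::close($go_r) or die "close: $!\n";

    # POSIX::read returns undef on error and "0 but true" on EOF,
    # so compare the byte count instead of testing for truth
    my $buf = '';
    my $got = POSIX::read($ready_r, $buf, 1);
    POSIX::close($ready_r);
    if (!defined($got) || $got != 1) {
        # child failed before it became ready: reap it, never register it
        POSIX::close($go_w);
        waitpid($pid, 0);
        die "worker $pid failed during setup\n";
    }

    $on_ready->($pid);

    my $wrote = POSIX::write($go_w, 'G', 1);
    if (!defined($wrote) || $wrote != 1) {
        my $err = $!;
        POSIX::close($go_w);
        die "could not release worker $pid: $err\n";
    }
    POSIX::close($go_w) or die "close: $!\n";

    return $pid;
}
```

Ignoring `SIGPIPE` in the parent turns a write to a vanished child into an ordinary error return instead of terminating the daemon.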

> and, somewhat tangentially related - should we switch this code over to
> use pidfds and waitid to close PID reuse races?
> 

@Wolfgang also mentioned that; it would probably make sense.

>>  - the clean-up triggered by the SIGCHLD won't catch it, because it
>>    needs it to be in `WORKER_PIDS`
>>  - and `register_worker` won't either, because the child was still
>>    running when it `waitpid`'ed it
>>
>> Moving the insertion into `WORKER_PIDS` before the `waitpid` solves
>> this by making sure that
>>  - it is always in the hash that `worker_reaper` iterates over
>>  - and, if SIGCHLD triggers `worker_reaper` before we add it to
>>    `WORKER_PIDS`, the `waitpid` in `register_worker` itself will catch
>>    it
>>
>> Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
>> ---
>>  src/PVE/RESTEnvironment.pm | 11 ++++++-----
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
>> index 4ed5c05..4677687 100644
>> --- a/src/PVE/RESTEnvironment.pm
>> +++ b/src/PVE/RESTEnvironment.pm
>> @@ -99,17 +99,18 @@ my $register_worker = sub {
>>  
>>      return if !$pid;
>>  
>> -    # do not register if already finished
>> +    $WORKER_PIDS->{$pid} = {
>> +        user => $user,
>> +        upid => $upid,
>> +    };
>> +
>> +    # remove immediately if already finished
>>      my $waitpid = waitpid($pid, WNOHANG);
>>      if (defined($waitpid) && ($waitpid == $pid)) {
>>          delete($WORKER_PIDS->{$pid});
>>          return;
>>      }
>>  
>> -    $WORKER_PIDS->{$pid} = {
>> -        user => $user,
>> -        upid => $upid,
>> -    };
>>  };
>>  
>>  # initialize environment - must be called once at program startup
>> -- 
>> 2.47.3

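A minimal, self-contained sketch of the patched ordering follows. `$WORKER_PIDS`, `register_worker` and `worker_reaper` borrow their names from the patch, but the bodies are simplified assumptions, not the real `PVE::RESTEnvironment` code.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Simplified stand-ins: the names follow the patch, the bodies do not
# reproduce the real PVE::RESTEnvironment code.
my $WORKER_PIDS = {};

# Reaper run on SIGCHLD: it only reaps PIDs it finds in $WORKER_PIDS.
sub worker_reaper {
    foreach my $pid (keys %$WORKER_PIDS) {
        my $res = waitpid($pid, WNOHANG);
        delete $WORKER_PIDS->{$pid} if defined($res) && $res == $pid;
    }
}

# Patched order: insert first, then check whether the child already exited.
sub register_worker {
    my ($pid, $user, $upid) = @_;
    return if !$pid;

    $WORKER_PIDS->{$pid} = { user => $user, upid => $upid };

    # remove immediately if already finished; if worker_reaper got there
    # first, waitpid returns -1 and the entry is already gone
    my $res = waitpid($pid, WNOHANG);
    delete $WORKER_PIDS->{$pid} if defined($res) && $res == $pid;
    return;
}

$SIG{CHLD} = \&worker_reaper;
```

Because the entry is inserted before the `waitpid($pid, WNOHANG)` check, every exit is caught by one of the two paths: if `SIGCHLD` fires before the insert, the zombie is still there for `register_worker` to reap; if it fires after, `worker_reaper` finds the entry and reaps it.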
Thread overview: 4+ messages
2026-03-03  7:15 Hannes Laimer
2026-03-03  8:24 ` Fabian Grünbichler
2026-03-03  8:37   ` Hannes Laimer [this message]
2026-03-04  9:57     ` Fabian Grünbichler