From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 195C61FF137 for ; Tue, 03 Mar 2026 09:23:44 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 39183F07D; Tue, 3 Mar 2026 09:24:46 +0100 (CET) Date: Tue, 03 Mar 2026 09:24:37 +0100 From: Fabian =?iso-8859-1?q?Gr=FCnbichler?= Subject: Re: [PATCH pve-common] RESTEnvironment: fix possible race in `register_worker` To: Hannes Laimer , pve-devel@lists.proxmox.com References: <20260303071526.1150-1-h.laimer@proxmox.com> In-Reply-To: <20260303071526.1150-1-h.laimer@proxmox.com> MIME-Version: 1.0 User-Agent: astroid/0.17.0 (https://github.com/astroidmail/astroid) Message-Id: <1772525404.stk1gnguby.astroid@yuna.none> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1772526256386 X-SPAM-LEVEL: Spam detection results: 0 AWL -1.013 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.66 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.968 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.495 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: NE62FLHJEYRDRNUB3KCZGI5MCS2LMTDG X-Message-ID-Hash: NE62FLHJEYRDRNUB3KCZGI5MCS2LMTDG X-MailFrom: f.gruenbichler@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On March 3, 2026 8:15 am, Hannes Laimer wrote: > If the worker finishes right after we `waitpid` but before we add it to > `WORKER_PIDS` the `worker_reaper` won't `waitpid` it cause it iterates > over `WORKER_PIDS`. So it would be interesting to get more details how this happens in practice (with your reproducer)? the sequence when forking a worker is: - fork - child executes some setup code - child tells parent it is ready - child waits for parent to tell it it can continue register_worker is called by the parent in between the last two steps (after receiving the notification form the child, but before sending the notification to the child), so why does the child disappear inbetween? I think this might actually (also?) be missing error handling in fork_worker? all the POSIX::close/read/write calls there don't check for failure, which means we attempt to register a worker that has already failed at that point? and, somewhat tangentially related - should we switch this code over to use pidfds and waitid to close PID reuse races? > - the clean-up triggered by the SIGCHLD won't catch it cause it needs it= to > be in `WORKER_PIDS` > - and, `register_worker` won't because it was still running when it > `waitpid`'ed it >=20 > Moving the insertion into `WORKER_PIDS` before the `waitpid` solves > this by making sure it is > - always in the var for `worker_reaper` > - and, if SIGCHILD should trigger `worker_reaper` before we add it to > `WORKER_PIDS`, the `waitpid` in `register_worker` itself will catch > it >=20 > Signed-off-by: Hannes Laimer > --- > src/PVE/RESTEnvironment.pm | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) >=20 > diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm > index 4ed5c05..4677687 100644 > --- a/src/PVE/RESTEnvironment.pm > +++ b/src/PVE/RESTEnvironment.pm > @@ -99,17 +99,18 @@ my $register_worker =3D sub { > =20 > return if !$pid; > =20 > - # do not register if already finished > + $WORKER_PIDS->{$pid} =3D { > + user =3D> $user, > + upid =3D> $upid, > + }; > + > + # remove immediately if already finished > my $waitpid =3D waitpid($pid, WNOHANG); > if (defined($waitpid) && ($waitpid =3D=3D $pid)) { > delete($WORKER_PIDS->{$pid}); > return; > } > =20 > - $WORKER_PIDS->{$pid} =3D { > - user =3D> $user, > - upid =3D> $upid, > - }; > }; > =20 > # initialize environment - must be called once at program startup > --=20 > 2.47.3 >=20 >=20 >=20 >=20 >=20 >=20