Date: Thu, 22 Sep 2022 14:46:08 +0200
From: Wolfgang Bumiller
To: Dominik Csapak
Cc: Proxmox VE development discussion, Fabian Grünbichler
Message-ID: <20220922124608.hoqg7kbdlhpg52vo@casey.proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server 1/3] qmeventd: rework 'forced_cleanup' handling and set timeout to 60s

On Thu, Sep 22, 2022 at 02:22:45PM +0200, Dominik Csapak wrote:
> On 9/22/22 14:01, Wolfgang Bumiller wrote:
> > On Thu, Sep 22, 2022 at 01:31:49PM +0200, Dominik Csapak wrote:
> > > [snip]
> > > > > -/*
> > > > > - * SIGALRM and cleanup handling
> > > > > - *
> > > > > - * terminate_client will set an alarm for 5 seconds and add its client's PID to
> > > > > - * the forced_cleanups list - when the timer expires, we iterate the list and
> > > > > - * attempt to issue SIGKILL to all processes which haven't yet stopped.
> > > > > - */
> > > > > -
> > > > > -static void
> > > > > -alarm_handler(__attribute__((unused)) int signum)
> > > > > -{
> > > > > -    alarm_triggered = 1;
> > > > > -}
> > > > > -
> > > >
> > > > wasn't this intentionally decoupled like this?
> > > >
> > > > alarm_handler just sets the flag, the actual force cleanup is
> > > > conditionalized on the alarm having triggered, but the cleanup happens
> > > > outside of the signal handler..
> > > >
> > > > is there a reason for switching away from this scheme? we don't need
> > > > to do the cleanup in the signal handler (timing is already plenty fuzzy
> > > > anyway ;))
> > >
> > > no real reason, i found the code somewhat cleaner, but you're right,
> > > we probably want to keep that, and just trigger it regularly
> >
> > From what I can tell the only point of this signal is to interrupt
> > `epoll()` after a while to call the cleanup/kill handler, since we only
> > have a single worker here that needs to do some work after a timeout.
> >
> > Why not either:
> > - set a bool instead of calling `alarm()`, which causes the next
> >   `epoll()` call to use a timeout and call the cleanups if epoll turns
> >   up empty
> > - or create a timerfd (timerfd_create(2)) in the beginning which we
> >   add to the epoll context, and use `timerfd_settime(2)` in place of
> >   `alarm()`, which will also wake up the epoll call without having to
> >   add timeouts to it
> >
> > `alarm()` is just such a gross interface...
> > In theory we'd also be able to ditch all of those `EINTR` loops, as we
> > wouldn't be expecting any interrupts anymore... (and if we did expect
> > them, we could add a `signalfd(2)` to `epoll()` as well ;-)
>
> first one sounds much simpler but the second one sounds much more elegant ;)
> i'll see what works/feels better
>
> couldn't we also directly add a new timerfd for each client that
> needs such a timeout instead of managing some list?

I'm not really a fan of abusing kernel-side resources for a user-space
scheduling problem ;-)

If you want a per-client timeout, I'd much rather just have a list of
points in time that epoll() should target. That list may well be
`struct Client` itself if you want to merge the data as you described
below. The `struct Client` could just receive a
`STAILQ_ENTRY(Client) cleanup_entries;` member (queue(7), stailq(3)) and
a cleanup time which is used to generate the timeout for `epoll()`, and
on every wakeup we can march through the already expired queue entries.

> the cleanupdata could go into the event.data.ptr and we wouldn't
> have to do anything periodically, just handle the timeout
> when epoll wakes up?
>
> we probably would have to merge the client and cleanupdata structs
> so that we can see which is which, but that shouldn't be
> that much of a problem?
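
To make the options above concrete, here's a rough sketch of the timerfd
variant (the names setup_cleanup_timer/arm_cleanup_timer are made up for
illustration, not actual qmeventd code; error handling omitted):

    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/epoll.h>
    #include <sys/timerfd.h>

    /* single timer fd, created once at startup, added to the epoll context */
    static int timer_fd;

    static void
    setup_cleanup_timer(int epoll_fd)
    {
        timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC | TFD_NONBLOCK);
        struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &timer_fd };
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, timer_fd, &ev);
    }

    /* drop-in for alarm(seconds) */
    static void
    arm_cleanup_timer(time_t seconds)
    {
        struct itimerspec its = { .it_value = { .tv_sec = seconds } };
        timerfd_settime(timer_fd, 0, &its, NULL);
    }

    /* called from the epoll loop when timer_fd turns up readable */
    static void
    on_cleanup_timer(void)
    {
        uint64_t expirations;
        read(timer_fd, &expirations, sizeof(expirations)); /* drain the fd */
        /* ...iterate forced_cleanups and SIGKILL whatever hasn't exited... */
    }

arm_cleanup_timer() would then replace the alarm() call in
terminate_client(), and the epoll loop dispatches on the timerfd like on
any other fd, so no timeout parameter or EINTR handling is needed.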
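
And a sketch of the stailq idea, assuming the forced-cleanup timeout stays
uniform so that tail insertion keeps the list ordered (field and helper
names are again only illustrative):

    #include <time.h>
    #include <sys/queue.h>

    struct Client {
        /* ...existing members... */
        time_t kill_at;                       /* deadline for forced cleanup */
        STAILQ_ENTRY(Client) cleanup_entries; /* intrusive linkage, no alloc */
    };

    static STAILQ_HEAD(, Client) cleanup_list =
        STAILQ_HEAD_INITIALIZER(cleanup_list);

    /*
     * terminate_client() would then do:
     *     client->kill_at = time(NULL) + 60;
     *     STAILQ_INSERT_TAIL(&cleanup_list, client, cleanup_entries);
     * the timeout is the same for everyone, so tail insertion keeps the
     * list sorted by deadline.
     */

    /* timeout for epoll_wait() in ms, -1 (block forever) if nothing pending */
    static int
    next_timeout_ms(void)
    {
        struct Client *head = STAILQ_FIRST(&cleanup_list);
        if (!head)
            return -1;
        time_t now = time(NULL);
        return head->kill_at > now ? (int)(head->kill_at - now) * 1000 : 0;
    }

    /* on every wakeup, march through the already expired entries */
    static void
    reap_expired(void)
    {
        time_t now = time(NULL);
        struct Client *c;
        while ((c = STAILQ_FIRST(&cleanup_list)) && c->kill_at <= now) {
            STAILQ_REMOVE_HEAD(&cleanup_list, cleanup_entries);
            /* ...SIGKILL the client's pid if it is still around... */
        }
    }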
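
For completeness, the signalfd(2) aside would look roughly like this
(sketch only, same caveats about made-up names):

    #include <signal.h>
    #include <sys/epoll.h>
    #include <sys/signalfd.h>

    static int signal_fd;

    static void
    setup_signal_fd(int epoll_fd)
    {
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGTERM);
        sigprocmask(SIG_BLOCK, &mask, NULL); /* deliver via the fd instead */

        signal_fd = signalfd(-1, &mask, SFD_CLOEXEC | SFD_NONBLOCK);
        struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &signal_fd };
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, signal_fd, &ev);
    }

    /*
     * In the loop, a readable signal_fd yields one struct signalfd_siginfo
     * per signal; ssi_signo tells us which one arrived.
     */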