Date: Thu, 22 Sep 2022 14:46:08 +0200
From: Wolfgang Bumiller
To: Dominik Csapak
Cc: Proxmox VE development discussion, Fabian Grünbichler
Message-ID: <20220922124608.hoqg7kbdlhpg52vo@casey.proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server 1/3] qmeventd: rework 'forced_cleanup' handling and set timeout to 60s

On Thu, Sep 22, 2022 at 02:22:45PM +0200, Dominik Csapak wrote:
> On 9/22/22 14:01, Wolfgang Bumiller wrote:
> > On Thu, Sep 22, 2022 at 01:31:49PM +0200, Dominik Csapak wrote:
> > > [snip]
> > > > > -/*
> > > > > - * SIGALRM and cleanup handling
> > > > > - *
> > > > > - * terminate_client will set an alarm for 5 seconds and add its client's PID to
> > > > > - * the forced_cleanups list - when the timer expires, we iterate the list and
> > > > > - * attempt to issue SIGKILL to all processes which haven't yet stopped.
> > > > > - */
> > > > > -
> > > > > -static void
> > > > > -alarm_handler(__attribute__((unused)) int signum)
> > > > > -{
> > > > > -    alarm_triggered = 1;
> > > > > -}
> > > > > -
> > > >
> > > > wasn't this intentionally decoupled like this?
> > > >
> > > > alarm_handler just sets the flag, the actual force cleanup is
> > > > conditionalized on the alarm having triggered, but the cleanup happens
> > > > outside of the signal handler..
> > > >
> > > > is there a reason for switching away from this scheme? we don't need
> > > > to do the cleanup in the signal handler (timing is already plenty fuzzy
> > > > anyway ;))
> > >
> > > no real reason, i found the code somewhat cleaner, but you're right,
> > > we probably want to keep that, and just trigger it regularly
> >
> > From what I can tell the only point of this signal is to interrupt
> > `epoll()` after a while to call the cleanup/kill handler, since we only
> > have a single worker here that needs to do some work after a timeout.
> >
> > Why not either:
> > - set a bool instead of calling `alarm()`, which causes the next
> >   `epoll()` call to use a timeout and call the cleanups if epoll turns
> >   up empty
> > - or create a timerfd (timerfd_create(2)) in the beginning which we
> >   add to the epoll context, and use `timerfd_settime(2)` in place of
> >   `alarm()`, which will also wake up the epoll call without having to
> >   add timeouts to it
> >
> > `alarm()` is just such a gross interface...
> > In theory we'd also be able to ditch all of those `EINTR` loops, as we
> > wouldn't be expecting any interrupts anymore... (and if we did expect
> > them, we could add a `signalfd(2)` to `epoll()` as well ;-)
>
> first one sounds much simpler but the second one sounds much more elegant ;)
> i'll see what works/feels better
>
> couldn't we also directly add a new timerfd for each client that
> needs such a timeout instead of managing some list?

I'm not really a fan of abusing kernel-side resources for a user-space
scheduling problem ;-)

If you want a per-client timeout, I'd much rather just have a list of
points in time that epoll() should target. That list may well be
`struct Client` itself if you want to merge the data as you described
below. The `struct Client` could just receive a
`STAILQ_ENTRY(Client) cleanup_entries;` member (queue(7), stailq(3)) and
a cleanup time which is used to generate the timeout for `epoll()`, and
on every wakeup we can march through the already expired queue entries.

> the cleanupdata could go into the event.data.ptr and we wouldn't
> have to do anything periodically, just handle the timeout
> when epoll wakes up?
>
> we probably would have to merge the client and cleanupdata structs
> so that we can see which is which, but that shouldn't be
> that much of a problem?
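
To make the options above concrete, here's a rough sketch of the timerfd
variant (the names setup_cleanup_timer/arm_cleanup_timer are made up for
illustration, not actual qmeventd code; error handling omitted):

    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/epoll.h>
    #include <sys/timerfd.h>

    /* single timer fd, created once at startup, added to the epoll context */
    static int timer_fd;

    static void
    setup_cleanup_timer(int epoll_fd)
    {
        timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC | TFD_NONBLOCK);
        struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &timer_fd };
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, timer_fd, &ev);
    }

    /* drop-in for alarm(seconds) */
    static void
    arm_cleanup_timer(time_t seconds)
    {
        struct itimerspec its = { .it_value = { .tv_sec = seconds } };
        timerfd_settime(timer_fd, 0, &its, NULL);
    }

    /* called from the epoll loop when timer_fd turns up readable */
    static void
    on_cleanup_timer(void)
    {
        uint64_t expirations;
        read(timer_fd, &expirations, sizeof(expirations)); /* drain the fd */
        /* ...iterate forced_cleanups and SIGKILL whatever hasn't exited... */
    }

arm_cleanup_timer() would then replace the alarm() call in
terminate_client(), and the epoll loop dispatches on the timerfd like on
any other fd, so no timeout parameter or EINTR handling is needed.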
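
And a sketch of the stailq idea, assuming the forced-cleanup timeout stays
uniform so that tail insertion keeps the list ordered (field and helper
names are again only illustrative):

    #include <time.h>
    #include <sys/queue.h>

    struct Client {
        /* ...existing members... */
        time_t kill_at;                       /* deadline for forced cleanup */
        STAILQ_ENTRY(Client) cleanup_entries; /* intrusive linkage, no alloc */
    };

    static STAILQ_HEAD(, Client) cleanup_list =
        STAILQ_HEAD_INITIALIZER(cleanup_list);

    /*
     * terminate_client() would then do:
     *     client->kill_at = time(NULL) + 60;
     *     STAILQ_INSERT_TAIL(&cleanup_list, client, cleanup_entries);
     * the timeout is the same for everyone, so tail insertion keeps the
     * list sorted by deadline.
     */

    /* timeout for epoll_wait() in ms, -1 (block forever) if nothing pending */
    static int
    next_timeout_ms(void)
    {
        struct Client *head = STAILQ_FIRST(&cleanup_list);
        if (!head)
            return -1;
        time_t now = time(NULL);
        return head->kill_at > now ? (int)(head->kill_at - now) * 1000 : 0;
    }

    /* on every wakeup, march through the already expired entries */
    static void
    reap_expired(void)
    {
        time_t now = time(NULL);
        struct Client *c;
        while ((c = STAILQ_FIRST(&cleanup_list)) && c->kill_at <= now) {
            STAILQ_REMOVE_HEAD(&cleanup_list, cleanup_entries);
            /* ...SIGKILL the client's pid if it is still around... */
        }
    }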
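
For completeness, the signalfd(2) aside would look roughly like this
(sketch only, same caveats about made-up names):

    #include <signal.h>
    #include <sys/epoll.h>
    #include <sys/signalfd.h>

    static int signal_fd;

    static void
    setup_signal_fd(int epoll_fd)
    {
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGTERM);
        sigprocmask(SIG_BLOCK, &mask, NULL); /* deliver via the fd instead */

        signal_fd = signalfd(-1, &mask, SFD_CLOEXEC | SFD_NONBLOCK);
        struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &signal_fd };
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, signal_fd, &ev);
    }

    /*
     * In the loop, a readable signal_fd yields one struct signalfd_siginfo
     * per signal; ssi_signo tells us which one arrived.
     */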