From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id A9F1590AC4 for ; Fri, 23 Sep 2022 10:16:38 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 8258821180 for ; Fri, 23 Sep 2022 10:16:08 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Fri, 23 Sep 2022 10:16:07 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 9352444570 for ; Fri, 23 Sep 2022 10:16:07 +0200 (CEST) Date: Fri, 23 Sep 2022 10:16:06 +0200 From: Wolfgang Bumiller To: Dominik Csapak Cc: pve-devel@lists.proxmox.com Message-ID: <20220923081606.eq2xsnywhnvkceyi@wobu-vie.proxmox.com> References: <20220922141935.653179-1-d.csapak@proxmox.com> <20220922141935.653179-2-d.csapak@proxmox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220922141935.653179-2-d.csapak@proxmox.com> X-SPAM-LEVEL: Spam detection results: 0 AWL 0.262 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pve-devel] [PATCH qemu-server v2 1/3] qmeventd: rework 'forced_cleanup' handling and set timeout to 60s X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Sep 2022 08:16:38 -0000 On Thu, Sep 22, 2022 at 04:19:33PM +0200, Dominik Csapak wrote: > currently, the 'forced_cleanup' (sending SIGKILL to the qemu process), > is intended to be triggered 5 seconds after sending the initial shutdown > signal (SIGTERM) which is sadly not enough for some setups. > > Accidentally, it could be triggered earlier than 5 seconds, if a > SIGALRM triggers in the timespan directly before setting it again. > > Also, this approach means that depending on when machines are shutdown > their forced cleanup may happen after 5 seconds, or any time after, if > new vms are shut off in the meantime. > > Improve this situation by reworking the way we deal with this cleanup. > We save the time incl. timeout in the CleanupData, and set a timeout > to 'epoll_wait' of 10 seconds, which will then trigger a forced_cleanup. > Remove entries from the forced_cleanup list when that entry is killed, > or when the normal cleanup took place. > > To improve the shutdown behaviour, increase the default timeout to 60 > seconds, which should be enough, but add a commandline toggle where > users can set it to a different value. > > Signed-off-by: Dominik Csapak > --- > qmeventd/qmeventd.c | 73 +++++++++++++++++++++++---------------------- > qmeventd/qmeventd.h | 2 ++ > 2 files changed, 39 insertions(+), 36 deletions(-) > > diff --git a/qmeventd/qmeventd.c b/qmeventd/qmeventd.c > index 8d32827..46bc7eb 100644 > --- a/qmeventd/qmeventd.c > +++ b/qmeventd/qmeventd.c > @@ -551,27 +558,16 @@ handle_client(struct Client *client) > json_tokener_free(tok); > } > > - > -/* > - * SIGALRM and cleanup handling > - * > - * terminate_client will set an alarm for 5 seconds and add its client's PID to > - * the forced_cleanups list - when the timer expires, we iterate the list and > - * attempt to issue SIGKILL to all processes which haven't yet stopped. > - */ > - > -static void > -alarm_handler(__attribute__((unused)) int signum) > -{ > - alarm_triggered = 1; > -} > - > static void > -sigkill(void *ptr, __attribute__((unused)) void *unused) If you change the style here... (which I'm not a fan of btw.) > +sigkill(struct CleanupData *ptr, time_t *cur_time) > { > - struct CleanupData data = *((struct CleanupData *)ptr); > + struct CleanupData data = *ptr; ...at least get rid of this line completely ^ and just use `ptr->` instea of `data.`, I see no reason to keep copying the data onto the stack? (or with the old style, make `data` a pointer and skip the cast) > int err; > > + if (data.timeout > *cur_time) { > + return; > + } > + > if (data.pidfd > 0) { > err = pidfd_send_signal(data.pidfd, SIGKILL, NULL, 0); > (void)close(data.pidfd);