From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dominik Csapak
To: pbs-devel@lists.proxmox.com
Date: Wed, 16 Dec 2020 09:12:09 +0100
Message-Id: <20201216081209.6997-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.20.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour
List-Id: Proxmox Backup Server development discussion

It seems that sometimes the child process's signal gets handled before
the parent process's signal. Systemd then ignores the child's signal
(finished reloading) and only afterwards enters the 'reloading' state
because of the parent's signal. That reload will never finish.

Instead, wait for the state to change to 'reloading' after sending
that signal in the parent, and only fork afterwards. This way we ensure
that systemd knows about the reloading before actually trying to do it.

Signed-off-by: Dominik Csapak
---
this all goes away with systemd's notify barrier, hopefully....

 src/tools/daemon.rs | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/tools/daemon.rs b/src/tools/daemon.rs
index 6bb4a41b..2aa52772 100644
--- a/src/tools/daemon.rs
+++ b/src/tools/daemon.rs
@@ -291,6 +291,7 @@ where
         if let Err(e) = systemd_notify(SystemdNotify::Reloading) {
             log::error!("failed to notify systemd about the state change: {}", e);
         }
+        wait_service_is_active_or_reloading(service_name, true).await?;
         if let Err(e) = reloader.take().unwrap().fork_restart() {
             log::error!("error during reload: {}", e);
             let _ = systemd_notify(SystemdNotify::Status("error during reload".to_string()));
@@ -305,7 +306,7 @@ where
 
     // FIXME: this is a hack, replace with sd_notify_barrier when available
     if server::is_reload_request() {
-        wait_service_is_active(service_name).await?;
+        wait_service_is_active_or_reloading(service_name, false).await?;
     }
 
     log::info!("daemon shut down...");
@@ -313,7 +314,7 @@ where
 }
 
 // hack, do not use if unsure!
-async fn wait_service_is_active(service: &str) -> Result<(), Error> {
+async fn wait_service_is_active_or_reloading(service: &str, wait_for_reload: bool) -> Result<(), Error> {
     tokio::time::delay_for(std::time::Duration::new(1, 0)).await;
     loop {
         let text = match tokio::process::Command::new("systemctl")
@@ -328,7 +329,8 @@ async fn wait_service_is_active(service: &str) -> Result<(), Error> {
             Err(err) => bail!("executing 'systemctl is-active' failed - {}", err),
         };
 
-        if text.trim().trim_start() != "reloading" {
+        let is_reload = text.trim().trim_start() == "reloading";
+        if is_reload == wait_for_reload {
             return Ok(());
         }
         tokio::time::delay_for(std::time::Duration::new(5, 0)).await;
-- 
2.20.1
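
For readers who want to try the polling logic outside of a systemd unit, the comparison introduced by the patch (`is_reload == wait_for_reload`) can be sketched as a small synchronous function. This is only an illustration under stated assumptions, not the code in `src/tools/daemon.rs`: the name `wait_for_reload_state`, the injected `query_state` closure (standing in for `systemctl is-active`), and the `max_polls` bound are all hypothetical additions; the loop in the patch is async and has no upper bound.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Polls `query_state` until the "is reloading" status equals `wait_for_reload`.
///
/// `query_state` stands in for running `systemctl is-active <service>`; it is
/// injected (an assumption of this sketch) so the logic runs without systemd.
/// `max_polls` is also an addition: the loop in the patch never gives up.
/// Returns the number of polls that were needed.
fn wait_for_reload_state<F>(
    mut query_state: F,
    wait_for_reload: bool,
    poll_interval: Duration,
    max_polls: usize,
) -> Result<usize, String>
where
    F: FnMut() -> Result<String, String>,
{
    for attempt in 1..=max_polls {
        let text = query_state()?;
        // Same comparison as in the patch: stop once the observed state
        // ("reloading" or not) matches the one we are waiting for.
        let is_reload = text.trim() == "reloading";
        if is_reload == wait_for_reload {
            return Ok(attempt);
        }
        sleep(poll_interval);
    }
    Err(format!(
        "unit did not reach the expected state within {} polls",
        max_polls
    ))
}
```

With `wait_for_reload` set to `true` the function corresponds to the new call in the parent (wait until systemd reports 'reloading', then fork); with `false` it corresponds to the existing wait before shutdown (wait until the state is no longer 'reloading').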