From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 6F83E616D1
 for <pbs-devel@lists.proxmox.com>; Thu, 17 Dec 2020 13:50:19 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 6093925759
 for <pbs-devel@lists.proxmox.com>; Thu, 17 Dec 2020 13:49:49 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id CA48A25749
 for <pbs-devel@lists.proxmox.com>; Thu, 17 Dec 2020 13:49:48 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 9090C45003
 for <pbs-devel@lists.proxmox.com>; Thu, 17 Dec 2020 13:49:48 +0100 (CET)
To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>, Dominik Csapak <d.csapak@proxmox.com>
References: <20201216081209.6997-1-d.csapak@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Message-ID: <5d0ab7cb-0a77-5a10-d8a4-fc7a916dfdf9@proxmox.com>
Date: Thu, 17 Dec 2020 13:49:46 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:82.0) Gecko/20100101
 Thunderbird/82.0
MIME-Version: 1.0
In-Reply-To: <20201216081209.6997-1-d.csapak@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.065 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [daemon.rs]
Subject: Re: [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload
 behaviour
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 17 Dec 2020 12:50:19 -0000

On 16/12/2020 09:12, Dominik Csapak wrote:
> it seems that sometimes, the child process signal gets handled
> before the parent process signal. Systemd then ignores the
> child's signal (finished reloading) and only afterwards goes into
> reloading state because of the parent. This reload will never finish.
> 
> Instead, wait for the state to change to 'reloading' after sending
> that signal in the parent, and only fork afterwards. This way
> we ensure that systemd knows about the reloading before actually trying
> to do it.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> this all goes away with systemd's notify barrier hopefully....
> 
>  src/tools/daemon.rs | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/tools/daemon.rs b/src/tools/daemon.rs
> index 6bb4a41b..2aa52772 100644
> --- a/src/tools/daemon.rs
> +++ b/src/tools/daemon.rs
> @@ -291,6 +291,7 @@ where
>          if let Err(e) = systemd_notify(SystemdNotify::Reloading) {
>              log::error!("failed to notify systemd about the state change: {}", e);
>          }
> +        wait_service_is_active_or_reloading(service_name, true).await?;
>          if let Err(e) = reloader.take().unwrap().fork_restart() {
>              log::error!("error during reload: {}", e);
>              let _ = systemd_notify(SystemdNotify::Status("error during reload".to_string()));
> @@ -305,7 +306,7 @@ where
>  
>      // FIXME: this is a hack, replace with sd_notify_barrier when available
>      if server::is_reload_request() {
> -        wait_service_is_active(service_name).await?;
> +        wait_service_is_active_or_reloading(service_name, false).await?;
>      }
>  
>      log::info!("daemon shut down...");
> @@ -313,7 +314,7 @@ where
>  }
>  
>  // hack, do not use if unsure!
> -async fn wait_service_is_active(service: &str) -> Result<(), Error> {
> +async fn wait_service_is_active_or_reloading(service: &str, wait_for_reload: bool) -> Result<(), Error> {
>      tokio::time::delay_for(std::time::Duration::new(1, 0)).await;
>      loop {
>          let text = match tokio::process::Command::new("systemctl")
> @@ -328,7 +329,8 @@ async fn wait_service_is_active(service: &str) -> Result<(), Error> {
>              Err(err) => bail!("executing 'systemctl is-active' failed - {}", err),
>          };
>  
> -        if text.trim().trim_start() != "reloading" {
> +        let is_reload = text.trim().trim_start() == "reloading";
> +        if is_reload == wait_for_reload {

hmm, this feels and reads a bit weird: passing false for "wait_for_reload"
means "wait for any status that is not reloading", which is subtle.

Combined with the name of the function, passing true suggests that we wait
until the service is active *or* reloading, while passing false suggests
that we only wait until it is active. Neither matches what the code does.

Can you rethink the interface here, maybe split it? (I'm a bit too buried
in other work to offer a more useful proposal, sorry.)
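
For illustration only, an untested sketch of one way the two cases could be
told apart at the call site. `WaitFor` and `wait_is_done` are made-up names,
not existing code in daemon.rs; the polling loop around 'systemctl is-active'
is assumed to stay as it is, with only its exit condition replaced.

```rust
/// Hypothetical replacement for the `wait_for_reload: bool` parameter:
/// each variant names the state the caller is actually waiting for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WaitFor {
    /// Wait until `systemctl is-active` reports "reloading".
    Reloading,
    /// Wait until it reports anything other than "reloading"
    /// (this includes "active", but also e.g. "failed").
    ReloadFinished,
}

/// Pure exit condition for the polling loop (hypothetical helper).
/// `is_active_output` is the raw stdout of `systemctl is-active <service>`.
pub fn wait_is_done(is_active_output: &str, target: WaitFor) -> bool {
    let is_reloading = is_active_output.trim() == "reloading";
    match target {
        WaitFor::Reloading => is_reloading,
        WaitFor::ReloadFinished => !is_reloading,
    }
}
```

The parent would then ask for `WaitFor::Reloading` before forking, and the
shutdown path for `WaitFor::ReloadFinished`, so neither call site needs a
bare true/false. Keeping the check free of I/O also makes it easy to unit
test without a running systemd.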

>              return Ok(());
>          }
>          tokio::time::delay_for(std::time::Duration::new(5, 0)).await;
>