From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <d.csapak@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id BEED3688EF
 for <pbs-devel@lists.proxmox.com>; Wed, 11 Nov 2020 09:28:23 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id AC1E42A030
 for <pbs-devel@lists.proxmox.com>; Wed, 11 Nov 2020 09:27:53 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 77C832A025
 for <pbs-devel@lists.proxmox.com>; Wed, 11 Nov 2020 09:27:52 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 41A3441F29
 for <pbs-devel@lists.proxmox.com>; Wed, 11 Nov 2020 09:27:52 +0100 (CET)
From: Dominik Csapak <d.csapak@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Wed, 11 Nov 2020 09:27:51 +0100
Message-Id: <20201111082751.7358-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.20.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.375 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [proxmox-backup-proxy.rs, daemon.rs, proxmox-backup-api.rs]
Subject: [pbs-devel] [PATCH proxmox-backup v2] daemon: add hack for sd_notify
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Wed, 11 Nov 2020 08:28:23 -0000

sd_notify is not synchronous, i.e. it only waits until the message
reaches the queue, not until systemd has processed it

when the process that sent such a message exits before systemd could
process it, systemd cannot associate the message with the correct pid

so in case of reloading, we send a message with 'MAINPID=<newpid>'
to signal that the main pid will change. if the old process now exits
before systemd has processed this, systemd rejects the MAINPID change
and thus does not accept the 'READY=1' message from the child

since there is no (AFAICS) library interface to check the unit status,
we poll 'systemctl is-active <SERVICE_NAME>' (at most 5 times, 5 seconds
apart) until the state is no longer 'reloading'.

on newer systemd versions, there is 'sd_notify_barrier', which would
allow us to wait until systemd has processed all messages from the
current pid before acknowledging to the child, but on buster the
systemd version is too old

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* only check the state in the parent pid, do not send ready
  in the child in a loop
* changed the comment to include "FIXME"
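
for reference, the polling idea from the commit message can be sketched
as a standalone, std-only program. this is an illustration, not the code
in this patch: it blocks instead of using tokio, and the helper names
(is_reloading, wait_until_reloaded, systemctl_state) as well as the
short delay in main are assumptions made for the example.

```rust
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

/// True if the text printed by `systemctl is-active` reports a reload
/// that is still in progress.
fn is_reloading(output: &str) -> bool {
    output.trim() == "reloading"
}

/// Calls `query` up to `attempts` times, sleeping `delay` before each call.
/// Returns true as soon as the reported state is not "reloading",
/// false if all attempts were used up.
fn wait_until_reloaded<F>(mut query: F, attempts: u32, delay: Duration) -> bool
where
    F: FnMut() -> Option<String>,
{
    for _ in 0..attempts {
        sleep(delay);
        if let Some(state) = query() {
            if !is_reloading(&state) {
                return true;
            }
        }
    }
    false
}

/// Asks systemd for the unit state; None if systemctl could not be run
/// or its output was not valid UTF-8.
fn systemctl_state(service: &str) -> Option<String> {
    let output = Command::new("systemctl")
        .args(&["is-active", service])
        .output()
        .ok()?;
    String::from_utf8(output.stdout).ok()
}

fn main() {
    // the patch waits 5 x 5 seconds; shorter values keep the demo quick
    let done = wait_until_reloaded(
        || systemctl_state("proxmox-backup.service"),
        3,
        Duration::from_millis(200),
    );
    println!("no longer reloading: {}", done);
}
```

unlike check_service_is_active in the patch, which returns Ok(()) in
both cases, the sketch reports whether it gave up, so a caller could
log a warning when the unit is still 'reloading' after the last attempt.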
 src/bin/proxmox-backup-api.rs   |  1 +
 src/bin/proxmox-backup-proxy.rs |  1 +
 src/tools/daemon.rs             | 26 ++++++++++++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
index 7d59717b..70d4cb5d 100644
--- a/src/bin/proxmox-backup-api.rs
+++ b/src/bin/proxmox-backup-api.rs
@@ -76,6 +76,7 @@ async fn run() -> Result<(), Error> {
                 })
             )
         },
+        "proxmox-backup.service",
     );
 
     server::write_pid(buildcfg::PROXMOX_BACKUP_API_PID_FN)?;
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 04c976b5..259d558a 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -133,6 +133,7 @@ async fn run() -> Result<(), Error> {
                 .map(|_| ())
             )
         },
+        "proxmox-backup-proxy.service",
     );
 
     server::write_pid(buildcfg::PROXMOX_BACKUP_PROXY_PID_FN)?;
diff --git a/src/tools/daemon.rs b/src/tools/daemon.rs
index 249ce2ad..63eb6dee 100644
--- a/src/tools/daemon.rs
+++ b/src/tools/daemon.rs
@@ -260,6 +260,7 @@ impl Future for NotifyReady {
 pub async fn create_daemon<F, S>(
     address: std::net::SocketAddr,
     create_service: F,
+    service_name: &str,
 ) -> Result<(), Error>
 where
     F: FnOnce(tokio::net::TcpListener, NotifyReady) -> Result<S, Error>,
@@ -301,10 +302,35 @@ where
     if let Some(future) = finish_future {
         future.await;
     }
+
+    // FIXME: this is a hack, replace with sd_notify_barrier when available
+    if server::is_reload_request() {
+        check_service_is_active(service_name).await?;
+    }
+
     log::info!("daemon shut down...");
     Ok(())
 }
 
+pub async fn check_service_is_active(service: &str) -> Result<(), Error> {
+    for _ in 0..5 {
+        tokio::time::delay_for(std::time::Duration::new(5, 0)).await;
+        if let Ok(output) = tokio::process::Command::new("systemctl")
+            .args(&["is-active", service])
+            .output()
+            .await
+        {
+            if let Ok(text) = String::from_utf8(output.stdout) {
+                if text.trim() != "reloading" {
+                    return Ok(());
+                }
+            }
+        }
+    }
+
+    Ok(())
+}
+
 #[link(name = "systemd")]
 extern "C" {
     fn sd_notify(unset_environment: c_int, state: *const c_char) -> c_int;
-- 
2.20.1