* [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour
@ 2020-12-16 8:12 Dominik Csapak
2020-12-16 10:21 ` Fabian Ebner
2020-12-17 12:49 ` Thomas Lamprecht
0 siblings, 2 replies; 4+ messages in thread
From: Dominik Csapak @ 2020-12-16 8:12 UTC (permalink / raw)
To: pbs-devel
it seems that sometimes, the child process signal gets handled
before the parent process signal. Systemd then ignores the
childs signal (finished reloading) and only after going into
reloading state because of the parent. this will never finish.
Instead, wait for the state to change to 'reloading' after sending
that signal in the parent, an only fork afterwards. This way
we ensure that systemd knows about the reloading before actually trying
to do it.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
this all goes away with systemds notify barrier hopefully....
src/tools/daemon.rs | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/tools/daemon.rs b/src/tools/daemon.rs
index 6bb4a41b..2aa52772 100644
--- a/src/tools/daemon.rs
+++ b/src/tools/daemon.rs
@@ -291,6 +291,7 @@ where
if let Err(e) = systemd_notify(SystemdNotify::Reloading) {
log::error!("failed to notify systemd about the state change: {}", e);
}
+ wait_service_is_active_or_reloading(service_name, true).await?;
if let Err(e) = reloader.take().unwrap().fork_restart() {
log::error!("error during reload: {}", e);
let _ = systemd_notify(SystemdNotify::Status("error during reload".to_string()));
@@ -305,7 +306,7 @@ where
// FIXME: this is a hack, replace with sd_notify_barrier when available
if server::is_reload_request() {
- wait_service_is_active(service_name).await?;
+ wait_service_is_active_or_reloading(service_name, false).await?;
}
log::info!("daemon shut down...");
@@ -313,7 +314,7 @@ where
}
// hack, do not use if unsure!
-async fn wait_service_is_active(service: &str) -> Result<(), Error> {
+async fn wait_service_is_active_or_reloading(service: &str, wait_for_reload: bool) -> Result<(), Error> {
tokio::time::delay_for(std::time::Duration::new(1, 0)).await;
loop {
let text = match tokio::process::Command::new("systemctl")
@@ -328,7 +329,8 @@ async fn wait_service_is_active(service: &str) -> Result<(), Error> {
Err(err) => bail!("executing 'systemctl is-active' failed - {}", err),
};
- if text.trim().trim_start() != "reloading" {
+ let is_reload = text.trim().trim_start() == "reloading";
+ if is_reload == wait_for_reload {
return Ok(());
}
tokio::time::delay_for(std::time::Duration::new(5, 0)).await;
--
2.20.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour
2020-12-16 8:12 [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour Dominik Csapak
@ 2020-12-16 10:21 ` Fabian Ebner
2020-12-16 12:09 ` Fabian Ebner
2020-12-17 12:49 ` Thomas Lamprecht
1 sibling, 1 reply; 4+ messages in thread
From: Fabian Ebner @ 2020-12-16 10:21 UTC (permalink / raw)
To: pbs-devel, d.csapak
This fixes the postinst problem for me. I let dpkg -i run in a loop 50
times and it never got stuck anymore.
Tested-By: Fabian Ebner <f.ebner@proxmox.com>
Am 16.12.20 um 09:12 schrieb Dominik Csapak:
> it seems that sometimes, the child process signal gets handled
> before the parent process signal. Systemd then ignores the
> childs signal (finished reloading) and only after going into
> reloading state because of the parent. this will never finish.
>
> Instead, wait for the state to change to 'reloading' after sending
> that signal in the parent, an only fork afterwards. This way
> we ensure that systemd knows about the reloading before actually trying
> to do it.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> this all goes away with systemds notify barrier hopefully....
>
> src/tools/daemon.rs | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/src/tools/daemon.rs b/src/tools/daemon.rs
> index 6bb4a41b..2aa52772 100644
> --- a/src/tools/daemon.rs
> +++ b/src/tools/daemon.rs
> @@ -291,6 +291,7 @@ where
> if let Err(e) = systemd_notify(SystemdNotify::Reloading) {
> log::error!("failed to notify systemd about the state change: {}", e);
> }
> + wait_service_is_active_or_reloading(service_name, true).await?;
> if let Err(e) = reloader.take().unwrap().fork_restart() {
> log::error!("error during reload: {}", e);
> let _ = systemd_notify(SystemdNotify::Status("error during reload".to_string()));
> @@ -305,7 +306,7 @@ where
>
> // FIXME: this is a hack, replace with sd_notify_barrier when available
> if server::is_reload_request() {
> - wait_service_is_active(service_name).await?;
> + wait_service_is_active_or_reloading(service_name, false).await?;
> }
>
> log::info!("daemon shut down...");
> @@ -313,7 +314,7 @@ where
> }
>
> // hack, do not use if unsure!
> -async fn wait_service_is_active(service: &str) -> Result<(), Error> {
> +async fn wait_service_is_active_or_reloading(service: &str, wait_for_reload: bool) -> Result<(), Error> {
> tokio::time::delay_for(std::time::Duration::new(1, 0)).await;
> loop {
> let text = match tokio::process::Command::new("systemctl")
> @@ -328,7 +329,8 @@ async fn wait_service_is_active(service: &str) -> Result<(), Error> {
> Err(err) => bail!("executing 'systemctl is-active' failed - {}", err),
> };
>
> - if text.trim().trim_start() != "reloading" {
> + let is_reload = text.trim().trim_start() == "reloading";
> + if is_reload == wait_for_reload {
> return Ok(());
> }
> tokio::time::delay_for(std::time::Duration::new(5, 0)).await;
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour
2020-12-16 10:21 ` Fabian Ebner
@ 2020-12-16 12:09 ` Fabian Ebner
0 siblings, 0 replies; 4+ messages in thread
From: Fabian Ebner @ 2020-12-16 12:09 UTC (permalink / raw)
To: pbs-devel, Dietmar Maurer, d.csapak
Dietmar suggested testing it some more. It now ran 1500 times in total
with various different 'stress' workloads on the side.
Am 16.12.20 um 11:21 schrieb Fabian Ebner:
> This fixes the postinst problem for me. I let dpkg -i run in a loop 50
> times and it never got stuck anymore.
>
> Tested-By: Fabian Ebner <f.ebner@proxmox.com>
>
> Am 16.12.20 um 09:12 schrieb Dominik Csapak:
>> it seems that sometimes, the child process signal gets handled
>> before the parent process signal. Systemd then ignores the
>> childs signal (finished reloading) and only after going into
>> reloading state because of the parent. this will never finish.
>>
>> Instead, wait for the state to change to 'reloading' after sending
>> that signal in the parent, an only fork afterwards. This way
>> we ensure that systemd knows about the reloading before actually trying
>> to do it.
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> this all goes away with systemds notify barrier hopefully....
>>
>> src/tools/daemon.rs | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/tools/daemon.rs b/src/tools/daemon.rs
>> index 6bb4a41b..2aa52772 100644
>> --- a/src/tools/daemon.rs
>> +++ b/src/tools/daemon.rs
>> @@ -291,6 +291,7 @@ where
>> if let Err(e) = systemd_notify(SystemdNotify::Reloading) {
>> log::error!("failed to notify systemd about the state
>> change: {}", e);
>> }
>> + wait_service_is_active_or_reloading(service_name, true).await?;
>> if let Err(e) = reloader.take().unwrap().fork_restart() {
>> log::error!("error during reload: {}", e);
>> let _ = systemd_notify(SystemdNotify::Status("error
>> during reload".to_string()));
>> @@ -305,7 +306,7 @@ where
>> // FIXME: this is a hack, replace with sd_notify_barrier when
>> available
>> if server::is_reload_request() {
>> - wait_service_is_active(service_name).await?;
>> + wait_service_is_active_or_reloading(service_name, false).await?;
>> }
>> log::info!("daemon shut down...");
>> @@ -313,7 +314,7 @@ where
>> }
>> // hack, do not use if unsure!
>> -async fn wait_service_is_active(service: &str) -> Result<(), Error> {
>> +async fn wait_service_is_active_or_reloading(service: &str,
>> wait_for_reload: bool) -> Result<(), Error> {
>> tokio::time::delay_for(std::time::Duration::new(1, 0)).await;
>> loop {
>> let text = match tokio::process::Command::new("systemctl")
>> @@ -328,7 +329,8 @@ async fn wait_service_is_active(service: &str) ->
>> Result<(), Error> {
>> Err(err) => bail!("executing 'systemctl is-active'
>> failed - {}", err),
>> };
>> - if text.trim().trim_start() != "reloading" {
>> + let is_reload = text.trim().trim_start() == "reloading";
>> + if is_reload == wait_for_reload {
>> return Ok(());
>> }
>> tokio::time::delay_for(std::time::Duration::new(5, 0)).await;
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour
2020-12-16 8:12 [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour Dominik Csapak
2020-12-16 10:21 ` Fabian Ebner
@ 2020-12-17 12:49 ` Thomas Lamprecht
1 sibling, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2020-12-17 12:49 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Dominik Csapak
On 16/12/2020 09:12, Dominik Csapak wrote:
> it seems that sometimes, the child process signal gets handled
> before the parent process signal. Systemd then ignores the
> childs signal (finished reloading) and only after going into
> reloading state because of the parent. this will never finish.
>
> Instead, wait for the state to change to 'reloading' after sending
> that signal in the parent, an only fork afterwards. This way
> we ensure that systemd knows about the reloading before actually trying
> to do it.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> this all goes away with systemds notify barrier hopefully....
>
> src/tools/daemon.rs | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/src/tools/daemon.rs b/src/tools/daemon.rs
> index 6bb4a41b..2aa52772 100644
> --- a/src/tools/daemon.rs
> +++ b/src/tools/daemon.rs
> @@ -291,6 +291,7 @@ where
> if let Err(e) = systemd_notify(SystemdNotify::Reloading) {
> log::error!("failed to notify systemd about the state change: {}", e);
> }
> + wait_service_is_active_or_reloading(service_name, true).await?;
> if let Err(e) = reloader.take().unwrap().fork_restart() {
> log::error!("error during reload: {}", e);
> let _ = systemd_notify(SystemdNotify::Status("error during reload".to_string()));
> @@ -305,7 +306,7 @@ where
>
> // FIXME: this is a hack, replace with sd_notify_barrier when available
> if server::is_reload_request() {
> - wait_service_is_active(service_name).await?;
> + wait_service_is_active_or_reloading(service_name, false).await?;
> }
>
> log::info!("daemon shut down...");
> @@ -313,7 +314,7 @@ where
> }
>
> // hack, do not use if unsure!
> -async fn wait_service_is_active(service: &str) -> Result<(), Error> {
> +async fn wait_service_is_active_or_reloading(service: &str, wait_for_reload: bool) -> Result<(), Error> {
> tokio::time::delay_for(std::time::Duration::new(1, 0)).await;
> loop {
> let text = match tokio::process::Command::new("systemctl")
> @@ -328,7 +329,8 @@ async fn wait_service_is_active(service: &str) -> Result<(), Error> {
> Err(err) => bail!("executing 'systemctl is-active' failed - {}", err),
> };
>
> - if text.trim().trim_start() != "reloading" {
> + let is_reload = text.trim().trim_start() == "reloading";
> + if is_reload == wait_for_reload {
hmm, this feels and reads a bit weird, "wait_for_reload" false meaning
"wait for any status that is not reloading" seems subtle.
Combined with the name of the function this implies passing true means
that it will be waited until the service is active *or* reloading, and
passing false implies its just waited until active.
Can you rethink the interface here, maybe split it (I'm a bit buried in
other work for more useful thoughts/proposal - sorry).
> return Ok(());
> }
> tokio::time::delay_for(std::time::Duration::new(5, 0)).await;
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-12-17 12:50 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 8:12 [pbs-devel] [PATCH proxmox-backup] tools/daemon: improve reload behaviour Dominik Csapak
2020-12-16 10:21 ` Fabian Ebner
2020-12-16 12:09 ` Fabian Ebner
2020-12-17 12:49 ` Thomas Lamprecht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox