* [pmg-devel] [PATCH pmg-api master v2] systemd: fix report services failing if triggered too early by timers
@ 2025-09-23 16:47 Max R. Carrara
  2025-09-23 19:12 ` [pmg-devel] applied: " Stoiko Ivanov
  0 siblings, 1 reply; 2+ messages in thread
From: Max R. Carrara @ 2025-09-23 16:47 UTC
  To: pmg-devel

Currently, the `pmgreport.service` and `pmgspamreport.service` units
might fail if their corresponding timers activate them too early.

To elaborate, both timers have `Persistent=true` in addition to their
`OnCalendar` option. `Persistent=true` means that the timer's service
unit will be triggered immediately when the timer is activated, but
only if it would have been triggered while the timer was inactive [0].

Since the timers are activated relatively early during boot, they might
trigger their service units before `postfix.service` and
`postgresql.service` have come up, causing `pmgreport.service`,
`pmgspamreport.service`, or both of them to fail.

Fix this by making both service units wait until postfix and postgres
are up, since both are necessary for the units to run successfully. Do
this by adding `After` and `Wants` options for `postfix.service` and
`postgresql.service` to both service units.
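
The two options do different jobs, which is why both are needed. The
fragment below mirrors the `[Unit]` additions from the diff in this
patch; the comments are added for illustration only and are not part of
the shipped unit files:

```ini
[Unit]
# Ordering only: if this unit and the listed services are being started
# together, this unit's start is delayed until they have finished
# starting. After= on its own does not pull either service in.
After=postfix.service
After=postgresql.service
# Weak dependency: starting this unit also queues start jobs for the
# listed services. Unlike Requires=, this unit is still started if one
# of them fails to start.
Wants=postfix.service
Wants=postgresql.service
```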

Other Solutions That Were Considered
------------------------------------

Removing `Persistent=true` from the timers was considered for a
moment, but this might actually cause reports to go missing if PMG is
rebooted or goes down at midnight (so, just before the timers
trigger). While this scenario is probably quite rare, it's not
necessarily unrealistic.

Still, `Persistent=true` will cause unnecessary reports to be sent if
PMG goes down for a prolonged amount of time. This is IMO an *okay*
tradeoff to have; I'd personally rather receive useless reports after
prolonged downtime instead of potentially important reports not being
sent at all just because a reboot happened to be poorly timed.

I also had a look at other possible timer options [0], but none of them
apply / are useful in this case.

For the `pmgreport.service` and `pmgspamreport.service` units
themselves, the `Restart` and `RestartSec` [1] as well as the
`StartLimitBurst` and `StartLimitIntervalSec` [2] options could also be
set, but that doesn't address the underlying issue of the units failing
because the services they depend on are not up (yet).
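
As a rough sketch of what that retry-based alternative could look like
(not part of this patch; the file name and all values are arbitrary
placeholders, and it assumes a systemd version that accepts
`Restart=on-failure` for `Type=oneshot` units):

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/pmgreport.service.d/retry.conf
[Unit]
# Allow at most 5 start attempts within 300 seconds (placeholder values).
# StartLimitInterval is the older spelling of this option.
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
# Retry when the service fails (for example on a non-zero exit status),
# waiting 30 seconds between attempts.
Restart=on-failure
RestartSec=30
```

Note that this only retries runs that systemd considers failed. In the
second log in the notes of this patch, the run that cannot connect to
port 10025 still exits with status 0, so a retry like this would
presumably not trigger in that case.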

Additional Context
------------------

While this is somewhat hard to encounter / debug under normal
circumstances, the race condition can be made much more apparent by
delaying the start of `postgresql.service` and `postfix.service` with
an override for each:

 # systemctl edit postgresql.service

Then add the following:

[Service]
ExecStartPre=-sleep 15

Do the same for `postfix.service`.

Afterwards, change both timers to activate a few seconds after every
boot by adding an override for each:

 # systemctl edit pmgreport.timer

Then add the following:

[Timer]
OnCalendar=
OnBootSec=5

Do the same for `pmgspamreport.timer`.

A reboot should now suffice to make the issue reproducible.
Conversely, the issue should not appear if this commit is applied.
(Also, don't forget to remove the overrides again after debugging.)
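
The same overrides can also be written as drop-in files directly
instead of going through `systemctl edit` four times. The following is
only a sketch: the function names and the `DROPIN_ROOT` variable are
made up for illustration, and the drop-in path assumes the usual
`/etc/systemd/system/<unit>.d/override.conf` location.

```shell
#!/bin/sh
# Sketch only: write the reproduction overrides described above as
# drop-in files. DROPIN_ROOT is a hypothetical knob so the helpers can
# be tried against a scratch directory first.
DROPIN_ROOT="${DROPIN_ROOT:-/etc/systemd/system}"

# write_dropin UNIT SECTION LINE...
# Creates $DROPIN_ROOT/UNIT.d/override.conf with a [SECTION] header
# followed by the given lines.
write_dropin() {
    dropin_unit="$1"
    dropin_section="$2"
    shift 2
    mkdir -p "$DROPIN_ROOT/${dropin_unit}.d"
    {
        printf '[%s]\n' "$dropin_section"
        printf '%s\n' "$@"
    } > "$DROPIN_ROOT/${dropin_unit}.d/override.conf"
}

# Delay both dependencies, then make both timers fire shortly after boot.
setup_repro() {
    for u in postgresql.service postfix.service; do
        write_dropin "$u" Service 'ExecStartPre=-sleep 15'
    done
    for u in pmgreport.timer pmgspamreport.timer; do
        write_dropin "$u" Timer 'OnCalendar=' 'OnBootSec=5'
    done
}
```

After calling `setup_repro` as root, the reboot picks up the drop-ins.
To clean up afterwards, `systemctl revert` on the four units should
remove the overrides again.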

[0]: `man 5 systemd.timer`
[1]: `man 5 systemd.service`
[2]: `man 5 systemd.unit`

Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
---
Changes v1 --> v2:
- fix typo in commit message
- expand on other considered solutions in commit message
- add headings to commit message because it's gotten a bit large
- add logs here in the notes to show where it actually breaks

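NOTE: As an example, here's the `pmgreport.service` unit being triggered
too early, once failing because postgres isn't up yet, and once being
unable to connect to postfix because it isn't up yet (in that case the
unit still exits with status 0):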

```
× pmgreport.service - Send Daily System Report Mail
     Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
     Active: failed (Result: exit-code) since Mon 2025-09-22 11:38:04 CEST; 2min 36s ago
 Invocation: 8473dbfc856a45a2ba243217bd0d01b0
TriggeredBy: ● pmgreport.timer
    Process: 486 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=2)
   Main PID: 486 (code=exited, status=2)
   Mem peak: 104.5M
        CPU: 375ms

Sep 22 11:38:03 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
                                                       Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]:         Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Failed with result 'exit-code'.
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: Failed to start pmgreport.service - Send Daily System Report Mail.
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 375ms CPU time, 104.5M memory peak.
```

```
○ pmgreport.service - Send Daily System Report Mail
     Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
    Drop-In: /etc/systemd/system/pmgreport.service.d
             └─override.conf
     Active: inactive (dead) since Tue 2025-09-23 17:35:27 CEST; 28s ago
 Invocation: 6881d3253ee64fe78d9940a613b9ecf2
TriggeredBy: ● pmgreport.timer
    Process: 640 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=0/SUCCESS)
   Main PID: 640 (code=exited, status=0/SUCCESS)
   Mem peak: 107M
        CPU: 397ms

Sep 23 17:35:26 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
Sep 23 17:35:26 pmg-9-alpha-01 pmgreport[640]: unable to connect to localhost at port 10025 at /usr/share/perl5/PMG/Utils.pm line 291.
Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Deactivated successfully.
Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: Finished pmgreport.service - Send Daily System Report Mail.
Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 397ms CPU time, 107M memory peak.
```

 debian/pmgreport.service     | 4 ++++
 debian/pmgspamreport.service | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/debian/pmgreport.service b/debian/pmgreport.service
index 6b05213..89e25c7 100644
--- a/debian/pmgreport.service
+++ b/debian/pmgreport.service
@@ -1,6 +1,10 @@
 [Unit]
 Description=Send Daily System Report Mail
 ConditionPathExists=/usr/bin/pmgreport
+After=postfix.service
+After=postgresql.service
+Wants=postfix.service
+Wants=postgresql.service
 
 [Service]
 Type=oneshot
diff --git a/debian/pmgspamreport.service b/debian/pmgspamreport.service
index a20214f..2b4f163 100644
--- a/debian/pmgspamreport.service
+++ b/debian/pmgspamreport.service
@@ -1,6 +1,10 @@
 [Unit]
 Description=Send Daily Spam Report Mails
 ConditionPathExists=/usr/bin/pmgqm
+After=postfix.service
+After=postgresql.service
+Wants=postfix.service
+Wants=postgresql.service
 
 [Service]
 Type=oneshot
-- 
2.47.3



_______________________________________________
pmg-devel mailing list
pmg-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pmg-devel


* [pmg-devel] applied: [PATCH pmg-api master v2] systemd: fix report services failing if triggered too early by timers
  2025-09-23 16:47 [pmg-devel] [PATCH pmg-api master v2] systemd: fix report services failing if triggered too early by timers Max R. Carrara
@ 2025-09-23 19:12 ` Stoiko Ivanov
  0 siblings, 0 replies; 2+ messages in thread
From: Stoiko Ivanov @ 2025-09-23 19:12 UTC
  To: Max R. Carrara; +Cc: pmg-devel

Thanks for the fast iteration and the expansive commit message!
Applied the patch!



