* [pmg-devel] [PATCH pmg-api master v2] systemd: fix report services failing if triggered too early by timers
@ 2025-09-23 16:47 Max R. Carrara
From: Max R. Carrara @ 2025-09-23 16:47 UTC
To: pmg-devel

Currently, the `pmgreport.service` and `pmgspamreport.service` units
might fail if their corresponding timers activate them too early.

To elaborate, both timers have `Persistent=true` in addition to their
`OnCalendar` option. `Persistent=true` means that the timer's service
unit will be triggered immediately when the timer is activated, but
only if it would have been triggered at least once while the timer was
inactive [0].

Since the timers are activated relatively early during boot, they
might trigger their service units before `postfix.service` and
`postgresql.service` have come up, causing `pmgreport.service`,
`pmgspamreport.service`, or both to fail.

Fix this by letting both service units wait until postfix and postgres
are up, since both are required for the units to run successfully. Do
this by adding the `After` and `Wants` options for `postfix.service`
and `postgresql.service` to both service units. `After` only orders
the units, while `Wants` ensures that postfix and postgres are actually
started as well.

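For installations that don't have this change yet, the same dependencies could be added locally via a drop-in. This is only a sketch: the function name `apply_report_deps_dropin` and the drop-in file name `wait-for-deps.conf` are made up for illustration, and `ROOT` (defaulting to `/`) only exists so the generated files can be inspected in a scratch directory without touching the live system.

```shell
# Hypothetical helper: add the dependencies from this patch to both report
# services via a drop-in. ROOT defaults to /; any other value only writes
# files below that directory and skips reloading systemd.
apply_report_deps_dropin() {
    root=${ROOT:-/}
    for unit in pmgreport.service pmgspamreport.service; do
        dir="${root%/}/etc/systemd/system/$unit.d"
        mkdir -p "$dir" || return 1
        # After= only orders the start-up; Wants= additionally makes systemd
        # start postfix and postgresql along with the report service.
        printf '%s\n' '[Unit]' \
            'After=postfix.service' 'After=postgresql.service' \
            'Wants=postfix.service' 'Wants=postgresql.service' \
            > "$dir/wait-for-deps.conf" || return 1
    done
    # Make the running systemd instance pick up the new drop-ins, but only
    # when the files were written to the real root file system.
    if [ "$root" = / ]; then
        systemctl daemon-reload
    fi
}
```

To use it, source the snippet as root and call `apply_report_deps_dropin`. Since `After=` and `Wants=` accumulate across lines and files, the drop-in is merely redundant once the fixed unit files are installed.
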
Other Solutions That Were Considered
------------------------------------

Removing `Persistent=true` from the timers was considered for a
moment, but this might actually cause reports to go missing if PMG is
rebooted or goes down around midnight, just before the timers are due
to trigger. While this scenario is probably quite rare, it's not
necessarily unrealistic.

Still, `Persistent=true` will cause unnecessary reports to be sent if
PMG goes down for a prolonged amount of time. This is IMO an *okay*
tradeoff to have; I'd personally rather receive useless reports after
prolonged downtime than have potentially important reports not be sent
at all just because a reboot happened to be poorly timed.

I also had a look at other possible timer options [0], but none of them
apply / are useful in this case.

For the `pmgreport.service` and `pmgspamreport.service` units
themselves, the `Restart` and `RestartSec` [1] as well as the
`StartLimitBurst` and `StartLimitIntervalSec` [2] options could also
be set, but that doesn't address the underlying issue of the units
failing because the services they depend on aren't up (yet).

Additional Context
------------------

While this is somewhat hard to encounter / debug under normal
circumstances, the race condition can be made much more apparent by
delaying the start of `postgresql.service` and `postfix.service` by an
arbitrary amount of time. To do so, add an override for each:

    # systemctl edit postgresql.service

Then add the following:

    [Service]
    ExecStartPre=-sleep 15

Do the same for `postfix.service`.

Afterwards, change both timers to trigger their service units a few
seconds after every boot by adding an override for each:

    # systemctl edit pmgreport.timer

Then add the following:

    [Timer]
    OnCalendar=
    OnBootSec=5

Do the same for `pmgspamreport.timer`.

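The four overrides can also be created non-interactively, e.g. when the reproduction has to be repeated on several test VMs. The following is a sketch in POSIX `sh` with hypothetical function names; it writes the same `override.conf` drop-ins that `systemctl edit` would create, and it replaces any existing `override.conf` of these units. `ROOT` defaults to `/` and can point to a scratch directory to only inspect the generated files.

```shell
# Hypothetical helper: write_dropin UNIT LINE... writes the given lines to
# the unit's override.conf (the file `systemctl edit` uses), replacing it.
write_dropin() {
    target=$1
    shift
    root=${ROOT:-/}
    dir="${root%/}/etc/systemd/system/$target.d"
    mkdir -p "$dir" || return 1
    printf '%s\n' "$@" > "$dir/override.conf"
}

setup_race_repro() {
    # Delay the start of the services the report units depend on.
    for unit in postgresql.service postfix.service; do
        write_dropin "$unit" '[Service]' 'ExecStartPre=-sleep 15' || return 1
    done
    # Trigger both report timers 5 seconds after boot; the empty
    # OnCalendar= clears the schedule from the vendor unit file.
    for timer in pmgreport.timer pmgspamreport.timer; do
        write_dropin "$timer" '[Timer]' 'OnCalendar=' 'OnBootSec=5' || return 1
    done
    # Reload systemd only when the real root file system was modified.
    if [ "${ROOT:-/}" = / ]; then
        systemctl daemon-reload
    fi
}
```

Source the snippet as root, run `setup_race_repro`, and reboot.
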
A reboot should now be enough to reproduce the issue. Conversely, the
issue should not appear if this commit is applied. (Also, don't forget
to remove the overrides again after debugging.)

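One way to remove the overrides again is `systemctl revert postgresql.service postfix.service pmgreport.timer pmgspamreport.timer`, which resets the given units to their vendor versions and removes all of their drop-ins. If other local customizations of these units should survive, a more targeted sketch is to delete only the `override.conf` files that `systemctl edit` created. The function name below is hypothetical, and `ROOT` (defaulting to `/`) only exists to allow testing in a scratch directory.

```shell
# Hypothetical helper: delete the override.conf drop-ins created by
# `systemctl edit` while debugging. The whole override.conf is removed,
# including any unrelated settings it may contain; other drop-ins stay.
remove_debug_overrides() {
    root=${ROOT:-/}
    for unit in postgresql.service postfix.service \
                pmgreport.timer pmgspamreport.timer; do
        dir="${root%/}/etc/systemd/system/$unit.d"
        rm -f "$dir/override.conf"
        # Remove the drop-in directory as well, but only if it is now empty
        # (rmdir refuses non-empty or missing directories).
        rmdir "$dir" 2>/dev/null || true
    done
    # Reload systemd only when the real root file system was modified.
    if [ "$root" = / ]; then
        systemctl daemon-reload
    fi
}
```
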
[0]: `man 5 systemd.timer`
[1]: `man 5 systemd.service`
[2]: `man 5 systemd.unit`

Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
---
Changes v1 --> v2:
- fix typo in commit message
- expand on other considered solutions in commit message
- add headings to commit message because it's gotten a bit large
- add logs here in the notes to show where it actually breaks
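
NOTE: As an example, here's the `pmgreport.service` unit failing, once
because postgres isn't up yet, and once because postfix isn't up yet.
In the second case, systemd even reports the unit as successful, since
`pmgreport` exits with status 0 despite being unable to connect to
localhost at port 10025: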
```
× pmgreport.service - Send Daily System Report Mail
Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
Active: failed (Result: exit-code) since Mon 2025-09-22 11:38:04 CEST; 2min 36s ago
Invocation: 8473dbfc856a45a2ba243217bd0d01b0
TriggeredBy: ● pmgreport.timer
Process: 486 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=2)
Main PID: 486 (code=exited, status=2)
Mem peak: 104.5M
CPU: 375ms
Sep 22 11:38:03 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Failed with result 'exit-code'.
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: Failed to start pmgreport.service - Send Daily System Report Mail.
Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 375ms CPU time, 104.5M memory peak.
```
```
○ pmgreport.service - Send Daily System Report Mail
Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
Drop-In: /etc/systemd/system/pmgreport.service.d
└─override.conf
Active: inactive (dead) since Tue 2025-09-23 17:35:27 CEST; 28s ago
Invocation: 6881d3253ee64fe78d9940a613b9ecf2
TriggeredBy: ● pmgreport.timer
Process: 640 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=0/SUCCESS)
Main PID: 640 (code=exited, status=0/SUCCESS)
Mem peak: 107M
CPU: 397ms
Sep 23 17:35:26 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
Sep 23 17:35:26 pmg-9-alpha-01 pmgreport[640]: unable to connect to localhost at port 10025 at /usr/share/perl5/PMG/Utils.pm line 291.
Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Deactivated successfully.
Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: Finished pmgreport.service - Send Daily System Report Mail.
Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 397ms CPU time, 107M memory peak.
```
debian/pmgreport.service | 4 ++++
debian/pmgspamreport.service | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/debian/pmgreport.service b/debian/pmgreport.service
index 6b05213..89e25c7 100644
--- a/debian/pmgreport.service
+++ b/debian/pmgreport.service
@@ -1,6 +1,10 @@
[Unit]
Description=Send Daily System Report Mail
ConditionPathExists=/usr/bin/pmgreport
+After=postfix.service
+After=postgresql.service
+Wants=postfix.service
+Wants=postgresql.service
[Service]
Type=oneshot
diff --git a/debian/pmgspamreport.service b/debian/pmgspamreport.service
index a20214f..2b4f163 100644
--- a/debian/pmgspamreport.service
+++ b/debian/pmgspamreport.service
@@ -1,6 +1,10 @@
[Unit]
Description=Send Daily Spam Report Mails
ConditionPathExists=/usr/bin/pmgqm
+After=postfix.service
+After=postgresql.service
+Wants=postfix.service
+Wants=postgresql.service
[Service]
Type=oneshot
--
2.47.3
_______________________________________________
pmg-devel mailing list
pmg-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pmg-devel