From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: "Max R. Carrara" <m.carrara@proxmox.com>
Cc: pmg-devel@lists.proxmox.com
Subject: [pmg-devel] applied: [PATCH pmg-api master v2] systemd: fix report services failing if triggered too early by timers
Date: Tue, 23 Sep 2025 21:12:47 +0200
Message-ID: <20250923211247.68a5af65@rosa.proxmox.com>
In-Reply-To: <20250923164723.532488-1-m.carrara@proxmox.com>
Thanks for the fast iteration and the extensive commit message!
Applied the patch!
On Tue, 23 Sep 2025 18:47:20 +0200
"Max R. Carrara" <m.carrara@proxmox.com> wrote:
> Currently, the `pmgreport.service` and `pmgspamreport.service` units
> might fail if their corresponding timers activate them too early.
>
> To elaborate, both timers have `Persistent=true` in addition to their
> `OnCalendar` option. `Persistent=true` means that the timer's service
> unit will be triggered immediately when the timer is activated, but
> only if it would have been triggered while the timer was inactive [0].
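>
> A minimal sketch of such a timer; the unit name and the calendar
> expression below are placeholders, not the values shipped by PMG:
>
> ```ini
> # example-report.timer (hypothetical unit name)
> [Timer]
> # Elapse once per day at the given wall-clock time (placeholder value).
> OnCalendar=*-*-* 00:05:00
> # Record the last trigger time on disk; if an elapse was missed while
> # the timer was inactive, trigger the service as soon as the timer is
> # activated again.
> Persistent=true
> ```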
>
> Since the timers are activated relatively early, they might trigger
> their service units before postfix.service and postgresql.service have
> come up, causing `pmgreport.service`, or `pmgspamreport.service`, or
> both of them to fail.
>
> Fix this by letting both service units wait until postfix and postgres
> are up, which are necessary for the units to run successfully. Do this
> by adding the `After` and `Wants` options for `postfix.service` and
> `postgresql.service` to both service units.
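>
> As a rough sketch of what the two options mean, shown for
> `postfix.service` only (comments added for illustration; see
> `man 5 systemd.unit` for the authoritative wording):
>
> ```ini
> [Unit]
> # Ordering only: if both units are started in the same transaction,
> # this unit is started after the listed unit has finished starting up.
> After=postfix.service
> # Weak dependency: starting this unit also requests the listed unit,
> # but this unit is still started if the listed unit fails to start.
> Wants=postfix.service
> ```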
>
> Other Solutions That Were Considered
> ------------------------------------
>
> Removing `Persistent=true` from the timers was considered for a
> moment, but this might actually cause reports to go missing if PMG is
> rebooted or goes down at midnight (so, just before the timers
> trigger). While this scenario is probably quite rare, it's not
> necessarily unrealistic.
>
> Still, `Persistent=true` will cause unnecessary reports to be sent if
> PMG goes down for a prolonged amount of time. This is IMO an *okay*
> tradeoff to have; I'd personally rather receive useless reports after
> prolonged downtime instead of potentially important reports not being
> sent at all just because a reboot happened to be poorly timed.
>
> I also had a look at other possible timer options [0], but none of them
> apply / are useful in this case.
>
> For the `pmgreport.service` and `pmgspamreport.service` units
> themselves, the `Restart`, `RestartSec` [1], `StartLimitBurst` and
> `StartLimitInterval` [2] options could also be set, but that doesn't
> address the underlying issue of the units failing because the
> services they depend on are not up (yet).
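>
> For illustration only, such a retry-based override might look roughly
> like the sketch below. All values are placeholders, and this is *not*
> what this patch does. Note that current versions of
> `man 5 systemd.unit` document the interval option under the name
> `StartLimitIntervalSec`:
>
> ```ini
> [Unit]
> # Give up if more than 5 start attempts happen within 300 seconds
> # (placeholder values).
> StartLimitIntervalSec=300
> StartLimitBurst=5
>
> [Service]
> # Retry only when the process exits with a failure, waiting 30 seconds
> # between attempts (placeholder values).
> Restart=on-failure
> RestartSec=30
> ```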
>
> Additional context:
> -------------------
>
> While this is somewhat hard to encounter / debug under normal
> circumstances, the race condition can be made much more apparent by
> delaying the startup of `postgresql.service` and `postfix.service`
> with an override for each:
>
>     # systemctl edit postgresql.service
>
> Then add the following:
>
>     [Service]
>     ExecStartPre=-sleep 15
>
> Do the same for `postfix.service`.
>
> Afterwards, change both timers to activate a few seconds after every
> boot by adding an override for each:
>
>     # systemctl edit pmgreport.timer
>
> Then add the following:
>
>     [Timer]
>     OnCalendar=
>     OnBootSec=5
>
> Do the same for `pmgspamreport.timer`.
>
> A reboot should now suffice to make the issue reproducible.
> Conversely, the issue should not appear if this commit is applied.
> (Also, don't forget to remove the overrides again after debugging.)
>
> [0]: `man 5 systemd.timer`
> [1]: `man 5 systemd.service`
> [2]: `man 5 systemd.unit`
>
> Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
> ---
> Changes v1 --> v2:
> - fix typo in commit message
> - expand on other considered solutions in commit message
> - add headings to commit message because it's gotten a bit large
> - add logs here in the notes to show where it actually breaks
>
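> NOTE: As an example, here's the `pmgreport.service` unit running into
> errors, once because postgres isn't up yet (the unit fails), and once
> because postfix isn't up yet (the connection error is logged, but the
> unit still exits with status 0):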
>
> ```
> × pmgreport.service - Send Daily System Report Mail
> Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
> Active: failed (Result: exit-code) since Mon 2025-09-22 11:38:04 CEST; 2min 36s ago
> Invocation: 8473dbfc856a45a2ba243217bd0d01b0
> TriggeredBy: ● pmgreport.timer
> Process: 486 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=2)
> Main PID: 486 (code=exited, status=2)
> Mem peak: 104.5M
> CPU: 375ms
>
> Sep 22 11:38:03 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
> Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
> Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
> Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
> Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Failed with result 'exit-code'.
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: Failed to start pmgreport.service - Send Daily System Report Mail.
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 375ms CPU time, 104.5M memory peak.
> ```
>
> ```
> ○ pmgreport.service - Send Daily System Report Mail
> Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
> Drop-In: /etc/systemd/system/pmgreport.service.d
> └─override.conf
> Active: inactive (dead) since Tue 2025-09-23 17:35:27 CEST; 28s ago
> Invocation: 6881d3253ee64fe78d9940a613b9ecf2
> TriggeredBy: ● pmgreport.timer
> Process: 640 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=0/SUCCESS)
> Main PID: 640 (code=exited, status=0/SUCCESS)
> Mem peak: 107M
> CPU: 397ms
>
> Sep 23 17:35:26 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
> Sep 23 17:35:26 pmg-9-alpha-01 pmgreport[640]: unable to connect to localhost at port 10025 at /usr/share/perl5/PMG/Utils.pm line 291.
> Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Deactivated successfully.
> Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: Finished pmgreport.service - Send Daily System Report Mail.
> Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 397ms CPU time, 107M memory peak.
> ```
>
> debian/pmgreport.service | 4 ++++
> debian/pmgspamreport.service | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/debian/pmgreport.service b/debian/pmgreport.service
> index 6b05213..89e25c7 100644
> --- a/debian/pmgreport.service
> +++ b/debian/pmgreport.service
> @@ -1,6 +1,10 @@
> [Unit]
> Description=Send Daily System Report Mail
> ConditionPathExists=/usr/bin/pmgreport
> +After=postfix.service
> +After=postgresql.service
> +Wants=postfix.service
> +Wants=postgresql.service
>
> [Service]
> Type=oneshot
> diff --git a/debian/pmgspamreport.service b/debian/pmgspamreport.service
> index a20214f..2b4f163 100644
> --- a/debian/pmgspamreport.service
> +++ b/debian/pmgspamreport.service
> @@ -1,6 +1,10 @@
> [Unit]
> Description=Send Daily Spam Report Mails
> ConditionPathExists=/usr/bin/pmgqm
> +After=postfix.service
> +After=postgresql.service
> +Wants=postfix.service
> +Wants=postgresql.service
>
> [Service]
> Type=oneshot