public inbox for pmg-devel@lists.proxmox.com
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: "Max R. Carrara" <m.carrara@proxmox.com>
Cc: pmg-devel@lists.proxmox.com
Subject: [pmg-devel] applied: [PATCH pmg-api master v2] systemd: fix report services failing if triggered too early by timers
Date: Tue, 23 Sep 2025 21:12:47 +0200
Message-ID: <20250923211247.68a5af65@rosa.proxmox.com>
In-Reply-To: <20250923164723.532488-1-m.carrara@proxmox.com>

Thanks for the fast iteration and the extensive commit message!
Applied the patch!

On Tue, 23 Sep 2025 18:47:20 +0200
"Max R. Carrara" <m.carrara@proxmox.com> wrote:

> Currently, the `pmgreport.service` and `pmgspamreport.service` units
> might fail if their corresponding timers activate them too early.
> 
> To elaborate, both timers have `Persistent=true` in addition to their
> `OnCalendar` option. `Persistent=true` means that the timer's service
> unit will be triggered immediately when the timer is activated, but
> only if it would have been triggered while the timer was inactive [0].
> 
> Since the timers are activated relatively early, they might trigger
> their service units before postfix.service and postgresql.service have
> come up, causing `pmgreport.service`, or `pmgspamreport.service`, or
> both of them to fail.
> 
> Fix this by letting both service units wait until postfix and postgres
> are up, which are necessary for the units to run successfully. Do this
> by adding the `After` and `Wants` options for `postfix.service` and
> `postgresql.service` to both service units.
> 
> Other Solutions That Were Considered
> ------------------------------------
> 
> Removing `Persistent=true` from the timers was considered for a
> moment, but this might actually cause reports to go missing if PMG is
> rebooted or goes down at midnight (so, just before the timers
> trigger). While this scenario is probably quite rare, it's not
> necessarily unrealistic.
> 
> Still, `Persistent=true` will cause unnecessary reports to be sent if
> PMG goes down for a prolonged amount of time. This is IMO an *okay*
> tradeoff to have; I'd personally rather receive useless reports after
> prolonged downtime instead of potentially important reports not being
> sent at all just because a reboot happened to be poorly timed.
> 
> I also had a look at other possible timer options [0], but none of them
> apply / are useful in this case.
> 
> For the `pmgreport.service` and `pmgspamreport.service` units
> themselves, the `Restart`, `RestartSec`, [1] `StartLimitBurst` and
> `StartLimitInterval` [2] options could also be set, but that doesn't
> address the underlying issue of the units failing because the services
> they depend on are not up (yet).
> 
> Additional Context
> ------------------
> 
> While this is somewhat hard to encounter / debug under normal
> circumstances, it is possible to make this race condition much more
> apparent by adding an arbitrarily long delay to `postgresql.service`
> and `postfix.service` by adding an override for each:
> 
>  # systemctl edit postgresql.service
> 
> Then add the following:
> 
> [Service]
> ExecStartPre=-sleep 15
> 
> Do the same for `postfix.service`.
> 
> Afterwards, change both timers to activate a few seconds after every
> boot by adding an override for each:
> 
>  # systemctl edit pmgreport.timer
> 
> Then add the following:
> 
> [Timer]
> OnCalendar=
> OnBootSec=5
> 
> Do the same for `pmgspamreport.timer`.
> 
> A reboot should now suffice to make the issue reproducible.
> Conversely, the issue should not appear if this commit is applied.
> (Also, don't forget to remove the overrides again after debugging.)
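
For anyone who wants to script the reproduction steps quoted above instead
of going through `systemctl edit` four times, here is a minimal sketch. It
writes the same drop-in files by hand. The `ROOT` variable and the function
names `write_override` and `setup_repro` are made up for illustration; the
default location matches the `Drop-In:` path shown in the status output
further down.

```shell
#!/bin/sh
# Sketch only: write the drop-ins that `systemctl edit` would create.
# ROOT is a hypothetical knob; point it at a scratch directory for a dry run.
ROOT="${ROOT:-/etc/systemd/system}"

# write_override UNIT CONTENT: create UNIT.d/override.conf under ROOT
write_override() {
    mkdir -p "$ROOT/$1.d" &&
    printf '%s\n' "$2" > "$ROOT/$1.d/override.conf"
}

# setup_repro: delay both dependencies, then fire both timers shortly after boot
setup_repro() {
    for svc in postgresql.service postfix.service; do
        write_override "$svc" '[Service]
ExecStartPre=-sleep 15' || return 1
    done
    for timer in pmgreport.timer pmgspamreport.timer; do
        write_override "$timer" '[Timer]
OnCalendar=
OnBootSec=5' || return 1
    done
}
```

Running `setup_repro` as root followed by `systemctl daemon-reload` and a
reboot should give the same setup as the manual steps. To clean up
afterwards, `systemctl revert` on the four units removes the drop-ins again.
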
> 
> [0]: `man 5 systemd.timer`
> [1]: `man 5 systemd.service`
> [2]: `man 5 systemd.unit`
> 
> Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
> ---
> Changes v1 --> v2:
> - fix typo in commit message
> - expand on other considered solutions in commit message
> - add headings to commit message because it's gotten a bit large
> - add logs here in the notes to show where it actually breaks
> 
> NOTE: As an example, here's the `pmgreport.service` unit failing, once
> because postgres isn't up yet, and once because postfix isn't up yet:
> 
> ```
> × pmgreport.service - Send Daily System Report Mail
>      Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
>      Active: failed (Result: exit-code) since Mon 2025-09-22 11:38:04 CEST; 2min 36s ago
>  Invocation: 8473dbfc856a45a2ba243217bd0d01b0
> TriggeredBy: ● pmgreport.timer
>     Process: 486 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=2)
>    Main PID: 486 (code=exited, status=2)
>    Mem peak: 104.5M
>         CPU: 375ms
> 
> Sep 22 11:38:03 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
> Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
>                                                        Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
> Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]: DBI connect('dbname=Proxmox_ruledb;host=/var/run/postgresql;port=5432','root',...) failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
> Sep 22 11:38:04 pmg-9-alpha-01 pmgreport[486]:         Is the server running locally and accepting connections on that socket? at /usr/share/perl5/PMG/DBTools.pm line 78.
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Failed with result 'exit-code'.
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: Failed to start pmgreport.service - Send Daily System Report Mail.
> Sep 22 11:38:04 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 375ms CPU time, 104.5M memory peak.
> ```
> 
> ```
> ○ pmgreport.service - Send Daily System Report Mail
>      Loaded: loaded (/usr/lib/systemd/system/pmgreport.service; static)
>     Drop-In: /etc/systemd/system/pmgreport.service.d
>              └─override.conf
>      Active: inactive (dead) since Tue 2025-09-23 17:35:27 CEST; 28s ago
>  Invocation: 6881d3253ee64fe78d9940a613b9ecf2
> TriggeredBy: ● pmgreport.timer
>     Process: 640 ExecStart=/usr/bin/pmgreport --timespan yesterday --auto (code=exited, status=0/SUCCESS)
>    Main PID: 640 (code=exited, status=0/SUCCESS)
>    Mem peak: 107M
>         CPU: 397ms
> 
> Sep 23 17:35:26 pmg-9-alpha-01 systemd[1]: Starting pmgreport.service - Send Daily System Report Mail...
> Sep 23 17:35:26 pmg-9-alpha-01 pmgreport[640]: unable to connect to localhost at port 10025 at /usr/share/perl5/PMG/Utils.pm line 291.
> Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Deactivated successfully.
> Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: Finished pmgreport.service - Send Daily System Report Mail.
> Sep 23 17:35:27 pmg-9-alpha-01 systemd[1]: pmgreport.service: Consumed 397ms CPU time, 107M memory peak.
> ```
> 
>  debian/pmgreport.service     | 4 ++++
>  debian/pmgspamreport.service | 4 ++++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/debian/pmgreport.service b/debian/pmgreport.service
> index 6b05213..89e25c7 100644
> --- a/debian/pmgreport.service
> +++ b/debian/pmgreport.service
> @@ -1,6 +1,10 @@
>  [Unit]
>  Description=Send Daily System Report Mail
>  ConditionPathExists=/usr/bin/pmgreport
> +After=postfix.service
> +After=postgresql.service
> +Wants=postfix.service
> +Wants=postgresql.service
>  
>  [Service]
>  Type=oneshot
> diff --git a/debian/pmgspamreport.service b/debian/pmgspamreport.service
> index a20214f..2b4f163 100644
> --- a/debian/pmgspamreport.service
> +++ b/debian/pmgspamreport.service
> @@ -1,6 +1,10 @@
>  [Unit]
>  Description=Send Daily Spam Report Mails
>  ConditionPathExists=/usr/bin/pmgqm
> +After=postfix.service
> +After=postgresql.service
> +Wants=postfix.service
> +Wants=postgresql.service
>  
>  [Service]
>  Type=oneshot
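
As a side note on the diff above: `After=` only orders the units, while
`Wants=` is what pulls the dependencies into the transaction, so both are
needed. A quick way to check that an installed unit file carries all four
lines could look like the sketch below. The function name `check_deps` is
made up for illustration, and it only recognizes the one-dependency-per-line
form used in this patch, not space-separated lists.

```shell
#!/bin/sh
# Sketch only: succeed if FILE both orders itself after, and pulls in,
# the two services named in the patch.
check_deps() {
    for dep in postfix.service postgresql.service; do
        for key in After Wants; do
            # -x: match the whole line, -F: fixed string, -q: quiet
            grep -qxF "$key=$dep" "$1" || {
                echo "missing: $key=$dep" >&2
                return 1
            }
        done
    done
}
```

For example, `check_deps /usr/lib/systemd/system/pmgreport.service` (the
path from the status output above). On a running system,
`systemctl show -p After -p Wants pmgreport.service` prints the effective
values, including anything added by drop-ins.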



_______________________________________________
pmg-devel mailing list
pmg-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pmg-devel
