public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH manager] api: replication: don't send mails about failed replication only once
@ 2022-04-22 12:15 Fabian Ebner
  2022-04-27  8:30 ` [pve-devel] applied: " Thomas Lamprecht
  0 siblings, 1 reply; 2+ messages in thread
From: Fabian Ebner @ 2022-04-22 12:15 UTC (permalink / raw)
  To: pve-devel

but rather multiple times becoming exponentially less frequent.

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/API2/Replication.pm | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/PVE/API2/Replication.pm b/PVE/API2/Replication.pm
index a1bcd89b..a7e54497 100644
--- a/PVE/API2/Replication.pm
+++ b/PVE/API2/Replication.pm
@@ -74,6 +74,19 @@ sub run_single_job {
 sub run_jobs {
     my ($now, $logfunc, $verbose, $mail) = @_;
 
+    my $mail_at_fail_count = sub {
+	my ($fail_count) = @_;
+
+	return 1 if $fail_count == 1;
+
+	# failing job is re-tried every half hour, try to send one mail after 1, 2, 4, 8, etc. days
+	my $i = 1;
+	while ($i * 48 < $fail_count) {
+	    $i = $i * 2;
+	}
+	return $i * 48 == $fail_count;
+    };
+
     my $iteration = $now // time();
 
     my $code = sub {
@@ -93,7 +106,7 @@ sub run_jobs {
 		my $jobstate = PVE::ReplicationState::extract_job_state($state, $jobcfg);
 		eval {
 		    PVE::Tools::sendmail('root', "Replication Job: $jobcfg->{id} failed", $err)
-			if $jobstate->{fail_count} == 1 && $mail;
+			if $mail && $mail_at_fail_count->($jobstate->{fail_count});
 		};
 		warn ": $@" if $@;
 	    }
-- 
2.30.2
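
For illustration only, here is a minimal sketch of which fail counts would trigger a mail under the closure added above. It copies the loop from the patch into a named sub; the name mail_at_fail_count() and the upper bound of 800 are assumptions made for this example and are not part of the patch. Reading 48 failures as one day relies on the half-hourly retry interval stated in the patch comment.

```perl
use strict;
use warnings;

# Same logic as the closure in the patch, as a named sub.
# The sub name is hypothetical and only used for this example.
sub mail_at_fail_count {
    my ($fail_count) = @_;

    # The first failure is always reported.
    return 1 if $fail_count == 1;

    # Find the smallest power-of-two multiple of 48 that is >= $fail_count.
    my $i = 1;
    while ($i * 48 < $fail_count) {
        $i = $i * 2;
    }
    # Mail only when $fail_count hits that multiple exactly.
    return $i * 48 == $fail_count;
}

# Check every fail count up to an arbitrary bound of 800.
my @mailed = grep { mail_at_fail_count($_) } 1 .. 800;
print join(', ', @mailed), "\n";
# prints: 1, 48, 96, 192, 384, 768
```

Assuming one retry every half hour, these counts correspond to the first failure and then roughly 1, 2, 4, 8 and 16 days of continuous failure.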


* [pve-devel] applied: [PATCH manager] api: replication: don't send mails about failed replication only once
  2022-04-22 12:15 [pve-devel] [PATCH manager] api: replication: don't send mails about failed replication only once Fabian Ebner
@ 2022-04-27  8:30 ` Thomas Lamprecht
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Lamprecht @ 2022-04-27  8:30 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Ebner

On 22.04.22 14:15, Fabian Ebner wrote:
> but rather multiple times becoming exponentially less frequent.
> 
> Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
>  PVE/API2/Replication.pm | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
>

applied, thanks!

Added a follow-up to always mail the first three tries, so that the admin notices that
the problem wasn't just a fluke.

I'm now also including some relevant info in the mail, like the schedule, last successful
sync and next sync try, and the actual fail count, so that an admin can tell how often it
was tried (as the mail frequency doesn't match that), plus a note about reducing the
notification frequency at $fail_count == 3, to avoid confusion.
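
As a rough sketch of what "always mail the first three tries" could look like on top of the original check: the follow-up commit itself is not shown in this thread, so the sub below is an assumption and not the applied code. The name mail_at_fail_count_v2() and the bound of 200 are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant, NOT the applied follow-up: report the first three
# failures, then fall back to the doubling schedule from the original patch.
sub mail_at_fail_count_v2 {
    my ($fail_count) = @_;

    # Assumption: "first three tries" means fail counts 1, 2 and 3.
    return 1 if $fail_count <= 3;

    # Unchanged loop from the patch: mail at 48, 96, 192, ... failures.
    my $i = 1;
    while ($i * 48 < $fail_count) {
        $i = $i * 2;
    }
    return $i * 48 == $fail_count;
}

# Check every fail count up to an arbitrary bound of 200.
my @mailed = grep { mail_at_fail_count_v2($_) } 1 .. 200;
print join(', ', @mailed), "\n";
# prints: 1, 2, 3, 48, 96, 192
```

Under this assumption the mail frequency drops right after the third failure, which would match the note about reduced notifications being sent at $fail_count == 3.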
