From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id C20691FF137 for ; Tue, 31 Mar 2026 10:21:57 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 186BF13139; Tue, 31 Mar 2026 10:22:25 +0200 (CEST) Message-ID: <7eb684ef-9dc3-43ac-8aff-2ca7194becd8@proxmox.com> Date: Tue, 31 Mar 2026 10:22:21 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH proxmox-backup v6 18/22] datastore: add thresholds notification callback on datastore lookup To: Thomas Lamprecht , pbs-devel@lists.proxmox.com References: <20260319094100.240765-1-c.ebner@proxmox.com> <20260319094100.240765-31-c.ebner@proxmox.com> Content-Language: en-US, de-DE From: Christian Ebner In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1774945286319 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.061 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: X3JHXXUR23LHKUGX6E62Y2XLFBAUGS5S X-Message-ID-Hash: X3JHXXUR23LHKUGX6E62Y2XLFBAUGS5S X-MailFrom: c.ebner@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox Backup Server development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 3/30/26 10:18 PM, Thomas Lamprecht wrote: > Am 19.03.26 um 10:41 schrieb Christian Ebner: >> Pass in the callback to be executed when a datastore threshold value >> exceeds its set limit. This callback is then callable by the s3 >> client request counters implementation, currently the only component >> defining thresholds. >> >> When a threshold is exceeded, the notification is generated by the >> callback and queued for being send. >> >> Signed-off-by: Christian Ebner >> --- >> pbs-datastore/src/datastore.rs | 36 ++++++++++++++++++++++++---- >> pbs-datastore/src/snapshot_reader.rs | 1 + >> src/tools/mod.rs | 9 ++++++- >> 3 files changed, 41 insertions(+), 5 deletions(-) >> >> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs > [...] >> @@ -710,6 +728,16 @@ impl DataStore { >> let mut request_counters = Self::request_counters(&config, &backend_config)?; >> >> Self::update_notification_thresholds(&mut request_counters, &config)?; >> + request_counters.set_thresholds_exceeded_callback( >> + config.name, >> + Box::new(move |label, counter_id, threshold, value| { >> + thresholds_exceeded_callback.map(|cb| { >> + if let Err(err) = cb(label, counter_id, threshold, value) { >> + log::error!("failed to send notification: {err:?}"); >> + } >> + }); >> + }), >> + ); >> >> (Some(cache), Some(Arc::new(request_counters))) >> } else { > > >> diff --git a/src/tools/mod.rs b/src/tools/mod.rs > [...] >> @@ -192,5 +193,11 @@ pub(crate) fn backup_info_to_snapshot_list_item( >> /// Lookup the datastore by name with given operation. >> #[inline(always)] >> pub fn lookup_with<'a>(name: &'a str, operation: Operation) -> DataStoreLookup<'a> { >> - DataStoreLookup::with(name, operation) >> + DataStoreLookup::with( >> + name, >> + operation, >> + Arc::new(Some( >> + crate::server::notifications::send_datastore_threshold_exceeded, > > But in the s3-client's add_download_traffic we _always_ trigger this threshold, > and this callback _always_ sends a notification. This seems like allowing > for a notification storm? That would be bad and was not the intention here. The intention was for the notification callback to be executed only once when the threshold barrier is being passed, not for all the subsequent traffic exceeding. That is, until the counters are reset or the thresholds adapted. Will double check! > > I do not recall any generic debouncing or rate limit logic in the notification > stack, or am I mistaken? No, I'm also not aware of any such logic being in place > > Because if not, then we really should get some basic rate limiting done here, > like only trigger a notification if the previous one was at least X minutes > ago. Yes, I agree that this should be implemented. As discussed off-list, using the token bucket filter to achieve this might be a good option, will have a look on how to achieve this and send a new version, thanks! > > As on a big (e.g. initial) sync with a underestimated low limit chosen this > could cause millions of notifications - having dealt with some software going > "nuts" w.r.t. mails only a few days ago, I really would like to avoid this, > as it can be more than annoying (e.g., mail server might buckle up on that > amount of mails in the queue). > > But this - if it really does apply and I did not just overlook something - > seems to be the only bigger issue to me. And FWIW, we can also follow up on > this, e.g. I could apply the proxmox parts already. I feel like it make sense to hold off until the next version, in case the threshold check logic needs further adaption.