From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 1 Apr 2026 13:13:50 +0200
Subject: Re: [PATCH proxmox-backup v6 18/22] datastore: add thresholds notification callback on datastore lookup
To: Thomas Lamprecht, pbs-devel@lists.proxmox.com
From: Christian Ebner
References: <20260319094100.240765-1-c.ebner@proxmox.com> <20260319094100.240765-31-c.ebner@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
List-Id: Proxmox Backup Server development discussion

On 3/30/26 10:18 PM, Thomas Lamprecht wrote:
> On 19.03.26 at 10:41, Christian Ebner wrote:
>> Pass in the callback to be executed when a datastore threshold value
>> exceeds its set limit. This callback is then callable by the s3
>> client request counters implementation, currently the only component
>> defining thresholds.
>>
>> When a threshold is exceeded, the notification is generated by the
>> callback and queued to be sent.
>>
>> Signed-off-by: Christian Ebner
>> ---
>>  pbs-datastore/src/datastore.rs       | 36 ++++++++++++++++++++++++----
>>  pbs-datastore/src/snapshot_reader.rs |  1 +
>>  src/tools/mod.rs                     |  9 ++++++-
>>  3 files changed, 41 insertions(+), 5 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> [...]
>> @@ -710,6 +728,16 @@ impl DataStore {
>>          let mut request_counters = Self::request_counters(&config, &backend_config)?;
>>
>>          Self::update_notification_thresholds(&mut request_counters, &config)?;
>> +        request_counters.set_thresholds_exceeded_callback(
>> +            config.name,
>> +            Box::new(move |label, counter_id, threshold, value| {
>> +                thresholds_exceeded_callback.map(|cb| {
>> +                    if let Err(err) = cb(label, counter_id, threshold, value) {
>> +                        log::error!("failed to send notification: {err:?}");
>> +                    }
>> +                });
>> +            }),
>> +        );
>>
>>          (Some(cache), Some(Arc::new(request_counters)))
>>      } else {
>
>
>> diff --git a/src/tools/mod.rs b/src/tools/mod.rs
> [...]
>> @@ -192,5 +193,11 @@ pub(crate) fn backup_info_to_snapshot_list_item(
>>  /// Lookup the datastore by name with given operation.
>>  #[inline(always)]
>>  pub fn lookup_with<'a>(name: &'a str, operation: Operation) -> DataStoreLookup<'a> {
>> -    DataStoreLookup::with(name, operation)
>> +    DataStoreLookup::with(
>> +        name,
>> +        operation,
>> +        Arc::new(Some(
>> +            crate::server::notifications::send_datastore_threshold_exceeded,
>
> But in the s3-client's add_download_traffic we _always_ trigger this threshold,
> and this callback _always_ sends a notification. This seems like allowing for a
> notification storm?
>
> I do not recall any generic debouncing or rate limit logic in the notification
> stack, or am I mistaken?
>
> Because if not, then we really should get some basic rate limiting done here,
> like only triggering a notification if the previous one was at least X minutes
> ago.
>
> As on a big (e.g. initial) sync with an underestimated low limit chosen, this
> could cause millions of notifications - having dealt with some software going
> "nuts" w.r.t. mails only a few days ago, I really would like to avoid this, as
> it can be more than annoying (e.g., the mail server might buckle under that
> amount of mail in the queue).
>
> But this - if it really does apply and I did not just overlook something -
> seems to be the only bigger issue to me. And FWIW, we can also follow up on
> this, e.g. I could apply the proxmox parts already.

So I double checked now with some more context:

This should not be problematic: SharedRequestCounters::add_download_traffic()
and SharedRequestCounters::add_upload_traffic() will send the notification only
*once*, at the moment the traffic volume crosses the configured threshold,
since it is checked that the previous counter state was at or below the
threshold while the new counter value is above it. And since these counters are
atomic and shared, only one client instance will trigger the notification
callback.

```
pub fn add_download_traffic(&self, cb_label: &str, count: u64, ordering: Ordering) -> u64 {
    let prev = self.download.0.fetch_add(count, ordering);
    let threshold = self.download_threshold.0.load(Ordering::Acquire);
    let downloaded = prev + count;
    // Fires only on the crossing event: previous value at or below the
    // threshold, new value strictly above it.
    if threshold > 0 && downloaded > threshold && prev <= threshold {
        let guard = SHARED_COUNTER_THRESHOLD_EXCEEDED_CALLBACK.read().unwrap();
        if let Some(callback) = guard.as_ref() {
            callback(cb_label, "downloaded", threshold, downloaded);
        }
    }
    prev
}
```

The same holds for the request counter thresholds: there, too, the notification
callback is triggered only once, when the new value exceeds the threshold while
the previous value was still at or below it.

The only problematic case is a configuration where the counters are reset on a
high-frequency schedule and the threshold values are low enough to be reached
before the next reset. Given this, I think it should be enough to add simple
time-based debouncing logic at each callback call site, with a sensible
hard-coded timeout of maybe an hour, and to log debounced attempts.

What do you think of that?