From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 1 Apr 2026 13:13:50 +0200
Subject: Re: [PATCH proxmox-backup v6 18/22] datastore: add thresholds notification callback on datastore lookup
To: Thomas Lamprecht, pbs-devel@lists.proxmox.com
From: Christian Ebner
References: <20260319094100.240765-1-c.ebner@proxmox.com> <20260319094100.240765-31-c.ebner@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
List-Id: Proxmox Backup Server development discussion

On 3/30/26 10:18 PM, Thomas Lamprecht wrote:
> On 19.03.26 at 10:41, Christian Ebner wrote:
>> Pass in the callback to be executed when a datastore threshold value
>> exceeds its set limit. This callback is then callable by the s3
>> client request counters implementation, currently the only component
>> defining thresholds.
>>
>> When a threshold is exceeded, the notification is generated by the
>> callback and queued to be sent.
>>
>> Signed-off-by: Christian Ebner
>> ---
>>  pbs-datastore/src/datastore.rs       | 36 ++++++++++++++++++++++++----
>>  pbs-datastore/src/snapshot_reader.rs |  1 +
>>  src/tools/mod.rs                     |  9 ++++++-
>>  3 files changed, 41 insertions(+), 5 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> [...]
>> @@ -710,6 +728,16 @@ impl DataStore {
>>          let mut request_counters = Self::request_counters(&config, &backend_config)?;
>>
>>          Self::update_notification_thresholds(&mut request_counters, &config)?;
>> +        request_counters.set_thresholds_exceeded_callback(
>> +            config.name,
>> +            Box::new(move |label, counter_id, threshold, value| {
>> +                thresholds_exceeded_callback.map(|cb| {
>> +                    if let Err(err) = cb(label, counter_id, threshold, value) {
>> +                        log::error!("failed to send notification: {err:?}");
>> +                    }
>> +                });
>> +            }),
>> +        );
>>
>>          (Some(cache), Some(Arc::new(request_counters)))
>>      } else {
>
>
>> diff --git a/src/tools/mod.rs b/src/tools/mod.rs
> [...]
>> @@ -192,5 +193,11 @@ pub(crate) fn backup_info_to_snapshot_list_item(
>>  /// Lookup the datastore by name with given operation.
>>  #[inline(always)]
>>  pub fn lookup_with<'a>(name: &'a str, operation: Operation) -> DataStoreLookup<'a> {
>> -    DataStoreLookup::with(name, operation)
>> +    DataStoreLookup::with(
>> +        name,
>> +        operation,
>> +        Arc::new(Some(
>> +            crate::server::notifications::send_datastore_threshold_exceeded,
>
> But in the s3-client's add_download_traffic we _always_ trigger this threshold,
> and this callback _always_ sends a notification. This seems like allowing for a
> notification storm?
>
> I do not recall any generic debouncing or rate limit logic in the notification
> stack, or am I mistaken?
>
> Because if not, then we really should get some basic rate limiting done here,
> like only triggering a notification if the previous one was at least X minutes
> ago.
>
> As on a big (e.g. initial) sync with an underestimated low limit chosen, this
> could cause millions of notifications - having dealt with some software going
> "nuts" w.r.t. mails only a few days ago, I really would like to avoid this, as
> it can be more than annoying (e.g., the mail server might buckle under that
> amount of mail in the queue).
>
> But this - if it really does apply and I did not just overlook something -
> seems to be the only bigger issue to me. And FWIW, we can also follow up on
> this, e.g. I could apply the proxmox parts already.

So I double checked now with some more context:

This should not be problematic: SharedRequestCounters::add_download_traffic()
and SharedRequestCounters::add_upload_traffic() will send the notification only
*once*, at the moment the traffic volume crosses the configured threshold,
since it is checked that the previous counter state was at or below the
threshold while the new counter value is above it. And since these counters are
atomic and shared, only one client instance will trigger the notification
callback.

```
pub fn add_download_traffic(&self, cb_label: &str, count: u64, ordering: Ordering) -> u64 {
    let prev = self.download.0.fetch_add(count, ordering);
    let threshold = self.download_threshold.0.load(Ordering::Acquire);
    let downloaded = prev + count;
    // Fires only on the crossing event: previous value at or below the
    // threshold, new value strictly above it.
    if threshold > 0 && downloaded > threshold && prev <= threshold {
        let guard = SHARED_COUNTER_THRESHOLD_EXCEEDED_CALLBACK.read().unwrap();
        if let Some(callback) = guard.as_ref() {
            callback(cb_label, "downloaded", threshold, downloaded);
        }
    }
    prev
}
```

The same holds for the request counter thresholds: there, too, the notification
callback is triggered only once, when the new value exceeds the threshold while
the previous value was still at or below it.

The only problematic case is a configuration where the counters are reset on a
high-frequency schedule and the threshold values are low enough to be reached
before the next reset. Given this, I think it should be enough to add simple
time-based debouncing logic at each callback call site, with a sensible
hard-coded timeout of maybe an hour, and to log debounced attempts.

What do you think of that?