public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
Date: Fri, 28 Nov 2025 10:03:27 +0100	[thread overview]
Message-ID: <a5867e40-676c-4c72-8942-05d8576128e1@proxmox.com> (raw)
In-Reply-To: <1764167584.qinchnekm5.astroid@yuna.none>

On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> The Drop impl of DataStore re-read datastore.cfg to decide whether
>> the entry should be evicted from the in-process cache (based on
>> maintenance mode’s clear_from_cache). During the investigation of
>> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
>> accounted for a measurable share of CPU time under load.
>>
>> This patch wires the datastore config fast path to the Drop
>> impl to eventually avoid an expensive config reload from disk to capture
>> the maintenance mandate. Also, to ensure the Drop handlers will detect
>> that a newer config exists / to mitigate usage of an eventually stale
>> cached entry, generation will not only be bumped on config save, but also
>> on re-read of the config file (slow path), if `update_cache = true`.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes:
>>
>>  From v1 → v2
>> - Replace caching logic with the datastore_section_config_cached()
>>    helper.
>>
>>  From v2 → v3
>> No changes
>>
>>  From v3 → v4, thanks @Fabian
>> - Pass datastore_section_config_cached(false) in Drop to avoid
>>    concurrent cache updates.
>>
>>  From v4 → v5
>> - Rebased only, no changes
>>
>>   pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
>>   1 file changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index c9cb5d65..7638a899 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -225,15 +225,40 @@ impl Drop for DataStore {
>>               // remove datastore from cache iff
>>               //  - last task finished, and
>>               //  - datastore is in a maintenance mode that mandates it
>> -            let remove_from_cache = last_task
>> -                && pbs_config::datastore::config()
>> -                    .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
>> -                    .is_ok_and(|c| {
>> -                        c.get_maintenance_mode()
>> -                            .is_some_and(|m| m.clear_from_cache())
>> -                    });
> 
> old code here ignored parsing/locking/.. issues and just assumed if no
> config can be obtained nothing should be done..
> 
>> -
>> -            if remove_from_cache {
>> +
>> +            // first check: check if last task finished
>> +            if !last_task {
>> +                return;
>> +            }
>> +
>> +            let (section_config, _gen) = match datastore_section_config_cached(false) {
>> +                Ok(v) => v,
>> +                Err(err) => {
>> +                    log::error!(
>> +                        "failed to load datastore config in Drop for {} - {err}",
>> +                        self.name()
>> +                    );
>> +                    return;
>> +                }
>> +            };
>> +
>> +            let datastore_cfg: DataStoreConfig =
>> +                match section_config.lookup("datastore", self.name()) {
>> +                    Ok(cfg) => cfg,
>> +                    Err(err) => {
>> +                        log::error!(
>> +                            "failed to look up datastore '{}' in Drop - {err}",
>> +                            self.name()
>> +                        );
>> +                        return;
> 
> here we now have fancy error logging ;) which can be fine, but if we go
> from silently ignoring errors to logging them at error level that should
> be mentioned to make it clear that it is intentional.
> 

Makes sense, will mention that change in the commit message.

> besides that, the second error here means that the datastore was removed
> from the config in the meantime.. in which case we should probably
> remove it from the map as well, if is still there, even though we can't
> check the maintenance mode..
> 
>> +                    }
>> +                };
>> +
>> +            // second check: check maintenance mode mandate
> 
> what is a "maintenance mode mandate"? ;)
> 
> keeping it simple, why not just
> 
> // check if maintenance mode requires closing FDs
>

I see, will rephrase this, thanks!

>> +            if datastore_cfg
>> +                .get_maintenance_mode()
>> +                .is_some_and(|m| m.clear_from_cache())
>> +            {
>>                   DATASTORE_MAP.lock().unwrap().remove(self.name());
>>               }
>>           }
>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>>   /// - If the cached generation matches the current generation, the
>>   ///   cached config is returned.
>>   /// - Otherwise the config is re-read from disk. If `update_cache` is
>> -///   `true`, the new config and current generation are stored in the
>> +///   `true`, the new config and bumped generation are stored in the
>>   ///   cache. Callers that set `update_cache = true` must hold the
>>   ///   datastore config lock to avoid racing with concurrent config
>>   ///   changes.
>>   /// - If `update_cache` is `false`, the freshly read config is returned
>> -///   but the cache is left unchanged.
>> +///   but the cache and generation are left unchanged.
>>   ///
>>   /// If `ConfigVersionCache` is not available, the config is always read
>>   /// from disk and `None` is returned as the generation.
>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
> 
> does this part here make any sense in this patch?
> 
> we don't check the generation in the Drop handler anyway, so it will get
> the latest cached version, no matter what?
> 

we don't check the generation in the Drop handler, but the drop handler
depends on this to potentially get a most fresh cached version?

> we'd only end up in this part of the code via lookup_datastore, and only
> if:
> - the previous cached entry and the current one have a different
>    generation -> no need to bump again, the cache is already invalidated
> - there is no previous cached entry -> nothing to invalidate
> 
> I think this part should move to the next patch..

Shouldn't it be rather in PATCH 2 then, instead part of the TTL feature
Also I would adjust the comment below then, so that it doesn't
necessarily just benefit the drop handler that calls
datastore_section_config_cached(false) but would in general future uses
of datastore_section_config_cached(false)?

> 
>>           let (config_raw, _digest) = pbs_config::datastore::config()?;
>>           let config = Arc::new(config_raw);
>>   
>> +        let mut effective_gen = current_gen;
>>           if update_cache {
>> +            // Bump the generation. This ensures that Drop
>> +            // handlers will detect that a newer config exists
>> +            // and will not rely on a stale cached entry for
>> +            // maintenance mandate.
>> +            let prev_gen = version_cache.increase_datastore_generation();
>> +            effective_gen = prev_gen + 1;
>> +
>> +            // Persist
>>               *config_cache = Some(DatastoreConfigCache {
>>                   config: config.clone(),
>> -                last_generation: current_gen,
>> +                last_generation: effective_gen,
>>               });
>>           }
>>   
>> -        Ok((config, Some(current_gen)))
>> +        Ok((config, Some(effective_gen)))
>>       } else {
>>           // Fallback path, no config version cache: read datastore.cfg and return None as generation
>>           *config_cache = None;
>> -- 
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

  reply	other threads:[~2025-11-28  9:03 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 17:21     ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-28  9:03     ` Samuel Rufinatscha [this message]
2025-11-28 10:46       ` Fabian Grünbichler
2025-11-28 11:10         ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-26 16:10   ` Samuel Rufinatscha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a5867e40-676c-4c72-8942-05d8576128e1@proxmox.com \
    --to=s.rufinatscha@proxmox.com \
    --cc=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal