all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
Date: Fri, 28 Nov 2025 10:03:27 +0100	[thread overview]
Message-ID: <a5867e40-676c-4c72-8942-05d8576128e1@proxmox.com> (raw)
In-Reply-To: <1764167584.qinchnekm5.astroid@yuna.none>

On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> The Drop impl of DataStore re-read datastore.cfg to decide whether
>> the entry should be evicted from the in-process cache (based on
>> maintenance mode’s clear_from_cache). During the investigation of
>> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
>> accounted for a measurable share of CPU time under load.
>>
>> This patch wires the datastore config fast path to the Drop
>> impl to eventually avoid an expensive config reload from disk to capture
>> the maintenance mandate. Also, to ensure the Drop handlers will detect
>> that a newer config exists / to mitigate usage of an eventually stale
>> cached entry, generation will not only be bumped on config save, but also
>> on re-read of the config file (slow path), if `update_cache = true`.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes:
>>
>>  From v1 → v2
>> - Replace caching logic with the datastore_section_config_cached()
>>    helper.
>>
>>  From v2 → v3
>> No changes
>>
>>  From v3 → v4, thanks @Fabian
>> - Pass datastore_section_config_cached(false) in Drop to avoid
>>    concurrent cache updates.
>>
>>  From v4 → v5
>> - Rebased only, no changes
>>
>>   pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
>>   1 file changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index c9cb5d65..7638a899 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -225,15 +225,40 @@ impl Drop for DataStore {
>>               // remove datastore from cache iff
>>               //  - last task finished, and
>>               //  - datastore is in a maintenance mode that mandates it
>> -            let remove_from_cache = last_task
>> -                && pbs_config::datastore::config()
>> -                    .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
>> -                    .is_ok_and(|c| {
>> -                        c.get_maintenance_mode()
>> -                            .is_some_and(|m| m.clear_from_cache())
>> -                    });
> 
> old code here ignored parsing/locking/.. issues and just assumed if no
> config can be obtained nothing should be done..
> 
>> -
>> -            if remove_from_cache {
>> +
>> +            // first check: check if last task finished
>> +            if !last_task {
>> +                return;
>> +            }
>> +
>> +            let (section_config, _gen) = match datastore_section_config_cached(false) {
>> +                Ok(v) => v,
>> +                Err(err) => {
>> +                    log::error!(
>> +                        "failed to load datastore config in Drop for {} - {err}",
>> +                        self.name()
>> +                    );
>> +                    return;
>> +                }
>> +            };
>> +
>> +            let datastore_cfg: DataStoreConfig =
>> +                match section_config.lookup("datastore", self.name()) {
>> +                    Ok(cfg) => cfg,
>> +                    Err(err) => {
>> +                        log::error!(
>> +                            "failed to look up datastore '{}' in Drop - {err}",
>> +                            self.name()
>> +                        );
>> +                        return;
> 
> here we now have fancy error logging ;) which can be fine, but if we go
> from silently ignoring errors to logging them at error level that should
> be mentioned to make it clear that it is intentional.
> 

Makes sense, will mention that change in the commit message.

> besides that, the second error here means that the datastore was removed
> from the config in the meantime.. in which case we should probably
> remove it from the map as well, if is still there, even though we can't
> check the maintenance mode..
> 
>> +                    }
>> +                };
>> +
>> +            // second check: check maintenance mode mandate
> 
> what is a "maintenance mode mandate"? ;)
> 
> keeping it simple, why not just
> 
> // check if maintenance mode requires closing FDs
>

I see, will rephrase this, thanks!

>> +            if datastore_cfg
>> +                .get_maintenance_mode()
>> +                .is_some_and(|m| m.clear_from_cache())
>> +            {
>>                   DATASTORE_MAP.lock().unwrap().remove(self.name());
>>               }
>>           }
>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>>   /// - If the cached generation matches the current generation, the
>>   ///   cached config is returned.
>>   /// - Otherwise the config is re-read from disk. If `update_cache` is
>> -///   `true`, the new config and current generation are stored in the
>> +///   `true`, the new config and bumped generation are stored in the
>>   ///   cache. Callers that set `update_cache = true` must hold the
>>   ///   datastore config lock to avoid racing with concurrent config
>>   ///   changes.
>>   /// - If `update_cache` is `false`, the freshly read config is returned
>> -///   but the cache is left unchanged.
>> +///   but the cache and generation are left unchanged.
>>   ///
>>   /// If `ConfigVersionCache` is not available, the config is always read
>>   /// from disk and `None` is returned as the generation.
>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
> 
> does this part here make any sense in this patch?
> 
> we don't check the generation in the Drop handler anyway, so it will get
> the latest cached version, no matter what?
> 

we don't check the generation in the Drop handler, but the drop handler
depends on this to potentially get a most fresh cached version?

> we'd only end up in this part of the code via lookup_datastore, and only
> if:
> - the previous cached entry and the current one have a different
>    generation -> no need to bump again, the cache is already invalidated
> - there is no previous cached entry -> nothing to invalidate
> 
> I think this part should move to the next patch..

Shouldn't it be rather in PATCH 2 then, instead part of the TTL feature
Also I would adjust the comment below then, so that it doesn't
necessarily just benefit the drop handler that calls
datastore_section_config_cached(false) but would in general future uses
of datastore_section_config_cached(false)?

> 
>>           let (config_raw, _digest) = pbs_config::datastore::config()?;
>>           let config = Arc::new(config_raw);
>>   
>> +        let mut effective_gen = current_gen;
>>           if update_cache {
>> +            // Bump the generation. This ensures that Drop
>> +            // handlers will detect that a newer config exists
>> +            // and will not rely on a stale cached entry for
>> +            // maintenance mandate.
>> +            let prev_gen = version_cache.increase_datastore_generation();
>> +            effective_gen = prev_gen + 1;
>> +
>> +            // Persist
>>               *config_cache = Some(DatastoreConfigCache {
>>                   config: config.clone(),
>> -                last_generation: current_gen,
>> +                last_generation: effective_gen,
>>               });
>>           }
>>   
>> -        Ok((config, Some(current_gen)))
>> +        Ok((config, Some(effective_gen)))
>>       } else {
>>           // Fallback path, no config version cache: read datastore.cfg and return None as generation
>>           *config_cache = None;
>> -- 
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

  reply	other threads:[~2025-11-28  9:03 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 17:21     ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-28  9:03     ` Samuel Rufinatscha [this message]
2025-11-28 10:46       ` Fabian Grünbichler
2025-11-28 11:10         ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-26 16:10   ` Samuel Rufinatscha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a5867e40-676c-4c72-8942-05d8576128e1@proxmox.com \
    --to=s.rufinatscha@proxmox.com \
    --cc=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal