From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
Date: Thu, 13 Nov 2025 13:59:34 +0100	[thread overview]
Message-ID: <1c91221e-6979-4b38-9011-0b53921d4093@proxmox.com> (raw)
In-Reply-To: <1762936034.bqz00j2ard.astroid@yuna.none>

On 11/12/25 2:24 PM, Fabian Grünbichler wrote:
> On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
>> Repeated /status requests caused lookup_datastore() to re-read and
>> parse datastore.cfg on every call. The issue was mentioned in report
>> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
>> dominated by pbs_config::datastore::config() (config parsing).
>>
>> This patch adds a fast path to lookup_datastore() using the shared-
>> memory ConfigVersionCache generation counter for datastore.cfg. It
>> stores the last seen generation number alongside the cached
>> DataStoreImpl and, when a subsequent lookup sees the same generation,
>> we reuse the cached instance without re-reading the on-disk config. If
>> the generation differs (or the cache is unavailable), it falls back to
>> the existing slow path with no behavioral changes.
>>
>> Behavioral notes
>>
>> - The generation is bumped via the existing save_config() path, so
>>    API-driven config changes are detected immediately.
>> - Manual edits to datastore.cfg are not detected by this commit; a TTL
>>    guard is introduced in a later patch to cover that case.
>> - DataStore::drop still performs a config read on the common path; this
>>    is addressed in the next patch.
>>
>> Testing
>>
>> cargo-flamegraph confirms the config-parse hotspot in
>> lookup_datastore() disappears from the hot path.
>>
>> Additionally, an end-to-end benchmark was performed using the
>> `/status?verbose=0` API with 1000 datastores, 5 requests per store,
>> and 16-way parallelism:
>>
>> | Metric                   | Before | After  | Δ (abs) | Δ (%)   |
>> |--------------------------|:------:|:------:|:-------:|:-------:|
>> | Total time               | 13s    | 11s    | −2s     | −15.38% |
>> | Throughput (all rounds)  | 384.62 | 454.55 | +69.93  | +18.18% |
>> | Cold RPS (round #1)      | 76.92  | 90.91  | +13.99  | +18.19% |
>> | Warm RPS (rounds 2..N)   | 307.69 | 363.64 | +55.95  | +18.18% |
>>
>> Throughput improved by ~18% overall, with total time reduced by ~15%.
>> Warm lookups now reuse cached datastore instances and skip redundant
>> config parsing entirely. The first-access round also shows a similar
>> improvement, likely due to reduced contention and I/O when many stores
>> are accessed in parallel.
>>
>> Note: A second hotspot remains in Drop where a config read occurs; that
>> is addressed in the next commit.
> 
> it would be interesting to also have numbers for just lookup+Drop,
> without all the HTTP and TLS overhead to really isolate it. that should
> also make it easier to reliably benchmark with something like hyperfine
> ;)
> 

Good point, I will isolate the numbers - it seems the TLS overhead is 
quite significant. Thanks for the hyperfine reference!
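
Roughly what I have in mind for the isolated lookup+Drop measurement 
(just a sketch; store name and defaults are placeholders, single 
datastore, immediate Drop after each lookup):

use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;

fn main() -> Result<(), Error> {
    let mut args = std::env::args().skip(1);

    // placeholder defaults, just for illustration
    let name = args.next().unwrap_or_else(|| "pbs_bench_1".to_string());
    let iterations: usize = match args.next() {
        Some(n) => n.parse()?,
        None => 100_000,
    };

    for _ in 0..iterations {
        // lookup followed by an immediate Drop, no HTTP/TLS involved
        let _store = DataStore::lookup_datastore(&name, Some(Operation::Write))?;
    }

    Ok(())
}

A release build of that can then be wrapped with hyperfine for the 
before/after comparison.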

> for my simple config (5 datastores) single-threaded lookup+drop of a
> single datastore 100k times gives around 1.31 speedup for the whole
> series. the slightly modified version from below (which basically runs
> the most expensive part of Drop only once) for the same test setup still
> gives a speedup of 1.17
> 

So the lookup optimization dominates the speedup if we hold on to the 
datastores for longer, great to see.

> re-running the same benchmark with a config with 1k datastores, querying
> M datastores N times gives the following speedups:
> 
> M = 1, N = 1000: 15.6x faster
> M = 10, N = 100: 14.5x
> M = 100, N = 10: 8.8x
> M = 1000, N = 1: 1.8x (so this is basically showing the speedup of the
> improved Drop handling alone!)
> 
> and then a slightly modified version, that keeps the DataStore instance
> around until all N lookups for that datastore are done, then dropping
> them in bulk (which mimics having lots of lookups while a task is
> running, making the Drop handler only do the expensive part every once
> in a while when the last task for a given datastore exits):
> 
> use anyhow::Error;
> 
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
> 
> fn main() -> Result<(), Error> {
>      let mut args = std::env::args();
>      args.next();
> 
>      let datastores = if let Some(n) = args.next() {
>          n.parse::<usize>()?
>      } else {
>          1000
>      };
> 
>      let iterations = if let Some(n) = args.next() {
>          n.parse::<usize>()?
>      } else {
>          1000
>      };
> 
>      for d in 1..=datastores {
>          let name = format!("pbs_bench_{d}");
>          let mut stores = Vec::with_capacity(iterations);
>          for _i in 1..=iterations {
>              stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
>          }
>          // `stores` goes out of scope here, dropping all instances for this datastore in bulk
>      }
> 
>      Ok(())
> }
> 
> M = 1, N = 1000: 8.5x faster
> M = 10, N = 100: 7.9x
> M = 100, N = 10: 5.2x
> M = 1000, N = 1: 1.9x
> 
> looking at the flamegraph of this isolated benchmark it's now obvious
> that the remaining overhead is lockfiles and tracking the operations
> (both in lookup and when dropping)..
> 

Thanks a lot also for setting up the test environment and providing your 
numbers, which are helpful to compare against!

Regarding the remaining overhead from lockfiles and operations tracking: 
that is a good confirmation that everything else on the hot path is now 
optimized - I think the locking and operations tracking could be 
revisited in a future effort!

> side-note: I actually found a bug in our operations tracking while
> benchmarking this series that gave me wildly different numbers because
> the "drop last task" part never got executed as a result:
> 
> https://lore.proxmox.com/pbs-devel/20251112131525.645971-1-f.gruenbichler@proxmox.com/T/#u
> 

Nice catch!

>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>>   pbs-config/src/config_version_cache.rs | 10 +++-
>>   pbs-datastore/src/datastore.rs         | 77 +++++++++++++++++---------
>>   2 files changed, 59 insertions(+), 28 deletions(-)
>>
>> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
>> index e8fb994f..b875f7e0 100644
>> --- a/pbs-config/src/config_version_cache.rs
>> +++ b/pbs-config/src/config_version_cache.rs
>> @@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
>>       // Traffic control (traffic-control.cfg) generation/version.
>>       traffic_control_generation: AtomicUsize,
>>       // datastore (datastore.cfg) generation/version
>> -    // FIXME: remove with PBS 3.0
>>       datastore_generation: AtomicUsize,
>>       // Add further atomics here
>>   }
>> @@ -145,8 +144,15 @@ impl ConfigVersionCache {
>>               .fetch_add(1, Ordering::AcqRel);
>>       }
>>   
>> +    /// Returns the datastore generation number.
>> +    pub fn datastore_generation(&self) -> usize {
>> +        self.shmem
>> +            .data()
>> +            .datastore_generation
>> +            .load(Ordering::Acquire)
>> +    }
>> +
>>       /// Increase the datastore generation number.
>> -    // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
>>       pub fn increase_datastore_generation(&self) -> usize {
>>           self.shmem
>>               .data()
> 
> this part could be split out into its own patch
> 

Will factor this out into a separate patch.

>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 70af94d8..18eebb58 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -32,7 +32,7 @@ use pbs_api_types::{
>>       MaintenanceType, Operation, UPID,
>>   };
>>   use pbs_config::s3::S3_CFG_TYPE_ID;
>> -use pbs_config::BackupLockGuard;
>> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
>>   
>>   use crate::backup_info::{
>>       BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
>> @@ -140,10 +140,10 @@ pub struct DataStoreImpl {
>>       last_gc_status: Mutex<GarbageCollectionStatus>,
>>       verify_new: bool,
>>       chunk_order: ChunkOrder,
>> -    last_digest: Option<[u8; 32]>,
>>       sync_level: DatastoreFSyncLevel,
>>       backend_config: DatastoreBackendConfig,
>>       lru_store_caching: Option<LocalDatastoreLruCache>,
>> +    cached_config_tag: Option<CachedDatastoreConfigTag>,
>>   }
>>   
>>   impl DataStoreImpl {
>> @@ -156,10 +156,10 @@ impl DataStoreImpl {
>>               last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
>>               verify_new: false,
>>               chunk_order: Default::default(),
>> -            last_digest: None,
>>               sync_level: Default::default(),
>>               backend_config: Default::default(),
>>               lru_store_caching: None,
>> +            cached_config_tag: None,
>>           })
>>       }
>>   }
>> @@ -224,6 +224,15 @@ pub enum DatastoreBackend {
>>       S3(Arc<S3Client>),
>>   }
>>   
>> +/// Used to determine whether a cached datastore instance is still valid
>> +/// or needs to be reloaded after a config change.
>> +struct CachedDatastoreConfigTag {
>> +    /// Maintenance mode at the time the lookup hint was captured, if any.
>> +    last_maintenance_mode: Option<MaintenanceMode>,
>> +    /// Datastore generation number from `ConfigVersionCache`; `None` when the cache wasn't available.
>> +    last_generation: Option<usize>,
> 
> if the whole tag is an option, do we really need to make the generation
> an option as well?
> 

Good point, keeping only the tag itself optional is enough and will 
simplify the generation checks.
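
For v2 the tag would then look roughly like this (sketch only, keeping 
the field names from this patch):

/// Captured when the config is (re)loaded on the slow path; absent until then.
struct CachedDatastoreConfigTag {
    /// Maintenance mode at the time the config was loaded, if any.
    last_maintenance_mode: Option<MaintenanceMode>,
    /// Datastore generation number from `ConfigVersionCache`.
    last_generation: usize,
}

and the fast path can compare cached_tag.last_generation == gen_num 
directly instead of matching on a second Option.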

>> +}
>> +
>>   impl DataStore {
>>       // This one just panics on everything
>>       #[doc(hidden)]
>> @@ -299,13 +308,40 @@ impl DataStore {
>>           // we use it to decide whether it is okay to delete the datastore.
>>           let _config_lock = pbs_config::datastore::lock_config()?;
>>   
>> -        // we could use the ConfigVersionCache's generation for staleness detection, but  we load
>> -        // the config anyway -> just use digest, additional benefit: manual changes get detected
>> -        let (config, digest) = pbs_config::datastore::config()?;
>> +        // Get the current datastore.cfg generation number
>> +        let gen_num = ConfigVersionCache::new()
>> +            .ok()
>> +            .map(|c| c.datastore_generation());
>> +
>> +        // Fast-path: if we have a cached entry created under the same datastore.cfg generation number, reuse it.
>> +        if let (Some(gen_num), Some(ds)) =
>> +            (gen_num, DATASTORE_MAP.lock().unwrap().get(name).cloned())
>> +        {
>> +            if let Some(cached_tag) = &ds.cached_config_tag {
>> +                if cached_tag.last_generation == Some(gen_num) {
>> +                    if let Some(mm) = &cached_tag.last_maintenance_mode {
>> +                        if let Err(error) = mm.check(operation) {
>> +                            bail!("datastore '{name}' is unavailable: {error}");
>> +                        }
>> +                    }
>> +                    if let Some(operation) = operation {
>> +                        update_active_operations(name, operation, 1)?;
>> +                    }
>> +                    return Ok(Arc::new(Self {
>> +                        inner: ds,
>> +                        operation,
>> +                    }));
>> +                }
>> +            }
>> +        }
>> +
>> +        // Slow path: (re)load config
>> +        let (config, _digest) = pbs_config::datastore::config()?;
>>           let config: DataStoreConfig = config.lookup("datastore", name)?;
>>   
>> -        if let Some(maintenance_mode) = config.get_maintenance_mode() {
>> -            if let Err(error) = maintenance_mode.check(operation) {
>> +        let maintenance_mode = config.get_maintenance_mode();
>> +        if let Some(mm) = &maintenance_mode {
>> +            if let Err(error) = mm.check(operation) {
>>                   bail!("datastore '{name}' is unavailable: {error}");
>>               }
>>           }
> 
> after this here we have a check for the mount status in the old hot
> path, that is missing in the new hot path. the mount status can change
> even if the config doesn't, so we should probably add this back to the
> hot path and re-run the numbers?
> 
> that check again needs more parts of the config, so maybe we could
> explore caching the full config here? e.g., add a new static
> Mutex<Option<(DataStoreConfig, usize)>> (extended by the timestamp in
> the last patch) and adapt the patches here to use it? depending on
> whether we make the cached config available outside of lookup_datastore,
> we could then even not add the maintenance mode to the cache tag, and
> just store the generation number there, and retrieve the maintenance
> mode from the cached config in the Drop implementation..
> 

For the mount check we would need the device_uuid and the datastore 
mount path, which we could add to the cached entry. However, I would 
rather explore caching the full global config here, as you suggest. 
That should bring further performance gains and would also allow fast 
lookups of any other datastore from the same cached config. I will try 
to integrate it in v2.
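
A rough sketch of the shape I have in mind (names and locking details 
are placeholders, and it assumes the parsed SectionConfigData is cheap 
enough to clone; the TTL/timestamp from the last patch would be added 
to the cached tuple as well):

use std::sync::{LazyLock, Mutex};

use anyhow::Error;
use proxmox_section_config::SectionConfigData;

/// Cached, parsed datastore.cfg together with the generation it was loaded under.
static CACHED_DATASTORE_CONFIG: LazyLock<Mutex<Option<(SectionConfigData, usize)>>> =
    LazyLock::new(|| Mutex::new(None));

/// Return the cached config if it matches the current generation, otherwise reload it.
fn cached_datastore_config(current_generation: usize) -> Result<SectionConfigData, Error> {
    let mut guard = CACHED_DATASTORE_CONFIG.lock().unwrap();
    if let Some((config, cached_generation)) = guard.as_ref() {
        if *cached_generation == current_generation {
            return Ok(config.clone());
        }
    }
    // cache miss or stale generation: fall back to reading datastore.cfg
    let (config, _digest) = pbs_config::datastore::config()?;
    *guard = Some((config.clone(), current_generation));
    Ok(config)
}

That way lookup_datastore() and the Drop handler could both get at the 
maintenance mode (and the mount-check fields) from the cached config 
instead of carrying them in the per-datastore tag.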

> there's a little more duplication here, e.g. we lock the map and check
> for an entry in both the fast and slow paths, we could do it once up
> front (nobody can change the map while we have it locked anyway), the
> checks are written twice and could probably be extracted into a helper
> so that future similar checks are added to both paths and not to only
> one by accident, ..
> 

Good point, I will factor it out (e.g. into a small helper or closure 
shared by both paths).
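
e.g. something along these lines (just a sketch, the helper name is 
made up):

/// Checks shared by the fast and the slow lookup path: maintenance mode first,
/// then operation tracking. Fails if the datastore is currently unavailable.
fn check_maintenance_and_track(
    name: &str,
    maintenance_mode: Option<&MaintenanceMode>,
    operation: Option<Operation>,
) -> Result<(), Error> {
    if let Some(mm) = maintenance_mode {
        if let Err(error) = mm.check(operation) {
            bail!("datastore '{name}' is unavailable: {error}");
        }
    }
    if let Some(operation) = operation {
        update_active_operations(name, operation, 1)?;
    }
    Ok(())
}

Both paths would call this right before constructing the returned 
DataStore, so future checks only need to be added in one place.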

>> @@ -321,16 +357,6 @@ impl DataStore {
>>   
>>           // reuse chunk store so that we keep using the same process locker instance!
>>           let chunk_store = if let Some(datastore) = &entry {
>> -            let last_digest = datastore.last_digest.as_ref();
>> -            if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
>> -                if let Some(operation) = operation {
>> -                    update_active_operations(name, operation, 1)?;
>> -                }
>> -                return Ok(Arc::new(Self {
>> -                    inner: Arc::clone(datastore),
>> -                    operation,
>> -                }));
>> -            }
>>               Arc::clone(&datastore.chunk_store)
>>           } else {
>>               let tuning: DatastoreTuning = serde_json::from_value(
>> @@ -344,7 +370,11 @@ impl DataStore {
>>               )?)
>>           };
>>   
>> -        let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
>> +        let mut datastore = DataStore::with_store_and_config(chunk_store, config)?;
>> +        datastore.cached_config_tag = Some(CachedDatastoreConfigTag {
>> +            last_maintenance_mode: maintenance_mode,
>> +            last_generation: gen_num,
>> +        });
> 
> this part should be in with_store_and_config, which should get the
> last_generation (and later last_update) as parameter(s), just like it
> had the digest before this patch..
> 

Agreed, I will move this part into with_store_and_config and pass the 
generation along as a parameter.
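
e.g. via a small constructor that with_store_and_config calls once it 
has the loaded config and the generation it was read under (sketch 
only, assuming the non-optional generation field from above):

impl CachedDatastoreConfigTag {
    /// Build the lookup hint from the freshly loaded config and its generation number.
    fn new(config: &DataStoreConfig, generation: usize) -> Self {
        Self {
            last_maintenance_mode: config.get_maintenance_mode(),
            last_generation: generation,
        }
    }
}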

>>   
>>           let datastore = Arc::new(datastore);
>>           datastore_cache.insert(name.to_string(), datastore.clone());
>> @@ -430,11 +460,7 @@ impl DataStore {
>>               config.absolute_path(),
>>               tuning.sync_level.unwrap_or_default(),
>>           )?;
>> -        let inner = Arc::new(Self::with_store_and_config(
>> -            Arc::new(chunk_store),
>> -            config,
>> -            None,
>> -        )?);
>> +        let inner = Arc::new(Self::with_store_and_config(Arc::new(chunk_store), config)?);
>>   
>>           if let Some(operation) = operation {
>>               update_active_operations(&name, operation, 1)?;
>> @@ -446,7 +472,6 @@ impl DataStore {
>>       fn with_store_and_config(
>>           chunk_store: Arc<ChunkStore>,
>>           config: DataStoreConfig,
>> -        last_digest: Option<[u8; 32]>,
>>       ) -> Result<DataStoreImpl, Error> {
>>           let mut gc_status_path = chunk_store.base_path();
>>           gc_status_path.push(".gc-status");
>> @@ -506,10 +531,10 @@ impl DataStore {
>>               last_gc_status: Mutex::new(gc_status),
>>               verify_new: config.verify_new.unwrap_or(false),
>>               chunk_order: tuning.chunk_order.unwrap_or_default(),
>> -            last_digest,
>>               sync_level: tuning.sync_level.unwrap_or_default(),
>>               backend_config,
>>               lru_store_caching,
>> +            cached_config_tag: None,
>>           })
>>       }
>>   
>> -- 
>> 2.47.3


