From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup v6 29/37] datastore: add local datastore cache for network attached storages
Date: Tue,  8 Jul 2025 19:01:06 +0200
Message-ID: <20250708170114.1556057-39-c.ebner@proxmox.com>
In-Reply-To: <20250708170114.1556057-1-c.ebner@proxmox.com>

Use a local datastore as a cache with an LRU replacement policy for
operations on a datastore backed by network-attached storage, e.g. an
S3 object store backend. The goal is to reduce the number of requests
to the backend and thereby save costs (monetary as well as time).

The cacher fetches items from the backend on cache misses when they
are requested via the access method.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs                |  53 +++++-
 pbs-datastore/src/lib.rs                      |   3 +
 .../src/local_datastore_lru_cache.rs          | 172 ++++++++++++++++++
 3 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 pbs-datastore/src/local_datastore_lru_cache.rs
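
Note for readers: the LRU replacement policy named in the commit
message can be sketched as below. This is only a minimal illustration
under stated assumptions, not the AsyncLruCache from pbs-tools: the
KeyLru type, its methods and the String error type are hypothetical,
and lookups are O(n) for brevity. It tracks keys only (the value type
in the patch is `()`) and hands each evicted key to a callback, which
is where the patch truncates the chunk file.

```rust
use std::collections::VecDeque;

/// Hypothetical key-only LRU index (illustration, not the pbs-tools type).
struct KeyLru {
    capacity: usize,
    // Front = most recently used, back = least recently used.
    order: VecDeque<[u8; 32]>,
}

impl KeyLru {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            order: VecDeque::new(),
        }
    }

    fn contains(&self, key: &[u8; 32]) -> bool {
        self.order.contains(key)
    }

    /// Mark `key` as most recently used and call `on_evict` for every key
    /// that no longer fits into the configured capacity.
    fn insert<F>(&mut self, key: [u8; 32], mut on_evict: F) -> Result<(), String>
    where
        F: FnMut([u8; 32]) -> Result<(), String>,
    {
        // Drop a previous entry for this key so it is not tracked twice.
        self.order.retain(|k| k != &key);
        self.order.push_front(key);
        while self.order.len() > self.capacity {
            if let Some(evicted) = self.order.pop_back() {
                on_evict(evicted)?;
            }
        }
        Ok(())
    }
}
```

With a capacity of 2, inserting three distinct digests passes the
first one to the callback, while re-inserting an existing digest only
moves it to the front.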

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index c1ba2dcea..ef146e84a 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -40,9 +40,10 @@ use crate::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
 use crate::fixed_index::{FixedIndexReader, FixedIndexWriter};
 use crate::hierarchy::{ListGroups, ListGroupsType, ListNamespaces, ListNamespacesRecursive};
 use crate::index::IndexFile;
+use crate::local_datastore_lru_cache::S3Cacher;
 use crate::s3::S3_CONTENT_PREFIX;
 use crate::task_tracking::{self, update_active_operations};
-use crate::DataBlob;
+use crate::{DataBlob, LocalDatastoreLruCache};
 
 static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
     LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -136,6 +137,7 @@ pub struct DataStoreImpl {
     last_digest: Option<[u8; 32]>,
     sync_level: DatastoreFSyncLevel,
     backend_config: DatastoreBackendConfig,
+    lru_store_caching: Option<LocalDatastoreLruCache>,
 }
 
 impl DataStoreImpl {
@@ -151,6 +153,7 @@ impl DataStoreImpl {
             last_digest: None,
             sync_level: Default::default(),
             backend_config: Default::default(),
+            lru_store_caching: None,
         })
     }
 }
@@ -255,6 +258,37 @@ impl DataStore {
         Ok(backend_type)
     }
 
+    pub fn cache(&self) -> Option<&LocalDatastoreLruCache> {
+        self.inner.lru_store_caching.as_ref()
+    }
+
+    /// Check if the digest is present in the local datastore cache.
+    /// Always returns false if there is no cache configured for this datastore.
+    pub fn cache_contains(&self, digest: &[u8; 32]) -> bool {
+        if let Some(cache) = self.inner.lru_store_caching.as_ref() {
+            return cache.contains(digest);
+        }
+        false
+    }
+
+    /// Insert the digest as the most recently used entry in the cache.
+    /// Returns with success if there is no cache configured for this datastore.
+    pub fn cache_insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
+        if let Some(cache) = self.inner.lru_store_caching.as_ref() {
+            return cache.insert(digest, chunk);
+        }
+        Ok(())
+    }
+
+    pub fn cacher(&self) -> Result<Option<S3Cacher>, Error> {
+        self.backend().map(|backend| match backend {
+            DatastoreBackend::S3(s3_client) => {
+                Some(S3Cacher::new(s3_client, self.inner.chunk_store.clone()))
+            }
+            DatastoreBackend::Filesystem => None,
+        })
+    }
+
     pub fn lookup_datastore(
         name: &str,
         operation: Option<Operation>,
@@ -437,6 +471,16 @@ impl DataStore {
                 .parse_property_string(config.backend.as_deref().unwrap_or(""))?,
         )?;
 
+        const LOCAL_DATASTORE_CACHE_SIZE: usize = 10_000_000;
+        let lru_store_caching = if DatastoreBackendType::S3 == backend_config.ty.unwrap_or_default()
+        {
+            let cache =
+                LocalDatastoreLruCache::new(LOCAL_DATASTORE_CACHE_SIZE, chunk_store.clone());
+            Some(cache)
+        } else {
+            None
+        };
+
         Ok(DataStoreImpl {
             chunk_store,
             gc_mutex: Mutex::new(()),
@@ -446,6 +490,7 @@ impl DataStore {
             last_digest,
             sync_level: tuning.sync_level.unwrap_or_default(),
             backend_config,
+            lru_store_caching,
         })
     }
 
@@ -1579,6 +1624,12 @@ impl DataStore {
                         chunk_count += 1;
 
                         if atime < min_atime {
+                            if let Some(cache) = self.cache() {
+                                let mut digest_bytes = [0u8; 32];
+                                hex::decode_to_slice(digest.as_bytes(), &mut digest_bytes)?;
+                                // ignore errors, phase 3 will retry cleanup anyway
+                                let _ = cache.remove(&digest_bytes);
+                            }
                             delete_list.push(content.key);
                             if bad {
                                 gc_status.removed_bad += 1;
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index ca6fdb7d8..b9eb035c2 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -217,3 +217,6 @@ pub use snapshot_reader::SnapshotReader;
 
 mod local_chunk_reader;
 pub use local_chunk_reader::LocalChunkReader;
+
+mod local_datastore_lru_cache;
+pub use local_datastore_lru_cache::LocalDatastoreLruCache;
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
new file mode 100644
index 000000000..bb64c52f3
--- /dev/null
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -0,0 +1,172 @@
+//! Use a local datastore as cache for operations on a datastore attached via
+//! a network layer (e.g. via the S3 backend).
+
+use std::future::Future;
+use std::sync::Arc;
+
+use anyhow::{bail, Error};
+use http_body_util::BodyExt;
+
+use pbs_tools::async_lru_cache::{AsyncCacher, AsyncLruCache};
+use proxmox_s3_client::S3Client;
+
+use crate::ChunkStore;
+use crate::DataBlob;
+
+#[derive(Clone)]
+pub struct S3Cacher {
+    client: Arc<S3Client>,
+    store: Arc<ChunkStore>,
+}
+
+impl AsyncCacher<[u8; 32], ()> for S3Cacher {
+    fn fetch(
+        &self,
+        key: [u8; 32],
+    ) -> Box<dyn Future<Output = Result<Option<()>, Error>> + Send + 'static> {
+        let client = self.client.clone();
+        let store = self.store.clone();
+        Box::new(async move {
+            let object_key = crate::s3::object_key_from_digest(&key)?;
+            match client.get_object(object_key).await? {
+                None => bail!("could not fetch object with key {}", hex::encode(key)),
+                Some(response) => {
+                    let bytes = response.content.collect().await?.to_bytes();
+                    let chunk = DataBlob::from_raw(bytes.to_vec())?;
+                    store.insert_chunk(&chunk, &key)?;
+                    Ok(Some(()))
+                }
+            }
+        })
+    }
+}
+
+impl S3Cacher {
+    pub fn new(client: Arc<S3Client>, store: Arc<ChunkStore>) -> Self {
+        Self { client, store }
+    }
+}
+
+/// LRU cache using a local datastore for caching chunks.
+///
+/// Uses an LRU cache that keeps only the keys in memory; the values are
+/// stored on the filesystem.
+pub struct LocalDatastoreLruCache {
+    cache: AsyncLruCache<[u8; 32], ()>,
+    store: Arc<ChunkStore>,
+}
+
+impl LocalDatastoreLruCache {
+    pub fn new(capacity: usize, store: Arc<ChunkStore>) -> Self {
+        Self {
+            cache: AsyncLruCache::new(capacity),
+            store,
+        }
+    }
+
+    /// Insert a new chunk into the local datastore cache.
+    ///
+    /// Fails if the chunk cannot be inserted successfully.
+    pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
+        self.store.insert_chunk(chunk, digest)?;
+        self.cache.insert(*digest, (), |digest| {
+            let (path, _digest_str) = self.store.chunk_path(&digest);
+            // Truncate to free up space but keep the inode around, since that
+            // is used as marker for chunks in use by garbage collection.
+            if let Err(err) = nix::unistd::truncate(&path, 0) {
+                if err != nix::errno::Errno::ENOENT {
+                    return Err(Error::from(err));
+                }
+            }
+            Ok(())
+        })
+    }
+
+    /// Remove a chunk from the local datastore cache.
+    ///
+    /// Fails if the chunk cannot be deleted successfully.
+    pub fn remove(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        self.cache.remove(*digest);
+        let (path, _digest_str) = self.store.chunk_path(digest);
+        std::fs::remove_file(path).map_err(Error::from)
+    }
+
+    pub async fn access(
+        &self,
+        digest: &[u8; 32],
+        cacher: &mut S3Cacher,
+    ) -> Result<Option<DataBlob>, Error> {
+        if self
+            .cache
+            .access(*digest, cacher, |digest| {
+                let (path, _digest_str) = self.store.chunk_path(&digest);
+                // Truncate to free up space but keep the inode around, since that
+                // is used as marker for chunks in use by garbage collection.
+                if let Err(err) = nix::unistd::truncate(&path, 0) {
+                    if err != nix::errno::Errno::ENOENT {
+                        return Err(Error::from(err));
+                    }
+                }
+                Ok(())
+            })
+            .await?
+            .is_some()
+        {
+            let (path, _digest_str) = self.store.chunk_path(digest);
+            let mut file = match std::fs::File::open(&path) {
+                Ok(file) => file,
+                Err(err) => {
+                    // Expected chunk to be present since LRU cache has it, but it is missing
+                    // locally, try to fetch again
+                    if err.kind() == std::io::ErrorKind::NotFound {
+                        let object_key = crate::s3::object_key_from_digest(digest)?;
+                        match cacher.client.get_object(object_key).await? {
+                            None => {
+                                bail!("could not fetch object with key {}", hex::encode(digest))
+                            }
+                            Some(response) => {
+                                let bytes = response.content.collect().await?.to_bytes();
+                                let chunk = DataBlob::from_raw(bytes.to_vec())?;
+                                self.store.insert_chunk(&chunk, digest)?;
+                                std::fs::File::open(&path)?
+                            }
+                        }
+                    } else {
+                        return Err(Error::from(err));
+                    }
+                }
+            };
+            let chunk = match DataBlob::load_from_reader(&mut file) {
+                Ok(chunk) => chunk,
+                Err(err) => {
+                    use std::io::Seek;
+                    // Check if file is empty marker file, try fetching content if so
+                    if file.seek(std::io::SeekFrom::End(0))? == 0 {
+                        let object_key = crate::s3::object_key_from_digest(digest)?;
+                        match cacher.client.get_object(object_key).await? {
+                            None => {
+                                bail!("could not fetch object with key {}", hex::encode(digest))
+                            }
+                            Some(response) => {
+                                let bytes = response.content.collect().await?.to_bytes();
+                                let chunk = DataBlob::from_raw(bytes.to_vec())?;
+                                self.store.insert_chunk(&chunk, digest)?;
+                                let mut file = std::fs::File::open(&path)?;
+                                DataBlob::load_from_reader(&mut file)?
+                            }
+                        }
+                    } else {
+                        return Err(err);
+                    }
+                }
+            };
+            Ok(Some(chunk))
+        } else {
+            Ok(None)
+        }
+    }
+
+    pub fn contains(&self, digest: &[u8; 32]) -> bool {
+        self.cache.contains(*digest)
+    }
+}
-- 
2.47.2
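
Note for readers: the eviction and read path above can be
approximated with the standard library alone. The sketch below is an
assumption-laden illustration, not the patch's code: evict_to_marker,
read_or_refetch and refill are hypothetical names, the `fetch`
closure stands in for the S3 GET request, and the emptiness check is
done on the file length directly instead of after a failed
DataBlob::load_from_reader call.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Truncate the file to zero length but keep the inode as a marker.
/// A missing file is not treated as an error.
fn evict_to_marker(path: &Path) -> io::Result<()> {
    match fs::OpenOptions::new().write(true).open(path) {
        Ok(file) => file.set_len(0),
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}

/// Fetch the content via `fetch` and store it at `path`.
fn refill<F>(path: &Path, fetch: F) -> io::Result<Vec<u8>>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    let data = fetch()?;
    fs::write(path, &data)?;
    Ok(data)
}

/// Read the cached content, refetching it if the file is missing or is
/// an empty marker left behind by an eviction.
fn read_or_refetch<F>(path: &Path, fetch: F) -> io::Result<Vec<u8>>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    match fs::read(path) {
        Ok(data) if !data.is_empty() => Ok(data),
        Ok(_) => refill(path, fetch),
        Err(err) if err.kind() == ErrorKind::NotFound => refill(path, fetch),
        Err(err) => Err(err),
    }
}
```

Keeping the zero-length file instead of unlinking it matches the
comment in the patch: the inode stays around as the in-use marker for
garbage collection, while the disk space is released.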



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

