From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 3/3] datastore: insert chunk marker and touch bad chunks in locked context
Date: Mon, 10 Nov 2025 09:31:54 +0100 [thread overview]
Message-ID: <1762762321.tfqlrhk6b0.astroid@yuna.none> (raw)
In-Reply-To: <20251106171358.865503-4-c.ebner@proxmox.com>
On November 6, 2025 6:13 pm, Christian Ebner wrote:
> Assures that both, the touching of bad chunks as well as the
> insertion of missing chunk marker files are done while the chunk
> store mutex is guarded, other operations therefore get a consistent
> state.
>
> To achieve this, introduces a helper method which allows to run a
> callback in a locked context if the chunk file is missing.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - not present in previous version
>
> pbs-datastore/src/chunk_store.rs | 17 +++++++++++++
> pbs-datastore/src/datastore.rs | 42 +++++++++++++++++++-------------
> 2 files changed, 42 insertions(+), 17 deletions(-)
>
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index b88a0a096..063bc55f6 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -212,6 +212,23 @@ impl ChunkStore {
> Ok(())
> }
>
> + /// Update the chunk files atime if it exists, call the provided callback inside a chunk store
> + /// locked scope otherwise.
> + pub(super) fn cond_touch_chunk_or_locked<T>(
> + &self,
> + digest: &[u8; 32],
> + callback: T,
> + ) -> Result<(), Error>
> + where
> + T: FnOnce() -> Result<(), Error>,
> + {
> + let _lock = self.mutex.lock();
> + if !self.cond_touch_chunk_no_lock(digest, false)? {
> + callback()?;
> + }
> + Ok(())
> + }
> +
> /// Update the chunk files atime if it exists.
> ///
> /// If the chunk file does not exist, return with error if assert_exists is true, with
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 4527b40f4..b2f414ce1 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1302,15 +1302,16 @@ impl DataStore {
> match s3_client {
> None => {
> // Filesystem backend
> - if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
> + self.inner.chunk_store.cond_touch_chunk_or_locked(digest, || {
> let hex = hex::encode(digest);
> warn!(
> "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
> );
>
> - // touch any corresponding .bad files to keep them around, meaning if a chunk is
> - // rewritten correctly they will be removed automatically, as well as if no index
> - // file requires the chunk anymore (won't get to this loop then)
> + // touch any corresponding .bad files to keep them around, meaning if a
> + // chunk is rewritten correctly they will be removed automatically, as well
> + // as if no index file requires the chunk anymore (won't get to this loop
> + // then)
> for i in 0..=9 {
> let bad_ext = format!("{i}.bad");
> let mut bad_path = PathBuf::new();
> @@ -1318,22 +1319,29 @@ impl DataStore {
> bad_path.set_extension(bad_ext);
> self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
> }
> - }
> + Ok(())
> + })?;
do we need to hold the mutex for touching the bad chunks? we don't when
creating them during verification (should we?), we do when removing them
during GC..
if we do, then we should probably have a
pub(super) fn cond_touch_bad_chunk(s?) in the chunk store
because each touch here is independent, and there is no overarching need
to hold the mutex across all of them, and this would allow us to make
cond_touch_path private..
should we also touch bad chunks for the S3 case?
> }
> Some(ref _s3_client) => {
> // Update atime on local cache marker files.
> - if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
> - let (chunk_path, _digest) = self.chunk_path(digest);
> - // Insert empty file as marker to tell GC phase2 that this is
> - // a chunk still in-use, so to keep in the S3 object store.
> - std::fs::File::options()
> - .write(true)
> - .create_new(true)
> - .open(&chunk_path)
> - .with_context(|| {
> - format!("failed to create marker for chunk {}", hex::encode(digest))
> - })?;
> - }
> + self.inner
> + .chunk_store
> + .cond_touch_chunk_or_locked(digest, || {
> + let (chunk_path, _digest) = self.chunk_path(digest);
> + // Insert empty file as marker to tell GC phase2 that this is
> + // a chunk still in-use, so to keep in the S3 object store.
> + std::fs::File::options()
> + .write(true)
> + .create_new(true)
> + .open(&chunk_path)
> + .with_context(|| {
> + format!(
> + "failed to create marker for chunk {}",
> + hex::encode(digest)
> + )
> + })?;
> + Ok(())
> + })?;
AFAICT, we can fix this together with the other S3-races by obtaining
the flock on the chunk here?
i.e.,
// without flock first, since a chunk missing is the unlikely path
// (corruption detected by verify, or manual damage to the chunk store)
if !cond_touch_chunk {
// now with flock, to protect the next two calls against concurrent
// insert+uploads/renames/..
flock {
// somebody else could have inserted it since we checked without
// locking above
if !cond_touch_chunk {
// insert empty marker only if chunk is not there
store.clear_chunk
}
}
}
if (not-)caching were a property of the datastore, instead of the backup
session, we could even just fetch the missing chunk into the cache here
if caching is enabled
> }
> }
> }
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next prev parent reply other threads:[~2025-11-10 8:31 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-06 17:13 [pbs-devel] [PATCH proxmox-backup v2 0/3] fix GC atime update race window Christian Ebner
2025-11-06 17:13 ` [pbs-devel] [PATCH proxmox-backup v2 1/3] chunk store: limit scope for atime update helper methods Christian Ebner
2025-11-06 17:13 ` [pbs-devel] [PATCH proxmox-backup v2 2/3] chunk store: fix race window between chunk stat and gc cleanup Christian Ebner
2025-11-06 17:13 ` [pbs-devel] [PATCH proxmox-backup v2 3/3] datastore: insert chunk marker and touch bad chunks in locked context Christian Ebner
2025-11-10 8:31 ` Fabian Grünbichler [this message]
2025-11-10 12:39 ` Christian Ebner
2025-11-07 15:45 ` [pbs-devel] partially-applied: [PATCH proxmox-backup v2 0/3] fix GC atime update race window Thomas Lamprecht
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1762762321.tfqlrhk6b0.astroid@yuna.none \
--to=f.gruenbichler@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox