From: Christian Ebner <c.ebner@proxmox.com>
To: "Proxmox Backup Server development discussion"
<pbs-devel@lists.proxmox.com>,
"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 3/3] datastore: insert chunk marker and touch bad chunks in locked context
Date: Mon, 10 Nov 2025 13:39:44 +0100 [thread overview]
Message-ID: <a9d5ade0-9ecb-4f76-a429-108e18cfffa1@proxmox.com> (raw)
In-Reply-To: <1762762321.tfqlrhk6b0.astroid@yuna.none>
On 11/10/25 9:31 AM, Fabian Grünbichler wrote:
> On November 6, 2025 6:13 pm, Christian Ebner wrote:
>> Assures that both, the touching of bad chunks as well as the
>> insertion of missing chunk marker files are done while the chunk
>> store mutex is guarded, other operations therefore get a consistent
>> state.
>>
>> To achieve this, introduces a helper method which allows to run a
>> callback in a locked context if the chunk file is missing.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - not present in previous version
>>
>> pbs-datastore/src/chunk_store.rs | 17 +++++++++++++
>> pbs-datastore/src/datastore.rs | 42 +++++++++++++++++++-------------
>> 2 files changed, 42 insertions(+), 17 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index b88a0a096..063bc55f6 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -212,6 +212,23 @@ impl ChunkStore {
>> Ok(())
>> }
>>
>> + /// Update the chunk files atime if it exists, call the provided callback inside a chunk store
>> + /// locked scope otherwise.
>> + pub(super) fn cond_touch_chunk_or_locked<T>(
>> + &self,
>> + digest: &[u8; 32],
>> + callback: T,
>> + ) -> Result<(), Error>
>> + where
>> + T: FnOnce() -> Result<(), Error>,
>> + {
>> + let _lock = self.mutex.lock();
>> + if !self.cond_touch_chunk_no_lock(digest, false)? {
>> + callback()?;
>> + }
>> + Ok(())
>> + }
>> +
>> /// Update the chunk files atime if it exists.
>> ///
>> /// If the chunk file does not exist, return with error if assert_exists is true, with
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 4527b40f4..b2f414ce1 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1302,15 +1302,16 @@ impl DataStore {
>> match s3_client {
>> None => {
>> // Filesystem backend
>> - if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
>> + self.inner.chunk_store.cond_touch_chunk_or_locked(digest, || {
>> let hex = hex::encode(digest);
>> warn!(
>> "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
>> );
>>
>> - // touch any corresponding .bad files to keep them around, meaning if a chunk is
>> - // rewritten correctly they will be removed automatically, as well as if no index
>> - // file requires the chunk anymore (won't get to this loop then)
>> + // touch any corresponding .bad files to keep them around, meaning if a
>> + // chunk is rewritten correctly they will be removed automatically, as well
>> + // as if no index file requires the chunk anymore (won't get to this loop
>> + // then)
>> for i in 0..=9 {
>> let bad_ext = format!("{i}.bad");
>> let mut bad_path = PathBuf::new();
>> @@ -1318,22 +1319,29 @@ impl DataStore {
>> bad_path.set_extension(bad_ext);
>> self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
>> }
>> - }
>> + Ok(())
>> + })?;
>
> do we need to hold the mutex for touching the bad chunks? we don't when
> creating them during verification (should we?), we do when removing them
> during GC..
>
> if we do, then we should probably have a
>
> pub(super) fn cond_touch_bad_chunk(s?) in the chunk store
>
> because each touch here is independent, and there is no overarching need
> to hold the mutex across all of them, and this would allow us to make
> cond_touch_path private..
No, these can indeed be independent since cleanup in GC only happens if
the good chunk is not present, and the rename will then simply use the
next free bad filename slot.
> should we also touch bad chunks for the S3 case?
True, so they need to be present to reflect the state for the S3-backed
chunks as well. Otherwise they will get incorrectly cleaned up in GC phase 2.
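
For illustration, a dedicated helper along the lines you suggest could look
roughly like the sketch below (names are assumptions on my side, not part of
the patch: `cond_touch_bad_chunks` and the use of a `chunk_path` helper on
the ChunkStore to build the `.bad` paths). Since each touch is independent it
would not need to take the chunk store mutex, both the filesystem and the S3
branch could call it, and `cond_touch_path` could then stay private:

impl ChunkStore {
    /// Touch any corresponding `<digest>.<i>.bad` files so GC keeps them
    /// around as long as some index file still references the (missing)
    /// chunk. Each touch is independent, so the mutex is not taken here.
    pub(super) fn cond_touch_bad_chunks(&self, digest: &[u8; 32]) -> Result<(), Error> {
        let (chunk_path, _digest_str) = self.chunk_path(digest); // assumed path helper
        for i in 0..=9 {
            let mut bad_path = chunk_path.clone();
            bad_path.set_extension(format!("{i}.bad"));
            self.cond_touch_path(&bad_path, false)?;
        }
        Ok(())
    }
}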
>> }
>> Some(ref _s3_client) => {
>> // Update atime on local cache marker files.
>> - if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
>> - let (chunk_path, _digest) = self.chunk_path(digest);
>> - // Insert empty file as marker to tell GC phase2 that this is
>> - // a chunk still in-use, so to keep in the S3 object store.
>> - std::fs::File::options()
>> - .write(true)
>> - .create_new(true)
>> - .open(&chunk_path)
>> - .with_context(|| {
>> - format!("failed to create marker for chunk {}", hex::encode(digest))
>> - })?;
>> - }
>> + self.inner
>> + .chunk_store
>> + .cond_touch_chunk_or_locked(digest, || {
>> + let (chunk_path, _digest) = self.chunk_path(digest);
>> + // Insert empty file as marker to tell GC phase2 that this is
>> + // a chunk still in-use, so to keep in the S3 object store.
>> + std::fs::File::options()
>> + .write(true)
>> + .create_new(true)
>> + .open(&chunk_path)
>> + .with_context(|| {
>> + format!(
>> + "failed to create marker for chunk {}",
>> + hex::encode(digest)
>> + )
>> + })?;
>> + Ok(())
>> + })?;
>
> AFAICT, we can fix this together with the other S3-races by obtaining
> the flock on the chunk here?
>
> i.e.,
>
> // without flock first, since a chunk missing is the unlikely path
> // (corruption detected by verify, or manual damage to the chunk store)
> if !cond_touch_chunk {
> // now with flock, to protect the next two calls against concurrent
> // insert+uploads/renames/..
> flock {
> // somebody else could have inserted it since we checked without
> // locking above
> if !cond_touch_chunk {
> // insert empty marker only if chunk is not there
> store.clear_chunk
> }
> }
> }
Okay, will do by moving the S3-specific logic into the regular code path so
it covers both cases, and send a v2. That will however now come as follow-up
patches to be applied on top of [0], since that series is required for the
per-chunk file locking.
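
For reference, the pattern you describe could look roughly like the
following sketch (not the actual implementation; `lock_chunk` stands in for
the per-chunk file lock helper from the series in [0], and `chunk_path` on
the chunk store is an assumption here):

use anyhow::{Context, Error};

fn touch_chunk_or_insert_marker(store: &ChunkStore, digest: &[u8; 32]) -> Result<(), Error> {
    // Without the flock first, since a missing chunk is the unlikely path
    // (corruption detected by verify, or manual damage to the chunk store).
    if store.cond_touch_chunk(digest, false)? {
        return Ok(());
    }

    // With the per-chunk flock, to protect the re-check and the marker
    // creation against concurrent insert/upload/rename.
    let _guard = store.lock_chunk(digest)?; // assumed helper from [0]

    // Somebody else could have inserted the chunk since the unlocked check above.
    if !store.cond_touch_chunk(digest, false)? {
        let (chunk_path, _digest_str) = store.chunk_path(digest); // assumed path helper
        // Empty marker file tells GC phase 2 that this chunk is still in use,
        // so it is kept in the S3 object store.
        std::fs::File::options()
            .write(true)
            .create_new(true)
            .open(&chunk_path)
            .with_context(|| {
                format!("failed to create marker for chunk {}", hex::encode(digest))
            })?;
    }
    Ok(())
}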
[0]
https://lore.proxmox.com/pbs-devel/20251110115627.280318-1-c.ebner@proxmox.com/T/