public inbox for pbs-devel@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Christian Ebner <c.ebner@proxmox.com>,
	Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2] GC: chunk store: fix chunk using markers cleanup
Date: Wed, 26 Nov 2025 09:23:50 +0100
Message-ID: <1764145262.ycdoq9dzrx.astroid@yuna.none>
In-Reply-To: <2a0d9f19-1a45-4552-8bf4-56d21c37231b@proxmox.com>

On November 25, 2025 3:27 pm, Christian Ebner wrote:
> On 11/25/25 3:19 PM, Fabian Grünbichler wrote:
>> On November 25, 2025 3:00 pm, Christian Ebner wrote:
>>> Since commit 9510ef1a ("GC: assure chunk exists on s3 store when
>>> creating missing chunk marker") chunks which are referenced by
>>> an index file but do not have a local marker file are marked by a
>>> file with the `using` extension, so they are not cleaned up during
>>> phase 2 if the chunk is still present on the backend.
>>>
>>> If the chunk is not encountered, however, phase 3 sees the marker
>>> and tries to clean it up. This currently fails, because the cleanup
>>> first tries to remove the entry from the LRU cache, which involves
>>> converting the filename to the chunk digest.
>>>
>>> Therefore, clean up any using marker file encountered during phase 3
>>> before any regular or bad chunk, independent of the atime.
>>>
>>> Fixes: https://forum.proxmox.com/threads/176567/post-819437
>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>> ---
>>> Changes since version 1 (thanks a lot for offlist discussion Thomas):
>>> - Cleanup using marker chunks independent from atime cutoff
>>>
>>>   pbs-datastore/src/chunk_store.rs | 14 +++++++++++++-
>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>>> index f53460664..7fe09b914 100644
>>> --- a/pbs-datastore/src/chunk_store.rs
>>> +++ b/pbs-datastore/src/chunk_store.rs
>>> @@ -25,6 +25,8 @@ use crate::file_formats::{
>>>   };
>>>   use crate::{DataBlob, LocalDatastoreLruCache};
>>>   
>>> +const USING_MARKER_FILENAME_EXT: &str = "using";
>>> +
>>>   /// File system based chunk store
>>>   pub struct ChunkStore {
>>>       name: String, // used for error reporting
>>> @@ -426,6 +428,16 @@ impl ChunkStore {
>>>                       drop(lock);
>>>                       continue;
>>>                   }
>>> +                if filename
>>> +                    .to_bytes()
>>> +                    .ends_with(USING_MARKER_FILENAME_EXT.as_bytes())
>>> +                {
>>> +                    unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir).map_err(|err| {
>>> +                        format_err!("unlinking chunk using marker {filename:?} failed - {err}")
>>> +                    })?;
>>> +                    drop(lock);
>>> +                    continue;
>>> +                }
>> 
>> this looks okay as a stop-gap, but isn't the actual problem that
>> 
>> .using
>> 
>> and
>> 
>> .0.bad
>> 
>> have the same length, so we end up taking a codepath using a weird "bad
>> but not bad" filename instead of skipping those markers in phase3?
> 
> but we need to clean them up at some point, otherwise the following 
> might happen:
> - chunk is in use by index file, phase 1 sets marker
> - chunk is not present on s3 object store (bad chunk), therefore not 
> seen in phase 2 and not replaced by regular marker file
> - chunk is uploaded
> - both index files are pruned
> - chunk is never cleaned up because using marker file persists.

yes, that's true. since their only purpose is to protect chunks against
being cleaned up in phase 2, they don't need to outlive the GC run.
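
to illustrate the lifetime point, here is a minimal sketch of sweeping all
using markers in a single directory regardless of atime. it uses plain
std::fs instead of the dirfd/unlinkat code in chunk_store.rs, takes no
locks, and `remove_using_markers` and `USING_MARKER_EXT` are made-up names
for illustration, not existing PBS code:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Assumption: marker files are named "<digest>.using".
const USING_MARKER_EXT: &str = ".using";

/// Removes every `*.using` file directly inside `dir`, without looking at
/// the atime. Returns the number of markers removed.
fn remove_using_markers(dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        // Regular chunks and bad chunks are left alone here; only the
        // marker extension is matched.
        if name.to_string_lossy().ends_with(USING_MARKER_EXT) {
            fs::remove_file(entry.path())?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

the real sweep would of course still need the chunk store mutex and the
per-chunk handling that follows in phase 3, this only shows that the marker
removal itself does not depend on any timestamp.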

>> in get_chunk_iterator, we skip all files that are not 64 bytes or
>> 64+len(.0.bad) bytes long, but then set the "bad" flag based on the
>> extension..
> 
> this could report whether the file is a using marker via an enum
> variant instead of the boolean bad flag, so that the cases can be
> clearly distinguished.

that seems cleaner, yes.
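
a rough sketch of how such a classification could look, so that the length
collision between .using and .0.bad no longer matters. this is untested
against the actual tree, and `ChunkFileKind`, `classify` and
`DIGEST_HEX_LEN` are hypothetical names, not what get_chunk_iterator
currently has. it also assumes the bad extension is always `.<digit>.bad`:

```rust
/// Hypothetical classification of a directory entry in the chunk store.
#[derive(Debug, PartialEq, Eq)]
enum ChunkFileKind {
    /// Plain 64 hex character digest.
    Regular,
    /// Digest followed by `.<digit>.bad` (assumed naming scheme).
    Bad,
    /// Digest followed by `.using`.
    UsingMarker,
}

const DIGEST_HEX_LEN: usize = 64;

/// Returns `None` for anything that is not a chunk, bad chunk or marker.
fn classify(filename: &[u8]) -> Option<ChunkFileKind> {
    if filename.len() < DIGEST_HEX_LEN {
        return None;
    }
    let (digest, ext) = filename.split_at(DIGEST_HEX_LEN);
    if !digest.iter().all(u8::is_ascii_hexdigit) {
        return None;
    }
    // Decide by extension content, not by total filename length.
    if ext.is_empty() {
        Some(ChunkFileKind::Regular)
    } else if ext == b".using" {
        Some(ChunkFileKind::UsingMarker)
    } else if ext.len() == 6
        && ext[0] == b'.'
        && ext[1].is_ascii_digit()
        && &ext[2..] == b".bad"
    {
        Some(ChunkFileKind::Bad)
    } else {
        None
    }
}
```

phase 3 could then match on the variant and unlink `UsingMarker` entries
unconditionally, while only `Regular` and `Bad` go through the atime and
cache handling.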


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

Thread overview: 6+ messages
2025-11-25 14:00 Christian Ebner
2025-11-25 14:18 ` Fabian Grünbichler
2025-11-25 14:23   ` Thomas Lamprecht
2025-11-25 14:27   ` Christian Ebner
2025-11-26  8:23     ` Fabian Grünbichler [this message]
2025-11-25 15:02 ` [pbs-devel] applied: " Thomas Lamprecht
