From: Christian Ebner <c.ebner@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
"Proxmox Backup Server development discussion"
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 3/8] chunk store: invert chunk filename checks in chunk store iterator
Date: Wed, 14 Jan 2026 10:53:28 +0100
Message-ID: <9f0921d7-2c06-4270-81ff-3d0aed0e6af8@proxmox.com>
In-Reply-To: <1768382362.q4ans5hcjl.astroid@yuna.none>
On 1/14/26 10:41 AM, Fabian Grünbichler wrote:
> On January 14, 2026 9:37 am, Christian Ebner wrote:
>> On 1/13/26 11:23 AM, Fabian Grünbichler wrote:
>>> On December 11, 2025 4:38 pm, Christian Ebner wrote:
>>>> Optimizes the chunk filename check towards regular chunk files by
>>>> explicitly checking for the correct length.
>>>>
>>>> While the check for ASCII hex digits needs to be stated twice, this
>>>> avoids checking for the `.bad` extension if the chunk filename
>>>> already matched the expected length.
>>>
>>> I don't get this part, we could still check first and only once that the
>>> first 64 bytes are valid hex?
>>>
>>> if bytes.len() < 64 {
>>>     continue;
>>> }
>>>
>>> if !bytes.iter().take(64).all(u8::is_ascii_hexdigit) {
>>>     continue;
>>> }
>>
>> But with the code below I'm done after 2 checks in the regular chunk
>> digest case:
>>
>> `bytes.len() == 64 && bytes.iter().take(64).all(u8::is_ascii_hexdigit)`
>>
>> which is the one which is most likely and should be optimized for?
>
> that's easy to do without writing the check twice though (with the added
> benefit of stopping at the first non-hex character):
>
> if bytes.iter().take(64).any(|c| !c.is_ascii_hexdigit()) {
>     continue;
> }
But now the length check only happens afterwards, so you need to iterate
over the digits even for names whose length would not match anyway? So
not the same ;)
>
> if bytes.len() == 64 { return .. };
> if bytes.len() < 64 { continue }
> if bytes.len() == 64 + .. && bytes.ends_with(..) { return .. }
> if bytes.len() == 64 + .. && &bytes[64..XX] == ... { return .. }
>
> I still think having the "too short" check up front makes sense - it's
> super cheap, makes the code more readable *and* saves us the iteration
> for such files..
Okay, then let's keep the upfront length check. Any optimization here is
probably outweighed by the actual I/O on the chunks afterwards anyway, so
it is not too critical. I just tried to optimize this since I was
touching the code anyway.
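
For illustration, a minimal sketch of the check order discussed above:
the cheap "too short" test first, the hex check written only once, then
the length and extension. `ChunkName` and `classify_chunk_name` are
hypothetical names used only for this sketch and are not part of the
patch. Like the quoted outline, it only tests that a 6-byte suffix ends
in `.bad`.

```rust
/// Length of a chunk digest rendered as hex characters.
const DIGEST_LEN: usize = 64;
/// Length of a bad-chunk suffix such as `.0.bad`.
const BAD_EXT_LEN: usize = ".0.bad".len();

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum ChunkName {
    /// Exactly 64 hex digits.
    Regular,
    /// 64 hex digits followed by a 6-byte suffix ending in `.bad`.
    Bad,
}

/// Returns `None` for names the iterator should skip.
fn classify_chunk_name(bytes: &[u8]) -> Option<ChunkName> {
    // Super cheap: anything shorter than a digest cannot be a chunk,
    // and we never iterate over its bytes.
    if bytes.len() < DIGEST_LEN {
        return None;
    }
    // Hex check stated once; `all` stops at the first non-hex byte.
    if !bytes[..DIGEST_LEN].iter().all(u8::is_ascii_hexdigit) {
        return None;
    }
    // Most likely case: a regular chunk file.
    if bytes.len() == DIGEST_LEN {
        return Some(ChunkName::Regular);
    }
    // Less likely case: digest plus a bad-chunk extension.
    if bytes.len() == DIGEST_LEN + BAD_EXT_LEN && bytes.ends_with(b".bad") {
        return Some(ChunkName::Bad);
    }
    None
}
```

In the iterator, `None` would map to `continue` and the two variants to
the `bad` flag of the returned tuple.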
>
>>
>> What I tried to say with the commit message is that the
>> bytes.iter().take(64).all(u8::is_ascii_hexdigit) is now written out
>> twice, but only one of the two cases will ever be checked.
>>
>>>
>>> // now start looking at the length + potential extension
>>>
>>>>
>>>> This will also help to better distinguish bad chunks and chunk
>>>> used markers for S3 datastores in subsequent changes.
>>>>
>>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>>> ---
>>>> pbs-datastore/src/chunk_store.rs | 17 +++++++++++------
>>>> 1 file changed, 11 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>>>> index a5e5f6261..7980938ad 100644
>>>> --- a/pbs-datastore/src/chunk_store.rs
>>>> +++ b/pbs-datastore/src/chunk_store.rs
>>>> @@ -315,15 +315,20 @@ impl ChunkStore {
>>>> Some(Ok(entry)) => {
>>>> // skip files if they're not a hash
>>>> let bytes = entry.file_name().to_bytes();
>>>> - if bytes.len() != 64 && bytes.len() != 64 + ".0.bad".len() {
>>>> - continue;
>>>> +
>>>> + if bytes.len() == 64 && bytes.iter().take(64).all(u8::is_ascii_hexdigit)
>>>> + {
>>>> + return Some((Ok(entry), percentage, false));
>>>> }
>>>> - if !bytes.iter().take(64).all(u8::is_ascii_hexdigit) {
>>>> - continue;
>>>> +
>>>> + if bytes.len() == 64 + ".0.bad".len()
>>>> + && bytes.iter().take(64).all(u8::is_ascii_hexdigit)
>>>> + {
>>>> + let bad = bytes.ends_with(b".bad");
>>>> + return Some((Ok(entry), percentage, bad));
>>>
>>> while this mimics the old code, it is still broken (a chunk digest +
>>> .fooba or any other 6-byte suffix that is not "??.bad" is returned as
>>> non-bad chunk, since the length matches a bad chunk, but the extension
>>> does not).
>>
>> That was the intention here, to keep this close to the previous
>> behavior. But since we do this check only in the less likely case, I
>> agree that adding the check for exact extension might be the better
>> option here.
>>
>> Will adapt this accordingly, thanks!
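
A minimal sketch of what an exact extension check could look like,
assuming the bad-chunk suffix is `.` followed by a single ASCII digit
and `.bad` (as in `.0.bad`). The helper name `is_bad_chunk_name` and the
constant are hypothetical and not taken from the patch.

```rust
/// Length of a chunk digest rendered as hex characters.
const CHUNK_DIGEST_HEX_LEN: usize = 64;

/// Returns true only for `<64 hex digits>.<digit>.bad`.
/// Assumption: the counter in the suffix is one ASCII digit.
fn is_bad_chunk_name(bytes: &[u8]) -> bool {
    // Guard so that `split_at` below cannot panic.
    if bytes.len() < CHUNK_DIGEST_HEX_LEN {
        return false;
    }
    let (digest, ext) = bytes.split_at(CHUNK_DIGEST_HEX_LEN);
    // The slice pattern fixes both the suffix length and its contents,
    // so a name like `<digest>.fooba` no longer passes.
    digest.iter().all(u8::is_ascii_hexdigit)
        && matches!(ext, [b'.', n, b'.', b'b', b'a', b'd'] if n.is_ascii_digit())
}
```

With this, a digest followed by any other 6-byte suffix is skipped
instead of being returned as a non-bad chunk.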
>>
>>>
>>>> }
>>>>
>>>> - let bad = bytes.ends_with(b".bad");
>>>> - return Some((Ok(entry), percentage, bad));
>>>> + continue;
>>>> }
>>>> Some(Err(err)) => {
>>>> // stop after first error
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>>
>>>>
Thread overview: 23+ messages
2025-12-11 15:38 [pbs-devel] [PATCH proxmox-backup v2 0/8] followups for garbage collection Christian Ebner
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 1/8] GC: Move S3 delete list state and logic to a dedicated struct Christian Ebner
2026-01-13 10:23 ` Fabian Grünbichler
2026-01-14 8:22 ` Christian Ebner
2026-01-14 9:18 ` Fabian Grünbichler
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 2/8] chunk store: rename and limit scope for chunk store iterator Christian Ebner
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 3/8] chunk store: invert chunk filename checks in " Christian Ebner
2026-01-13 10:23 ` Fabian Grünbichler
2026-01-14 8:37 ` Christian Ebner
2026-01-14 9:41 ` Fabian Grünbichler
2026-01-14 9:53 ` Christian Ebner [this message]
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 4/8] chunk store: return chunk extension and check for used marker Christian Ebner
2026-01-13 10:24 ` Fabian Grünbichler
2026-01-14 8:41 ` Christian Ebner
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 5/8] chunk store: refactor chunk extension parsing into dedicated helper Christian Ebner
2026-01-13 10:24 ` Fabian Grünbichler
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 6/8] datastore: move bad chunk touching logic to chunk store Christian Ebner
2026-01-13 10:24 ` Fabian Grünbichler
2026-01-14 8:58 ` Christian Ebner
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 7/8] chunk store: move next bad chunk path generator into dedicated helper Christian Ebner
2025-12-11 15:38 ` [pbs-devel] [PATCH proxmox-backup v2 8/8] chunk store: move bad chunk filename generation " Christian Ebner
2026-01-13 10:24 ` [pbs-devel] [PATCH proxmox-backup v2 0/8] followups for garbage collection Fabian Grünbichler
2026-01-14 12:33 ` [pbs-devel] superseded: " Christian Ebner