From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 5/6] datastore: verify: evict corrupt chunks from in-memory LRU cache
Date: Mon, 27 Oct 2025 11:59:30 +0100
Message-ID: <1761562054.2fkxbxnisy.astroid@yuna.none>
In-Reply-To: <20251016131819.349049-6-c.ebner@proxmox.com>
On October 16, 2025 3:18 pm, Christian Ebner wrote:
> Chunks detected as corrupt have been renamed on both the S3 backend
> and the local datastore cache, but not evicted from the in-memory
> cache containing the LRU chunk digests. This can lead to the chunks
> being considered already present if their digest is still cached,
> and therefore not being re-inserted in the local store cache and S3
> backend on backup upload.
>
> Fix this by not only renaming the local datastore's chunk marker
> file, but also removing it from the in-memory cache while holding the
> chunk store mutex lock to exclude interference from concurrent chunk
> inserts.
>
> Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> pbs-datastore/src/datastore.rs | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index a7ea8fd96..c551679e3 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -2467,10 +2467,18 @@ impl DataStore {
>
>          let _lock = self.inner.chunk_store.mutex().lock().unwrap();
>
> -        match std::fs::rename(&path, &new_path) {
> +        let result = match std::fs::rename(&path, &new_path) {
>              Ok(_) => Ok(Some(format!("corrupted chunk renamed to {new_path:?}"))),
>              Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
>              Err(err) => bail!("could not rename corrupted chunk {path:?} - {err}"),
> +        };
> +
> +        if let Some(cache) = self.cache() {
> +            // Requirements for calling the unsafe method are met, since snapshots referencing the
> +            // corrupt chunk are to be considered corrupt. Ignore the error due to the missing file.
> +            let _ = unsafe { cache.remove(digest) };
>          }
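
for context, the failure mode fixed here boils down to the digest fast
path on insert - a minimal sketch (hypothetical types, a HashSet standing
in for the LRU of cached digests, not the actual cache API):

use std::collections::HashSet;

// stand-in for the in-memory cache of chunk digests
struct DigestCache {
    known: HashSet<[u8; 32]>,
}

impl DigestCache {
    // insert fast path: a cached digest means "chunk already present",
    // so the upload closure never runs
    fn insert(&mut self, digest: [u8; 32], upload: impl FnOnce()) {
        if self.known.contains(&digest) {
            // digest still cached -> re-upload skipped, even though verify
            // just renamed the chunk away on S3 and in the local store
            return;
        }
        upload();
        self.known.insert(digest);
    }

    // what this patch adds: evict the digest when the chunk is detected
    // as corrupt, while holding the chunk store mutex
    fn evict(&mut self, digest: &[u8; 32]) {
        self.known.remove(digest);
    }
}
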
not exactly related to this patch, but something I noticed while
reviewing it:
how do we protect against concurrent actions on the same chunk between
these steps:
- verification detects the chunk as corrupt
- rename on the S3 level
- lock + rename and remove on the chunk store/cache level
let's say two verification tasks and a backup writer race, all involving
the same chunk:
- verification A and B detect the chunk as corrupt
- verification A renames it on the S3 level
- verification A renames it on the chunk store level and removes it from
the cache
- backup writer uploads it to S3
- verification B renames it on the S3 level
and now either:
- backup writer inserts it into the cache
- verification B renames it on the chunk store level and removes it from
the cache
which means the backup writer creates a backup that is corrupt, but not
detected as such
or
- verification B attempts to rename it on the chunk store level (fails
with ENOENT) and removes it from the cache
- backup writer inserts it into the cache
which means we just broke cache coherency - the chunk exists in the
chunk store and cache, but not on S3 (anymore)
or another racy variant:
- verification A and B detect the chunk as corrupt
- verification A renames it on the S3 level
- verification B attempts to rename it on the S3 level, but fails because
it doesn't exist there anymore??
I think we might need another synchronization mechanism for verification
involving S3..
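
e.g. something like a per-digest lock shared between verification and the
backup writer, held across every step touching S3, the chunk store and the
cache - very rough sketch, all helper names hypothetical:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// one mutex per chunk digest; this sketch never cleans up entries, a
// real version would need to garbage collect unused locks
#[derive(Default)]
struct ChunkLocks {
    locks: Mutex<HashMap<[u8; 32], Arc<Mutex<()>>>>,
}

impl ChunkLocks {
    fn get(&self, digest: [u8; 32]) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        Arc::clone(map.entry(digest).or_default())
    }
}

// verification would hold the guard across all three steps:
//
//     let lock = chunk_locks.get(digest);
//     let _guard = lock.lock().unwrap();
//     rename_on_s3(&digest)?;          // hypothetical helper
//     rename_in_chunk_store(&digest)?; // hypothetical helper
//     evict_from_cache(&digest);       // hypothetical helper
//
// and the backup writer would take the same lock around its
// "check cache -> upload to S3 -> insert into cache" sequence, which
// rules out the interleavings above at the cost of some per-chunk
// contention
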
> +
> +        result
>      }
>  }
> --
> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel