all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 5/6] datastore: verify: evict corrupt chunks from in-memory LRU cache
Date: Mon, 27 Oct 2025 11:59:30 +0100	[thread overview]
Message-ID: <1761562054.2fkxbxnisy.astroid@yuna.none> (raw)
In-Reply-To: <20251016131819.349049-6-c.ebner@proxmox.com>

On October 16, 2025 3:18 pm, Christian Ebner wrote:
> Chunks detected as corrupt have been renamed on both, the S3 backend
> and the local datastore cache, but not evicted from the in-memory
> cache containing the LRU chunk digests. This can lead to the chunks
> being considered as already present if their digest is still cached,
> and therefore not being re-inserted in the local store cache and S3
> backend on backup upload.
> 
> Fix this by not only renaming the local datastore's chunk marker
> file, but also removing it from the in-memory cache while holding the
> chunk store mutex lock to exclude interference from concurrent chunk
> inserts.
> 
> Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/datastore.rs | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index a7ea8fd96..c551679e3 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -2467,10 +2467,18 @@ impl DataStore {
>  
>          let _lock = self.inner.chunk_store.mutex().lock().unwrap();
>  
> -        match std::fs::rename(&path, &new_path) {
> +        let result = match std::fs::rename(&path, &new_path) {
>              Ok(_) => Ok(Some(format!("corrupted chunk renamed to {new_path:?}"))),
>              Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
>              Err(err) => bail!("could not rename corrupted chunk {path:?} - {err}"),
> +        };
> +
> +        if let Some(cache) = self.cache() {
> +            // Requiremets for calling the unsafe method are met, since snapshots referencing the
> +            // corrupt chunk are to be considered corrupt. Ignore the error due to the missing file.
> +            let _ = unsafe { cache.remove(digest) };
>          }

not exactly related to this patch, but something I noticed while
reviewing it:

how do we protect against concurrent actions on the same chunk between
- verification detects chunk as corrupt
- rename on S3 level
- lock + rename and remove on chunk store/cache level

let's say to verification tasks and a backup writer race, all involving
the same chunk:

- verification A and B detect chunk as corrupt
- verification A renames it on the S3 level
- verification A renames it on the chunk store level and removes it from
  the cache
- backup writer uploads it to S3
- verification B renames it on the S3 level

and now either:
- backup writer inserts it into the cache
- verification B renames it on the chunk store level and removes it from
  the cache

which means the backup writer creates a corrupt, but not detected as
such backup

or
- verification B attempts to rename it on the chunk store level
  (ENOTFOUND) and removes it from the cache
- backup writer inserts it into the cache

which means we just broke cache coherency - the chunk exists in the
chunk store and cache, but not on S3 (anymore)


or another racey variant:
- verification A and B detect chunk as corrupt
- verification A renames it on the S3 level
- verification B renames it on the S3 level but fails because it doesn't
  exist there??

I think we might need another synchronization mechanism for verification
involving S3..

> +
> +        result
>      }
>  }
> -- 
> 2.47.3
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

  reply	other threads:[~2025-10-27 10:59 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-16 13:18 [pbs-devel] [PATCH proxmox-backup 0/6] s3 store verify: fix concurrency issues and add missing in-memory cache eviction Christian Ebner
2025-10-16 13:18 ` [pbs-devel] [PATCH proxmox-backup 1/6] verify/datastore: make rename corrupt chunk a datastore helper method Christian Ebner
2025-10-16 13:18 ` [pbs-devel] [PATCH proxmox-backup 2/6] datastore: refactor rename_corrupted_chunk error handling Christian Ebner
2025-10-27 10:59   ` Fabian Grünbichler
2025-10-27 11:36     ` Christian Ebner
2025-10-16 13:18 ` [pbs-devel] [PATCH proxmox-backup 3/6] verify: never hold mutex lock in async scope on corrupt chunk rename Christian Ebner
2025-10-27 10:59   ` Fabian Grünbichler
2025-10-27 11:37     ` Christian Ebner
2025-10-16 13:18 ` [pbs-devel] [PATCH proxmox-backup 4/6] datastore: acquire chunk store mutex lock when renaming corrupt chunk Christian Ebner
2025-10-27 10:59   ` Fabian Grünbichler
2025-10-16 13:18 ` [pbs-devel] [PATCH proxmox-backup 5/6] datastore: verify: evict corrupt chunks from in-memory LRU cache Christian Ebner
2025-10-27 10:59   ` Fabian Grünbichler [this message]
2025-10-27 11:53     ` Christian Ebner
2025-10-16 13:18 ` [pbs-devel] [PATCH proxmox-backup 6/6] verify: distinguish s3 object fetching and chunk loading error Christian Ebner
2025-10-27 10:59   ` Fabian Grünbichler
2025-10-27 11:54     ` Christian Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1761562054.2fkxbxnisy.astroid@yuna.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal