From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup v6 00/21] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend
Date: Fri, 14 Nov 2025 14:18:40 +0100
Message-ID: <20251114131901.441650-1-c.ebner@proxmox.com>
These patches fix possible race conditions on datastores with s3 backend during
chunk insert, renaming of corrupt chunks during verification, and cleanup during
garbage collection. Further, the patches ensure consistency between the chunk
marker file of the local datastore cache, the s3 object store and the in-memory
LRU cache during state changes caused by any of the above-mentioned operations.
Consistency is achieved by using a per-chunk file locking mechanism. File locks
are stored in the predefined location for datastore file locks, using the same
`.chunks/prefix/digest` folder layout for consistency and to keep readdir and
other fs operations performant.
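
To illustrate the layout, here is a minimal sketch of how a per-chunk lock path
could be derived from a digest. The helper name `chunk_lock_path`, the
`lock_base` and `store` parameters, and the choice of the first two digest bytes
(4 hex characters) as prefix directory are assumptions for illustration only,
not the code from the series:

```rust
use std::path::{Path, PathBuf};

/// Hex-encode a byte slice in lowercase without external crates.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Hypothetical helper: build `<lock_base>/<store>/.chunks/<prefix>/<digest>`.
///
/// Assumption: the prefix directory is named after the first two digest
/// bytes (4 hex characters), mirroring the chunk store folder layout.
fn chunk_lock_path(lock_base: &Path, store: &str, digest: &[u8; 32]) -> PathBuf {
    let hex = to_hex(digest);
    let prefix = &hex[..4];
    lock_base
        .join(store)
        .join(".chunks")
        .join(prefix)
        .join(&hex)
}
```

Mirroring the prefix directories keeps the number of entries per lock directory
bounded in the same way as for the chunk files themselves.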
As part of the series, it is now also ensured that chunks which are removed from
the local datastore cache are also dropped from its in-memory LRU cache, so that
a consistent state is achieved. Further, bad chunks are now touched as well
during GC phase 1 for s3 backed datastores, and the creation of missing marker
files is performed conditionally to avoid consistency issues.
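
The invariant for the cache can be sketched as follows: marker and LRU entry are
only ever changed together, under a single lock. This is a simplified in-memory
model; `LocalCache`, `CacheState` and the use of hash sets as stand-ins for the
on-disk marker files and the LRU are assumptions for illustration, not the
actual types of the series:

```rust
use std::collections::HashSet;
use std::sync::Mutex;

type Digest = [u8; 32];

#[derive(Default)]
struct CacheState {
    /// Stand-in for the chunk marker files of the local datastore cache.
    markers: HashSet<Digest>,
    /// Stand-in for the in-memory LRU cache.
    lru: HashSet<Digest>,
}

/// Hypothetical model of the local cache, guarded by one mutex.
#[derive(Default)]
struct LocalCache {
    state: Mutex<CacheState>,
}

impl LocalCache {
    fn insert(&self, digest: Digest) {
        let mut state = self.state.lock().unwrap();
        state.markers.insert(digest);
        state.lru.insert(digest);
    }

    /// Drop the marker and the LRU entry under the same lock, so no
    /// observer can see one without the other. Returns whether anything
    /// was removed.
    fn remove(&self, digest: &Digest) -> bool {
        let mut state = self.state.lock().unwrap();
        let had_marker = state.markers.remove(digest);
        let had_entry = state.lru.remove(digest);
        had_marker || had_entry
    }

    /// Every LRU entry must be backed by a marker.
    fn is_consistent(&self) -> bool {
        let state = self.state.lock().unwrap();
        state.lru.is_subset(&state.markers)
    }
}
```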
Changes since version 5 (thanks @Fabian for catching 2 more issues):
- Only remove the corrupt chunk from the cache after renaming it, as otherwise
  the cache removal already deletes the chunk file.
- Correctly distinguish bad chunks from regular ones in chunk_path_from_object_key(),
  and use that information to correctly process the bad chunks in GC phase 2.
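
The ordering issue from the first item can be sketched with plain filesystem
operations. `ToyCache`, `quarantine_corrupt_chunk` and the `<name>.0.bad` naming
are assumptions for illustration; the only property taken from the description
above is that removing a chunk from the cache also deletes the chunk file:

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in for the local cache: removal also deletes the chunk file.
struct ToyCache {
    cached: HashSet<PathBuf>,
}

impl ToyCache {
    fn remove(&mut self, chunk: &Path) -> io::Result<()> {
        self.cached.remove(chunk);
        match fs::remove_file(chunk) {
            // A file that is already gone is not an error here.
            Err(err) if err.kind() != io::ErrorKind::NotFound => Err(err),
            _ => Ok(()),
        }
    }
}

/// Rename first, so the corrupt data survives, then evict from the cache.
/// In the reverse order the cache removal would delete the file and the
/// rename would fail with "not found".
fn quarantine_corrupt_chunk(cache: &mut ToyCache, chunk: &Path) -> io::Result<PathBuf> {
    let mut name = chunk
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "chunk path has no file name"))?
        .to_os_string();
    // Assumed naming scheme for the renamed chunk.
    name.push(".0.bad");
    let bad = chunk.with_file_name(name);
    fs::rename(chunk, &bad)?;
    cache.remove(chunk)?;
    Ok(bad)
}
```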
Changes since version 4:
- Incorporate patches by Fabian for better handling of the chunk store mutex locking.
- Add patches to fix missing marker file creation and keeping of bad chunks during
  garbage collection for s3 backend.
- Document locking order restrictions.
Changes since version 3:
- Add patches to limit visibility of BackupDir and BackupGroup destroy.
- Refactor s3 upload index helper.
- Avoid unneeded double stat for GC phase 3 cleanups.
Changes since version 2:
- Incorporate additional race fix as discussed in
  https://lore.proxmox.com/pbs-devel/8ab74557-9592-43e7-8706-10fceaae31b7@proxmox.com/T/
  and suggested offlist.
Changes since version 1 (thanks @Fabian for review):
- Fix lock inversion when renaming corrupt chunks.
- Inline the chunk lock helper, making it explicit and thereby avoiding calls to
  the helper for regular datastores.
- Pass the backend to the add_blob datastore helper, so it can be reused for the
  backup session and pull sync job.
- Also move the s3 index upload helper from the backup env to the datastore, and
  reuse it for the sync job as well.
This patch series obsoletes two previous patch series with unfortunately
incomplete bugfix attempts found at:
- https://lore.proxmox.com/pbs-devel/8d711a20-b193-47a9-8f38-6ce800e6d0e8@proxmox.com/T/
- https://lore.proxmox.com/pbs-devel/20251015164008.975591-1-c.ebner@proxmox.com/T/
proxmox-backup:

Christian Ebner (18):
  datastore: GC: drop overly verbose info message during s3 chunk sweep
  chunk store: implement per-chunk file locking helper for s3 backend
  datastore: acquire chunk store mutex lock when renaming corrupt chunk
  datastore: get per-chunk file lock for chunk rename on s3 backend
  fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU
    cache
  datastore: add locking to protect against races on chunk insert for s3
  GC: fix race with chunk upload/insert on s3 backends
  chunk store: reduce exposure of clear_chunk() to crate only
  chunk store: make chunk removal a helper method of the chunk store
  GC: cleanup chunk markers from cache in phase 3 on s3 backends
  GC: touch bad chunk files independent of backend type
  GC: guard missing marker file insertion for s3 backed stores
  GC: s3: track if a chunk marker file is missing since a bad chunk
  chunk store: add helpers marking missing local chunk markers as
    expected
  GC: assure chunk exists on s3 store when creating missing chunk marker
  datastore: document s3 backend specific locking restrictions
  GC: fix: don't drop bad extension for S3 object to chunk path helper
  GC: clean up bad chunks from the filesystem only

Fabian Grünbichler (3):
  store: split insert_chunk into wrapper + unsafe locked implementation
  store: cache: move Mutex acquire to cache insertion
  chunk store: rename cache-specific helpers

 pbs-datastore/src/backup_info.rs              |   2 +-
 pbs-datastore/src/chunk_store.rs              | 124 +++++++++++-
 pbs-datastore/src/datastore.rs                | 180 ++++++++++++------
 pbs-datastore/src/lib.rs                      |  13 ++
 .../src/local_datastore_lru_cache.rs          |  23 ++-
 5 files changed, 263 insertions(+), 79 deletions(-)

Summary over all repositories:
  5 files changed, 263 insertions(+), 79 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel