* [pbs-devel] [PATCH proxmox-backup v4 00/14] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend
@ 2025-11-10 11:56 Christian Ebner
From: Christian Ebner @ 2025-11-10 11:56 UTC (permalink / raw)
  To: pbs-devel

These patches fix possible race conditions on datastores with an s3 backend for
chunk insert, renaming of corrupt chunks during verification and cleanup during
garbage collection. Further, the patches ensure consistency between the chunk
marker file of the local datastore cache, the s3 object store and the in-memory
LRU cache during state changes caused by any of the above-mentioned operations.

Consistency is achieved by using a per-chunk file locking mechanism. File locks
are stored in the predefined location for datastore file locks, using the same
`.chunks/prefix/digest` folder layout for consistency and to keep readdir and
other fs operations performant.
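To illustrate the layout described above, here is a minimal sketch of how a
per-chunk lock file path could be derived from a chunk digest. The function
names, the lock base directory and the choice of the first four hex characters
as the prefix are assumptions made for this example, not the helper introduced
by the series. Acquiring the actual file lock is left out.

```rust
use std::path::{Path, PathBuf};

/// Hex-encode a 32-byte chunk digest (lowercase).
fn hex_digest(digest: &[u8; 32]) -> String {
    digest.iter().map(|b| format!("{b:02x}")).collect()
}

/// Hypothetical helper: build `<lock_base>/<store>/.chunks/<prefix>/<digest>`.
/// Assumption: the prefix is the first four hex characters of the digest.
fn chunk_lock_path(lock_base: &Path, store_name: &str, digest: &[u8; 32]) -> PathBuf {
    let hex = hex_digest(digest);
    let prefix = &hex[..4];
    lock_base
        .join(store_name)
        .join(".chunks")
        .join(prefix)
        .join(&hex)
}

fn main() {
    // Example digest and lock base directory, both made up for illustration.
    let digest = [0xabu8; 32];
    let path = chunk_lock_path(Path::new("/run/example/locks"), "store1", &digest);
    println!("{}", path.display());
}
```

Spreading the lock files over prefix directories keeps each directory small,
which is the property the cover letter refers to when mentioning readdir
performance.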

Before introducing the file locking mechanism, the patches refactor pre-existing
code to move most of the backend-related logic away from the api code to the
datastore implementation, in order to have a common interface, especially for
chunk insert.

As part of the series it is now also ensured that chunks which are removed from
the local datastore cache are also dropped from its in-memory LRU cache, so
that both remain in a consistent state.
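The idea of keeping the on-disk marker and the in-memory cache in step can be
sketched as follows. This is a simplified illustration under stated
assumptions: `ChunkCache` is a hypothetical type, a `HashSet` stands in for the
LRU cache, and an empty file stands in for the chunk marker. Removal also
tolerates an already missing file, similar in spirit to the last patch of the
series.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::Mutex;

/// Hypothetical stand-in for the local datastore cache.
struct ChunkCache {
    /// Directory holding one (empty) marker file per cached chunk.
    dir: PathBuf,
    /// Stand-in for the in-memory LRU cache, keyed by hex digest.
    in_memory: Mutex<HashSet<String>>,
}

impl ChunkCache {
    /// Create the marker file and record the chunk in memory under one lock.
    fn insert(&self, digest_hex: &str) -> io::Result<()> {
        let mut mem = self.in_memory.lock().unwrap();
        fs::write(self.dir.join(digest_hex), b"")?;
        mem.insert(digest_hex.to_string());
        Ok(())
    }

    /// Remove the marker file and the in-memory entry under one lock.
    /// A missing marker file is not treated as an error.
    fn remove(&self, digest_hex: &str) -> io::Result<()> {
        let mut mem = self.in_memory.lock().unwrap();
        match fs::remove_file(self.dir.join(digest_hex)) {
            Ok(()) => {}
            Err(err) if err.kind() == io::ErrorKind::NotFound => {}
            Err(err) => return Err(err),
        }
        mem.remove(digest_hex);
        Ok(())
    }

    /// Check whether the chunk is present in the in-memory set.
    fn contains(&self, digest_hex: &str) -> bool {
        self.in_memory.lock().unwrap().contains(digest_hex)
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("chunk-cache-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let cache = ChunkCache { dir: dir.clone(), in_memory: Mutex::new(HashSet::new()) };
    cache.insert("abcd")?;
    cache.remove("abcd")?;
    println!("still cached: {}", cache.contains("abcd"));
    fs::remove_dir_all(&dir)
}
```

Holding the mutex across both steps means no other thread in this sketch can
observe the marker file and the in-memory entry disagreeing.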

Changes since version 3:
- Add patches to limit visibility of BackupDir and BackupGroup destroy methods
- Refactor s3 upload index helper
- Avoid unneeded double stat for GC phase 3 cleanups

Changes since version 2:
- Incorporate additional race fix as discussed in
  https://lore.proxmox.com/pbs-devel/8ab74557-9592-43e7-8706-10fceaae31b7@proxmox.com/T/
  and suggested offlist.

Changes since version 1 (thanks @Fabian for review):
- Fix lock inversion for rename corrupt chunk.
- Inline the chunk lock helper, making it explicit and thereby avoid calling the
  helper for regular datastores.
- Pass the backend to the add_blob datastore helper, so it can be reused for the
  backup session and pull sync job.
- Also move the s3 index upload helper from the backup env to the datastore, and
  reuse it for the sync job as well.
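A lock inversion, as mentioned in the version 1 changes above, arises when two
code paths take the same pair of locks in opposite order. The following sketch
shows the usual remedy of funnelling every caller through one helper with a
fixed acquisition order. The names and the order chosen here (chunk lock first,
then store mutex) are assumptions for illustration only and do not claim to
match the order used in the patches.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical pair of locks: one stands in for the per-chunk file lock,
/// the other for the chunk store mutex (here guarding a simple counter).
struct Locks {
    chunk_lock: Mutex<()>,
    store_mutex: Mutex<u32>,
}

/// Every caller goes through this helper, so both locks are always taken
/// in the same order and two threads cannot wait on each other in a cycle.
fn with_both_locks(locks: &Locks, f: impl FnOnce(&mut u32)) {
    let _chunk_guard = locks.chunk_lock.lock().unwrap();
    let mut store_guard = locks.store_mutex.lock().unwrap();
    f(&mut *store_guard);
}

/// Spawn `n` workers that each increment the counter under both locks.
fn run_workers(n: u32) -> u32 {
    let locks = Arc::new(Locks {
        chunk_lock: Mutex::new(()),
        store_mutex: Mutex::new(0),
    });
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let locks = Arc::clone(&locks);
            thread::spawn(move || with_both_locks(&locks, |count| *count += 1))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *locks.store_mutex.lock().unwrap();
    total
}

fn main() {
    println!("updates applied: {}", run_workers(8));
}
```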

This patch series obsoletes two previous patch series with unfortunately
incomplete bugfix attempts found at:
- https://lore.proxmox.com/pbs-devel/8d711a20-b193-47a9-8f38-6ce800e6d0e8@proxmox.com/T/
- https://lore.proxmox.com/pbs-devel/20251015164008.975591-1-c.ebner@proxmox.com/T/

proxmox-backup:

Christian Ebner (14):
  datastore: limit scope of snapshot/group destroy methods to crate
  api/datastore: move s3 index upload helper to datastore backend
  chunk store: implement per-chunk file locking helper for s3 backend
  datastore: acquire chunk store mutex lock when renaming corrupt chunk
  datastore: get per-chunk file lock for chunk rename on s3 backend
  fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU
    cache
  datastore: add locking to protect against races on chunk insert for s3
  GC: fix race with chunk upload/insert on s3 backends
  GC: cleanup chunk markers from cache in phase 3 on s3 backends
  datastore: GC: drop overly verbose info message during s3 chunk sweep
  chunk store: reduce exposure of clear_chunk() to crate only
  chunk store: make chunk removal a helper method of the chunk store
  GC: fix deadlock for cache eviction and garbage collection
  chunk store: never fail when trying to remove missing chunk file

 pbs-datastore/src/backup_info.rs              |  9 +-
 pbs-datastore/src/chunk_store.rs              | 67 ++++++++++++-
 pbs-datastore/src/datastore.rs                | 97 ++++++++++++++++---
 .../src/local_datastore_lru_cache.rs          |  6 +-
 src/api2/backup/environment.rs                | 36 +++----
 src/server/pull.rs                            | 16 +--
 6 files changed, 172 insertions(+), 59 deletions(-)


Summary over all repositories:
  6 files changed, 172 insertions(+), 59 deletions(-)

-- 
Generated by git-murpp 0.8.1





Thread overview: 17+ messages
2025-11-10 11:56 [pbs-devel] [PATCH proxmox-backup v4 00/14] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 01/14] datastore: limit scope of snapshot/group destroy methods to crate Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 02/14] api/datastore: move s3 index upload helper to datastore backend Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 03/14] chunk store: implement per-chunk file locking helper for s3 backend Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 04/14] datastore: acquire chunk store mutex lock when renaming corrupt chunk Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 05/14] datastore: get per-chunk file lock for chunk rename on s3 backend Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 06/14] fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 07/14] datastore: add locking to protect against races on chunk insert for s3 Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 08/14] GC: fix race with chunk upload/insert on s3 backends Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 09/14] GC: cleanup chunk markers from cache in phase 3 " Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 10/14] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 11/14] chunk store: reduce exposure of clear_chunk() to crate only Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 12/14] chunk store: make chunk removal a helper method of the chunk store Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 13/14] GC: fix deadlock for cache eviction and garbage collection Christian Ebner
2025-11-10 11:56 ` [pbs-devel] [PATCH proxmox-backup v4 14/14] chunk store: never fail when trying to remove missing chunk file Christian Ebner
2025-11-11 11:09 ` [pbs-devel] partially-applied: [PATCH proxmox-backup v4 00/14] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Fabian Grünbichler
2025-11-11 14:31 ` [pbs-devel] " Christian Ebner
