public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup v2 00/12] s3 store: fix issues with chunk s3 backend upload and cache eviction
Date: Wed,  8 Oct 2025 17:21:13 +0200
Message-ID: <20251008152125.849216-1-c.ebner@proxmox.com>

These patches fix two issues with the current s3 backend implementation
and reduce code duplication.

Patches 1 to 3 rework garbage collection: they deduplicate the common atime
check and garbage collection status update logic, and mark the chunk removal
function as unsafe, since it requires specific pre-conditions to hold.
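
To illustrate the idea of a shared atime check, here is a minimal sketch in
which the backend-specific removal is passed in as a closure. All names
(GcStatus, sweep_chunk, the status fields) are hypothetical and simplified;
this is not the code from the patches.

```rust
use std::io;

/// Simplified garbage collection counters (hypothetical field names).
#[derive(Debug, Default, PartialEq)]
struct GcStatus {
    removed_chunks: u64,
    removed_bytes: u64,
    disk_chunks: u64,
    disk_bytes: u64,
}

/// Shared atime check: chunks last accessed before `min_atime` are removed
/// through the backend-specific `remove` callback, all others are counted as
/// still present. The status is only updated after a successful removal.
fn sweep_chunk<F>(
    atime: i64,
    min_atime: i64,
    size: u64,
    status: &mut GcStatus,
    remove: F,
) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    if atime < min_atime {
        // Filesystem and s3 backends would plug in their own removal here.
        remove()?;
        status.removed_chunks += 1;
        status.removed_bytes += size;
    } else {
        status.disk_chunks += 1;
        status.disk_bytes += size;
    }
    Ok(())
}
```

With this shape, each backend only supplies the removal step, while the atime
comparison and the status accounting live in one place.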

Patches 4 to 8 fix an issue which could cause restores to fail during
concurrent backups if the local datastore cache was small. Chunks were
evicted from the cache by truncating their contents, even while a chunk
reader was still accessing them. Eviction now replaces the chunk file
instead, so the contents remain accessible through the file handle the
reader already opened.
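
The difference between truncating and replacing can be shown with a small
sketch using only the standard library. It assumes a POSIX filesystem, and
that the evicted chunk is represented by an empty file; the function names
and the temporary file name are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Evict by replacing: create an empty file next to the chunk and rename it
/// over the chunk path. A reader that already opened the old file keeps its
/// handle to the old inode, so the old contents stay readable for it.
fn evict_by_replace(chunk_path: &Path) -> io::Result<()> {
    let tmp_path = chunk_path.with_extension("tmp"); // hypothetical temp name
    File::create(&tmp_path)?; // empty placeholder for the evicted chunk
    fs::rename(&tmp_path, chunk_path) // replaces the path atomically on POSIX
}

/// Opens a chunk, evicts it, then reads through the handle opened earlier.
/// Returns the bytes seen by the reader and the file length at the path.
fn read_across_eviction(dir: &Path) -> io::Result<(Vec<u8>, u64)> {
    let chunk_path = dir.join("chunk");
    fs::write(&chunk_path, b"chunk data")?;

    let mut reader = File::open(&chunk_path)?; // opened before eviction
    evict_by_replace(&chunk_path)?;

    let mut seen = Vec::new();
    reader.read_to_end(&mut seen)?;
    let len_after = fs::metadata(&chunk_path)?.len();
    Ok((seen, len_after))
}
```

Had the eviction truncated the file in place, the reader would share the
truncated inode and read nothing.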

The remaining patches fix a possible race condition between s3 backend upload
and garbage collection, which can result in chunk loss. If garbage collection
lists the chunk and checks its in-use marker after the chunk upload finished,
but just before the cache insert following the upload writes that marker, it
can incorrectly delete the chunk. This is avoided by setting and checking an
additional chunk marker file, which is created before starting the upload and
removed after the cache insert, ensuring that these chunks are not removed.
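
The ordering described above can be sketched as follows. The marker naming
scheme, the function names and the closure-based upload/insert steps are
assumptions made for illustration only; cleanup of the marker on failure is
deliberately left out.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical naming scheme: "<chunk path>.uploading".
fn marker_path(chunk_path: &Path) -> PathBuf {
    let mut name = chunk_path.as_os_str().to_owned();
    name.push(".uploading");
    PathBuf::from(name)
}

/// Create the marker, upload, insert into the cache, then drop the marker.
/// On error the marker is left behind (not handled in this sketch).
fn upload_and_insert<U, I>(chunk_path: &Path, upload: U, cache_insert: I) -> io::Result<()>
where
    U: FnOnce() -> io::Result<()>,
    I: FnOnce() -> io::Result<()>,
{
    let marker = marker_path(chunk_path);
    OpenOptions::new().write(true).create(true).open(&marker)?;
    upload()?; // chunk becomes visible on the backend
    cache_insert()?; // writes the regular in-use marker
    fs::remove_file(&marker)
}

/// Garbage collection side: a chunk with an upload marker must be kept.
fn gc_may_remove(chunk_path: &Path) -> bool {
    !marker_path(chunk_path).exists()
}
```

Because the marker exists for the whole window between upload and cache
insert, a garbage collection run that lists the chunk in that window sees the
marker and skips the chunk.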

Changes since version 1 (thanks @Fabian):
- Refactor the garbage collection rework patches, using a callback to perform the
  chunk removal, so both filesystem and s3 backend can use the same logic without
  the need to readapt the gc status.
- Completely rework the local datastore cache access method, so that it not
  only serves the contents fetched from the s3 backend when a fetch is needed,
  but also closes the download/insert race and drops quite a bit of duplicate
  code, getting rid of the now obsolete S3Cacher entirely.
- Rework the chunk insert for s3 to also cover cases where concurrent uploads
  of the same object/key occur, making sure that the upload marker creation
  will not lead to failure and that the upload marker cleanup is handled
  correctly as well. The only race still open is which of the two concurrent
  uploads inserts into the local cache, but since both versions must encode
  the same data (as they have the same digest), this is not an issue. If one
  of the uploads fails however, both must be considered as failed, since there
  is then no guarantee anymore that garbage collection did not clean up the
  chunk from the s3 backend in the meantime.
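
As a rough illustration of marker handling that tolerates concurrent uploads
of the same key, the following sketch treats an already existing marker on
creation and an already removed marker on cleanup as non-errors. The function
names are hypothetical, and the rule that a failed upload also fails its
concurrent peer is not modelled here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind};
use std::path::Path;

/// Returns Ok(true) if this call created the marker, Ok(false) if another
/// upload of the same key had already created it.
fn touch_upload_marker(marker: &Path) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(marker) {
        Ok(_) => Ok(true),
        Err(err) if err.kind() == ErrorKind::AlreadyExists => Ok(false),
        Err(err) => Err(err),
    }
}

/// Removes the marker; a marker already cleaned up by the concurrent upload
/// is not treated as an error.
fn clear_upload_marker(marker: &Path) -> io::Result<()> {
    match fs::remove_file(marker) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}
```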

proxmox-backup:

Christian Ebner (12):
  datastore: gc: inline single callsite method
  gc: chunk store: rework atime check and gc status into common helper
  chunk store: add unsafe signature to cache remove method
  local store cache: replace evicted cache chunks instead of truncate
  local store cache: serve response fetched from s3 backend
  local store cache: refactor fetch and insert of chunks for s3 backend
  local store cache: rework access cache fetching and insert logic
  local store cache: drop obsolete cacher implementation
  chunk store: refactor method for chunk insertion
  api: chunk upload: fix race between chunk backend upload and insert
  api: chunk upload: fix race with garbage collection for no-cache on s3
  pull: guard chunk upload and only insert into cache after upload

 pbs-datastore/src/chunk_store.rs              | 287 +++++++++++++++---
 pbs-datastore/src/datastore.rs                | 171 +++++------
 pbs-datastore/src/local_chunk_reader.rs       |  27 +-
 .../src/local_datastore_lru_cache.rs          | 166 ++++------
 src/api2/backup/upload_chunk.rs               |  33 +-
 src/api2/config/datastore.rs                  |   2 +
 src/api2/reader/mod.rs                        |  34 +--
 src/server/pull.rs                            |  16 +-
 8 files changed, 449 insertions(+), 287 deletions(-)


Summary over all repositories:
  8 files changed, 449 insertions(+), 287 deletions(-)

-- 
Generated by git-murpp 0.8.1



