all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction
Date: Mon, 06 Oct 2025 15:18:35 +0200	[thread overview]
Message-ID: <1759754692.4isfk73089.astroid@yuna.none> (raw)
In-Reply-To: <20251006104151.487202-1-c.ebner@proxmox.com>

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> These patches fix 2 issues with the current s3 backend implementation
> and reduce code duplication.
> 
> Patch 1 to 3 rework the garbage collection, deduplicating the common atime
> check and garbage collection status update logic and refactor the chunk removal
> from local datastore cache on s3 backends.
> 
> Patch 4 fixes an issue which could lead to backup restores failing
> during concurrent backups, if the local datastore cache was small, leading
> to chunks being evicted from the cache by truncating the contents, while the
> chunk reader still accessed the chunk. This is now circumvented by replacing
> the chunk file instead, leaving the contents on the already opened file
> handle for the reader still accessible.

these parts here are not 100% there yet (see comments), but I think the
approach taken is fine in principial!

> The remaining patches fix a possible race condition between s3 backend upload
> and garbage collection, which can result in chunk loss. If the chunk upload
> finished, garbage collection listed and checked the chunk's in-use marker, just
> before it being written by the cache insert after the upload, garbage collection
> can incorrectly delete the chunk. This is circumvented by setting and checking
> an additional chunk marker file, which is created before starting the upload
> and removed after cache insert, assuring that these chunks are not removed.

see detailed comments on individual patches, but I think the current
approach of using marker files guarded by the existing chunk store mutex
has a lot of subtle pitfalls surrounding concurrent insertions and error
handling..

one potential alternative would be to use an flock instead, but that
might lead to scalability issues if there are lots of concurrent chunk
uploads..

maybe we could instead do the following:
- lock the mutex & touch the pending marker file before uploading the
  chunk (but allow it to already exist, as that might be a concurrent or
  older failed upload)
- clean it up when inserting into the cache (but treat it not existing
  as benign?)
- in GC, lock and
  - check whether the chunk was properly inserted in the meantime (then
    we know it's fresh and mustn't be cleaned up)
  - if not, check the atime of the pending marker (if it is before the
    cutoff then it must be from an already aborted backup that we don't
    care about, and we can remove the pending marker and the chunk on S3)

and I think at this point we should have a single helper for inserting
an in-memory chunk into an S3-backed datastore, including the upload
handling.. while atm we don't support restoring from tape to S3
directly, once we add that it would be a third place where we'd need the
same/similar code and potentially introduce bugs related to handling all
this (backup API, pull sync are the existing two).

> proxmox-backup:
> 
> Christian Ebner (7):
>   datastore: gc: inline single callsite method
>   gc: chunk store: rework atime check and gc status into common helper
>   chunk store: add and use method to remove chunks
>   chunk store: fix: replace evicted cache chunks instead of truncate
>   api: chunk upload: fix race between chunk backend upload and insert
>   api: chunk upload: fix race with garbage collection for no-cache on s3
>   pull: guard chunk upload and only insert into cache after upload
> 
>  pbs-datastore/src/chunk_store.rs              | 211 +++++++++++++++---
>  pbs-datastore/src/datastore.rs                | 155 +++++++------
>  .../src/local_datastore_lru_cache.rs          |  32 +--
>  src/api2/backup/upload_chunk.rs               |  29 ++-
>  src/api2/config/datastore.rs                  |   2 +
>  src/server/pull.rs                            |  14 +-
>  6 files changed, 299 insertions(+), 144 deletions(-)
> 
> 
> Summary over all repositories:
>   6 files changed, 299 insertions(+), 144 deletions(-)
> 
> -- 
> Generated by git-murpp 0.8.1
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


  parent reply	other threads:[~2025-10-06 13:18 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-06 10:41 Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] datastore: gc: inline single callsite method Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 2/7] gc: chunk store: rework atime check and gc status into common helper Christian Ebner
2025-10-06 13:14   ` Fabian Grünbichler
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 3/7] chunk store: add and use method to remove chunks Christian Ebner
2025-10-06 13:17   ` Fabian Grünbichler
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-06 15:35     ` Christian Ebner
2025-10-06 16:14       ` Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-07 10:15     ` Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 6/7] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 7/7] pull: guard chunk upload and only insert into cache after upload Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-06 13:18 ` Fabian Grünbichler [this message]
2025-10-08 15:22 ` [pbs-devel] superseded: [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1759754692.4isfk73089.astroid@yuna.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal