From: Christian Ebner
To: pbs-devel@lists.proxmox.com
Date: Wed, 8 Oct 2025 17:21:13 +0200
Message-ID: <20251008152125.849216-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH proxmox-backup v2 00/12] s3 store: fix issues
 with chunk s3 backend upload and cache eviction

These patches fix 2 issues with the current s3 backend implementation
and reduce code duplication.

Patches 1 to 3 rework the garbage collection, deduplicating the common
atime check and garbage collection status update logic, and mark the
chunk removal method as unsafe, since it requires specific
preconditions to hold.
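
To illustrate the idea behind the garbage collection rework, here is a
minimal sketch of a shared atime check that updates the gc status and
delegates the actual removal to a backend-specific callback. All names
(`GcStatus`, `check_atime_and_update_status`, the counters) and the exact
threshold comparisons are assumptions made for illustration, not the
signatures used in the patches.

```rust
use std::time::SystemTime;

/// Hypothetical status counters; not the real gc status type from the patches.
#[derive(Debug, Default, PartialEq)]
pub struct GcStatus {
    pub removed_chunks: u64,
    pub removed_bytes: u64,
    pub pending_chunks: u64,
    pub pending_bytes: u64,
    pub disk_chunks: u64,
    pub disk_bytes: u64,
}

/// Shared atime check (assumed logic): classify a chunk and update `status`.
///
/// `remove` is the backend-specific callback (e.g. unlinking a local file or
/// deleting an object on the s3 backend). It is only called for chunks whose
/// atime is older than `min_atime`.
pub fn check_atime_and_update_status<F, E>(
    atime: SystemTime,
    min_atime: SystemTime,
    oldest_writer: SystemTime,
    size: u64,
    status: &mut GcStatus,
    remove: F,
) -> Result<(), E>
where
    F: FnOnce() -> Result<(), E>,
{
    if atime < min_atime {
        // Unused for long enough: remove first, count only on success.
        remove()?;
        status.removed_chunks += 1;
        status.removed_bytes += size;
    } else if atime < oldest_writer {
        // Possibly still referenced by a running writer: keep as pending.
        status.pending_chunks += 1;
        status.pending_bytes += size;
    } else {
        // Recently touched: chunk is in use.
        status.disk_chunks += 1;
        status.disk_bytes += size;
    }
    Ok(())
}
```

With this shape, the filesystem and s3 code paths would differ only in
the closure they pass in, so the status accounting lives in one place.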

Patches 4 to 8 fix an issue which could lead to backup restores failing
during concurrent backups if the local datastore cache was small:
chunks were evicted from the cache by truncating their contents while
the chunk reader was still accessing them. This is now avoided by
replacing the chunk file instead, so the contents remain accessible
through the file handle the reader has already opened.

The remaining patches fix a possible race condition between s3 backend
upload and garbage collection, which can result in chunk loss. If
garbage collection lists the chunk and checks its in-use marker after
the chunk upload has finished, but just before the marker is written by
the cache insert following the upload, garbage collection can
incorrectly delete the chunk. This is avoided by setting and checking an
additional chunk marker file, which is created before starting the
upload and removed after the cache insert, ensuring that these chunks
are not removed.

Changes since version 1 (thanks @Fabian):
- Refactor the garbage collection rework patches, using a callback to
  perform the chunk removal, so both the filesystem and the s3 backend
  can use the same logic without having to re-adapt the gc status.
- Completely reworked the local datastore cache access method, so it
  not only serves the contents from the s3 backend if they need to be
  fetched, but also closes the download/insert race and drops quite
  some duplicate code, getting rid of the now obsolete S3Cacher
  entirely.
- Rework the chunk insert for s3 to also cover cases where concurrent
  uploads of the same object/key occur, making sure that the upload
  marker creation will not lead to failure and that the upload marker
  cleanup is handled correctly as well. The only race still open is
  which of the two concurrent uploads inserts into the local cache, but
  since both versions must encode the same data (as they have the same
  digest), this is not an issue.
  If one of the uploads fails, however, both must be considered as
  failed, since there is then no longer any guarantee that garbage
  collection did not clean up the chunks from the s3 backend in the
  meantime.

proxmox-backup:

Christian Ebner (12):
  datastore: gc: inline single callsite method
  gc: chunk store: rework atime check and gc status into common helper
  chunk store: add unsafe signature to cache remove method
  local store cache: replace evicted cache chunks instead of truncate
  local store cache: serve response fetched from s3 backend
  local store cache: refactor fetch and insert of chunks for s3 backend
  local store cache: rework access cache fetching and insert logic
  local store cache: drop obsolete cacher implementation
  chunk store: refactor method for chunk insertion
  api: chunk upload: fix race between chunk backend upload and insert
  api: chunk upload: fix race with garbage collection for no-cache on s3
  pull: guard chunk upload and only insert into cache after upload

 pbs-datastore/src/chunk_store.rs              | 287 +++++++++++++++---
 pbs-datastore/src/datastore.rs                | 171 +++++------
 pbs-datastore/src/local_chunk_reader.rs       |  27 +-
 .../src/local_datastore_lru_cache.rs          | 166 ++++------
 src/api2/backup/upload_chunk.rs               |  33 +-
 src/api2/config/datastore.rs                  |   2 +
 src/api2/reader/mod.rs                        |  34 +--
 src/server/pull.rs                            |  16 +-
 8 files changed, 449 insertions(+), 287 deletions(-)


Summary over all repositories:
  8 files changed, 449 insertions(+), 287 deletions(-)

-- 
Generated by git-murpp 0.8.1
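
As a side note on the cache eviction fix (patches 4 to 8), the
difference between truncating and replacing a chunk file can be shown
with a small standalone sketch. It assumes POSIX rename semantics and
assumes that an evicted chunk is represented by an empty placeholder
file. The helper names `evict_by_replace` and `read_across_eviction` are
hypothetical and do not appear in the patches.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Evict a cached chunk by atomically renaming an empty placeholder over it.
/// (Hypothetical helper; assumes POSIX rename semantics.)
pub fn evict_by_replace(chunk_path: &Path) -> io::Result<()> {
    // Build "<chunk_path>.tmp" next to the chunk, on the same filesystem.
    let mut tmp_name = chunk_path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    // Create the empty placeholder, then rename it over the chunk path.
    File::create(&tmp_path)?;
    fs::rename(&tmp_path, chunk_path)
}

/// Open a chunk for reading, evict it, then read through the old handle.
/// Returns the bytes the reader saw and the size now found at the path.
pub fn read_across_eviction(dir: &Path) -> io::Result<(Vec<u8>, u64)> {
    let chunk_path = dir.join("chunk");
    fs::write(&chunk_path, b"chunk data")?;

    // The reader opens the chunk before the eviction happens.
    let mut reader = File::open(&chunk_path)?;

    evict_by_replace(&chunk_path)?;

    // The already opened handle still refers to the old file contents.
    let mut seen = Vec::new();
    reader.read_to_end(&mut seen)?;

    // The path itself now points at the empty placeholder.
    let len = fs::metadata(&chunk_path)?.len();
    Ok((seen, len))
}
```

Had the eviction truncated the file in place instead, the reader would
share the same underlying file and see it shrink to zero bytes, which is
the failure mode described above.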