From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Wed, 15 Oct 2025 18:39:58 +0200
Message-ID: <20251015164008.975591-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend

Patches 1 to 3 extend the current s3-client implementation with an
additional helper method that allows chunks to be uploaded
unconditionally on the final retry of otherwise conditional uploads.
This avoids failing backups when the same chunk digest is uploaded
concurrently (a sketch of the retry scheme follows at the end of this
mail).

The remaining patches fix a possible race condition between the s3
backend upload and garbage collection, which can result in chunk loss:
if garbage collection lists the chunk and checks its in-use marker
after the upload has finished but just before the marker is written by
the cache insert following the upload, garbage collection can
incorrectly delete the chunk.

This is avoided by an additional per-chunk upload marker file, which is
touched before the upload starts and removed after the cache insert,
ensuring that such chunks are not removed (see the sketch after the
changelog below). Upload marker files are not cleaned up if the upload
fails, to avoid issues with concurrent uploads of the same chunk;
garbage collection therefore has to clean up lingering markers as well,
based on their atime.

Changes since version 2 (thanks @Fabian):
- Also fix issues arising from the concurrent conditional s3 put
  requests.
- Only clean up the upload marker file on success, add GC logic to
  clean up lingering ones from failed uploads.
- Refactor the chunk upload marker file related helpers.
- Split patches into smaller portions.
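For illustration, below is a minimal sketch of the marker ordering
described above. All names (upload_marker_path, insert_chunk_via_s3,
the ".in-progress" directory, the upload/cache_insert callbacks) are
made up for the example and do not match the actual pbs-datastore
helpers; only the ordering matters: marker before upload, cache insert
before marker removal, removal only on success.

use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical location of the per-chunk upload marker inside the
// chunk store; the real layout in pbs-datastore may differ.
fn upload_marker_path(store: &Path, digest_hex: &str) -> PathBuf {
    store.join(".in-progress").join(digest_hex)
}

// Ordering of a chunk insert via the s3 backend so that it cannot race
// garbage collection; `upload` and `cache_insert` stand in for the s3
// put and the local datastore cache insert.
fn insert_chunk_via_s3(
    store: &Path,
    digest_hex: &str,
    data: &[u8],
    upload: impl Fn(&str, &[u8]) -> io::Result<()>,
    cache_insert: impl Fn(&str, &[u8]) -> io::Result<()>,
) -> io::Result<()> {
    let marker = upload_marker_path(store, digest_hex);
    fs::create_dir_all(marker.parent().unwrap())?;

    // Touch the marker *before* the upload starts, so a concurrent GC
    // run treats the chunk as in-use even though the cache insert has
    // not happened yet.
    fs::File::create(&marker)?;

    upload(digest_hex, data)?;
    cache_insert(digest_hex, data)?;

    // Remove the marker only on success; after a failed upload it
    // lingers and garbage collection cleans it up later based on its
    // atime.
    fs::remove_file(&marker)?;
    Ok(())
}

// GC side: a chunk must not be removed from the s3 backend while its
// upload marker still exists.
fn upload_in_flight(store: &Path, digest_hex: &str) -> bool {
    upload_marker_path(store, digest_hex).exists()
}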
Changes since version 1 (thanks @Fabian):
- Refactor the garbage collection rework patches, using a callback to
  perform the chunk removal, so both the filesystem and the s3 backend
  can use the same logic without the need to re-adapt the GC status.
- Completely rework the local datastore cache access method, so that
  it not only serves contents from the s3 backend if they need to be
  fetched, but also closes the download/insert race and drops quite
  some duplicate code, completely getting rid of the now obsolete
  S3Cacher.
- Rework the chunk insert for s3 to also cover cases where concurrent
  uploads of the same object/key occur, making sure that the upload
  marker creation does not lead to failure and that the upload marker
  cleanup is handled correctly as well. The only race still open is
  which of the two concurrent uploads inserts into the local cache,
  but since both versions must encode the same data (as they have the
  same digest), this is not an issue. If one of the uploads fails,
  however, both must be considered failed, since there is then no
  guarantee anymore that garbage collection did not clean up the
  chunks from the s3 backend in the meantime.

proxmox:

Christian Ebner (2):
  s3-client: add exponential backoff time for upload retries
  s3-client: add helper method to force final unconditional upload on

 proxmox-s3-client/src/client.rs | 36 +++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

proxmox-backup:

Christian Ebner (8):
  api/pull: avoid failing on concurrent conditional chunk uploads
  datastore: GC: drop overly verbose info message during s3 chunk sweep
  GC: refactor atime gathering for local chunk markers with s3 backend
  chunk store: refactor method for chunk insertion
  chunk store: add backend upload marker helpers for s3 backed stores
  api: chunk upload: fix race between chunk backend upload and insert
  api: chunk upload: fix race with garbage collection for no-cache on s3
  pull: guard chunk upload and only insert into cache after upload

 pbs-datastore/src/chunk_store.rs     | 200 +++++++++++++++++-
 pbs-datastore/src/datastore.rs       |  41 +++-
 .../src/local_datastore_lru_cache.rs |   3 +-
 src/api2/backup/upload_chunk.rs      |  17 +-
 src/api2/config/datastore.rs         |   2 +
 src/server/pull.rs                   |   7 +-
 6 files changed, 242 insertions(+), 28 deletions(-)

Summary over all repositories:
  7 files changed, 274 insertions(+), 32 deletions(-)

-- 
Generated by git-murpp 0.8.1
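
For completeness, here is a rough sketch of the retry scheme added by
the two proxmox patches (conditional puts with exponential backoff,
followed by one final unconditional put). Function names, signatures
and the backoff values are illustrative only and do not reflect the
actual proxmox-s3-client API.

use std::thread::sleep;
use std::time::Duration;

// Upload a chunk object with conditional puts and exponential backoff,
// falling back to a single unconditional put on the last attempt.
// `put_conditional` and `put_unconditional` stand in for the respective
// s3-client calls.
fn upload_chunk_with_retry(
    key: &str,
    data: &[u8],
    retries: u32,
    put_conditional: impl Fn(&str, &[u8]) -> Result<(), String>,
    put_unconditional: impl Fn(&str, &[u8]) -> Result<(), String>,
) -> Result<(), String> {
    for attempt in 0..retries {
        match put_conditional(key, data) {
            Ok(()) => return Ok(()),
            Err(err) => {
                eprintln!("conditional upload of {key} failed (attempt {attempt}): {err}");
                // exponential backoff before the next attempt
                sleep(Duration::from_millis(100 * (1u64 << attempt)));
            }
        }
    }
    // Final attempt is unconditional: concurrent uploads of the same
    // digest write identical data, so overwriting the object is safe
    // and the backup does not fail just because another writer got
    // there first.
    put_unconditional(key, data)
}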