From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 85C6F1FF191 for ; Tue, 4 Nov 2025 14:07:50 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 0E839D104; Tue, 4 Nov 2025 14:08:23 +0100 (CET) From: Christian Ebner To: pbs-devel@lists.proxmox.com Date: Tue, 4 Nov 2025 14:06:40 +0100 Message-ID: <20251104130659.435139-1-c.ebner@proxmox.com> X-Mailer: git-send-email 2.47.3 MIME-Version: 1.0 X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1762261619008 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.047 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pbs-devel] [PATCH proxmox-backup v2 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox Backup Server development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pbs-devel-bounces@lists.proxmox.com Sender: "pbs-devel" These patches fix possible race conditions on datastores with s3 backend for chunk insert, renaming of corrupt chunks during verification and cleanup during garbage collection. Further, the patches assure consistency between the chunk marker file of the local datastore cache, the s3 object store and the in-memory LRU cache during state changes occurring by one of the above mentioned operations. Consistency is achieved by using a per-chunk file locking mechanism. File locks are stored on the predefined location for datastore file locks, using the same `.chunks/prefix/digest` folder layout for consistency and to keep readdir and other fs operations performant. Before introducing the file locking mechanism, the patches refactor pre-existing code to move most of the backend related logic away from the api code to the datastore implementation, in order to have a common interface especially for chunk insert. As part of the series it is now also assured that chunks which are removed from the local datastore cache, are also dropped from it's in-memory LRU cache and therefore a consistent state is achieved. Changes since version 1 (thanks @Fabian for review): - Fix lock inversion for rename corrup chunk. - Inline the chunk lock helper, making it explicit and thereby avoid calling the helper for regular datastores. - Pass the backend to the add_blob datastore helper, so it can be reused for the backup session and pull sync job. - Move also the s3 index upload helper from the backup env to the datastore, and reuse it for the sync job as well. This patch series obsoletes two previous patch series with unfortunately incomplete bugfix attempts found at: - https://lore.proxmox.com/pbs-devel/8d711a20-b193-47a9-8f38-6ce800e6d0e8@proxmox.com/T/ - https://lore.proxmox.com/pbs-devel/20251015164008.975591-1-c.ebner@proxmox.com/T/ proxmox-backup: Christian Ebner (19): sync: pull: instantiate backend only once per sync job api/datastore: move group notes setting to the datastore api/datastore: move snapshot deletion into dedicated datastore helper api/datastore: move backup log upload by implementing datastore helper api: backup: use datastore add_blob helper for backup session api/datastore: add dedicated datastore helper to set snapshot notes api/datastore: move s3 index upload helper to datastore backend datastore: refactor chunk insert based on backend verify: rename corrupted to corrupt in log output and function names verify/datastore: make rename corrupt chunk a datastore helper method datastore: refactor rename_corrupt_chunk error handling chunk store: implement per-chunk file locking helper for s3 backend datastore: acquire chunk store mutex lock when renaming corrupt chunk datastore: get per-chunk file lock for chunk rename on s3 backend fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache datastore: add locking to protect against races on chunk insert for s3 GC: fix race with chunk upload/insert on s3 backends GC: lock chunk marker before cleanup in phase 3 on s3 backends datastore: GC: drop overly verbose info message during s3 chunk sweep pbs-datastore/src/backup_info.rs | 2 +- pbs-datastore/src/chunk_store.rs | 54 +++++- pbs-datastore/src/datastore.rs | 291 +++++++++++++++++++++++++++++-- src/api2/admin/datastore.rs | 77 +++----- src/api2/backup/environment.rs | 53 ++---- src/api2/backup/upload_chunk.rs | 64 ++----- src/api2/tape/restore.rs | 6 +- src/backup/verify.rs | 83 ++------- src/server/pull.rs | 61 +++---- 9 files changed, 415 insertions(+), 276 deletions(-) Summary over all repositories: 9 files changed, 415 insertions(+), 276 deletions(-) -- Generated by git-murpp 0.8.1 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel