From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 24 Nov 2025 09:13:49 +0100
From: Fabian Grünbichler
To: Proxmox Backup Server development discussion, Thomas Lamprecht
References: <20251122104118.205994-1-c.ebner@proxmox.com> <7393d75a-04be-4940-80c9-1f8e9781227a@proxmox.com>
In-Reply-To: <7393d75a-04be-4940-80c9-1f8e9781227a@proxmox.com>
Message-Id: <1763971841.tk1259krov.astroid@yuna.none>
User-Agent: astroid/0.17.0 (https://github.com/astroidmail/astroid)
Subject: Re: [pbs-devel] [PATCH proxmox-backup] GC: s3: fix local marker cleanup for unreferenced, s3 only chunks
List-Id: Proxmox Backup Server development discussion
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

On November 24, 2025 8:35 am, Christian Ebner wrote:
> Thanks for taking a look at this already!
> 
> On 11/22/25 3:55 PM, Thomas Lamprecht wrote:
>> On 22.11.25 at 11:41, Christian Ebner wrote:
>>> If a chunk object is located only on the s3 object store, i.e. it is
>>> neither referenced by any index file nor has a local marker file, it
>>> is marked for cleanup by pretending an atime equal to the unix epoch.
>>>
>>> While this marks the chunk for deletion from the backend and includes
>>> it in the delete list for the next s3 delete-objects call, it also
>>> leads to an attempt to clean up the chunk marker and LRU cache entry
>>> locally, which fails since there is no marker to clean up.
>>>
>>> In order to handle this edge case with the same cleanup logic, simply
>>> insert the marker file if not present, so that it gets cleaned up
>>> correctly afterwards. This should not happen under normal datastore
>>> operation anyway; it is most likely to appear after re-creation of
>>> the datastore from existing bucket contents containing such
>>> unreferenced chunks.
>>>
>>> Fixes: https://forum.proxmox.com/threads/176567/
>>> Signed-off-by: Christian Ebner
>>> ---
>>>  pbs-datastore/src/datastore.rs | 9 +++++----
>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>> index 65299cca9..a24392d9f 100644
>>> --- a/pbs-datastore/src/datastore.rs
>>> +++ b/pbs-datastore/src/datastore.rs
>>> @@ -1711,11 +1711,12 @@ impl DataStore {
>>>          let atime = match std::fs::metadata(&chunk_path) {
>>>              Ok(stat) => stat.accessed()?,
>>>              Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
>>> +                unsafe {
>>> +                    // chunk store lock held
>>> +                    // insert marker unconditionally, cleaned up again below if required
>>> +                    self.inner.chunk_store.replace_chunk_with_marker(&digest)?;
>>> +                }
>>>                  if self.inner.chunk_store.clear_chunk_expected_mark(&digest)? {
>>> -                    unsafe {
>>> -                        // chunk store lock held
>>> -                        self.inner.chunk_store.replace_chunk_with_marker(&digest)?;
>>> -                    }
>>>                      SystemTime::now()
>>
>> Why not drop that whole branch instead, it does not really make sense IIUC.
>
> No, this branch is needed. It is required to avoid API calls to the s3
> backend in case the chunk is referenced by an index file as detected
> during phase 1, but the local marker file is not present. In that case
> we do not want to check existence on the backend directly (which we
> need in order to not mark a chunk that is in fact not present on the
> backend), but defer that check to phase 2, where we list all chunks
> anyway. This is done by flagging that chunk via the .using file.
>
> Here, if the chunk is encountered during the s3 object store listing
> but the local file is missing, we check and clear the chunk-expected
> marker, which, if present, tells us the chunk is still in use. If not,
> it is safe to clear it from the backend.
>
>>
>> And `replace_chunk_with_marker` replaces the chunk file directly (no extension) whereas
>> `clear_chunk_expected_mark` checks the chunk.using file, so does your reordering even
>> change anything, or is there a bug in `replace_chunk_with_marker`?
>
> `replace_chunk_with_marker` replaces a full chunk file with an empty
> marker, but it also creates the empty marker if the original file is
> not present. So in this particular case it is actually used to create
> the marker, not to evict chunks from the local datastore cache as
> under normal operation. I can send a patch to rename that method to
> make that clear.
>
>>
>> And independent of that, would it be better (more performant and less confusing) if
>> we ignore the "not present in LRU or no marker" in that edge case rather than creating
>> a file (doing more IO) just to delete it again?
>
> I can do that as well of course by checking a flag in the remove
> callback.
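[For context: the `replace_chunk_with_marker` behavior described above — truncate an existing chunk file to an empty marker, or create the empty marker when no chunk file exists — can be sketched with plain standard-library calls. The function name and path handling here are illustrative stand-ins, not the actual PBS implementation:]

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

// Hypothetical stand-in for ChunkStore::replace_chunk_with_marker.
// `create(true)` plus `truncate(true)` covers both cases in one open:
// an existing full chunk is emptied to a zero-byte marker, and a
// missing chunk file gets a fresh zero-byte marker created for it.
fn replace_chunk_with_marker(chunk_path: &Path) -> io::Result<()> {
    OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(chunk_path)?;
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("gc-marker-demo");
    std::fs::create_dir_all(&dir)?;
    let chunk = dir.join("0000-chunk");

    // The edge case from the patch: no local chunk file is present.
    let _ = std::fs::remove_file(&chunk);
    replace_chunk_with_marker(&chunk)?;
    assert!(chunk.exists());
    assert_eq!(std::fs::metadata(&chunk)?.len(), 0);
    Ok(())
}
```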
> I opted for not doing that, however, since the above is a very
> unlikely case, as the s3 backend and local datastore cache should be
> in sync most of the time. That check would be performed for each chunk
> being removed; this one runs only once, when the chunk is still
> present on the backend but not in the local datastore cache.
>
> The additional IO is therefore justified IMO.
>
> I could of course also go the route of just setting a boolean flag and
> checking that in the callback?

we basically already have such a boolean marker - we set `atime` to 0
in this case (and only this case), and we could just ignore the
removal errors then? possibly limited to just ignoring ENOENT?

> 
> What do you think?
>
>>>                  } else {
>>>                      // File not found, delete by setting atime to unix epoch
>>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
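[For context: the alternative floated above — treat the epoch atime as the boolean flag and simply tolerate a missing marker file during local cleanup — might look roughly like the sketch below. All names are hypothetical; this is not the PBS code, just an illustration of swallowing only ENOENT (`ErrorKind::NotFound`) for epoch-flagged chunks:]

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical removal callback: a chunk flagged with the epoch-atime
// sentinel may legitimately have no local marker file, so a NotFound
// error is swallowed for it. Any other error, or a NotFound for a
// chunk without the sentinel, still propagates to the caller.
fn remove_chunk_marker(chunk_path: &Path, atime: SystemTime) -> std::io::Result<()> {
    match fs::remove_file(chunk_path) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == ErrorKind::NotFound && atime == UNIX_EPOCH => Ok(()),
        Err(err) => Err(err),
    }
}

fn main() {
    let path = std::env::temp_dir().join("gc-enoent-demo-chunk");
    let _ = fs::remove_file(&path);

    // Missing file flagged via the epoch sentinel: error is swallowed.
    assert!(remove_chunk_marker(&path, UNIX_EPOCH).is_ok());
    // Missing file without the sentinel: NotFound still propagates.
    assert!(remove_chunk_marker(&path, SystemTime::now()).is_err());
}
```

This keeps the extra branch off the common removal path: no flag check or marker creation happens for chunks that exist locally, and the tolerance is scoped to exactly the one case the sentinel encodes.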