From: Christian Ebner
To: Thomas Lamprecht, Proxmox Backup Server development discussion
Date: Mon, 24 Nov 2025 08:35:36 +0100
Subject: Re: [pbs-devel] [PATCH proxmox-backup] GC: s3: fix local marker cleanup for unreferenced, s3 only chunks
Message-ID: <7393d75a-04be-4940-80c9-1f8e9781227a@proxmox.com>
References: <20251122104118.205994-1-c.ebner@proxmox.com>

Thanks for taking a look at this already!

On 11/22/25 3:55 PM, Thomas Lamprecht wrote:
> On 22.11.25 at 11:41, Christian Ebner wrote:
>> If a chunk object is located on the s3 object store only, not being
>> referenced by any index file and having no local marker file, it is
>> marked for cleanup by pretending an atime equal to the unix epoch.
>>
>> While this marks the chunk for deletion from the backend and includes
>> it in the delete list for the next s3 delete objects call, it also
>> causes an attempt to clean up the chunk marker and LRU cache entry
>> locally, which fails since there is no marker to clean up.
>>
>> In order to treat this edge case with the same cleanup logic, simply
>> insert the marker file if not present, so that it gets cleaned up
>> correctly afterwards. This should not happen under normal datastore
>> operation anyway; it is most likely to appear after re-creation of
>> the datastore from existing bucket contents containing such
>> unreferenced chunks.
>>
>> Fixes: https://forum.proxmox.com/threads/176567/
>> Signed-off-by: Christian Ebner
>> ---
>>  pbs-datastore/src/datastore.rs | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 65299cca9..a24392d9f 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1711,11 +1711,12 @@ impl DataStore {
>>          let atime = match std::fs::metadata(&chunk_path) {
>>              Ok(stat) => stat.accessed()?,
>>              Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
>> +                unsafe {
>> +                    // chunk store lock held
>> +                    // insert marker unconditionally, cleaned up again below if required
>> +                    self.inner.chunk_store.replace_chunk_with_marker(&digest)?;
>> +                }
>>                  if self.inner.chunk_store.clear_chunk_expected_mark(&digest)? {
>> -                    unsafe {
>> -                        // chunk store lock held
>> -                        self.inner.chunk_store.replace_chunk_with_marker(&digest)?;
>> -                    }
>>                      SystemTime::now()
>
> Why not drop that whole branch instead? It does not really make sense IIUC.

No, this branch is needed. It avoids API calls to the s3 backend in case
the chunk is referenced by an index file, as detected during phase 1, but
the local marker file is not present. In that case we do not want to check
for existence on the backend directly (which we would have to, to make
sure not to mark a chunk that is not actually present on the backend), but
defer that check to phase 2, where we list all chunks anyway. This is done
by flagging the chunk via the .using file.

Here, if the chunk is encountered during the s3 object store listing but
the local file is missing, we check and clear the chunk expected marker:
if it is present, the chunk still needs to be used; if not, it is safe to
clear the chunk from the backend.

> And `replace_chunk_with_marker` replaces the chunk file directly (no
> extension) whereas `clear_chunk_expected_mark` checks the chunk.using
> file, so does your reordering even change anything, or is there a bug
> in `replace_chunk_with_marker`?

`replace_chunk_with_marker` replaces a full chunk file with an empty
marker, but it also creates the empty marker if the original file is not
present. So in this particular case it is actually used to create the
marker, not to evict a chunk from the local datastore cache as under
normal operation. I can send a patch to rename that method to make this
clear.

> And independent of that, would it be better (more performant and less
> confusing) if we ignore the "not present in LRU or no marker" in that
> edge case rather than creating a file (doing more IO) just to delete
> it again?

I can do that as well, of course, by checking a flag in the remove
callback. I opted against it, however, since the above is a very unlikely
case to happen, as the s3 backend and local datastore cache should be in
sync most of the time. That flag check would be performed for each chunk
being removed, whereas the marker insertion happens only once, when the
chunk is still present on the backend but not in the local datastore
cache. The additional IO is therefore justified IMO. I could of course
also go the route of just setting a boolean flag and checking that in the
callback (a sketch follows below)? What do you think?

>>                  } else {
>>                      // File not found, delete by setting atime to unix epoch
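
For illustration, here is a minimal, self-contained sketch of the phase-2
decision described above, with the marker inserted unconditionally as in
the patch. The free function, the plain File::create marker, and handling
the ".using" expected-mark via remove_file are simplifications assumed for
this example only; the real code goes through
ChunkStore::replace_chunk_with_marker and
ChunkStore::clear_chunk_expected_mark with the chunk store lock held.

use std::fs::{self, File};
use std::io;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Decide the atime GC should use for a chunk listed on the s3 backend.
/// `chunk_path`: the chunk (or marker) file in the local cache store;
/// `using_path`: the ".using" expected-mark written during phase 1.
fn phase2_atime(chunk_path: &Path, using_path: &Path) -> io::Result<SystemTime> {
    match fs::metadata(chunk_path) {
        Ok(stat) => stat.accessed(),
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            // Insert the marker unconditionally, so the shared cleanup
            // path below always finds a file to remove; stands in for
            // replace_chunk_with_marker().
            File::create(chunk_path)?;
            // Stands in for clear_chunk_expected_mark(): a ".using" file
            // from phase 1 means an index still references the chunk.
            if fs::remove_file(using_path).is_ok() {
                Ok(SystemTime::now()) // chunk still in use, keep it
            } else {
                Ok(UNIX_EPOCH) // unreferenced, mark for deletion from s3
            }
        }
        Err(err) => Err(err),
    }
}

A chunk that ends up with the epoch atime is then added to the s3 delete
list, and its freshly created marker is removed by the same local cleanup
path as any other stale chunk.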
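And, for comparison, a sketch of the flag-based alternative mentioned at
the end: tolerating a missing local marker in the remove callback instead
of creating a file just to delete it again. The function and the
`may_be_missing` parameter are hypothetical names for this example, not
the existing callback signature.

use std::fs;
use std::io;
use std::path::Path;

// Hypothetical alternative: the caller sets `may_be_missing` for the
// s3-only edge case, and the removal then tolerates a marker file that
// does not exist instead of failing on it.
fn remove_chunk_marker(chunk_path: &Path, may_be_missing: bool) -> io::Result<()> {
    match fs::remove_file(chunk_path) {
        Ok(()) => Ok(()),
        Err(err) if may_be_missing && err.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}

The trade-off described above is visible here: the flag check runs for
every chunk removal, while the extra marker write happens only in the
rare case where backend and local cache are out of sync.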