From: Christian Ebner
To: Thomas Lamprecht, Proxmox Backup Server development discussion
Date: Thu, 10 Jul 2025 13:15:27 +0200
Subject: Re: [pbs-devel] [PATCH proxmox-backup v6 21/37] datastore: implement garbage collection for s3 backend

On 7/10/25 11:47, Christian Ebner wrote:
> On 7/10/25 08:59, Thomas Lamprecht wrote:
>> I'm reading through this in not a very orderly fashion, so probably
>> won't be a very structured review, but some comments here and there.
>>
>> On 08.07.25 at 19:00, Christian Ebner wrote:
>>> Implements the garbage collection for datastores backed by an s3
>>> object store.
>>> Take advantage of the local datastore by placing marker files in the
>>> chunk store during phase 1 of the garbage collection, updating their
>>> atime if already present. By this expensive api calls can be avoided
>>> to update the object metadata (only possible via a copy object
>>> operation).
>>
>> The last sentence would be IMO slightly easier to understand:
>>
>> This allows us to avoid making expensive API calls to update object
>> metadata, which would only be possible via a copy object operation.
>>
>>> The phase 2 is implemented by fetching a list of all the chunks via
>>> the ListObjectsV2 api call, filtered by the chunk folder prefix.
>>> This operation has to be performed in patches of 1000 objects, given
>>
>> s/patches/batches/
>>
>>> by the api's response limits.
>>> For each object key, lookup the marker file and decide based on the
>>> marker existence and its atime if the chunk object needs to be
>>> removed.
>>> Deletion happens via the delete objects operation, allowing
>>> multiple chunks to be deleted with a single request.
>>>
>>> This allows chunks which are no longer in use to be looked up
>>> efficiently while remaining performant and cost effective.
>>
>> Do you have some rough numbers perchance? E.g., something like "a
>> datastore with X indexes, Y actual data and Z deduplication factor
>> is garbage collected in T time on:" and then the time numbers for
>> e.g. Ceph RGW-backed S3, AWS/Cloudflare S3/R2 and file system,
>> just to get some idea of the ballpark we're in. It can also help
>> to have such numbers as a baseline for potential future optimization
>> experiments.
>
> Ran into issues with GC sometimes returning 400 Bad Request errors
> while performing the baseline performance test.
> Interestingly only on AWS; Cloudflare R2 and the RADOS Gateway work
> as expected.
>
> I'm currently investigating, but this seems related to the next
> continuation token in the list objects v2 requests.

Found and fixed the culprit: the continuation token was not URI encoded
before being passed along as a query parameter to the API request. This
was a problem because the tokens returned by AWS are base64 encoded and
may therefore contain equal signs.

Fixed this by reworking the encoding logic: instead of pre-encoding the
strings in the S3ObjectKey, the path and all query parameters are now
encoded in the s3 client's `build_uri` helper, which is used by all API
calls anyway. It actually makes more sense to have it there, as object
keys can then be parsed and passed in the body without extra
encoding/decoding steps.
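
For illustration, the reworked encoding boils down to something like
the following minimal sketch (assuming the percent-encoding crate; the
helper name `build_uri`, its signature and the character set here are
illustrative, not the actual client code):

// Assumption: percent-encoding = "2" in Cargo.toml.
use percent_encoding::{utf8_percent_encode, AsciiSet, NON_ALPHANUMERIC};

// Unreserved characters (RFC 3986) stay as-is; everything else,
// notably '=', '+' and '/', gets percent-encoded. Illustrative set.
const STRICT: &AsciiSet = &NON_ALPHANUMERIC
    .remove(b'-')
    .remove(b'.')
    .remove(b'_')
    .remove(b'~');

// Sketch of a `build_uri`-style helper: encode the path segments and
// every query parameter in one central place, so callers never have
// to pre-encode object keys or continuation tokens.
fn build_uri(endpoint: &str, path: &str, query: &[(&str, &str)]) -> String {
    let encoded_path = path
        .split('/')
        .map(|seg| utf8_percent_encode(seg, STRICT).to_string())
        .collect::<Vec<_>>()
        .join("/");
    let mut uri = format!("{endpoint}/{encoded_path}");
    for (i, (key, value)) in query.iter().enumerate() {
        uri.push(if i == 0 { '?' } else { '&' });
        uri.push_str(&utf8_percent_encode(key, STRICT).to_string());
        uri.push('=');
        uri.push_str(&utf8_percent_encode(value, STRICT).to_string());
    }
    uri
}

fn main() {
    // A base64 continuation token may end in '='; encoded as %3D it
    // can no longer be confused with the key/value separator.
    let uri = build_uri(
        "https://s3.example.com",
        "mybucket/.chunks",
        &[("list-type", "2"), ("continuation-token", "AbCd/Ef+Gh==")],
    );
    println!("{uri}");
    // ...?list-type=2&continuation-token=AbCd%2FEf%2BGh%3D%3D
}

Centralizing the encoding in one helper that every API call already
goes through is what makes this robust: keys and tokens can be handled
in plain form everywhere else.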
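
For completeness, the phase 2 logic described in the patch reduces to
roughly the sketch below. The client interface is a hypothetical
stand-in and the atime comparison against the phase 1 start time is a
simplification; the real implementation also has to account for the
atime cutoff/grace-period handling:

use std::fs;
use std::path::Path;
use std::time::SystemTime;

// Hypothetical stand-in for the s3 client; names and signatures are
// illustrative only, not the proxmox-backup API.
struct ObjectPage {
    keys: Vec<String>,
    next_continuation_token: Option<String>,
}

trait S3Client {
    // ListObjectsV2: returns at most 1000 keys below `prefix` per
    // call, continuing from the (properly encoded) continuation token.
    fn list_objects_v2(
        &self,
        prefix: &str,
        continuation_token: Option<&str>,
    ) -> std::io::Result<ObjectPage>;
    // DeleteObjects: removes up to 1000 keys with a single request.
    fn delete_objects(&self, keys: &[String]) -> std::io::Result<()>;
}

// Phase 2 sketch: sweep all chunk objects in batches; a chunk is kept
// if its local marker file exists and was touched (atime) in phase 1.
fn sweep_unused_chunks(
    client: &dyn S3Client,
    marker_base: &Path,
    phase1_start: SystemTime,
) -> std::io::Result<()> {
    let mut token: Option<String> = None;
    loop {
        let page = client.list_objects_v2(".chunks/", token.as_deref())?;

        let mut to_delete = Vec::new();
        for key in page.keys {
            let marker = marker_base.join(&key);
            let in_use = match fs::metadata(&marker) {
                Ok(meta) => meta.accessed()? >= phase1_start,
                Err(_) => false, // no marker: not referenced by any index
            };
            if !in_use {
                to_delete.push(key);
            }
        }
        if !to_delete.is_empty() {
            client.delete_objects(&to_delete)?; // one request per batch
        }

        token = match page.next_continuation_token {
            Some(t) => Some(t),
            None => break,
        };
    }
    Ok(())
}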