From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Mon, 19 Jan 2026 14:27:03 +0100
Message-ID: <20260119132707.686523-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [RFC proxmox-backup 0/4] fix #5799: Gather per-namespace/group/snapshot storage usage stats

Disclaimer: These patches are still in a development state and are sent as
RFC to discuss implementation details, especially with respect to the
acceptable memory footprint and performance limitations. As is, they are
not intended for production use yet.

Issue #5799 requested gathering and caching information about the raw
storage size of uniquely referenced chunks and the deduplication factor of
backup groups, with the intent to provide better introspection for storage
optimization by allowing specific backup groups/snapshots to be pruned
based on this additional information.

These patches draft an approach to generate such statistics during garbage
collection by collecting chunk to namespace/group/snapshot relations and
providing an in-memory reverse mapping from chunk digests to the
namespaces/groups/snapshots referencing a given chunk. This reverse mapping
would further allow, e.g., marking snapshots as invalid if referenced
chunks are missing.

During phase 1, the snapshots referencing each chunk digest are stored in a
lookup table. The actual namespace, group and snapshot data is stored in
dedicated indexes and only referenced by the respective key in the lookup
table, with the intent to keep the slot size predictable and small for
better allocation. During phase 2, the raw chunk size is collected while
iterating over the chunk files. Finally, the statistics are gathered by
accumulating the counts of each chunk digest for each namespace/group/
snapshot, taking advantage of the lookup map.

Currently, the information is gathered unconditionally and logged to the
garbage collection task log, but it is planned to make this opt-in and to
store the gathered data at the namespace/group/snapshot level, to e.g. be
shown in the datastore content listings or a dedicated content listing.
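To make the data layout more concrete, here is a minimal, hypothetical Rust
sketch of such a reverse digest map. None of the names below (SnapshotIndex,
ReverseDigestMap, intern, add_reference, set_chunk_size, accumulate) are
taken from the actual patches, and for brevity the sketch collapses the
dedicated namespace/group/snapshot indexes into a single interned
snapshot-path index:

    use std::collections::HashMap;

    // Hypothetical names for illustration only -- not the types introduced
    // by these patches.
    struct SnapshotIndex {
        paths: Vec<String>,        // interned "ns/group/snapshot" strings
        ids: HashMap<String, u32>, // reverse lookup: path -> key
    }

    impl SnapshotIndex {
        fn new() -> Self {
            Self { paths: Vec::new(), ids: HashMap::new() }
        }

        // Return the small integer key for a path, interning it on first use.
        fn intern(&mut self, path: &str) -> u32 {
            if let Some(&id) = self.ids.get(path) {
                return id;
            }
            let id = self.paths.len() as u32;
            self.paths.push(path.to_string());
            self.ids.insert(path.to_string(), id);
            id
        }
    }

    // Reverse mapping: chunk digest -> (referencing snapshot keys, raw size).
    struct ReverseDigestMap {
        chunks: HashMap<[u8; 32], (Vec<u32>, u64)>,
    }

    impl ReverseDigestMap {
        fn new() -> Self {
            Self { chunks: HashMap::new() }
        }

        // Phase 1: while walking index files, record that `snapshot_id`
        // references `digest`.
        fn add_reference(&mut self, digest: [u8; 32], snapshot_id: u32) {
            let entry = self.chunks.entry(digest).or_insert_with(|| (Vec::new(), 0));
            if !entry.0.contains(&snapshot_id) {
                entry.0.push(snapshot_id);
            }
        }

        // Phase 2: while iterating the chunk store, attach the on-disk size.
        fn set_chunk_size(&mut self, digest: &[u8; 32], size: u64) {
            if let Some(entry) = self.chunks.get_mut(digest) {
                entry.1 = size;
            }
        }

        // Accumulate per-snapshot stats: (referenced size, uniquely referenced size).
        fn accumulate(&self, index: &SnapshotIndex) -> HashMap<String, (u64, u64)> {
            let mut stats: HashMap<String, (u64, u64)> = HashMap::new();
            for (snapshots, size) in self.chunks.values() {
                for &id in snapshots {
                    let entry = stats
                        .entry(index.paths[id as usize].clone())
                        .or_insert((0, 0));
                    entry.0 += *size; // referenced by this snapshot
                    if snapshots.len() == 1 {
                        entry.1 += *size; // referenced by this snapshot only
                    }
                }
            }
            stats
        }
    }

    fn main() {
        let mut index = SnapshotIndex::new();
        let mut map = ReverseDigestMap::new();

        let snap_a = index.intern("ns1/vm/100/2026-01-18T10:00:00Z");
        let snap_b = index.intern("ns1/vm/100/2026-01-19T10:00:00Z");

        // Phase 1: digest -> snapshot references from the index files.
        map.add_reference([0u8; 32], snap_a);
        map.add_reference([0u8; 32], snap_b); // chunk shared by both snapshots
        map.add_reference([1u8; 32], snap_b); // chunk unique to snap_b

        // Phase 2: raw chunk sizes from the chunk store iteration.
        map.set_chunk_size(&[0u8; 32], 4096);
        map.set_chunk_size(&[1u8; 32], 8192);

        for (path, (referenced, unique)) in map.accumulate(&index) {
            println!("{path}: referenced {referenced} B, unique {unique} B");
        }
    }

Storing only a small integer key per reference is what keeps the per-digest
slot size predictable; the interned path strings (namespace, group and
snapshot data in the actual patches) are stored once in the dedicated
indexes.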
The following differences in max RSS values were observed via
`watch -n 1 "ps -p $(pidof proxmox-backup-proxy) -o rss | tail -n 1 | tee -a ps-rss.out"`
and compared to the initial RSS values after a service restart (with the GC
LRU cache disabled by setting it to 0) on 2 datastores:

 ----------------------------------------------------------------------
| Delta RSS (MiB) | index files | chunk count | deduplication factor   |
 ----------------------------------------------------------------------
| 412.355         | 1125        | 982236      | 14.69                  |
| 168.414         | 213         | 598312      | 5.93                   |
 ----------------------------------------------------------------------

Open questions and ideas to discuss:
- Is the observed memory requirement acceptable if provided as an opt-in
  feature? Are there other ideas to further reduce the memory footprint?
  I was pondering an indirection mapping to group digests by their common
  prefix and only store the individual suffixes, which however only scales
  better when there is no need to store this as a hashmap, so it is not
  really suitable due to the diminished lookup performance.
- Conditionally replace the GC LRU cache with the lookup map if this
  feature is enabled. The digests need to be stored anyway, so it would
  make sense to use the map to avoid multiple chunk atime updates instead.
- Add a dedicated tab to show the contents independently of the current
  datastore contents? This would reduce the risk of misinterpretation, as
  this is not real-time data.
- Add this as a dedicated task instead of combining it with garbage
  collection? This would allow information gathering to be performed on
  specific sub-namespaces, groups or selected snapshots only.

Link to the bugtracker issue:
https://bugzilla.proxmox.com/show_bug.cgi?id=5799

proxmox-backup:

Christian Ebner (4):
  chunk store: restrict chunk sweep helper method to module parent
  datastore: add namespace/group/snapshot indices for reverse lookups
  datastore: introduce reverse chunk digest lookup table
  fix #5799: GC: track chunk digests and accumulate statistics

 pbs-datastore/src/chunk_store.rs        |  11 +-
 pbs-datastore/src/datastore.rs          |  46 +++-
 pbs-datastore/src/lib.rs                |   1 +
 pbs-datastore/src/reverse_digest_map.rs | 349 ++++++++++++++++++++++++
 4 files changed, 404 insertions(+), 3 deletions(-)
 create mode 100644 pbs-datastore/src/reverse_digest_map.rs

Summary over all repositories:
  4 files changed, 404 insertions(+), 3 deletions(-)

--
Generated by git-murpp 0.8.1

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel