From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Wed, 26 Mar 2025 11:03:28 +0100
Message-Id: <20250326100333.116722-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates

These patches implement the logic to greatly improve the performance of
phase 1 garbage collection by avoiding multiple atime updates on the
same chunk.

Currently, phase 1 GC iterates over all folders in the datastore,
looking for and collecting all image index files without making any
logical assumptions (e.g. namespaces, groups, snapshots, ...). This is
to avoid accidentally missing image index files located in unexpected
paths and therefore not marking their chunks as in use, leading to
potential data loss.

These patches improve phase 1 by:
- Iterating over index files using the datastore's iterators to detect
  regular index files. Paths outside of the iterator logic are still
  taken into account and processed as well: a list of all index files
  found is generated first, index files encountered while iterating are
  removed from it, and what finally remains is a list of index files
  with unexpected paths.
  These unexpected paths are now also logged, so the user can take
  action if needed.
- Keeping track of recently touched chunks by storing their digests in
  an LRU cache, skipping the expensive atime update for chunks already
  present in the cache.

Most notable changes since version 4 (thanks Thomas for feedback):
- Added basic benchmark results to the respective commit messages
- Extended reasoning in commit messages
- Adapted variable name, fixed formatting issue

Most notable changes since version 3 (thanks Wolfgang for feedback):
- Use `with_context` over `context` to avoid possibly unnecessary
  allocation
- Align terminology with docs and rest of the codebase by using index
  file instead of image in method and variable names.

Most notable changes since version 2 (thanks Fabian for feedback):
- Use LRU cache instead of keeping track of chunks from the previous
  snapshot in the group.
- Split patches to logically separate iteration from caching logic
- Adapt for better anyhow context error propagation and formatting

Most notable changes since version 1 (thanks Fabian for feedback):
- Logically iterate using pre-existing iterators instead of constructing
  a data structure for iteration when listing images.
- Tested that double listing does not affect runtime.
- Chunks are now remembered for all archives per snapshot, not just a
  single archive per snapshot as previously. This more closely mimics
  the backup behaviour and gives additional gains in some cases.
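
For readers who want a rough picture of the two phase 1 changes
described above, here is a minimal, std-only sketch. It is not the code
from this series: `unexpected_index_paths`, `TouchedChunks` and
`mark_used_chunks` are hypothetical names, the tiny LRU is a stand-in
for the cache in pbs-tools, and the convention that `insert` returns
`true` for a newly inserted digest is an assumption made for
illustration only.

```rust
use std::collections::{BTreeSet, HashMap};
use std::path::PathBuf;

/// Stand-in for a chunk digest (32 bytes, as for SHA-256).
type Digest = [u8; 32];

/// Starts from every index file found on disk and removes those reached
/// via the datastore iterators; what remains sits in unexpected paths.
fn unexpected_index_paths(
    all_found: BTreeSet<PathBuf>,
    seen_by_iterators: &[PathBuf],
) -> BTreeSet<PathBuf> {
    let mut remaining = all_found;
    for path in seen_by_iterators {
        remaining.remove(path);
    }
    remaining
}

/// Tiny LRU set of digests, for illustration only.
struct TouchedChunks {
    capacity: usize,
    tick: u64,
    last_used: HashMap<Digest, u64>,
}

impl TouchedChunks {
    fn new(capacity: usize) -> Self {
        Self { capacity, tick: 0, last_used: HashMap::new() }
    }

    /// Records `digest` as most recently used.
    /// Assumed convention: `true` if newly inserted, `false` if cached.
    fn insert(&mut self, digest: Digest) -> bool {
        self.tick += 1;
        let newly_inserted = self.last_used.insert(digest, self.tick).is_none();
        if self.last_used.len() > self.capacity {
            // Evict the least recently used entry; a linear scan keeps
            // the sketch short, a real LRU would do this in O(1).
            let oldest = self
                .last_used
                .iter()
                .min_by_key(|entry| *entry.1)
                .map(|(d, _)| *d);
            if let Some(oldest) = oldest {
                self.last_used.remove(&oldest);
            }
        }
        newly_inserted
    }
}

/// Walks the chunk lists of several index files and calls `touch` (a
/// placeholder for the atime update) only for digests not in the cache.
/// Returns the number of `touch` calls made.
fn mark_used_chunks(
    indexes: &[Vec<Digest>],
    cache: &mut TouchedChunks,
    mut touch: impl FnMut(&Digest),
) -> usize {
    let mut touched = 0;
    for index in indexes {
        for digest in index {
            if cache.insert(*digest) {
                touch(digest);
                touched += 1;
            }
        }
    }
    touched
}

fn main() {
    // Made-up paths, purely for demonstration.
    let all: BTreeSet<PathBuf> = ["vm/100/drive.fidx", "stray/old.fidx"]
        .iter()
        .map(|p| PathBuf::from(*p))
        .collect();
    let seen = vec![PathBuf::from("vm/100/drive.fidx")];
    for path in unexpected_index_paths(all, &seen) {
        println!("unexpected index file: {}", path.display());
    }

    // Two index files sharing most of their chunks.
    let (a, b, c) = ([1u8; 32], [2u8; 32], [3u8; 32]);
    let indexes = vec![vec![a, b], vec![a, b, c]];
    let mut cache = TouchedChunks::new(1024);
    let touched = mark_used_chunks(&indexes, &mut cache, |_digest| {
        // the atime update of the chunk file would happen here
    });
    println!("atime updates performed: {}", touched);
}
```

The cache is bounded, so a digest evicted in between is simply touched
again; that costs an extra atime update but never skips a chunk that is
in use.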

Christian Ebner (5):
  tools: lru cache: tell if node was already present or newly inserted
  garbage collection: format error including anyhow error context
  datastore: add helper method to open index reader from path
  garbage collection: generate index file list via datastore iterators
  fix #5331: garbage collection: avoid multiple chunk atime updates

 pbs-datastore/src/datastore.rs  | 179 ++++++++++++++++++++++----------
 pbs-tools/src/lru_cache.rs      |   4 +-
 src/api2/admin/datastore.rs     |   6 +-
 src/bin/proxmox-backup-proxy.rs |   2 +-
 4 files changed, 131 insertions(+), 60 deletions(-)

--
2.39.5