From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Fri, 21 Feb 2025 15:01:08 +0100
Message-Id: <20250221140110.377328-4-c.ebner@proxmox.com>
In-Reply-To: <20250221140110.377328-1-c.ebner@proxmox.com>
References: <20250221140110.377328-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH proxmox-backup 3/5] garbage collection: add structure for optimized image iteration

Implement the GroupedImageList struct and its methods, which group
index file (image) paths by a hierarchy for optimized iteration during
phase 1 of garbage collection.

Currently, phase 1 of garbage collection iterates over all folders in
the datastore, without considering any logical organization. This is
to avoid missing image indices which might have unexpected paths,
which would lead to chunks still in use by these indices being deleted
in GC phase 2.

The new structure helps to iterate over the index files in a more
logical way, without missing strange paths. The hierarchical
organization helps to avoid touching shared chunks of incremental
snapshot backups in a backup group multiple times, by allowing these
to be tracked without excessive memory requirements.

Since deduplication happens on a per-image basis for subsequent
snapshots, the hierarchy is chosen as follows:
- ns/group
- image filename
- snapshot timestamp

This allows iterating over consecutive snapshots for the same images
in the same backup namespace and group.

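To illustrate why this hierarchy keeps the tracking memory bounded, here
is a minimal sketch, not the code from this series. It assumes that each
index file can be reduced to a list of chunk digests; the `Digest` alias,
the `Grouped` type, the `count_touches` helper and the example keys are
all hypothetical. Snapshots of one image are visited in timestamp order,
and the set of already seen chunks is dropped before the next image.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a chunk digest, shortened to an integer.
type Digest = u32;

/// ns/group -> image filename -> [(snapshot timestamp, chunk digests)].
/// Assumption: the chunk list replaces the index file path for brevity.
type Grouped = HashMap<String, HashMap<String, Vec<(i64, Vec<Digest>)>>>;

/// Walk the hierarchy and count how many chunks would actually be touched.
fn count_touches(grouped: &mut Grouped) -> usize {
    let mut touched = 0;
    for images in grouped.values_mut() {
        for snapshots in images.values_mut() {
            // Visit the snapshots of one image in chronological order.
            snapshots.sort_by_key(|(time, _)| *time);
            // Only chunks of the current image are remembered; the set is
            // dropped at the end of this iteration, which bounds memory use.
            let mut seen: HashSet<Digest> = HashSet::new();
            for (_time, chunks) in snapshots.iter() {
                for digest in chunks {
                    // `insert` returns true only for a not yet seen digest.
                    if seen.insert(*digest) {
                        touched += 1;
                    }
                }
            }
        }
    }
    touched
}

fn main() {
    let mut grouped = Grouped::new();
    grouped.entry("ns1/vm/100".to_string()).or_default().insert(
        "drive-scsi0.img.fidx".to_string(),
        // Two incremental snapshots sharing chunks 1 and 2.
        vec![(200, vec![1, 2, 4]), (100, vec![1, 2, 3])],
    );
    println!("chunks touched: {}", count_touches(&mut grouped));
}
```

In this sketch the two snapshots reference six chunks in total, but only
four distinct ones are touched.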
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 63 ++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index eda78193d..520f54548 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1,4 +1,5 @@
 use std::collections::{HashMap, HashSet};
+use std::ffi::OsString;
 use std::io::{self, Write};
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::AsRawFd;
@@ -1573,3 +1574,65 @@ impl DataStore {
         Ok(())
     }
 }
+
+struct GroupedImageList {
+    groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
+    strange_path_images: Vec<PathBuf>,
+}
+
+impl GroupedImageList {
+    fn new() -> Self {
+        Self {
+            groups: HashMap::new(),
+            strange_path_images: Vec::new(),
+        }
+    }
+
+    fn insert(&mut self, img: &Path, base_path: &Path) -> Result<(), Error> {
+        let img = img.to_path_buf();
+
+        if let Some(backup_dir_path) = img.parent() {
+            let backup_dir_path = backup_dir_path.strip_prefix(base_path)?;
+
+            if let Some(backup_dir_str) = backup_dir_path.to_str() {
+                if let Ok((namespace, backup_dir)) =
+                    pbs_api_types::parse_ns_and_snapshot(backup_dir_str)
+                {
+                    if let Some(filename) = img.file_name() {
+                        let filename = filename.to_os_string();
+                        let group_key = format!("{namespace}/{group}", group = backup_dir.group);
+
+                        if let Some(images) = self.groups.get_mut(&group_key) {
+                            if let Some(snapshots) = images.get_mut(&filename) {
+                                snapshots.push((backup_dir.time, img));
+                            } else {
+                                let snapshots = vec![(backup_dir.time, img)];
+                                images.insert(filename, snapshots);
+                            }
+                        } else {
+                            // ns/group not present, insert new
+                            let snapshots = vec![(backup_dir.time, img)];
+                            let mut images = HashMap::new();
+                            images.insert(filename, snapshots);
+                            self.groups.insert(group_key, images);
+                        }
+                        return Ok(());
+                    }
+                }
+            }
+        }
+
+        self.strange_path_images.push(img);
+        Ok(())
+    }
+
+    fn len(&self) -> usize {
+        let mut count = self.strange_path_images.len();
+        for (_group, images) in self.groups.iter() {
+            for (_image, snapshots) in images.iter() {
+                count += snapshots.len();
+            }
+        }
+        count
+    }
+}
-- 
2.39.5

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
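
The nested lookups in `insert` and the counting loops in `len` could
also be expressed with the `HashMap` entry API and iterator adapters.
The following is a minimal standalone sketch of that idea, not a drop-in
replacement: `classify` is a hypothetical stand-in for
`pbs_api_types::parse_ns_and_snapshot` that expects a plain integer as
the snapshot directory name, and unlike the patch, a failing
`strip_prefix` is treated as a strange path instead of an error.

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::path::{Path, PathBuf};

#[derive(Default)]
struct GroupedImageList {
    groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
    strange_path_images: Vec<PathBuf>,
}

/// Hypothetical parser: splits "<group>/<integer timestamp>" into its parts.
/// The real code relies on `pbs_api_types::parse_ns_and_snapshot` instead.
fn classify(snapshot_dir: &Path) -> Option<(String, i64)> {
    let time = snapshot_dir.file_name()?.to_str()?.parse::<i64>().ok()?;
    let group = snapshot_dir.parent()?.to_str()?;
    if group.is_empty() {
        return None;
    }
    Some((group.to_string(), time))
}

impl GroupedImageList {
    fn insert(&mut self, img: &Path, base_path: &Path) {
        let parsed = img
            .parent()
            .and_then(|dir| dir.strip_prefix(base_path).ok())
            .and_then(classify)
            .zip(img.file_name());
        match parsed {
            // `entry().or_default()` creates missing levels on demand.
            Some(((group_key, time), filename)) => self
                .groups
                .entry(group_key)
                .or_default()
                .entry(filename.to_os_string())
                .or_default()
                .push((time, img.to_path_buf())),
            // Anything that does not match the expected layout is kept too.
            None => self.strange_path_images.push(img.to_path_buf()),
        }
    }

    fn len(&self) -> usize {
        let grouped: usize = self
            .groups
            .values()
            .flat_map(|images| images.values())
            .map(Vec::len)
            .sum();
        grouped + self.strange_path_images.len()
    }
}

fn main() {
    let base = Path::new("/datastore");
    let mut list = GroupedImageList::default();
    list.insert(Path::new("/datastore/vm/100/1700000000/drive-scsi0.img.fidx"), base);
    list.insert(Path::new("/datastore/vm/100/1700003600/drive-scsi0.img.fidx"), base);
    list.insert(Path::new("/datastore/lost+found/stray.fidx"), base);
    println!(
        "{} images, {} with strange paths",
        list.len(),
        list.strange_path_images.len()
    );
}
```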