From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 407B91FF176
	for <inbox@lore.proxmox.com>; Fri, 21 Feb 2025 15:02:06 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 7C8F633FB;
	Fri, 21 Feb 2025 15:02:05 +0100 (CET)
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Fri, 21 Feb 2025 15:01:08 +0100
Message-Id: <20250221140110.377328-4-c.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250221140110.377328-1-c.ebner@proxmox.com>
References: <20250221140110.377328-1-c.ebner@proxmox.com>
MIME-Version: 1.0
Subject: [pbs-devel] [PATCH proxmox-backup 3/5] garbage collection: add
 structure for optimized image iteration
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

Implement the GroupedImageList struct and its methods, which group
index file (image) paths hierarchically to allow optimized iteration
during phase 1 of garbage collection.
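
To illustrate the classification step, here is a minimal, self-contained
sketch. It is not the patch code: parse_snapshot_dir() is a hypothetical,
simplified stand-in for pbs_api_types::parse_ns_and_snapshot(), assuming
snapshot directories of the form [ns/<name>/...]<type>/<id>/<time> relative
to the datastore base, and the snapshot time is kept as a string instead of
the i64 used by the patch. Unlike GroupedImageList::insert(), which returns
an error if the parent directory is not below base_path, the sketch treats
such paths as strange as well.

```rust
use std::path::Path;

/// Where an index file ends up: grouped under (ns/group, image, time), or in
/// the fallback list for paths not following the expected layout.
#[derive(Debug, PartialEq)]
enum Placement {
    Grouped { group_key: String, image: String, time: String },
    Strange,
}

/// Hypothetical, simplified stand-in for `parse_ns_and_snapshot`.
/// Assumes `[ns/<name>/...]<type>/<id>/<time>` relative to the datastore base.
fn parse_snapshot_dir(rel: &Path) -> Option<(String, String, String)> {
    let parts = rel.iter().map(|c| c.to_str()).collect::<Option<Vec<&str>>>()?;
    if parts.len() < 3 {
        return None;
    }
    let (ns_parts, snap) = parts.split_at(parts.len() - 3);
    // Namespace components must come in `ns/<name>` pairs.
    if ns_parts.len() % 2 != 0 || ns_parts.chunks(2).any(|p| p[0] != "ns") {
        return None;
    }
    // Assumed set of backup types.
    if !matches!(snap[0], "vm" | "ct" | "host") {
        return None;
    }
    let ns: Vec<&str> = ns_parts.chunks(2).map(|p| p[1]).collect();
    Some((ns.join("/"), format!("{}/{}", snap[0], snap[1]), snap[2].to_string()))
}

/// Anything that cannot be parsed is kept as a strange path, never dropped.
fn classify(img: &Path, base: &Path) -> Placement {
    let parsed = img
        .parent()
        .and_then(|dir| dir.strip_prefix(base).ok())
        .and_then(parse_snapshot_dir);
    match (parsed, img.file_name().and_then(|f| f.to_str())) {
        (Some((ns, group, time)), Some(image)) => Placement::Grouped {
            group_key: format!("{ns}/{group}"),
            image: image.to_string(),
            time,
        },
        _ => Placement::Strange,
    }
}

fn main() {
    let base = Path::new("/datastore");
    let img = Path::new("/datastore/ns/prod/vm/100/2025-02-21T14:01:08Z/drive-scsi0.img.fidx");
    println!("{:?}", classify(img, base));
    println!("{:?}", classify(Path::new("/datastore/lost+found/x.fidx"), base));
}
```

Whatever the parser rejects lands in the fallback list, which is what keeps
phase 1 from missing index files at unexpected paths.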

Currently, phase 1 of garbage collection iterates over all folders in
the datastore without considering any logical organization. This is
done to avoid missing index files located at unexpected paths, which
would otherwise lead GC phase 2 to delete chunks that are still
referenced by these indices.
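
The failure mode described above can be seen in a toy model of the two
phases. This is a hypothetical simplification: u64 values stand in for
chunk digests, and a set of marked digests stands in for however the chunk
store records that a chunk was touched in phase 1.

```rust
use std::collections::HashSet;

/// Phase 1 (toy model): mark every chunk referenced by an index that was found.
fn mark(found_indices: &[Vec<u64>]) -> HashSet<u64> {
    found_indices.iter().flatten().copied().collect()
}

/// Phase 2 (toy model): remove every chunk not marked in phase 1.
/// Returns the number of removed chunks.
fn sweep(chunk_store: &mut HashSet<u64>, marked: &HashSet<u64>) -> usize {
    let before = chunk_store.len();
    chunk_store.retain(|digest| marked.contains(digest));
    before - chunk_store.len()
}

fn main() {
    let mut store = HashSet::from([1u64, 2, 3, 4]);
    // A second index referencing chunk 3 lives at an unexpected path and is
    // missed by phase 1, so only the first index is marked.
    let found = vec![vec![1u64, 2]];
    let marked = mark(&found);
    let removed = sweep(&mut store, &marked);
    println!("removed {removed}, chunk 3 present: {}", store.contains(&3));
}
```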

The new structure allows iterating over the index files in a more
logical order, while index files at unexpected ("strange") paths are
still collected in a separate list, so none are missed. The
hierarchical organization helps to avoid touching chunks shared by
incremental snapshots within a backup group multiple times, as it
allows tracking them without excessive memory requirements.
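
A minimal sketch of how such memory-bounded tracking could work, assuming
the snapshots of one image are visited in timestamp order. The function
touch_consecutive() and the u64 digests are hypothetical; this patch only
adds the grouping structure and does not contain this logic. Only the
previous snapshot's digest set is kept, so memory is bounded by a single
index rather than by the whole group.

```rust
use std::collections::HashSet;

/// Hypothetical: touch each chunk of an image's snapshots, skipping chunks
/// already touched for the immediately preceding snapshot. Returns the
/// number of touches performed.
fn touch_consecutive(mut snapshots: Vec<(i64, Vec<u64>)>, mut touch: impl FnMut(u64)) -> usize {
    // Oldest snapshot first, so neighbours in the list are incremental pairs.
    snapshots.sort_by_key(|(time, _)| *time);
    let mut previous: HashSet<u64> = HashSet::new();
    let mut touched = 0;
    for (_time, digests) in snapshots {
        let mut current = HashSet::with_capacity(digests.len());
        for digest in digests {
            // Skip duplicates within this index and chunks seen in the last one.
            if current.insert(digest) && !previous.contains(&digest) {
                touch(digest);
                touched += 1;
            }
        }
        // Drop the older set: at most two indices' digests are held at once.
        previous = current;
    }
    touched
}

fn main() {
    let snapshots = vec![(1, vec![10, 11, 12]), (2, vec![10, 11, 13])];
    let touched = touch_consecutive(snapshots, |digest| println!("touch chunk {digest}"));
    println!("{touched} touches for 6 referenced chunks");
}
```

Chunks shared only by non-adjacent snapshots (present in the first and third,
but not the second) are touched again; in this model that is merely
redundant work.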

Since deduplication across subsequent snapshots happens on a per-image
basis, the hierarchy is chosen as follows:
- ns/group
- image filename
- snapshot timestamp

This allows iterating over consecutive snapshots of the same image
within the same backup namespace and group.
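
The hierarchy maps onto the nested HashMap in GroupedImageList::groups. The
sketch below uses String instead of OsString for the image filename, and
insert()/visit_in_order() are hypothetical helpers, not part of the patch.
It performs the same insertion via the entry API, which behaves like the
explicit get_mut()/insert() branches in the patch, and iterates with each
image's snapshots sorted by time. The patch itself does not sort on
insertion, so the ordering step is an assumption of the sketch.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Same shape as `GroupedImageList::groups`: ns/group -> image filename ->
/// (snapshot time, index path). String stands in for OsString here.
type Groups = HashMap<String, HashMap<String, Vec<(i64, PathBuf)>>>;

/// Hypothetical helper: entry API creates the intermediate maps on demand.
fn insert(groups: &mut Groups, group_key: &str, image: &str, time: i64, path: PathBuf) {
    groups
        .entry(group_key.to_string())
        .or_default()
        .entry(image.to_string())
        .or_default()
        .push((time, path));
}

/// Hypothetical helper: visit one image of one group at a time, with its
/// snapshots oldest-first. Group and image order follow HashMap order.
fn visit_in_order(groups: &mut Groups, mut visit: impl FnMut(&str, &str, &[(i64, PathBuf)])) {
    for (group, images) in groups.iter_mut() {
        for (image, snapshots) in images.iter_mut() {
            snapshots.sort_by_key(|(time, _)| *time);
            visit(group.as_str(), image.as_str(), snapshots.as_slice());
        }
    }
}

fn main() {
    let mut groups = Groups::new();
    insert(&mut groups, "/vm/100", "drive-scsi0.img.fidx", 2, PathBuf::from("/ds/vm/100/t2/drive-scsi0.img.fidx"));
    insert(&mut groups, "/vm/100", "drive-scsi0.img.fidx", 1, PathBuf::from("/ds/vm/100/t1/drive-scsi0.img.fidx"));
    visit_in_order(&mut groups, |group, image, snapshots| {
        for (time, path) in snapshots {
            println!("{group} {image} {time} {}", path.display());
        }
    });
}
```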

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 63 ++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index eda78193d..520f54548 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1,4 +1,5 @@
 use std::collections::{HashMap, HashSet};
+use std::ffi::OsString;
 use std::io::{self, Write};
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::AsRawFd;
@@ -1573,3 +1574,65 @@ impl DataStore {
         Ok(())
     }
 }
+
+struct GroupedImageList {
+    groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
+    strange_path_images: Vec<PathBuf>,
+}
+
+impl GroupedImageList {
+    fn new() -> Self {
+        Self {
+            groups: HashMap::new(),
+            strange_path_images: Vec::new(),
+        }
+    }
+
+    fn insert(&mut self, img: &Path, base_path: &Path) -> Result<(), Error> {
+        let img = img.to_path_buf();
+
+        if let Some(backup_dir_path) = img.parent() {
+            let backup_dir_path = backup_dir_path.strip_prefix(base_path)?;
+
+            if let Some(backup_dir_str) = backup_dir_path.to_str() {
+                if let Ok((namespace, backup_dir)) =
+                    pbs_api_types::parse_ns_and_snapshot(backup_dir_str)
+                {
+                    if let Some(filename) = img.file_name() {
+                        let filename = filename.to_os_string();
+                        let group_key = format!("{namespace}/{group}", group = backup_dir.group);
+
+                        if let Some(images) = self.groups.get_mut(&group_key) {
+                            if let Some(snapshots) = images.get_mut(&filename) {
+                                snapshots.push((backup_dir.time, img));
+                            } else {
+                                let snapshots = vec![(backup_dir.time, img)];
+                                images.insert(filename, snapshots);
+                            }
+                        } else {
+                            // ns/group not present, insert new
+                            let snapshots = vec![(backup_dir.time, img)];
+                            let mut images = HashMap::new();
+                            images.insert(filename, snapshots);
+                            self.groups.insert(group_key, images);
+                        }
+                        return Ok(());
+                    }
+                }
+            }
+        }
+
+        self.strange_path_images.push(img);
+        Ok(())
+    }
+
+    fn len(&self) -> usize {
+        let mut count = self.strange_path_images.len();
+        for (_group, images) in self.groups.iter() {
+            for (_image, snapshots) in images.iter() {
+                count += snapshots.len();
+            }
+        }
+        count
+    }
+}
-- 
2.39.5


