From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup 3/5] garbage collection: add structure for optimized image iteration
Date: Fri, 21 Feb 2025 15:01:08 +0100
Message-ID: <20250221140110.377328-4-c.ebner@proxmox.com>
In-Reply-To: <20250221140110.377328-1-c.ebner@proxmox.com>

Implement the GroupedImageList struct and its methods, which group the
paths of index files (images) into a hierarchy for optimized iteration
during phase 1 of garbage collection.

Currently, phase 1 of garbage collection iterates over all folders in
the datastore without considering any logical organization. This is
done to avoid missing image indices located at unexpected paths, since
the chunks still referenced by such indices would otherwise be deleted
in GC phase 2.

The new structure allows iterating over the index files in a more
logical order, while still not missing images at strange paths. The
hierarchical organization helps to avoid touching chunks shared by
incremental snapshots of a backup group multiple times, since these
can be tracked per group without excessive memory requirements.
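
To illustrate why tracking per group keeps memory bounded, here is a
minimal sketch. It is not the tracking code of this series: the function
`touch_grouped` and its `touch` callback (standing in for the actual
atime update) are hypothetical, and the chunk digests of each index are
assumed to be already loaded.

```rust
use std::collections::{HashMap, HashSet};

/// Chunk digests are 32-byte SHA-256 hashes.
type Digest = [u8; 32];

/// Hypothetical helper: touch every chunk referenced by the indices of a
/// group at most once per group. `touched` is cleared whenever the next
/// group starts, so its size is bounded by the distinct chunks of a single
/// group instead of the whole datastore. Returns the number of touches.
fn touch_grouped<F: FnMut(&Digest)>(
    groups: &HashMap<String, Vec<Vec<Digest>>>,
    mut touch: F,
) -> usize {
    let mut touched: HashSet<Digest> = HashSet::new();
    let mut count = 0;
    for indices in groups.values() {
        touched.clear();
        for digest in indices.iter().flatten() {
            // `insert` returns false if this group already touched the chunk
            if touched.insert(*digest) {
                touch(digest);
                count += 1;
            }
        }
    }
    count
}

fn main() {
    let (a, b, c) = ([1u8; 32], [2u8; 32], [3u8; 32]);
    let mut groups = HashMap::new();
    // two incremental snapshots of one group sharing chunks a and b
    groups.insert("vm/100".to_string(), vec![vec![a, b], vec![a, b, c]]);
    // a second group that also references chunk a
    groups.insert("vm/101".to_string(), vec![vec![a]]);
    let touches = touch_grouped(&groups, |_digest| {});
    println!("touched {touches} chunks"); // prints "touched 4 chunks"
}
```

The trade-off shows in the example: chunk `a` is touched once for each
group referencing it, in exchange for never holding more than one
group's digests in memory.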

Since deduplication happens on a per-image basis between subsequent
snapshots, the hierarchy is chosen as follows:

- ns/group
  - image filename
    - snapshot timestamp

This allows iterating over consecutive snapshots of the same image
within the same backup namespace and group.
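
A sketch of how this hierarchy yields consecutive snapshots per image,
assuming the same nested map type as `GroupedImageList::groups` in the
diff. The function `visit_order` and the example keys and paths are
hypothetical; group and image keys are sorted only to make the order
deterministic.

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::path::PathBuf;

/// Same nesting as `GroupedImageList::groups`:
/// ns/group -> image filename -> [(snapshot timestamp, index path)]
type Groups = HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>;

/// Hypothetical helper: flatten the hierarchy into a visiting order in
/// which all snapshots of one image in one ns/group are adjacent and
/// sorted oldest first.
fn visit_order(groups: Groups) -> Vec<PathBuf> {
    let mut groups: Vec<_> = groups.into_iter().collect();
    groups.sort_by(|a, b| a.0.cmp(&b.0));

    let mut order = Vec::new();
    for (_group, images) in groups {
        let mut images: Vec<_> = images.into_iter().collect();
        images.sort_by(|a, b| a.0.cmp(&b.0));
        for (_image, mut snapshots) in images {
            snapshots.sort_by_key(|entry| entry.0);
            order.extend(snapshots.into_iter().map(|(_time, path)| path));
        }
    }
    order
}

fn main() {
    let mut groups: Groups = HashMap::new();
    let images = groups.entry("vm/100".to_string()).or_default();
    // inserted out of order, as a directory scan might return them
    images
        .entry(OsString::from("drive-scsi0.img.fidx"))
        .or_default()
        .extend([
            (200, PathBuf::from("vm/100/200/drive-scsi0.img.fidx")),
            (100, PathBuf::from("vm/100/100/drive-scsi0.img.fidx")),
        ]);
    images
        .entry(OsString::from("drive-scsi1.img.fidx"))
        .or_default()
        .push((100, PathBuf::from("vm/100/100/drive-scsi1.img.fidx")));

    // prints vm/100/100/drive-scsi0.img.fidx, vm/100/200/drive-scsi0.img.fidx,
    // then vm/100/100/drive-scsi1.img.fidx
    for path in visit_order(groups) {
        println!("{}", path.display());
    }
}
```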

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 63 ++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index eda78193d..520f54548 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1,4 +1,5 @@
 use std::collections::{HashMap, HashSet};
+use std::ffi::OsString;
 use std::io::{self, Write};
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::AsRawFd;
@@ -1573,3 +1574,65 @@ impl DataStore {
         Ok(())
     }
 }
+
+struct GroupedImageList {
+    groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
+    strange_path_images: Vec<PathBuf>,
+}
+
+impl GroupedImageList {
+    fn new() -> Self {
+        Self {
+            groups: HashMap::new(),
+            strange_path_images: Vec::new(),
+        }
+    }
+
+    fn insert(&mut self, img: &Path, base_path: &Path) -> Result<(), Error> {
+        let img = img.to_path_buf();
+
+        if let Some(backup_dir_path) = img.parent() {
+            let backup_dir_path = backup_dir_path.strip_prefix(base_path)?;
+
+            if let Some(backup_dir_str) = backup_dir_path.to_str() {
+                if let Ok((namespace, backup_dir)) =
+                    pbs_api_types::parse_ns_and_snapshot(backup_dir_str)
+                {
+                    if let Some(filename) = img.file_name() {
+                        let filename = filename.to_os_string();
+                        let group_key = format!("{namespace}/{group}", group = backup_dir.group);
+
+                        if let Some(images) = self.groups.get_mut(&group_key) {
+                            if let Some(snapshots) = images.get_mut(&filename) {
+                                snapshots.push((backup_dir.time, img));
+                            } else {
+                                let snapshots = vec![(backup_dir.time, img)];
+                                images.insert(filename, snapshots);
+                            }
+                        } else {
+                            // ns/group not present, insert new
+                            let snapshots = vec![(backup_dir.time, img)];
+                            let mut images = HashMap::new();
+                            images.insert(filename, snapshots);
+                            self.groups.insert(group_key, images);
+                        }
+                        return Ok(());
+                    }
+                }
+            }
+        }
+
+        self.strange_path_images.push(img);
+        Ok(())
+    }
+
+    fn len(&self) -> usize {
+        let mut count = self.strange_path_images.len();
+        for (_group, images) in self.groups.iter() {
+            for (_image, snapshots) in images.iter() {
+                count += snapshots.len();
+            }
+        }
+        count
+    }
+}
--
2.39.5
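
For experimenting with the grouping logic outside of the datastore, a
self-contained sketch of the structure follows. It is not the patch
itself and makes these assumptions: `parse_snapshot_dir` is a
hypothetical stand-in for `pbs_api_types::parse_ns_and_snapshot`,
snapshot directories are named by integer epoch seconds instead of
RFC 3339 timestamps, and the nested `get_mut`/`insert` branches are
written with `HashMap::entry`. As in the patch, a failing `strip_prefix`
is returned as an error, and any path not matching the expected layout
ends up in `strange_path_images`.

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::path::{Path, PathBuf, StripPrefixError};

/// Hypothetical stand-in for `pbs_api_types::parse_ns_and_snapshot`.
/// Accepts `[ns/<name>/]...<type>/<id>/<epoch>` and returns
/// (namespace, "<type>/<id>", epoch). Real snapshot directories use
/// RFC 3339 timestamps; epoch seconds keep this sketch std-only.
fn parse_snapshot_dir(dir: &str) -> Option<(String, String, i64)> {
    let parts: Vec<&str> = dir.split('/').collect();
    if parts.len() < 3 {
        return None;
    }
    let (ns_parts, snap) = parts.split_at(parts.len() - 3);
    // namespace levels come as `ns/<name>` pairs
    if ns_parts.len() % 2 != 0 || ns_parts.chunks(2).any(|pair| pair[0] != "ns") {
        return None;
    }
    if !matches!(snap[0], "vm" | "ct" | "host") {
        return None;
    }
    let time = snap[2].parse::<i64>().ok()?;
    let namespace: Vec<&str> = ns_parts.chunks(2).map(|pair| pair[1]).collect();
    Some((namespace.join("/"), format!("{}/{}", snap[0], snap[1]), time))
}

#[derive(Default)]
struct GroupedImageList {
    groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
    strange_path_images: Vec<PathBuf>,
}

impl GroupedImageList {
    fn insert(&mut self, img: &Path, base_path: &Path) -> Result<(), StripPrefixError> {
        if let Some(dir) = img.parent() {
            let dir = dir.strip_prefix(base_path)?;
            if let (Some((ns, group, time)), Some(filename)) =
                (dir.to_str().and_then(parse_snapshot_dir), img.file_name())
            {
                // create the ns/group and image levels on demand
                self.groups
                    .entry(format!("{ns}/{group}"))
                    .or_default()
                    .entry(filename.to_os_string())
                    .or_default()
                    .push((time, img.to_path_buf()));
                return Ok(());
            }
        }
        // unexpected layout: keep the image so its chunks are still marked
        self.strange_path_images.push(img.to_path_buf());
        Ok(())
    }

    /// Number of images, grouped and strange-path ones alike.
    fn len(&self) -> usize {
        let grouped: usize = self
            .groups
            .values()
            .flat_map(|images| images.values())
            .map(Vec::len)
            .sum();
        grouped + self.strange_path_images.len()
    }
}

fn main() {
    let base = Path::new("/datastore");
    let mut list = GroupedImageList::default();
    for img in [
        "/datastore/vm/100/1700000000/drive-scsi0.img.fidx",
        "/datastore/vm/100/1700086400/drive-scsi0.img.fidx",
        "/datastore/ns/dev/ct/200/1700000000/root.pxar.didx",
        "/datastore/leftover/copy/drive-scsi0.img.fidx",
    ] {
        list.insert(Path::new(img), base).unwrap();
    }
    assert_eq!(list.len(), 4);
    assert_eq!(list.groups.len(), 2);
    assert_eq!(list.strange_path_images.len(), 1);
}
```

Since `len()` counts grouped and strange-path images alike, comparing it
with the number of images found by the directory scan could serve as a
cheap sanity check that no index was dropped while grouping.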