public inbox for pbs-devel@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Hannes Laimer <h.laimer@proxmox.com>, pbs-devel@lists.proxmox.com
Subject: Re: [PATCH proxmox-backup v7 3/9] datastore: add move-namespace
Date: Mon, 20 Apr 2026 16:49:48 +0200	[thread overview]
Message-ID: <1776694266.6hgr1v4evp.astroid@yuna.none> (raw)
In-Reply-To: <20260416171830.266553-4-h.laimer@proxmox.com>

On April 16, 2026 7:18 pm, Hannes Laimer wrote:
> Relocate an entire namespace subtree (the given namespace, all child
> namespaces, and their groups) to a new location within the same
> datastore.
> 
> Groups that cannot be locked because another task is running are
> deferred and retried once after all other groups have been processed.
> Groups that still fail are reported and remain at the source so they
> can be retried individually.
> 
> When merging is enabled and a source group already exists in the
> target, the snapshots are merged provided ownership matches and
> source snapshots are strictly newer.
> 
> An optional max-depth parameter limits how many levels of child
> namespaces are included. When delete-source is true (the default),
> empty source namespaces are removed after the move.
> 
> Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
> ---
>  pbs-datastore/src/datastore.rs | 245 ++++++++++++++++++++++++++++++++-
>  1 file changed, 244 insertions(+), 1 deletion(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index f88c356e..536824c9 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -34,7 +34,7 @@ use pbs_api_types::{
>      ArchiveType, Authid, BackupGroupDeleteStats, BackupNamespace, BackupType, ChunkOrder,
>      DataStoreConfig, DatastoreBackendConfig, DatastoreBackendType, DatastoreFSyncLevel,
>      DatastoreTuning, GarbageCollectionCacheStats, GarbageCollectionStatus, MaintenanceMode,
> -    MaintenanceType, Operation, S3Statistics, UPID,
> +    MaintenanceType, Operation, S3Statistics, MAX_NAMESPACE_DEPTH, UPID,
>  };
>  use pbs_config::s3::S3_CFG_TYPE_ID;
>  use pbs_config::{BackupLockGuard, ConfigVersionCache};
> @@ -1124,6 +1124,249 @@ impl DataStore {
>          Ok((removed_all_requested, stats))
>      }
>  
> +    /// Move a backup namespace (including all child namespaces and groups) to a new location.
> +    ///
> +    /// Groups are moved one at a time. For each group, exclusive locks on the group and all
> +    /// its snapshots (locked and moved in batches) ensure no concurrent operations are active. Groups
> +    /// that cannot be locked (because a backup, verify, or other task is running) are deferred
> +    /// and retried once, then reported in the error.
> +    ///
> +    /// If the target namespace already exists, groups are moved into it. Groups that fail
> +    /// (lock conflict, merge invariant violation, or I/O error) are skipped and reported at
> +    /// the end - they remain at the source and can be retried individually.
> +    ///
> +    /// `max_depth` limits how many levels of child namespaces below `source_ns` are included.
> +    /// `None` means unlimited (move the entire subtree). `Some(0)` moves only the groups
> +    /// directly in `source_ns` (no child namespaces).
> +    ///
> +    /// When `delete_source` is true the source namespace directories are removed after all
> +    /// groups have been moved. When false the (now empty) source namespaces are kept.
> +    ///
> +    /// When `merge_groups` is true and a source group already exists in the target namespace,
> +    /// the snapshots are merged into the existing target group (provided ownership matches
> +    /// and source snapshots are strictly newer). When false the operation fails for that
> +    /// group.
> +    pub fn move_namespace(
> +        self: &Arc<Self>,
> +        source_ns: &BackupNamespace,
> +        target_ns: &BackupNamespace,
> +        max_depth: Option<usize>,
> +        delete_source: bool,
> +        merge_groups: bool,
> +    ) -> Result<(), Error> {
> +        if *OLD_LOCKING {
> +            bail!("move operations require new-style file-based locking");
> +        }
> +        if source_ns.is_root() {
> +            bail!("cannot move root namespace");
> +        }
> +        if source_ns == target_ns {
> +            bail!("source and target namespace must be different");
> +        }
> +
> +        if !self.namespace_exists(source_ns) {
> +            bail!("source namespace '{source_ns}' does not exist");
> +        }
> +        let target_parent = target_ns.parent();
> +        if !target_ns.is_root() && !self.namespace_exists(&target_parent) {
> +            bail!("target namespace parent '{target_parent}' does not exist");
> +        }
> +        if source_ns.contains(target_ns).is_some() {
> +            bail!(
> +                "cannot move namespace '{source_ns}' into its own subtree (target: '{target_ns}')"
> +            );
> +        }

all of these checks are already done right before calling this, so they
should either be moved into a small helper to avoid the two call sites
deviating, or not be done twice IMHO. we don't do anything in between
other than forking a worker after all..

> +
> +        let all_source_ns: Vec<BackupNamespace> = if let Some(depth) = max_depth {
> +            ListNamespacesRecursive::new_max_depth(Arc::clone(self), source_ns.clone(), depth)?
> +                .collect::<Result<Vec<_>, Error>>()?
> +        } else {
> +            self.recursive_iter_backup_ns(source_ns.clone())?
> +                .collect::<Result<Vec<_>, Error>>()?
> +        };

this could just be ListNamespacesRecursive::new_max_depth with the depth
unwrapped to the max value if not set?

> +
> +        let all_source_groups: Vec<BackupGroup> = all_source_ns
> +            .iter()
> +            .map(|ns| self.iter_backup_groups(ns.clone()))
> +            .collect::<Result<Vec<_>, Error>>()?
> +            .into_iter()
> +            .flatten()
> +            .collect::<Result<Vec<_>, Error>>()?;

the only benefit of collecting all groups up front is the single log line
below, and the chances of lock conflicts might be lower if we iterated
each namespace on its own instead?

> +
> +        let subtree_depth = all_source_ns
> +            .iter()
> +            .map(BackupNamespace::depth)
> +            .max()
> +            .map_or(0, |d| d - source_ns.depth());
> +        if subtree_depth + target_ns.depth() > MAX_NAMESPACE_DEPTH {
> +            bail!(
> +                "move would exceed maximum namespace depth \
> +                ({subtree_depth}+{} > {MAX_NAMESPACE_DEPTH})",
> +                target_ns.depth(),
> +            );
> +        }

this is basically crate::sync::check_namespace_depth_limit, should it
move to BackupNamespace or some other place?

> +
> +        let backend = self.backend()?;
> +
> +        log::info!(
> +            "moving namespace '{source_ns}' -> '{target_ns}': {} namespaces, {} groups",
> +            all_source_ns.len(),
> +            all_source_groups.len(),
> +        );
> +
> +        // Create target local namespace directories upfront (covers empty namespaces).
> +        for ns in &all_source_ns {
> +            let target_child = ns.map_prefix(source_ns, target_ns)?;
> +            std::fs::create_dir_all(self.namespace_path(&target_child)).with_context(|| {
> +                format!("failed to create local dir for target namespace '{target_child}'")
> +            })?;

this could use Datastore::create_namespace, which does some extra
checks and

> +        }
> +
> +        // For S3: create namespace markers for all target namespaces.
> +        if let DatastoreBackend::S3(s3_client) = &backend {
> +            for ns in &all_source_ns {
> +                let target_child = ns.map_prefix(source_ns, target_ns)?;
> +                let object_key = crate::s3::object_key_from_path(
> +                    &target_child.path(),
> +                    NAMESPACE_MARKER_FILENAME,
> +                )
> +                .context("invalid namespace marker object key")?;
> +                log::debug!("creating S3 namespace marker for '{target_child}': {object_key:?}");
> +                proxmox_async::runtime::block_on(
> +                    s3_client.upload_no_replace_with_retry(object_key, Bytes::from("")),
> +                )
> +                .context("failed to create namespace marker on S3 backend")?;

also does this for us

> +            }
> +        }
> +
> +        // Move each group with per-group and per-snapshot locking. Groups that cannot be
> +        // locked are deferred and retried once after all other groups have been processed.
> +        let mut failed_groups: Vec<(BackupNamespace, String)> = Vec::new();
> +        let mut failed_ns: HashSet<BackupNamespace> = HashSet::new();
> +        let mut deferred: Vec<&BackupGroup> = Vec::new();
> +
> +        for group in &all_source_groups {
> +            if let Err(err) =
> +                self.lock_and_move_group(group, source_ns, target_ns, &backend, merge_groups)
> +            {
> +                match err {
> +                    MoveGroupError::Soft(err) => {
> +                        log::info!(
> +                            "deferring group '{}' in '{}'  - lock conflict, will retry: {err:#}",
> +                            group.group(),
> +                            group.backup_ns(),
> +                        );
> +                        deferred.push(group);
> +                    }
> +                    MoveGroupError::Hard(err) => {
> +                        warn!(
> +                            "failed to move group '{}' in '{}': {err:#}",
> +                            group.group(),
> +                            group.backup_ns(),
> +                        );
> +                        failed_groups.push((group.backup_ns().clone(), group.group().to_string()));
> +                        failed_ns.insert(group.backup_ns().clone());
> +                    }
> +                }
> +            }
> +        }
> +
> +        // Retry deferred groups once.
> +        if !deferred.is_empty() {
> +            log::info!("retrying {} deferred group(s)...", deferred.len());
> +            for group in deferred {
> +                if let Err(err) =
> +                    self.lock_and_move_group(group, source_ns, target_ns, &backend, merge_groups)
> +                {
> +                    let err = match err {
> +                        MoveGroupError::Soft(err) | MoveGroupError::Hard(err) => err,
> +                    };
> +                    warn!(
> +                        "failed to move group '{}' in '{}' on retry: {err:#}",
> +                        group.group(),
> +                        group.backup_ns(),
> +                    );
> +                    failed_groups.push((group.backup_ns().clone(), group.group().to_string()));
> +                    failed_ns.insert(group.backup_ns().clone());
> +                }
> +            }
> +        }
> +
> +        // Clean up source namespaces when requested. Process deepest-first so child
> +        // directories are removed before their parents.
> +        if delete_source {
> +            let moved_ns: HashSet<&BackupNamespace> = all_source_ns.iter().collect();

this is confusingly named, these are the namespaces we *tried to* move,
not necessarily the ones we actually *moved*.

> +
> +            for ns in all_source_ns.iter().rev() {
> +                // Skip namespaces that still have groups from failed moves or child
> +                // namespaces that were excluded by the depth limit.
> +                if failed_ns
> +                    .iter()
> +                    .any(|fns| fns == ns || ns.contains(fns).is_some())

nit: fns is a bit opaque, maybe `failed` or `failed_ns`?

should this log a warning that this NS was not cleaned up, and why?

> +                {
> +                    continue;
> +                }
> +                let has_excluded_children = self
> +                    .iter_backup_ns(ns.clone())
> +                    .ok()
> +                    .into_iter()
> +                    .flatten()
> +                    .filter_map(|child| child.ok())

this could use iter_backup_ns_ok instead?

> +                    .any(|child| !moved_ns.contains(&child));

this does a lot of nested iteration now.. what we actually want to check
is if there are any namespaces left below, because if there are, we
can't remove, either because
- we didn't try to move all children in the first place (depth check)
- we didn't manage to move all children we wanted to move (failed_ns,
  handled above)
- new children were created in the meantime

this check here handles the first and third case?

> +                if has_excluded_children {
> +                    continue;
> +                }
> +
> +                // Remove type subdirectories first (should be empty after per-group renames),
> +                // then the namespace directory itself. Uses remove_dir which fails on
> +                // non-empty directories, handling concurrent group creation gracefully.
> +                let ns_path = self.namespace_path(ns);
> +                if let Ok(entries) = std::fs::read_dir(&ns_path) {
> +                    for entry in entries.flatten() {
> +                        if let Err(err) = std::fs::remove_dir(entry.path()) {
> +                            warn!(
> +                                "failed to remove source directory {:?}: {err}",
> +                                entry.path(),
> +                            );
> +                        }
> +                    }
> +                }
> +                if let Err(err) = std::fs::remove_dir(&ns_path) {
> +                    warn!("failed to remove source namespace directory {ns_path:?}: {err}");
> +                    continue;
> +                }
> +
> +                // Local directory removed  - delete the S3 namespace marker too.
> +                if let DatastoreBackend::S3(s3_client) = &backend {
> +                    let object_key =
> +                        crate::s3::object_key_from_path(&ns.path(), NAMESPACE_MARKER_FILENAME)
> +                            .context("invalid namespace marker object key")?;
> +                    log::debug!("deleting source S3 namespace marker for '{ns}': {object_key:?}");
> +                    if let Err(err) =
> +                        proxmox_async::runtime::block_on(s3_client.delete_object(object_key))
> +                    {
> +                        warn!("failed to delete source S3 namespace marker for '{ns}': {err:#}");
> +                    }
> +                }

we do have remove_namespace_recursive already, which is what the
delete_namespace API endpoint uses, but we don't offer a way to delete
just a single namespace that refuses if child namespaces still exist..
should we just use that here? maybe extend it with a
delete_child_namespaces parameter, setting it to false here and in pull,
but true in the API endpoint?

> +            }
> +        }
> +
> +        if !failed_groups.is_empty() {
> +            let group_list: Vec<String> = failed_groups
> +                .iter()
> +                .map(|(ns, group)| format!("'{group}' in '{ns}'"))
> +                .collect();
> +            bail!(
> +                "namespace move partially completed; {} group(s) could not be moved \
> +                and remain at source: {}",
> +                failed_groups.len(),
> +                group_list.join(", "),
> +            );
> +        }
> +
> +        Ok(())
> +    }
> +
>      /// Remove a complete backup group including all snapshots.
>      ///
>      /// Returns `BackupGroupDeleteStats`, containing the number of deleted snapshots
> -- 
> 2.47.3
> 

Thread overview: 15+ messages
2026-04-16 17:18 [PATCH proxmox-backup v7 0/9] fixes #6195: add support for moving groups and namespaces Hannes Laimer
2026-04-16 17:18 ` [PATCH proxmox-backup v7 1/9] ui: show empty groups Hannes Laimer
2026-04-16 17:18 ` [PATCH proxmox-backup v7 2/9] datastore: add move-group Hannes Laimer
2026-04-20 14:49   ` Fabian Grünbichler
2026-04-21 10:43     ` Hannes Laimer
2026-04-16 17:18 ` [PATCH proxmox-backup v7 3/9] datastore: add move-namespace Hannes Laimer
2026-04-20 14:49   ` Fabian Grünbichler [this message]
2026-04-16 17:18 ` [PATCH proxmox-backup v7 4/9] docs: add section on moving namespaces and groups Hannes Laimer
2026-04-16 17:18 ` [PATCH proxmox-backup v7 5/9] api: add POST endpoint for move-group Hannes Laimer
2026-04-20 14:49   ` Fabian Grünbichler
2026-04-16 17:18 ` [PATCH proxmox-backup v7 6/9] api: add POST endpoint for move-namespace Hannes Laimer
2026-04-16 17:18 ` [PATCH proxmox-backup v7 7/9] ui: add move group action Hannes Laimer
2026-04-16 17:18 ` [PATCH proxmox-backup v7 8/9] ui: add move namespace action Hannes Laimer
2026-04-16 17:18 ` [PATCH proxmox-backup v7 9/9] cli: add move-namespace and move-group commands Hannes Laimer
2026-04-20 15:02 ` [PATCH proxmox-backup v7 0/9] fixes #6195: add support for moving groups and namespaces Fabian Grünbichler
