Subject: Re: [PATCH proxmox-backup v7 2/9] datastore: add move-group
Date: Tue, 21 Apr 2026 12:43:28 +0200
From: Hannes Laimer <h.laimer@proxmox.com>
To: Fabian Grünbichler, pbs-devel@lists.proxmox.com
In-Reply-To: <1776689762.plz71v8ptr.astroid@yuna.none>
References: <20260416171830.266553-1-h.laimer@proxmox.com>
 <20260416171830.266553-3-h.laimer@proxmox.com>
 <1776689762.plz71v8ptr.astroid@yuna.none>
List-Id: Proxmox Backup Server development discussion

On 2026-04-20 16:48, Fabian
Grünbichler wrote:
> On April 16, 2026 7:18 pm, Hannes Laimer wrote:
>> Add support for moving a single backup group to a different namespace
>> within the same datastore.
>>
>> For the filesystem backend each snapshot directory is renamed
>> individually. For S3 all objects are copied to the target prefix
>> before deleting the source, per snapshot.
>>
>> Exclusive locks on the group and all its snapshots are acquired
>> before the move to ensure no concurrent operations are active.
>> Snapshots are locked and moved in batches to avoid exhausting file
>> descriptors on groups with many snapshots.
>>
>> If the target group already exists and merging is enabled, the
>> individual snapshots are moved into the existing target group,
>> provided both groups have the same owner and the source snapshots
>> are strictly newer than the target snapshots.
>>
>> Signed-off-by: Hannes Laimer
>> ---
[..]
>> +    /// Lock and move a single group. Acquires exclusive locks on both source and target
>> +    /// group paths, then moves snapshots in batches - each batch is locked, moved, and
>> +    /// released before the next batch starts. This ensures no concurrent readers (e.g.
>> +    /// verify) can operate on snapshots being moved, without exhausting file descriptors.
>> +    ///
>> +    /// Returns a `MoveGroupError` on failure, indicating whether the failure was a
>> +    /// transient lock conflict (retryable) or a hard error.
>> +    pub(crate) fn lock_and_move_group(
>
> this can be private
>
>> +        self: &Arc<Self>,
>> +        group: &BackupGroup,
>> +        source_ns: &BackupNamespace,
>> +        target_ns: &BackupNamespace,
>
> the order of parameters is different than above..
>
>> +        backend: &DatastoreBackend,
>> +        merge_groups: bool,
>> +    ) -> Result<(), MoveGroupError> {
>> +        let target_group_ns = group
>> +            .backup_ns()
>> +            .map_prefix(source_ns, target_ns)
>> +            .map_err(MoveGroupError::Hard)?;
>> +
>> +        let target_group = self.backup_group(target_group_ns.clone(), group.group().clone());
>> +
>> +        // Acquire exclusive group locks - source prevents new snapshot additions/removals,
>> +        // target prevents concurrent creation at the destination path.
>> +        let _group_lock = group.lock().map_err(MoveGroupError::Soft)?;
>> +        let _target_group_lock = target_group.lock().map_err(MoveGroupError::Soft)?;
>
> this (continued below)
>
>> +        let merge = target_group.exists();
>> +
>> +        if merge && !merge_groups {
>> +            return Err(MoveGroupError::Hard(format_err!(
>> +                "group '{}' already exists in target namespace '{target_group_ns}' \
>> +                and merging is disabled",
>> +                group.group(),
>> +            )));
>> +        }
>> +
>> +        if merge {
>> +            group
>> +                .check_merge_invariants(&target_group_ns)
>> +                .map_err(MoveGroupError::Hard)?;
>> +        }
>> +
>> +        let mut snapshots: Vec<_> = group
>> +            .iter_snapshots()
>> +            .map_err(MoveGroupError::Hard)?
>> +            .collect::<Result<Vec<_>, _>>()
>> +            .map_err(MoveGroupError::Hard)?;
>> +        // Sort by time so that a partial failure (e.g. batch N succeeds but N+1
>> +        // fails) leaves a clean time-ordered split: all moved snapshots are older
>> +        // than all remaining ones. This allows a merge retry to succeed since the
>> +        // invariant (source oldest > target newest) still holds.
>> +        snapshots.sort_by_key(|s| s.backup_time());
>
> if we need the full list of sorted backups, why not use
> group.list_backups() and BackupInfo::sort_list ?

list_backups() does extra IO (`list_backup_files()`) that we don't
actually need here.

Thanks for taking a look! I'll address the rest in a v8 :)

>> +
>> +        // Ensure the target type and group directories exist.
>> +        std::fs::create_dir_all(self.type_path(&target_group_ns, group.group().ty))
>> +            .map_err(|err| MoveGroupError::Hard(err.into()))?;
>> +        let target_group_path = self.group_path(&target_group_ns, group.group());
>> +        if !merge {
>> +            std::fs::create_dir_all(&target_group_path)
>> +                .map_err(|err| MoveGroupError::Hard(err.into()))?;
>> +            // Copy owner so the group is visible in the UI during the move (especially
>> +            // relevant for S3 where the copy can take a while).
>> +            if let Ok(owner) = group.get_owner() {
>> +                if let Err(err) = target_group.set_owner(&owner, true) {
>> +                    warn!(
>> +                        "failed to set owner for target group '{}' in '{target_group_ns}': {err:#}",
>> +                        group.group(),
>> +                    );
>> +                }
>> +            }
>> +        }
>
> (continued from above) and this could be replaced with a call to
> `create_locked_backup_group`. downside - differentiating between
> "currently locked" and "other error" is harder, but the upside is that
> that code already handles a lot of the logic for us, and is how backup,
> tape restore and pull jobs create new groups or add snapshots to
> existing groups..
>
> maybe it would be worth it to refactor that implementation, and add a
> new non-pub variant/wrapper that
> - returns whether it created the group or not
> - allows us to differentiate locking from other errors?
>
>> +
>> +        log::info!(
>> +            "{} group '{}/{}' from '{}' to '{target_group_ns}'",
>> +            if merge { "merging" } else { "moving" },
>> +            group.group().ty,
>> +            group.group().id,
>> +            group.backup_ns(),
>> +        );
>> +
>> +        // Lock and move snapshots in batches. Each batch is locked, moved, and released
>> +        // before the next batch starts. This ensures no concurrent reader (e.g. verify,
>> +        // which only takes a snapshot-level shared lock) can operate on a snapshot while
>> +        // it is being moved.
>> +        for chunk in snapshots.chunks(MAX_SNAPSHOT_LOCKS) {
>> +            let _batch_locks: Vec<_> = chunk
>> +                .iter()
>> +                .map(|snap| snap.lock().map_err(MoveGroupError::Soft))
>> +                .collect::<Result<Vec<_>, _>>()?;
>
> this is funky, but I guess it does the job ;) I'd feel a bit more
> comfortable if the Vec<_> were explicit though..
>
>> +
>> +            for snap in chunk {
>> +                group
>> +                    .move_snapshot(snap, &target_group_path, backend)
>> +                    .map_err(MoveGroupError::Hard)?;
>
> see above, I think we can leverage the type system a bit more here..
>
>> +            }
>> +            // batch locks released here
>> +        }
>> +
>> +        // Copy group notes to the target (unless merging into a group that already has notes).
>> +        let src_notes_path = self.group_notes_path(group.backup_ns(), group.group());
>> +        let dst_notes_path = self.group_notes_path(&target_group_ns, group.group());
>> +        if src_notes_path.exists() && !dst_notes_path.exists() {
>> +            if let Err(err) = std::fs::copy(&src_notes_path, &dst_notes_path) {
>
> should this be a rename? should we warn if merging and the notes have
> diverged? what if this is on S3?
>
>> +                warn!(
>> +                    "failed to copy group notes from {src_notes_path:?} to {dst_notes_path:?}: {err}"
>> +                );
>> +            }
>> +        }
>> +
[..]
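As an aside, the batch-locking shape in the patch boils down to something
like this self-contained sketch. The `Snapshot`/`Guard` types and the
`MAX_SNAPSHOT_LOCKS` value here are stand-ins, not the real datastore
types; the point is only that at most one batch worth of lock guards is
alive at a time:

```rust
// Illustrative stand-ins for the real datastore types.
const MAX_SNAPSHOT_LOCKS: usize = 16; // assumed value, not the real constant

struct Snapshot(u64);
struct Guard(#[allow(dead_code)] u64);

impl Snapshot {
    // Stand-in for the real snapshot lock; always succeeds here.
    fn lock(&self) -> Result<Guard, String> {
        Ok(Guard(self.0))
    }
}

fn move_in_batches(snapshots: &[Snapshot]) -> Result<usize, String> {
    let mut moved = 0;
    for chunk in snapshots.chunks(MAX_SNAPSHOT_LOCKS) {
        // Hold at most MAX_SNAPSHOT_LOCKS guards at a time; a single
        // failed lock aborts the whole batch via `?`.
        let _batch_locks: Vec<Guard> = chunk
            .iter()
            .map(|snap| snap.lock())
            .collect::<Result<Vec<_>, _>>()?;
        moved += chunk.len(); // move each snapshot while its lock is held
        // guards are dropped here, before the next batch starts
    }
    Ok(moved)
}

fn main() {
    let snaps: Vec<Snapshot> = (0..40u64).map(Snapshot).collect();
    // 40 snapshots => batches of 16, 16, 8; all moved.
    assert_eq!(move_in_batches(&snaps).unwrap(), 40);
}
```

So the file-descriptor ceiling per group move is bounded by the batch
size, independent of how many snapshots the group has.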
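And on the Soft/Hard split: a minimal sketch of how a caller could use
that distinction to retry only transient lock conflicts. The enum here
carries plain `String`s and the `retry_soft` helper is hypothetical,
purely to illustrate the contract, not the actual implementation in the
series:

```rust
// Illustrative only: mirrors the Soft/Hard split from the patch.
#[derive(Debug)]
enum MoveGroupError {
    /// Transient lock conflict; the caller may retry.
    Soft(String),
    /// Permanent failure; retrying will not help.
    Hard(String),
}

// Hypothetical helper: retry an operation up to `attempts` times,
// but only when it fails with a Soft (retryable) error.
fn retry_soft<F>(mut attempts: u32, mut op: F) -> Result<(), MoveGroupError>
where
    F: FnMut() -> Result<(), MoveGroupError>,
{
    loop {
        match op() {
            Ok(()) => return Ok(()),
            Err(MoveGroupError::Soft(_)) if attempts > 1 => attempts -= 1,
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    // Fails once with a lock conflict, then succeeds on the retry.
    let mut calls = 0;
    let res = retry_soft(3, || {
        calls += 1;
        if calls == 1 {
            Err(MoveGroupError::Soft("group is locked".to_string()))
        } else {
            Ok(())
        }
    });
    assert!(res.is_ok());
    assert_eq!(calls, 2);
}
```

A Hard error short-circuits immediately, which matches why e.g. the
merge-invariant violation above is mapped to `Hard` while lock
acquisition maps to `Soft`.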