From: Christian Ebner <c.ebner@proxmox.com>
To: Hannes Laimer, Fabian Grünbichler, pbs-devel@lists.proxmox.com
Date: Wed, 22 Apr 2026 11:30:13 +0200
Subject: Re: [PATCH proxmox-backup v7 2/9] datastore: add move-group
Message-ID: <8fa35401-0228-4f09-a9fc-1c66de829d48@proxmox.com>
References: <20260416171830.266553-1-h.laimer@proxmox.com> <20260416171830.266553-3-h.laimer@proxmox.com> <84cfe249-23bf-4498-90e1-90b44dd944b2@proxmox.com> <1776847977.nipqfzc6ef.astroid@yuna.none>
On 4/22/26 11:23 AM, Hannes Laimer wrote:
> On 2026-04-22 11:06, Fabian Grünbichler wrote:
>> On April 22, 2026 10:40 am, Christian Ebner wrote:
>>> On 4/16/26 7:18 PM, Hannes Laimer wrote:
>>>> Add support for moving a single backup group to a different
>>>> namespace within the same datastore.
>>>>
>>>> For the filesystem backend, each snapshot directory is renamed
>>>> individually. For S3, all objects are copied to the target prefix
>>>> before deleting the source, per snapshot.
>>>>
>>>> Exclusive locks on the group and all its snapshots are acquired
>>>> before the move to ensure no concurrent operations are active.
>>>> Snapshots are locked and moved in batches to avoid exhausting
>>>> file descriptors on groups with many snapshots.
>>>
>>> Unless I am overlooking something, there is currently still one
>>> major issue here which can lead to data loss:
>>>
>>> Garbage collection uses the datastore's list_index_files() method
>>> to collect all index files at the start of phase 1, so it knows
>>> which chunks need atime updates to mark them as in use. Snapshots
>>> which disappear in the meantime can be ignored, as their chunks
>>> may then no longer be in use. Snapshots created in the meantime
>>> are safe, as newly written chunks not referenced by any of the
>>> listed index files are protected by the cutoff time.
>>>
>>> But if the move happens after GC has started and collected the
>>> index files, but before it reaches the moved index file, that
>>> index might still reference chunks which are in use, but which now
>>> never get an atime update.
>>>
>>> Locking unfortunately does not protect against this.
>>>
>>> So if a garbage collection is in phase 1, there needs to be some
>>> mechanism to re-inject the moved index files into the list of
>>> indices, and therefore the chunks to process.
>>> This might require writing the moved indices to a file, so they
>>> can be read and processed at the end of GC phase 1 even if GC is
>>> running in a different process. And it requires taking an flock on
>>> that file and waiting for it to become available before
>>> continuing.
>>
>> Or moving could obtain the GC lock, so you simply cannot move while
>> a GC is running or start a GC while a move is in progress? Though
>> the latter might be problematic.. it is already possible to block
>> GC in practice if you have a writer that never finishes (assuming
>> the proxy is reloaded every once in a while, which happens at least
>> once per day).
>>
>> I guess your approach is similar to the trash feature we discussed
>> a while back (just without restoring from trash and all the
>> associated complexity ;)).. it would only require blocking moves
>> during this "phase 1.5" instead of the whole GC, which would of
>> course be nice.. but it also increases the amount of work move
>> needs to do by quite a bit..
>
> Is it that much, though? It would just be appending a line to a file
> for every moved index; compared to the actual moving itself, this
> seems rather minor, no?
>
> And GC would just have to read this file (if it exists) once at the
> end of phase 1.

Another option might be to re-list all the index files at the end of
phase 1, and process any that were not already processed?