From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH proxmox-backup v7 2/9] datastore: add move-group
From: Hannes Laimer <h.laimer@proxmox.com>
To: Christian Ebner, Fabian Grünbichler, pbs-devel@lists.proxmox.com
Date: Wed, 22 Apr 2026 13:11:42 +0200
Message-ID: <215e1655-04d9-49b7-a78c-d40a7d2e61d8@proxmox.com>
List-Id: Proxmox Backup Server development discussion

On 2026-04-22 12:31, Hannes Laimer wrote:
>
>
> On 2026-04-22 12:22, Christian Ebner wrote:
>> On 4/22/26 12:16 PM, Hannes Laimer wrote:
>>> On 2026-04-22 12:03, Fabian Grünbichler wrote:
>>>> On April 22, 2026 11:30 am, Christian Ebner wrote:
>>>>> On 4/22/26 11:23 AM, Hannes Laimer wrote:
>>>>>> On 2026-04-22 11:06, Fabian Grünbichler wrote:
>>>>>>> On April 22, 2026 10:40 am, Christian Ebner wrote:
>>>>>>>> On 4/16/26 7:18 PM, Hannes Laimer wrote:
>>>>>>>>> Add support for moving a single backup group to a different
>>>>>>>>> namespace within the same datastore.
>>>>>>>>>
>>>>>>>>> For the filesystem backend, each snapshot directory is renamed
>>>>>>>>> individually. For S3, all objects are copied to the target
>>>>>>>>> prefix before deleting the source, per snapshot.
>>>>>>>>>
>>>>>>>>> Exclusive locks on the group and all its snapshots are acquired
>>>>>>>>> before the move to ensure no concurrent operations are active.
>>>>>>>>> Snapshots are locked and moved in batches to avoid exhausting
>>>>>>>>> file descriptors on groups with many snapshots.
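
(For context, the batching described above looks conceptually like the
following simplified sketch. The lock_snapshot helper, the batch size,
and the path handling are hypothetical stand-ins, not the actual patch
code.)

use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical batch size; the real value would be tuned to the
// process's file descriptor budget.
const LOCK_BATCH_SIZE: usize = 128;

/// Lock and rename at most LOCK_BATCH_SIZE snapshots at a time so the
/// number of open lock files stays bounded (filesystem backend; the S3
/// backend would copy + delete per snapshot instead of renaming).
fn move_group_batched(
    snapshots: &[PathBuf],
    target_group_dir: &Path,
    lock_snapshot: impl Fn(&Path) -> io::Result<fs::File>,
) -> io::Result<()> {
    for batch in snapshots.chunks(LOCK_BATCH_SIZE) {
        // Acquire all locks for this batch; the Vec keeps them alive,
        // and dropping it at the end of the iteration releases the
        // file descriptors again.
        let _locks = batch
            .iter()
            .map(|snap| lock_snapshot(snap))
            .collect::<io::Result<Vec<_>>>()?;

        // With the whole batch locked, move each snapshot directory
        // into the target namespace.
        for snap in batch {
            let name = snap.file_name().expect("snapshot dir has a name");
            fs::rename(snap, target_group_dir.join(name))?;
        }
    }
    Ok(())
}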
>>>>>>>>
>>>>>>>> Unless I overlooked it, there is currently still one major issue
>>>>>>>> which can lead to data loss with this:
>>>>>>>>
>>>>>>>> Garbage collection uses the Datastore's list_index_files() method
>>>>>>>> to collect all index files at the start of phase 1. This is to
>>>>>>>> know which chunks need atime updates to mark them as in use.
>>>>>>>> Snapshots which disappear in the meantime can be ignored, as
>>>>>>>> their chunks may then no longer be in use. Snapshots created in
>>>>>>>> the meantime are safe, as the cutoff time protects newly written
>>>>>>>> chunks which are not referenced by any of the collected index
>>>>>>>> files.
>>>>>>>>
>>>>>>>> But if the move happens after GC has started and collected the
>>>>>>>> index files, but before it reaches the moved index file, that
>>>>>>>> index file might still reference chunks which are in use, but
>>>>>>>> they now never get an atime update.
>>>>>>>>
>>>>>>>> Locking unfortunately does not protect against this.
>>>>>>>>
>>>>>>>> So if there is an ongoing garbage collection phase 1, some
>>>>>>>> mechanism is needed to re-inject the moved index files into the
>>>>>>>> list of indices, and therefore the chunks to process.
>>>>>>>> This might require writing the moved indices to a file, so they
>>>>>>>> can be read and processed at the end of GC phase 1 even if GC is
>>>>>>>> running in a different process. And it requires flocking that
>>>>>>>> file and waiting for it to become available before continuing.
>>>>>>>
>>>>>>> Or moving could obtain the GC lock, and you simply cannot move
>>>>>>> while a GC is running or start a GC while a move is in progress?
>>>>>>> Though the latter might be problematic.. it is already possible
>>>>>>> to block GC in practice if you have a writer that never finishes
>>>>>>> (assuming the proxy is reloaded every once in a while, which
>>>>>>> happens at least once per day).
>>>>>>>
>>>>>>> I guess your approach is similar to the trash feature we've
>>>>>>> discussed a while back (just without restoring from trash and all
>>>>>>> the associated complexity ;)).. it would only require blocking
>>>>>>> moves during this "phase 1.5" instead of the whole GC, which
>>>>>>> would of course be nice.. but it also increases the amount of
>>>>>>> work move needs to do by quite a bit..
>>>>>>
>>>>>> Is it that much, though? It would just be appending a line to a
>>>>>> file for every moved index; compared to the actual moving itself,
>>>>>> this seems rather minor, no?
>>>>
>>>> It is another readdir + parsing for each moved snapshot, which is
>>>> definitely not nothing.. but it also isn't that bad.
>>>>
>>>> We'd also need a mechanism to support multiple moves, so we need a
>>>> shared lock (obtained by moves) that allows creating and writing
>>>> such files, and an exclusive lock (obtained by GC) that allows
>>>> processing and clearing them?
>>>>
>>>
>>> Yes, at least that's what I had in mind. Line appends are atomic, so
>>> concurrent moves writing lines to the same file wouldn't be a
>>> problem. And a second move would have to wait until GC is done
>>> draining (if it started while GC is draining), at which point GC has
>>> already touched everything that might be moved by the second move.
>>>
>>> Hope I didn't miss anything, but this should work.
>>> (Technically, continuous moves could starve the GC because flock
>>> grants shared locks ahead of exclusive ones...
>>> but that seems to be a rather contrived problem, since moves are
>>> user-triggered...)
>>
>> Why not use only exclusive locks then? The operation of appending a
>> new line should be rather limited in duration, and I think it would
>> be okay to have a short timeout when trying to acquire the flock.
>
> The lock has to be held for the whole move, not just the append:
>
> - move renames s1, briefly takes the exclusive lock, writes s1,
>   releases
> - GC's drain acquires the exclusive lock (nobody holds it), drains s1,
>   truncates, releases -> phase 2 starts
> - move renames s2, takes the exclusive lock, writes s2, releases
>
> s2's entry sits in the journal but won't be read... we'd have the same
> problem again. Exclusive locks held over the whole move would also
> block other moves that wouldn't really have to wait.

Had a short off-list discussion with @Chris, and he's right: if we do
the append *before* the rename, we don't need to hold the lock for the
whole move. We can use exclusive locks and only lock around the appends.
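
Roughly, a sketch of what that could look like (simplified: the journal
path and the mark_in_use callback are placeholders, and it uses
libc::flock directly instead of our usual locking helpers):

use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Placeholder location; the real journal would live somewhere under
// the datastore's base directory.
const MOVE_JOURNAL: &str = "/path/to/datastore/.gc-moved-indices";

// Blocks until the exclusive lock is granted; the lock is released
// when the file handle is closed.
fn flock_exclusive(file: &std::fs::File) -> std::io::Result<()> {
    if unsafe { libc::flock(file.as_raw_fd(), libc::LOCK_EX) } != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}

/// Move side: called *before* the rename. Records the destination path
/// of the moved index, holding the flock only for the append itself.
fn journal_moved_index(new_index_path: &Path) -> std::io::Result<()> {
    let journal = OpenOptions::new()
        .create(true)
        .append(true)
        .open(MOVE_JOURNAL)?;
    flock_exclusive(&journal)?;
    writeln!(&journal, "{}", new_index_path.display())?;
    Ok(()) // dropping `journal` releases the flock
}

/// GC side: called at the end of phase 1. Touches the chunks of every
/// journaled index, then truncates the journal before releasing the
/// lock, so phase 2 starts from a clean slate.
fn drain_move_journal(mark_in_use: impl Fn(&Path)) -> std::io::Result<()> {
    let journal = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(MOVE_JOURNAL)?;
    flock_exclusive(&journal)?;
    for line in std::fs::read_to_string(MOVE_JOURNAL)?.lines() {
        let path = Path::new(line);
        // An entry whose index is not readable yet is safe to skip:
        // since the append happens before the rename, that index was
        // still at its old path for all of phase 1 and was already
        // processed from GC's original list.
        if path.exists() {
            mark_in_use(path);
        }
    }
    journal.set_len(0)?; // clear the processed entries
    Ok(())
}

The drain skipping unreadable entries is what makes append-before-rename
sufficient: by the time an index disappears from its old path, its
journal entry is already visible to the drain.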