Date: Wed, 22 Apr 2026 12:05:01 +0200
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Christian Ebner, Hannes Laimer, pbs-devel@lists.proxmox.com
Subject: Re: [PATCH proxmox-backup v7 2/9] datastore: add move-group
Message-Id: <1776851956.4wbeejt3wr.astroid@yuna.none>
In-Reply-To: <8fa35401-0228-4f09-a9fc-1c66de829d48@proxmox.com>
References: <20260416171830.266553-1-h.laimer@proxmox.com> <20260416171830.266553-3-h.laimer@proxmox.com> <84cfe249-23bf-4498-90e1-90b44dd944b2@proxmox.com> <1776847977.nipqfzc6ef.astroid@yuna.none> <8fa35401-0228-4f09-a9fc-1c66de829d48@proxmox.com>

On April 22, 2026 11:30 am, Christian Ebner wrote:
> On 4/22/26 11:23 AM, Hannes Laimer wrote:
>> On 2026-04-22 11:06, Fabian Grünbichler wrote:
>>> On April 22, 2026 10:40 am, Christian Ebner wrote:
>>>> On 4/16/26 7:18 PM, Hannes Laimer wrote:
>>>>> Add support for moving a single backup group to a different namespace
>>>>> within the same datastore.
>>>>>
>>>>> For the filesystem backend each snapshot directory is renamed
>>>>> individually. For S3 all objects are copied to the target prefix
>>>>> before deleting the source, per snapshot.
>>>>>
>>>>> Exclusive locks on the group and all its snapshots are acquired
>>>>> before the move to ensure no concurrent operations are active.
>>>>> Snapshots are locked and moved in batches to avoid exhausting file
>>>>> descriptors on groups with many snapshots.
>>>>
>>>> Unless I overlooked it, there is currently still one major issue which
>>>> can lead to data loss with this:
>>>>
>>>> Garbage collection uses the Datastore's list_index_files() method to
>>>> collect all index files at the start of phase 1. This is to know which
>>>> chunks need atime updates to mark them as in use. Snapshots which
>>>> disappear in the meantime can be ignored, as the chunks may then no
>>>> longer be in use.
>>>> Snapshots created in the meantime are safe, as there the cutoff time
>>>> protects newly written chunks which are not referenced by any of the
>>>> index files that are now missing from the list.
>>>>
>>>> But if the move happens after GC has started and collected the index
>>>> files, but before it reaches those index files, the moved index files
>>>> might still reference chunks which are in use, but these now never get
>>>> an atime update.
>>>>
>>>> Locking unfortunately does not protect against this.
>>>>
>>>> So if there is an ongoing garbage collection phase 1, there needs to be
>>>> some mechanism to re-inject the moved index files into the list of
>>>> indices, and therefore chunks, to process.
>>>> This might require writing the moved indices to a file, so they can be
>>>> read and processed at the end of GC phase 1 even if GC is running in a
>>>> different process. And it requires flocking that file and waiting for it
>>>> to become available before continuing.
>>>
>>> or moving could obtain the GC lock, and you simply cannot move while a
>>> GC is running or start a GC while a move is in progress? though the
>>> latter might be problematic.. it is already possible to block GC in
>>> practice if you have a writer that never finishes (assuming the proxy is
>>> reloaded every once in a while, which happens once per day at least).
>>>
>>> I guess your approach is similar to the trash feature we've discussed a
>>> while back (just without restoring from trash and all the associated
>>> complexity ;)).. it would only require blocking moves during this "phase
>>> 1.5" instead of the whole GC, which would of course be nice.. but it
>>> also increases the amount of work move needs to do by quite a bit..
>>
>> is it that much though? it would just be appending a line to a file for
>> every moved index, which, compared to the actual moving itself, seems
>> rather minor, no?

it is another readdir + parsing for each moved snapshot, which is
definitely not nothing.. but it also isn't that bad.

we'd also need a mechanism to support multiple moves, so we'd need a
shared lock (obtained by moves) that allows creating and writing such
files, and an exclusive lock (obtained by GC) that allows processing
and clearing them?

>> and gc would just have to read this file (if it exists) once at the end
>> of phase one
>
> Another option might be to re-list all the index files at the end of
> phase 1, and process any not already processed ones?

that is much more expensive, and in particular, it also makes the "there
were no parallel moves" case more expensive, which we should avoid..
and we'd also need to forbid further moves while this second processing
is going on, or repeat it possibly very often..
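
for illustration, the journal + lock scheme could look roughly like the
following (untested sketch - the journal file name and the helper names
record_moved_index/drain_moved_indices are made up and not existing
proxmox-backup code; it uses flock(2) via the libc crate and ignores the
"move after the journal was drained" window discussed above):

use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Write};
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};

/// Hypothetical per-datastore journal of index files moved during GC phase 1.
const MOVED_INDEX_JOURNAL: &str = ".gc-moved-indices";

/// Small wrapper around flock(2); blocks until the requested lock is granted.
fn lock(file: &File, operation: libc::c_int) -> std::io::Result<()> {
    if unsafe { libc::flock(file.as_raw_fd(), operation) } == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}

/// Move side: append the new path of a relocated index file under a shared
/// lock, so multiple concurrent moves do not block each other (O_APPEND keeps
/// the individual writes from interleaving).
fn record_moved_index(store_base: &Path, new_index_path: &Path) -> std::io::Result<()> {
    let mut journal = OpenOptions::new()
        .create(true)
        .append(true)
        .open(store_base.join(MOVED_INDEX_JOURNAL))?;
    lock(&journal, libc::LOCK_SH)?;
    writeln!(journal, "{}", new_index_path.display())?;
    lock(&journal, libc::LOCK_UN)
}

/// GC side, end of phase 1: take the exclusive lock (waiting for in-flight
/// moves to finish their appends), read all recorded index paths, clear the
/// journal and hand the paths back so their chunks can still be marked as
/// in use before phase 2 starts.
fn drain_moved_indices(store_base: &Path) -> std::io::Result<Vec<PathBuf>> {
    let journal = OpenOptions::new()
        .create(true)
        .read(true)
        .write(true)
        .open(store_base.join(MOVED_INDEX_JOURNAL))?;
    lock(&journal, libc::LOCK_EX)?;
    let paths: Vec<PathBuf> = BufReader::new(&journal)
        .lines()
        .collect::<std::io::Result<Vec<String>>>()?
        .into_iter()
        .map(PathBuf::from)
        .collect();
    journal.set_len(0)?; // entries are processed, start fresh for later moves
    lock(&journal, libc::LOCK_UN)?;
    Ok(paths)
}

a move would call record_moved_index() for every index it relocates while
a GC phase 1 might be running, and GC would call drain_moved_indices()
once at the end of phase 1 and mark the returned indices' chunks before
continuing - plus whatever we decide to do about moves happening after
the drain.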