From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Friedrich Weber <f.weber@proxmox.com>
Date: Fri, 7 Mar 2025 13:14:53 +0100 (CET)
Message-ID: <1233088344.6300.1741349693091@webmail.proxmox.com>
In-Reply-To: <20250307095245.65698-1-f.weber@proxmox.com>
Subject: Re: [pve-devel] [PATCH storage/zfsonlinux v2 0/3] fix #4997: lvm:
 avoid autoactivating (new) LVs after boot

> Friedrich Weber <f.weber@proxmox.com> wrote on 07.03.2025 10:52 CET:
>
> # Summary
>
> With default settings, LVM autoactivates LVs when it sees a new VG, e.g.
> after boot or iSCSI login. In a cluster with guest disks on a shared LVM VG
> (e.g. on top of iSCSI/Fibre Channel (FC)/direct-attached SAS), this can
> indirectly cause guest creation or migration to fail. See bug #4997 [1] and
> patch #3 for details.
>
> The goal of this series is to avoid autoactivating LVs that hold guest disks
> in order to fix #4997. With this series applied, (newly created) LVs should
> only be active on the node that hosts the corresponding guest.
>
> # Difference to v1
>
> v1 [0] went with an approach that creates new LVs with the "activation skip"
> flag set, which means lvchange will only activate the LV if the `-K` flag is
> passed. This approach requires a two-step patch apply process to not break
> mixed-version clusters: With PVE 8.4, we would add a change that passes the
> `-K` flag when activating, and only with PVE 9 we would also create new LVs
> with the "activation skip" flag. This approach works but is relatively
> complex.
>
> # This v2 approach
>
> This v2 goes a different route: Instead of the "activation skip" flag, we
> use the `--setautoactivation n` flag for LVs. This flag is available in
> Debian's LVM since Bookworm and only concerns autoactivation, not normal
> activation as performed by our LVM plugin. Hence, it is enough to create new
> LVs with `--setautoactivation n`; we don't require any two-step patch apply
> process. We may still need to wait for PVE 9 to apply this though (see next
> section). Considering this, the advantage over v1 may appear quite small,
> though I still find v2's approach preferable because the
> `--setautoactivation` flag seems less invasive than the activation skip
> flag.

thanks for the clear description and analysis! :)

> # Mixed 7/8 cluster
>
> Unfortunately, we need to consider the mixed-version cluster between PVE 7
> and PVE 8 because PVE 7/Bullseye's LVM does not know `--setautoactivation`.
> A user upgrading from PVE 7 will temporarily have a mixed 7/8 cluster. Once
> this series is applied, the PVE 8 nodes will create new LVs with
> `--setautoactivation n`, which the PVE 7 nodes do not know. In my tests, the
> PVE 7 nodes can read/interact with such LVs just fine, *but*: As soon as a
> PVE 7 node creates a new (unrelated) LV, the `--setautoactivation n` flag is
> reset to the default `y` on *all* LVs of the VG. I presume this is because
> creating a new LV rewrites metadata, and the PVE 7 LVM doesn't write out the
> `--setautoactivation n` flag. I imagine (have not tested) this will cause
> problems on a mixed cluster.
>
> Hence, it may be safer to hold off on applying patches #2/#3 until PVE 9,
> because then we can be sure all nodes run at least PVE 8.

wow - I wonder what other similar timebombs are lurking there?? e.g., from the
changelog for bullseye -> bookworm:

2.03.15: "Increase some hash table size to better support large device sets."
  (hopefully those are in memory?)
2.03.14: various VDO things (not used by us at least)
2.03.12: "Allow attaching cache to thin data volume." and other thin-related
  changes (not relevant for shared thankfully), auto activation, more VDO
  things

bookworm -> trixie:

2.03.28: various things using radix_trees now (hopefully in memory?)
2.03.25: same
2.03.24: "Support external origin between different thin-pool." & "Support
  creation of thin-pool with VDO use for its data volume" (local only
  thankfully)
2.03.23: "Set the first lv_attr flag for raid integrity images to i or I."
  (not used by us, maybe in memory only)
2.03.22: "Support parsing of vdo geometry format version 4.", "Fix parsing of
  VDO metadata.", "Allow snapshots of raid+integrity LV." (all not applicable
  to regular PVE setups)
2.03.18: "Add support for writecache metadata_only and pause_writeback
  settings." (?)

so at least nothing that jumps out as definitely problematic for our common
setups..
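as an aside, a quick way to check and exercise the flag on a test VG (a
minimal sketch, assuming Bookworm's lvm2 exposes an `autoactivation` report
field; the VG/LV names are made up):

    # create an LV with autoactivation disabled, as patch #1 would do
    lvcreate --setautoactivation n -n vm-100-disk-0 -L 1G sharedvg

    # report the per-LV autoactivation setting; LVs created with
    # `--setautoactivation n` should show an empty field here
    lvs -o lv_name,autoactivation sharedvg

    # autoactivation (as after boot/iSCSI login) honors the flag ...
    vgchange -aay sharedvg

    # ... while the plain activation done by the LVM plugin does not
    lvchange -ay sharedvg/vm-100-disk-0

on a mixed 7/8 cluster, re-running the `lvs` query after a PVE 7 node created
an LV should make the flag reset described above visible.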
> # Interaction with zfs-initramfs
>
> One complication is that zfs-initramfs ships an initramfs-tools script that
> unconditionally activates *all* VGs that are visible at boot time, see [2].
> This is the case for local VGs, but also for shared VGs on FC/SAS. Patch #2
> of this series fixes this by making the script perform autoactivation
> instead, which honors the `--setautoactivation` flag. The patch is necessary
> to fix #4997 on FC/SAS-attached LUNs too.
>
> # Bonus fix for FC/SAS multipath+LVM issue
>
> As it turns out, this series seems to additionally fix an issue on hosts
> with LVM on FC/SAS-attached LUNs *with multipath*, where LVM would report
> "Device mismatch detected" warnings because the LVs are activated too early
> in the boot process, before multipath is available. Our current suggested
> workaround is to install multipath-tools-boot [2]. With this series applied,
> this shouldn't be necessary anymore, as (newly created) LVs are not
> auto-activated after boot.
>
> # LVM-thick/LVM-thin
>
> Note that this change affects all LVs on LVM-thick, not just ones on shared
> storage. As a result, also on single-node hosts, local guest disk LVs on
> LVM-thick will not be automatically active after boot anymore (after
> applying all patches of this series). Guest disk LVs on LVM-thin will still
> be auto-activated, but since LVM-thin storage is necessarily local, we don't
> run into #4997 here.

we could check the shared property, but I don't think having them not
auto-activated hurts as long as it is documented..

> # Transition to LVs with `--setautoactivation n`
>
> Both v1 and v2 approaches only take effect for new LVs, so we should
> probably also have pve8to9 check for guest disks on (shared?) LVM that have
> autoactivation enabled, and suggest to the user to manually disable
> autoactivation on the LVs, or even the entire VG if it holds only
> PVE-managed LVs.

if we want to wait for PVE 9 anyway to start enabling (disabling? ;)) it, then
the upgrade script would be a nice place to tell users to fix up their
volumes? OTOH, setting the flag automatically starting with PVE 9 also for
existing volumes should have no downsides; we need to document anyway that the
behaviour there changed (so that people who rely on LVs becoming
auto-activated on boot can adapt whatever depends on that).. or we could
provide a script that does it post-upgrade..

> We could implement something on top to make the transition smoother, some
> ideas:
>
> - When activating existing LVs, check the autoactivation flag, and if
> autoactivation is still enabled, disable it.

the only question is whether we want to "pay" for that on each
activate_volume?

> - When creating a new VG via the LVM storage plugin or the /disks/lvm API
> endpoints, disable autoactivation for the whole VG. But this may become
> confusing, as then we'd deal with the flag on both LVs and VGs. Also,
> especially for FC/SAS-attached LUNs, users may not use the GUI/API and
> instead create the VG via the command line. We could adjust our guides to
> use `vgcreate --setautoactivation n` [4], but not all users may follow these
> guides.
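for completeness, such a post-upgrade fix-up could look roughly like this on
the CLI (a hypothetical sketch, not part of this series; the VG name and the
guest-disk name pattern are assumptions):

    # disable autoactivation for a whole VG that holds only PVE-managed LVs
    vgchange --setautoactivation n sharedvg

    # or, more conservatively, only for guest disk LVs matching the usual
    # vm-<vmid>-disk-<n> naming
    for lv in $(lvs --noheadings -o lv_name sharedvg | grep -E 'vm-[0-9]+-disk'); do
        lvchange --setautoactivation n "sharedvg/$lv"
    done

setting the flag only touches metadata and shouldn't change the current
activation state, so it should be safe to run on a live system.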