From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Friedrich Weber <f.weber@proxmox.com>
Subject: Re: [pve-devel] [PATCH storage/zfsonlinux v2 0/3] fix #4997: lvm: avoid autoactivating (new) LVs after boot
Date: Fri, 7 Mar 2025 13:14:53 +0100 (CET)
Message-ID: <1233088344.6300.1741349693091@webmail.proxmox.com>
In-Reply-To: <20250307095245.65698-1-f.weber@proxmox.com>

> Friedrich Weber <f.weber@proxmox.com> hat am 07.03.2025 10:52 CET geschrieben:
> # Summary
> 
> With default settings, LVM autoactivates LVs when it sees a new VG, e.g. after
> boot or iSCSI login. In a cluster with guest disks on a shared LVM VG (e.g. on
> top of iSCSI/Fibre Channel (FC)/direct-attached SAS), this can indirectly cause
> guest creation or migration to fail. See bug #4997 [1] and patch #3 for
> details.
> 
> The goal of this series is to avoid autoactivating LVs that hold guest disks in
> order to fix #4997. With this series applied, (newly created) LVs should only
> be active on the node that hosts the corresponding guest.
> 
> # Difference to v1
> 
> v1 [0] went with an approach that creates new LVs with the "activation skip"
> flag set, which means lvchange will only activate the LV if the `-K` flag is
> passed. This approach requires a two-step patch apply process to not break
> mixed-version clusters: With PVE 8.4, we would add a change that passes the
> `-K` flag when activating, and only with PVE 9 would we also create new LVs
> with the "activation skip" flag. This approach works but is relatively complex.
> 
> # This v2 approach
> 
> This v2 goes a different route: Instead of the "activation skip" flag, we use
> the `--setautoactivation n` flag for LVs. This flag is available in Debian's
> LVM since Bookworm and only concerns autoactivation, not normal activation as
> performed by our LVM plugin. Hence, it is enough to create new LVs with
> `--setautoactivation n`; we don't require any two-step patch apply process. We
> may still need to wait for PVE 9 to apply this though (see next section).
> Considering this, the advantage over v1 may appear quite small, though I still
> find v2's approach preferable because the `--setautoactivation` flag seems less
> invasive than the activation skip flag.

thanks for the clear description and analysis! :)
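
for reference, on the CLI the flag boils down to something like the following
(untested sketch from memory, VG/LV names made up):

    # create a guest disk LV with autoactivation disabled
    lvcreate --setautoactivation n -L 32G -n vm-100-disk-0 sharedvg

    # the flag shows up in the lvs/vgs reports ...
    lvs -o lv_name,autoactivation sharedvg

    # ... and explicit (de)activation as done by the storage plugin is
    # unaffected
    lvchange -ay sharedvg/vm-100-disk-0
    lvchange -an sharedvg/vm-100-disk-0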

> # Mixed 7/8 cluster
> 
> Unfortunately, we need to consider the mixed-version cluster between PVE 7 and
> PVE 8 because PVE 7/Bullseye's LVM does not know `--setautoactivation`. A user
> upgrading from PVE 7 will temporarily have a mixed 7/8 cluster. Once this
> series is applied, the PVE 8 nodes will create new LVs with
> `--setautoactivation n`, which the PVE 7 nodes do not know. In my tests, the
> PVE 7 nodes can read/interact with such LVs just fine, *but*: As soon as a PVE
> 7 node creates a new (unrelated) LV, the `--setautoactivation n` flag is reset
> to default `y` on *all* LVs of the VG. I presume this is because creating a new
> LV rewrites metadata, and the PVE 7 LVM doesn't write out the
> `--setautoactivation n` flag. I imagine (have not tested) this will cause
> problems on a mixed cluster.
> 
> Hence, it may be safer to hold off on applying patches #2/#3 until PVE 9,
> because then we can be sure all nodes run at least PVE 8.

wow - I wonder what other similar timebombs are lurking there??

e.g., from the changelog for bullseye -> bookworm:

2.03.15: "Increase some hash table size to better support large device sets." (hopefully those are in memory?)
2.03.14: various VDO things (not used by us at least)
2.03.12: "Allow attaching cache to thin data volume." and other thin-related changes (not relevant for shared thankfully), auto activation, more VDO things

bookworm -> trixie:

2.03.28: various things using radix_trees now (hopefully in memory?)
2.03.25: same
2.03.24: "Support external origin between different thin-pool." & "Support creation of thin-pool with VDO use for its data volume" (local only thankfully)
2.03.23: "Set the first lv_attr flag for raid integrity images to i or I." (not used by us, maybe in memory only)
2.03.22: "Support parsing of vdo geometry format version 4.", "Fix parsing of VDO metadata.", "Allow snapshots of raid+integrity LV." (all not applicable to regular PVE setups)
2.03.18: "Add support for writecache metadata_only and pause_writeback settings." (?)

so at least nothing that jumps out as definitely problematic for our common setups..

> # Interaction with zfs-initramfs
> 
> One complication is that zfs-initramfs ships an initramfs-tools script that
> unconditionally activates *all* VGs that are visible at boot time, see [2].
> This is the case for local VGs, but also for shared VGs on FC/SAS. Patch #2 of
> this series fixes this by making the script perform autoactivation instead,
> which honors the `--setautoactivation` flag. The patch is necessary to fix
> #4997 on FC/SAS-attached LUNs too.
> 
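
IIUC the gist of that patch is switching the script from plain activation to
autoactivation, i.e. roughly the following (sketch, not the literal diff):

    # before: activate all LVs of every visible VG, ignoring the flag
    lvm vgchange -ay --sysinit
    # after: only autoactivate VGs/LVs without --setautoactivation n
    lvm vgchange -aay --sysinit
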
> # Bonus fix for FC/SAS multipath+LVM issue
> 
> As it turns out, this series seems to additionally fix an issue on hosts with
> LVM on FC/SAS-attached LUNs *with multipath* where LVM would report "Device
> mismatch detected" warnings because the LVs are activated too early in the boot
> process before multipath is available. Our current suggested workaround is to
> install multipath-tools-boot [2]. With this series applied, this shouldn't be
> necessary anymore, as (newly created) LVs are not auto-activated after boot.
> 
> # LVM-thick/LVM-thin
> 
> Note that this change affects all LVs on LVM-thick, not just ones on shared
> storage. As a result, even on single-node hosts, local guest disk LVs on
> LVM-thick will no longer be automatically active after boot (after applying
> all patches of this series). Guest disk LVs on LVM-thin will still be
> auto-activated, but since LVM-thin storage is necessarily local, we don't run
> into #4997 here.

we could check the shared property, but I don't think having them not
auto-activated hurts as long as it is documented..
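
users who do rely on local LVs coming up at boot can always opt back in per
LV, e.g. (made-up names):

    lvchange --setautoactivation y pve/vm-100-disk-0
    # or simply activate the LV once from a unit/hook
    lvchange -ay pve/vm-100-disk-0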

> 
> # Transition to LVs with `--setautoactivation n`
> 
> Both v1 and v2 approaches only take effect for new LVs, so we should probably
> also have pve8to9 check for guest disks on (shared?) LVM that have
> autoactivation enabled, and suggest that the user manually disable
> autoactivation on the LVs, or even the entire VG if it holds only PVE-managed
> LVs.

if we want to wait for PVE 9 anyway to start enabling (disabling? ;)) it, then
the upgrade script would be a nice place to tell users to fix up their volumes?
OTOH, setting the flag automatically starting with PVE 9 also for existing
volumes should have no downsides, we need to document anyway that the behaviour
there changed (so that people that rely on them becoming auto-activated on boot
can adapt whatever is relying on that).. or we could provide a script that does
it post-upgrade..
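
such a post-upgrade script wouldn't need to be much more than the following
(sketch only, made-up VG name, would need the usual safety checks and
filtering for PVE-owned LVs):

    # disable autoactivation on the VG itself and on all existing LVs in it
    vgchange --setautoactivation n sharedvg
    for lv in $(lvs --noheadings -o lv_name sharedvg); do
        lvchange --setautoactivation n "sharedvg/$lv"
    done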

> We could implement something on top to make the transition smoother, some ideas:
> 
> - When activating existing LVs, check the auto activation flag, and if auto
>   activation is still enabled, disable it.

the only question is whether we want to "pay" for that on each activate_volume?
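
the "price" would essentially be one extra LVM invocation per activation to
query the flag, plus a one-time lvchange if it is still enabled, e.g.
something along these lines (made-up names):

    # query the current autoactivation setting for the LV
    lvs --noheadings -o autoactivation sharedvg/vm-100-disk-0
    # ... and disable it if it is still enabled
    lvchange --setautoactivation n sharedvg/vm-100-disk-0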

> - When creating a new VG via the LVM storage plugin or the /disks/lvm API
>   endpoints, disable autoactivation for the whole VG. But this may become
>   confusing as then we'd deal with the flag on both LVs and VGs. Also, especially
>   for FC/SAS-attached LUNs, users may not use the GUI/API and instead create the
>   VG via the command line. We could adjust our guides to use `vgcreate
>   --setautoactivation n` [4], but not all users may follow these guides.
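
FWIW, on the command line that part would just be something like the
following (untested, example device/VG names):

    # create the shared VG with autoactivation disabled from the start
    vgcreate --setautoactivation n sharedvg /dev/mapper/mpatha
    vgs -o vg_name,autoactivation sharedvg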


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Thread overview: 13+ messages
2025-03-07  9:52 Friedrich Weber
2025-03-07  9:52 ` [pve-devel] [PATCH zfsonlinux v2 1/3] zfs-initramfs: add patch to avoid activating all LVs at boot Friedrich Weber
2025-03-07 10:57   ` Friedrich Weber
2025-03-07 11:49   ` Fabian Grünbichler
2025-03-07 17:01     ` Friedrich Weber
2025-03-16 19:24   ` [pve-devel] applied: " Thomas Lamprecht
2025-03-07  9:52 ` [pve-devel] [PATCH pve-storage v2 2/3] lvm: create: use multiple lines for lvcreate commandline definition Friedrich Weber
2025-03-07  9:52 ` [pve-devel] [PATCH pve-storage v2 3/3] fix #4997: lvm: create: disable autoactivation for new logical volumes Friedrich Weber
2025-03-07 12:14 ` Fabian Grünbichler [this message]
2025-03-10 14:01   ` [pve-devel] [PATCH storage/zfsonlinux v2 0/3] fix #4997: lvm: avoid autoactivating (new) LVs after boot Friedrich Weber
2025-03-11 10:40     ` Fabian Grünbichler
2025-03-12  8:39       ` Friedrich Weber
2025-04-29 11:43         ` Friedrich Weber
