* [PATCH storage,cluster,manager 0/13] multipath: cluster-wide config, storage and health overview
From: Thomas Lamprecht @ 2026-06-26 12:07 UTC (permalink / raw)
  To: pve-devel

This is a proof of concept for better multipath support in PVE. I talked it
over with Friedrich back in May and put it together to get early feedback,
not least because other multipath and iSCSI work is in progress (for example
Mira's storage mapping series).

Today multipath is set up by hand on each node, with nothing managing it
cluster-wide. This series tries to improve that by adding:

 - cluster-wide config: a new /etc/pve/multipath.cfg in pmxcfs, kept as a
   SectionConfig with a 'defaults' section for the global multipathd knobs plus
   one 'wwid' section per allow-listed LUN holding its optional alias and any
   per-LUN knobs. Free-form hardware overrides live in a separate
   /etc/pve/multipath-overrides.conf. pvestatd renders both into the local
   multipathd drop-ins, so you set it up once for the whole cluster. Map names
   stay WWID-based and identical on every node (user_friendly_names no,
   find_multipaths strict).

 - multipath as storage: a new 'multipath' storage type exposes the maps as
   raw volumes by WWID at the stable /dev/disk/by-id path. An LVM storage can
   use it as its 'base', so a shared volume group gets path redundancy with no
   manual device setup. This standalone type is deliberately provisional: the
   cleaner long-term shape is multipath as a capability toggled on the
   transport plugins (iSCSI/FC/NVMe-oF) rather than a peer storage type. It is
   kept standalone here to keep the POC self-contained, and because FC has no
   PVE transport plugin to hang such a capability off.

 - health overview: each node publishes its per-WWID map health into the
   pmxcfs KV store, and /cluster/multipath/status turns that into a per-WWID
   by per-node matrix with a rolled-up cluster-state. The web UI adds a
   Datacenter "Multipath" panel (table plus config editor) and a read-only
   per-node view under Disks. This matrix is the most generalizable piece;
   it is really a per-resource, per-node health roll-up and could become a
   small shared primitive that other features reuse.
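
The roll-up described in the health overview item can be sketched roughly as
follows. This is a minimal illustration, not the code from the series: the
function name, the input shape ({node: {wwid: state}}), the 'healthy' state
name and the "worst state wins" ordering are all assumptions. Only 'degraded'
and 'missing' are named in the text above.

```python
# Hypothetical severity order; "worst state wins" is an assumption.
SEVERITY = {"healthy": 0, "degraded": 1, "missing": 2}


def build_matrix(reports, expected_nodes):
    """Turn per-node reports into a per-WWID by per-node matrix.

    reports:        {node: {wwid: state}} as each node might publish it
    expected_nodes: nodes that are expected to see every listed WWID
    """
    wwids = {wwid for per_node in reports.values() for wwid in per_node}
    matrix = {}
    for wwid in sorted(wwids):
        # Consider expected nodes plus any node that actually reported the map.
        reporters = {n for n, per_node in reports.items() if wwid in per_node}
        nodes = {}
        for node in sorted(set(expected_nodes) | reporters):
            # An expected node that stays silent for this WWID counts as missing.
            nodes[node] = reports.get(node, {}).get(wwid, "missing")
        worst = max(nodes.values(), key=lambda state: SEVERITY[state])
        matrix[wwid] = {"nodes": nodes, "cluster-state": worst}
    return matrix
```

Nothing in the function is specific to multipath, which is what would make
such a matrix reusable for other per-resource, per-node health data.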

Everything keys on the (global) WWID, never the node-local sdX or mpathN names.

Repo (build-)dependencies:
 pve-manager -> pve-storage -> pve-cluster.

I tested it on a three-node cluster against an iSCSI target with two portals:
the config reaches all nodes, the matrix flags a path fault on one node
(rolling that LUN up to 'degraded') while the rest stay healthy, and a guest
on the shared volume group migrates between nodes without copying its disk.
This is still a fairly simple test, so more testing is needed to be sure
nothing is off.

Open points:

 - Health shows the paths a map has right now, so a node that fully lost a
   path (removed, not just failed) still looks fine on its last remaining
   path. The series does surface a node that lost all paths (one that is
   expected but silent) as missing, but not the "down to one of two" case;
   catching lost redundancy properly needs a notion of how many paths to
   expect, and I would like input on how to model that. The "expected nodes"
   set is derived here from where a multipath storage is enabled; sourcing it
   from the storage mapping series instead would be cleaner.
 - Upgrade and adoption: PVE rewrites /etc/multipath/wwids to match its
   allow-list, so it drops WWIDs that PVE did not add, which is risky on nodes
   whose multipath was set up by hand or that boot from SAN. There is also no
   migration from an existing multipath.conf, and a no-touch guarantee for
   boot-from-SAN devices needs thought.
 - The 'multipath' storage type is provisional. The alternative is to make path
   coalescing a property of the transport storages (multipath on iscsi, and
   in-kernel ANA on a future nvme-of) with LVM using the transport as its base,
   instead of a separate type. That gives fewer storages to set up for iSCSI, a
   natural home for NVMe-oF, and no extra peer type next to iscsi, at the cost
   of touching the transport plugins and the ongoing iSCSI/NVMe-oF work
   (Dietmar). I lean towards it as the target and would like opinions.
 - The pure logic (config parsing, health derivation, status aggregation) could
   move to Rust crate(s) used from Perl via perlmod, with the cluster and
   multipathd glue code staying in Perl.
 - Whether the per-node trigger should stay in pvestatd or move to its own
   service or timer.
 - How this should fit with the storage mapping work.
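
As a discussion aid for the first open point, one conceivable way to model
lost redundancy is an optional expected path count per WWID. The sketch below
is purely illustrative: 'expected_paths', the function name and the state
names 'healthy' and 'failed' are hypothetical and not part of the series.

```python
def map_state(active_paths, failed_paths, expected_paths=None):
    """Derive a map state from its current paths.

    expected_paths is a hypothetical per-WWID setting. When it is None the
    function only sees the paths the map has right now, so a path that was
    removed (rather than failed) goes unnoticed.
    """
    if active_paths == 0:
        # The map exists but no path can carry I/O.
        return "failed"
    if failed_paths > 0:
        # At least one path is known to the map but not usable.
        return "degraded"
    if expected_paths is not None and active_paths < expected_paths:
        # Fewer paths than configured: redundancy is lost even though every
        # path the map still has is active.
        return "degraded"
    return "healthy"
```

With one active path and no expectation, map_state(1, 0) still reports the
map as healthy, which is the blind spot described above. Passing
expected_paths=2 turns the same input into 'degraded'.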

pve-storage:

Thomas Lamprecht (7):
      multipath: add helper library and managed configuration
      api: disks: add read-only multipath status endpoint
      api: multipath: add cluster-wide configuration endpoints
      multipath: add storage plugin for multipath LUNs
      lvm: allow a multipath storage as the base device
      multipath: broadcast per-node map health to the cluster KV store
      api: multipath: add cluster-wide health status endpoint

 src/PVE/API2/Disks.pm              |   7 +
 src/PVE/API2/Disks/Makefile        |   1 +
 src/PVE/API2/Disks/Multipath.pm    | 206 ++++++++++++++
 src/PVE/API2/Makefile              |   1 +
 src/PVE/API2/Multipath.pm          | 538 +++++++++++++++++++++++++++++++++++++
 src/PVE/Makefile                   |   4 +
 src/PVE/Multipath.pm               | 447 ++++++++++++++++++++++++++++++
 src/PVE/Multipath/ClusterConfig.pm |  55 ++++
 src/PVE/Multipath/Config.pm        | 361 +++++++++++++++++++++++++
 src/PVE/Multipath/Generator.pm     | 147 ++++++++++
 src/PVE/Storage.pm                 |   2 +
 src/PVE/Storage/LVMPlugin.pm       |   7 +-
 src/PVE/Storage/Makefile           |   3 +-
 src/PVE/Storage/MultipathPlugin.pm | 186 +++++++++++++
 src/test/Makefile                  |   5 +-
 src/test/run_multipath_tests.pl    | 423 +++++++++++++++++++++++++++++
 16 files changed, 2388 insertions(+), 5 deletions(-)

pve-cluster:

Thomas Lamprecht (1):
      pmxcfs: track cluster-wide multipath configuration

 src/PVE/Cluster.pm  | 2 ++
 src/pmxcfs/status.c | 2 ++
 2 files changed, 4 insertions(+)

pve-manager:

Thomas Lamprecht (5):
      pvestatd: apply the cluster-wide multipath config on each node
      api: cluster: mount the multipath configuration endpoint
      pvestatd: broadcast multipath map health to the cluster
      ui: dc: add multipath health matrix and config editor
      ui: node: show multipath maps and their paths under Disks

 PVE/API2/Cluster.pm            |   7 +
 PVE/Service/pvestatd.pm        |  14 ++
 www/manager6/Makefile          |   2 +
 www/manager6/Utils.js          |  25 +++
 www/manager6/dc/Config.js      |   6 +
 www/manager6/dc/Multipath.js   | 371 +++++++++++++++++++++++++++++++++++++++++
 www/manager6/node/Config.js    |   7 +
 www/manager6/node/Multipath.js | 163 ++++++++++++++++++
 8 files changed, 595 insertions(+)





Thread overview: 16+ messages
2026-06-26 12:07 [PATCH storage,cluster,manager 0/13] multipath: cluster-wide config, storage and health overview Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 01/13] multipath: add helper library and managed configuration Thomas Lamprecht
2026-06-26 14:43   ` Maximiliano Sandoval
2026-06-26 12:07 ` [PATCH storage 02/13] api: disks: add read-only multipath status endpoint Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 03/13] api: multipath: add cluster-wide configuration endpoints Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 04/13] multipath: add storage plugin for multipath LUNs Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 05/13] lvm: allow a multipath storage as the base device Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 06/13] multipath: broadcast per-node map health to the cluster KV store Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 07/13] api: multipath: add cluster-wide health status endpoint Thomas Lamprecht
2026-06-26 12:07 ` [PATCH cluster 08/13] pmxcfs: track cluster-wide multipath configuration Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 09/13] pvestatd: apply the cluster-wide multipath config on each node Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 10/13] api: cluster: mount the multipath configuration endpoint Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 11/13] pvestatd: broadcast multipath map health to the cluster Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 12/13] ui: dc: add multipath health matrix and config editor Thomas Lamprecht
2026-06-26 14:05   ` Maximiliano Sandoval
2026-06-26 12:07 ` [PATCH manager 13/13] ui: node: show multipath maps and their paths under Disks Thomas Lamprecht
