From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v2 storage,cluster,manager 0/13] multipath: cluster-wide config, storage and health overview
Date: Fri, 3 Jul 2026 14:46:00 +0200
Message-ID: <20260703124707.1172980-2-t.lamprecht@proxmox.com>
This is v2 of the proof-of-concept series for better multipath support in
PVE. It addresses the v1 feedback (thanks, Maximiliano) and some further
issues I found myself while reworking and testing it.
Changes since v1:
- only prune from the allow-list those WWIDs that PVE itself added, leaving
a hand-made or boot-from-SAN setup untouched (the main v1 open point);
entries that merely overlap the cluster config are never adopted
- tear the generated files down again when the cluster config is emptied
- guard all config writes with a digest against concurrent modifications;
the free-form overrides moved to their own endpoint with a digest and
lock of their own (their editor could not save at all in v1)
- surface per-node config-apply failures in the status (now { luns, nodes })
and the Datacenter panel, including a missing multipath-tools package
- mark a LUN 'missing' only on the nodes expected to carry it, derived per
LUN from the consuming storage chain
- timestamp and dedup the health broadcasts; a stale reporter shows as
'unknown' instead of its last snapshot
- update the health grid in place (DiffStore) instead of flickering
- assorted review fixes (naming, theming, i18n, renderer encoding, quoting
of generated values, robustness against malformed pmxcfs kv peer broadcasts)
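To make the first point concrete, here is a minimal sketch of the pruning
rule. It is written in Python purely for illustration (the series itself is
Perl), and the function name as well as the separate "managed" bookkeeping set
are assumptions, not the actual code or on-disk format.

```python
def reconcile_allow_list(desired, current, managed):
    """Return (new_current, new_managed) after one reconcile pass.

    desired: WWIDs listed in the cluster config
    current: WWIDs present in the node's allow-list right now
    managed: WWIDs an earlier pass added itself (hypothetical bookkeeping)
    """
    desired, current, managed = set(desired), set(current), set(managed)

    # Add what the cluster config wants but the node does not have yet.
    to_add = desired - current
    # Prune only what we added ourselves and the config no longer wants.
    to_prune = (managed - desired) & current

    new_current = (current - to_prune) | to_add
    # Only entries we added stay tracked. A hand-made entry that merely
    # overlaps the cluster config is never adopted, so it is never pruned.
    new_managed = (managed & desired) | to_add
    return new_current, new_managed
```

With a node that already carries a hand-made entry and one overlapping entry,
only the newly added WWID ends up tracked, and emptying the cluster config
later removes just that one again.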
Original cover letter
=====================
This is a proof of concept for better multipath support in PVE. I talked it
over with Friedrich back in May and put it together to get early feedback,
also because there is other multipath and iSCSI work going on (for example
Mira's storage mapping series).
Today multipath is set up by hand on each node, with nothing managing it
cluster-wide. This series tries to improve that by adding:
- cluster-wide config: a new /etc/pve/multipath.cfg in pmxcfs, kept as a
SectionConfig: a 'defaults' section for the global multipathd knobs plus
one 'wwid' section per allow-listed LUN holding its optional alias and any
per-LUN knobs. Free-form hardware overrides live in a separate
/etc/pve/multipath-overrides.conf. pvestatd renders both into the local
multipathd drop-ins, so you set it up once for the whole cluster. Map names
stay WWID-based and equal on every node (user_friendly_names no,
find_multipaths strict).
- multipath as storage: a new 'multipath' storage type exposes the maps as
raw volumes by WWID at the stable /dev/disk/by-id path. An LVM storage can
use it as its 'base', so a shared volume group gets path redundancy with no
manual device setup. This standalone type is deliberately provisional: the
cleaner long-term shape is multipath as a capability toggled on the
transport plugins (iSCSI/FC/NVMe-oF) rather than a peer storage type. It is
kept standalone here to keep the POC self-contained, and because FC has no
PVE transport plugin to hang such a capability off.
- health overview: each node publishes its per-WWID map health into the
pmxcfs KV store, and /cluster/multipath/status turns that into a per-WWID
by per-node matrix with a rolled-up cluster-state, plus a per-node note when
a node could not apply the configuration. The web UI adds a Datacenter
"Multipath" panel (table plus config editor) and a read-only per-node view
under Disks. This matrix is the most generalizable piece; it is really a
per-resource, per-node health roll-up and could become a small shared
primitive that other features reuse.
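The roll-up described in the last item can be sketched roughly as follows.
This is Python for illustration only; the state names, their severity order,
the report layout and the 60 second staleness limit are all assumptions on top
of what the cover letter states.

```python
# Assumed severity order, from best to worst.
SEVERITY = {"healthy": 0, "unknown": 1, "degraded": 2, "missing": 3}


def build_matrix(reports, expected, now, max_age=60):
    """Build a per-WWID by per-node matrix with a rolled-up state.

    reports:  node -> {"time": epoch seconds, "maps": {wwid: state}}
    expected: wwid -> set of nodes expected to carry that LUN
    """
    # A stale reporter is ignored instead of trusting its last snapshot.
    fresh = {n: r for n, r in reports.items() if now - r["time"] <= max_age}

    wwids = set(expected)
    for report in fresh.values():
        wwids.update(report["maps"])

    matrix = {}
    for wwid in sorted(wwids):
        row = {n: r["maps"][wwid] for n, r in fresh.items() if wwid in r["maps"]}
        for node in expected.get(wwid, ()):
            if node not in fresh:
                row[node] = "unknown"   # no fresh report from this node
            elif node not in row:
                row[node] = "missing"   # expected, reporting, but no map
        # The cluster state is the worst node state; unknown strings
        # (for example from a malformed broadcast) rank like 'unknown'.
        state = max(
            row.values(),
            key=lambda s: SEVERITY.get(s, SEVERITY["unknown"]),
            default="unknown",
        )
        matrix[wwid] = {"nodes": row, "state": state}
    return matrix
```

Note that 'missing' is only ever assigned to nodes from the expected set; a
node outside that set simply has no cell for the LUN unless it reports a map.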
Everything keys on the (global) WWID, never the node-local sdX or mpathN names.
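Since everything keys on the WWID, the rendered drop-in can be byte-identical
on every node. A minimal sketch of such a rendering step, in Python for
illustration only: the function names, the input layout and the quoting rule
are assumptions, while the output uses the regular multipath.conf section
syntax (defaults / multipaths / multipath).

```python
import re

# Values made only of these characters are emitted bare (assumed rule).
_SIMPLE = re.compile(r"[A-Za-z0-9_.:-]+")


def quote(value):
    """Quote a generated value; refuse anything that could break out."""
    text = str(value)
    if _SIMPLE.fullmatch(text):
        return text
    if '"' in text or "\n" in text:
        raise ValueError(f"unsupported character in value: {text!r}")
    return f'"{text}"'


def render_dropin(defaults, luns):
    """Render cluster settings into multipath.conf drop-in text.

    defaults: {option: value} for the global section
    luns:     {wwid: {option: value}}, e.g. an optional alias per LUN
    Option names are assumed to be validated by the schema beforehand.
    """
    # The two naming knobs are forced so map names stay WWID-based.
    settings = {**defaults, "user_friendly_names": "no", "find_multipaths": "strict"}
    out = ["defaults {"]
    out += [f"    {key} {quote(settings[key])}" for key in sorted(settings)]
    out.append("}")
    if luns:
        out.append("multipaths {")
        for wwid in sorted(luns):  # sorted for a stable result on all nodes
            out.append("    multipath {")
            out.append(f"        wwid {quote(wwid)}")
            for key in sorted(luns[wwid]):
                out.append(f"        {key} {quote(luns[wwid][key])}")
            out.append("    }")
        out.append("}")
    return "\n".join(out) + "\n"
```

Sorting both the options and the WWIDs keeps the output stable, so a node can
compare it against the existing file and skip the write when nothing changed.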
Repo (build-)dependencies:
pve-manager -> pve-storage -> pve-cluster.
I tested it on a three-node cluster against an iSCSI target with two portals:
the config reaches all nodes, the matrix flags a path fault on one node
(rolling that LUN up to 'degraded') while the rest stay healthy, a hand-added
WWID survives the managed reconcile untouched, and a node that fails to apply
the config shows up in the status. This is still a fairly simple test, so more
testing is needed to be sure nothing is off.
Open points:
- Health shows the paths a map has right now, so a node that fully lost a
path (removed, not just failed) still looks fine on its last remaining
path. The series does surface a node that lost all paths (one that is
expected but silent) as missing, but not the "down to one of two" case;
catching lost redundancy properly needs a notion of how many paths to
expect, and I would like input on how to model that. The expected-node set
is derived per LUN from the consuming storage chain's node restrictions,
falling back to every node with a multipath storage enabled; sourcing it
from the storage mapping series instead would be cleaner.
- The 'multipath' storage type is provisional. The alternative is to make path
coalescing a property of the transport storages (multipath on iscsi, and
in-kernel ANA on a future nvme-of) with LVM using the transport as its base,
instead of a separate type. That gives fewer storages to set up for iSCSI, a
natural home for NVMe-oF, and no extra peer type next to iscsi, at the cost
of touching the transport plugins and the ongoing iSCSI/NVMe-oF work
(Dietmar). I lean towards it as the target and would like opinions.
- The pure logic (config parsing, health derivation, status aggregation) could
move to Rust crate(s) used from Perl via perlmod, with the cluster and
multipathd glue code staying in Perl.
- Whether the per-node trigger should stay in pvestatd or move to its own
service or timer.
- How this should fit with the storage mapping work.
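For the first open point, the expected-node derivation as it stands can be
sketched like this. It is Python for illustration only, and the storage layout
is a simplification I assume here: an LVM storage references its LUN through a
'base' of the form "<multipath-storage>:<wwid>", and a storage without a
'nodes' restriction is available on all nodes.

```python
def allowed(storage, all_nodes):
    """Nodes a storage is enabled on; no 'nodes' entry means all nodes."""
    if storage.get("disable"):
        return set()
    nodes = storage.get("nodes")
    return set(all_nodes) if nodes is None else set(nodes) & set(all_nodes)


def expected_nodes(wwid, storages, all_nodes):
    """Derive the nodes expected to carry a LUN from its consumers."""
    expected = set()
    found_consumer = False
    for storage in storages.values():
        base_id, _, volume = (storage.get("base") or "").partition(":")
        base = storages.get(base_id)
        if volume != wwid or base is None or base.get("type") != "multipath":
            continue
        found_consumer = True
        # A node must be allowed along the whole chain to be expected.
        expected |= allowed(storage, all_nodes) & allowed(base, all_nodes)
    if found_consumer:
        return expected

    # Fallback: every node with a multipath storage enabled.
    for storage in storages.values():
        if storage.get("type") == "multipath":
            expected |= allowed(storage, all_nodes)
    return expected
```

A notion of how many paths to expect per node is deliberately not part of
this sketch, as that is exactly the part still up for discussion.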
pve-storage:
Thomas Lamprecht (7):
multipath: add helper library and managed configuration
api: disks: add read-only multipath status endpoint
api: multipath: add cluster-wide configuration endpoints
multipath: add storage plugin for multipath LUNs
lvm: allow a multipath storage as the base device
multipath: broadcast per-node map health to the cluster KV store
api: multipath: add cluster-wide health status endpoint
src/PVE/API2/Disks.pm | 7 +
src/PVE/API2/Disks/Makefile | 1 +
src/PVE/API2/Disks/Multipath.pm | 206 ++++++++++
src/PVE/API2/Makefile | 1 +
src/PVE/API2/Multipath.pm | 651 +++++++++++++++++++++++++++++
src/PVE/Makefile | 4 +
src/PVE/Multipath.pm | 613 +++++++++++++++++++++++++++
src/PVE/Multipath/ClusterConfig.pm | 73 ++++
src/PVE/Multipath/Config.pm | 380 +++++++++++++++++
src/PVE/Multipath/Generator.pm | 190 +++++++++
src/PVE/Storage.pm | 2 +
src/PVE/Storage/LVMPlugin.pm | 7 +-
src/PVE/Storage/Makefile | 3 +-
src/PVE/Storage/MultipathPlugin.pm | 187 +++++++++
src/PVE/Storage/Plugin.pm | 2 +-
src/test/Makefile | 5 +-
src/test/run_multipath_tests.pl | 586 ++++++++++++++++++++++++++
17 files changed, 2912 insertions(+), 6 deletions(-)
pve-cluster:
Thomas Lamprecht (1):
pmxcfs: track cluster-wide multipath configuration
src/PVE/Cluster.pm | 2 ++
src/pmxcfs/status.c | 2 ++
2 files changed, 4 insertions(+)
pve-manager:
Thomas Lamprecht (5):
pvestatd: apply the cluster-wide multipath config on each node
api: cluster: mount the multipath configuration endpoint
pvestatd: broadcast multipath map health to the cluster
ui: dc: add multipath health matrix and config editor
ui: node: show multipath maps and their paths under Disks
PVE/API2/Cluster.pm | 7 +
PVE/Service/pvestatd.pm | 18 ++
www/manager6/Makefile | 2 +
www/manager6/Utils.js | 25 ++
www/manager6/dc/Config.js | 6 +
www/manager6/dc/Multipath.js | 444 ++++++++++++++++++++++++++++++++
www/manager6/node/Config.js | 7 +
www/manager6/node/Multipath.js | 171 +++++++++++++
8 files changed, 680 insertions(+)