From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v2 storage 01/13] multipath: add helper library and managed configuration
Date: Fri, 3 Jul 2026 14:46:01 +0200
Message-ID: <20260703124707.1172980-3-t.lamprecht@proxmox.com>
In-Reply-To: <20260703124707.1172980-2-t.lamprecht@proxmox.com>

Multipath on PVE is configured by hand and per node today, with nothing
that keeps it consistent across a cluster. Add the foundation for
managing it cluster-wide instead.

The library reads the assembled maps and their health from multipathd.
The configuration is a SectionConfig kept in pmxcfs: one 'defaults'
section for the global multipathd knobs, plus one 'wwid' section per
allow-listed LUN holding its optional alias and any per-LUN knobs.
Parameters are kebab-case and rendered to multipathd's snake_case
keywords, validated through the section schema so a bad value cannot
reach the generated drop-in.
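
As a rough illustration of that mapping, a cluster configuration might
look like the following. The WWID, alias, and knob values are made up,
and the on-disk syntax is an assumption based on the usual SectionConfig
'type: id' layout:

```
defaults: defaults
	polling-interval 10

wwid: 3600a098038314c6a2b5d4e7a6f5a3241
	alias san-lun0
	no-path-retry queue
```

From this, render_section() and build_multipaths_block() would produce
roughly the following multipath.conf fragments, with keys sorted,
'-' mapped to '_', and the unset global knobs filled in from the managed
defaults:

```
defaults {
	find_multipaths strict
	polling_interval 10
	user_friendly_names no
}

multipaths {
	multipath {
		wwid 3600a098038314c6a2b5d4e7a6f5a3241
		alias san-lun0
		no_path_retry queue
	}
}
```
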
The managed baseline is deliberately conservative: it only assembles
explicitly allow-listed LUNs and keeps map names stable and WWID-based,
so a device is named the same on every node and an LVM PV on it stays
stable cluster-wide. Hardware-specific tuning lives in a separate,
admin-owned override rather than in the generated baseline, and the two
are written to distinct drop-ins, as multipath does not accept a
repeated 'defaults' section in one file.

Parsing and generation stay in a pure module with no dependency on
PVE::Cluster, so they remain unit-testable and usable on a node whose
pve-cluster does not yet observe the new file; registering it in pmxcfs
needs the matching pve-cluster change.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
Changes in v2:
- prune from /etc/multipath/wwids only the WWIDs that PVE itself added
- record ownership only for WWIDs PVE actually added, so a hand-made
  entry that overlaps the cluster config is never adopted or pruned
- tear the drop-ins and the ownership record down when the cluster
  config is emptied, instead of enforcing the defaults forever
- re-check for multipath-tools when it was absent, so a running
  pvestatd notices a later install; report the missing package as an
  apply error on nodes on which a multipath storage is enabled
- double-quote generated values containing whitespace, as multipath.conf
  only accepts whitespace inside quoted strings
- harden the override guard against same-line brace tricks and
  constrain path-selector values
- rename list_wwids to list_etc_multipath_wwids
- sync() returns its apply error instead of swallowing it
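
The first two items can be illustrated with a small standalone sketch.
The two subs mirror the logic of plan_wwid_changes() and
owned_wwid_record() from src/PVE/Multipath.pm in this patch; the WWID
names are hypothetical placeholders, and the script is only meant as a
reading aid, not as part of the series:

```perl
use strict;
use warnings;

# Same logic as PVE::Multipath::plan_wwid_changes(): add what is desired but
# not yet in the allow-list, remove only what PVE added itself earlier.
sub plan_wwid_changes {
    my ($desired, $current, $managed) = @_;

    my @to_add = sort grep { !$current->{$_} } keys %$desired;
    my @to_remove = sort grep { !$desired->{$_} && $current->{$_} } keys %$managed;
    return (\@to_add, \@to_remove);
}

# Same logic as PVE::Multipath::owned_wwid_record(): ownership covers only
# entries PVE put into the allow-list, plus entries whose removal failed.
sub owned_wwid_record {
    my ($desired, $managed, $added, $keep) = @_;

    my @still_owned = grep { $managed->{$_} || $added->{$_} } keys %$desired;
    my %owned = map { $_ => 1 } @still_owned, keys %$keep;
    return [sort keys %owned];
}

# Hypothetical sets (hashref of wwid => 1), names shortened for readability.
my $desired = { 'wwid-a' => 1, 'wwid-b' => 1 }; # wanted by the cluster config
my $current = { 'wwid-b' => 1, 'wwid-c' => 1, 'wwid-d' => 1 }; # /etc/multipath/wwids
my $managed = { 'wwid-c' => 1 }; # added by PVE in an earlier pass

my ($to_add, $to_remove) = plan_wwid_changes($desired, $current, $managed);

# Assume every planned add succeeded and no removal failed.
my %added = map { $_ => 1 } @$to_add;
my $owned = owned_wwid_record($desired, $managed, \%added, {});

print "add:    @$to_add\n";    # wwid-a
print "remove: @$to_remove\n"; # wwid-c, the hand-made wwid-d is left alone
print "owned:  @$owned\n";     # wwid-a, the pre-existing wwid-b stays foreign
```

Keeping both helpers free of I/O is what lets the test script exercise
these cases without touching /etc/multipath/wwids.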
 src/PVE/Makefile                   |   4 +
 src/PVE/Multipath.pm               | 395 +++++++++++++++++++++++++++++
 src/PVE/Multipath/ClusterConfig.pm |  73 ++++++
 src/PVE/Multipath/Config.pm        | 380 +++++++++++++++++++++++++++
 src/PVE/Multipath/Generator.pm     | 190 ++++++++++++++
 src/test/Makefile                  |   5 +-
 src/test/run_multipath_tests.pl    | 360 ++++++++++++++++++++++++++
 7 files changed, 1406 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/Multipath.pm
 create mode 100644 src/PVE/Multipath/ClusterConfig.pm
 create mode 100644 src/PVE/Multipath/Config.pm
 create mode 100644 src/PVE/Multipath/Generator.pm
 create mode 100755 src/test/run_multipath_tests.pl
diff --git a/src/PVE/Makefile b/src/PVE/Makefile
index 9e9f6aa..7ddd646 100644
--- a/src/PVE/Makefile
+++ b/src/PVE/Makefile
@@ -4,6 +4,10 @@
 install:
 	install -D -m 0644 Storage.pm ${DESTDIR}${PERLDIR}/PVE/Storage.pm
 	install -D -m 0644 Diskmanage.pm ${DESTDIR}${PERLDIR}/PVE/Diskmanage.pm
+	install -D -m 0644 Multipath.pm ${DESTDIR}${PERLDIR}/PVE/Multipath.pm
+	install -D -m 0644 Multipath/Config.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Config.pm
+	install -D -m 0644 Multipath/ClusterConfig.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/ClusterConfig.pm
+	install -D -m 0644 Multipath/Generator.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Generator.pm
 	install -D -m 0644 CephConfig.pm ${DESTDIR}${PERLDIR}/PVE/CephConfig.pm
 	install -D -m 0644 GuestImport.pm ${DESTDIR}${PERLDIR}/PVE/GuestImport.pm
 	make -C Storage install
diff --git a/src/PVE/Multipath.pm b/src/PVE/Multipath.pm
new file mode 100644
index 0000000..64118b3
--- /dev/null
+++ b/src/PVE/Multipath.pm
@@ -0,0 +1,395 @@
+package PVE::Multipath;
+
+use strict;
+use warnings;
+
+use JSON qw(decode_json);
+
+use PVE::Tools qw(run_command file_read_firstline file_get_contents file_set_contents);
+
+use PVE::Multipath::Config;
+
+# Helper library around device-mapper multipath (multipathd). The single place that knows how to
+# talk to multipathd and turn its state into a normalized, stable structure for the rest of the
+# storage stack: health reporting, the API, and consumers that wait for a map before binding to it.
+#
+# Everything keys on the SCSI/NAA WWID, as reported by multipathd in the map's 'uuid' field. Map
+# names (mpathN / aliases) and the underlying 'sdX' paths are node-local and unstable; the WWID is
+# not.
+
+my $MULTIPATH = '/sbin/multipath';
+my $MULTIPATHD = '/sbin/multipathd';
+
+# health states for a single map, derived from its path states
+use constant {
+ HEALTH_OPTIMAL => 'optimal', # all paths active
+ HEALTH_DEGRADED => 'degraded', # some but not all paths active
+ HEALTH_FAILED => 'failed', # no active path left
+};
+
+my $supported;
+
+# Cached only while true: a long-lived daemon like pvestatd must notice a later multipath-tools
+# install without a restart, so a negative result is re-checked on every call.
+sub is_supported {
+ return $supported if $supported;
+ $supported = (-x $MULTIPATH && -x $MULTIPATHD) ? 1 : 0;
+ return $supported;
+}
+
+sub assert_supported {
+ die "no multipath support - please install 'multipath-tools'\n" if !is_supported();
+ return 1;
+}
+
+# Returns whether the multipathd daemon is reachable. Used for status output, so it never dies and
+# just reports 0 when multipath is unavailable or the daemon socket cannot be queried.
+sub is_running {
+ return 0 if !is_supported();
+
+ my $running = 0;
+ eval {
+ run_command(
+ [$MULTIPATHD, 'show', 'daemon'],
+ outfunc => sub {
+ my ($line) = @_;
+ # example: "pid 1234 idle" / "pid 1234 running"
+ $running = 1 if $line =~ m/^pid \d+ \S+/;
+ },
+ errfunc => sub { },
+ );
+ };
+ return $running;
+}
+
+# Runs `multipathd show <subcmd> json` and returns the raw output. Kept separate from parsing so
+# tests can feed recorded fixtures to the parse_*() functions without a running daemon.
+my sub query_multipathd_json {
+ my ($subcmd) = @_;
+
+ assert_supported();
+
+ my $output = '';
+ run_command(
+ [$MULTIPATHD, 'show', $subcmd, 'json'],
+ outfunc => sub { $output .= "$_[0]\n"; },
+ errfunc => sub { warn "$_[0]\n"; },
+ );
+
+ return $output;
+}
+
+my sub derive_health {
+ my ($paths_active, $paths_total) = @_;
+
+ return HEALTH_FAILED if !$paths_total || !$paths_active;
+ return HEALTH_OPTIMAL if $paths_active == $paths_total;
+ return HEALTH_DEGRADED;
+}
+
+my sub normalize_path {
+ my ($path) = @_;
+
+ my $res = {
+ dev => $path->{dev},
+ # 'active' or 'failed' - the state device-mapper sees
+ 'dm-state' => $path->{dm_st},
+ # 'running', 'faulty' or 'offline' - the state the kernel block layer sees
+ 'dev-state' => $path->{dev_st},
+ # path checker result, e.g. 'ready' / 'faulty' / 'ghost'
+ 'check-state' => $path->{chk_st},
+ };
+ $res->{priority} = int($path->{pri}) if defined($path->{pri});
+
+ # multipathd renders unset string fields as the literal '[undef]'
+ my $wwpn = $path->{target_wwpn};
+ my $hba = $path->{host_adapter};
+ undef $wwpn if !defined($wwpn) || $wwpn eq '[undef]' || $wwpn eq '';
+ undef $hba if !defined($hba) || $hba eq '[undef]' || $hba eq '';
+ $res->{'target-wwpn'} = $wwpn if defined($wwpn);
+ $res->{'host-adapter'} = $hba if defined($hba);
+ # a real target WWPN (0x...) means Fibre Channel; iSCSI/SAS transport is derived from sysfs by
+ # get_maps(). Do NOT treat the field's mere presence as FC, multipathd reports it as '[undef]'
+ # for iSCSI.
+ $res->{transport} = 'fc' if defined($wwpn) && $wwpn =~ /^0x[0-9a-f]+$/i;
+
+ return $res;
+}
+
+# Turns the output of `multipathd show maps json` into a normalized list of maps. Pure (no I/O) on
+# purpose: it derives everything it can from the JSON alone, so it can be unit-tested against
+# recorded fixtures. Live-only bits (byte size, transport) are added by get_maps() below.
+sub parse_maps_json {
+ my ($json) = @_;
+
+ my $data = eval { decode_json($json) };
+ die "could not parse multipathd maps JSON: $@\n" if $@;
+
+ my $maps = [];
+ for my $map (($data->{maps} // [])->@*) {
+ my $path_groups = [];
+ my ($paths_total, $paths_active) = (0, 0);
+
+ for my $group (($map->{path_groups} // [])->@*) {
+ my $paths = [];
+ for my $path (($group->{paths} // [])->@*) {
+ my $normalized = normalize_path($path);
+ $paths_total++;
+ $paths_active++ if ($normalized->{'dm-state'} // '') eq 'active';
+ push $paths->@*, $normalized;
+ }
+ push $path_groups->@*,
+ {
+ group => int($group->{group} // 0),
+ 'dm-state' => $group->{dm_st},
+ priority => int($group->{pri} // 0),
+ paths => $paths,
+ };
+ }
+
+ push $maps->@*, {
+ wwid => $map->{uuid},
+ name => $map->{name},
+ sysfs => $map->{sysfs}, # the 'dm-N' kernel name
+ 'dm-state' => $map->{dm_st},
+ 'paths-total' => $paths_total,
+ 'paths-active' => $paths_active,
+ health => derive_health($paths_active, $paths_total),
+ 'path-groups' => $path_groups,
+ };
+ }
+
+ return $maps;
+}
+
+# Best-effort byte size of a dm device from sysfs (Linux reports size in 512-byte sectors
+# regardless of the real block size).
+my sub dm_size_bytes {
+ my ($sysfs) = @_;
+
+ return undef if !$sysfs;
+ my $sectors = file_read_firstline("/sys/block/$sysfs/size");
+ return undef if !defined($sectors) || $sectors !~ m/^\d+$/;
+ return int($sectors) * 512;
+}
+
+my sub dir_has_entries {
+ my ($dir) = @_;
+
+ return 0 if !-d $dir;
+ opendir(my $dh, $dir) or return 0;
+ my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
+ closedir($dh);
+ return scalar(@entries) ? 1 : 0;
+}
+
+# Best-effort transport of a single 'sdX' path from its sysfs topology; only iSCSI and SAS need it,
+# Fibre Channel is already set from the map JSON.
+my sub path_transport {
+ my ($dev) = @_;
+
+ return undef if !$dev;
+ my $link = readlink("/sys/block/$dev");
+ return undef if !$link;
+ return 'iscsi' if $link =~ m{/session\d+/};
+ return 'fc' if $link =~ m{/rport-\d+};
+ return 'sas' if $link =~ m{/end_device-};
+ return undef;
+}
+
+# Returns the normalized maps enriched with information that requires the local system (size, a
+# stable consumer path). Dies if multipath is not supported; callers that just want status should
+# guard with is_supported()/is_running().
+sub get_maps {
+ my $maps = parse_maps_json(query_multipathd_json('maps'));
+
+ for my $map ($maps->@*) {
+ $map->{size} = dm_size_bytes($map->{sysfs});
+ # WWID-stable path, present independently of the (node-local) map name
+ $map->{path} = "/dev/disk/by-id/dm-uuid-mpath-$map->{wwid}"
+ if defined($map->{wwid});
+ $map->{used} = dir_has_entries("/sys/block/$map->{sysfs}/holders")
+ if $map->{sysfs};
+
+ my %transports;
+ for my $group ($map->{'path-groups'}->@*) {
+ for my $path ($group->{paths}->@*) {
+ $path->{transport} //= path_transport($path->{dev});
+ $transports{ $path->{transport} } = 1 if defined($path->{transport});
+ }
+ }
+ # only expose a map-level transport when all paths agree on it
+ my @transports = keys %transports;
+ $map->{transport} = $transports[0] if scalar(@transports) == 1;
+ }
+
+ return $maps;
+}
+
+sub get_map_for_wwid {
+ my ($wwid) = @_;
+
+ for my $map (get_maps()->@*) {
+ return $map if defined($map->{wwid}) && $map->{wwid} eq $wwid;
+ }
+ return undef;
+}
+
+# Polls until a map for the given WWID exists, up to $timeout seconds. A consumer like the iSCSI
+# plugin uses this after a login or rescan to bind to the coalesced dm device rather than to a
+# transient single 'sdX' path.
+sub wait_for_map {
+ my ($wwid, $timeout) = @_;
+
+ $timeout //= 10;
+
+ my $deadline = time() + $timeout;
+ while (1) {
+ my $map = eval { get_map_for_wwid($wwid) };
+ return $map if $map;
+ return undef if time() >= $deadline;
+ sleep(1);
+ }
+}
+
+my $WWIDS_FILE = '/etc/multipath/wwids';
+
+# Node-local record of the WWIDs Proxmox VE added to $WWIDS_FILE, so the generator prunes only its
+# own entries and never ones from a hand-made or boot-from-SAN setup that share the file. Not
+# consulted by multipathd itself.
+my $MANAGED_WWIDS_FILE = '/etc/multipath/wwids.pve';
+
+# The WWIDs in /etc/multipath/wwids, the on-disk allow-list that multipathd assembles from with
+# 'find_multipaths strict'. Distinct from the cluster config's desired set (Config::wwid_list).
+sub list_etc_multipath_wwids {
+ return [] if !-e $WWIDS_FILE;
+ return PVE::Multipath::Config::parse_wwids(file_get_contents($WWIDS_FILE));
+}
+
+sub add_wwid {
+ my ($wwid) = @_;
+
+ assert_supported();
+ run_command([$MULTIPATH, '-a', $wwid]);
+}
+
+sub remove_wwid {
+ my ($wwid) = @_;
+
+ assert_supported();
+ run_command([$MULTIPATH, '-w', $wwid]);
+}
+
+# The WWIDs Proxmox VE added to the allow-list, from the node-local record (empty if none yet).
+sub managed_wwids {
+ return [] if !-e $MANAGED_WWIDS_FILE;
+ return PVE::Multipath::Config::parse_wwids(file_get_contents($MANAGED_WWIDS_FILE));
+}
+
+sub set_managed_wwids {
+ my ($wwids) = @_;
+
+ # an absent record means Proxmox VE owns nothing, the same state as before first use
+ if (!scalar($wwids->@*)) {
+ if (-e $MANAGED_WWIDS_FILE) {
+ unlink($MANAGED_WWIDS_FILE)
+ or die "could not remove '$MANAGED_WWIDS_FILE': $!\n";
+ }
+ return;
+ }
+ file_set_contents($MANAGED_WWIDS_FILE, PVE::Multipath::Config::format_wwids($wwids));
+}
+
+# Plan the allow-list changes for the generator: add the desired WWIDs that are not active yet, and
+# remove only WWIDs that Proxmox VE added before ($managed) and no longer wants, so entries from a
+# hand-made or boot-from-SAN setup are never touched. The arguments are sets (hashref of wwid => 1).
+sub plan_wwid_changes {
+ my ($desired, $current, $managed) = @_;
+
+ my @to_add = sort grep { !$current->{$_} } keys %$desired;
+ my @to_remove = sort grep { !$desired->{$_} && $current->{$_} } keys %$managed;
+ return (\@to_add, \@to_remove);
+}
+
+# Pure: the WWIDs Proxmox VE owns after a reconcile pass, for the node-local record that limits
+# future pruning. Ownership covers only entries PVE put into the allow-list itself: what it already
+# owned and still wants, what it just added ($added), and entries whose removal failed ($keep, so
+# the prune is retried). A desired WWID that was already in the allow-list from someone else stays
+# foreign, so dropping it from the cluster config later never prunes the hand-made entry. All
+# arguments are sets (hashref of wwid => 1).
+sub owned_wwid_record {
+ my ($desired, $managed, $added, $keep) = @_;
+
+ my %owned = map { $_ => 1 } (grep { $managed->{$_} || $added->{$_} } keys %$desired),
+ keys %$keep;
+ return [sort keys %owned];
+}
+
+# Pure: derive from the parsed cluster-wide storage configuration which nodes are expected to
+# carry multipath maps, and which storage consumes which LUN. A LUN consumed by an LVM storage
+# through a multipath base volume is expected wherever that chain is enabled, the intersection of
+# both node restrictions (an unrestricted storage counts as all nodes); any other allow-listed LUN
+# is expected wherever some multipath storage is enabled. Returns:
+# consumers { wwid => storeid } of the consuming LVM storage
+# nodes { node => 1 } union over all enabled multipath storages
+# 'wwid-nodes' { wwid => { node => 1 } } for consumed LUNs
+sub storage_expectations {
+ my ($storage_cfg, $all_nodes) = @_;
+
+ my $res = { consumers => {}, nodes => {}, 'wwid-nodes' => {} };
+ my $ids = $storage_cfg->{ids} // {};
+
+ my $storage_nodes = sub {
+ my ($scfg) = @_;
+ return { $scfg->{nodes}->%* } if $scfg->{nodes} && %{ $scfg->{nodes} };
+ return { map { $_ => 1 } $all_nodes->@* };
+ };
+
+ for my $storeid (keys %$ids) {
+ my $scfg = $ids->{$storeid};
+ next if ($scfg->{type} // '') ne 'multipath' || $scfg->{disable};
+ my $nodes = $storage_nodes->($scfg);
+ $res->{nodes}->{$_} = 1 for keys %$nodes;
+ }
+
+ for my $storeid (sort keys %$ids) {
+ my $scfg = $ids->{$storeid};
+ next if ($scfg->{type} // '') ne 'lvm' || !defined($scfg->{base});
+
+ # minimal 'storage:volname' split; the volname of a multipath base volume is the WWID
+ my ($baseid, $wwid) = $scfg->{base} =~ m/^([a-zA-Z][a-zA-Z0-9\-_.]*):(\S+)$/;
+ next if !defined($wwid);
+ my $basecfg = $ids->{$baseid};
+ next if !$basecfg || ($basecfg->{type} // '') ne 'multipath';
+
+ $res->{consumers}->{$wwid} = $storeid;
+ next if $scfg->{disable} || $basecfg->{disable};
+
+ my $base_nodes = $storage_nodes->($basecfg);
+ my $lvm_nodes = $storage_nodes->($scfg);
+ $res->{'wwid-nodes'}->{$wwid} =
+ { map { $_ => 1 } grep { $base_nodes->{$_} } keys %$lvm_nodes };
+ }
+
+ return $res;
+}
+
+# Live wrapper around storage_expectations(), reading the cluster-wide storage configuration and
+# node list. Never dies; degrades to empty expectations when either is unavailable.
+sub cluster_storage_expectations {
+ require PVE::Storage;
+ require PVE::Cluster;
+
+ my $cfg = eval { PVE::Storage::config() };
+ return { consumers => {}, nodes => {}, 'wwid-nodes' => {} } if !$cfg;
+ my $all_nodes = eval { PVE::Cluster::get_nodelist() } // [];
+ return storage_expectations($cfg, $all_nodes);
+}
+
+# Re-read the configuration and rebuild maps accordingly, after a config or allow-list change.
+sub reconfigure {
+ assert_supported();
+ run_command([$MULTIPATHD, 'reconfigure']);
+}
+
+1;
diff --git a/src/PVE/Multipath/ClusterConfig.pm b/src/PVE/Multipath/ClusterConfig.pm
new file mode 100644
index 0000000..aa8400e
--- /dev/null
+++ b/src/PVE/Multipath/ClusterConfig.pm
@@ -0,0 +1,73 @@
+package PVE::Multipath::ClusterConfig;
+
+use strict;
+use warnings;
+
+use Digest::SHA ();
+
+use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
+
+use PVE::Multipath::Config;
+
+# Cluster-wide multipath configuration, replicated by pmxcfs. The structured allow-list, aliases and
+# knobs live in multipath.cfg (a SectionConfig); the free-form hardware override text lives in a
+# separate plain file so it stays hand-editable and diffable.
+my $FILENAME = 'multipath.cfg';
+my $OVERRIDES_FILENAME = 'multipath-overrides.conf';
+
+cfs_register_file(
+ $FILENAME,
+ sub { PVE::Multipath::Config->parse_config(@_); },
+ sub { PVE::Multipath::Config->write_config(@_); },
+);
+
+cfs_register_file(
+ $OVERRIDES_FILENAME,
+ \&PVE::Multipath::Config::parse_overrides,
+ \&PVE::Multipath::Config::write_overrides,
+);
+
+sub read_config {
+ return cfs_read_file($FILENAME);
+}
+
+sub write_config {
+ my ($cfg) = @_;
+ cfs_write_file($FILENAME, $cfg);
+}
+
+sub read_overrides {
+ return cfs_read_file($OVERRIDES_FILENAME);
+}
+
+sub write_overrides {
+ my ($text) = @_;
+ cfs_write_file($OVERRIDES_FILENAME, $text);
+}
+
+# Digest over the raw override text, to detect concurrent modifications; the SectionConfig digest
+# of multipath.cfg does not cover this separate file.
+sub overrides_digest {
+ my ($text) = @_;
+ return Digest::SHA::sha1_hex($text // '');
+}
+
+sub lock_config {
+ my ($code, $errmsg) = @_;
+
+ cfs_lock_file($FILENAME, undef, $code);
+ if (my $err = $@) {
+ $errmsg ? die "$errmsg: $err" : die $err;
+ }
+}
+
+sub lock_overrides {
+ my ($code, $errmsg) = @_;
+
+ cfs_lock_file($OVERRIDES_FILENAME, undef, $code);
+ if (my $err = $@) {
+ $errmsg ? die "$errmsg: $err" : die $err;
+ }
+}
+
+1;
diff --git a/src/PVE/Multipath/Config.pm b/src/PVE/Multipath/Config.pm
new file mode 100644
index 0000000..9db59df
--- /dev/null
+++ b/src/PVE/Multipath/Config.pm
@@ -0,0 +1,380 @@
+package PVE::Multipath::Config;
+
+use strict;
+use warnings;
+
+use PVE::SectionConfig;
+
+use base qw(PVE::SectionConfig);
+
+# Parser and writer for the cluster-wide source of truth in pmxcfs (/etc/pve/multipath.cfg). It is a
+# SectionConfig: a single 'defaults' section for global multipathd knobs, plus one 'wwid' section
+# per allow-listed LUN holding its optional alias and per-LUN knobs. Free-form hardware overrides
+# (device {} entries) live in a separate plain file, see PVE::Multipath::ClusterConfig. Kept pure so
+# it stays unit-testable without PVE::Cluster.
+
+# Conservative, cluster-friendly defaults, applied when the 'defaults' section omits them:
+# - find_multipaths strict -> only explicitly allow-listed LUNs get assembled, so boot/root and
+# unrelated disks stay untouched.
+# - user_friendly_names no -> the map name is the WWID, identical on every node, so an LVM PV on
+# /dev/disk/by-id/dm-uuid-mpath-<wwid> is stable cluster-wide without a node-local bindings file.
+my $MANAGED_DEFAULTS = {
+ 'find-multipaths' => 'strict',
+ 'user-friendly-names' => 'no',
+ 'polling-interval' => 5,
+};
+
+sub managed_defaults { return { $MANAGED_DEFAULTS->%* }; }
+
+# Knobs valid both globally (defaults) and per-LUN (a multipaths{} entry), per multipath.conf(5).
+my $shared_knobs = {
+ 'no-path-retry' => {
+ type => 'string',
+ pattern => '(?:queue|fail|\d+)',
+ typetext => 'queue|fail|<count>',
+ description =>
+ "How to react when all paths are down: keep queuing, fail at once, or retry"
+ . " for the given number of polling intervals.",
+ optional => 1,
+ },
+ 'path-grouping-policy' => {
+ type => 'string',
+ enum => [qw(failover multibus group_by_serial group_by_prio group_by_node_name)],
+ description => "How paths are grouped into priority groups.",
+ optional => 1,
+ },
+ failback => {
+ type => 'string',
+ pattern => '(?:manual|immediate|followover|\d+)',
+ typetext => 'manual|immediate|followover|<seconds>',
+ description => "When to fail back to a restored higher-priority path group.",
+ optional => 1,
+ },
+ 'path-selector' => {
+ type => 'string',
+ pattern =>
+ '(?:round-robin|queue-length|service-time|historical-service-time|io-affinity) \d+',
+ typetext => '<selector> <version>',
+ description => "Path selector algorithm used within a priority group, for example"
+ . " 'service-time 0'.",
+ optional => 1,
+ },
+};
+
+# Knobs that only make sense globally.
+my $defaults_only_knobs = {
+ 'find-multipaths' => {
+ type => 'string',
+ enum => [qw(yes no strict greedy smart)],
+ default => 'strict',
+ description => "Which devices multipathd assembles into a map. 'strict' only takes"
+ . " explicitly allow-listed WWIDs.",
+ optional => 1,
+ },
+ 'user-friendly-names' => {
+ type => 'string',
+ enum => [qw(yes no)],
+ default => 'no',
+ description => "Whether to use node-local mpathN names. Keep 'no' for stable WWID-based"
+ . " names across the cluster.",
+ optional => 1,
+ },
+ 'polling-interval' => {
+ type => 'integer',
+ minimum => 1,
+ default => 5,
+ description => "Interval between path checks, in seconds.",
+ optional => 1,
+ },
+};
+
+# Knobs that only make sense per-LUN.
+my $wwid_only_knobs = {
+ alias => {
+ type => 'string',
+ pattern => '[a-zA-Z0-9][a-zA-Z0-9._-]*',
+ maxLength => 64,
+ description =>
+ "Human-readable map name for this WWID; multipathd uses it as the map name.",
+ optional => 1,
+ },
+ 'rr-min-io-rq' => {
+ type => 'integer',
+ minimum => 1,
+ description =>
+ "Number of I/O requests to route to a path before switching, request-based.",
+ optional => 1,
+ },
+ 'rr-weight' => {
+ type => 'string',
+ enum => [qw(priorities uniform)],
+ description => "Whether to weight paths by priority when balancing I/O.",
+ optional => 1,
+ },
+};
+
+my $defaultData = {
+ propertyList => {
+ type => { description => "Section type ('defaults' or 'wwid')." },
+ id => {
+ type => 'string',
+ description =>
+ "Section ID: the literal 'defaults', or a LUN WWID for 'wwid' sections.",
+ pattern => '[a-zA-Z0-9._:-]+',
+ maxLength => 128,
+ },
+ $shared_knobs->%*,
+ $defaults_only_knobs->%*,
+ $wwid_only_knobs->%*,
+ },
+};
+
+sub private { return $defaultData; }
+
+package PVE::Multipath::Config::Defaults;
+
+use base qw(PVE::Multipath::Config);
+
+sub type { return 'defaults'; }
+
+sub options {
+ return {
+ 'find-multipaths' => { optional => 1 },
+ 'user-friendly-names' => { optional => 1 },
+ 'polling-interval' => { optional => 1 },
+ 'no-path-retry' => { optional => 1 },
+ 'path-grouping-policy' => { optional => 1 },
+ failback => { optional => 1 },
+ 'path-selector' => { optional => 1 },
+ };
+}
+
+__PACKAGE__->register();
+
+package PVE::Multipath::Config::Wwid;
+
+use base qw(PVE::Multipath::Config);
+
+sub type { return 'wwid'; }
+
+sub options {
+ return {
+ alias => { optional => 1 },
+ 'no-path-retry' => { optional => 1 },
+ 'path-grouping-policy' => { optional => 1 },
+ failback => { optional => 1 },
+ 'path-selector' => { optional => 1 },
+ 'rr-min-io-rq' => { optional => 1 },
+ 'rr-weight' => { optional => 1 },
+ };
+}
+
+__PACKAGE__->register();
+
+package PVE::Multipath::Config;
+
+__PACKAGE__->init();
+
+# The only top-level sections the admin override file may contain. 'multipaths' is deliberately
+# absent: it is generated from the wwid sections, so an admin block would collide.
+my $OVERRIDE_KEYWORDS =
+ { devices => 1, overrides => 1, defaults => 1, blacklist => 1, blacklist_exceptions => 1 };
+
+# Validate the free-form override text before it can break multipathd's parser cluster-wide. This is
+# a guard, not a full parse: balanced braces, only known top-level sections, and no 'multipaths'
+# block (that is generated from the wwid sections and a duplicate is fatal to multipathd).
+sub check_overrides {
+ my ($text) = @_;
+
+ return if !defined($text) || $text !~ /\S/;
+
+ # walk every block opener and brace in order, so a section keyword after a closing brace on
+ # the same line is still checked at its real depth
+ my $depth = 0;
+ for my $line (split(/\n/, $text)) {
+ next if $line =~ /^\s*#/;
+ while ($line =~ /(\w+)\s*\{|([{}])/g) {
+ if (defined($1)) {
+ my $kw = $1;
+ die "multipath overrides: 'multipaths' is managed via aliases, do not set it"
+ . " here\n"
+ if $kw eq 'multipaths';
+ die "multipath overrides: unknown top-level section '$kw'\n"
+ if $depth == 0 && !$OVERRIDE_KEYWORDS->{$kw};
+ $depth++;
+ } elsif ($2 eq '{') {
+ $depth++;
+ } else {
+ die "multipath overrides: unbalanced braces\n" if $depth == 0;
+ $depth--;
+ }
+ }
+ }
+ die "multipath overrides: unbalanced braces\n" if $depth != 0;
+
+ return;
+}
+
+# Read/write the separate, admin-owned override file (/etc/pve/multipath-overrides.conf). Stored and
+# rendered verbatim, so it stays hand-editable and diffable.
+sub parse_overrides {
+ my ($filename, $raw) = @_;
+ return $raw // '';
+}
+
+sub write_overrides {
+ my ($filename, $text) = @_;
+ $text //= '';
+ $text =~ s/\s+$//;
+ return length($text) ? "$text\n" : '';
+}
+
+my $MANAGED_HEADER =
+ "# This file is managed by Proxmox VE - do not edit by hand.\n"
+ . "# Hardware-/node-specific overrides belong in the override config.\n";
+
+# A multipath.conf value must be a single word or one double-quoted string; only the quoted form
+# can carry whitespace, see multipath.conf(5). The schemas admit no double quotes in values, so no
+# escaping is needed.
+my sub render_value {
+ my ($value) = @_;
+ return $value =~ /\s/ ? "\"$value\"" : $value;
+}
+
+# Renders a named section from a key => value hash, keys sorted for a stable, diffable result. The
+# config and API use kebab-case parameters, multipathd keywords are snake_case, so map '-' to '_'.
+my sub render_section {
+ my ($name, $kv) = @_;
+
+ my $out = "$name {\n";
+ for my $key (sort keys $kv->%*) {
+ (my $keyword = $key) =~ tr/-/_/;
+ $out .= "\t$keyword " . render_value($kv->{$key}) . "\n";
+ }
+ $out .= "}\n";
+ return $out;
+}
+
+# Builds the Proxmox-managed baseline drop-in (header + defaults section) from the effective global
+# knobs. Admin overrides are not merged in here: they go into a separate conf.d file, as multipath
+# rejects two 'defaults' blocks in one file (duplicate keyword) and drops the second.
+sub generate_managed_conf {
+ my ($defaults) = @_;
+ $defaults //= managed_defaults();
+
+ return $MANAGED_HEADER . "\n" . render_section('defaults', $defaults);
+}
+
+# The WWID allow-list file (/etc/multipath/wwids) holds one '/<wwid>/' per line; parse it and back.
+sub parse_wwids {
+ my ($text) = @_;
+
+ my $wwids = [];
+ for my $line (split(/\n/, $text // '')) {
+ next if $line =~ /^\s*#/;
+ next if $line =~ /^\s*$/;
+ if ($line =~ m{^/(.+)/\s*$}) {
+ push $wwids->@*, $1;
+ }
+ }
+ return $wwids;
+}
+
+sub format_wwids {
+ my ($wwids) = @_;
+
+ my $out = "# Multipath wwids, managed by Proxmox VE\n";
+ $out .= "/$_/\n" for sort $wwids->@*;
+ return $out;
+}
+
+# Builds a 'multipaths {}' block from the per-WWID sections (alias plus any per-LUN knobs); returns
+# the empty string when no WWID has an alias or a knob set.
+sub build_multipaths_block {
+ my ($wwid_opts) = @_;
+
+ my @entries = grep { %{ $wwid_opts->{$_} } } sort keys %$wwid_opts;
+ return '' if !@entries;
+
+ my $out = "multipaths {\n";
+ for my $wwid (@entries) {
+ my $opts = $wwid_opts->{$wwid};
+ $out .= "\tmultipath {\n";
+ $out .= "\t\twwid $wwid\n";
+ for my $key (sort keys %$opts) {
+ (my $keyword = $key) =~ tr/-/_/;
+ $out .= "\t\t$keyword " . render_value($opts->{$key}) . "\n";
+ }
+ $out .= "\t}\n";
+ }
+ $out .= "}\n";
+ return $out;
+}
+
+# The knob property definitions as an API parameter schema. Strip the schema 'default' so an update
+# that omits a knob leaves it unchanged instead of resetting it to the managed default.
+my sub api_schema {
+ my ($props) = @_;
+
+ my $res = {};
+ for my $key (keys %$props) {
+ $res->{$key} = { $props->{$key}->%* };
+ delete $res->{$key}->{default};
+ }
+ return $res;
+}
+
+# Settable global knobs (the 'defaults' section) as an API parameter schema.
+sub defaults_api_schema {
+ return api_schema({ $shared_knobs->%*, $defaults_only_knobs->%* });
+}
+
+# Settable per-WWID knobs (including the alias) as an API parameter schema.
+sub wwid_api_schema {
+ return api_schema({ $shared_knobs->%*, $wwid_only_knobs->%* });
+}
+
+# Effective global knobs: the 'defaults' section merged onto the conservative managed defaults.
+sub effective_defaults {
+ my ($cfg) = @_;
+
+ my $defaults = managed_defaults();
+ if (my $section = $cfg->{ids}->{defaults}) {
+ $defaults->{$_} = $section->{$_} for grep { $_ ne 'type' } keys %$section;
+ }
+ return $defaults;
+}
+
+# The allow-listed WWIDs, that is the ids of the 'wwid' sections.
+sub wwid_list {
+ my ($cfg) = @_;
+ return [sort grep { ($cfg->{ids}->{$_}->{type} // '') eq 'wwid' } keys $cfg->{ids}->%*];
+}
+
+# { wwid => alias } for the WWIDs that have one.
+sub aliases {
+ my ($cfg) = @_;
+
+ my $res = {};
+ for my $wwid (keys $cfg->{ids}->%*) {
+ my $section = $cfg->{ids}->{$wwid};
+ next if ($section->{type} // '') ne 'wwid';
+ $res->{$wwid} = $section->{alias} if defined($section->{alias});
+ }
+ return $res;
+}
+
+# { wwid => { alias?, knob => value, ... } }, the per-WWID input to build_multipaths_block().
+sub wwid_opts {
+ my ($cfg) = @_;
+
+ my $res = {};
+ for my $wwid (keys $cfg->{ids}->%*) {
+ my $section = $cfg->{ids}->{$wwid};
+ next if ($section->{type} // '') ne 'wwid';
+ $res->{$wwid} = { map { $_ => $section->{$_} } grep { $_ ne 'type' } keys %$section };
+ }
+ return $res;
+}
+
+1;
diff --git a/src/PVE/Multipath/Generator.pm b/src/PVE/Multipath/Generator.pm
new file mode 100644
index 0000000..b840e9f
--- /dev/null
+++ b/src/PVE/Multipath/Generator.pm
@@ -0,0 +1,190 @@
+package PVE::Multipath::Generator;
+
+use strict;
+use warnings;
+
+use File::Path qw(make_path);
+
+use PVE::Tools qw(file_get_contents file_set_contents);
+
+use PVE::Multipath;
+use PVE::Multipath::Config;
+use PVE::Multipath::ClusterConfig;
+
+# Renders the effective node-local multipath configuration from the cluster-wide source of truth
+# (/etc/pve/multipath.cfg) and reloads multipathd when something changed.
+#
+# The rendered files live on the local filesystem, so they survive reboots and are available to
+# multipathd at boot even before pmxcfs is up; the last successful render is the boot-time fallback.
+
+# Proxmox-owned drop-ins; the admin's /etc/multipath.conf keeps its default 'config_dir
+# /etc/multipath/conf.d'. The managed baseline, the admin overrides, and the generated aliases each
+# get their own file so multipath merges them across files instead of hitting a duplicate section
+# keyword: two 'defaults' blocks in one file are rejected outright, and our 'multipaths' alias block
+# would clash with a 'multipaths' section in the overrides. The overrides file sorts after the
+# baseline, so an admin's defaults override it; the aliases file is a separate 'multipaths' block,
+# so its order does not matter.
+my $DEFAULTS_DROPIN = '/etc/multipath/conf.d/pve-defaults.conf';
+my $OVERRIDES_DROPIN = '/etc/multipath/conf.d/pve-overrides.conf';
+my $ALIASES_DROPIN = '/etc/multipath/conf.d/pve-aliases.conf';
+
+my sub write_if_changed {
+ my ($path, $content) = @_;
+
+ my $old = -e $path ? eval { file_get_contents($path) } : undef;
+ return 0 if defined($old) && $old eq $content;
+
+ my $dir = $path =~ s!/[^/]+$!!r;
+ make_path($dir) if !-d $dir;
+ file_set_contents($path, $content);
+ return 1;
+}
+
+my sub remove_if_present {
+ my ($path) = @_;
+
+ return 0 if !-e $path;
+ unlink($path) or die "could not remove '$path': $!\n";
+ return 1;
+}
+
+# Whether the cluster-wide configuration has anything for the nodes to render; when it does not,
+# the generator tears its local files down instead of enforcing the managed defaults forever, so
+# the node falls back to a hand-managed or pristine multipath setup.
+my sub config_in_use {
+ my ($cfg, $overrides) = @_;
+
+ return
+ scalar(PVE::Multipath::Config::wwid_list($cfg)->@*)
+ || (defined($overrides) && length($overrides))
+ || $cfg->{ids}->{defaults};
+}
+
+sub regenerate {
+ my ($cfg, $overrides) = @_;
+ $cfg //= PVE::Multipath::ClusterConfig::read_config();
+ $overrides //= PVE::Multipath::ClusterConfig::read_overrides();
+
+ my $changed = 0;
+
+ if (config_in_use($cfg, $overrides)) {
+ my $defaults = PVE::Multipath::Config::effective_defaults($cfg);
+ $changed = 1
+ if write_if_changed(
+ $DEFAULTS_DROPIN,
+ PVE::Multipath::Config::generate_managed_conf($defaults),
+ );
+ } else {
+ $changed = 1 if remove_if_present($DEFAULTS_DROPIN);
+ }
+
+ if (defined($overrides) && length($overrides)) {
+ my $content =
+ "# Managed by Proxmox VE - edit overrides in /etc/pve/multipath-overrides.conf.\n\n"
+ . "$overrides\n";
+ $changed = 1 if write_if_changed($OVERRIDES_DROPIN, $content);
+ } else {
+ $changed = 1 if remove_if_present($OVERRIDES_DROPIN);
+ }
+
+ my $block =
+ PVE::Multipath::Config::build_multipaths_block(PVE::Multipath::Config::wwid_opts($cfg));
+ if (length($block)) {
+ my $content =
+ "# Managed by Proxmox VE - edit aliases and per-LUN options in /etc/pve/multipath.cfg.\n\n"
+ . $block;
+ $changed = 1 if write_if_changed($ALIASES_DROPIN, $content);
+ } else {
+ $changed = 1 if remove_if_present($ALIASES_DROPIN);
+ }
+
+ # Bring the WWID allow-list (/etc/multipath/wwids) in line with the cluster config through
+ # multipath's own add/remove, so its on-disk format stays intact. Prune only WWIDs that Proxmox
+ # VE added itself (tracked node-locally), never ones from a hand-made or boot-from-SAN setup that
+ # share the file. Isolate each op: one failing WWID must not abort the whole pass, or it would
+ # stall every other WWID on every run; a failed op leaves the file unchanged and is retried next
+ # pass.
+ my %desired = map { $_ => 1 } PVE::Multipath::Config::wwid_list($cfg)->@*;
+ my %current = map { $_ => 1 } PVE::Multipath::list_etc_multipath_wwids()->@*;
+ my %managed = map { $_ => 1 } PVE::Multipath::managed_wwids()->@*;
+ my ($to_add, $to_remove) = PVE::Multipath::plan_wwid_changes(\%desired, \%current, \%managed);
+
+ my @errors;
+ my %added; # WWIDs this pass actually put into the allow-list, the only new ones PVE owns
+ for my $wwid ($to_add->@*) {
+ eval { PVE::Multipath::add_wwid($wwid); };
+ if (my $err = $@) {
+ push @errors, "adding WWID '$wwid' failed - $err";
+ } else {
+ $changed = 1;
+ $added{$wwid} = 1;
+ }
+ }
+ my %kept; # WWIDs whose prune failed: keep owning them so the removal is retried
+ for my $wwid ($to_remove->@*) {
+ eval { PVE::Multipath::remove_wwid($wwid); };
+ if (my $err = $@) {
+ push @errors, "removing WWID '$wwid' failed - $err";
+ $kept{$wwid} = 1;
+ } else {
+ $changed = 1;
+ }
+ }
+
+ # remember what Proxmox VE now owns so the next pass prunes only its own entries; write only when
+ # the set actually changes, to avoid needless churn
+ my $owned = PVE::Multipath::owned_wwid_record(\%desired, \%managed, \%added, \%kept);
+ if (join("\0", $owned->@*) ne join("\0", sort keys %managed)) {
+ eval { PVE::Multipath::set_managed_wwids($owned); };
+ push @errors, "recording managed WWIDs failed - $@" if $@;
+ }
+
+ # reload the daemon for whatever did converge, even if some ops failed
+ if ($changed && PVE::Multipath::is_running()) {
+ eval { PVE::Multipath::reconfigure(); };
+ push @errors, "reconfigure failed - $@" if $@;
+ }
+
+ die join('', @errors) if @errors;
+
+ return $changed;
+}
+
+# Safe periodic entry point for a status loop like pvestatd: a no-op when multipath is not in use on
+# this node, and never throws. Returns undef on success or a no-op, or an error message the caller
+# can surface (for example by broadcasting it) so a drifted node does not fail silently.
+sub sync {
+ my $cfg = eval { PVE::Multipath::ClusterConfig::read_config() };
+ return "reading cluster config failed - $@" if $@;
+
+ my $overrides = eval { PVE::Multipath::ClusterConfig::read_overrides() };
+ my $in_use = config_in_use($cfg, $overrides);
+
+ if (!PVE::Multipath::is_supported()) {
+ # a node covered by an enabled multipath storage cannot apply an in-use config without the
+ # tools, so report that instead of silently showing up as 'missing' in the health matrix;
+ # unconcerned nodes (quorum or compute-only) stay silent
+ return if !$in_use;
+ require PVE::INotify;
+ my $node = PVE::INotify::nodename();
+ my $expectations = PVE::Multipath::cluster_storage_expectations();
+ return if !$expectations->{nodes}->{$node};
+ return "multipath-tools is not installed\n";
+ }
+
+ # stay out of the way unless the feature is in use or its local files still need a teardown
+ # (a leftover drop-in or managed-WWIDs record means a prior pass failed partway; retry it)
+ return
+ if !$in_use
+ && !-e $DEFAULTS_DROPIN
+ && !-e $OVERRIDES_DROPIN
+ && !-e $ALIASES_DROPIN
+ && !scalar(PVE::Multipath::managed_wwids()->@*);
+
+ eval { regenerate($cfg, $overrides) };
+ return "$@" if $@;
+
+ return;
+}
+
+1;
diff --git a/src/test/Makefile b/src/test/Makefile
index ee025bc..51c7360 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -1,6 +1,6 @@
all: test
-test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access
+test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access test_multipath
test_zfspoolplugin: run_test_zfspoolplugin.pl
./run_test_zfspoolplugin.pl
@@ -22,3 +22,6 @@ test_ovf: run_ovf_tests.pl
test_volume_access: run_volume_access_tests.pl
./run_volume_access_tests.pl
+
+test_multipath: run_multipath_tests.pl
+ ./run_multipath_tests.pl
diff --git a/src/test/run_multipath_tests.pl b/src/test/run_multipath_tests.pl
new file mode 100755
index 0000000..09b6061
--- /dev/null
+++ b/src/test/run_multipath_tests.pl
@@ -0,0 +1,360 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use JSON;
+use Test::More;
+
+use lib ('.', '..');
+use PVE::Multipath;
+use PVE::Multipath::Config;
+
+# A recorded `multipathd show maps json` reply with three maps exercising each
+# health state: an all-active map, a partially-failed map and an all-failed map.
+my $maps_json = <<'EOF';
+{
+ "major_version": 0,
+ "minor_version": 1,
+ "maps": [
+ {
+ "name": "mpatha",
+ "uuid": "3600140500a1b2c3d4e5f6a7b8c9d0e1f",
+ "sysfs": "dm-0",
+ "dm_st": "active",
+ "paths": 2,
+ "path_groups": [
+ {
+ "group": 1,
+ "dm_st": "active",
+ "pri": 50,
+ "paths": [
+ { "dev": "sdb", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000001" },
+ { "dev": "sdc", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000002" }
+ ]
+ }
+ ]
+ },
+ {
+ "name": "mpathb",
+ "uuid": "360014050aabbccddeeff00112233445566",
+ "sysfs": "dm-1",
+ "dm_st": "active",
+ "paths": 2,
+ "path_groups": [
+ {
+ "group": 1,
+ "dm_st": "active",
+ "pri": 10,
+ "paths": [
+ { "dev": "sdd", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 10 }
+ ]
+ },
+ {
+ "group": 2,
+ "dm_st": "enabled",
+ "pri": 0,
+ "paths": [
+ { "dev": "sde", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0 }
+ ]
+ }
+ ]
+ },
+ {
+ "name": "mpathc",
+ "uuid": "36001405ffffffffffffffffffffffffff",
+ "sysfs": "dm-2",
+ "dm_st": "active",
+ "paths": 1,
+ "path_groups": [
+ {
+ "group": 1,
+ "dm_st": "enabled",
+ "pri": 0,
+ "paths": [
+ { "dev": "sdf", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0, "target_wwpn": "[undef]", "host_adapter": "[undef]" }
+ ]
+ }
+ ]
+ }
+ ]
+}
+EOF
+
+my $maps = PVE::Multipath::parse_maps_json($maps_json);
+
+is(scalar($maps->@*), 3, 'parsed all three maps');
+
+my ($map_a, $map_b, $map_c) = $maps->@*;
+
+# fully healthy map
+is($map_a->{wwid}, '3600140500a1b2c3d4e5f6a7b8c9d0e1f', 'map a wwid taken from uuid');
+is($map_a->{name}, 'mpatha', 'map a name');
+is($map_a->{sysfs}, 'dm-0', 'map a sysfs name');
+is($map_a->{'paths-total'}, 2, 'map a counts both paths');
+is($map_a->{'paths-active'}, 2, 'map a has two active paths');
+is($map_a->{health}, 'optimal', 'map a is optimal');
+is(scalar($map_a->{'path-groups'}->@*), 1, 'map a has one path group');
+is(
+ $map_a->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'},
+ '0x500a098000000001',
+ 'FC target wwpn is preserved',
+);
+is(
+ $map_a->{'path-groups'}->[0]->{paths}->[0]->{transport},
+ 'fc',
+ 'transport derived as fc from a target wwpn',
+);
+
+# one failed path out of two
+is($map_b->{'paths-total'}, 2, 'map b counts both paths across groups');
+is($map_b->{'paths-active'}, 1, 'map b has one active path');
+is($map_b->{health}, 'degraded', 'map b is degraded');
+
+# no active path left
+is($map_c->{'paths-total'}, 1, 'map c counts its single path');
+is($map_c->{'paths-active'}, 0, 'map c has no active path');
+is($map_c->{health}, 'failed', 'map c is failed');
+ok(
+ !defined($map_c->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'}),
+ "multipathd '[undef]' target_wwpn is cleaned away (not stored)",
+);
+ok(
+ !defined($map_c->{'path-groups'}->[0]->{paths}->[0]->{transport}),
+ "'[undef]' target_wwpn does not imply fc transport",
+);
+
+# empty / no maps must parse to an empty list, not die
+my $empty = PVE::Multipath::parse_maps_json('{ "major_version": 0, "maps": [] }');
+is_deeply($empty, [], 'no maps parses to empty list');
+
+# malformed input must die with a clear error
+eval { PVE::Multipath::parse_maps_json('not json') };
+ok($@ =~ m/could not parse multipathd maps JSON/, 'invalid JSON raises a clear error');
+
+# --- config generation / WWID allow-list ---
+my $conf = PVE::Multipath::Config::generate_managed_conf();
+like($conf, qr/managed by Proxmox VE/, 'managed conf carries the managed header');
+like($conf, qr/user_friendly_names no/, 'baseline sets user_friendly_names no');
+like($conf, qr/find_multipaths strict/, 'baseline opts in explicitly via find_multipaths strict');
+is(
+ scalar(() = $conf =~ /^defaults \{/mg),
+ 1,
+ 'baseline has exactly one defaults block (a second would be a duplicate-keyword error)',
+);
+
+# multipath.conf only allows whitespace inside double-quoted values
+like(
+ PVE::Multipath::Config::generate_managed_conf({
+ PVE::Multipath::Config::managed_defaults()->%*, 'path-selector' => 'service-time 0',
+ }),
+ qr/^\tpath_selector "service-time 0"$/m,
+ 'a value containing whitespace renders double-quoted, single words stay bare',
+);
+
+my $wwids = PVE::Multipath::Config::parse_wwids("# Multipath wwids\n/3600abc/\n/3600def/\n");
+is_deeply($wwids, ['3600abc', '3600def'], 'parse_wwids extracts the wwids');
+like(
+ PVE::Multipath::Config::format_wwids(['3600def', '3600abc']),
+ qr{/3600abc/\n/3600def/},
+ 'format_wwids sorts and slash-wraps',
+);
+
+# --- cluster config (pmxcfs source of truth): SectionConfig parse/write ---
+my $raw =
+ "defaults: defaults\n\tfind-multipaths strict\n\tno-path-retry queue\n\n"
+ . "wwid: 3600def\n\talias san-b-lun0\n\n"
+ . "wwid: 3600abc\n\talias san-a-lun0\n\tno-path-retry 18\n";
+my $cc = PVE::Multipath::Config->parse_config('multipath.cfg', $raw);
+is_deeply(
+ PVE::Multipath::Config::wwid_list($cc),
+ ['3600abc', '3600def'],
+ 'wwid sections become the allow-list (sorted)',
+);
+is_deeply(
+ PVE::Multipath::Config::aliases($cc),
+ { '3600abc' => 'san-a-lun0', '3600def' => 'san-b-lun0' },
+ 'aliases read from the wwid sections',
+);
+is(
+ PVE::Multipath::Config::effective_defaults($cc)->{'no-path-retry'},
+ 'queue',
+ 'defaults section knob is read',
+);
+is(
+ PVE::Multipath::Config::effective_defaults($cc)->{'user-friendly-names'},
+ 'no',
+ 'an unset defaults knob falls back to the managed default',
+);
+
+my $written = PVE::Multipath::Config->write_config('multipath.cfg', $cc);
+my $cc2 = PVE::Multipath::Config->parse_config('multipath.cfg', $written);
+is_deeply(
+ PVE::Multipath::Config::wwid_list($cc2),
+ ['3600abc', '3600def'],
+ 'wwids survive the SectionConfig round-trip',
+);
+is_deeply(
+ PVE::Multipath::Config::aliases($cc2),
+ PVE::Multipath::Config::aliases($cc),
+ 'aliases survive the round-trip',
+);
+is($cc2->{ids}->{'3600abc'}->{'no-path-retry'}, 18, 'a per-WWID knob survives the round-trip');
+
+is_deeply(
+ PVE::Multipath::Config::wwid_list(PVE::Multipath::Config->parse_config('multipath.cfg', '')),
+ [],
+ 'an empty cluster config has no WWIDs',
+);
+
+# --- multipaths{} block (alias plus per-WWID knobs) ---
+my $block = PVE::Multipath::Config::build_multipaths_block({
+ '3600def' => { alias => 'san-b-lun0' },
+ '3600abc' => { alias => 'san-a-lun0', 'no-path-retry' => 18 },
+ '3600nul' => {},
+});
+like($block, qr/^multipaths \{/m, 'block opens with multipaths {');
+is(
+ scalar(() = $block =~ /^\tmultipath \{/mg),
+ 2,
+ 'one multipath{} per WWID that has an alias or a knob (the empty WWID is skipped)',
+);
+like(
+ $block,
+ qr/wwid 3600abc.*?alias san-a-lun0.*?no_path_retry 18/s,
+ 'block carries the alias and the per-WWID knob',
+);
+my $abc_pos = index($block, 'wwid 3600abc');
+my $def_pos = index($block, 'wwid 3600def');
+ok($abc_pos < $def_pos, 'block emits entries in WWID-sorted order');
+is(PVE::Multipath::Config::build_multipaths_block({}), '', 'no WWIDs render to the empty string');
+like(
+ PVE::Multipath::Config::build_multipaths_block({
+ '3600abc' => { 'path-selector' => 'round-robin 0' },
+ }),
+ qr/^\t\tpath_selector "round-robin 0"$/m,
+ 'a per-WWID value containing whitespace renders double-quoted',
+);
+
+# --- WWID allow-list reconcile plan: only prune what PVE itself added ---
+# 'f' is a foreign, hand-added WWID present in the local allow-list but never managed by PVE.
+{
+ my ($add, $remove) = PVE::Multipath::plan_wwid_changes(
+ { a => 1, b => 1 }, # desired (cluster config)
+ { b => 1, f => 1 }, # current allow-list
+ { b => 1 }, # WWIDs PVE added before
+ );
+ is_deeply($add, ['a'], 'adds desired WWIDs that are not active yet');
+ is_deeply($remove, [], 'a foreign WWID PVE never added is left untouched');
+
+ ($add, $remove) = PVE::Multipath::plan_wwid_changes(
+ { a => 1 }, # desired
+ { a => 1, b => 1, f => 1 }, # current allow-list
+ { a => 1, b => 1 }, # PVE added a and b before
+ );
+ is_deeply($add, [], 'nothing to add when the desired WWID is already active');
+ is_deeply(
+ $remove,
+ ['b'],
+ 'prunes a PVE-managed WWID dropped from the config, but never the foreign one',
+ );
+}
+
+# --- ownership record: never adopt a pre-existing hand-made allow-list entry ---
+{
+ # 'h' was already in the allow-list when it appeared in the cluster config, so PVE never added
+ # it and it must stay foreign even while desired; dropping it from the config later must not
+ # prune the hand-made entry (think a boot-from-SAN LUN added just for health monitoring)
+ my $owned = PVE::Multipath::owned_wwid_record(
+ { a => 1, h => 1 }, # desired
+ { a => 1 }, # managed before
+ {}, # nothing added this pass (both were already in the allow-list)
+ {},
+ );
+ is_deeply($owned, ['a'], 'a desired but hand-added WWID is not adopted into the record');
+
+ $owned = PVE::Multipath::owned_wwid_record(
+ { a => 1, n => 1 }, # desired
+ { a => 1 }, # managed before
+ { n => 1 }, # newly added by this pass
+ {},
+ );
+ is_deeply($owned, ['a', 'n'], 'a WWID PVE just added becomes owned');
+
+ $owned = PVE::Multipath::owned_wwid_record(
+ { a => 1 }, # desired
+ { a => 1, gone => 1 }, # managed; 'gone' was dropped from the config
+ {},
+ { gone => 1 }, # but its prune failed
+ );
+ is_deeply($owned, ['a', 'gone'], 'a failed prune keeps ownership so it is retried');
+
+ $owned = PVE::Multipath::owned_wwid_record(
+ { a => 1, f => 1 }, # desired
+ { a => 1 }, # managed before
+ {}, # adding 'f' failed, so it never reached the allow-list
+ {},
+ );
+ is_deeply($owned, ['a'], 'a failed add does not claim ownership');
+
+ is_deeply(PVE::Multipath::owned_wwid_record({}, {}, {}, {}), [], 'empty in, empty out');
+}
+
+# --- storage-derived expectations (consumers and per-LUN expected node sets) ---
+{
+ my $storage_cfg = {
+ ids => {
+ mp => { type => 'multipath' },
+ mpb => { type => 'multipath', nodes => { n1 => 1, n2 => 1 } },
+ mpoff => { type => 'multipath', disable => 1, nodes => { n9 => 1 } },
+ lvm1 => { type => 'lvm', base => 'mp:3600abc', nodes => { n1 => 1 } },
+ lvm2 => { type => 'lvm', base => 'mpb:3600def' },
+ lvmplain => { type => 'lvm' },
+ other => { type => 'nfs' },
+ },
+ };
+ my $exp = PVE::Multipath::storage_expectations($storage_cfg, ['n1', 'n2', 'n3']);
+ is_deeply(
+ $exp->{consumers},
+ { '3600abc' => 'lvm1', '3600def' => 'lvm2' },
+ 'LVM storages over a multipath base are found as consumers',
+ );
+ is_deeply(
+ $exp->{nodes},
+ { n1 => 1, n2 => 1, n3 => 1 },
+ 'expected union covers all nodes for an unrestricted storage and skips disabled ones',
+ );
+ is_deeply(
+ $exp->{'wwid-nodes'}->{'3600abc'},
+ { n1 => 1 },
+ 'a consumed LUN is expected only on the restricted consumer nodes',
+ );
+ is_deeply(
+ $exp->{'wwid-nodes'}->{'3600def'},
+ { n1 => 1, n2 => 1 },
+ 'the base storage restriction caps an unrestricted consumer',
+ );
+}
+
+# --- override guard ---
+eval { PVE::Multipath::Config::check_overrides("devices {\n\tdevice {\n\t\tvendor X\n\t}\n}\n") };
+is($@, '', 'a well-formed devices{} block passes the guard');
+eval { PVE::Multipath::Config::check_overrides("multipaths {\n}\n") };
+like($@, qr/managed via aliases/, 'a multipaths{} block is rejected, it is generated');
+eval { PVE::Multipath::Config::check_overrides("devices {\n") };
+like($@, qr/unbalanced braces/, 'unbalanced braces are rejected');
+eval { PVE::Multipath::Config::check_overrides("frobnicate {\n}\n") };
+like($@, qr/unknown top-level section/, 'an unknown top-level section is rejected');
+eval { PVE::Multipath::Config::check_overrides("blacklist { } multipaths {\n}\n") };
+like($@, qr/managed via aliases/, 'a multipaths{} block cannot hide behind a same-line close');
+eval { PVE::Multipath::Config::check_overrides("blacklist { } frobnicate {\n}\n") };
+like($@, qr/unknown top-level section/, 'an unknown section cannot hide behind a same-line close');
+eval { PVE::Multipath::Config::check_overrides("}\ndevices {\n") };
+like($@, qr/unbalanced braces/, 'a closing brace before any open is rejected');
+is(
+ PVE::Multipath::Config::write_overrides('x', "text \n\n"),
+ "text\n",
+ 'the overrides writer trims trailing whitespace',
+);
+
+done_testing();
--
2.47.3
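
Reader's note on the allow-list reconcile: regenerate() relies on
PVE::Multipath::plan_wwid_changes() and PVE::Multipath::owned_wwid_record(),
whose bodies are not part of this excerpt. The following is a minimal sketch
of the semantics that the test cases above imply, not the actual
implementation. The two '*_sketch' sub names are hypothetical, and the set
logic is an assumption derived only from the expected test results.

```perl
use strict;
use warnings;

# Hypothetical stand-in for plan_wwid_changes(): all three arguments are { wwid => 1 } sets.
sub plan_wwid_changes_sketch {
    my ($desired, $current, $managed) = @_;

    # add: wanted by the cluster config but not yet in the local allow-list
    my @to_add = sort grep { !$current->{$_} } keys %$desired;

    # remove: still in the allow-list, no longer wanted, and previously added by PVE itself;
    # foreign (hand-made) entries never match because they are not in $managed
    my @to_remove = sort grep { $managed->{$_} && !$desired->{$_} } keys %$current;

    return (\@to_add, \@to_remove);
}

# Hypothetical stand-in for owned_wwid_record(): returns the sorted list of WWIDs PVE owns.
sub owned_wwid_record_sketch {
    my ($desired, $managed, $added, $kept) = @_;

    my %owned = (
        # already owned and still desired
        (map { $_ => 1 } grep { $desired->{$_} } keys %$managed),
        # put into the allow-list by this pass
        (map { $_ => 1 } keys %$added),
        # prune failed, keep owning it so the removal is retried
        (map { $_ => 1 } keys %$kept),
    );
    return [sort keys %owned];
}

my ($add, $remove) = plan_wwid_changes_sketch(
    { a => 1 }, # desired
    { a => 1, b => 1, f => 1 }, # current allow-list, 'f' is foreign
    { a => 1, b => 1 }, # PVE added a and b before
);
print "remove: @$remove\n"; # prints "remove: b"

my $owned = owned_wwid_record_sketch(
    { a => 1, h => 1 }, # desired, 'h' is a hand-made entry
    { a => 1 }, # managed before
    {}, # nothing added this pass
    {}, # no failed prune
);
print "owned: @$owned\n"; # prints "owned: a"
```

Keeping ownership separate from the desired set is what lets a hand-made
entry overlap the cluster config without ever being adopted or pruned.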
Thread overview: 14+ messages
2026-07-03 12:46 [PATCH v2 storage,cluster,manager 0/13] multipath: cluster-wide config, storage and health overview Thomas Lamprecht
2026-07-03 12:46 ` Thomas Lamprecht [this message]
2026-07-03 12:46 ` [PATCH v2 storage 02/13] api: disks: add read-only multipath status endpoint Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 storage 03/13] api: multipath: add cluster-wide configuration endpoints Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 storage 04/13] multipath: add storage plugin for multipath LUNs Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 storage 05/13] lvm: allow a multipath storage as the base device Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 storage 06/13] multipath: broadcast per-node map health to the cluster KV store Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 storage 07/13] api: multipath: add cluster-wide health status endpoint Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 cluster 08/13] pmxcfs: track cluster-wide multipath configuration Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 manager 09/13] pvestatd: apply the cluster-wide multipath config on each node Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 manager 10/13] api: cluster: mount the multipath configuration endpoint Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 manager 11/13] pvestatd: broadcast multipath map health to the cluster Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 manager 12/13] ui: dc: add multipath health matrix and config editor Thomas Lamprecht
2026-07-03 12:46 ` [PATCH v2 manager 13/13] ui: node: show multipath maps and their paths under Disks Thomas Lamprecht