From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH storage 01/13] multipath: add helper library and managed configuration
Date: Fri, 26 Jun 2026 14:07:31 +0200
Message-ID: <20260626121000.2095591-2-t.lamprecht@proxmox.com>
In-Reply-To: <20260626121000.2095591-1-t.lamprecht@proxmox.com>

Multipath on PVE is configured by hand and per node today, with nothing
that keeps it consistent across a cluster. Add the foundation for
managing it cluster-wide instead.

The library reads the assembled maps and their health from multipathd.
The configuration is a SectionConfig kept in pmxcfs: one 'defaults'
section for the global multipathd knobs, plus one 'wwid' section per
allow-listed LUN holding its optional alias and any per-LUN knobs.
Parameters are kebab-case and rendered to multipathd's snake_case
keywords, validated through the section schema so a bad value cannot
reach the generated drop-in.

The managed baseline is deliberately conservative: it only assembles
explicitly allow-listed LUNs and keeps map names stable and WWID-based,
so a device is named the same on every node and an LVM PV on it stays
stable cluster-wide. Hardware-specific tuning lives in a separate,
admin-owned override rather than in the generated baseline, and the two
are written to distinct drop-ins, as multipath does not accept a
repeated 'defaults' section in one file.

Parsing and generation stay in a pure module with no dependency on
PVE::Cluster, so they remain unit-testable and usable on a node whose
pve-cluster does not yet observe the new file; registering it in pmxcfs
needs the matching pve-cluster change.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
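
Note, not part of the commit: a minimal sketch of how the configuration
described above might look end to end, assuming the default SectionConfig
'<type>: <id>' header syntax. The WWID is borrowed from the test fixture;
the alias 'san-lun0' and the knob values are made up for illustration.

A cluster-wide /etc/pve/multipath.cfg could read:

    defaults: defaults
        polling-interval 10
        no-path-retry queue

    wwid: 3600140500a1b2c3d4e5f6a7b8c9d0e1f
        alias san-lun0
        no-path-retry 12

From that, generate_managed_conf() should render roughly the following
into pve-defaults.conf (keys sorted, '-' mapped to '_', managed defaults
filled in where the section omits them; tabs shown as spaces here):

    # This file is managed by Proxmox VE - do not edit by hand.
    # Hardware-/node-specific overrides belong in the override config.

    defaults {
        find_multipaths strict
        no_path_retry queue
        polling_interval 10
        user_friendly_names no
    }

and build_multipaths_block() the body of pve-aliases.conf:

    multipaths {
        multipath {
            wwid 3600140500a1b2c3d4e5f6a7b8c9d0e1f
            alias san-lun0
            no_path_retry 12
        }
    }

A 'wwid' section without alias or knobs only ends up on the allow-list
and produces no 'multipath' entry.
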
src/PVE/Makefile | 4 +
src/PVE/Multipath.pm | 282 ++++++++++++++++++++++
src/PVE/Multipath/ClusterConfig.pm | 55 +++++
src/PVE/Multipath/Config.pm | 361 +++++++++++++++++++++++++++++
src/PVE/Multipath/Generator.pm | 148 ++++++++++++
src/test/Makefile | 5 +-
src/test/run_multipath_tests.pl | 238 +++++++++++++++++++
7 files changed, 1092 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/Multipath.pm
create mode 100644 src/PVE/Multipath/ClusterConfig.pm
create mode 100644 src/PVE/Multipath/Config.pm
create mode 100644 src/PVE/Multipath/Generator.pm
create mode 100755 src/test/run_multipath_tests.pl
diff --git a/src/PVE/Makefile b/src/PVE/Makefile
index 9e9f6aa..7ddd646 100644
--- a/src/PVE/Makefile
+++ b/src/PVE/Makefile
@@ -4,6 +4,10 @@
install:
install -D -m 0644 Storage.pm ${DESTDIR}${PERLDIR}/PVE/Storage.pm
install -D -m 0644 Diskmanage.pm ${DESTDIR}${PERLDIR}/PVE/Diskmanage.pm
+ install -D -m 0644 Multipath.pm ${DESTDIR}${PERLDIR}/PVE/Multipath.pm
+ install -D -m 0644 Multipath/Config.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Config.pm
+ install -D -m 0644 Multipath/ClusterConfig.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/ClusterConfig.pm
+ install -D -m 0644 Multipath/Generator.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Generator.pm
install -D -m 0644 CephConfig.pm ${DESTDIR}${PERLDIR}/PVE/CephConfig.pm
install -D -m 0644 GuestImport.pm ${DESTDIR}${PERLDIR}/PVE/GuestImport.pm
make -C Storage install
diff --git a/src/PVE/Multipath.pm b/src/PVE/Multipath.pm
new file mode 100644
index 0000000..59c1103
--- /dev/null
+++ b/src/PVE/Multipath.pm
@@ -0,0 +1,282 @@
+package PVE::Multipath;
+
+use strict;
+use warnings;
+
+use JSON qw(decode_json);
+
+use PVE::Tools qw(run_command file_read_firstline file_get_contents);
+
+use PVE::Multipath::Config;
+
+# Helper library around device-mapper multipath (multipathd). The single place that knows how to
+# talk to multipathd and turn its state into a normalized, stable structure for the rest of the
+# storage stack: health reporting, the API, and consumers that wait for a map before binding to it.
+#
+# Everything keys on the SCSI/NAA WWID, as reported by multipathd in the map's 'uuid' field. Map
+# names (mpathN / aliases) and the underlying 'sdX' paths are node-local and unstable; the WWID is
+# not.
+
+my $MULTIPATH = '/sbin/multipath';
+my $MULTIPATHD = '/sbin/multipathd';
+
+# health states for a single map, derived from its path states
+use constant {
+ HEALTH_OPTIMAL => 'optimal', # all paths active
+ HEALTH_DEGRADED => 'degraded', # some but not all paths active
+ HEALTH_FAILED => 'failed', # no active path left
+};
+
+my $supported;
+
+sub is_supported {
+ return $supported if defined($supported);
+ $supported = (-x $MULTIPATH && -x $MULTIPATHD) ? 1 : 0;
+ return $supported;
+}
+
+sub assert_supported {
+ die "no multipath support - please install 'multipath-tools'\n" if !is_supported();
+ return 1;
+}
+
+# Returns whether the multipathd daemon is reachable. Used for status output, so it never dies and
+# just reports 0 when multipath is unavailable or the daemon socket cannot be queried.
+sub is_running {
+ return 0 if !is_supported();
+
+ my $running = 0;
+ eval {
+ run_command(
+ [$MULTIPATHD, 'show', 'daemon'],
+ outfunc => sub {
+ my ($line) = @_;
+ # example: "pid 1234 idle" / "pid 1234 running"
+ $running = 1 if $line =~ m/^pid \d+ \S+/;
+ },
+ errfunc => sub { },
+ );
+ };
+ return $running;
+}
+
+# Runs `multipathd show <subcmd> json` and returns the raw output. Kept separate from parsing so
+# tests can feed recorded fixtures to the parse_*() functions without a running daemon.
+my sub query_multipathd_json {
+ my ($subcmd) = @_;
+
+ assert_supported();
+
+ my $output = '';
+ run_command(
+ [$MULTIPATHD, 'show', $subcmd, 'json'],
+ outfunc => sub { $output .= "$_[0]\n"; },
+ errfunc => sub { warn "$_[0]\n"; },
+ );
+
+ return $output;
+}
+
+my sub derive_health {
+ my ($paths_active, $paths_total) = @_;
+
+ return HEALTH_FAILED if !$paths_total || !$paths_active;
+ return HEALTH_OPTIMAL if $paths_active == $paths_total;
+ return HEALTH_DEGRADED;
+}
+
+my sub normalize_path {
+ my ($path) = @_;
+
+ my $res = {
+ dev => $path->{dev},
+ # 'active' or 'failed' - the state device-mapper sees
+ 'dm-state' => $path->{dm_st},
+ # 'running', 'faulty' or 'offline' - the state the kernel block layer sees
+ 'dev-state' => $path->{dev_st},
+ # path checker result, e.g. 'ready' / 'faulty' / 'ghost'
+ 'check-state' => $path->{chk_st},
+ };
+ $res->{priority} = int($path->{pri}) if defined($path->{pri});
+
+ # multipathd renders unset string fields as the literal '[undef]'
+ my $wwpn = $path->{target_wwpn};
+ my $hba = $path->{host_adapter};
+ undef $wwpn if !defined($wwpn) || $wwpn eq '[undef]' || $wwpn eq '';
+ undef $hba if !defined($hba) || $hba eq '[undef]' || $hba eq '';
+ $res->{'target-wwpn'} = $wwpn if defined($wwpn);
+ $res->{'host-adapter'} = $hba if defined($hba);
+ # a real target WWPN (0x...) means Fibre Channel; iSCSI/SAS transport is derived from sysfs by
+ # get_maps(). Do NOT treat the field's mere presence as FC: multipathd reports it as '[undef]'
+ # for iSCSI.
+ $res->{transport} = 'fc' if defined($wwpn) && $wwpn =~ /^0x[0-9a-f]+$/i;
+
+ return $res;
+}
+
+# Turns the output of `multipathd show maps json` into a normalized list of maps. Pure (no I/O) on
+# purpose: it derives everything it can from the JSON alone, so it can be unit-tested against
+# recorded fixtures. Live-only bits (byte size, transport) are added by get_maps() below.
+sub parse_maps_json {
+ my ($json) = @_;
+
+ my $data = eval { decode_json($json) };
+ die "could not parse multipathd maps JSON: $@\n" if $@;
+
+ my $maps = [];
+ for my $map (($data->{maps} // [])->@*) {
+ my $path_groups = [];
+ my ($paths_total, $paths_active) = (0, 0);
+
+ for my $group (($map->{path_groups} // [])->@*) {
+ my $paths = [];
+ for my $path (($group->{paths} // [])->@*) {
+ my $normalized = normalize_path($path);
+ $paths_total++;
+ $paths_active++ if ($normalized->{'dm-state'} // '') eq 'active';
+ push $paths->@*, $normalized;
+ }
+ push $path_groups->@*,
+ {
+ group => int($group->{group} // 0),
+ 'dm-state' => $group->{dm_st},
+ priority => int($group->{pri} // 0),
+ paths => $paths,
+ };
+ }
+
+ push $maps->@*, {
+ wwid => $map->{uuid},
+ name => $map->{name},
+ sysfs => $map->{sysfs}, # the 'dm-N' kernel name
+ 'dm-state' => $map->{dm_st},
+ 'paths-total' => $paths_total,
+ 'paths-active' => $paths_active,
+ health => derive_health($paths_active, $paths_total),
+ 'path-groups' => $path_groups,
+ };
+ }
+
+ return $maps;
+}
+
+# Best-effort byte size of a dm device from sysfs (Linux reports size in 512-byte sectors
+# regardless of the real block size).
+my sub dm_size_bytes {
+ my ($sysfs) = @_;
+
+ return undef if !$sysfs;
+ my $sectors = file_read_firstline("/sys/block/$sysfs/size");
+ return undef if !defined($sectors) || $sectors !~ m/^\d+$/;
+ return int($sectors) * 512;
+}
+
+my sub dir_has_entries {
+ my ($dir) = @_;
+
+ return 0 if !-d $dir;
+ opendir(my $dh, $dir) or return 0;
+ my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
+ closedir($dh);
+ return scalar(@entries) ? 1 : 0;
+}
+
+# Best-effort transport of a single 'sdX' path from its sysfs topology. Mainly needed for iSCSI and
+# SAS, as Fibre Channel is usually already set from the map JSON; the rport match is a fallback.
+my sub path_transport {
+ my ($dev) = @_;
+
+ return undef if !$dev;
+ my $link = readlink("/sys/block/$dev");
+ return undef if !$link;
+ return 'iscsi' if $link =~ m{/session\d+/};
+ return 'fc' if $link =~ m{/rport-\d+};
+ return 'sas' if $link =~ m{/end_device-};
+ return undef;
+}
+
+# Returns the normalized maps enriched with information that requires the local system (size, a
+# stable consumer path, in-use state, transport). Dies if multipath is not supported; callers that
+# just want status should guard with is_supported()/is_running().
+sub get_maps {
+ my $maps = parse_maps_json(query_multipathd_json('maps'));
+
+ for my $map ($maps->@*) {
+ $map->{size} = dm_size_bytes($map->{sysfs});
+ # WWID-stable path, present independently of the (node-local) map name
+ $map->{path} = "/dev/disk/by-id/dm-uuid-mpath-$map->{wwid}"
+ if defined($map->{wwid});
+ $map->{used} = dir_has_entries("/sys/block/$map->{sysfs}/holders")
+ if $map->{sysfs};
+
+ my %transports;
+ for my $group ($map->{'path-groups'}->@*) {
+ for my $path ($group->{paths}->@*) {
+ $path->{transport} //= path_transport($path->{dev});
+ $transports{ $path->{transport} } = 1 if defined($path->{transport});
+ }
+ }
+ # only expose a map-level transport when all paths agree on it
+ my @transports = keys %transports;
+ $map->{transport} = $transports[0] if scalar(@transports) == 1;
+ }
+
+ return $maps;
+}
+
+sub get_map_for_wwid {
+ my ($wwid) = @_;
+
+ for my $map (get_maps()->@*) {
+ return $map if defined($map->{wwid}) && $map->{wwid} eq $wwid;
+ }
+ return undef;
+}
+
+# Polls until a map for the given WWID exists, up to $timeout seconds. A consumer like the iSCSI
+# plugin uses this after a login or rescan to bind to the coalesced dm device rather than to a
+# transient single 'sdX' path.
+sub wait_for_map {
+ my ($wwid, $timeout) = @_;
+
+ $timeout //= 10;
+
+ my $deadline = time() + $timeout;
+ while (1) {
+ my $map = eval { get_map_for_wwid($wwid) };
+ return $map if $map;
+ return undef if time() >= $deadline;
+ sleep(1);
+ }
+}
+
+my $WWIDS_FILE = '/etc/multipath/wwids';
+
+# The managed allow-list of LUNs (WWIDs) to assemble into a map; with 'find_multipaths strict' only
+# these get multipathed.
+sub list_wwids {
+ return [] if !-e $WWIDS_FILE;
+ return PVE::Multipath::Config::parse_wwids(file_get_contents($WWIDS_FILE));
+}
+
+sub add_wwid {
+ my ($wwid) = @_;
+
+ assert_supported();
+ run_command([$MULTIPATH, '-a', $wwid]);
+}
+
+sub remove_wwid {
+ my ($wwid) = @_;
+
+ assert_supported();
+ run_command([$MULTIPATH, '-w', $wwid]);
+}
+
+# Re-read the configuration and rebuild maps accordingly, after a config or allow-list change.
+sub reconfigure {
+ assert_supported();
+ run_command([$MULTIPATHD, 'reconfigure']);
+}
+
+1;
diff --git a/src/PVE/Multipath/ClusterConfig.pm b/src/PVE/Multipath/ClusterConfig.pm
new file mode 100644
index 0000000..0b09c3f
--- /dev/null
+++ b/src/PVE/Multipath/ClusterConfig.pm
@@ -0,0 +1,55 @@
+package PVE::Multipath::ClusterConfig;
+
+use strict;
+use warnings;
+
+use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
+
+use PVE::Multipath::Config;
+
+# Cluster-wide multipath configuration, replicated by pmxcfs. The structured allow-list, aliases and
+# knobs live in multipath.cfg (a SectionConfig); the free-form hardware override text lives in a
+# separate plain file so it stays hand-editable and diffable.
+my $FILENAME = 'multipath.cfg';
+my $OVERRIDES_FILENAME = 'multipath-overrides.conf';
+
+cfs_register_file(
+ $FILENAME,
+ sub { PVE::Multipath::Config->parse_config(@_); },
+ sub { PVE::Multipath::Config->write_config(@_); },
+);
+
+cfs_register_file(
+ $OVERRIDES_FILENAME,
+ \&PVE::Multipath::Config::parse_overrides,
+ \&PVE::Multipath::Config::write_overrides,
+);
+
+sub read_config {
+ return cfs_read_file($FILENAME);
+}
+
+sub write_config {
+ my ($cfg) = @_;
+ cfs_write_file($FILENAME, $cfg);
+}
+
+sub read_overrides {
+ return cfs_read_file($OVERRIDES_FILENAME);
+}
+
+sub write_overrides {
+ my ($text) = @_;
+ cfs_write_file($OVERRIDES_FILENAME, $text);
+}
+
+sub lock_config {
+ my ($code, $errmsg) = @_;
+
+ cfs_lock_file($FILENAME, undef, $code);
+ if (my $err = $@) {
+ $errmsg ? die "$errmsg: $err" : die $err;
+ }
+}
+
+1;
diff --git a/src/PVE/Multipath/Config.pm b/src/PVE/Multipath/Config.pm
new file mode 100644
index 0000000..21ad72e
--- /dev/null
+++ b/src/PVE/Multipath/Config.pm
@@ -0,0 +1,361 @@
+package PVE::Multipath::Config;
+
+use strict;
+use warnings;
+
+use PVE::SectionConfig;
+
+use base qw(PVE::SectionConfig);
+
+# Parser and writer for the cluster-wide source of truth in pmxcfs (/etc/pve/multipath.cfg). It is a
+# SectionConfig: a single 'defaults' section for global multipathd knobs, plus one 'wwid' section
+# per allow-listed LUN holding its optional alias and per-LUN knobs. Free-form hardware overrides
+# (device {} entries) live in a separate plain file, see PVE::Multipath::ClusterConfig. Kept pure so
+# it stays unit-testable without PVE::Cluster.
+
+# Conservative, cluster-friendly defaults, applied when the 'defaults' section omits them:
+# - find_multipaths strict -> only explicitly allow-listed LUNs get assembled, so boot/root and
+# unrelated disks stay untouched.
+# - user_friendly_names no -> the map name is the WWID, identical on every node, so an LVM PV on
+# /dev/disk/by-id/dm-uuid-mpath-<wwid> is stable cluster-wide without a node-local bindings file.
+my $MANAGED_DEFAULTS = {
+ 'find-multipaths' => 'strict',
+ 'user-friendly-names' => 'no',
+ 'polling-interval' => 5,
+};
+
+sub managed_defaults { return { $MANAGED_DEFAULTS->%* }; }
+
+# Knobs valid both globally (defaults) and per-LUN (a multipaths{} entry), per multipath.conf(5).
+my $shared_knobs = {
+ 'no-path-retry' => {
+ type => 'string',
+ pattern => '(?:queue|fail|\d+)',
+ typetext => 'queue|fail|<count>',
+ description =>
+ "How to react when all paths are down: keep queuing, fail at once, or retry"
+ . " for the given number of polling intervals.",
+ optional => 1,
+ },
+ 'path-grouping-policy' => {
+ type => 'string',
+ enum => [qw(failover multibus group_by_serial group_by_prio group_by_node_name)],
+ description => "How paths are grouped into priority groups.",
+ optional => 1,
+ },
+ failback => {
+ type => 'string',
+ pattern => '(?:manual|immediate|followover|\d+)',
+ typetext => 'manual|immediate|followover|<seconds>',
+ description => "When to fail back to a restored higher-priority path group.",
+ optional => 1,
+ },
+ 'path-selector' => {
+ type => 'string',
+ maxLength => 64,
+ description => "Path selector algorithm used within a priority group, for example"
+ . " 'service-time 0'.",
+ optional => 1,
+ },
+};
+
+# Knobs that only make sense globally.
+my $defaults_only_knobs = {
+ 'find-multipaths' => {
+ type => 'string',
+ enum => [qw(yes no strict greedy smart)],
+ default => 'strict',
+ description => "Which devices multipathd assembles into a map. 'strict' only takes"
+ . " explicitly allow-listed WWIDs.",
+ optional => 1,
+ },
+ 'user-friendly-names' => {
+ type => 'string',
+ enum => [qw(yes no)],
+ default => 'no',
+ description => "Whether to use node-local mpathN names. Keep 'no' for stable WWID-based"
+ . " names across the cluster.",
+ optional => 1,
+ },
+ 'polling-interval' => {
+ type => 'integer',
+ minimum => 1,
+ default => 5,
+ description => "Interval between path checks, in seconds.",
+ optional => 1,
+ },
+};
+
+# Knobs that only make sense per-LUN.
+my $wwid_only_knobs = {
+ alias => {
+ type => 'string',
+ pattern => '[a-zA-Z0-9][a-zA-Z0-9._-]*',
+ maxLength => 64,
+ description =>
+ "Human-readable map name for this WWID; multipathd uses it as the map name.",
+ optional => 1,
+ },
+ 'rr-min-io-rq' => {
+ type => 'integer',
+ minimum => 1,
+ description =>
+ "Number of I/O requests to route to a path before switching, request-based.",
+ optional => 1,
+ },
+ 'rr-weight' => {
+ type => 'string',
+ enum => [qw(priorities uniform)],
+ description => "Whether to weight paths by priority when balancing I/O.",
+ optional => 1,
+ },
+};
+
+my $defaultData = {
+ propertyList => {
+ type => { description => "Section type ('defaults' or 'wwid')." },
+ id => {
+ type => 'string',
+ description =>
+ "Section ID: the literal 'defaults', or a LUN WWID for 'wwid' sections.",
+ pattern => '[a-zA-Z0-9._:-]+',
+ maxLength => 128,
+ },
+ $shared_knobs->%*,
+ $defaults_only_knobs->%*,
+ $wwid_only_knobs->%*,
+ },
+};
+
+sub private { return $defaultData; }
+
+package PVE::Multipath::Config::Defaults;
+
+use base qw(PVE::Multipath::Config);
+
+sub type { return 'defaults'; }
+
+sub options {
+ return {
+ 'find-multipaths' => { optional => 1 },
+ 'user-friendly-names' => { optional => 1 },
+ 'polling-interval' => { optional => 1 },
+ 'no-path-retry' => { optional => 1 },
+ 'path-grouping-policy' => { optional => 1 },
+ failback => { optional => 1 },
+ 'path-selector' => { optional => 1 },
+ };
+}
+
+__PACKAGE__->register();
+
+package PVE::Multipath::Config::Wwid;
+
+use base qw(PVE::Multipath::Config);
+
+sub type { return 'wwid'; }
+
+sub options {
+ return {
+ alias => { optional => 1 },
+ 'no-path-retry' => { optional => 1 },
+ 'path-grouping-policy' => { optional => 1 },
+ failback => { optional => 1 },
+ 'path-selector' => { optional => 1 },
+ 'rr-min-io-rq' => { optional => 1 },
+ 'rr-weight' => { optional => 1 },
+ };
+}
+
+__PACKAGE__->register();
+
+package PVE::Multipath::Config;
+
+__PACKAGE__->init();
+
+# The admin override file may only use these top-level multipath.conf sections; it is checked
+# against them. 'multipaths' is generated from the wwid sections, so an admin block would collide.
+my $OVERRIDE_KEYWORDS =
+ { devices => 1, overrides => 1, defaults => 1, blacklist => 1, blacklist_exceptions => 1 };
+
+# Validate the free-form override text before it can break multipathd's parser cluster-wide. This is
+# a guard, not a full parse: balanced braces, only known top-level sections, and no 'multipaths'
+# block (that is generated from the wwid sections and a duplicate is fatal to multipathd).
+sub check_overrides {
+ my ($text) = @_;
+
+ return if !defined($text) || $text !~ /\S/;
+
+ my ($open, $close) = (0, 0);
+ for my $line (split(/\n/, $text)) {
+ next if $line =~ /^\s*#/;
+ $open += ($line =~ tr/{//);
+ $close += ($line =~ tr/}//);
+ if ($line =~ /^\s*(\w+)\s*\{/) {
+ my $kw = $1;
+ die "multipath overrides: 'multipaths' is managed via aliases, do not set it here\n"
+ if $kw eq 'multipaths';
+ die "multipath overrides: unknown top-level section '$kw'\n"
+ if $open - $close == 1 && !$OVERRIDE_KEYWORDS->{$kw};
+ }
+ }
+ die "multipath overrides: unbalanced braces\n" if $open != $close;
+
+ return;
+}
+
+# Read/write the separate, admin-owned override file (/etc/pve/multipath-overrides.conf). Stored and
+# rendered verbatim, so it stays hand-editable and diffable.
+sub parse_overrides {
+ my ($filename, $raw) = @_;
+ return $raw // '';
+}
+
+sub write_overrides {
+ my ($filename, $text) = @_;
+ $text //= '';
+ $text =~ s/\s+$//;
+ return length($text) ? "$text\n" : '';
+}
+
+my $MANAGED_HEADER =
+ "# This file is managed by Proxmox VE - do not edit by hand.\n"
+ . "# Hardware-/node-specific overrides belong in the override config.\n";
+
+# Renders a named section from a key => value hash, keys sorted for a stable, diffable result. The
+# config and API use kebab-case parameters, multipathd keywords are snake_case, so map '-' to '_'.
+my sub render_section {
+ my ($name, $kv) = @_;
+
+ my $out = "$name {\n";
+ for my $key (sort keys $kv->%*) {
+ (my $keyword = $key) =~ tr/-/_/;
+ $out .= "\t$keyword $kv->{$key}\n";
+ }
+ $out .= "}\n";
+ return $out;
+}
+
+# Builds the Proxmox-managed baseline drop-in (header + defaults section) from the effective global
+# knobs. Admin overrides are not merged in here: they go into a separate conf.d file, as multipath
+# rejects two 'defaults' blocks in one file (duplicate keyword) and drops the second.
+sub generate_managed_conf {
+ my ($defaults) = @_;
+ $defaults //= managed_defaults();
+
+ return $MANAGED_HEADER . "\n" . render_section('defaults', $defaults);
+}
+
+# The WWID allow-list file (/etc/multipath/wwids) holds one '/<wwid>/' per line; parse and format it.
+sub parse_wwids {
+ my ($text) = @_;
+
+ my $wwids = [];
+ for my $line (split(/\n/, $text // '')) {
+ next if $line =~ /^\s*#/;
+ next if $line =~ /^\s*$/;
+ if ($line =~ m{^/(.+)/\s*$}) {
+ push $wwids->@*, $1;
+ }
+ }
+ return $wwids;
+}
+
+sub format_wwids {
+ my ($wwids) = @_;
+
+ my $out = "# Multipath wwids, managed by Proxmox VE\n";
+ $out .= "/$_/\n" for sort $wwids->@*;
+ return $out;
+}
+
+# Builds a 'multipaths {}' block from the per-WWID sections (alias plus any per-LUN knobs); returns
+# the empty string when no WWID has an alias or a knob set.
+sub build_multipaths_block {
+ my ($wwid_opts) = @_;
+
+ my @entries = grep { %{ $wwid_opts->{$_} } } sort keys %$wwid_opts;
+ return '' if !@entries;
+
+ my $out = "multipaths {\n";
+ for my $wwid (@entries) {
+ my $opts = $wwid_opts->{$wwid};
+ $out .= "\tmultipath {\n";
+ $out .= "\t\twwid $wwid\n";
+ for my $key (sort keys %$opts) {
+ (my $keyword = $key) =~ tr/-/_/;
+ $out .= "\t\t$keyword $opts->{$key}\n";
+ }
+ $out .= "\t}\n";
+ }
+ $out .= "}\n";
+ return $out;
+}
+
+# The knob property definitions as an API parameter schema. Strip the schema 'default' so an update
+# that omits a knob leaves it unchanged instead of resetting it to the managed default.
+my sub api_schema {
+ my ($props) = @_;
+
+ my $res = {};
+ for my $key (keys %$props) {
+ $res->{$key} = { $props->{$key}->%* };
+ delete $res->{$key}->{default};
+ }
+ return $res;
+}
+
+# Settable global knobs (the 'defaults' section) as an API parameter schema.
+sub defaults_api_schema {
+ return api_schema({ $shared_knobs->%*, $defaults_only_knobs->%* });
+}
+
+# Settable per-WWID knobs (including the alias) as an API parameter schema.
+sub wwid_api_schema {
+ return api_schema({ $shared_knobs->%*, $wwid_only_knobs->%* });
+}
+
+# Effective global knobs: the 'defaults' section merged onto the conservative managed defaults.
+sub effective_defaults {
+ my ($cfg) = @_;
+
+ my $defaults = managed_defaults();
+ if (my $section = $cfg->{ids}->{defaults}) {
+ $defaults->{$_} = $section->{$_} for grep { $_ ne 'type' } keys %$section;
+ }
+ return $defaults;
+}
+
+# The allow-listed WWIDs, that is the ids of the 'wwid' sections.
+sub wwid_list {
+ my ($cfg) = @_;
+ return [sort grep { ($cfg->{ids}->{$_}->{type} // '') eq 'wwid' } keys $cfg->{ids}->%*];
+}
+
+# { wwid => alias } for the WWIDs that have one.
+sub aliases {
+ my ($cfg) = @_;
+
+ my $res = {};
+ for my $wwid (keys $cfg->{ids}->%*) {
+ my $section = $cfg->{ids}->{$wwid};
+ next if ($section->{type} // '') ne 'wwid';
+ $res->{$wwid} = $section->{alias} if defined($section->{alias});
+ }
+ return $res;
+}
+
+# { wwid => { alias?, knob => value, ... } }, the per-WWID input to build_multipaths_block().
+sub wwid_opts {
+ my ($cfg) = @_;
+
+ my $res = {};
+ for my $wwid (keys $cfg->{ids}->%*) {
+ my $section = $cfg->{ids}->{$wwid};
+ next if ($section->{type} // '') ne 'wwid';
+ $res->{$wwid} = { map { $_ => $section->{$_} } grep { $_ ne 'type' } keys %$section };
+ }
+ return $res;
+}
+
+1;
diff --git a/src/PVE/Multipath/Generator.pm b/src/PVE/Multipath/Generator.pm
new file mode 100644
index 0000000..0bcd37f
--- /dev/null
+++ b/src/PVE/Multipath/Generator.pm
@@ -0,0 +1,148 @@
+package PVE::Multipath::Generator;
+
+use strict;
+use warnings;
+
+use File::Path qw(make_path);
+
+use PVE::Tools qw(file_get_contents file_set_contents);
+
+use PVE::Multipath;
+use PVE::Multipath::Config;
+use PVE::Multipath::ClusterConfig;
+
+# Renders the effective node-local multipath configuration from the cluster-wide source of truth
+# (/etc/pve/multipath.cfg) and reloads multipathd when something changed.
+#
+# The rendered files live on the local filesystem, so they survive reboots and are available to
+# multipathd at boot even before pmxcfs is up; the last successful render is the boot-time fallback.
+
+# Proxmox-owned drop-ins; the admin's /etc/multipath.conf keeps its default 'config_dir
+# /etc/multipath/conf.d'. The managed baseline, the admin overrides, and the generated aliases each
+# get their own file so multipath merges them across files instead of hitting a duplicate section
+# keyword: two 'defaults' blocks in one file are rejected outright, and our 'multipaths' alias block
+# would clash with a 'multipaths' section in the overrides. The overrides file sorts after the
+# baseline, so an admin's defaults override it; the aliases file is a separate 'multipaths' block,
+# so its order does not matter.
+my $DEFAULTS_DROPIN = '/etc/multipath/conf.d/pve-defaults.conf';
+my $OVERRIDES_DROPIN = '/etc/multipath/conf.d/pve-overrides.conf';
+my $ALIASES_DROPIN = '/etc/multipath/conf.d/pve-aliases.conf';
+
+my sub write_if_changed {
+ my ($path, $content) = @_;
+
+ my $old = -e $path ? eval { file_get_contents($path) } : undef;
+ return 0 if defined($old) && $old eq $content;
+
+ my $dir = $path =~ s!/[^/]+$!!r;
+ make_path($dir) if !-d $dir;
+ file_set_contents($path, $content);
+ return 1;
+}
+
+my sub remove_if_present {
+ my ($path) = @_;
+
+ return 0 if !-e $path;
+ unlink($path) or die "could not remove '$path': $!\n";
+ return 1;
+}
+
+sub regenerate {
+ my ($cfg, $overrides) = @_;
+ $cfg //= PVE::Multipath::ClusterConfig::read_config();
+ $overrides //= PVE::Multipath::ClusterConfig::read_overrides();
+
+ my $changed = 0;
+
+ my $defaults = PVE::Multipath::Config::effective_defaults($cfg);
+ $changed = 1
+ if write_if_changed($DEFAULTS_DROPIN,
+ PVE::Multipath::Config::generate_managed_conf($defaults));
+
+ if (defined($overrides) && length($overrides)) {
+ my $content =
+ "# Managed by Proxmox VE - edit overrides in /etc/pve/multipath-overrides.conf.\n\n"
+ . "$overrides\n";
+ $changed = 1 if write_if_changed($OVERRIDES_DROPIN, $content);
+ } else {
+ $changed = 1 if remove_if_present($OVERRIDES_DROPIN);
+ }
+
+ my $block =
+ PVE::Multipath::Config::build_multipaths_block(PVE::Multipath::Config::wwid_opts($cfg));
+ if (length($block)) {
+ my $content =
+ "# Managed by Proxmox VE - edit aliases and per-LUN options in /etc/pve/multipath.cfg.\n\n"
+ . $block;
+ $changed = 1 if write_if_changed($ALIASES_DROPIN, $content);
+ } else {
+ $changed = 1 if remove_if_present($ALIASES_DROPIN);
+ }
+
+ # Bring the WWID allow-list (/etc/multipath/wwids) in line with the cluster config through
+ # multipath's own add/remove, so its on-disk format stays intact. Isolate each op: one failing
+ # WWID must not abort the whole pass, or it would stall every other WWID on every run; a failed
+ # op leaves the file unchanged and is retried next pass.
+ my %desired = map { $_ => 1 } PVE::Multipath::Config::wwid_list($cfg)->@*;
+ my %current = map { $_ => 1 } PVE::Multipath::list_wwids()->@*;
+
+ my @errors;
+ for my $wwid (sort keys %desired) {
+ next if $current{$wwid};
+ eval { PVE::Multipath::add_wwid($wwid); };
+ if (my $err = $@) {
+ push @errors, "adding WWID '$wwid' failed - $err";
+ } else {
+ $changed = 1;
+ }
+ }
+ for my $wwid (sort keys %current) {
+ next if $desired{$wwid};
+ eval { PVE::Multipath::remove_wwid($wwid); };
+ if (my $err = $@) {
+ push @errors, "removing WWID '$wwid' failed - $err";
+ } else {
+ $changed = 1;
+ }
+ }
+
+ # reload the daemon for whatever did converge, even if some ops failed
+ if ($changed && PVE::Multipath::is_running()) {
+ eval { PVE::Multipath::reconfigure(); };
+ push @errors, "reconfigure failed - $@" if $@;
+ }
+
+ die join('', @errors) if @errors;
+
+ return $changed;
+}
+
+# Safe periodic entry point for a status loop like pvestatd: a no-op when multipath is not in use on
+# this node, and never throws, so a caller stays a single guarded line and the same entry point
+# works from a systemd unit or CLI.
+sub sync {
+ return 0 if !PVE::Multipath::is_supported();
+
+ my $cfg = eval { PVE::Multipath::ClusterConfig::read_config() };
+ if (my $err = $@) {
+ warn "multipath: reading cluster config failed - $err";
+ return 0;
+ }
+ my $overrides = eval { PVE::Multipath::ClusterConfig::read_overrides() };
+
+ # stay out of the way unless the feature is in use: nothing configured cluster-wide and no
+ # local drop-in present
+ return 0
+ if !scalar(PVE::Multipath::Config::wwid_list($cfg)->@*)
+ && !(defined($overrides) && length($overrides))
+ && !$cfg->{ids}->{defaults}
+ && !-e $DEFAULTS_DROPIN;
+
+ my $changed = eval { regenerate($cfg, $overrides) };
+ warn "multipath: config sync failed - $@" if $@;
+
+ return $changed // 0;
+}
+
+1;
diff --git a/src/test/Makefile b/src/test/Makefile
index ee025bc..51c7360 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -1,6 +1,6 @@
all: test
-test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access
+test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access test_multipath
test_zfspoolplugin: run_test_zfspoolplugin.pl
./run_test_zfspoolplugin.pl
@@ -22,3 +22,6 @@ test_ovf: run_ovf_tests.pl
test_volume_access: run_volume_access_tests.pl
./run_volume_access_tests.pl
+
+test_multipath: run_multipath_tests.pl
+ ./run_multipath_tests.pl
diff --git a/src/test/run_multipath_tests.pl b/src/test/run_multipath_tests.pl
new file mode 100755
index 0000000..f710308
--- /dev/null
+++ b/src/test/run_multipath_tests.pl
@@ -0,0 +1,238 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use JSON;
+use Test::More;
+
+use lib ('.', '..');
+use PVE::Multipath;
+use PVE::Multipath::Config;
+
+# A recorded `multipathd show maps json` reply with three maps exercising each
+# health state: an all-active map, a partially-failed map and an all-failed map.
+my $maps_json = <<'EOF';
+{
+ "major_version": 0,
+ "minor_version": 1,
+ "maps": [
+ {
+ "name": "mpatha",
+ "uuid": "3600140500a1b2c3d4e5f6a7b8c9d0e1f",
+ "sysfs": "dm-0",
+ "dm_st": "active",
+ "paths": 2,
+ "path_groups": [
+ {
+ "group": 1,
+ "dm_st": "active",
+ "pri": 50,
+ "paths": [
+ { "dev": "sdb", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000001" },
+ { "dev": "sdc", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000002" }
+ ]
+ }
+ ]
+ },
+ {
+ "name": "mpathb",
+ "uuid": "360014050aabbccddeeff00112233445566",
+ "sysfs": "dm-1",
+ "dm_st": "active",
+ "paths": 2,
+ "path_groups": [
+ {
+ "group": 1,
+ "dm_st": "active",
+ "pri": 10,
+ "paths": [
+ { "dev": "sdd", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 10 }
+ ]
+ },
+ {
+ "group": 2,
+ "dm_st": "enabled",
+ "pri": 0,
+ "paths": [
+ { "dev": "sde", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0 }
+ ]
+ }
+ ]
+ },
+ {
+ "name": "mpathc",
+ "uuid": "36001405ffffffffffffffffffffffffff",
+ "sysfs": "dm-2",
+ "dm_st": "active",
+ "paths": 1,
+ "path_groups": [
+ {
+ "group": 1,
+ "dm_st": "enabled",
+ "pri": 0,
+ "paths": [
+ { "dev": "sdf", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0, "target_wwpn": "[undef]", "host_adapter": "[undef]" }
+ ]
+ }
+ ]
+ }
+ ]
+}
+EOF
+
+my $maps = PVE::Multipath::parse_maps_json($maps_json);
+
+is(scalar($maps->@*), 3, 'parsed all three maps');
+
+my ($map_a, $map_b, $map_c) = $maps->@*;
+
+# fully healthy map
+is($map_a->{wwid}, '3600140500a1b2c3d4e5f6a7b8c9d0e1f', 'map a wwid taken from uuid');
+is($map_a->{name}, 'mpatha', 'map a name');
+is($map_a->{sysfs}, 'dm-0', 'map a sysfs name');
+is($map_a->{'paths-total'}, 2, 'map a counts both paths');
+is($map_a->{'paths-active'}, 2, 'map a has two active paths');
+is($map_a->{health}, 'optimal', 'map a is optimal');
+is(scalar($map_a->{'path-groups'}->@*), 1, 'map a has one path group');
+is(
+ $map_a->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'},
+ '0x500a098000000001',
+ 'FC target wwpn is preserved',
+);
+is(
+ $map_a->{'path-groups'}->[0]->{paths}->[0]->{transport},
+ 'fc',
+ 'transport derived as fc from a target wwpn',
+);
+
+# one failed path out of two
+is($map_b->{'paths-total'}, 2, 'map b counts both paths across groups');
+is($map_b->{'paths-active'}, 1, 'map b has one active path');
+is($map_b->{health}, 'degraded', 'map b is degraded');
+
+# no active path left
+is($map_c->{'paths-total'}, 1, 'map c counts its single path');
+is($map_c->{'paths-active'}, 0, 'map c has no active path');
+is($map_c->{health}, 'failed', 'map c is failed');
+ok(
+ !defined($map_c->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'}),
+ "multipathd '[undef]' target_wwpn is cleaned away (not stored)",
+);
+ok(
+ !defined($map_c->{'path-groups'}->[0]->{paths}->[0]->{transport}),
+ "'[undef]' target_wwpn does not imply fc transport",
+);
+
+# empty / no maps must parse to an empty list, not die
+my $empty = PVE::Multipath::parse_maps_json('{ "major_version": 0, "maps": [] }');
+is_deeply($empty, [], 'no maps parses to empty list');
+
+# malformed input must die with a clear error
+eval { PVE::Multipath::parse_maps_json('not json') };
+like($@, qr/could not parse multipathd maps JSON/, 'invalid JSON raises a clear error');
+
+# --- config generation / WWID allow-list ---
+my $conf = PVE::Multipath::Config::generate_managed_conf();
+like($conf, qr/managed by Proxmox VE/, 'managed conf carries the managed header');
+like($conf, qr/user_friendly_names no/, 'baseline sets user_friendly_names no');
+like($conf, qr/find_multipaths strict/, 'baseline opts in explicitly via find_multipaths strict');
+is(
+ scalar(() = $conf =~ /^defaults \{/mg),
+ 1,
+ 'baseline has exactly one defaults block (a second would be a duplicate-keyword error)',
+);
+
+my $wwids = PVE::Multipath::Config::parse_wwids("# Multipath wwids\n/3600abc/\n/3600def/\n");
+is_deeply($wwids, ['3600abc', '3600def'], 'parse_wwids extracts the wwids');
+like(
+ PVE::Multipath::Config::format_wwids(['3600def', '3600abc']),
+ qr{/3600abc/\n/3600def/},
+ 'format_wwids sorts and slash-wraps',
+);
+
+# --- cluster config (pmxcfs source of truth): SectionConfig parse/write ---
+my $raw =
+ "defaults: defaults\n\tfind-multipaths strict\n\tno-path-retry queue\n\n"
+ . "wwid: 3600def\n\talias san-b-lun0\n\n"
+ . "wwid: 3600abc\n\talias san-a-lun0\n\tno-path-retry 18\n";
+my $cc = PVE::Multipath::Config->parse_config('multipath.cfg', $raw);
+is_deeply(
+ PVE::Multipath::Config::wwid_list($cc),
+ ['3600abc', '3600def'],
+ 'wwid sections become the allow-list (sorted)',
+);
+is_deeply(
+ PVE::Multipath::Config::aliases($cc),
+ { '3600abc' => 'san-a-lun0', '3600def' => 'san-b-lun0' },
+ 'aliases read from the wwid sections',
+);
+is(
+ PVE::Multipath::Config::effective_defaults($cc)->{'no-path-retry'},
+ 'queue',
+ 'defaults section knob is read',
+);
+is(
+ PVE::Multipath::Config::effective_defaults($cc)->{'user-friendly-names'},
+ 'no',
+ 'an unset defaults knob falls back to the managed default',
+);
+
+my $written = PVE::Multipath::Config->write_config('multipath.cfg', $cc);
+my $cc2 = PVE::Multipath::Config->parse_config('multipath.cfg', $written);
+is_deeply(
+ PVE::Multipath::Config::wwid_list($cc2),
+ ['3600abc', '3600def'],
+ 'wwids survive the SectionConfig round-trip',
+);
+is_deeply(
+ PVE::Multipath::Config::aliases($cc2),
+ PVE::Multipath::Config::aliases($cc),
+ 'aliases survive the round-trip',
+);
+is($cc2->{ids}->{'3600abc'}->{'no-path-retry'}, 18, 'a per-WWID knob survives the round-trip');
+
+is_deeply(
+ PVE::Multipath::Config::wwid_list(PVE::Multipath::Config->parse_config('multipath.cfg', '')),
+ [],
+ 'an empty cluster config has no WWIDs',
+);
+
+# --- multipaths{} block (alias plus per-WWID knobs) ---
+my $block = PVE::Multipath::Config::build_multipaths_block({
+ '3600def' => { alias => 'san-b-lun0' },
+ '3600abc' => { alias => 'san-a-lun0', 'no-path-retry' => 18 },
+ '3600nul' => {},
+});
+like($block, qr/^multipaths \{/m, 'block opens with multipaths {');
+is(
+ scalar(() = $block =~ /^\tmultipath \{/mg),
+ 2,
+ 'one multipath{} per WWID that has an alias or a knob (the empty WWID is skipped)',
+);
+like(
+ $block,
+ qr/wwid 3600abc.*?alias san-a-lun0.*?no_path_retry 18/s,
+ 'block carries the alias and the per-WWID knob',
+);
+my $abc_pos = index($block, 'wwid 3600abc');
+my $def_pos = index($block, 'wwid 3600def');
+ok($abc_pos >= 0 && $abc_pos < $def_pos, 'block emits entries in WWID-sorted order');
+is(PVE::Multipath::Config::build_multipaths_block({}), '', 'no WWIDs render to the empty string');
+
+# --- override guard ---
+eval { PVE::Multipath::Config::check_overrides("devices {\n\tdevice {\n\t\tvendor X\n\t}\n}\n") };
+is($@, '', 'a well-formed devices{} block passes the guard');
+eval { PVE::Multipath::Config::check_overrides("multipaths {\n}\n") };
+like($@, qr/managed via aliases/, 'a multipaths{} block is rejected, it is generated');
+eval { PVE::Multipath::Config::check_overrides("devices {\n") };
+like($@, qr/unbalanced braces/, 'unbalanced braces are rejected');
+eval { PVE::Multipath::Config::check_overrides("frobnicate {\n}\n") };
+like($@, qr/unknown top-level section/, 'an unknown top-level section is rejected');
+is(
+ PVE::Multipath::Config::write_overrides('x', "text \n\n"),
+ "text\n",
+ 'the overrides writer trims trailing whitespace',
+);
+
+done_testing();
--
2.47.3
Thread overview: 16+ messages
2026-06-26 12:07 [PATCH storage,cluster,manager 0/13] multipath: cluster-wide config, storage and health overview Thomas Lamprecht
2026-06-26 12:07 ` Thomas Lamprecht [this message]
2026-06-26 14:43 ` [PATCH storage 01/13] multipath: add helper library and managed configuration Maximiliano Sandoval
2026-06-26 12:07 ` [PATCH storage 02/13] api: disks: add read-only multipath status endpoint Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 03/13] api: multipath: add cluster-wide configuration endpoints Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 04/13] multipath: add storage plugin for multipath LUNs Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 05/13] lvm: allow a multipath storage as the base device Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 06/13] multipath: broadcast per-node map health to the cluster KV store Thomas Lamprecht
2026-06-26 12:07 ` [PATCH storage 07/13] api: multipath: add cluster-wide health status endpoint Thomas Lamprecht
2026-06-26 12:07 ` [PATCH cluster 08/13] pmxcfs: track cluster-wide multipath configuration Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 09/13] pvestatd: apply the cluster-wide multipath config on each node Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 10/13] api: cluster: mount the multipath configuration endpoint Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 11/13] pvestatd: broadcast multipath map health to the cluster Thomas Lamprecht
2026-06-26 12:07 ` [PATCH manager 12/13] ui: dc: add multipath health matrix and config editor Thomas Lamprecht
2026-06-26 14:05 ` Maximiliano Sandoval
2026-06-26 12:07 ` [PATCH manager 13/13] ui: node: show multipath maps and their paths under Disks Thomas Lamprecht