From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maximiliano Sandoval
To: Thomas Lamprecht
Subject: Re: [PATCH storage 01/13] multipath: add helper library and managed configuration
In-Reply-To: <20260626121000.2095591-2-t.lamprecht@proxmox.com> (Thomas Lamprecht's message of "Fri, 26 Jun 2026 14:07:31 +0200")
References: <20260626121000.2095591-1-t.lamprecht@proxmox.com> <20260626121000.2095591-2-t.lamprecht@proxmox.com>
User-Agent: mu4e 1.12.9; emacs 30.1
Date: Fri, 26 Jun 2026 16:43:27 +0200
MIME-Version: 1.0
Content-Type: text/plain
CC: pve-devel@lists.proxmox.com
List-Id: Proxmox VE development discussion

Thomas Lamprecht writes:

> Multipath on PVE is configured by hand and per node today, with nothing
> that keeps it consistent across a cluster. Add the foundation for
> managing it cluster-wide instead.
>
> The library reads the assembled maps and their health from multipathd.
> The configuration is a SectionConfig kept in pmxcfs: one 'defaults'
> section for the global multipathd knobs, plus one 'wwid' section per
> allow-listed LUN holding its optional alias and any per-LUN knobs.
> Parameters are kebab-case and rendered to multipathd's snake_case
> keywords, validated through the section schema so a bad value cannot
> reach the generated drop-in.
>
> The managed baseline is deliberately conservative: it only assembles
> explicitly allow-listed LUNs and keeps map names stable and WWID-based,
> so a device is named the same on every node and an LVM PV on it stays
> stable cluster-wide. Hardware-specific tuning lives in a separate,
> admin-owned override rather than in the generated baseline, and the two
> are written to distinct drop-ins, as multipath does not accept a
> repeated 'defaults' section in one file.
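
If I read the description correctly, the flow from the cluster-wide file
to the generated drop-ins would look roughly like the following. This is
only a sketch to check my understanding: the WWID and alias are made up,
the '#' lines are annotations separating the three files, the managed
header comments are omitted, and the multipath.cfg layout assumes the
usual SectionConfig 'type: id' header with indented 'key value' lines.
The drop-in paths are the ones from Generator.pm further down.

```
# /etc/pve/multipath.cfg (kebab-case parameters)
defaults: defaults
	polling-interval 10

wwid: 3600140500a1b2c3d4e5f6a7b8c9d0e1f
	alias san-lun0
	no-path-retry queue

# /etc/multipath/conf.d/pve-defaults.conf (snake_case, managed defaults filled in)
defaults {
	find_multipaths strict
	polling_interval 10
	user_friendly_names no
}

# /etc/multipath/conf.d/pve-aliases.conf
multipaths {
	multipath {
		wwid 3600140500a1b2c3d4e5f6a7b8c9d0e1f
		alias san-lun0
		no_path_retry queue
	}
}
```
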
>
> Parsing and generation stay in a pure module with no dependency on
> PVE::Cluster, so they remain unit-testable and usable on a node whose
> pve-cluster does not yet observe the new file; registering it in pmxcfs
> needs the matching pve-cluster change.
>
> Signed-off-by: Thomas Lamprecht
> ---
>  src/PVE/Makefile                   |   4 +
>  src/PVE/Multipath.pm               | 282 ++++++++++++++++++++++
>  src/PVE/Multipath/ClusterConfig.pm |  55 +++++
>  src/PVE/Multipath/Config.pm        | 361 +++++++++++++++++++++++++++++
>  src/PVE/Multipath/Generator.pm     | 148 ++++++++++++
>  src/test/Makefile                  |   5 +-
>  src/test/run_multipath_tests.pl    | 238 +++++++++++++++++++
>  7 files changed, 1092 insertions(+), 1 deletion(-)
>  create mode 100644 src/PVE/Multipath.pm
>  create mode 100644 src/PVE/Multipath/ClusterConfig.pm
>  create mode 100644 src/PVE/Multipath/Config.pm
>  create mode 100644 src/PVE/Multipath/Generator.pm
>  create mode 100755 src/test/run_multipath_tests.pl
>
> diff --git a/src/PVE/Makefile b/src/PVE/Makefile
> index 9e9f6aa..7ddd646 100644
> --- a/src/PVE/Makefile
> +++ b/src/PVE/Makefile
> @@ -4,6 +4,10 @@
>  install:
>  	install -D -m 0644 Storage.pm ${DESTDIR}${PERLDIR}/PVE/Storage.pm
>  	install -D -m 0644 Diskmanage.pm ${DESTDIR}${PERLDIR}/PVE/Diskmanage.pm
> +	install -D -m 0644 Multipath.pm ${DESTDIR}${PERLDIR}/PVE/Multipath.pm
> +	install -D -m 0644 Multipath/Config.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Config.pm
> +	install -D -m 0644 Multipath/ClusterConfig.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/ClusterConfig.pm
> +	install -D -m 0644 Multipath/Generator.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Generator.pm
>  	install -D -m 0644 CephConfig.pm ${DESTDIR}${PERLDIR}/PVE/CephConfig.pm
>  	install -D -m 0644 GuestImport.pm ${DESTDIR}${PERLDIR}/PVE/GuestImport.pm
>  	make -C Storage install
> diff --git a/src/PVE/Multipath.pm b/src/PVE/Multipath.pm
> new file mode 100644
> index 0000000..59c1103
> --- /dev/null
> +++ b/src/PVE/Multipath.pm
> @@ -0,0 +1,282 @@
> +package PVE::Multipath;
> +
> +use strict;
> 
+use warnings; > + > +use JSON qw(decode_json); > + > +use PVE::Tools qw(run_command file_read_firstline file_get_contents); > + > +use PVE::Multipath::Config; > + > +# Helper library around device-mapper multipath (multipathd). The single place that knows how to > +# talk to multipathd and turn its state into a normalized, stable structure for the rest of the > +# storage stack: health reporting, the API, and consumers that wait for a map before binding to it. > +# > +# Everything keys on the SCSI/NAA WWID, as reported by multipathd in the map's 'uuid' field. Map > +# names (mpathN / aliases) and the underlying 'sdX' paths are node-local and unstable; the WWID is > +# not. > + > +my $MULTIPATH = '/sbin/multipath'; > +my $MULTIPATHD = '/sbin/multipathd'; > + > +# health states for a single map, derived from its path states > +use constant { > + HEALTH_OPTIMAL => 'optimal', # all paths active > + HEALTH_DEGRADED => 'degraded', # some but not all paths active > + HEALTH_FAILED => 'failed', # no active path left > +}; > + > +my $supported; > + > +sub is_supported { > + return $supported if defined($supported); > + $supported = (-x $MULTIPATH && -x $MULTIPATHD) ? 1 : 0; > + return $supported; > +} > + > +sub assert_supported { > + die "no multipath support - please install 'multipath-tools'\n" if !is_supported(); > + return 1; > +} > + > +# Returns whether the multipathd daemon is reachable. Used for status output, so it never dies and > +# just reports 0 when multipath is unavailable or the daemon socket cannot be queried. > +sub is_running { > + return 0 if !is_supported(); > + > + my $running = 0; > + eval { > + run_command( > + [$MULTIPATHD, 'show', 'daemon'], > + outfunc => sub { > + my ($line) = @_; > + # example: "pid 1234 idle" / "pid 1234 running" > + $running = 1 if $line =~ m/^pid \d+ \S+/; > + }, > + errfunc => sub { }, > + ); > + }; > + return $running; > +} > + > +# Runs `multipathd show json` and returns the raw output. 
Kept separate from parsing so > +# tests can feed recorded fixtures to the parse_*() functions without a running daemon. > +my sub query_multipathd_json { > + my ($subcmd) = @_; > + > + assert_supported(); > + > + my $output = ''; > + run_command( > + [$MULTIPATHD, 'show', $subcmd, 'json'], > + outfunc => sub { $output .= "$_[0]\n"; }, > + errfunc => sub { warn "$_[0]\n"; }, > + ); > + > + return $output; > +} > + > +my sub derive_health { > + my ($paths_active, $paths_total) = @_; > + > + return HEALTH_FAILED if !$paths_total || !$paths_active; > + return HEALTH_OPTIMAL if $paths_active == $paths_total; > + return HEALTH_DEGRADED; > +} > + > +my sub normalize_path { > + my ($path) = @_; > + > + my $res = { > + dev => $path->{dev}, > + # 'active' or 'failed' - the state device-mapper sees > + 'dm-state' => $path->{dm_st}, > + # 'running', 'faulty' or 'offline' - the state the kernel block layer sees > + 'dev-state' => $path->{dev_st}, > + # path checker result, e.g. 'ready' / 'faulty' / 'ghost' > + 'check-state' => $path->{chk_st}, > + }; > + $res->{priority} = int($path->{pri}) if defined($path->{pri}); > + > + # multipathd renders unset string fields as the literal '[undef]' > + my $wwpn = $path->{target_wwpn}; > + my $hba = $path->{host_adapter}; > + undef $wwpn if !defined($wwpn) || $wwpn eq '[undef]' || $wwpn eq ''; > + undef $hba if !defined($hba) || $hba eq '[undef]' || $hba eq ''; > + $res->{'target-wwpn'} = $wwpn if defined($wwpn); > + $res->{'host-adapter'} = $hba if defined($hba); > + # a real target WWPN (0x...) means Fibre Channel; iSCSI/SAS transport is derived from sysfs by > + # get_maps(). Do NOT treat the field's mere presence as FC, multipathd reports it as '[undef]' > + # for iSCSI. > + $res->{transport} = 'fc' if defined($wwpn) && $wwpn =~ /^0x[0-9a-f]+$/i; > + > + return $res; > +} > + > +# Turns the output of `multipathd show maps json` into a normalized list of maps. 
Pure (no I/O) on > +# purpose: it derives everything it can from the JSON alone, so it can be unit-tested against > +# recorded fixtures. Live-only bits (byte size, transport) are added by get_maps() below. > +sub parse_maps_json { > + my ($json) = @_; > + > + my $data = eval { decode_json($json) }; > + die "could not parse multipathd maps JSON: $@\n" if $@; > + > + my $maps = []; > + for my $map (($data->{maps} // [])->@*) { > + my $path_groups = []; > + my ($paths_total, $paths_active) = (0, 0); > + > + for my $group (($map->{path_groups} // [])->@*) { > + my $paths = []; > + for my $path (($group->{paths} // [])->@*) { > + my $normalized = normalize_path($path); > + $paths_total++; > + $paths_active++ if ($normalized->{'dm-state'} // '') eq 'active'; > + push $paths->@*, $normalized; > + } > + push $path_groups->@*, > + { > + group => int($group->{group} // 0), > + 'dm-state' => $group->{dm_st}, > + priority => int($group->{pri} // 0), > + paths => $paths, > + }; > + } > + > + push $maps->@*, { > + wwid => $map->{uuid}, > + name => $map->{name}, > + sysfs => $map->{sysfs}, # the 'dm-N' kernel name > + 'dm-state' => $map->{dm_st}, > + 'paths-total' => $paths_total, > + 'paths-active' => $paths_active, > + health => derive_health($paths_active, $paths_total), > + 'path-groups' => $path_groups, > + }; > + } > + > + return $maps; > +} > + > +# Best-effort byte size of a dm device from sysfs (Linux reports size in 512b sectors regardless of > +# the real block size). > +my sub dm_size_bytes { > + my ($sysfs) = @_; > + > + return undef if !$sysfs; > + my $sectors = file_read_firstline("/sys/block/$sysfs/size"); > + return undef if !defined($sectors) || $sectors !~ m/^\d+$/; > + return int($sectors) * 512; > +} > + > +my sub dir_has_entries { > + my ($dir) = @_; > + > + return 0 if !-d $dir; > + opendir(my $dh, $dir) or return 0; > + my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh); > + closedir($dh); > + return scalar(@entries) ? 
1 : 0; > +} > + > +# Best-effort transport of a single 'sdX' path from its sysfs topology; only iSCSI and SAS need it, > +# Fibre Channel is already set from the map JSON. > +my sub path_transport { > + my ($dev) = @_; > + > + return undef if !$dev; > + my $link = readlink("/sys/block/$dev"); > + return undef if !$link; > + return 'iscsi' if $link =~ m{/session\d+/}; > + return 'fc' if $link =~ m{/rport-\d+}; > + return 'sas' if $link =~ m{/end_device-}; > + return undef; > +} > + > +# Returns the normalized maps enriched with information that requires the local system (size, a > +# stable consumer path). Dies if multipath is not supported; callers that just want status should > +# guard with is_supported()/is_running(). > +sub get_maps { > + my $maps = parse_maps_json(query_multipathd_json('maps')); > + > + for my $map ($maps->@*) { > + $map->{size} = dm_size_bytes($map->{sysfs}); > + # WWID-stable path, present independently of the (node-local) map name > + $map->{path} = "/dev/disk/by-id/dm-uuid-mpath-$map->{wwid}" > + if defined($map->{wwid}); > + $map->{used} = dir_has_entries("/sys/block/$map->{sysfs}/holders") > + if $map->{sysfs}; > + > + my %transports; > + for my $group ($map->{'path-groups'}->@*) { > + for my $path ($group->{paths}->@*) { > + $path->{transport} //= path_transport($path->{dev}); > + $transports{ $path->{transport} } = 1 if defined($path->{transport}); > + } > + } > + # only expose a map-level transport when all paths agree on it > + my @transports = keys %transports; > + $map->{transport} = $transports[0] if scalar(@transports) == 1; > + } > + > + return $maps; > +} > + > +sub get_map_for_wwid { > + my ($wwid) = @_; > + > + for my $map (get_maps()->@*) { > + return $map if defined($map->{wwid}) && $map->{wwid} eq $wwid; > + } > + return undef; > +} > + > +# Polls until a map for the given WWID exists, up to $timeout seconds. 
A consumer like the iSCSI
> +# plugin uses this after a login or rescan to bind to the coalesced dm device rather than to a
> +# transient single 'sdX' path.
> +sub wait_for_map {
> +    my ($wwid, $timeout) = @_;
> +
> +    $timeout //= 10;
> +
> +    my $deadline = time() + $timeout;
> +    while (1) {
> +        my $map = eval { get_map_for_wwid($wwid) };
> +        return $map if $map;
> +        return undef if time() >= $deadline;
> +        sleep(1);
> +    }
> +}
> +
> +my $WWIDS_FILE = '/etc/multipath/wwids';
> +
> +# The managed allow-list of LUNs (WWIDs) to assemble into a map; with 'find_multipaths strict' only
> +# these get multipathed.
> +sub list_wwids {

Together with wwid_list() in PVE::Multipath::Config, this name is a bit
confusing: this sub reads /etc/multipath/wwids, while wwid_list() returns
the IDs of the 'wwid' sections in the cluster config. Could it be renamed
to, e.g., list_etc_multipath_wwids()?

> +    return [] if !-e $WWIDS_FILE;
> +    return PVE::Multipath::Config::parse_wwids(file_get_contents($WWIDS_FILE));
> +}
> +
> +sub add_wwid {
> +    my ($wwid) = @_;
> +
> +    assert_supported();
> +    run_command([$MULTIPATH, '-a', $wwid]);
> +}
> +
> +sub remove_wwid {
> +    my ($wwid) = @_;
> +
> +    assert_supported();
> +    run_command([$MULTIPATH, '-w', $wwid]);
> +}
> +
> +# Re-read the configuration and rebuild maps accordingly, after a config or allow-list change.
> +sub reconfigure {
> +    assert_supported();
> +    run_command([$MULTIPATHD, 'reconfigure']);
> +}
> +
> +1;
> diff --git a/src/PVE/Multipath/ClusterConfig.pm b/src/PVE/Multipath/ClusterConfig.pm
> new file mode 100644
> index 0000000..0b09c3f
> --- /dev/null
> +++ b/src/PVE/Multipath/ClusterConfig.pm
> @@ -0,0 +1,55 @@
> +package PVE::Multipath::ClusterConfig;
> +
> +use strict;
> +use warnings;
> +
> +use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
> +
> +use PVE::Multipath::Config;
> +
> +# Cluster-wide multipath configuration, replicated by pmxcfs.
The structured allow-list, aliases and > +# knobs live in multipath.cfg (a SectionConfig); the free-form hardware override text lives in a > +# separate plain file so it stays hand-editable and diffable. > +my $FILENAME = 'multipath.cfg'; > +my $OVERRIDES_FILENAME = 'multipath-overrides.conf'; > + > +cfs_register_file( > + $FILENAME, > + sub { PVE::Multipath::Config->parse_config(@_); }, > + sub { PVE::Multipath::Config->write_config(@_); }, > +); > + > +cfs_register_file( > + $OVERRIDES_FILENAME, > + \&PVE::Multipath::Config::parse_overrides, > + \&PVE::Multipath::Config::write_overrides, > +); > + > +sub read_config { > + return cfs_read_file($FILENAME); > +} > + > +sub write_config { > + my ($cfg) = @_; > + cfs_write_file($FILENAME, $cfg); > +} > + > +sub read_overrides { > + return cfs_read_file($OVERRIDES_FILENAME); > +} > + > +sub write_overrides { > + my ($text) = @_; > + cfs_write_file($OVERRIDES_FILENAME, $text); > +} > + > +sub lock_config { > + my ($code, $errmsg) = @_; > + > + cfs_lock_file($FILENAME, undef, $code); > + if (my $err = $@) { > + $errmsg ? die "$errmsg: $err" : die $err; > + } > +} > + > +1; > diff --git a/src/PVE/Multipath/Config.pm b/src/PVE/Multipath/Config.pm > new file mode 100644 > index 0000000..21ad72e > --- /dev/null > +++ b/src/PVE/Multipath/Config.pm > @@ -0,0 +1,361 @@ > +package PVE::Multipath::Config; > + > +use strict; > +use warnings; > + > +use PVE::SectionConfig; > + > +use base qw(PVE::SectionConfig); > + > +# Parser and writer for the cluster-wide source of truth in pmxcfs (/etc/pve/multipath.cfg). It is a > +# SectionConfig: a single 'defaults' section for global multipathd knobs, plus one 'wwid' section > +# per allow-listed LUN holding its optional alias and per-LUN knobs. Free-form hardware overrides > +# (device {} entries) live in a separate plain file, see PVE::Multipath::ClusterConfig. Kept pure so > +# it stays unit-testable without PVE::Cluster. 
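
For illustration, a minimal stand-alone sketch of how such a file could
map to the 'ids' hash that helpers like effective_defaults() and
wwid_list() read. It deliberately does not use PVE::SectionConfig:
parse_sketch() is a hypothetical stand-in that only assumes the usual
'type: id' header followed by indented 'key value' lines, does no schema
validation, and the WWID and alias are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real SectionConfig parser; assumes
# 'type: id' headers followed by indented 'key value' property lines.
sub parse_sketch {
    my ($raw) = @_;

    my $cfg = { ids => {} };
    my $current;
    for my $line (split(/\n/, $raw)) {
        next if $line =~ /^\s*(?:#.*)?$/; # skip blank lines and comments
        if ($line =~ /^(\S+):\s*(\S+)\s*$/) {
            # section header: remember the type under the section id
            $current = $cfg->{ids}->{$2} = { type => $1 };
        } elsif ($line =~ /^\s+(\S+)\s+(.*\S)\s*$/) {
            die "property outside of a section: $line\n" if !$current;
            $current->{$1} = $2;
        } else {
            die "cannot parse line: $line\n";
        }
    }
    return $cfg;
}

my $raw = <<'EOF';
defaults: defaults
    polling-interval 10

wwid: 3600140500a1b2c3d4e5f6a7b8c9d0e1f
    alias san-lun0
    no-path-retry queue
EOF

my $cfg = parse_sketch($raw);

# same selection as wwid_list() in the patch: the ids of the 'wwid' sections
my @wwids = sort grep { $cfg->{ids}->{$_}->{type} eq 'wwid' } keys %{ $cfg->{ids} };
print "wwid: $_\n" for @wwids;
```

The point of the sketch is only the data shape: one entry per section id,
each carrying its 'type' next to the kebab-case knobs, which is why the
helpers in the patch filter out 'type' before rendering.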
> + > +# Conservative, cluster-friendly defaults, applied when the 'defaults' section omits them: > +# - find_multipaths strict -> only explicitly allow-listed LUNs get assembled, so boot/root and > +# unrelated disks stay untouched. > +# - user_friendly_names no -> the map name is the WWID, identical on every node, so an LVM PV on > +# /dev/disk/by-id/dm-uuid-mpath- is stable cluster-wide without a node-local bindings file. > +my $MANAGED_DEFAULTS = { > + 'find-multipaths' => 'strict', > + 'user-friendly-names' => 'no', > + 'polling-interval' => 5, > +}; > + > +sub managed_defaults { return { $MANAGED_DEFAULTS->%* }; } > + > +# Knobs valid both globally (defaults) and per-LUN (a multipaths{} entry), per multipath.conf(5). > +my $shared_knobs = { > + 'no-path-retry' => { > + type => 'string', > + pattern => '(?:queue|fail|\d+)', > + typetext => 'queue|fail|', > + description => > + "How to react when all paths are down: keep queuing, fail at once, or retry" > + . " for the given number of polling intervals.", > + optional => 1, > + }, > + 'path-grouping-policy' => { > + type => 'string', > + enum => [qw(failover multibus group_by_serial group_by_prio group_by_node_name)], > + description => "How paths are grouped into priority groups.", > + optional => 1, > + }, > + failback => { > + type => 'string', > + pattern => '(?:manual|immediate|followover|\d+)', > + typetext => 'manual|immediate|followover|', > + description => "When to fail back to a restored higher-priority path group.", > + optional => 1, > + }, > + 'path-selector' => { > + type => 'string', > + maxLength => 64, > + description => "Path selector algorithm used within a priority group, for example" > + . " 'service-time 0'.", > + optional => 1, > + }, > +}; > + > +# Knobs that only make sense globally. 
> +my $defaults_only_knobs = { > + 'find-multipaths' => { > + type => 'string', > + enum => [qw(yes no strict greedy smart)], > + default => 'strict', > + description => "Which devices multipathd assembles into a map. 'strict' only takes" > + . " explicitly allow-listed WWIDs.", > + optional => 1, > + }, > + 'user-friendly-names' => { > + type => 'string', > + enum => [qw(yes no)], > + default => 'no', > + description => "Whether to use node-local mpathN names. Keep 'no' for stable WWID-based" > + . " names across the cluster.", > + optional => 1, > + }, > + 'polling-interval' => { > + type => 'integer', > + minimum => 1, > + default => 5, > + description => "Interval between path checks, in seconds.", > + optional => 1, > + }, > +}; > + > +# Knobs that only make sense per-LUN. > +my $wwid_only_knobs = { > + alias => { > + type => 'string', > + pattern => '[a-zA-Z0-9][a-zA-Z0-9._-]*', > + maxLength => 64, > + description => > + "Human-readable map name for this WWID; multipathd uses it as the map name.", > + optional => 1, > + }, > + 'rr-min-io-rq' => { > + type => 'integer', > + minimum => 1, > + description => > + "Number of I/O requests to route to a path before switching, request-based.", > + optional => 1, > + }, > + 'rr-weight' => { > + type => 'string', > + enum => [qw(priorities uniform)], > + description => "Whether to weight paths by priority when balancing I/O.", > + optional => 1, > + }, > +}; > + > +my $defaultData = { > + propertyList => { > + type => { description => "Section type ('defaults' or 'wwid')." 
}, > + id => { > + type => 'string', > + description => > + "Section ID: the literal 'defaults', or a LUN WWID for 'wwid' sections.", > + pattern => '[a-zA-Z0-9._:-]+', > + maxLength => 128, > + }, > + $shared_knobs->%*, > + $defaults_only_knobs->%*, > + $wwid_only_knobs->%*, > + }, > +}; > + > +sub private { return $defaultData; } > + > +package PVE::Multipath::Config::Defaults; > + > +use base qw(PVE::Multipath::Config); > + > +sub type { return 'defaults'; } > + > +sub options { > + return { > + 'find-multipaths' => { optional => 1 }, > + 'user-friendly-names' => { optional => 1 }, > + 'polling-interval' => { optional => 1 }, > + 'no-path-retry' => { optional => 1 }, > + 'path-grouping-policy' => { optional => 1 }, > + failback => { optional => 1 }, > + 'path-selector' => { optional => 1 }, > + }; > +} > + > +__PACKAGE__->register(); > + > +package PVE::Multipath::Config::Wwid; > + > +use base qw(PVE::Multipath::Config); > + > +sub type { return 'wwid'; } > + > +sub options { > + return { > + alias => { optional => 1 }, > + 'no-path-retry' => { optional => 1 }, > + 'path-grouping-policy' => { optional => 1 }, > + failback => { optional => 1 }, > + 'path-selector' => { optional => 1 }, > + 'rr-min-io-rq' => { optional => 1 }, > + 'rr-weight' => { optional => 1 }, > + }; > +} > + > +__PACKAGE__->register(); > + > +package PVE::Multipath::Config; > + > +__PACKAGE__->init(); > + > +# multipathd subsections accept only these top-level keywords; the admin override file is checked > +# against them. 'multipaths' is generated from the wwid sections, so an admin block would collide. > +my $OVERRIDE_KEYWORDS = > + { devices => 1, overrides => 1, defaults => 1, blacklist => 1, blacklist_exceptions => 1 }; > + > +# Validate the free-form override text before it can break multipathd's parser cluster-wide. 
This is > +# a guard, not a full parse: balanced braces, only known top-level sections, and no 'multipaths' > +# block (that is generated from the wwid sections and a duplicate is fatal to multipathd). > +sub check_overrides { > + my ($text) = @_; > + > + return if !defined($text) || $text !~ /\S/; > + > + my ($open, $close) = (0, 0); > + for my $line (split(/\n/, $text)) { > + next if $line =~ /^\s*#/; > + $open += ($line =~ tr/{//); > + $close += ($line =~ tr/}//); > + if ($line =~ /^\s*(\w+)\s*\{/) { > + my $kw = $1; > + die "multipath overrides: 'multipaths' is managed via aliases, do not set it here\n" > + if $kw eq 'multipaths'; > + die "multipath overrides: unknown top-level section '$kw'\n" > + if $open - $close == 1 && !$OVERRIDE_KEYWORDS->{$kw}; > + } > + } > + die "multipath overrides: unbalanced braces\n" if $open != $close; > + > + return; > +} > + > +# Read/write the separate, admin-owned override file (/etc/pve/multipath-overrides.conf). Stored and > +# rendered verbatim, so it stays hand-editable and diffable. > +sub parse_overrides { > + my ($filename, $raw) = @_; > + return $raw // ''; > +} > + > +sub write_overrides { > + my ($filename, $text) = @_; > + $text //= ''; > + $text =~ s/\s+$//; > + return length($text) ? "$text\n" : ''; > +} > + > +my $MANAGED_HEADER = > + "# This file is managed by Proxmox VE - do not edit by hand.\n" > + . "# Hardware-/node-specific overrides belong in the override config.\n"; > + > +# Renders a named section from a key => value hash, keys sorted for a stable, diffable result. The > +# config and API use kebab-case parameters, multipathd keywords are snake_case, so map '-' to '_'. 
> +my sub render_section { > + my ($name, $kv) = @_; > + > + my $out = "$name {\n"; > + for my $key (sort keys $kv->%*) { > + (my $keyword = $key) =~ tr/-/_/; > + $out .= "\t$keyword $kv->{$key}\n"; > + } > + $out .= "}\n"; > + return $out; > +} > + > +# Builds the Proxmox-managed baseline drop-in (header + defaults section) from the effective global > +# knobs. Admin overrides are not merged in here: they go into a separate conf.d file, as multipath > +# rejects two 'defaults' blocks in one file (duplicate keyword) and drops the second. > +sub generate_managed_conf { > + my ($defaults) = @_; > + $defaults //= managed_defaults(); > + > + return $MANAGED_HEADER . "\n" . render_section('defaults', $defaults); > +} > + > +# The WWID allow-list file (/etc/multipath/wwids) holds one '//' per line; parse it and back. > +sub parse_wwids { > + my ($text) = @_; > + > + my $wwids = []; > + for my $line (split(/\n/, $text // '')) { > + next if $line =~ /^\s*#/; > + next if $line =~ /^\s*$/; > + if ($line =~ m{^/(.+)/\s*$}) { > + push $wwids->@*, $1; > + } > + } > + return $wwids; > +} > + > +sub format_wwids { > + my ($wwids) = @_; > + > + my $out = "# Multipath wwids, managed by Proxmox VE\n"; > + $out .= "/$_/\n" for sort $wwids->@*; > + return $out; > +} > + > +# Builds a 'multipaths {}' block from the per-WWID sections (alias plus any per-LUN knobs); returns > +# the empty string when no WWID has an alias or a knob set. 
> +sub build_multipaths_block { > + my ($wwid_opts) = @_; > + > + my @entries = grep { %{ $wwid_opts->{$_} } } sort keys %$wwid_opts; > + return '' if !@entries; > + > + my $out = "multipaths {\n"; > + for my $wwid (@entries) { > + my $opts = $wwid_opts->{$wwid}; > + $out .= "\tmultipath {\n"; > + $out .= "\t\twwid $wwid\n"; > + for my $key (sort keys %$opts) { > + (my $keyword = $key) =~ tr/-/_/; > + $out .= "\t\t$keyword $opts->{$key}\n"; > + } > + $out .= "\t}\n"; > + } > + $out .= "}\n"; > + return $out; > +} > + > +# The knob property definitions as an API parameter schema. Strip the schema 'default' so an update > +# that omits a knob leaves it unchanged instead of resetting it to the managed default. > +my sub api_schema { > + my ($props) = @_; > + > + my $res = {}; > + for my $key (keys %$props) { > + $res->{$key} = { $props->{$key}->%* }; > + delete $res->{$key}->{default}; > + } > + return $res; > +} > + > +# Settable global knobs (the 'defaults' section) as an API parameter schema. > +sub defaults_api_schema { > + return api_schema({ $shared_knobs->%*, $defaults_only_knobs->%* }); > +} > + > +# Settable per-WWID knobs (including the alias) as an API parameter schema. > +sub wwid_api_schema { > + return api_schema({ $shared_knobs->%*, $wwid_only_knobs->%* }); > +} > + > +# Effective global knobs: the 'defaults' section merged onto the conservative managed defaults. > +sub effective_defaults { > + my ($cfg) = @_; > + > + my $defaults = managed_defaults(); > + if (my $section = $cfg->{ids}->{defaults}) { > + $defaults->{$_} = $section->{$_} for grep { $_ ne 'type' } keys %$section; > + } > + return $defaults; > +} > + > +# The allow-listed WWIDs, that is the ids of the 'wwid' sections. > +sub wwid_list { > + my ($cfg) = @_; > + return [sort grep { ($cfg->{ids}->{$_}->{type} // '') eq 'wwid' } keys $cfg->{ids}->%*]; > +} > + > +# { wwid => alias } for the WWIDs that have one. 
> +sub aliases { > + my ($cfg) = @_; > + > + my $res = {}; > + for my $wwid (keys $cfg->{ids}->%*) { > + my $section = $cfg->{ids}->{$wwid}; > + next if ($section->{type} // '') ne 'wwid'; > + $res->{$wwid} = $section->{alias} if defined($section->{alias}); > + } > + return $res; > +} > + > +# { wwid => { alias?, knob => value, ... } }, the per-WWID input to build_multipaths_block(). > +sub wwid_opts { > + my ($cfg) = @_; > + > + my $res = {}; > + for my $wwid (keys $cfg->{ids}->%*) { > + my $section = $cfg->{ids}->{$wwid}; > + next if ($section->{type} // '') ne 'wwid'; > + $res->{$wwid} = { map { $_ => $section->{$_} } grep { $_ ne 'type' } keys %$section }; > + } > + return $res; > +} > + > +1; > diff --git a/src/PVE/Multipath/Generator.pm b/src/PVE/Multipath/Generator.pm > new file mode 100644 > index 0000000..0bcd37f > --- /dev/null > +++ b/src/PVE/Multipath/Generator.pm > @@ -0,0 +1,148 @@ > +package PVE::Multipath::Generator; > + > +use strict; > +use warnings; > + > +use File::Path qw(make_path); > + > +use PVE::Tools qw(file_get_contents file_set_contents); > + > +use PVE::Multipath; > +use PVE::Multipath::Config; > +use PVE::Multipath::ClusterConfig; > + > +# Renders the effective node-local multipath configuration from the cluster-wide source of truth > +# (/etc/pve/multipath.cfg) and reloads multipathd when something changed. > +# > +# The rendered files live on the local filesystem, so they survive reboots and are available to > +# multipathd at boot even before pmxcfs is up; the last successful render is the boot-time fallback. > + > +# Proxmox-owned drop-ins; the admin's /etc/multipath.conf keeps its default 'config_dir > +# /etc/multipath/conf.d'. 
The managed baseline, the admin overrides, and the generated aliases each
> +# get their own file so multipath merges them across files instead of hitting a duplicate section
> +# keyword: two 'defaults' blocks in one file are rejected outright, and our 'multipaths' alias block
> +# would clash with a 'multipaths' section in the overrides. The overrides file sorts after the
> +# baseline, so an admin's defaults override it; the aliases file is a separate 'multipaths' block,
> +# so its order does not matter.
> +my $DEFAULTS_DROPIN = '/etc/multipath/conf.d/pve-defaults.conf';
> +my $OVERRIDES_DROPIN = '/etc/multipath/conf.d/pve-overrides.conf';
> +my $ALIASES_DROPIN = '/etc/multipath/conf.d/pve-aliases.conf';
> +
> +my sub write_if_changed {
> +    my ($path, $content) = @_;
> +
> +    my $old = -e $path ? eval { file_get_contents($path) } : undef;
> +    return 0 if defined($old) && $old eq $content;
> +
> +    my $dir = $path =~ s!/[^/]+$!!r;
> +    make_path($dir) if !-d $dir;
> +    file_set_contents($path, $content);
> +    return 1;
> +}
> +
> +my sub remove_if_present {
> +    my ($path) = @_;
> +
> +    return 0 if !-e $path;
> +    unlink($path) or die "could not remove '$path': $!\n";
> +    return 1;
> +}
> +
> +sub regenerate {
> +    my ($cfg, $overrides) = @_;
> +    $cfg //= PVE::Multipath::ClusterConfig::read_config();
> +    $overrides //= PVE::Multipath::ClusterConfig::read_overrides();
> +
> +    my $changed = 0;
> +
> +    my $defaults = PVE::Multipath::Config::effective_defaults($cfg);
> +    $changed = 1
> +        if write_if_changed($DEFAULTS_DROPIN,
> +            PVE::Multipath::Config::generate_managed_conf($defaults));
> +
> +    if (defined($overrides) && length($overrides)) {
> +        my $content =
> +            "# Managed by Proxmox VE - edit overrides in /etc/pve/multipath-overrides.conf.\n\n"
> +            . "$overrides\n";
> +        $changed = 1 if write_if_changed($OVERRIDES_DROPIN, $content);
> +    } else {
> +        $changed = 1 if remove_if_present($OVERRIDES_DROPIN);
> +    }
> +
> +    my $block =
> +        PVE::Multipath::Config::build_multipaths_block(PVE::Multipath::Config::wwid_opts($cfg));
> +    if (length($block)) {
> +        my $content =
> +            "# Managed by Proxmox VE - edit aliases and per-LUN options in /etc/pve/multipath.cfg.\n\n"
> +            . $block;
> +        $changed = 1 if write_if_changed($ALIASES_DROPIN, $content);
> +    } else {
> +        $changed = 1 if remove_if_present($ALIASES_DROPIN);
> +    }
> +
> +    # Bring the WWID allow-list (/etc/multipath/wwids) in line with the cluster config through
> +    # multipath's own add/remove, so its on-disk format stays intact. Isolate each op: one failing
> +    # WWID must not abort the whole pass, or it would stall every other WWID on every run; a failed
> +    # op leaves the file unchanged and is retried next pass.
> +    my %desired = map { $_ => 1 } PVE::Multipath::Config::wwid_list($cfg)->@*;
> +    my %current = map { $_ => 1 } PVE::Multipath::list_wwids()->@*;
> +
> +    my @errors;
> +    for my $wwid (sort keys %desired) {
> +        next if $current{$wwid};
> +        eval { PVE::Multipath::add_wwid($wwid); };
> +        if (my $err = $@) {
> +            push @errors, "adding WWID '$wwid' failed - $err";
> +        } else {
> +            $changed = 1;
> +        }
> +    }
> +    for my $wwid (sort keys %current) {
> +        next if $desired{$wwid};
> +        eval { PVE::Multipath::remove_wwid($wwid); };
> +        if (my $err = $@) {
> +            push @errors, "removing WWID '$wwid' failed - $err";
> +        } else {
> +            $changed = 1;
> +        }
> +    }
> +
> +    # reload the daemon for whatever did converge, even if some ops failed
> +    if ($changed && PVE::Multipath::is_running()) {
> +        eval { PVE::Multipath::reconfigure(); };
> +        push @errors, "reconfigure failed - $@" if $@;
> +    }
> +
> +    die join('', @errors) if @errors;
> +
> +    return $changed;
> +}
> +
> +# Safe periodic entry point for a status loop like pvestatd: a no-op when multipath is not in use on
> +# this node, and never throws, so a caller stays a single guarded line and the same entry point
> +# works from a systemd unit or CLI.
> +sub sync {
> +    return 0 if !PVE::Multipath::is_supported();
> +
> +    my $cfg = eval { PVE::Multipath::ClusterConfig::read_config() };
> +    if (my $err = $@) {
> +        warn "multipath: reading cluster config failed - $err";
> +        return 0;
> +    }
> +    my $overrides = eval { PVE::Multipath::ClusterConfig::read_overrides() };
> +
> +    # stay out of the way unless the feature is in use: nothing configured cluster-wide and no
> +    # local drop-in present
> +    return 0
> +        if !scalar(PVE::Multipath::Config::wwid_list($cfg)->@*)
> +        && !(defined($overrides) && length($overrides))
> +        && !$cfg->{ids}->{defaults}
> +        && !-e $DEFAULTS_DROPIN;
> +
> +    my $changed = eval { regenerate($cfg, $overrides) };
> +    warn "multipath: config sync failed - $@" if $@;
> +
> +    return $changed // 0;
> +}
> +
> +1;
> diff --git a/src/test/Makefile b/src/test/Makefile
> index ee025bc..51c7360 100644
> --- a/src/test/Makefile
> +++ b/src/test/Makefile
> @@ -1,6 +1,6 @@
>  all: test
>
> -test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access
> +test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access test_multipath
>
>  test_zfspoolplugin: run_test_zfspoolplugin.pl
>  	./run_test_zfspoolplugin.pl
> @@ -22,3 +22,6 @@ test_ovf: run_ovf_tests.pl
>
>  test_volume_access: run_volume_access_tests.pl
>  	./run_volume_access_tests.pl
> +
> +test_multipath: run_multipath_tests.pl
> +	./run_multipath_tests.pl
> diff --git a/src/test/run_multipath_tests.pl b/src/test/run_multipath_tests.pl
> new file mode 100755
> index 0000000..f710308
> --- /dev/null
> +++ b/src/test/run_multipath_tests.pl
> @@ -0,0 +1,238 @@
> +#!/usr/bin/perl
> +
> +use strict;
> +use warnings;
> +
> +use JSON;
> +use Test::More;
> +
> +use lib ('.', '..');
> +use PVE::Multipath;
> +use PVE::Multipath::Config;
> +
> +# A recorded `multipathd show maps json` reply with three maps exercising each
> +# health state: an all-active map, a partially-failed map and an all-failed map.
> +my $maps_json = <<'EOF';
> +{
> +  "major_version": 0,
> +  "minor_version": 1,
> +  "maps": [
> +    {
> +      "name": "mpatha",
> +      "uuid": "3600140500a1b2c3d4e5f6a7b8c9d0e1f",
> +      "sysfs": "dm-0",
> +      "dm_st": "active",
> +      "paths": 2,
> +      "path_groups": [
> +        {
> +          "group": 1,
> +          "dm_st": "active",
> +          "pri": 50,
> +          "paths": [
> +            { "dev": "sdb", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000001" },
> +            { "dev": "sdc", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000002" }
> +          ]
> +        }
> +      ]
> +    },
> +    {
> +      "name": "mpathb",
> +      "uuid": "360014050aabbccddeeff00112233445566",
> +      "sysfs": "dm-1",
> +      "dm_st": "active",
> +      "paths": 2,
> +      "path_groups": [
> +        {
> +          "group": 1,
> +          "dm_st": "active",
> +          "pri": 10,
> +          "paths": [
> +            { "dev": "sdd", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 10 }
> +          ]
> +        },
> +        {
> +          "group": 2,
> +          "dm_st": "enabled",
> +          "pri": 0,
> +          "paths": [
> +            { "dev": "sde", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0 }
> +          ]
> +        }
> +      ]
> +    },
> +    {
> +      "name": "mpathc",
> +      "uuid": "36001405ffffffffffffffffffffffffff",
> +      "sysfs": "dm-2",
> +      "dm_st": "active",
> +      "paths": 1,
> +      "path_groups": [
> +        {
> +          "group": 1,
> +          "dm_st": "enabled",
> +          "pri": 0,
> +          "paths": [
> +            { "dev": "sdf", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0, "target_wwpn": "[undef]", "host_adapter": "[undef]" }
> +          ]
> +        }
> +      ]
> +    }
> +  ]
> +}
> +EOF
> +
> +my $maps = PVE::Multipath::parse_maps_json($maps_json);
> +
> +is(scalar($maps->@*), 3, 'parsed all three maps');
> +
> +my ($a, $b, $c) = $maps->@*;
> +
> +# fully healthy map
> +is($a->{wwid}, '3600140500a1b2c3d4e5f6a7b8c9d0e1f', 'map a wwid taken from uuid');
> +is($a->{name}, 'mpatha', 'map a name');
> +is($a->{sysfs}, 'dm-0', 'map a sysfs name');
> +is($a->{'paths-total'}, 2, 'map a counts both paths');
> +is($a->{'paths-active'}, 2, 'map a has two active paths');
> +is($a->{health}, 'optimal', 'map a is optimal');
> +is(scalar($a->{'path-groups'}->@*), 1, 'map a has one path group');
> +is(
> +    $a->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'},
> +    '0x500a098000000001',
> +    'FC target wwpn is preserved',
> +);
> +is(
> +    $a->{'path-groups'}->[0]->{paths}->[0]->{transport},
> +    'fc',
> +    'transport derived as fc from a target wwpn',
> +);
> +
> +# one failed path out of two
> +is($b->{'paths-total'}, 2, 'map b counts both paths across groups');
> +is($b->{'paths-active'}, 1, 'map b has one active path');
> +is($b->{health}, 'degraded', 'map b is degraded');
> +
> +# no active path left
> +is($c->{'paths-total'}, 1, 'map c counts its single path');
> +is($c->{'paths-active'}, 0, 'map c has no active path');
> +is($c->{health}, 'failed', 'map c is failed');
> +ok(
> +    !defined($c->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'}),
> +    "multipathd '[undef]' target_wwpn is cleaned away (not stored)",
> +);
> +ok(
> +    !defined($c->{'path-groups'}->[0]->{paths}->[0]->{transport}),
> +    "'[undef]' target_wwpn does not imply fc transport",
> +);
> +
> +# empty / no maps must parse to an empty list, not die
> +my $empty = PVE::Multipath::parse_maps_json('{ "major_version": 0, "maps": [] }');
> +is_deeply($empty, [], 'no maps parses to empty list');
> +
> +# malformed input must die with a clear error
> +eval { PVE::Multipath::parse_maps_json('not json') };
> +ok($@ =~ m/could not parse multipathd maps JSON/, 'invalid JSON raises a clear error');
> +
> +# --- config generation / WWID allow-list ---
> +my $conf = PVE::Multipath::Config::generate_managed_conf();
> +like($conf, qr/managed by Proxmox VE/, 'managed conf carries the managed header');
> +like($conf, qr/user_friendly_names no/, 'baseline sets user_friendly_names no');
> +like($conf, qr/find_multipaths strict/, 'baseline opts in explicitly via find_multipaths strict');
> +is(
> +    scalar(() = $conf =~ /^defaults \{/mg),
> +    1,
> +    'baseline has exactly one defaults block (a second would be a duplicate-keyword error)',
> +);
> +
> +my $wwids = PVE::Multipath::Config::parse_wwids("# Multipath wwids\n/3600abc/\n/3600def/\n");
> +is_deeply($wwids, ['3600abc', '3600def'], 'parse_wwids extracts the wwids');
> +like(
> +    PVE::Multipath::Config::format_wwids(['3600def', '3600abc']),
> +    qr{/3600abc/\n/3600def/},
> +    'format_wwids sorts and slash-wraps',
> +);
> +
> +# --- cluster config (pmxcfs source of truth): SectionConfig parse/write ---
> +my $raw =
> +    "defaults: defaults\n\tfind-multipaths strict\n\tno-path-retry queue\n\n"
> +    . "wwid: 3600def\n\talias san-b-lun0\n\n"
> +    . "wwid: 3600abc\n\talias san-a-lun0\n\tno-path-retry 18\n";
> +my $cc = PVE::Multipath::Config->parse_config('multipath.cfg', $raw);
> +is_deeply(
> +    PVE::Multipath::Config::wwid_list($cc),
> +    ['3600abc', '3600def'],
> +    'wwid sections become the allow-list (sorted)',
> +);
> +is_deeply(
> +    PVE::Multipath::Config::aliases($cc),
> +    { '3600abc' => 'san-a-lun0', '3600def' => 'san-b-lun0' },
> +    'aliases read from the wwid sections',
> +);
> +is(
> +    PVE::Multipath::Config::effective_defaults($cc)->{'no-path-retry'},
> +    'queue',
> +    'defaults section knob is read',
> +);
> +is(
> +    PVE::Multipath::Config::effective_defaults($cc)->{'user-friendly-names'},
> +    'no',
> +    'an unset defaults knob falls back to the managed default',
> +);
> +
> +my $written = PVE::Multipath::Config->write_config('multipath.cfg', $cc);
> +my $cc2 = PVE::Multipath::Config->parse_config('multipath.cfg', $written);
> +is_deeply(
> +    PVE::Multipath::Config::wwid_list($cc2),
> +    ['3600abc', '3600def'],
> +    'wwids survive the SectionConfig round-trip',
> +);
> +is_deeply(
> +    PVE::Multipath::Config::aliases($cc2),
> +    PVE::Multipath::Config::aliases($cc),
> +    'aliases survive the round-trip',
> +);
> +is($cc2->{ids}->{'3600abc'}->{'no-path-retry'}, 18, 'a per-WWID knob survives the round-trip');
> +
> +is_deeply(
> +    PVE::Multipath::Config::wwid_list(PVE::Multipath::Config->parse_config('multipath.cfg', '')),
> +    [],
> +    'an empty cluster config has no WWIDs',
> +);
> +
> +# --- multipaths{} block (alias plus per-WWID knobs) ---
> +my $block = PVE::Multipath::Config::build_multipaths_block({
> +    '3600def' => { alias => 'san-b-lun0' },
> +    '3600abc' => { alias => 'san-a-lun0', 'no-path-retry' => 18 },
> +    '3600nul' => {},
> +});
> +like($block, qr/^multipaths \{/m, 'block opens with multipaths {');
> +is(
> +    scalar(() = $block =~ /^\tmultipath \{/mg),
> +    2,
> +    'one multipath{} per WWID that has an alias or a knob (the empty WWID is skipped)',
> +);
> +like(
> +    $block,
> +    qr/wwid 3600abc.*?alias san-a-lun0.*?no_path_retry 18/s,
> +    'block carries the alias and the per-WWID knob',
> +);
> +my $abc_pos = index($block, 'wwid 3600abc');
> +my $def_pos = index($block, 'wwid 3600def');
> +ok($abc_pos < $def_pos, 'block emits entries in WWID-sorted order');
> +is(PVE::Multipath::Config::build_multipaths_block({}), '', 'no WWIDs render to the empty string');
> +
> +# --- override guard ---
> +eval { PVE::Multipath::Config::check_overrides("devices {\n\tdevice {\n\t\tvendor X\n\t}\n}\n") };
> +is($@, '', 'a well-formed devices{} block passes the guard');
> +eval { PVE::Multipath::Config::check_overrides("multipaths {\n}\n") };
> +like($@, qr/managed via aliases/, 'a multipaths{} block is rejected, it is generated');
> +eval { PVE::Multipath::Config::check_overrides("devices {\n") };
> +like($@, qr/unbalanced braces/, 'unbalanced braces are rejected');
> +eval { PVE::Multipath::Config::check_overrides("frobnicate {\n}\n") };
> +like($@, qr/unknown top-level section/, 'an unknown top-level section is rejected');
> +is(
> +    PVE::Multipath::Config::write_overrides('x', "text \n\n"),
> +    "text\n",
> +    'the overrides writer trims trailing whitespace',
> +);
> +
> +done_testing();

-- 
Maximiliano