From: Thomas Lamprecht
To: pve-devel@lists.proxmox.com
Subject: [PATCH storage 01/13] multipath: add helper library and managed configuration
Date: Fri, 26 Jun 2026 14:07:31 +0200
Message-ID: <20260626121000.2095591-2-t.lamprecht@proxmox.com>
In-Reply-To: <20260626121000.2095591-1-t.lamprecht@proxmox.com>
References: <20260626121000.2095591-1-t.lamprecht@proxmox.com>

Multipath on PVE is configured by hand and per node today, with nothing
that keeps it consistent across a cluster.
Add the foundation for managing it cluster-wide instead. The library
reads the assembled maps and their health from multipathd. The
configuration is a SectionConfig kept in pmxcfs: one 'defaults' section
for the global multipathd knobs, plus one 'wwid' section per
allow-listed LUN holding its optional alias and any per-LUN knobs.
Parameters are kebab-case and rendered to multipathd's snake_case
keywords, validated through the section schema so a bad value cannot
reach the generated drop-in.

The managed baseline is deliberately conservative: it only assembles
explicitly allow-listed LUNs and keeps map names stable and WWID-based,
so a device is named the same on every node and an LVM PV on it stays
stable cluster-wide. Hardware-specific tuning lives in a separate,
admin-owned override rather than in the generated baseline, and the two
are written to distinct drop-ins, as multipath does not accept a
repeated 'defaults' section in one file.

Parsing and generation stay in a pure module with no dependency on
PVE::Cluster, so they remain unit-testable and usable on a node whose
pve-cluster does not yet observe the new file; registering it in pmxcfs
needs the matching pve-cluster change.
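
For illustration, a hypothetical configuration and what the generator in
this patch would render from it. This is a sketch only: the WWID, alias
and knob values are made up, and the 'type: id' section headers assume
the default SectionConfig syntax. The rendered blocks follow
render_section() and build_multipaths_block() from this patch, with the
managed defaults merged in and keys sorted.

```
# /etc/pve/multipath.cfg (assumed layout, illustrative values)
defaults: defaults
	no-path-retry queue

wwid: 3600140500a1b2c3d4e5f6a7b8c9d0e1f
	alias san-lun0
	path-grouping-policy multibus

# rendered 'defaults' section (pve-defaults.conf, header comment omitted)
defaults {
	find_multipaths strict
	no_path_retry queue
	polling_interval 5
	user_friendly_names no
}

# rendered 'multipaths' section (pve-aliases.conf, header comment omitted)
multipaths {
	multipath {
		wwid 3600140500a1b2c3d4e5f6a7b8c9d0e1f
		alias san-lun0
		path_grouping_policy multibus
	}
}
```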
Signed-off-by: Thomas Lamprecht --- src/PVE/Makefile | 4 + src/PVE/Multipath.pm | 282 ++++++++++++++++++++++ src/PVE/Multipath/ClusterConfig.pm | 55 +++++ src/PVE/Multipath/Config.pm | 361 +++++++++++++++++++++++++++++ src/PVE/Multipath/Generator.pm | 148 ++++++++++++ src/test/Makefile | 5 +- src/test/run_multipath_tests.pl | 238 +++++++++++++++++++ 7 files changed, 1092 insertions(+), 1 deletion(-) create mode 100644 src/PVE/Multipath.pm create mode 100644 src/PVE/Multipath/ClusterConfig.pm create mode 100644 src/PVE/Multipath/Config.pm create mode 100644 src/PVE/Multipath/Generator.pm create mode 100755 src/test/run_multipath_tests.pl diff --git a/src/PVE/Makefile b/src/PVE/Makefile index 9e9f6aa..7ddd646 100644 --- a/src/PVE/Makefile +++ b/src/PVE/Makefile @@ -4,6 +4,10 @@ install: install -D -m 0644 Storage.pm ${DESTDIR}${PERLDIR}/PVE/Storage.pm install -D -m 0644 Diskmanage.pm ${DESTDIR}${PERLDIR}/PVE/Diskmanage.pm + install -D -m 0644 Multipath.pm ${DESTDIR}${PERLDIR}/PVE/Multipath.pm + install -D -m 0644 Multipath/Config.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Config.pm + install -D -m 0644 Multipath/ClusterConfig.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/ClusterConfig.pm + install -D -m 0644 Multipath/Generator.pm ${DESTDIR}${PERLDIR}/PVE/Multipath/Generator.pm install -D -m 0644 CephConfig.pm ${DESTDIR}${PERLDIR}/PVE/CephConfig.pm install -D -m 0644 GuestImport.pm ${DESTDIR}${PERLDIR}/PVE/GuestImport.pm make -C Storage install diff --git a/src/PVE/Multipath.pm b/src/PVE/Multipath.pm new file mode 100644 index 0000000..59c1103 --- /dev/null +++ b/src/PVE/Multipath.pm @@ -0,0 +1,282 @@ +package PVE::Multipath; + +use strict; +use warnings; + +use JSON qw(decode_json); + +use PVE::Tools qw(run_command file_read_firstline file_get_contents); + +use PVE::Multipath::Config; + +# Helper library around device-mapper multipath (multipathd). 
The single place that knows how to +# talk to multipathd and turn its state into a normalized, stable structure for the rest of the +# storage stack: health reporting, the API, and consumers that wait for a map before binding to it. +# +# Everything keys on the SCSI/NAA WWID, as reported by multipathd in the map's 'uuid' field. Map +# names (mpathN / aliases) and the underlying 'sdX' paths are node-local and unstable; the WWID is +# not. + +my $MULTIPATH = '/sbin/multipath'; +my $MULTIPATHD = '/sbin/multipathd'; + +# health states for a single map, derived from its path states +use constant { + HEALTH_OPTIMAL => 'optimal', # all paths active + HEALTH_DEGRADED => 'degraded', # some but not all paths active + HEALTH_FAILED => 'failed', # no active path left +}; + +my $supported; + +sub is_supported { + return $supported if defined($supported); + $supported = (-x $MULTIPATH && -x $MULTIPATHD) ? 1 : 0; + return $supported; +} + +sub assert_supported { + die "no multipath support - please install 'multipath-tools'\n" if !is_supported(); + return 1; +} + +# Returns whether the multipathd daemon is reachable. Used for status output, so it never dies and +# just reports 0 when multipath is unavailable or the daemon socket cannot be queried. +sub is_running { + return 0 if !is_supported(); + + my $running = 0; + eval { + run_command( + [$MULTIPATHD, 'show', 'daemon'], + outfunc => sub { + my ($line) = @_; + # example: "pid 1234 idle" / "pid 1234 running" + $running = 1 if $line =~ m/^pid \d+ \S+/; + }, + errfunc => sub { }, + ); + }; + return $running; +} + +# Runs `multipathd show json` and returns the raw output. Kept separate from parsing so +# tests can feed recorded fixtures to the parse_*() functions without a running daemon. 
+my sub query_multipathd_json { + my ($subcmd) = @_; + + assert_supported(); + + my $output = ''; + run_command( + [$MULTIPATHD, 'show', $subcmd, 'json'], + outfunc => sub { $output .= "$_[0]\n"; }, + errfunc => sub { warn "$_[0]\n"; }, + ); + + return $output; +} + +my sub derive_health { + my ($paths_active, $paths_total) = @_; + + return HEALTH_FAILED if !$paths_total || !$paths_active; + return HEALTH_OPTIMAL if $paths_active == $paths_total; + return HEALTH_DEGRADED; +} + +my sub normalize_path { + my ($path) = @_; + + my $res = { + dev => $path->{dev}, + # 'active' or 'failed' - the state device-mapper sees + 'dm-state' => $path->{dm_st}, + # 'running', 'faulty' or 'offline' - the state the kernel block layer sees + 'dev-state' => $path->{dev_st}, + # path checker result, e.g. 'ready' / 'faulty' / 'ghost' + 'check-state' => $path->{chk_st}, + }; + $res->{priority} = int($path->{pri}) if defined($path->{pri}); + + # multipathd renders unset string fields as the literal '[undef]' + my $wwpn = $path->{target_wwpn}; + my $hba = $path->{host_adapter}; + undef $wwpn if !defined($wwpn) || $wwpn eq '[undef]' || $wwpn eq ''; + undef $hba if !defined($hba) || $hba eq '[undef]' || $hba eq ''; + $res->{'target-wwpn'} = $wwpn if defined($wwpn); + $res->{'host-adapter'} = $hba if defined($hba); + # a real target WWPN (0x...) means Fibre Channel; iSCSI/SAS transport is derived from sysfs by + # get_maps(). Do NOT treat the field's mere presence as FC, multipathd reports it as '[undef]' + # for iSCSI. + $res->{transport} = 'fc' if defined($wwpn) && $wwpn =~ /^0x[0-9a-f]+$/i; + + return $res; +} + +# Turns the output of `multipathd show maps json` into a normalized list of maps. Pure (no I/O) on +# purpose: it derives everything it can from the JSON alone, so it can be unit-tested against +# recorded fixtures. Live-only bits (byte size, transport) are added by get_maps() below. 
+sub parse_maps_json { + my ($json) = @_; + + my $data = eval { decode_json($json) }; + die "could not parse multipathd maps JSON: $@\n" if $@; + + my $maps = []; + for my $map (($data->{maps} // [])->@*) { + my $path_groups = []; + my ($paths_total, $paths_active) = (0, 0); + + for my $group (($map->{path_groups} // [])->@*) { + my $paths = []; + for my $path (($group->{paths} // [])->@*) { + my $normalized = normalize_path($path); + $paths_total++; + $paths_active++ if ($normalized->{'dm-state'} // '') eq 'active'; + push $paths->@*, $normalized; + } + push $path_groups->@*, + { + group => int($group->{group} // 0), + 'dm-state' => $group->{dm_st}, + priority => int($group->{pri} // 0), + paths => $paths, + }; + } + + push $maps->@*, { + wwid => $map->{uuid}, + name => $map->{name}, + sysfs => $map->{sysfs}, # the 'dm-N' kernel name + 'dm-state' => $map->{dm_st}, + 'paths-total' => $paths_total, + 'paths-active' => $paths_active, + health => derive_health($paths_active, $paths_total), + 'path-groups' => $path_groups, + }; + } + + return $maps; +} + +# Best-effort byte size of a dm device from sysfs (Linux reports size in 512b sectors regardless of +# the real block size). +my sub dm_size_bytes { + my ($sysfs) = @_; + + return undef if !$sysfs; + my $sectors = file_read_firstline("/sys/block/$sysfs/size"); + return undef if !defined($sectors) || $sectors !~ m/^\d+$/; + return int($sectors) * 512; +} + +my sub dir_has_entries { + my ($dir) = @_; + + return 0 if !-d $dir; + opendir(my $dh, $dir) or return 0; + my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh); + closedir($dh); + return scalar(@entries) ? 1 : 0; +} + +# Best-effort transport of a single 'sdX' path from its sysfs topology; only iSCSI and SAS need it, +# Fibre Channel is already set from the map JSON. 
+my sub path_transport { + my ($dev) = @_; + + return undef if !$dev; + my $link = readlink("/sys/block/$dev"); + return undef if !$link; + return 'iscsi' if $link =~ m{/session\d+/}; + return 'fc' if $link =~ m{/rport-\d+}; + return 'sas' if $link =~ m{/end_device-}; + return undef; +} + +# Returns the normalized maps enriched with information that requires the local system (size, a +# stable consumer path). Dies if multipath is not supported; callers that just want status should +# guard with is_supported()/is_running(). +sub get_maps { + my $maps = parse_maps_json(query_multipathd_json('maps')); + + for my $map ($maps->@*) { + $map->{size} = dm_size_bytes($map->{sysfs}); + # WWID-stable path, present independently of the (node-local) map name + $map->{path} = "/dev/disk/by-id/dm-uuid-mpath-$map->{wwid}" + if defined($map->{wwid}); + $map->{used} = dir_has_entries("/sys/block/$map->{sysfs}/holders") + if $map->{sysfs}; + + my %transports; + for my $group ($map->{'path-groups'}->@*) { + for my $path ($group->{paths}->@*) { + $path->{transport} //= path_transport($path->{dev}); + $transports{ $path->{transport} } = 1 if defined($path->{transport}); + } + } + # only expose a map-level transport when all paths agree on it + my @transports = keys %transports; + $map->{transport} = $transports[0] if scalar(@transports) == 1; + } + + return $maps; +} + +sub get_map_for_wwid { + my ($wwid) = @_; + + for my $map (get_maps()->@*) { + return $map if defined($map->{wwid}) && $map->{wwid} eq $wwid; + } + return undef; +} + +# Polls until a map for the given WWID exists, up to $timeout seconds. A consumer like the iSCSI +# plugin uses this after a login or rescan to bind to the coalesced dm device rather than to a +# transient single 'sdX' path. 
+sub wait_for_map { + my ($wwid, $timeout) = @_; + + $timeout //= 10; + + my $deadline = time() + $timeout; + while (1) { + my $map = eval { get_map_for_wwid($wwid) }; + return $map if $map; + return undef if time() >= $deadline; + sleep(1); + } +} + +my $WWIDS_FILE = '/etc/multipath/wwids'; + +# The managed allow-list of LUNs (WWIDs) to assemble into a map; with 'find_multipaths strict' only +# these get multipathed. +sub list_wwids { + return [] if !-e $WWIDS_FILE; + return PVE::Multipath::Config::parse_wwids(file_get_contents($WWIDS_FILE)); +} + +sub add_wwid { + my ($wwid) = @_; + + assert_supported(); + run_command([$MULTIPATH, '-a', $wwid]); +} + +sub remove_wwid { + my ($wwid) = @_; + + assert_supported(); + run_command([$MULTIPATH, '-w', $wwid]); +} + +# Re-read the configuration and rebuild maps accordingly, after a config or allow-list change. +sub reconfigure { + assert_supported(); + run_command([$MULTIPATHD, 'reconfigure']); +} + +1; diff --git a/src/PVE/Multipath/ClusterConfig.pm b/src/PVE/Multipath/ClusterConfig.pm new file mode 100644 index 0000000..0b09c3f --- /dev/null +++ b/src/PVE/Multipath/ClusterConfig.pm @@ -0,0 +1,55 @@ +package PVE::Multipath::ClusterConfig; + +use strict; +use warnings; + +use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file); + +use PVE::Multipath::Config; + +# Cluster-wide multipath configuration, replicated by pmxcfs. The structured allow-list, aliases and +# knobs live in multipath.cfg (a SectionConfig); the free-form hardware override text lives in a +# separate plain file so it stays hand-editable and diffable. 
+my $FILENAME = 'multipath.cfg'; +my $OVERRIDES_FILENAME = 'multipath-overrides.conf'; + +cfs_register_file( + $FILENAME, + sub { PVE::Multipath::Config->parse_config(@_); }, + sub { PVE::Multipath::Config->write_config(@_); }, +); + +cfs_register_file( + $OVERRIDES_FILENAME, + \&PVE::Multipath::Config::parse_overrides, + \&PVE::Multipath::Config::write_overrides, +); + +sub read_config { + return cfs_read_file($FILENAME); +} + +sub write_config { + my ($cfg) = @_; + cfs_write_file($FILENAME, $cfg); +} + +sub read_overrides { + return cfs_read_file($OVERRIDES_FILENAME); +} + +sub write_overrides { + my ($text) = @_; + cfs_write_file($OVERRIDES_FILENAME, $text); +} + +sub lock_config { + my ($code, $errmsg) = @_; + + cfs_lock_file($FILENAME, undef, $code); + if (my $err = $@) { + $errmsg ? die "$errmsg: $err" : die $err; + } +} + +1; diff --git a/src/PVE/Multipath/Config.pm b/src/PVE/Multipath/Config.pm new file mode 100644 index 0000000..21ad72e --- /dev/null +++ b/src/PVE/Multipath/Config.pm @@ -0,0 +1,361 @@ +package PVE::Multipath::Config; + +use strict; +use warnings; + +use PVE::SectionConfig; + +use base qw(PVE::SectionConfig); + +# Parser and writer for the cluster-wide source of truth in pmxcfs (/etc/pve/multipath.cfg). It is a +# SectionConfig: a single 'defaults' section for global multipathd knobs, plus one 'wwid' section +# per allow-listed LUN holding its optional alias and per-LUN knobs. Free-form hardware overrides +# (device {} entries) live in a separate plain file, see PVE::Multipath::ClusterConfig. Kept pure so +# it stays unit-testable without PVE::Cluster. + +# Conservative, cluster-friendly defaults, applied when the 'defaults' section omits them: +# - find_multipaths strict -> only explicitly allow-listed LUNs get assembled, so boot/root and +# unrelated disks stay untouched. 
+# - user_friendly_names no -> the map name is the WWID, identical on every node, so an LVM PV on +# /dev/disk/by-id/dm-uuid-mpath- is stable cluster-wide without a node-local bindings file. +my $MANAGED_DEFAULTS = { + 'find-multipaths' => 'strict', + 'user-friendly-names' => 'no', + 'polling-interval' => 5, +}; + +sub managed_defaults { return { $MANAGED_DEFAULTS->%* }; } + +# Knobs valid both globally (defaults) and per-LUN (a multipaths{} entry), per multipath.conf(5). +my $shared_knobs = { + 'no-path-retry' => { + type => 'string', + pattern => '(?:queue|fail|\d+)', + typetext => 'queue|fail|', + description => + "How to react when all paths are down: keep queuing, fail at once, or retry" + . " for the given number of polling intervals.", + optional => 1, + }, + 'path-grouping-policy' => { + type => 'string', + enum => [qw(failover multibus group_by_serial group_by_prio group_by_node_name)], + description => "How paths are grouped into priority groups.", + optional => 1, + }, + failback => { + type => 'string', + pattern => '(?:manual|immediate|followover|\d+)', + typetext => 'manual|immediate|followover|', + description => "When to fail back to a restored higher-priority path group.", + optional => 1, + }, + 'path-selector' => { + type => 'string', + maxLength => 64, + description => "Path selector algorithm used within a priority group, for example" + . " 'service-time 0'.", + optional => 1, + }, +}; + +# Knobs that only make sense globally. +my $defaults_only_knobs = { + 'find-multipaths' => { + type => 'string', + enum => [qw(yes no strict greedy smart)], + default => 'strict', + description => "Which devices multipathd assembles into a map. 'strict' only takes" + . " explicitly allow-listed WWIDs.", + optional => 1, + }, + 'user-friendly-names' => { + type => 'string', + enum => [qw(yes no)], + default => 'no', + description => "Whether to use node-local mpathN names. Keep 'no' for stable WWID-based" + . 
" names across the cluster.", + optional => 1, + }, + 'polling-interval' => { + type => 'integer', + minimum => 1, + default => 5, + description => "Interval between path checks, in seconds.", + optional => 1, + }, +}; + +# Knobs that only make sense per-LUN. +my $wwid_only_knobs = { + alias => { + type => 'string', + pattern => '[a-zA-Z0-9][a-zA-Z0-9._-]*', + maxLength => 64, + description => + "Human-readable map name for this WWID; multipathd uses it as the map name.", + optional => 1, + }, + 'rr-min-io-rq' => { + type => 'integer', + minimum => 1, + description => + "Number of I/O requests to route to a path before switching, request-based.", + optional => 1, + }, + 'rr-weight' => { + type => 'string', + enum => [qw(priorities uniform)], + description => "Whether to weight paths by priority when balancing I/O.", + optional => 1, + }, +}; + +my $defaultData = { + propertyList => { + type => { description => "Section type ('defaults' or 'wwid')." }, + id => { + type => 'string', + description => + "Section ID: the literal 'defaults', or a LUN WWID for 'wwid' sections.", + pattern => '[a-zA-Z0-9._:-]+', + maxLength => 128, + }, + $shared_knobs->%*, + $defaults_only_knobs->%*, + $wwid_only_knobs->%*, + }, +}; + +sub private { return $defaultData; } + +package PVE::Multipath::Config::Defaults; + +use base qw(PVE::Multipath::Config); + +sub type { return 'defaults'; } + +sub options { + return { + 'find-multipaths' => { optional => 1 }, + 'user-friendly-names' => { optional => 1 }, + 'polling-interval' => { optional => 1 }, + 'no-path-retry' => { optional => 1 }, + 'path-grouping-policy' => { optional => 1 }, + failback => { optional => 1 }, + 'path-selector' => { optional => 1 }, + }; +} + +__PACKAGE__->register(); + +package PVE::Multipath::Config::Wwid; + +use base qw(PVE::Multipath::Config); + +sub type { return 'wwid'; } + +sub options { + return { + alias => { optional => 1 }, + 'no-path-retry' => { optional => 1 }, + 'path-grouping-policy' => { optional => 1 
}, + failback => { optional => 1 }, + 'path-selector' => { optional => 1 }, + 'rr-min-io-rq' => { optional => 1 }, + 'rr-weight' => { optional => 1 }, + }; +} + +__PACKAGE__->register(); + +package PVE::Multipath::Config; + +__PACKAGE__->init(); + +# multipathd subsections accept only these top-level keywords; the admin override file is checked +# against them. 'multipaths' is generated from the wwid sections, so an admin block would collide. +my $OVERRIDE_KEYWORDS = + { devices => 1, overrides => 1, defaults => 1, blacklist => 1, blacklist_exceptions => 1 }; + +# Validate the free-form override text before it can break multipathd's parser cluster-wide. This is +# a guard, not a full parse: balanced braces, only known top-level sections, and no 'multipaths' +# block (that is generated from the wwid sections and a duplicate is fatal to multipathd). +sub check_overrides { + my ($text) = @_; + + return if !defined($text) || $text !~ /\S/; + + my ($open, $close) = (0, 0); + for my $line (split(/\n/, $text)) { + next if $line =~ /^\s*#/; + $open += ($line =~ tr/{//); + $close += ($line =~ tr/}//); + if ($line =~ /^\s*(\w+)\s*\{/) { + my $kw = $1; + die "multipath overrides: 'multipaths' is managed via aliases, do not set it here\n" + if $kw eq 'multipaths'; + die "multipath overrides: unknown top-level section '$kw'\n" + if $open - $close == 1 && !$OVERRIDE_KEYWORDS->{$kw}; + } + } + die "multipath overrides: unbalanced braces\n" if $open != $close; + + return; +} + +# Read/write the separate, admin-owned override file (/etc/pve/multipath-overrides.conf). Stored and +# rendered verbatim, so it stays hand-editable and diffable. +sub parse_overrides { + my ($filename, $raw) = @_; + return $raw // ''; +} + +sub write_overrides { + my ($filename, $text) = @_; + $text //= ''; + $text =~ s/\s+$//; + return length($text) ? "$text\n" : ''; +} + +my $MANAGED_HEADER = + "# This file is managed by Proxmox VE - do not edit by hand.\n" + . 
"# Hardware-/node-specific overrides belong in the override config.\n"; + +# Renders a named section from a key => value hash, keys sorted for a stable, diffable result. The +# config and API use kebab-case parameters, multipathd keywords are snake_case, so map '-' to '_'. +my sub render_section { + my ($name, $kv) = @_; + + my $out = "$name {\n"; + for my $key (sort keys $kv->%*) { + (my $keyword = $key) =~ tr/-/_/; + $out .= "\t$keyword $kv->{$key}\n"; + } + $out .= "}\n"; + return $out; +} + +# Builds the Proxmox-managed baseline drop-in (header + defaults section) from the effective global +# knobs. Admin overrides are not merged in here: they go into a separate conf.d file, as multipath +# rejects two 'defaults' blocks in one file (duplicate keyword) and drops the second. +sub generate_managed_conf { + my ($defaults) = @_; + $defaults //= managed_defaults(); + + return $MANAGED_HEADER . "\n" . render_section('defaults', $defaults); +} + +# The WWID allow-list file (/etc/multipath/wwids) holds one '//' per line; parse it and back. +sub parse_wwids { + my ($text) = @_; + + my $wwids = []; + for my $line (split(/\n/, $text // '')) { + next if $line =~ /^\s*#/; + next if $line =~ /^\s*$/; + if ($line =~ m{^/(.+)/\s*$}) { + push $wwids->@*, $1; + } + } + return $wwids; +} + +sub format_wwids { + my ($wwids) = @_; + + my $out = "# Multipath wwids, managed by Proxmox VE\n"; + $out .= "/$_/\n" for sort $wwids->@*; + return $out; +} + +# Builds a 'multipaths {}' block from the per-WWID sections (alias plus any per-LUN knobs); returns +# the empty string when no WWID has an alias or a knob set. 
+sub build_multipaths_block { + my ($wwid_opts) = @_; + + my @entries = grep { %{ $wwid_opts->{$_} } } sort keys %$wwid_opts; + return '' if !@entries; + + my $out = "multipaths {\n"; + for my $wwid (@entries) { + my $opts = $wwid_opts->{$wwid}; + $out .= "\tmultipath {\n"; + $out .= "\t\twwid $wwid\n"; + for my $key (sort keys %$opts) { + (my $keyword = $key) =~ tr/-/_/; + $out .= "\t\t$keyword $opts->{$key}\n"; + } + $out .= "\t}\n"; + } + $out .= "}\n"; + return $out; +} + +# The knob property definitions as an API parameter schema. Strip the schema 'default' so an update +# that omits a knob leaves it unchanged instead of resetting it to the managed default. +my sub api_schema { + my ($props) = @_; + + my $res = {}; + for my $key (keys %$props) { + $res->{$key} = { $props->{$key}->%* }; + delete $res->{$key}->{default}; + } + return $res; +} + +# Settable global knobs (the 'defaults' section) as an API parameter schema. +sub defaults_api_schema { + return api_schema({ $shared_knobs->%*, $defaults_only_knobs->%* }); +} + +# Settable per-WWID knobs (including the alias) as an API parameter schema. +sub wwid_api_schema { + return api_schema({ $shared_knobs->%*, $wwid_only_knobs->%* }); +} + +# Effective global knobs: the 'defaults' section merged onto the conservative managed defaults. +sub effective_defaults { + my ($cfg) = @_; + + my $defaults = managed_defaults(); + if (my $section = $cfg->{ids}->{defaults}) { + $defaults->{$_} = $section->{$_} for grep { $_ ne 'type' } keys %$section; + } + return $defaults; +} + +# The allow-listed WWIDs, that is the ids of the 'wwid' sections. +sub wwid_list { + my ($cfg) = @_; + return [sort grep { ($cfg->{ids}->{$_}->{type} // '') eq 'wwid' } keys $cfg->{ids}->%*]; +} + +# { wwid => alias } for the WWIDs that have one. 
+sub aliases { + my ($cfg) = @_; + + my $res = {}; + for my $wwid (keys $cfg->{ids}->%*) { + my $section = $cfg->{ids}->{$wwid}; + next if ($section->{type} // '') ne 'wwid'; + $res->{$wwid} = $section->{alias} if defined($section->{alias}); + } + return $res; +} + +# { wwid => { alias?, knob => value, ... } }, the per-WWID input to build_multipaths_block(). +sub wwid_opts { + my ($cfg) = @_; + + my $res = {}; + for my $wwid (keys $cfg->{ids}->%*) { + my $section = $cfg->{ids}->{$wwid}; + next if ($section->{type} // '') ne 'wwid'; + $res->{$wwid} = { map { $_ => $section->{$_} } grep { $_ ne 'type' } keys %$section }; + } + return $res; +} + +1; diff --git a/src/PVE/Multipath/Generator.pm b/src/PVE/Multipath/Generator.pm new file mode 100644 index 0000000..0bcd37f --- /dev/null +++ b/src/PVE/Multipath/Generator.pm @@ -0,0 +1,148 @@ +package PVE::Multipath::Generator; + +use strict; +use warnings; + +use File::Path qw(make_path); + +use PVE::Tools qw(file_get_contents file_set_contents); + +use PVE::Multipath; +use PVE::Multipath::Config; +use PVE::Multipath::ClusterConfig; + +# Renders the effective node-local multipath configuration from the cluster-wide source of truth +# (/etc/pve/multipath.cfg) and reloads multipathd when something changed. +# +# The rendered files live on the local filesystem, so they survive reboots and are available to +# multipathd at boot even before pmxcfs is up; the last successful render is the boot-time fallback. + +# Proxmox-owned drop-ins; the admin's /etc/multipath.conf keeps its default 'config_dir +# /etc/multipath/conf.d'. The managed baseline, the admin overrides, and the generated aliases each +# get their own file so multipath merges them across files instead of hitting a duplicate section +# keyword: two 'defaults' blocks in one file are rejected outright, and our 'multipaths' alias block +# would clash with a 'multipaths' section in the overrides. 
The overrides file sorts after the +# baseline, so an admin's defaults override it; the aliases file is a separate 'multipaths' block, +# so its order does not matter. +my $DEFAULTS_DROPIN = '/etc/multipath/conf.d/pve-defaults.conf'; +my $OVERRIDES_DROPIN = '/etc/multipath/conf.d/pve-overrides.conf'; +my $ALIASES_DROPIN = '/etc/multipath/conf.d/pve-aliases.conf'; + +my sub write_if_changed { + my ($path, $content) = @_; + + my $old = -e $path ? eval { file_get_contents($path) } : undef; + return 0 if defined($old) && $old eq $content; + + my $dir = $path =~ s!/[^/]+$!!r; + make_path($dir) if !-d $dir; + file_set_contents($path, $content); + return 1; +} + +my sub remove_if_present { + my ($path) = @_; + + return 0 if !-e $path; + unlink($path) or die "could not remove '$path': $!\n"; + return 1; +} + +sub regenerate { + my ($cfg, $overrides) = @_; + $cfg //= PVE::Multipath::ClusterConfig::read_config(); + $overrides //= PVE::Multipath::ClusterConfig::read_overrides(); + + my $changed = 0; + + my $defaults = PVE::Multipath::Config::effective_defaults($cfg); + $changed = 1 + if write_if_changed($DEFAULTS_DROPIN, + PVE::Multipath::Config::generate_managed_conf($defaults)); + + if (defined($overrides) && length($overrides)) { + my $content = + "# Managed by Proxmox VE - edit overrides in /etc/pve/multipath-overrides.conf.\n\n" + . "$overrides\n"; + $changed = 1 if write_if_changed($OVERRIDES_DROPIN, $content); + } else { + $changed = 1 if remove_if_present($OVERRIDES_DROPIN); + } + + my $block = + PVE::Multipath::Config::build_multipaths_block(PVE::Multipath::Config::wwid_opts($cfg)); + if (length($block)) { + my $content = + "# Managed by Proxmox VE - edit aliases and per-LUN options in /etc/pve/multipath.cfg.\n\n" + . 
$block; + $changed = 1 if write_if_changed($ALIASES_DROPIN, $content); + } else { + $changed = 1 if remove_if_present($ALIASES_DROPIN); + } + + # Bring the WWID allow-list (/etc/multipath/wwids) in line with the cluster config through + # multipath's own add/remove, so its on-disk format stays intact. Isolate each op: one failing + # WWID must not abort the whole pass, or it would stall every other WWID on every run; a failed + # op leaves the file unchanged and is retried next pass. + my %desired = map { $_ => 1 } PVE::Multipath::Config::wwid_list($cfg)->@*; + my %current = map { $_ => 1 } PVE::Multipath::list_wwids()->@*; + + my @errors; + for my $wwid (sort keys %desired) { + next if $current{$wwid}; + eval { PVE::Multipath::add_wwid($wwid); }; + if (my $err = $@) { + push @errors, "adding WWID '$wwid' failed - $err"; + } else { + $changed = 1; + } + } + for my $wwid (sort keys %current) { + next if $desired{$wwid}; + eval { PVE::Multipath::remove_wwid($wwid); }; + if (my $err = $@) { + push @errors, "removing WWID '$wwid' failed - $err"; + } else { + $changed = 1; + } + } + + # reload the daemon for whatever did converge, even if some ops failed + if ($changed && PVE::Multipath::is_running()) { + eval { PVE::Multipath::reconfigure(); }; + push @errors, "reconfigure failed - $@" if $@; + } + + die join('', @errors) if @errors; + + return $changed; +} + +# Safe periodic entry point for a status loop like pvestatd: a no-op when multipath is not in use on +# this node, and never throws, so a caller stays a single guarded line and the same entry point +# works from a systemd unit or CLI. 
+sub sync { + return 0 if !PVE::Multipath::is_supported(); + + my $cfg = eval { PVE::Multipath::ClusterConfig::read_config() }; + if (my $err = $@) { + warn "multipath: reading cluster config failed - $err"; + return 0; + } + my $overrides = eval { PVE::Multipath::ClusterConfig::read_overrides() }; + + # stay out of the way unless the feature is in use: nothing configured cluster-wide and no + # local drop-in present + return 0 + if !scalar(PVE::Multipath::Config::wwid_list($cfg)->@*) + && !(defined($overrides) && length($overrides)) + && !$cfg->{ids}->{defaults} + && !-e $DEFAULTS_DROPIN; + + my $changed = eval { regenerate($cfg, $overrides) }; + warn "multipath: config sync failed - $@" if $@; + + return $changed // 0; +} + +1; diff --git a/src/test/Makefile b/src/test/Makefile index ee025bc..51c7360 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -1,6 +1,6 @@ all: test -test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access +test: test_zfspoolplugin test_lvmplugin test_disklist test_bwlimit test_plugin test_ovf test_volume_access test_multipath test_zfspoolplugin: run_test_zfspoolplugin.pl ./run_test_zfspoolplugin.pl @@ -22,3 +22,6 @@ test_ovf: run_ovf_tests.pl test_volume_access: run_volume_access_tests.pl ./run_volume_access_tests.pl + +test_multipath: run_multipath_tests.pl + ./run_multipath_tests.pl diff --git a/src/test/run_multipath_tests.pl b/src/test/run_multipath_tests.pl new file mode 100755 index 0000000..f710308 --- /dev/null +++ b/src/test/run_multipath_tests.pl @@ -0,0 +1,238 @@ +#!/usr/bin/perl + +use strict; +use warnings; + +use JSON; +use Test::More; + +use lib ('.', '..'); +use PVE::Multipath; +use PVE::Multipath::Config; + +# A recorded `multipathd show maps json` reply with three maps exercising each +# health state: an all-active map, a partially-failed map and an all-failed map. 
+my $maps_json = <<'EOF';
+{
+  "major_version": 0,
+  "minor_version": 1,
+  "maps": [
+    {
+      "name": "mpatha",
+      "uuid": "3600140500a1b2c3d4e5f6a7b8c9d0e1f",
+      "sysfs": "dm-0",
+      "dm_st": "active",
+      "paths": 2,
+      "path_groups": [
+        {
+          "group": 1,
+          "dm_st": "active",
+          "pri": 50,
+          "paths": [
+            { "dev": "sdb", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000001" },
+            { "dev": "sdc", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 50, "target_wwpn": "0x500a098000000002" }
+          ]
+        }
+      ]
+    },
+    {
+      "name": "mpathb",
+      "uuid": "360014050aabbccddeeff00112233445566",
+      "sysfs": "dm-1",
+      "dm_st": "active",
+      "paths": 2,
+      "path_groups": [
+        {
+          "group": 1,
+          "dm_st": "active",
+          "pri": 10,
+          "paths": [
+            { "dev": "sdd", "dm_st": "active", "dev_st": "running", "chk_st": "ready", "pri": 10 }
+          ]
+        },
+        {
+          "group": 2,
+          "dm_st": "enabled",
+          "pri": 0,
+          "paths": [
+            { "dev": "sde", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0 }
+          ]
+        }
+      ]
+    },
+    {
+      "name": "mpathc",
+      "uuid": "36001405ffffffffffffffffffffffffff",
+      "sysfs": "dm-2",
+      "dm_st": "active",
+      "paths": 1,
+      "path_groups": [
+        {
+          "group": 1,
+          "dm_st": "enabled",
+          "pri": 0,
+          "paths": [
+            { "dev": "sdf", "dm_st": "failed", "dev_st": "faulty", "chk_st": "faulty", "pri": 0, "target_wwpn": "[undef]", "host_adapter": "[undef]" }
+          ]
+        }
+      ]
+    }
+  ]
+}
+EOF
+
+my $maps = PVE::Multipath::parse_maps_json($maps_json);
+
+is(scalar($maps->@*), 3, 'parsed all three maps');
+
+my ($a, $b, $c) = $maps->@*;
+
+# fully healthy map
+is($a->{wwid}, '3600140500a1b2c3d4e5f6a7b8c9d0e1f', 'map a wwid taken from uuid');
+is($a->{name}, 'mpatha', 'map a name');
+is($a->{sysfs}, 'dm-0', 'map a sysfs name');
+is($a->{'paths-total'}, 2, 'map a counts both paths');
+is($a->{'paths-active'}, 2, 'map a has two active paths');
+is($a->{health}, 'optimal', 'map a is optimal');
+is(scalar($a->{'path-groups'}->@*), 1, 'map a has one path group');
+is(
+    $a->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'},
+    '0x500a098000000001',
+    'FC target wwpn is preserved',
+);
+is(
+    $a->{'path-groups'}->[0]->{paths}->[0]->{transport},
+    'fc',
+    'transport derived as fc from a target wwpn',
+);
+
+# one failed path out of two
+is($b->{'paths-total'}, 2, 'map b counts both paths across groups');
+is($b->{'paths-active'}, 1, 'map b has one active path');
+is($b->{health}, 'degraded', 'map b is degraded');
+
+# no active path left
+is($c->{'paths-total'}, 1, 'map c counts its single path');
+is($c->{'paths-active'}, 0, 'map c has no active path');
+is($c->{health}, 'failed', 'map c is failed');
+ok(
+    !defined($c->{'path-groups'}->[0]->{paths}->[0]->{'target-wwpn'}),
+    "multipathd '[undef]' target_wwpn is cleaned away (not stored)",
+);
+ok(
+    !defined($c->{'path-groups'}->[0]->{paths}->[0]->{transport}),
+    "'[undef]' target_wwpn does not imply fc transport",
+);
+
+# empty / no maps must parse to an empty list, not die
+my $empty = PVE::Multipath::parse_maps_json('{ "major_version": 0, "maps": [] }');
+is_deeply($empty, [], 'no maps parses to empty list');
+
+# malformed input must die with a clear error
+eval { PVE::Multipath::parse_maps_json('not json') };
+ok($@ =~ m/could not parse multipathd maps JSON/, 'invalid JSON raises a clear error');
+
+# --- config generation / WWID allow-list ---
+my $conf = PVE::Multipath::Config::generate_managed_conf();
+like($conf, qr/managed by Proxmox VE/, 'managed conf carries the managed header');
+like($conf, qr/user_friendly_names no/, 'baseline sets user_friendly_names no');
+like($conf, qr/find_multipaths strict/, 'baseline opts in explicitly via find_multipaths strict');
+is(
+    scalar(() = $conf =~ /^defaults \{/mg),
+    1,
+    'baseline has exactly one defaults block (a second would be a duplicate-keyword error)',
+);
+
+my $wwids = PVE::Multipath::Config::parse_wwids("# Multipath wwids\n/3600abc/\n/3600def/\n");
+is_deeply($wwids, ['3600abc', '3600def'], 'parse_wwids extracts the wwids');
+like(
+    PVE::Multipath::Config::format_wwids(['3600def', '3600abc']),
+    qr{/3600abc/\n/3600def/},
+    'format_wwids sorts and slash-wraps',
+);
+
+# --- cluster config (pmxcfs source of truth): SectionConfig parse/write ---
+my $raw =
+    "defaults: defaults\n\tfind-multipaths strict\n\tno-path-retry queue\n\n"
+    . "wwid: 3600def\n\talias san-b-lun0\n\n"
+    . "wwid: 3600abc\n\talias san-a-lun0\n\tno-path-retry 18\n";
+my $cc = PVE::Multipath::Config->parse_config('multipath.cfg', $raw);
+is_deeply(
+    PVE::Multipath::Config::wwid_list($cc),
+    ['3600abc', '3600def'],
+    'wwid sections become the allow-list (sorted)',
+);
+is_deeply(
+    PVE::Multipath::Config::aliases($cc),
+    { '3600abc' => 'san-a-lun0', '3600def' => 'san-b-lun0' },
+    'aliases read from the wwid sections',
+);
+is(
+    PVE::Multipath::Config::effective_defaults($cc)->{'no-path-retry'},
+    'queue',
+    'defaults section knob is read',
+);
+is(
+    PVE::Multipath::Config::effective_defaults($cc)->{'user-friendly-names'},
+    'no',
+    'an unset defaults knob falls back to the managed default',
+);
+
+my $written = PVE::Multipath::Config->write_config('multipath.cfg', $cc);
+my $cc2 = PVE::Multipath::Config->parse_config('multipath.cfg', $written);
+is_deeply(
+    PVE::Multipath::Config::wwid_list($cc2),
+    ['3600abc', '3600def'],
+    'wwids survive the SectionConfig round-trip',
+);
+is_deeply(
+    PVE::Multipath::Config::aliases($cc2),
+    PVE::Multipath::Config::aliases($cc),
+    'aliases survive the round-trip',
+);
+is($cc2->{ids}->{'3600abc'}->{'no-path-retry'}, 18, 'a per-WWID knob survives the round-trip');
+
+is_deeply(
+    PVE::Multipath::Config::wwid_list(PVE::Multipath::Config->parse_config('multipath.cfg', '')),
+    [],
+    'an empty cluster config has no WWIDs',
+);
+
+# --- multipaths{} block (alias plus per-WWID knobs) ---
+my $block = PVE::Multipath::Config::build_multipaths_block({
+    '3600def' => { alias => 'san-b-lun0' },
+    '3600abc' => { alias => 'san-a-lun0', 'no-path-retry' => 18 },
+    '3600nul' => {},
+});
+like($block, qr/^multipaths \{/m, 'block opens with multipaths {');
+is(
+    scalar(() = $block =~ /^\tmultipath \{/mg),
+    2,
+    'one multipath{} per WWID that has an alias or a knob (the empty WWID is skipped)',
+);
+like(
+    $block,
+    qr/wwid 3600abc.*?alias san-a-lun0.*?no_path_retry 18/s,
+    'block carries the alias and the per-WWID knob',
+);
+my $abc_pos = index($block, 'wwid 3600abc');
+my $def_pos = index($block, 'wwid 3600def');
+ok($abc_pos < $def_pos, 'block emits entries in WWID-sorted order');
+is(PVE::Multipath::Config::build_multipaths_block({}), '', 'no WWIDs render to the empty string');
+
+# --- override guard ---
+eval { PVE::Multipath::Config::check_overrides("devices {\n\tdevice {\n\t\tvendor X\n\t}\n}\n") };
+is($@, '', 'a well-formed devices{} block passes the guard');
+eval { PVE::Multipath::Config::check_overrides("multipaths {\n}\n") };
+like($@, qr/managed via aliases/, 'a multipaths{} block is rejected, it is generated');
+eval { PVE::Multipath::Config::check_overrides("devices {\n") };
+like($@, qr/unbalanced braces/, 'unbalanced braces are rejected');
+eval { PVE::Multipath::Config::check_overrides("frobnicate {\n}\n") };
+like($@, qr/unknown top-level section/, 'an unknown top-level section is rejected');
+is(
+    PVE::Multipath::Config::write_overrides('x', "text \n\n"),
+    "text\n",
+    'the overrides writer trims trailing whitespace',
+);
+
+done_testing();
-- 
2.47.3