public inbox for pve-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH storage 07/13] api: multipath: add cluster-wide health status endpoint
Date: Fri, 26 Jun 2026 14:07:37 +0200
Message-ID: <20260626121000.2095591-8-t.lamprecht@proxmox.com>
In-Reply-To: <20260626121000.2095591-1-t.lamprecht@proxmox.com>

A per-node view cannot tell whether a LUN is healthy across the whole
cluster. Add an endpoint that collects the per-node broadcasts and
combines them into a per-WWID by per-node matrix, rolled up to one
cluster-state per LUN.

The broadcasts are cross-checked against live membership, so a stale
value from an offline node reads as 'unknown' rather than as healthy.
The roll-up is taken over the nodes that are actively multipathing, so
a LUN that is optimal on three nodes but degraded on a fourth shows up
as degraded instead of hiding behind the healthy majority. An online
node that has a multipath storage enabled but broadcasts nothing is
surfaced as 'missing' rather than vanishing from the matrix. Consuming
storages are labeled from the cluster storage config.
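
As a rough illustration of the roll-up described above, the following
standalone sketch folds per-node states into one cluster state. The
ranks mirror the ones in this patch; roll_up() and its arguments are
hypothetical names used only for this example and are not part of the
patch.

```perl
use strict;
use warnings;

# Severity ranks as in the patch: a higher number is worse, and
# 'unknown' sits below 'optimal' so it never drives the result.
my %STATE_RANK = (unknown => -1, optimal => 0, degraded => 1, missing => 2, failed => 3);

# Hypothetical helper: fold { node => state } into one cluster state,
# looking only at the nodes flagged online in { node => bool }.
sub roll_up {
    my ($states, $online) = @_;

    my ($worst, $have_active) = ('optimal', 0);
    for my $node (sort keys %$states) {
        next if !$online->{$node}; # stale value from an offline node
        $have_active = 1;
        my $state = $states->{$node};
        # an unrecognized state is ranked like 'unknown' (assumption of this sketch)
        $worst = $state if ($STATE_RANK{$state} // -1) > $STATE_RANK{$worst};
    }

    return $have_active ? $worst : 'unknown';
}

my $online = { n1 => 1, n2 => 1, n3 => 1, n4 => 1 };
my $states = { n1 => 'optimal', n2 => 'optimal', n3 => 'optimal', n4 => 'degraded' };
print roll_up($states, $online), "\n"; # prints "degraded"
```

Taking the worst state rather than the most common one is what keeps a
single degraded node from being outvoted by three healthy ones.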

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
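
For reviewers, a minimal sketch of how a node without any broadcast is
classified, assuming the member info has the { node => { online => bool } }
shape that the endpoint code in this patch reads. classify_silent_node()
is a hypothetical helper written for this note only; the patch does the
equivalent inline.

```perl
use strict;
use warnings;

# Hypothetical helper: decide the cell for a node that broadcast nothing
# for an allow-listed LUN. Returns nothing if the node is not expected
# to carry multipath storage at all.
sub classify_silent_node {
    my ($node, $members, $expected) = @_;

    return if !$expected->{$node};

    # empty member info (standalone node): treat every node as live
    my $online = !%$members || ($members->{$node} && $members->{$node}->{online});

    return $online ? 'missing' : 'unknown';
}

my $members = {
    nodeA => { online => 1 },
    nodeS => { online => 1 },
    nodeOff => { online => 0 },
};
my $expected = { nodeA => 1, nodeS => 1, nodeOff => 1 };

for my $node (qw(nodeS nodeOff)) {
    printf "%s: %s\n", $node, classify_silent_node($node, $members, $expected);
}
# prints "nodeS: missing" and "nodeOff: unknown"
```
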
 src/PVE/API2/Multipath.pm       | 116 +++++++++++++++++++++++++++
 src/PVE/Multipath.pm            | 111 ++++++++++++++++++++++++++
 src/test/run_multipath_tests.pl | 135 ++++++++++++++++++++++++++++++++
 3 files changed, 362 insertions(+)

diff --git a/src/PVE/API2/Multipath.pm b/src/PVE/API2/Multipath.pm
index 6a165d5..5336d71 100644
--- a/src/PVE/API2/Multipath.pm
+++ b/src/PVE/API2/Multipath.pm
@@ -3,10 +3,14 @@ package PVE::API2::Multipath;
 use strict;
 use warnings;
 
+use JSON qw(decode_json);
+
+use PVE::Cluster;
 use PVE::Exception qw(raise_param_exc);
 use PVE::Storage;
 use PVE::Tools qw(extract_param);
 
+use PVE::Multipath;
 use PVE::Multipath::Config;
 use PVE::Multipath::ClusterConfig;
 
@@ -43,6 +47,33 @@ my sub multipath_consumers {
     return $consumers;
 }
 
+# The nodes where an allow-listed LUN is supposed to be assembled: those where a multipath storage
+# is enabled (its node restriction, or every cluster node when unrestricted). Read from the cluster
+# storage config so it is node-invariant.
+my sub multipath_expected_nodes {
+    my $expected = {};
+
+    my $cfg = eval { PVE::Storage::config() };
+    return $expected if !$cfg;
+
+    my $all_nodes;
+    my $ids = $cfg->{ids} // {};
+    for my $storeid (sort keys %$ids) {
+        my $scfg = $ids->{$storeid};
+        next if ($scfg->{type} // '') ne 'multipath';
+        next if $scfg->{disable};
+
+        if ($scfg->{nodes}) {
+            $expected->{$_} = 1 for keys $scfg->{nodes}->%*;
+        } else {
+            $all_nodes //= PVE::Cluster::get_nodelist();
+            $expected->{$_} = 1 for $all_nodes->@*;
+        }
+    }
+
+    return $expected;
+}
+
 # multipathd resolves an alias to a map name, so two WWIDs sharing one alias makes it drop a map
 # (the loser is order-dependent and only logged at level 1). Reject a collision up front.
 my sub assert_alias_free {
@@ -122,6 +153,91 @@ __PACKAGE__->register_method({
     },
 });
 
+__PACKAGE__->register_method({
+    name => 'status',
+    path => 'status',
+    method => 'GET',
+    protected => 1,
+    description => "Cluster-wide multipath health: a per-WWID by per-node matrix"
+        . " rolled up over the nodes that are actively multipathing.",
+    permissions => {
+        check => ['perm', '/', ['Sys.Audit']],
+    },
+    parameters => {
+        additionalProperties => 0,
+        properties => {},
+    },
+    returns => {
+        type => 'array',
+        items => {
+            type => 'object',
+            additionalProperties => 1,
+            properties => {
+                wwid => { type => 'string', description => 'The LUN WWID.' },
+                alias => {
+                    type => 'string',
+                    description => 'The configured alias, if any.',
+                    optional => 1,
+                },
+                'used-by' => {
+                    type => 'string',
+                    description => 'The storage consuming this LUN, if any.',
+                    optional => 1,
+                },
+                size => {
+                    type => 'integer',
+                    description => 'LUN size in bytes, as reported by a node.',
+                    optional => 1,
+                },
+                'cluster-state' => {
+                    type => 'string',
+                    description => "Worst map state across the actively multipathing nodes:"
+                        . " 'optimal', 'degraded' (some paths down on a node), 'missing' (an"
+                        . " active node has not assembled it), 'failed' (no active path), or"
+                        . " 'unknown' (no node is online to roll up over).",
+                    enum => ['optimal', 'degraded', 'missing', 'failed', 'unknown'],
+                },
+                nodes => {
+                    type => 'object',
+                    description => 'Per-node map state, keyed by node name.',
+                    additionalProperties => 1,
+                },
+            },
+        },
+    },
+    code => sub {
+        my $cfg = PVE::Multipath::ClusterConfig::read_config();
+
+        my $raw_kv = PVE::Cluster::get_node_kv('multipath');
+        my $node_kv = {};
+        for my $node (keys %$raw_kv) {
+            my $decoded = eval { decode_json($raw_kv->{$node}) };
+            $node_kv->{$node} = $decoded if $decoded;
+        }
+
+        my $expected = multipath_expected_nodes();
+
+        # resolve liveness for every node we might place in the matrix: those that broadcast and
+        # those a multipath storage expects (and that may be silent)
+        my $members = PVE::Cluster::get_members() // {};
+        my $online = {};
+        for my $node (keys %$node_kv, keys %$expected) {
+            # a standalone node has no cluster member info; treat every node as live
+            $online->{$node} =
+                (!%$members || ($members->{$node} && $members->{$node}->{online})) ? 1 : 0;
+        }
+
+        return PVE::Multipath::aggregate_cluster_status(
+            PVE::Multipath::Config::wwid_list($cfg),
+            PVE::Multipath::Config::aliases($cfg),
+            multipath_consumers(),
+            $node_kv,
+            $online,
+            $expected,
+        );
+    },
+});
+
 __PACKAGE__->register_method({
     name => 'set_overrides',
     path => '',
diff --git a/src/PVE/Multipath.pm b/src/PVE/Multipath.pm
index 5647189..2b93d57 100644
--- a/src/PVE/Multipath.pm
+++ b/src/PVE/Multipath.pm
@@ -333,4 +333,115 @@ sub broadcast_health {
     warn "multipath: health broadcast failed - $@" if $@;
 }
 
+# Severity ordering for rolling per-node states up into a cluster state; a higher number is worse.
+# 'unknown' is a stale or offline node and never drives the roll-up, so it sits below 'optimal'.
+my $STATE_RANK = {
+    unknown => -1,
+    optimal => 0,
+    degraded => 1,
+    missing => 2,
+    failed => 3,
+};
+
+# Pure: fold the per-node health summaries (already JSON-decoded) into a per-WWID cluster matrix.
+# Inputs:
+#   $allow_wwids  arrayref, the cluster WWID allow-list
+#   $aliases      { wwid => name }
+#   $used_by      { wwid => storage-id } of consuming LVM storages
+#   $node_kv      { node => summary } as broadcast by broadcast_health()
+#   $online       { node => bool }; a node absent here counts as offline
+#   $expected     { node => 1 } nodes where multipath storage is enabled, so an
+#                 allow-listed LUN is supposed to be present there
+#
+# The cluster-state is rolled up over the nodes that should carry each LUN. A node that reports a
+# summary but lacks the LUN, or an expected node that reports nothing at all (it lost every path and
+# cleared its broadcast), is 'missing'; without the $expected set such a node would silently drop
+# out of the view instead of going red. A node that carries a stale broadcast while offline, or an
+# expected node that is offline, shows as 'unknown' and does not drive the roll-up.
+sub aggregate_cluster_status {
+    my ($allow_wwids, $aliases, $used_by, $node_kv, $online, $expected) = @_;
+
+    $allow_wwids //= [];
+    $aliases //= {};
+    $used_by //= {};
+    $node_kv //= {};
+    $online //= {};
+    $expected //= {};
+
+    my %allow = map { $_ => 1 } $allow_wwids->@*;
+
+    # report the allow-list plus any WWID a node actually sees
+    my %wwids = %allow;
+    for my $node (keys %$node_kv) {
+        $wwids{$_} = 1 for keys $node_kv->{$node}->%*;
+    }
+
+    my $res = [];
+    for my $wwid (sort keys %wwids) {
+        my $nodes = {};
+        my $worst = 'optimal';
+        my $have_active = 0;
+        my $size;
+
+        my $rank = sub {
+            my ($state) = @_;
+            $worst = $state if $STATE_RANK->{$state} > $STATE_RANK->{$worst};
+        };
+
+        for my $node (sort keys %$node_kv) {
+            my $entry = $node_kv->{$node}->{$wwid};
+
+            if (!$online->{$node}) {
+                $nodes->{$node} = { state => 'unknown' } if $entry;
+                next;
+            }
+
+            $have_active = 1;
+            if ($entry) {
+                $nodes->{$node} = {
+                    state => $entry->{state},
+                    'paths-active' => $entry->{'paths-active'},
+                    'paths-total' => $entry->{'paths-total'},
+                    defined($entry->{transport}) ? (transport => $entry->{transport}) : (),
+                };
+                $size //= $entry->{size};
+                $rank->($entry->{state});
+            } else {
+                # node is actively multipathing but has not assembled this LUN
+                $nodes->{$node} = { state => 'missing' };
+                $rank->('missing');
+            }
+        }
+
+        # A LUN on the allow-list should assemble on every node where a multipath storage is
+        # enabled. An expected node with no broadcast at all is missing the map (online) or
+        # unreachable (offline); fold it in so a node that lost all its paths surfaces instead of
+        # vanishing.
+        if ($allow{$wwid}) {
+            for my $node (sort keys %$expected) {
+                next if exists $nodes->{$node};
+                if ($online->{$node}) {
+                    $have_active = 1;
+                    $nodes->{$node} = { state => 'missing' };
+                    $rank->('missing');
+                } else {
+                    $nodes->{$node} = { state => 'unknown' };
+                }
+            }
+        }
+
+        push $res->@*,
+            {
+                wwid => $wwid,
+                defined($aliases->{$wwid}) ? (alias => $aliases->{$wwid}) : (),
+                defined($used_by->{$wwid}) ? ('used-by' => $used_by->{$wwid}) : (),
+                defined($size) ? (size => $size) : (),
+                'cluster-state' => $have_active ? $worst : 'unknown',
+                nodes => $nodes,
+            };
+    }
+
+    return $res;
+}
+
 1;
diff --git a/src/test/run_multipath_tests.pl b/src/test/run_multipath_tests.pl
index affec23..9e7e1db 100755
--- a/src/test/run_multipath_tests.pl
+++ b/src/test/run_multipath_tests.pl
@@ -285,4 +285,139 @@ my $many = [
 my $big = JSON::encode_json(PVE::Multipath::summarize_maps_for_broadcast($many));
 ok(length($big) < 32 * 1024, "100-map summary (" . length($big) . " B) fits the KV size limit");
 
+# --- cluster status aggregation ---
+my $node_kv = {
+    nodeA => {
+        wA =>
+            { state => 'optimal', 'paths-active' => 2, 'paths-total' => 2, transport => 'iscsi' },
+        wB => {
+            state => 'optimal',
+            'paths-active' => 2,
+            'paths-total' => 2,
+            transport => 'iscsi',
+            size => 42,
+        },
+    },
+    nodeB => {
+        wA =>
+            { state => 'degraded', 'paths-active' => 1, 'paths-total' => 2, transport => 'iscsi' },
+        # nodeB is active but does not see wB
+    },
+    nodeC => {
+        # stale broadcast from an offline node
+        wA => { state => 'optimal', 'paths-active' => 2, 'paths-total' => 2 },
+    },
+};
+my $agg = PVE::Multipath::aggregate_cluster_status(
+    ['wA', 'wB', 'wZ'], # allow-list incl. an unseen WWID
+    { wA => 'lun-a' }, # alias
+    { wB => 'mptank' }, # used-by
+    $node_kv,
+    { nodeA => 1, nodeB => 1, nodeC => 0 }, # nodeC offline
+);
+my %by_wwid = map { $_->{wwid} => $_ } $agg->@*;
+
+is_deeply([sort keys %by_wwid], ['wA', 'wB', 'wZ'], 'matrix covers allow-list and seen WWIDs');
+
+is($by_wwid{wA}->{alias}, 'lun-a', 'alias surfaced on the WWID row');
+is($by_wwid{wA}->{'cluster-state'}, 'degraded', 'degraded on one active node rolls up to degraded');
+is($by_wwid{wA}->{nodes}->{nodeA}->{state}, 'optimal', 'per-node optimal cell kept');
+is($by_wwid{wA}->{nodes}->{nodeB}->{state}, 'degraded', 'per-node degraded cell kept');
+is($by_wwid{wA}->{nodes}->{nodeC}->{state}, 'unknown',
+    'offline node with stale data shows unknown');
+
+is($by_wwid{wB}->{'used-by'}, 'mptank', 'consuming storage surfaced as used-by');
+is($by_wwid{wB}->{size}, 42, 'LUN size surfaced from a reporting node');
+is(
+    $by_wwid{wB}->{'cluster-state'},
+    'missing',
+    'active node not assembling the LUN rolls up to missing',
+);
+is(
+    $by_wwid{wB}->{nodes}->{nodeB}->{state},
+    'missing',
+    'missing marked on the active node lacking it',
+);
+
+is(
+    $by_wwid{wZ}->{'cluster-state'},
+    'missing',
+    'allow-listed WWID no active node assembled is missing everywhere',
+);
+is($by_wwid{wZ}->{nodes}->{nodeA}->{state}, 'missing', 'active node missing the allow-listed WWID');
+ok(!exists $by_wwid{wZ}->{nodes}->{nodeC}, 'offline node contributes no cell for an unseen WWID');
+
+# a WWID only an offline node ever reported, with no online active node, is unknown
+my $agg_off = PVE::Multipath::aggregate_cluster_status(
+    ['wA'],
+    {},
+    {},
+    { dead => { wA => { state => 'optimal', 'paths-active' => 2, 'paths-total' => 2 } } },
+    { dead => 0 },
+);
+is(
+    $agg_off->[0]->{'cluster-state'},
+    'unknown',
+    'no online active node leaves the cluster-state unknown',
+);
+is($agg_off->[0]->{nodes}->{dead}->{state}, 'unknown', 'stale offline node shown as unknown');
+
+# failure outranks degraded in the roll-up
+my $agg2 = PVE::Multipath::aggregate_cluster_status(
+    ['wA'],
+    {},
+    {},
+    {
+        n1 => { wA => { state => 'degraded', 'paths-active' => 1, 'paths-total' => 2 } },
+        n2 => { wA => { state => 'failed', 'paths-active' => 0, 'paths-total' => 2 } },
+    },
+    { n1 => 1, n2 => 1 },
+);
+is($agg2->[0]->{'cluster-state'}, 'failed', 'failed outranks degraded in the cluster roll-up');
+
+# --- expected-node set: a node that lost all paths (silent) must not vanish ---
+# nodeS is expected (a multipath storage is enabled there) and online, but
+# broadcasts nothing - e.g. every path to the SAN is down so it cleared its KV.
+my $exp_kv = {
+    nodeA => { wA => { state => 'optimal', 'paths-active' => 2, 'paths-total' => 2 } },
+};
+my $online = { nodeA => 1, nodeS => 1, nodeOff => 0 };
+my $expected = { nodeA => 1, nodeS => 1, nodeOff => 1 };
+my $eagg = PVE::Multipath::aggregate_cluster_status(
+    ['wA'], {}, {}, $exp_kv, $online, $expected,
+);
+my $row = $eagg->[0];
+is($row->{nodes}->{nodeA}->{state}, 'optimal', 'reporting node keeps its real state');
+is(
+    $row->{nodes}->{nodeS}->{state},
+    'missing',
+    'expected online but silent node shows missing instead of vanishing',
+);
+is($row->{nodes}->{nodeOff}->{state}, 'unknown', 'expected offline node shows unknown');
+is($row->{'cluster-state'}, 'missing', 'a silent expected node drags the cluster-state to missing');
+
+# without $expected the silent node would have been invisible (regression guard
+# for the old behavior, proving the new param is what surfaces it)
+my $noexp = PVE::Multipath::aggregate_cluster_status(['wA'], {}, {}, $exp_kv, $online);
+ok(
+    !exists $noexp->[0]->{nodes}->{nodeS},
+    'without the expected set the silent node is absent (the gap the param closes)',
+);
+is($noexp->[0]->{'cluster-state'}, 'optimal', 'and the cluster-state would falsely read optimal');
+
+# expected augmentation applies only to allow-listed WWIDs, not to a LUN that a
+# node merely happens to report off-list
+my $offlist = PVE::Multipath::aggregate_cluster_status(
+    [],
+    {},
+    {},
+    { nodeA => { wX => { state => 'optimal', 'paths-active' => 1, 'paths-total' => 1 } } },
+    { nodeA => 1, nodeS => 1 },
+    { nodeA => 1, nodeS => 1 },
+);
+ok(
+    !exists $offlist->[0]->{nodes}->{nodeS},
+    'non-allow-listed WWID does not synthesize missing cells for expected nodes',
+);
+
 done_testing();
-- 
2.47.3




