From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH storage 1/2] fix #7000: rbd: handle corrupt or inaccessible images gracefully
Date: Wed,  1 Apr 2026 20:59:32 +0800
Message-ID: <20260401125933.3643604-2-k.chai@proxmox.com>
In-Reply-To: <20260401125933.3643604-1-k.chai@proxmox.com>

When an RBD pool contains a corrupt or orphaned image, 'rbd ls --long
--format json' emits a per-image error to stderr and omits the broken
image from its output. PVE previously discarded all stderr from this
command via 'errfunc => sub { }', so on a non-zero exit the failure
surfaced only as a generic 500 error that did not identify the
problematic image.

The exit code is unreliable: it reflects only the last image processed
(last-wins), so whether a per-image failure shows up in it depends on
the order in which images are visited. The per-image error on stderr is
the only reliable signal.
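
To make the ordering problem concrete, here is a small Perl model of a
"last-wins" status. It is a hypothetical illustration of the behaviour
described above, not rbd's actual implementation; last_wins_status()
and the per-image return code -2 are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical model: a single status variable is overwritten once per
# image, so only the result of the final image survives to the exit code.
sub last_wins_status {
    my (@per_image_rc) = @_;
    my $rc = 0;
    $rc = $_ for @per_image_rc;    # each image overwrites the previous status
    return $rc;
}

# The same broken image (rc -2) is visible or not depending on visit order.
print last_wins_status(0, -2, 0), "\n";    # broken image in the middle: 0
print last_wins_status(0, 0, -2), "\n";    # broken image visited last: -2
```

Only the second call reports a failure even though both lists contain
the same broken image, which is why stderr, not the exit code, is used
as the error signal.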

Capture stderr from 'rbd ls --long'. When any error is detected
(non-zero exit status or per-image messages on stderr), fall back to
'rbd ls --format json', which lists only image names without opening
the images and is therefore unaffected by per-image failures. Images
present in the name list but absent from the detailed listing are
returned with size=-1 so the caller can identify them as inaccessible.
A warning naming each broken image is emitted to aid diagnosis.
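
The reconciliation step can be sketched in Perl as follows. This is a
simplified, standalone illustration assuming the detailed listing has
already been turned into a hash keyed by image name and the name-only
output has been decoded into an array; mark_inaccessible() is a
hypothetical helper, not part of RBDPlugin.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: add every PVE-owned image that appears in the
# name-only listing but not in the detailed listing, marked with size -1.
# Returns the names of the images that were added.
sub mark_inaccessible {
    my ($detailed, $all_names) = @_;
    my @broken;
    for my $image (@$all_names) {
        next if exists($detailed->{$image});    # opened fine by 'rbd ls --long'
        my ($owner) = $image =~ m/^(?:vm|base)-([0-9]+)-/;
        next if !defined($owner);               # not a PVE-owned image
        $detailed->{$image} = {
            name   => $image,
            size   => -1,
            parent => undef,
            vmid   => $owner,
        };
        push @broken, $image;
    }
    return \@broken;
}

my $detailed = {
    'vm-100-disk-0' =>
        { name => 'vm-100-disk-0', size => 4096, parent => undef, vmid => 100 },
};
my $broken = mark_inaccessible($detailed, ['vm-100-disk-0', 'vm-101-disk-0', 'foo']);
print "$_ is corrupt or inaccessible\n" for @$broken;
# prints: vm-101-disk-0 is corrupt or inaccessible
```

The sketch collects the broken names instead of warning inside the loop
as the patch does, only to keep it easy to check; either way, callers
see size => -1 for images that could not be opened.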

If the name-only listing fails as well, the error is fatal (pool not
found, authentication failure, etc.) and is propagated unchanged.

When no errors occur, behaviour is unchanged.

While at it, use '--long' instead of '-l' for readability and
consistency with the other long-form options used in the command.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
---
 src/PVE/Storage/RBDPlugin.pm | 87 ++++++++++++++++++++++++++++++------
 1 file changed, 73 insertions(+), 14 deletions(-)

diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
index 7d3e7ab..92d1f63 100644
--- a/src/PVE/Storage/RBDPlugin.pm
+++ b/src/PVE/Storage/RBDPlugin.pm
@@ -222,36 +222,95 @@ sub rbd_ls {
     my ($scfg, $storeid) = @_;
 
     my $raw = '';
-    my $parser = sub { $raw .= shift };
+    my @errs;
 
-    my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '-l', '--format', 'json');
-    run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub { }, outfunc => $parser);
+    my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '--long', '--format', 'json');
+    eval {
+        run_rbd_command(
+            $cmd,
+            errmsg => "rbd error",
+            errfunc => sub { push(@errs, shift); },
+            outfunc => sub { $raw .= shift; },
+        );
+    };
+    my $ls_err = $@;
 
+    # rbd ls --long outputs a JSON array of the images it could open; images
+    # that fail to open are omitted from the output and logged to stderr, and
+    # the exit code may still be 0. Parse whatever we got.
     my $result;
     if ($raw eq '') {
         $result = [];
     } elsif ($raw =~ m/^(\[.*\])$/s) { # untaint
         $result = JSON::decode_json($1);
-    } else {
+    } elsif (!$ls_err) {
         die "got unexpected data from rbd ls: '$raw'\n";
     }
 
     my $list = {};
 
-    foreach my $el (@$result) {
-        next if defined($el->{snapshot});
+    if ($result) {
+        for my $el (@$result) {
+            next if defined($el->{snapshot});
 
-        my $image = $el->{image};
+            my $image = $el->{image};
 
-        my ($owner) = $image =~ m/^(?:vm|base)-(\d+)-/;
-        next if !defined($owner);
+            my ($owner) = $image =~ m/^(?:vm|base)-([0-9]+)-/;
+            next if !defined($owner);
+
+            $list->{$image} = {
+                name => $image,
+                size => $el->{size},
+                parent => $get_parent_image_name->($el->{parent}),
+                vmid => $owner,
+            };
+        }
+    }
 
-        $list->{$image} = {
-            name => $image,
-            size => $el->{size},
-            parent => $get_parent_image_name->($el->{parent}),
-            vmid => $owner,
+    # rbd ls --long exit code is unreliable: it reflects only the last image
+    # processed (last-wins), so stderr is the only reliable signal for
+    # per-image errors.
+    #
+    # When any errors were detected (non-zero exit or stderr), fall back to a
+    # name-only listing, which never opens images and so is unaffected by
+    # per-image failures. If it fails anyway, propagate the error as fatal
+    # (pool not found, auth failure, etc.).
+    if ($ls_err || @errs) {
+        my $details = @errs ? ": @errs" : "";
+        warn "rbd ls --long had errors, checking for broken images$details\n";
+
+        my $names_raw = '';
+        my $names_cmd = $rbd_cmd->($scfg, $storeid, 'ls', '--format', 'json');
+        eval {
+            run_rbd_command(
+                $names_cmd,
+                errmsg => "rbd error",
+                errfunc => sub { },
+                outfunc => sub { $names_raw .= shift; },
+            );
         };
+        die $@ if $@;
+
+        my $all_names = [];
+        if ($names_raw =~ m/^(\[.*\])$/s) { # untaint
+            $all_names = eval { JSON::decode_json($1); };
+            die "invalid JSON output from 'rbd ls': $@\n" if $@;
+        }
+
+        for my $image ($all_names->@*) {
+            next if exists($list->{$image});
+
+            my ($owner) = $image =~ m/^(?:vm|base)-([0-9]+)-/;
+            next if !defined($owner);
+
+            warn "rbd image '$image' is corrupt or inaccessible\n";
+            $list->{$image} = {
+                name => $image,
+                size => -1,
+                parent => undef,
+                vmid => $owner,
+            };
+        }
     }
 
     return $list;
-- 
2.47.3





Thread overview: 3+ messages
2026-04-01 12:59 [PATCH manager/storage 0/2] fix #7000: rbd: graceful handling of corrupt/inaccessible images Kefu Chai
2026-04-01 12:59 ` Kefu Chai [this message]
2026-04-01 12:59 ` [PATCH manager 2/2] storage: content: show inaccessible RBD images in UI Kefu Chai
