all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Cc: Lukas Sichert <l.sichert@proxmox.com>
Subject: [PATCH v3 storage 1/2] fix #7000: rbd: handle corrupt or inaccessible images gracefully
Date: Mon, 29 Jun 2026 19:48:26 +0800	[thread overview]
Message-ID: <20260629114827.2385004-2-k.chai@proxmox.com> (raw)
In-Reply-To: <20260629114827.2385004-1-k.chai@proxmox.com>

When an RBD pool contains a corrupt or orphaned image, 'rbd ls --long
--format json' emits a per-image error to stderr and omits the broken
image from its output. PVE discarded all stderr via 'errfunc => sub { }',
so a non-zero exit surfaced as a generic 500 with no indication of
which image was broken.

The exit code is unreliable: it reflects only the last image processed,
so a per-image failure may or may not propagate depending on visit order.
Stderr is the only reliable signal.

Capture stderr from 'rbd ls --long'. When the command fails or stderr
is non-empty, fall back to 'rbd ls --format json', which lists names
without opening images. Images in the name list but absent from the
detailed output get size=-1 so callers can identify them as
inaccessible. A warning names each broken image.

If the name-only listing also fails (pool not found, auth failure,
etc.), the error re-propagates unchanged. When no errors occur,
behaviour is unchanged.

While at it, use '--long' instead of '-l' for readability.

The warn message extracts the meaningful part of each librbd error from
stderr. Ceph log lines have the format:

  TIMESTAMP ADDR LEVEL librbd::Class::Method: message: (errno) strerror

The "librbd::" prefix is injected by the dout_prefix macro defined in
each librbd source file; cpp_strerror() always formats the errno as
"(N) strerror", so ": (" reliably marks the end of the human-readable
message. Duplicate messages are deduplicated in case many images fail
the same way.

Reviewed-by: Lukas Sichert <l.sichert@proxmox.com>
Tested-by: Lukas Sichert <l.sichert@proxmox.com>
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
---
 src/PVE/Storage/RBDPlugin.pm | 96 ++++++++++++++++++++++++++++++------
 1 file changed, 82 insertions(+), 14 deletions(-)

diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
index b537425..d74d6d2 100644
--- a/src/PVE/Storage/RBDPlugin.pm
+++ b/src/PVE/Storage/RBDPlugin.pm
@@ -222,36 +222,104 @@ sub rbd_ls {
     my ($scfg, $storeid) = @_;
 
     my $raw = '';
-    my $parser = sub { $raw .= shift };
+    my @errs;
 
-    my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '-l', '--format', 'json');
-    run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub { }, outfunc => $parser);
+    my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '--long', '--format', 'json');
+    eval {
+        run_rbd_command(
+            $cmd,
+            errmsg => "rbd error",
+            errfunc => sub { push(@errs, shift); },
+            outfunc => sub { $raw .= shift; },
+        );
+    };
+    my $ls_err = $@;
 
+    # rbd ls --long outputs a complete JSON array of successfully-opened images;
+    # images that fail to open are omitted from the output and logged to stderr,
+    # but the command still exits 0. Parse whatever we got.
     my $result;
     if ($raw eq '') {
         $result = [];
     } elsif ($raw =~ m/^(\[.*\])$/s) { # untaint
         $result = JSON::decode_json($1);
-    } else {
+    } elsif (!$ls_err) {
         die "got unexpected data from rbd ls: '$raw'\n";
     }
 
     my $list = {};
 
-    foreach my $el (@$result) {
-        next if defined($el->{snapshot});
+    if ($result) {
+        for my $el (@$result) {
+            next if defined($el->{snapshot});
+
+            my $image = $el->{image};
 
-        my $image = $el->{image};
+            my ($owner) = $image =~ m/^(?:vm|base)-([0-9]+)-/;
+            next if !defined($owner);
 
-        my ($owner) = $image =~ m/^(?:vm|base)-(\d+)-/;
-        next if !defined($owner);
+            $list->{$image} = {
+                name => $image,
+                size => $el->{size},
+                parent => $get_parent_image_name->($el->{parent}),
+                vmid => $owner,
+            };
+        }
+    }
 
-        $list->{$image} = {
-            name => $image,
-            size => $el->{size},
-            parent => $get_parent_image_name->($el->{parent}),
-            vmid => $owner,
+    # rbd ls --long exit code is unreliable: it reflects only the last image
+    # processed (last-wins), so stderr is the only reliable signal for
+    # per-image errors.
+    #
+    # When any errors were detected (non-zero exit or stderr), fall back to
+    # name-only listing which never opens images and always succeeds. If the
+    # name-only listing itself fails, re-propagate as a fatal error (pool not
+    # found, auth failure, etc.).
+    if ($ls_err || @errs) {
+        # e.g.: 2026-04-10T15:12:39.025+0200 77994ffff6c0 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
+        # deduplicate in case multiple images fail the same way.
+        my %seen;
+        my @msgs;
+        for my $line (@errs) {
+            if ($line =~ /(librbd::.*?): \(/) {
+                push(@msgs, $1) if !$seen{$1}++;
+            }
+        }
+        my $details = @msgs ? ": " . join("; ", @msgs) : "";
+        warn "rbd ls --long had errors, checking for broken images$details\n";
+
+        my $names_raw = '';
+        my $names_cmd = $rbd_cmd->($scfg, $storeid, 'ls', '--format', 'json');
+        eval {
+            run_rbd_command(
+                $names_cmd,
+                errmsg => "rbd error",
+                errfunc => sub { },
+                outfunc => sub { $names_raw .= shift; },
+            );
         };
+        die $@ if $@;
+
+        my $all_names = [];
+        if ($names_raw =~ m/^(\[.*\])$/s) { # untaint
+            $all_names = eval { JSON::decode_json($1); };
+            die "invalid JSON output from 'rbd ls': $@\n" if $@;
+        }
+
+        for my $image (@$all_names) {
+            next if exists($list->{$image});
+
+            my ($owner) = $image =~ m/^(?:vm|base)-([0-9]+)-/;
+            next if !defined($owner);
+
+            warn "rbd image '$image' is corrupt or inaccessible\n";
+            $list->{$image} = {
+                name => $image,
+                size => -1,
+                parent => undef,
+                vmid => $owner,
+            };
+        }
     }
 
     return $list;
-- 
2.47.3





  reply	other threads:[~2026-06-29 11:49 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-29 11:48 [PATCH v3 manager/storage 0/2] rbd: handle corrupt or inaccessible images gracefully Kefu Chai
2026-06-29 11:48 ` Kefu Chai [this message]
2026-06-29 11:48 ` [PATCH v3 manager 2/2] storage: content: show inaccessible RBD images in UI Kefu Chai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260629114827.2385004-2-k.chai@proxmox.com \
    --to=k.chai@proxmox.com \
    --cc=l.sichert@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal