From: Lukas Sichert <l.sichert@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH manager/storage 0/2] fix #7000: rbd: graceful handling of corrupt/inaccessible images
Date: Fri, 10 Apr 2026 16:56:10 +0200 [thread overview]
Message-ID: <45e91b2d-0ea2-4122-8d10-3544fb01b08f@proxmox.com> (raw)
In-Reply-To: <20260401125933.3643604-1-k.chai@proxmox.com>
I was able to recreate the bug by creating a container in the rbd pool
and then deleting the header using 'rados -p <rbd-pool> rm
rbd_header.<id>'. After applying the patch, the GUI correctly shows
'N/A (inaccessible)'.
The changes look good to me. Using size => -1 makes sense in this context.
Small nit:
The line
'warn "rbd ls --long had errors, checking for broken images$details\n";'
currently produces rather noisy output including timestamps and full
context, e.g.:
'rbd ls --long had errors, checking for broken images:
2026-04-10T15:12:39.025+0200 77994ffff6c0 -1
librbd::image::OpenRequest: failed to retrieve initial metadata: (2)
No such file or directory
rbd: error opening vm-104-disk-0: (2) No such file or directory'.
Since the affected image is reported later anyway, it would be cleaner
to reduce this to the core librbd error message. For example:
'my $details = join('', @errs) =~
/^(?:\S+\s+\S+\s+-?\d+\s+)?(librbd::.*?): \(/m ? ": $1" : "";'
to only print
'librbd::image::OpenRequest: failed to retrieve initial metadata'
But this exact regex implementation might be a bit fragile, as I don't
know if there is always ': (' after the specific error message.
But as this is a rather small change, feel free to use my Tested-by
and Reviewed-by in the next version, if no bigger changes are made.
Reviewed-by: Lukas Sichert <l.sichert@proxmox.com>
Tested-by: Lukas Sichert <l.sichert@proxmox.com>
On 2026-04-01 14:59, Kefu Chai wrote:
> When a Ceph RBD pool contains a corrupt or orphaned image, the PVE WebGUI
> shows a generic 500 error without identifying which image is broken.
>
> The root cause is in rbd_ls(): all stderr was discarded via
> errfunc => sub { }, so per-image error messages from librbd (which name
> the broken image) were lost. The rbd ls -l exit code is also unreliable:
> it reflects only the last image processed, so a per-image failure may or
> may not propagate to the caller depending on image order. This was
> confirmed by auditing the Ceph main and latest Squid source, and verified
> by testing against a cluster built from Ceph main HEAD.
>
> The fix captures stderr and treats any error signal (non-zero exit or
> stderr output) as a cue to run a fallback name-only 'rbd ls', which never
> opens images and succeeds for valid pools. Images present in the name list
> but absent from the detailed results are returned with size=-1. If the
> fallback also fails, the error is a fatal one (pool not found, auth
> failure) and is re-propagated as before.
>
> A few alternatives were considered for how to signal inaccessible images
> to the UI:
>
> a) Omit broken images entirely. Simple, but the storage content view would
> silently appear healthy with no indication that images are missing.
>
> b) Add a new status field (e.g. status => 'inaccessible'). Explicit and
> extensible, but requires an API schema change and all callers to handle
> the new field.
>
> c) Emit a non-fatal warning alongside the partial results. This would
> require changes to the REST framework's error model, which is not how
> other storage plugins report partial failures.
>
> d) Use size => 0 as a sentinel. No API change needed, but ambiguous since
> a newly created image can legitimately have size 0.
>
> e) Use size => -1 as a sentinel (this patch). No API schema change needed;
> the field is already type => 'integer' with no minimum constraint, and
> the value flows through the stack unchanged. The UI patch renders it as
> 'N/A (inaccessible)'. The trade-off is that -1 is an implicit convention
> rather than a proper status field, which could be formalised later.
>
> Kefu Chai (2):
> fix #7000: rbd: handle corrupt or inaccessible images gracefully
> storage: content: show inaccessible RBD images in UI
>
> src/PVE/Storage/RBDPlugin.pm | 87 ++++++++++++++++++++++++++++++------
> www/manager6/storage/ContentView.js | 7 ++++++-
> 2 files changed, 79 insertions(+), 15 deletions(-)
>
prev parent reply other threads:[~2026-04-10 14:56 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-01 12:59 Kefu Chai
2026-04-01 12:59 ` [PATCH storage 1/2] fix #7000: rbd: handle corrupt or inaccessible images gracefully Kefu Chai
2026-04-01 12:59 ` [PATCH manager 2/2] storage: content: show inaccessible RBD images in UI Kefu Chai
2026-04-10 14:56 ` Lukas Sichert [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45e91b2d-0ea2-4122-8d10-3544fb01b08f@proxmox.com \
--to=l.sichert@proxmox.com \
--cc=k.chai@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.