public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Lukas Sichert <l.sichert@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH manager/storage 0/2] fix #7000: rbd: graceful handling of corrupt/inaccessible images
Date: Fri, 10 Apr 2026 16:56:10 +0200	[thread overview]
Message-ID: <45e91b2d-0ea2-4122-8d10-3544fb01b08f@proxmox.com> (raw)
In-Reply-To: <20260401125933.3643604-1-k.chai@proxmox.com>

I was able to recreate the bug by creating a container in the rbd pool
and then deleting the header using 'rados -p <rbd-pool> rm
rbd_header.<id>'. After applying the patch, the GUI correctly shows
'N/A (inaccessible)'.

The changes look good to me. Using size => -1 makes sense in this context.

Small nit:
The line
'warn "rbd ls --long had errors, checking for broken images$details\n";'
currently produces rather noisy output including timestamps and full
context, e.g.:
'rbd ls --long had errors, checking for broken images:
2026-04-10T15:12:39.025+0200 77994ffff6c0 -1
librbd::image::OpenRequest: failed to retrieve initial metadata: (2)
No such file or directory
rbd: error opening vm-104-disk-0: (2) No such file or directory'.
Since the affected image is reported later anyway, it would be cleaner
to reduce this to the core librbd error message. For example:
'my $details = join('', @errs) =~
/^(?:\S+\s+\S+\s+-?\d+\s+)?(librbd::.*?): \(/m ? ": $1" : "";'
to only print
'librbd::image::OpenRequest: failed to retrieve initial metadata'
But this exact regex implementation might be a bit fragile, as I don't
know if there is always ': (' after the specific error message.

But as this is a rather small change, feel free to use my Tested-by
and Reviewed-by in the next version, if no bigger changes are made.

Reviewed-by: Lukas Sichert <l.sichert@proxmox.com>
Tested-by: Lukas Sichert <l.sichert@proxmox.com>

On 2026-04-01 14:59, Kefu Chai wrote:
> When a Ceph RBD pool contains a corrupt or orphaned image, the PVE WebGUI
> shows a generic 500 error without identifying which image is broken.
> 
> The root cause is in rbd_ls(): all stderr was discarded via
> errfunc => sub { }, so per-image error messages from librbd (which name
> the broken image) were lost. The rbd ls -l exit code is also unreliable:
> it reflects only the last image processed, so a per-image failure may or
> may not propagate to the caller depending on image order. This was
> confirmed by auditing the Ceph main and latest Squid source, and verified
> by testing against a cluster built from Ceph main HEAD.
> 
> The fix captures stderr and treats any error signal (non-zero exit or
> stderr output) as a cue to run a fallback name-only 'rbd ls', which never
> opens images and succeeds for valid pools. Images present in the name list
> but absent from the detailed results are returned with size=-1. If the
> fallback also fails, the error is a fatal one (pool not found, auth
> failure) and is re-propagated as before.
> 
> A few alternatives were considered for how to signal inaccessible images
> to the UI:
> 
> a) Omit broken images entirely. Simple, but the storage content view would
>    silently appear healthy with no indication that images are missing.
> 
> b) Add a new status field (e.g. status => 'inaccessible'). Explicit and
>    extensible, but requires an API schema change and all callers to handle
>    the new field.
> 
> c) Emit a non-fatal warning alongside the partial results. This would
>    require changes to the REST framework's error model, which is not how
>    other storage plugins report partial failures.
> 
> d) Use size => 0 as a sentinel. No API change needed, but ambiguous since
>    a newly created image can legitimately have size 0.
> 
> e) Use size => -1 as a sentinel (this patch). No API schema change needed;
>    the field is already type => 'integer' with no minimum constraint, and
>    the value flows through the stack unchanged. The UI patch renders it as
>    'N/A (inaccessible)'. The trade-off is that -1 is an implicit convention
>    rather than a proper status field, which could be formalised later.
> 
> Kefu Chai (2):
>   fix #7000: rbd: handle corrupt or inaccessible images gracefully
>   storage: content: show inaccessible RBD images in UI
> 
>  src/PVE/Storage/RBDPlugin.pm        | 87 ++++++++++++++++++++++++++++++------
>  www/manager6/storage/ContentView.js |  7 ++++++-
>  2 files changed, 79 insertions(+), 15 deletions(-)
> 





      parent reply	other threads:[~2026-04-10 14:56 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-01 12:59 Kefu Chai
2026-04-01 12:59 ` [PATCH storage 1/2] fix #7000: rbd: handle corrupt or inaccessible images gracefully Kefu Chai
2026-04-01 12:59 ` [PATCH manager 2/2] storage: content: show inaccessible RBD images in UI Kefu Chai
2026-04-10 14:56 ` Lukas Sichert [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45e91b2d-0ea2-4122-8d10-3544fb01b08f@proxmox.com \
    --to=l.sichert@proxmox.com \
    --cc=k.chai@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal