public inbox for pve-devel@lists.proxmox.com
* [PATCH manager/storage 0/2] fix #7000: rbd: graceful handling of corrupt/inaccessible images
@ 2026-04-01 12:59 Kefu Chai
  2026-04-01 12:59 ` [PATCH storage 1/2] fix #7000: rbd: handle corrupt or inaccessible images gracefully Kefu Chai
  2026-04-01 12:59 ` [PATCH manager 2/2] storage: content: show inaccessible RBD images in UI Kefu Chai
From: Kefu Chai @ 2026-04-01 12:59 UTC
  To: pve-devel

When a Ceph RBD pool contains a corrupt or orphaned image, the PVE WebGUI
shows a generic 500 error without identifying which image is broken.

The root cause is in rbd_ls(): all stderr was discarded via
errfunc => sub { }, so the per-image error messages from librbd (which
name the broken image) were lost. The exit code of 'rbd ls -l' is also
unreliable: it reflects only the last image processed, so a per-image
failure may or may not propagate to the caller depending on image order.
This was confirmed by auditing the Ceph main and latest Squid sources, and
verified by testing against a cluster built from Ceph main HEAD.

The fix captures stderr and treats any error signal (non-zero exit or
stderr output) as a cue to run a fallback name-only 'rbd ls', which never
opens images and succeeds for any valid pool. Images present in the name
list but absent from the detailed results are returned with size=-1. If
the fallback also fails, the error is fatal (e.g. pool not found or auth
failure) and is propagated as before.
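
The fallback flow described above can be sketched roughly as follows. This
is an illustration in JavaScript only; the actual patch implements the
logic in Perl in RBDPlugin.pm. The names listImages, runDetailed and
runNamesOnly, and the shape of their return values, are assumptions made
for this sketch and do not appear in the patch.

```javascript
// Illustrative sketch only, not the patch's Perl code.
// runDetailed and runNamesOnly are hypothetical stand-ins for invoking
// 'rbd ls -l' and the name-only 'rbd ls'.
function listImages(runDetailed, runNamesOnly) {
    // Assumed result shape: { images: [{ name, size }], exitCode, stderr }
    const { images, exitCode, stderr } = runDetailed();

    // Any error signal (non-zero exit or stderr output) triggers the fallback.
    const errorSignal = exitCode !== 0 || stderr.trim() !== '';
    if (!errorSignal) {
        return images;
    }

    // Assumed to throw on fatal errors (pool not found, auth failure);
    // the exception is deliberately not caught, so it reaches the caller.
    const names = runNamesOnly();

    // Images known by name but missing from the detailed results are
    // reported with the size => -1 sentinel.
    const seen = new Set(images.map((img) => img.name));
    const missing = names
        .filter((name) => !seen.has(name))
        .map((name) => ({ name, size: -1 }));

    return [...images, ...missing];
}
```

Because the decision depends on "exit code or stderr" rather than on the
exit code alone, the outcome in this sketch does not depend on which image
happens to be processed last.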

A few alternatives were considered for how to signal inaccessible images
to the UI:

a) Omit broken images entirely. Simple, but the storage content view would
   silently appear healthy with no indication that images are missing.

b) Add a new status field (e.g. status => 'inaccessible'). Explicit and
   extensible, but requires an API schema change and all callers to handle
   the new field.

c) Emit a non-fatal warning alongside the partial results. This would
   require changes to the REST framework's error model, and no other
   storage plugin reports partial failures this way.

d) Use size => 0 as a sentinel. No API change needed, but ambiguous since
   a newly created image can legitimately have size 0.

e) Use size => -1 as a sentinel (this patch). No API schema change needed;
   the field is already type => 'integer' with no minimum constraint, and
   the value flows through the stack unchanged. The UI patch renders it as
   'N/A (inaccessible)'. The trade-off is that -1 is an implicit convention
   rather than a proper status field, which could be formalised later.
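
As a rough illustration of option (e) on the UI side, a size renderer
could special-case the sentinel as sketched below. The function names
renderImageSize and formatSize are hypothetical and are not taken from
ContentView.js; the byte formatting is a simplified stand-in for whatever
helper the UI actually uses.

```javascript
// Hypothetical helper: format a byte count using binary units.
function formatSize(bytes) {
    const units = ['B', 'KiB', 'MiB', 'GiB', 'TiB'];
    let value = bytes;
    let i = 0;
    while (value >= 1024 && i < units.length - 1) {
        value /= 1024;
        i++;
    }
    return `${value.toFixed(2)} ${units[i]}`;
}

// Hypothetical renderer: map the size => -1 sentinel to a readable label
// and format every other value as a normal size.
function renderImageSize(size) {
    if (size === -1) {
        return 'N/A (inaccessible)';
    }
    return formatSize(size);
}
```

A size of 0 still renders as an ordinary size here, which is the reason
for preferring -1 over 0 as the sentinel.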

Kefu Chai (2):
  fix #7000: rbd: handle corrupt or inaccessible images gracefully
  storage: content: show inaccessible RBD images in UI

 src/PVE/Storage/RBDPlugin.pm        | 87 ++++++++++++++++++++++++++++++------
 www/manager6/storage/ContentView.js |  7 ++++++-
 2 files changed, 79 insertions(+), 15 deletions(-)

-- 
2.47.3




