From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kefu Chai
To: pve-devel@lists.proxmox.com
Subject: [PATCH manager/storage 0/2] fix #7000: rbd: graceful handling of corrupt/inaccessible images
Date: Wed, 1 Apr 2026 20:59:31 +0800
Message-ID: <20260401125933.3643604-1-k.chai@proxmox.com>
X-Mailer: git-send-email 2.47.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: Proxmox VE development discussion

When a Ceph RBD pool contains a corrupt or orphaned image, the PVE WebGUI
shows a generic 500 error without identifying which image is broken.

The root cause is in rbd_ls(): all stderr was discarded via
errfunc => sub { }, so the per-image error messages from librbd (which
name the broken image) were lost. The exit code of 'rbd ls -l' is also
unreliable: it reflects only the last image processed, so whether a
per-image failure propagates to the caller depends on image order. This
was confirmed by auditing the source of Ceph main and the latest Squid
release, and verified by testing against a cluster built from Ceph main
HEAD.

The fix captures stderr and treats any error signal (non-zero exit code
or stderr output) as a cue to run a fallback name-only 'rbd ls', which
never opens images and succeeds for valid pools. Images present in the
name list but absent from the detailed results are returned with
size=-1. If the fallback also fails, the error is a fatal one (pool not
found, auth failure) and is propagated as before.

A few alternatives were considered for how to signal inaccessible images
to the UI:

a) Omit broken images entirely. Simple, but the storage content view
   would silently appear healthy, with no indication that images are
   missing.

b) Add a new status field (e.g. status => 'inaccessible').
   Explicit and extensible, but requires an API schema change and
   requires all callers to handle the new field.

c) Emit a non-fatal warning alongside the partial results. This would
   require changes to the REST framework's error model, which is not how
   other storage plugins report partial failures.

d) Use size => 0 as a sentinel. No API change needed, but ambiguous,
   since a newly created image can legitimately have size 0.

e) Use size => -1 as a sentinel (this patch). No API schema change
   needed; the field is already type => 'integer' with no minimum
   constraint, and the value flows through the stack unchanged. The UI
   patch renders it as 'N/A (inaccessible)'. The trade-off is that -1 is
   an implicit convention rather than a proper status field, which could
   be formalised later.

Kefu Chai (2):
  fix #7000: rbd: handle corrupt or inaccessible images gracefully
  storage: content: show inaccessible RBD images in UI

 src/PVE/Storage/RBDPlugin.pm        | 87 ++++++++++++++++++++++++++++++------
 www/manager6/storage/ContentView.js |  7 ++++++-
 2 files changed, 79 insertions(+), 15 deletions(-)

-- 
2.47.3
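
For readers who want the fallback logic at a glance, the merge step
described in this cover letter can be sketched roughly as follows. This
is an illustrative Python sketch only, not the Perl code from the patch:
the function name merge_rbd_listing, its parameters, and the dict layout
are all hypothetical, and list_names stands in for whatever runs the
name-only 'rbd ls'.

```python
from typing import Callable, Dict, Iterable

# Sentinel used by this series for images that could not be opened.
INACCESSIBLE_SIZE = -1


def merge_rbd_listing(
    detailed: Dict[str, dict],
    exit_code: int,
    stderr: str,
    list_names: Callable[[], Iterable[str]],
) -> Dict[str, dict]:
    """Combine detailed results with a name-only fallback listing.

    detailed   -- images successfully parsed from the detailed listing
    exit_code  -- exit status of the detailed listing (hypothetical input)
    stderr     -- captured stderr of the detailed listing
    list_names -- hypothetical callable returning all image names; it is
                  expected to raise on fatal errors (pool missing, auth)
    """
    # Any error signal counts: the exit code alone is not reliable.
    had_error = exit_code != 0 or stderr.strip() != ""
    if not had_error:
        return dict(detailed)

    # A failure here is fatal and is deliberately left to propagate.
    names = list_names()

    merged = dict(detailed)
    for name in names:
        # Images known by name but missing from the detailed results
        # are reported with the sentinel size.
        merged.setdefault(name, {"name": name, "size": INACCESSIBLE_SIZE})
    return merged
```

With a detailed listing that returned only vm-100-disk-0 plus some
stderr output, and a name list containing vm-100-disk-0 and
vm-101-disk-0, the sketch returns both images, the second one with
size -1.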