From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH manager 0/1] ceph: add opt-in locality-aware replica reads (crush_location_hook)
Date: Wed, 25 Mar 2026 11:51:03 +0800
Message-ID: <20260325035104.2264118-1-k.chai@proxmox.com>
This patch was prompted by a forum thread [1] in which a user reported persistent high IO wait on PostgreSQL VMs running on a three-AZ Ceph cluster. The discussion surfaced a general optimization opportunity: librbd, by default, always reads from the primary OSD regardless of its location. In a multi-AZ deployment, that can mean every read pays a cross-AZ round-trip even when a same-AZ replica is available.

rbd_read_from_replica_policy = localize addresses this by directing librbd to prefer the nearest replica, but it requires the client to declare its own position in the CRUSH hierarchy. This patch ships a hook script that supplies that position by querying the live CRUSH map (ceph osd crush find), and wires it up as an opt-in in pveceph init.

The benefit scales with topology: in a multi-AZ cluster it keeps reads within the same AZ; in a hyperconverged setup, reads to a co-located OSD never leave the host at all.

The feature is opt-in because it can degrade performance when replicas are equidistant or when the hook falls back to an incorrect CRUSH root; see the commit message for details.
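For context, a crush_location_hook is simply an executable that prints the client's position as space-separated key=value pairs on stdout, the same format accepted by the crush_location option. The sketch below is a hypothetical, simplified stand-in for the bin/ceph-crush-location shipped in this patch: it only emits the host bucket plus a hard-coded default root, whereas the actual script resolves the full bucket chain from the live CRUSH map.

```sh
#!/bin/sh
# Hypothetical minimal crush_location_hook sketch (NOT the script from this
# patch). Ceph invokes the hook and parses space-separated key=value pairs
# from its stdout, e.g. "host=node1 root=default".
crush_location() {
    host="$(hostname -s)"
    # Hard-coded fallback root for illustration only; the real hook derives
    # the ancestry from the CRUSH map, avoiding the wrong-root caveat above.
    echo "host=${host} root=default"
}

crush_location
```

Such a hook is then wired up on the client side, e.g. via a [client] section entry setting crush_location_hook to the script's path alongside rbd_read_from_replica_policy = localize, which is what the opt-in in pveceph init arranges.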
[1] https://forum.proxmox.com/threads/ceph-vm-with-high-io-wait.181751/

Kefu Chai (1):
  ceph: add opt-in locality-aware replica reads (crush_location_hook)

 PVE/API2/Ceph.pm                       | 17 ++++++++++
 bin/Makefile                           |  3 +-
 bin/ceph-crush-location                | 43 ++++++++++++++++++++++++++
 www/manager6/ceph/CephInstallWizard.js |  8 ++++-
 4 files changed, 69 insertions(+), 2 deletions(-)
 create mode 100644 bin/ceph-crush-location

-- 
2.47.3