From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH manager 0/1] ceph: add opt-in locality-aware replica reads (crush_location_hook)
Date: Wed, 25 Mar 2026 11:51:03 +0800
Message-ID: <20260325035104.2264118-1-k.chai@proxmox.com>

This patch was prompted by a forum thread [1] in which a user reported
persistent high IO wait on PostgreSQL VMs running on a three-AZ Ceph
cluster. The discussion surfaced a general optimization opportunity:
librbd, by default, always reads from the primary OSD regardless of
its location. In a multi-AZ deployment, that can mean every read pays
a cross-AZ round-trip even when a same-AZ replica is available.

rbd_read_from_replica_policy = localize addresses this by directing
librbd to prefer the nearest replica, but it requires the client to
declare its own position in the CRUSH hierarchy. This patch ships a
hook script that supplies that position by querying the live CRUSH map
(ceph osd crush find), and wires it up as an opt-in in pveceph init.

The benefit scales with topology: in a multi-AZ cluster it keeps reads
within the same AZ; in a hyperconverged setup, reads to a co-located
OSD never leave the host at all. The feature is opt-in because it can
degrade performance when replicas are equidistant or when the hook
falls back to an incorrect CRUSH root — see the commit message for
details.
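
To illustrate what "declaring a position in the CRUSH hierarchy" amounts
to: Ceph runs the program named by crush_location_hook and reads a single
line of key=value pairs from its standard output. The sketch below is not
the script shipped in this patch. The function name, the bucket types and
the node names are all made up for illustration, and it leaves out the
CRUSH map query entirely; it only shows the shape of the line a hook is
expected to print.

```python
# Hypothetical sketch: render a CRUSH location as the single line that a
# crush_location_hook prints. All names and values here are illustrative
# and are not taken from bin/ceph-crush-location.

def format_crush_location(location):
    """Render {bucket_type: bucket_name} as 'type=name type=name ...'."""
    parts = []
    for bucket_type, bucket_name in location.items():
        if not bucket_type or not bucket_name:
            raise ValueError("empty CRUSH bucket type or name")
        # Whitespace or '=' inside a token would make the line ambiguous.
        if any(c.isspace() or c == "=" for c in bucket_type + bucket_name):
            raise ValueError("whitespace or '=' in CRUSH bucket type or name")
        parts.append(f"{bucket_type}={bucket_name}")
    return " ".join(parts)


if __name__ == "__main__":
    # Made-up position for a client node in a three-AZ cluster.
    print(format_crush_location(
        {"host": "pve1", "datacenter": "az1", "root": "default"}
    ))
```

With the made-up values above this prints "host=pve1 datacenter=az1
root=default". A real hook would fill the mapping from the live CRUSH map
instead of hard-coding it.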

[1] https://forum.proxmox.com/threads/ceph-vm-with-high-io-wait.181751/

Kefu Chai (1):
  ceph: add opt-in locality-aware replica reads (crush_location_hook)

 PVE/API2/Ceph.pm                       | 17 ++++++++++
 bin/Makefile                           |  3 +-
 bin/ceph-crush-location                | 43 ++++++++++++++++++++++++++
 www/manager6/ceph/CephInstallWizard.js |  8 ++++-
 4 files changed, 69 insertions(+), 2 deletions(-)
 create mode 100644 bin/ceph-crush-location

--
2.47.3