all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH manager 0/1] ceph: add opt-in locality-aware replica reads (crush_location_hook)
Date: Wed, 25 Mar 2026 11:51:03 +0800	[thread overview]
Message-ID: <20260325035104.2264118-1-k.chai@proxmox.com> (raw)

This patch was prompted by a forum thread [1] in which a user reported
persistent high IO wait on PostgreSQL VMs running on a three-AZ Ceph
cluster. The discussion surfaced a general optimization opportunity:
librbd, by default, always reads from the primary OSD regardless of
its location. In a multi-AZ deployment, that can mean every read pays
a cross-AZ round-trip even when a same-AZ replica is available.

rbd_read_from_replica_policy = localize addresses this by directing
librbd to prefer the nearest replica, but it requires the client to
declare its own position in the CRUSH hierarchy. This patch ships a
hook script that supplies that position by querying the live CRUSH map
(ceph osd crush find), and wires it up as an opt-in in pveceph init.

The benefit scales with topology: in a multi-AZ cluster it keeps reads
within the same AZ; in a hyperconverged setup, reads to a co-located
OSD never leave the host at all. The feature is opt-in because it can
degrade performance when replicas are equidistant or when the hook
falls back to an incorrect CRUSH root — see the commit message for
details.

[1] https://forum.proxmox.com/threads/ceph-vm-with-high-io-wait.181751/
  

Kefu Chai (1):
  ceph: add opt-in locality-aware replica reads (crush_location_hook)

 PVE/API2/Ceph.pm                       | 17 ++++++++++
 bin/Makefile                           |  3 +-
 bin/ceph-crush-location                | 43 ++++++++++++++++++++++++++
 www/manager6/ceph/CephInstallWizard.js |  8 ++++-
 4 files changed, 69 insertions(+), 2 deletions(-)
 create mode 100644 bin/ceph-crush-location

-- 
2.47.3





             reply	other threads:[~2026-03-25  3:51 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-25  3:51 Kefu Chai [this message]
2026-03-25  3:51 ` [PATCH manager 1/1] " Kefu Chai
2026-03-26  3:44 ` [PATCH manager 0/1] " Kefu Chai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260325035104.2264118-1-k.chai@proxmox.com \
    --to=k.chai@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal