From: "Kefu Chai" <k.chai@proxmox.com>
To: "Kefu Chai" <k.chai@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH manager 0/1] ceph: add opt-in locality-aware replica reads (crush_location_hook)
Date: Thu, 26 Mar 2026 11:44:40 +0800
Message-ID: <DHCEKP54BYYB.3OV8EU87R0I22@proxmox.com>
In-Reply-To: <20260325035104.2264118-1-k.chai@proxmox.com>

Hi,

Putting a hold on this series for now.

Friedrich kindly pointed out that Tentacle v20.2.0 ships with a regression [1]
that affects rbd_read_from_replica_policy=localize. The issue is that commit
4b01c004b5d [2] ("PrimaryLogPG: don't accept ops with mixed balance_reads and
rwordered flags") causes the OSD to reject write ops that carry the LOCALIZE_READS
flag, returning -EINVAL. Since librbd sets this flag connection-wide when the
localize policy is active, this can lead to silent write failures.
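
As a rough illustration of the interaction described above, here is a toy
model in Python. It is not the actual OSD code: the flag names and values are
made up for the example, and the check is reduced to "reject any write that
also asks for replica reads", which is how the regression shows up from the
client's side.

```python
import errno
from enum import IntFlag, auto


class OpFlag(IntFlag):
    # Illustrative names only; these are NOT Ceph's real wire constants.
    READ = auto()
    WRITE = auto()
    BALANCE_READS = auto()
    LOCALIZE_READS = auto()


REPLICA_READ_FLAGS = OpFlag.BALANCE_READS | OpFlag.LOCALIZE_READS


def check_op(flags: OpFlag) -> int:
    """Return 0 if the op is accepted, or a negative errno if rejected."""
    wants_replica_read = bool(flags & REPLICA_READ_FLAGS)
    is_write = bool(flags & OpFlag.WRITE)
    if wants_replica_read and is_write:
        # Simplified stand-in for the check added by the problematic commit.
        return -errno.EINVAL
    return 0


# With the localize policy active, the client tags every op it sends,
# writes included.
client_flags = OpFlag.LOCALIZE_READS
read_result = check_op(OpFlag.READ | client_flags)
write_result = check_op(OpFlag.WRITE | client_flags)
```

In this model reads are still accepted, while every write from a client with
the localize policy enabled is rejected with -EINVAL.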

The fix (a revert, PR #66611 [3]) has been merged to the tentacle branch and
should ship with v20.2.1, which is currently in QE validation [4]. Squid is not
affected — the problematic commit was only cherry-picked into tentacle.
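
For anyone who wants to guard against this in their own tooling in the
meantime, a minimal sketch of a version check follows. It assumes that
v20.2.0 is the only affected release (v20.2.1 is not out yet, so that part is
an expectation, not a confirmed fact), and the helper names are hypothetical,
not part of the patch.

```python
import re

# Assumption: only this release carries the regression.
AFFECTED_RELEASES = {(20, 2, 0)}


def parse_ceph_version(text: str) -> tuple[int, int, int]:
    """Extract the first x.y.z version number from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def localize_is_safe(version_text: str) -> bool:
    """True if the given release is not in the known-affected set."""
    return parse_ceph_version(version_text) not in AFFECTED_RELEASES
```

The input would typically be the version string reported by the running OSDs,
since the rejection happens on the OSD side.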

I'll resend once v20.2.1 is released and picked up by our Tentacle packages.
The patch itself is opt-in, so there's no urgency.

Thanks,
Kefu

[1] https://tracker.ceph.com/issues/73997
[2] https://github.com/ceph/ceph/commit/4b01c004b5dc342cbdfb7cb26b47f6afe6245599
[3] https://github.com/ceph/ceph/pull/66611
[4] https://tracker.ceph.com/issues/74838

On Wed Mar 25, 2026 at 11:51 AM CST, Kefu Chai wrote:
> This patch was prompted by a forum thread [1] in which a user reported
> persistent high IO wait on PostgreSQL VMs running on a three-AZ Ceph
> cluster. The discussion surfaced a general optimization opportunity:
> librbd, by default, always reads from the primary OSD regardless of
> its location. In a multi-AZ deployment, that can mean every read pays
> a cross-AZ round-trip even when a same-AZ replica is available.
>
> rbd_read_from_replica_policy = localize addresses this by directing
> librbd to prefer the nearest replica, but it requires the client to
> declare its own position in the CRUSH hierarchy. This patch ships a
> hook script that supplies that position by querying the live CRUSH map
> (ceph osd crush find), and wires it up as an opt-in in pveceph init.
>
> The benefit scales with topology: in a multi-AZ cluster it keeps reads
> within the same AZ; in a hyperconverged setup, reads to a co-located
> OSD never leave the host at all. The feature is opt-in because it can
> degrade performance when replicas are equidistant or when the hook
> falls back to an incorrect CRUSH root — see the commit message for
> details.
>
> [1] https://forum.proxmox.com/threads/ceph-vm-with-high-io-wait.181751/
>   
>
> Kefu Chai (1):
>   ceph: add opt-in locality-aware replica reads (crush_location_hook)
>
>  PVE/API2/Ceph.pm                       | 17 ++++++++++
>  bin/Makefile                           |  3 +-
>  bin/ceph-crush-location                | 43 ++++++++++++++++++++++++++
>  www/manager6/ceph/CephInstallWizard.js |  8 ++++-
>  4 files changed, 69 insertions(+), 2 deletions(-)
>  create mode 100644 bin/ceph-crush-location





