public inbox for pve-devel@lists.proxmox.com
From: "Kefu Chai" <k.chai@proxmox.com>
To: "Kefu Chai" <k.chai@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH manager 0/1] ceph: add opt-in locality-aware replica reads (crush_location_hook)
Date: Thu, 26 Mar 2026 11:44:40 +0800
Message-ID: <DHCEKP54BYYB.3OV8EU87R0I22@proxmox.com>
In-Reply-To: <20260325035104.2264118-1-k.chai@proxmox.com>

Hi,

Putting a hold on this series for now.

Friedrich kindly pointed out that Tentacle v20.2.0 ships with a regression [1]
that affects rbd_read_from_replica_policy=localize. The issue is that commit
4b01c004b5d [2] ("PrimaryLogPG: don't accept ops with mixed balance_reads and
rwordered flags") causes the OSD to reject write ops that carry the LOCALIZE_READS
flag, returning -EINVAL. Since librbd sets this flag connection-wide when the
localize policy is active, this can lead to silent write failures.
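To make the failure mode concrete, here is a toy Python model of the interaction as described above. It is not Ceph code: the flag names echo Ceph's CEPH_OSD_FLAG_* naming, but the bit values are arbitrary placeholders, and the actual check added by commit 4b01c004b5d (keyed on the flag combination named in its title) is not reproduced here.

```python
import errno

# Placeholder bits for illustration; these are NOT Ceph's real flag values.
FLAG_WRITE = 1 << 0           # op may write
FLAG_LOCALIZE_READS = 1 << 1  # "read from a nearby replica" hint


def client_op_flags(may_write: bool, localize_policy: bool) -> int:
    """Flags a librbd-like client attaches to a single op.

    With the localize policy active, the hint is applied to every op,
    writes included, which is what exposes the regression.
    """
    flags = FLAG_WRITE if may_write else 0
    if localize_policy:
        flags |= FLAG_LOCALIZE_READS
    return flags


def osd_reply(flags: int, regressed: bool) -> int:
    """Return 0 on success or a negative errno, like an OSD op reply."""
    if regressed and (flags & FLAG_WRITE) and (flags & FLAG_LOCALIZE_READS):
        return -errno.EINVAL  # v20.2.0 behaviour as described in this thread
    return 0


if __name__ == "__main__":
    for may_write in (False, True):
        flags = client_op_flags(may_write, localize_policy=True)
        print("write" if may_write else "read",
              "regressed:", osd_reply(flags, regressed=True),
              "fixed:", osd_reply(flags, regressed=False))
```

In this model only writes are refused, and only on the regressed OSD; reads and the reverted behaviour succeed.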

The fix (a revert, PR #66611 [3]) has been merged to the tentacle branch and
should ship with v20.2.1, which is currently in QE validation [4]. Squid is not
affected; the problematic commit was only cherry-picked into tentacle.
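Because the OSD is the side that rejects the op, what matters is the version the OSDs run, not only the client's. A minimal Python sketch of a pre-flight check, assuming the JSON shape printed by `ceph versions` (per-daemon-type maps from a "ceph version X.Y.Z (<sha>) <codename> (stable)" string to a daemon count). It treats only v20.2.0 as known-broken, per this message; that set is an assumption, not an exhaustive list of affected builds.

```python
import re

# Matches e.g. "ceph version 20.2.0 (<sha>) tentacle (stable)".
_VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")

# Known to carry the regression per this thread; the revert is expected in
# 20.2.1, and Squid (19.2.x) never received the commit.
KNOWN_BROKEN = {(20, 2, 0)}


def parse_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from a Ceph version banner."""
    match = _VERSION_RE.search(banner)
    if match is None:
        raise ValueError(f"unrecognized version string: {banner!r}")
    return tuple(int(part) for part in match.groups())


def osds_known_broken(versions: dict) -> bool:
    """True if any OSD in a `ceph versions`-style dict runs a known-broken release."""
    return any(parse_version(banner) in KNOWN_BROKEN
               for banner in versions.get("osd", {}))
```

Usage: pass it `json.loads()` of the `ceph versions` output; a mixed-version cluster counts as unsafe as soon as one OSD reports 20.2.0.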

I'll resend once v20.2.1 is released and picked up by our Tentacle packages.
The patch itself is opt-in, so there's no urgency.

Thanks,
Kefu

[1] https://tracker.ceph.com/issues/73997
[2] https://github.com/ceph/ceph/commit/4b01c004b5dc342cbdfb7cb26b47f6afe6245599
[3] https://github.com/ceph/ceph/pull/66611
[4] https://tracker.ceph.com/issues/74838

On Wed Mar 25, 2026 at 11:51 AM CST, Kefu Chai wrote:
> This patch was prompted by a forum thread [1] in which a user reported
> persistent high IO wait on PostgreSQL VMs running on a three-AZ Ceph
> cluster. The discussion surfaced a general optimization opportunity:
> librbd, by default, always reads from the primary OSD regardless of
> its location. In a multi-AZ deployment, that can mean every read pays
> a cross-AZ round-trip even when a same-AZ replica is available.
>
> rbd_read_from_replica_policy = localize addresses this by directing
> librbd to prefer the nearest replica, but it requires the client to
> declare its own position in the CRUSH hierarchy. This patch ships a
> hook script that supplies that position by querying the live CRUSH map
> (ceph osd crush find), and wires it up as an opt-in in pveceph init.
>
> The benefit scales with topology: in a multi-AZ cluster it keeps reads
> within the same AZ; in a hyperconverged setup, reads to a co-located
> OSD never leave the host at all. The feature is opt-in because it can
> degrade performance when replicas are equidistant or when the hook
> falls back to an incorrect CRUSH root — see the commit message for
> details.
>
> [1] https://forum.proxmox.com/threads/ceph-vm-with-high-io-wait.181751/
>
> Kefu Chai (1):
>   ceph: add opt-in locality-aware replica reads (crush_location_hook)
>
>  PVE/API2/Ceph.pm                       | 17 ++++++++++
>  bin/Makefile                           |  3 +-
>  bin/ceph-crush-location                | 43 ++++++++++++++++++++++++++
>  www/manager6/ceph/CephInstallWizard.js |  8 ++++-
>  4 files changed, 69 insertions(+), 2 deletions(-)
>  create mode 100644 bin/ceph-crush-location
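For readers unfamiliar with CRUSH location hooks: Ceph runs the program named by `crush_location_hook` with `--cluster`, `--id` and `--type` arguments and reads one line of space-separated key=value pairs from its stdout. The patch's own hook (bin/ceph-crush-location) queries `ceph osd crush find`; the Python sketch below is not that script. It instead derives the location from `ceph osd tree --format json`, assuming the usual `nodes` list with `id`, `name`, `type` and `children` fields, and falls back to `host=<short hostname> root=default`, the same shape as Ceph's default location. That fallback is the risky case the cover letter mentions: if the host's OSDs sit under a different root, the declared location is wrong.

```python
import argparse
import json
import socket
import subprocess


def location_for_host(tree: dict, hostname: str) -> str:
    """Build a CRUSH location line for `hostname` from `ceph osd tree` JSON."""
    nodes = {node["id"]: node for node in tree.get("nodes", [])}
    parent = {}
    for node in nodes.values():
        for child in node.get("children", []):
            parent[child] = node["id"]

    host = next((node for node in nodes.values()
                 if node.get("type") == "host" and node.get("name") == hostname),
                None)
    if host is None:
        # Host not in the CRUSH map (e.g. a client-only node): guess the default.
        return f"host={hostname} root=default"

    pairs = []
    node_id = host["id"]
    while node_id is not None:  # walk host -> ... -> root
        node = nodes[node_id]
        pairs.append(f"{node['type']}={node['name']}")
        node_id = parent.get(node_id)
    return " ".join(pairs)


def main() -> None:
    ap = argparse.ArgumentParser()
    # Ceph passes these; only --cluster is used by this sketch.
    ap.add_argument("--cluster", default="ceph")
    ap.add_argument("--id")
    ap.add_argument("--type")
    args, _unknown = ap.parse_known_args()

    hostname = socket.gethostname().split(".")[0]  # like `hostname -s`
    try:
        out = subprocess.run(
            ["ceph", "--cluster", args.cluster, "osd", "tree", "--format", "json"],
            check=True, capture_output=True, text=True, timeout=10,
        ).stdout
        tree = json.loads(out)
    except (OSError, subprocess.SubprocessError, ValueError):
        tree = {}  # cannot query the cluster: fall back to the default guess
    print(location_for_host(tree, hostname))


if __name__ == "__main__":
    main()
```

Querying the live map at client start-up requires credentials allowed to read the OSD map; a host that appears under several CRUSH parents is not handled here.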
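To put the cover letter's reasoning in numbers, here is a toy Python model of distance-based replica selection. It is a simplification, not a transcription of librados' Objecter: distance is the number of CRUSH levels to climb before client and OSD share a bucket, over an assumed host/datacenter/root hierarchy, and a replica only displaces the primary when it is strictly closer. With one replica per AZ and primaries spread evenly over three AZs, two of every three reads cross an AZ boundary under the default policy; with localize, a client co-located with an OSD reads from its own host. Resolving ties in favour of the primary is purely a modelling assumption, and the degradation the cover letter mentions for equidistant replicas is not captured by this toy.

```python
from fractions import Fraction
from typing import Dict, List, Optional

# Assumed topology levels, from most to least specific.
LEVELS = ("host", "datacenter", "root")


def distance(a: Dict[str, str], b: Dict[str, str]) -> Optional[int]:
    """Levels to climb before two locations share a bucket (0 = same host)."""
    for depth, level in enumerate(LEVELS):
        if level in a and a.get(level) == b.get(level):
            return depth
    return None  # nothing in common, e.g. a different root


def choose_read_osd(acting: List[int], osd_loc: Dict[int, Dict[str, str]],
                    client_loc: Dict[str, str], localize: bool) -> int:
    """Pick the OSD that serves a read; acting[0] is the primary."""
    best = acting[0]
    if not localize:
        return best  # default policy: always the primary
    best_dist = distance(client_loc, osd_loc[best])
    for osd in acting[1:]:
        d = distance(client_loc, osd_loc[osd])
        # Only a strictly closer replica displaces the current choice.
        if d is not None and (best_dist is None or d < best_dist):
            best, best_dist = osd, d
    return best


def cross_az_fraction(pgs: List[List[int]], osd_loc: Dict[int, Dict[str, str]],
                      client_loc: Dict[str, str], localize: bool) -> Fraction:
    """Share of reads (one per PG) served from another datacenter."""
    crossing = sum(
        osd_loc[choose_read_osd(acting, osd_loc, client_loc, localize)]["datacenter"]
        != client_loc["datacenter"]
        for acting in pgs
    )
    return Fraction(crossing, len(pgs))
```

For example, with OSDs 0, 1 and 2 on hosts in az1, az2 and az3, acting sets [0, 1, 2], [1, 2, 0] and [2, 0, 1], and a client on OSD 0's host, the model gives 2/3 cross-AZ reads by default and none with localize.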




