* [PVE-User] Proxmox HCI Ceph: "osd_max_backfills" is overridden and set to 1000
@ 2023-05-30 10:00 Benjamin Hofer
From: Benjamin Hofer @ 2023-05-30 10:00 UTC (permalink / raw)
To: pve-user
Dear community,
We've set up a Proxmox hyper-converged Ceph cluster in production.
After syncing in one new OSD using the "pveceph osd create" command,
we got massive network performance issues and outages. We then found
that "osd_max_backfills" is set to 1000 (the Ceph default is 1) and
that this value, along with some others, has been overridden.
Does anyone know the root cause? I can't imagine that this is the
Proxmox default behaviour, and I'm very sure that we didn't change
anything (actually, I didn't even know about this setting before
researching and talking to colleagues with deeper Ceph knowledge).
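For reference, a quick way to spot such overrides (this is just a simple
filter on the config output, nothing Proxmox- or Ceph-specific) is
something like:

  # list only the settings that osd.1 reports with source "override"
  ceph config show osd.1 | grep -w override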
System:
PVE version output: pve-manager/7.3-6/723bb6ec (running kernel: 5.15.102-1-pve)
ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)
# ceph config get osd.1
WHO    MASK  LEVEL  OPTION                            VALUE         RO
osd.1         basic  osd_mclock_max_capacity_iops_ssd  17080.220753

# ceph config show osd.1
NAME                                              VALUE                             SOURCE    OVERRIDES  IGNORES
auth_client_required                              cephx                             file
auth_cluster_required                             cephx                             file
auth_service_required                             cephx                             file
cluster_network                                   10.0.18.0/24                      file
daemonize                                         false                             override
keyring                                           $osd_data/keyring                 default
leveldb_log                                                                         default
mon_allow_pool_delete                             true                              file
mon_host                                          10.0.18.30 10.0.18.10 10.0.18.20  file
ms_bind_ipv4                                      true                              file
ms_bind_ipv6                                      false                             file
no_config_file                                    false                             override
osd_delete_sleep                                  0.000000                          override
osd_delete_sleep_hdd                              0.000000                          override
osd_delete_sleep_hybrid                           0.000000                          override
osd_delete_sleep_ssd                              0.000000                          override
osd_max_backfills                                 1000                              override
osd_mclock_max_capacity_iops_ssd                  17080.220753                      mon
osd_mclock_scheduler_background_best_effort_lim   999999                            default
osd_mclock_scheduler_background_best_effort_res   534                               default
osd_mclock_scheduler_background_best_effort_wgt   2                                 default
osd_mclock_scheduler_background_recovery_lim      2135                              default
osd_mclock_scheduler_background_recovery_res      534                               default
osd_mclock_scheduler_background_recovery_wgt      1                                 default
osd_mclock_scheduler_client_lim                   999999                            default
osd_mclock_scheduler_client_res                   1068                              default
osd_mclock_scheduler_client_wgt                   2                                 default
osd_pool_default_min_size                         2                                 file
osd_pool_default_size                             3                                 file
osd_recovery_max_active                           1000                              override
osd_recovery_max_active_hdd                       1000                              override
osd_recovery_max_active_ssd                       1000                              override
osd_recovery_sleep                                0.000000                          override
osd_recovery_sleep_hdd                            0.000000                          override
osd_recovery_sleep_hybrid                         0.000000                          override
osd_recovery_sleep_ssd                            0.000000                          override
osd_scrub_sleep                                   0.000000                          override
osd_snap_trim_sleep                               0.000000                          override
osd_snap_trim_sleep_hdd                           0.000000                          override
osd_snap_trim_sleep_hybrid                        0.000000                          override
osd_snap_trim_sleep_ssd                           0.000000                          override
public_network                                    10.0.18.0/24                      file
rbd_default_features                              61                                default
rbd_qos_exclude_ops                               0                                 default
setgroup                                          ceph                              cmdline
setuser                                           ceph                              cmdline
Thanks a lot in advance.
Best
Benjamin
* Re: [PVE-User] Proxmox HCI Ceph: "osd_max_backfills" is overridden and set to 1000
@ 2023-05-30 12:14 Stefan Hanreich
From: Stefan Hanreich @ 2023-05-30 12:14 UTC (permalink / raw)
To: Proxmox VE user list, Benjamin Hofer
Hi Benjamin
This behavior was introduced in Ceph with the new mClock scheduler [1].
When the mClock scheduler is used, several options get overridden;
among them is osd_max_backfills, which is set to 1000.
This is very likely what is causing the issues in your cluster when
rebalancing. With the mClock scheduler the parameters for tuning
rebalancing have changed; you can find a description of the new
parameters and how to use them in our wiki [2].
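For example (just a rough sketch, the wiki page has the details and the
other knobs): with mClock the usual way to tame rebalancing is to switch
the profile for all OSDs and then check what an OSD actually applies:

  # prefer client I/O over recovery/backfill traffic
  ceph config set osd osd_mclock_profile high_client_ops

  # verify the profile the OSD is effectively using
  ceph config show osd.1 osd_mclock_profile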
This should be fixed in the newer Ceph version 17.2.6 [3] [4], which is
already available via our repositories (no-subscription as well as
enterprise); it overrides osd_max_backfills to a more reasonable value.
Nevertheless, you should still take a look at the new mClock tuning
options.
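After the upgrade you can double-check the effective values per OSD,
e.g.:

  # all daemons should report 17.2.6
  ceph versions

  # effective backfill limit on this OSD
  ceph config show osd.1 osd_max_backfills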
Kind Regards
Stefan
[1] https://github.com/ceph/ceph/pull/38920
[2] https://pve.proxmox.com/wiki/Ceph_mclock_tuning
[3] https://github.com/ceph/ceph/pull/48226/files
[4] https://github.com/ceph/ceph/commit/89e48395f8b1329066a1d7e05a4e9e083c88c1a6
On 5/30/23 12:00, Benjamin Hofer wrote:
> Dear community,
>
> We've set up a Proxmox hyper-converged Ceph cluster in production.
> After syncing in one new OSD using the "pveceph osd create" command,
> we got massive network performance issues and outages. We then found
> that "osd_max_backfills" is set to 1000 (the Ceph default is 1) and
> that this value, along with some others, has been overridden.
>
> Does anyone know the root cause? I can't imagine that this is the
> Proxmox default behaviour, and I'm very sure that we didn't change
> anything (actually, I didn't even know about this setting before
> researching and talking to colleagues with deeper Ceph knowledge).
>
> [...]
>
> Thanks a lot in advance.
>
> Best
> Benjamin
>