* [pve-devel] [PATCH manager] fix #4631: ceph: osd: create: add osds-per-device
@ 2023-04-18 12:26 Aaron Lauterer
2023-08-21 8:20 ` Fiona Ebner
0 siblings, 1 reply; 4+ messages in thread
From: Aaron Lauterer @ 2023-04-18 12:26 UTC (permalink / raw)
To: pve-devel
Allows automatically creating multiple OSDs per physical device. The
main use case is fast NVME drives that would be bottlenecked by a
single OSD service.
By using the 'ceph-volume lvm batch' command instead of 'ceph-volume lvm
create' when creating multiple OSDs per device, we don't have to handle
splitting the drive ourselves.
But this means that the parameters to specify a DB or WAL device won't
work as the 'batch' command doesn't use them. Dedicated DB and WAL
devices don't make much sense anyway if we place the OSDs on fast NVME
drives.
Some other changes to how the command is built were needed as well, as
the 'batch' command needs the path to the disk as a positional argument,
not as '--data /dev/sdX'.
We drop the '--cluster-fsid' parameter because the 'batch' command
doesn't accept it. The 'create' command will then fall back to reading it
from the ceph.conf file.
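For illustration, the resulting calls then look roughly like this (device
path and OSD count are made-up examples):

    # default 'create' mode, one OSD per disk
    ceph-volume lvm create --data /dev/nvme0n1
    # with 'osds-per-device' set to e.g. 4
    ceph-volume lvm batch /dev/nvme0n1 --osds-per-device 4 --yes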
Removal of OSDs works as expected without any code changes. As long as
there are other OSDs on a disk, the VG & PV won't be removed, even if
'cleanup' is enabled.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
There are a few other places where we can improve the OSD handling if
multiple OSDs are on one physical disk.
* Disk overview in the Node -> Disks panel: show all OSD IDs present on
the disk instead of just the last one that was parsed
* Create window: allow selecting disks that already have OSDs on them,
but still have free space available
These things will need additions and changes to the API. Especially for
the changes, we probably want to target the Proxmox VE 8 release.
PVE/API2/Ceph/OSD.pm | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
index ded35990..43dad56b 100644
--- a/PVE/API2/Ceph/OSD.pm
+++ b/PVE/API2/Ceph/OSD.pm
@@ -275,6 +275,12 @@ __PACKAGE__->register_method ({
type => 'string',
description => "Set the device class of the OSD in crush."
},
+ 'osds-per-device' => {
+ optional => 1,
+ type => 'number',
+ minimum => '1',
+ description => 'OSD services per physical device. Can improve fast NVME utilization.',
+ },
},
},
returns => { type => 'string' },
@@ -294,6 +300,15 @@ __PACKAGE__->register_method ({
# extract parameter info and fail if a device is set more than once
my $devs = {};
+ # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
+ # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
+ if ($param->{'osds-per-device'}) {
+ for my $type ( qw(db_dev wal_dev) ) {
+ die "Cannot use 'osds-per-device' parameter with '${type}'"
+ if $param->{$type};
+ }
+ }
+
my $ceph_conf = cfs_read_file('ceph.conf');
my $osd_network = $ceph_conf->{global}->{cluster_network};
@@ -364,8 +379,6 @@ __PACKAGE__->register_method ({
my $monstat = $rados->mon_command({ prefix => 'quorum_status' });
die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};
- my $fsid = $monstat->{monmap}->{fsid};
- $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
@@ -470,7 +483,10 @@ __PACKAGE__->register_method ({
$test_disk_requirements->($disklist);
my $dev_class = $param->{'crush-device-class'};
- my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
+ # create allows for detailed configuration of DB and WAL devices
+ # batch for easy creation of multiple OSDs (per device)
+ my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
+ my $cmd = ['ceph-volume', 'lvm', $create_mode ];
push @$cmd, '--crush-device-class', $dev_class if $dev_class;
my $devname = $devs->{dev}->{name};
@@ -504,8 +520,11 @@ __PACKAGE__->register_method ({
push @$cmd, "--block.$type", $part_or_lv;
}
- push @$cmd, '--data', $devpath;
+ push @$cmd, '--data' if $create_mode eq 'create';
+ push @$cmd, $devpath;
push @$cmd, '--dmcrypt' if $param->{encrypted};
+ push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes'
+ if $create_mode eq 'batch';
PVE::Diskmanage::wipe_blockdev($devpath);
--
2.30.2
* Re: [pve-devel] [PATCH manager] fix #4631: ceph: osd: create: add osds-per-device
2023-04-18 12:26 [pve-devel] [PATCH manager] fix #4631: ceph: osd: create: add osds-per-device Aaron Lauterer
@ 2023-08-21 8:20 ` Fiona Ebner
2023-08-21 10:51 ` Aaron Lauterer
0 siblings, 1 reply; 4+ messages in thread
From: Fiona Ebner @ 2023-08-21 8:20 UTC (permalink / raw)
To: Proxmox VE development discussion, Aaron Lauterer
On 4/18/23 14:26, Aaron Lauterer wrote:
> Allows automatically creating multiple OSDs per physical device. The
> main use case is fast NVME drives that would be bottlenecked by a
> single OSD service.
>
> By using the 'ceph-volume lvm batch' command instead of 'ceph-volume lvm
> create' when creating multiple OSDs per device, we don't have to handle
> splitting the drive ourselves.
>
> But this means that the parameters to specify a DB or WAL device won't
> work as the 'batch' command doesn't use them. Dedicated DB and WAL
> devices don't make much sense anyway if we place the OSDs on fast NVME
> drives.
>
> Some other changes to how the command is built were needed as well, as
> the 'batch' command needs the path to the disk as a positional argument,
> not as '--data /dev/sdX'.
> We drop the '--cluster-fsid' parameter because the 'batch' command
> doesn't accept it. The 'create' command will then fall back to reading it
> from the ceph.conf file.
>
> Removal of OSDs works as expected without any code changes. As long as
> there are other OSDs on a disk, the VG & PV won't be removed, even if
> 'cleanup' is enabled.
>
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
I noticed a warning while testing
--> DEPRECATION NOTICE
--> You are using the legacy automatic disk sorting behavior
--> The Pacific release will change the default to --no-auto
--> passed data devices: 1 physical, 0 LVM
--> relative data size: 0.3333333333333333
Note that I'm on Quincy, so maybe they still didn't change it :P
> @@ -275,6 +275,12 @@ __PACKAGE__->register_method ({
> type => 'string',
> description => "Set the device class of the OSD in crush."
> },
> + 'osds-per-device' => {
> + optional => 1,
> + type => 'number',
should be integer
> + minimum => '1',
> + description => 'OSD services per physical device. Can improve fast NVME utilization.',
Can we add an explicit recommendation against doing it for other disk
types? I imagine it's not beneficial for those, or?
> + },
> },
> },
> returns => { type => 'string' },
> @@ -294,6 +300,15 @@ __PACKAGE__->register_method ({
> # extract parameter info and fail if a device is set more than once
> my $devs = {};
>
> + # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
> + # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
> + if ($param->{'osds-per-device'}) {
> + for my $type ( qw(db_dev wal_dev) ) {
> + die "Cannot use 'osds-per-device' parameter with '${type}'"
Missing newline after error message.
Could also use raise_param_exc().
> + if $param->{$type};
> + }
> + }
> +
> my $ceph_conf = cfs_read_file('ceph.conf');
>
> my $osd_network = $ceph_conf->{global}->{cluster_network};
* Re: [pve-devel] [PATCH manager] fix #4631: ceph: osd: create: add osds-per-device
2023-08-21 8:20 ` Fiona Ebner
@ 2023-08-21 10:51 ` Aaron Lauterer
2023-08-21 11:27 ` Fiona Ebner
0 siblings, 1 reply; 4+ messages in thread
From: Aaron Lauterer @ 2023-08-21 10:51 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
responses inline
On 8/21/23 10:20, Fiona Ebner wrote:
> On 4/18/23 14:26, Aaron Lauterer wrote:
>> Allows automatically creating multiple OSDs per physical device. The
>> main use case is fast NVME drives that would be bottlenecked by a
>> single OSD service.
>>
>> By using the 'ceph-volume lvm batch' command instead of 'ceph-volume lvm
>> create' when creating multiple OSDs per device, we don't have to handle
>> splitting the drive ourselves.
>>
>> But this means that the parameters to specify a DB or WAL device won't
>> work as the 'batch' command doesn't use them. Dedicated DB and WAL
>> devices don't make much sense anyway if we place the OSDs on fast NVME
>> drives.
>>
>> Some other changes to how the command is built were needed as well, as
>> the 'batch' command needs the path to the disk as a positional argument,
>> not as '--data /dev/sdX'.
>> We drop the '--cluster-fsid' parameter because the 'batch' command
>> doesn't accept it. The 'create' command will then fall back to reading it
>> from the ceph.conf file.
>>
>> Removal of OSDs works as expected without any code changes. As long as
>> there are other OSDs on a disk, the VG & PV won't be removed, even if
>> 'cleanup' is enabled.
>>
>> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
>> ---
>
> I noticed a warning while testing
>
> --> DEPRECATION NOTICE
> --> You are using the legacy automatic disk sorting behavior
> --> The Pacific release will change the default to --no-auto
> --> passed data devices: 1 physical, 0 LVM
> --> relative data size: 0.3333333333333333
>
> Note that I'm on Quincy, so maybe they still didn't change it :P
Also shows up when using `ceph-volume lvm batch …` directly. So after
consulting the man page, I guess there's not much we can do about it.
>
>> @@ -275,6 +275,12 @@ __PACKAGE__->register_method ({
>> type => 'string',
>> description => "Set the device class of the OSD in crush."
>> },
>> + 'osds-per-device' => {
>> + optional => 1,
>> + type => 'number',
>
> should be integer
will change
>
>> + minimum => '1',
>> + description => 'OSD services per physical device. Can improve fast NVME utilization.',
>
> Can we add an explicit recommendation against doing it for other disk
> types? I imagine it's not beneficial for those, or?
What about something like:
"Only useful for fast NVME devices to utilize their performance better."?
>
>> + },
>> },
>> },
>> returns => { type => 'string' },
>> @@ -294,6 +300,15 @@ __PACKAGE__->register_method ({
>> # extract parameter info and fail if a device is set more than once
>> my $devs = {};
>>
>> + # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
>> + # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
>> + if ($param->{'osds-per-device'}) {
>> + for my $type ( qw(db_dev wal_dev) ) {
>> + die "Cannot use 'osds-per-device' parameter with '${type}'"
>
> Missing newline after error message.
> Could also use raise_param_exc().
Ah thanks. Will switch it to a `raise_param_exc()`, where we don't need the
newline AFAICT?
>
>> + if $param->{$type};
>> + }
>> + }
>> +
>> my $ceph_conf = cfs_read_file('ceph.conf');
>>
>> my $osd_network = $ceph_conf->{global}->{cluster_network};
* Re: [pve-devel] [PATCH manager] fix #4631: ceph: osd: create: add osds-per-device
2023-08-21 10:51 ` Aaron Lauterer
@ 2023-08-21 11:27 ` Fiona Ebner
0 siblings, 0 replies; 4+ messages in thread
From: Fiona Ebner @ 2023-08-21 11:27 UTC (permalink / raw)
To: Aaron Lauterer, Proxmox VE development discussion
On 8/21/23 12:51, Aaron Lauterer wrote:
> responses inline
>
Feel free to cut away irrelevant bits when responding (could've cut away
the commit message myself last time already) ;)
> On 8/21/23 10:20, Fiona Ebner wrote:
>> I noticed a warning while testing
>>
>> --> DEPRECATION NOTICE
>> --> You are using the legacy automatic disk sorting behavior
>> --> The Pacific release will change the default to --no-auto
>> --> passed data devices: 1 physical, 0 LVM
>> --> relative data size: 0.3333333333333333
>>
>> Note that I'm on Quincy, so maybe they still didn't change it :P
>
> Also shows up when using `ceph-volume lvm batch …` directly. So after
> consulting the man page, I guess there's not much we can do about it.
We could explicitly pass --no-auto I guess [0]? While it doesn't make a
difference, since we only pass one disk to 'batch', it would at least
avoid the warning.
[0]: https://docs.ceph.com/en/reef/ceph-volume/lvm/batch/
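In the command building that should just be a one-liner along the lines of
(untested):

    push @$cmd, '--no-auto' if $create_mode eq 'batch';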
>>
>>> + minimum => '1',
>>> + description => 'OSD services per physical device. Can
>>> improve fast NVME utilization.',
>>
>> Can we add an explicit recommendation against doing it for other disk
>> types? I imagine it's not beneficial for those, or?
>
> What about something like:
> "Only useful for fast NVME devices to utilize their performance better."?
>
Sounds good to me.
>>
>>> + },
>>> },
>>> },
>>> returns => { type => 'string' },
>>> @@ -294,6 +300,15 @@ __PACKAGE__->register_method ({
>>> # extract parameter info and fail if a device is set more than
>>> once
>>> my $devs = {};
>>> + # allow 'osds-per-device' only without dedicated db and/or wal
>>> devs. We cannot specify them with
>>> + # 'ceph-volume lvm batch' and they don't make a lot of sense on
>>> fast NVMEs anyway.
>>> + if ($param->{'osds-per-device'}) {
>>> + for my $type ( qw(db_dev wal_dev) ) {
>>> + die "Cannot use 'osds-per-device' parameter with '${type}'"
>>
>> Missing newline after error message.
>> Could also use raise_param_exc().
>
> Ah thanks. Will switch it to a `raise_param_exc()`, where we don't need
> the newline AFAICT?
Yes, the function will add a newline.
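Something like the following should do (rough sketch, assuming
raise_param_exc() from PVE::Exception is imported there):

    for my $type (qw(db_dev wal_dev)) {
        raise_param_exc({ $type => "cannot be used with 'osds-per-device'" })
            if $param->{$type};
    }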