* [pve-devel] [PATCH manager v2] fix #4631: ceph: osd: create: add osds-per-device
From: Aaron Lauterer @ 2023-08-21 11:45 UTC (permalink / raw)
To: pve-devel
Allow automatically creating multiple OSDs per physical device. The
main use case is fast NVME drives that would be bottlenecked by a
single OSD service.
By using the 'ceph-volume lvm batch' command instead of the 'ceph-volume
lvm create' for multiple OSDs / device, we don't have to deal with the
split of the drive ourselves.
But this means that the parameters to specify a DB or WAL device won't
work as the 'batch' command doesn't use them. Dedicated DB and WAL
devices don't make much sense anyway if we place the OSDs on fast NVME
drives.
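The restriction described above can be sketched as a standalone check. This is a minimal illustration, not the code from the patch: the helper name `check_osds_per_device_conflicts` is hypothetical, and it returns a plain hash of errors instead of calling `raise_param_exc`, so that it runs without the PVE libraries.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): returns a hash ref mapping each
# conflicting parameter name to an error message. The hash is empty when the
# combination of parameters is acceptable.
sub check_osds_per_device_conflicts {
    my ($param) = @_;

    my $errors = {};
    # Nothing to check when only a single OSD per device is requested.
    return $errors if !$param->{'osds-per-device'};

    # Dedicated DB/WAL devices are rejected together with 'osds-per-device'.
    for my $type (qw(db_dev wal_dev)) {
        $errors->{$type} = "cannot use 'osds-per-device' parameter with '${type}'"
            if $param->{$type};
    }
    return $errors;
}

my $errors = check_osds_per_device_conflicts({
    dev => '/dev/nvme0n1',
    'osds-per-device' => 4,
    db_dev => '/dev/sdb',
});
print "$_: $errors->{$_}\n" for sort keys %$errors;
```

In the actual API handler the same condition is reported per parameter via `raise_param_exc`, so the caller sees which of the two parameters caused the conflict.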
Some other changes to how the command is built were needed as well, as
the 'batch' command needs the path to the disk as a positional argument,
not as '--data /dev/sdX'.
We drop the '--cluster-fsid' parameter because the 'batch' command
doesn't accept it. The 'create' command falls back to reading it from
the ceph.conf file.
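As a rough sketch of how the two command lines differ, the following standalone snippet builds the argument list in the same order as this patch. The helper name `build_ceph_volume_cmd` and the example device paths are assumptions made for illustration; the real code lives inline in the API handler and also handles DB/WAL devices.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the argument order used in this patch.
sub build_ceph_volume_cmd {
    my ($devpath, $param) = @_;

    # 'batch' is only used when multiple OSDs per device are requested.
    my $mode = $param->{'osds-per-device'} ? 'batch' : 'create';
    my $cmd = ['ceph-volume', 'lvm', $mode];

    push @$cmd, '--crush-device-class', $param->{'crush-device-class'}
        if $param->{'crush-device-class'};

    # 'create' takes the disk via '--data', 'batch' takes it positionally.
    push @$cmd, '--data' if $mode eq 'create';
    push @$cmd, $devpath;

    push @$cmd, '--dmcrypt' if $param->{encrypted};
    push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes', '--no-auto'
        if $mode eq 'batch';

    return $cmd;
}

print join(' ', @{ build_ceph_volume_cmd('/dev/nvme0n1', { 'osds-per-device' => 4 }) }), "\n";
print join(' ', @{ build_ceph_volume_cmd('/dev/sdb', { encrypted => 1 }) }), "\n";
```

Neither variant passes '--cluster-fsid' anymore, which is why the fsid lookup is no longer needed for building the command.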
Removal of OSDs works as expected without any code changes. As long as
there are other OSDs on a disk, the VG & PV won't be removed, even if
'cleanup' is enabled.
The '--no-auto' parameter is used to avoid the following deprecation
warning:
```
--> DEPRECATION NOTICE
--> You are using the legacy automatic disk sorting behavior
--> The Pacific release will change the default to --no-auto
--> passed data devices: 1 physical, 0 LVM
--> relative data size: 0.3333333333333333
```
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
changes since v1:
* change parameter type to integer
* rephrase parameter description to make it clear it is only useful for
NVMEs
* change error handling to raise_param_exc
* add --no-auto param to ceph-volume
PVE/API2/Ceph/OSD.pm | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
index ded35990..3a51bebc 100644
--- a/PVE/API2/Ceph/OSD.pm
+++ b/PVE/API2/Ceph/OSD.pm
@@ -275,6 +275,13 @@ __PACKAGE__->register_method ({
type => 'string',
description => "Set the device class of the OSD in crush."
},
+ 'osds-per-device' => {
+ optional => 1,
+ type => 'integer',
+ minimum => '1',
+ description => 'OSD services per physical device. Only useful for fast '
+ .'NVME devices to utilize their performance better.',
+ },
},
},
returns => { type => 'string' },
@@ -294,6 +301,15 @@ __PACKAGE__->register_method ({
# extract parameter info and fail if a device is set more than once
my $devs = {};
+ # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
+ # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
+ if ($param->{'osds-per-device'}) {
+ for my $type ( qw(db_dev wal_dev) ) {
+ raise_param_exc({ $type => "cannot use 'osds-per-device' parameter with '${type}'" })
+ if $param->{$type};
+ }
+ }
+
my $ceph_conf = cfs_read_file('ceph.conf');
my $osd_network = $ceph_conf->{global}->{cluster_network};
@@ -364,8 +380,6 @@ __PACKAGE__->register_method ({
my $monstat = $rados->mon_command({ prefix => 'quorum_status' });
die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};
- my $fsid = $monstat->{monmap}->{fsid};
- $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
@@ -470,7 +484,10 @@ __PACKAGE__->register_method ({
$test_disk_requirements->($disklist);
my $dev_class = $param->{'crush-device-class'};
- my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
+ # create allows for detailed configuration of DB and WAL devices
+ # batch for easy creation of multiple OSDs (per device)
+ my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
+ my $cmd = ['ceph-volume', 'lvm', $create_mode ];
push @$cmd, '--crush-device-class', $dev_class if $dev_class;
my $devname = $devs->{dev}->{name};
@@ -504,8 +521,11 @@ __PACKAGE__->register_method ({
push @$cmd, "--block.$type", $part_or_lv;
}
- push @$cmd, '--data', $devpath;
+ push @$cmd, '--data' if $create_mode eq 'create';
+ push @$cmd, $devpath;
push @$cmd, '--dmcrypt' if $param->{encrypted};
+ push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes', '--no-auto'
+ if $create_mode eq 'batch';
PVE::Diskmanage::wipe_blockdev($devpath);
--
2.39.2
* Re: [pve-devel] [PATCH manager v2] fix #4631: ceph: osd: create: add osds-per-device
From: Wolfgang Bumiller @ 2023-08-23 9:05 UTC (permalink / raw)
To: Aaron Lauterer; +Cc: pve-devel
On Mon, Aug 21, 2023 at 01:45:54PM +0200, Aaron Lauterer wrote:
> Allow automatically creating multiple OSDs per physical device. The
> main use case is fast NVME drives that would be bottlenecked by a
> single OSD service.
>
> By using the 'ceph-volume lvm batch' command instead of the 'ceph-volume
> lvm create' for multiple OSDs / device, we don't have to deal with the
> split of the drive ourselves.
>
> But this means that the parameters to specify a DB or WAL device won't
> work as the 'batch' command doesn't use them. Dedicated DB and WAL
> devices don't make much sense anyway if we place the OSDs on fast NVME
> drives.
>
> Some other changes to how the command is built were needed as well, as
> the 'batch' command needs the path to the disk as a positional argument,
> not as '--data /dev/sdX'.
> We drop the '--cluster-fsid' parameter because the 'batch' command
> doesn't accept it. The 'create' command falls back to reading it from
> the ceph.conf file.
>
> Removal of OSDs works as expected without any code changes. As long as
> there are other OSDs on a disk, the VG & PV won't be removed, even if
> 'cleanup' is enabled.
>
> The '--no-auto' parameter is used to avoid the following deprecation
> warning:
> ```
> --> DEPRECATION NOTICE
> --> You are using the legacy automatic disk sorting behavior
> --> The Pacific release will change the default to --no-auto
> --> passed data devices: 1 physical, 0 LVM
> --> relative data size: 0.3333333333333333
> ```
>
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
> changes since v1:
> * change parameter type to integer
> * rephrase parameter description to make it clear it is only useful for
> NVMEs
> * change error handling to raise_param_exc
> * add --no-auto param to ceph-volume
>
> PVE/API2/Ceph/OSD.pm | 28 ++++++++++++++++++++++++----
> 1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index ded35990..3a51bebc 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,13 @@ __PACKAGE__->register_method ({
> type => 'string',
> description => "Set the device class of the OSD in crush."
> },
> + 'osds-per-device' => {
> + optional => 1,
> + type => 'integer',
> + minimum => '1',
> + description => 'OSD services per physical device. Only useful for fast '
> + .'NVME devices to utilize their performance better.',
> + },
> },
> },
> returns => { type => 'string' },
> @@ -294,6 +301,15 @@ __PACKAGE__->register_method ({
> # extract parameter info and fail if a device is set more than once
> my $devs = {};
>
> + # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
> + # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
> + if ($param->{'osds-per-device'}) {
> + for my $type ( qw(db_dev wal_dev) ) {
> + raise_param_exc({ $type => "cannot use 'osds-per-device' parameter with '${type}'" })
> + if $param->{$type};
> + }
> + }
> +
> my $ceph_conf = cfs_read_file('ceph.conf');
>
> my $osd_network = $ceph_conf->{global}->{cluster_network};
> @@ -364,8 +380,6 @@ __PACKAGE__->register_method ({
> my $monstat = $rados->mon_command({ prefix => 'quorum_status' });
>
> die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};
Does it make sense to still keep this line above if we don't use fsid
anymore? (And if not, $monstat then also becomes unused.)
> - my $fsid = $monstat->{monmap}->{fsid};
> - $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>
> my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>
> @@ -470,7 +484,10 @@ __PACKAGE__->register_method ({
> $test_disk_requirements->($disklist);
>
> my $dev_class = $param->{'crush-device-class'};
> - my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
> + # create allows for detailed configuration of DB and WAL devices
> + # batch for easy creation of multiple OSDs (per device)
> + my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
> + my $cmd = ['ceph-volume', 'lvm', $create_mode ];
> push @$cmd, '--crush-device-class', $dev_class if $dev_class;
>
> my $devname = $devs->{dev}->{name};
> @@ -504,8 +521,11 @@ __PACKAGE__->register_method ({
> push @$cmd, "--block.$type", $part_or_lv;
> }
>
> - push @$cmd, '--data', $devpath;
> + push @$cmd, '--data' if $create_mode eq 'create';
> + push @$cmd, $devpath;
^ not the biggest fan of mixing positional arguments and options - does
`ceph-volume` support `--` to separate positional arguments explicitly?
If so, we should use it.
ceph-volume lvm batch --osds-per-device 3 -- /dev/path
as opposed to
ceph-volume lvm batch /dev/path --osds-per-device 3
The latter is just bad style and potentially dangerous.
> push @$cmd, '--dmcrypt' if $param->{encrypted};
> + push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes', '--no-auto'
> + if $create_mode eq 'batch';
>
> PVE::Diskmanage::wipe_blockdev($devpath);
>
> --
> 2.39.2
* Re: [pve-devel] [PATCH manager v2] fix #4631: ceph: osd: create: add osds-per-device
From: Aaron Lauterer @ 2023-08-23 9:38 UTC (permalink / raw)
To: Wolfgang Bumiller; +Cc: pve-devel
>>
>> die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};
>
> Does it make sense to still keep this line above if we don't use fsid
> anymore? (And if not, $monstat then also becomes unused.)
>
thanks for catching this, will remove it
>> - my $fsid = $monstat->{monmap}->{fsid};
>> - $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>>
>> my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>>
>> @@ -470,7 +484,10 @@ __PACKAGE__->register_method ({
>> $test_disk_requirements->($disklist);
>>
>> my $dev_class = $param->{'crush-device-class'};
>> - my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
>> + # create allows for detailed configuration of DB and WAL devices
>> + # batch for easy creation of multiple OSDs (per device)
>> + my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
>> + my $cmd = ['ceph-volume', 'lvm', $create_mode ];
>> push @$cmd, '--crush-device-class', $dev_class if $dev_class;
>>
>> my $devname = $devs->{dev}->{name};
>> @@ -504,8 +521,11 @@ __PACKAGE__->register_method ({
>> push @$cmd, "--block.$type", $part_or_lv;
>> }
>>
>> - push @$cmd, '--data', $devpath;
>> + push @$cmd, '--data' if $create_mode eq 'create';
>> + push @$cmd, $devpath;
>
> ^ not the biggest fan of mixing positional arguments and options - does
> `ceph-volume` support `--` to separate positional arguments explicitly?
> If so, we should use it.
> ceph-volume lvm batch --osds-per-device 3 -- /dev/path
> as opposed to
> ceph-volume lvm batch /dev/path --osds-per-device 3
> The latter is just bad style and potentially dangerous.
good point. I'll rework the building of the call
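One possible shape for such a rework, as a sketch only: all options are pushed first and the device path comes last, after an explicit `--`. Whether `ceph-volume` accepts `--` as a separator is an assumption here and has not been verified in this thread; the helper name `build_cmd_options_first` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rework: options first, device path last.
# ASSUMPTION: 'ceph-volume lvm batch' accepts '--' to end option parsing.
sub build_cmd_options_first {
    my ($devpath, $param) = @_;

    my $mode = $param->{'osds-per-device'} ? 'batch' : 'create';
    my $cmd = ['ceph-volume', 'lvm', $mode];

    push @$cmd, '--crush-device-class', $param->{'crush-device-class'}
        if $param->{'crush-device-class'};
    push @$cmd, '--dmcrypt' if $param->{encrypted};

    if ($mode eq 'create') {
        # 'create' has no positional argument, the disk is passed via '--data'.
        push @$cmd, '--data', $devpath;
    } else {
        push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes', '--no-auto';
        # Explicitly separate the positional device path from the options.
        push @$cmd, '--', $devpath;
    }

    return $cmd;
}

print join(' ', @{ build_cmd_options_first('/dev/nvme0n1', { 'osds-per-device' => 3 }) }), "\n";
```

With this ordering a device path can never be mistaken for an option, at the cost of two separate branches for 'create' and 'batch'.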
>
>> push @$cmd, '--dmcrypt' if $param->{encrypted};
>> + push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes', '--no-auto'
>> + if $create_mode eq 'batch';
>>
>> PVE::Diskmanage::wipe_blockdev($devpath);
>>
>> --
>> 2.39.2