From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Aaron Lauterer <a.lauterer@proxmox.com>
Cc: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH manager v2] fix #4631: ceph: osd: create: add osds-per-device
Date: Wed, 23 Aug 2023 11:05:24 +0200
Message-ID: <orv7p6bohtn7qss26vni3xl4fpublbpr2ejhav3h7nyuxd3gym@242lc6srr6ca>
In-Reply-To: <20230821114554.1775149-1-a.lauterer@proxmox.com>

On Mon, Aug 21, 2023 at 01:45:54PM +0200, Aaron Lauterer wrote:
> Allows to automatically create multiple OSDs per physical device. The
> main use case are fast NVME drives that would be bottlenecked by a
> single OSD service.
>
> By using the 'ceph-volume lvm batch' command instead of the 'ceph-volume
> lvm create' for multiple OSDs / device, we don't have to deal with the
> split of the drive ourselves.
>
> But this means that the parameters to specify a DB or WAL device won't
> work as the 'batch' command doesn't use them. Dedicated DB and WAL
> devices don't make much sense anyway if we place the OSDs on fast NVME
> drives.
>
> Some other changes to how the command is built were needed as well, as
> the 'batch' command needs the path to the disk as a positional argument,
> not as '--data /dev/sdX'.
> We drop the '--cluster-fsid' parameter because the 'batch' command
> doesn't accept it. The 'create' will fall back to reading it from the
> ceph.conf file.
>
> Removal of OSDs works as expected without any code changes. As long as
> there are other OSDs on a disk, the VG & PV won't be removed, even if
> 'cleanup' is enabled.
>
> The '--no-auto' parameter is used to avoid the following deprecation
> warning:
> ```
> --> DEPRECATION NOTICE
> --> You are using the legacy automatic disk sorting behavior
> --> The Pacific release will change the default to --no-auto
> --> passed data devices: 1 physical, 0 LVM
> --> relative data size: 0.3333333333333333
> ```
>
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
> changes since v1:
> * change parameter type to integer
> * rephrase parameter description to make it clear it is only useful for
> NVMEs
> * change error handling to raise_param_exc
> * add --no-auto param to ceph-volume
>
> PVE/API2/Ceph/OSD.pm | 28 ++++++++++++++++++++++++----
> 1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index ded35990..3a51bebc 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,13 @@ __PACKAGE__->register_method ({
> type => 'string',
> description => "Set the device class of the OSD in crush."
> },
> + 'osds-per-device' => {
> + optional => 1,
> + type => 'integer',
> + minimum => '1',
> + description => 'OSD services per physical device. Only useful for fast ".
> + "NVME devices to utilize their performance better.',
> + },
> },
> },
> returns => { type => 'string' },
> @@ -294,6 +301,15 @@ __PACKAGE__->register_method ({
> # extract parameter info and fail if a device is set more than once
> my $devs = {};
>
> + # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
> + # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
> + if ($param->{'osds-per-device'}) {
> + for my $type ( qw(db_dev wal_dev) ) {
> + raise_param_exc({ $type => "canot use 'osds-per-device' parameter with '${type}'" })
> + if $param->{$type};
> + }
> + }
> +
> my $ceph_conf = cfs_read_file('ceph.conf');
>
> my $osd_network = $ceph_conf->{global}->{cluster_network};
> @@ -364,8 +380,6 @@ __PACKAGE__->register_method ({
> my $monstat = $rados->mon_command({ prefix => 'quorum_status' });
>
> die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};

Does it still make sense to keep the check above if we no longer use the
fsid? (If not, $monstat becomes unused as well.)

> - my $fsid = $monstat->{monmap}->{fsid};
> - $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>
> my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>
> @@ -470,7 +484,10 @@ __PACKAGE__->register_method ({
> $test_disk_requirements->($disklist);
>
> my $dev_class = $param->{'crush-device-class'};
> - my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
> + # create allows for detailed configuration of DB and WAL devices
> + # batch for easy creation of multiple OSDs (per device)
> + my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
> + my $cmd = ['ceph-volume', 'lvm', $create_mode ];
> push @$cmd, '--crush-device-class', $dev_class if $dev_class;
>
> my $devname = $devs->{dev}->{name};
> @@ -504,8 +521,11 @@ __PACKAGE__->register_method ({
> push @$cmd, "--block.$type", $part_or_lv;
> }
>
> - push @$cmd, '--data', $devpath;
> + push @$cmd, '--data' if $create_mode eq 'create';
> + push @$cmd, $devpath;

^ I'm not the biggest fan of mixing positional arguments and options. Does
`ceph-volume` support `--` to separate positional arguments explicitly?
If so, we should use it:

    ceph-volume lvm batch --osds-per-device 3 -- /dev/path

as opposed to

    ceph-volume lvm batch /dev/path --osds-per-device 3

which is just bad style and potentially dangerous.
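
To illustrate the ordering I have in mind, here is a rough, standalone
sketch. The helper name build_osd_create_cmd and the device path
/dev/nvme0n1 are made up for the example and are not part of the patch. It
assumes that `ceph-volume` accepts `--` as an end-of-options marker, which
would need to be verified first, and it leaves out the `--block.$type`
handling for DB/WAL devices:

```perl
use strict;
use warnings;

# Hypothetical helper (not in PVE): build the ceph-volume argument list so
# that all options come first and the device path comes last.
sub build_osd_create_cmd {
    my ($devpath, $opts) = @_;

    # 'batch' when multiple OSDs per device are requested, else 'create'
    my $create_mode = $opts->{'osds-per-device'} ? 'batch' : 'create';
    my $cmd = ['ceph-volume', 'lvm', $create_mode];

    push @$cmd, '--crush-device-class', $opts->{'crush-device-class'}
        if $opts->{'crush-device-class'};
    push @$cmd, '--dmcrypt' if $opts->{encrypted};

    if ($create_mode eq 'batch') {
        push @$cmd, '--osds-per-device', $opts->{'osds-per-device'},
            '--yes', '--no-auto';
        # ASSUMPTION: ceph-volume honors '--' to end option parsing.
        push @$cmd, '--', $devpath;
    } else {
        # 'create' takes the device as the value of --data
        push @$cmd, '--data', $devpath;
    }

    return $cmd;
}

my $cmd = build_osd_create_cmd('/dev/nvme0n1', { 'osds-per-device' => 3 });
print join(' ', @$cmd), "\n";
```

With all options pushed before the device path, nothing that follows the
path can be misread as an option, and the 'create' case stays as it is
today.
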
> push @$cmd, '--dmcrypt' if $param->{encrypted};
> + push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes', '--no-auto'
> + if $create_mode eq 'batch';
>
> PVE::Diskmanage::wipe_blockdev($devpath);
>
> --
> 2.39.2