From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 4A3508D6A for ; Wed, 23 Aug 2023 11:05:26 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 33D6714552 for ; Wed, 23 Aug 2023 11:05:26 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Wed, 23 Aug 2023 11:05:25 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 5FE42437B3 for ; Wed, 23 Aug 2023 11:05:25 +0200 (CEST) Date: Wed, 23 Aug 2023 11:05:24 +0200 From: Wolfgang Bumiller To: Aaron Lauterer Cc: pve-devel@lists.proxmox.com Message-ID: References: <20230821114554.1775149-1-a.lauterer@proxmox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230821114554.1775149-1-a.lauterer@proxmox.com> X-SPAM-LEVEL: Spam detection results: 0 AWL 0.101 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pve-devel] [PATCH manager v2] fix #4631: ceph: osd: create: add osds-per-device X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Aug 2023 09:05:26 -0000 On Mon, Aug 21, 2023 at 01:45:54PM +0200, Aaron Lauterer wrote: > Allows to automatically create multiple OSDs per physical device. The > main use case are fast NVME drives that would be bottlenecked by a > single OSD service. > > By using the 'ceph-volume lvm batch' command instead of the 'ceph-volume > lvm create' for multiple OSDs / device, we don't have to deal with the > split of the drive ourselves. > > But this means that the parameters to specify a DB or WAL device won't > work as the 'batch' command doesn't use them. Dedicated DB and WAL > devices don't make much sense anyway if we place the OSDs on fast NVME > drives. > > Some other changes to how the command is built were needed as well, as > the 'batch' command needs the path to the disk as a positional argument, > not as '--data /dev/sdX'. > We drop the '--cluster-fsid' paramter because the 'batch' command > doesn't accept it. The 'create' will fall back to reading it from the > ceph.conf file. > > Removal of OSDs works as expected without any code changes. As long as > there are other OSDs on a disk, the VG & PV won't be removed, even if > 'cleanup' is enabled. > > The '--no-auto' paramter is used to avoid the following deprecation > warning: > ``` > --> DEPRECATION NOTICE > --> You are using the legacy automatic disk sorting behavior > --> The Pacific release will change the default to --no-auto > --> passed data devices: 1 physical, 0 LVM > --> relative data size: 0.3333333333333333 > ``` > > Signed-off-by: Aaron Lauterer > --- > changes since v1: > * change parameter type to integer > * rephrase parameter description to make it clear it is only useful for > NVMEs > * change error handling to raise_param_exc > * add --no-auto param to ceph-volume > > PVE/API2/Ceph/OSD.pm | 28 ++++++++++++++++++++++++---- > 1 file changed, 24 insertions(+), 4 deletions(-) > > diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm > index ded35990..3a51bebc 100644 > --- a/PVE/API2/Ceph/OSD.pm > +++ b/PVE/API2/Ceph/OSD.pm > @@ -275,6 +275,13 @@ __PACKAGE__->register_method ({ > type => 'string', > description => "Set the device class of the OSD in crush." > }, > + 'osds-per-device' => { > + optional => 1, > + type => 'integer', > + minimum => '1', > + description => 'OSD services per physical device. Only useful for fast ". > + "NVME devices to utilize their performance better.', > + }, > }, > }, > returns => { type => 'string' }, > @@ -294,6 +301,15 @@ __PACKAGE__->register_method ({ > # extract parameter info and fail if a device is set more than once > my $devs = {}; > > + # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with > + # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway. > + if ($param->{'osds-per-device'}) { > + for my $type ( qw(db_dev wal_dev) ) { > + raise_param_exc({ $type => "canot use 'osds-per-device' parameter with '${type}'" }) > + if $param->{$type}; > + } > + } > + > my $ceph_conf = cfs_read_file('ceph.conf'); > > my $osd_network = $ceph_conf->{global}->{cluster_network}; > @@ -364,8 +380,6 @@ __PACKAGE__->register_method ({ > my $monstat = $rados->mon_command({ prefix => 'quorum_status' }); > > die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid}; Does it make sense to still keep this line above if we don't use fsid anymore? (And if not, $monstat then also becomes unused.) > - my $fsid = $monstat->{monmap}->{fsid}; > - $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/; > > my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring'); > > @@ -470,7 +484,10 @@ __PACKAGE__->register_method ({ > $test_disk_requirements->($disklist); > > my $dev_class = $param->{'crush-device-class'}; > - my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ]; > + # create allows for detailed configuration of DB and WAL devices > + # batch for easy creation of multiple OSDs (per device) > + my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create'; > + my $cmd = ['ceph-volume', 'lvm', $create_mode ]; > push @$cmd, '--crush-device-class', $dev_class if $dev_class; > > my $devname = $devs->{dev}->{name}; > @@ -504,8 +521,11 @@ __PACKAGE__->register_method ({ > push @$cmd, "--block.$type", $part_or_lv; > } > > - push @$cmd, '--data', $devpath; > + push @$cmd, '--data' if $create_mode eq 'create'; > + push @$cmd, $devpath; ^ not the biggest fan of mixing positional arguments and options - does `ceph-volume` support `--` to separate positional arguments explicitly? If so, we should use it. ceph-volume lvm create --osds-per-device 3 -- /dev/path as opposed to ceph-volume lvm create /dev/path --osds-per-device 3 that's just bad style and potentially dangerous > push @$cmd, '--dmcrypt' if $param->{encrypted}; > + push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes', '--no-auto' > + if $create_mode eq 'batch'; > > PVE::Diskmanage::wipe_blockdev($devpath); > > -- > 2.39.2