public inbox for pve-devel@lists.proxmox.com
From: Aaron Lauterer <a.lauterer@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH manager] fix #4631: ceph: osd: create: add osds-per-device
Date: Tue, 18 Apr 2023 14:26:46 +0200
Message-ID: <20230418122646.3079833-1-a.lauterer@proxmox.com>

Allow automatically creating multiple OSDs per physical device. The
main use case is fast NVMe drives that would otherwise be bottlenecked
by a single OSD service.

By using the 'ceph-volume lvm batch' command instead of 'ceph-volume
lvm create' when creating multiple OSDs per device, we don't have to
handle splitting up the drive ourselves.

This means, however, that the parameters to specify a DB or WAL device
won't work, as the 'batch' command doesn't use them. Dedicated DB and
WAL devices don't make much sense anyway if we place the OSDs on fast
NVMe drives.

Some other changes to how the command is built were needed as well: the
'batch' command needs the path to the disk as a positional argument,
not as '--data /dev/sdX'.
We also drop the '--cluster-fsid' parameter because the 'batch' command
doesn't accept it; the 'create' command falls back to reading it from
the ceph.conf file.
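
For illustration, the resulting invocations differ roughly as follows;
device path, device class and OSD count are just example values:

    # create mode (single OSD, DB/WAL devices possible)
    ceph-volume lvm create --crush-device-class nvme --data /dev/nvme0n1

    # batch mode (multiple OSDs per device, no DB/WAL devices)
    ceph-volume lvm batch --crush-device-class nvme /dev/nvme0n1 \
        --osds-per-device 4 --yes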

Removal of OSDs works as expected without any code changes. As long as
there are other OSDs on a disk, the VG & PV won't be removed, even if
'cleanup' is enabled.
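
This can be double-checked on the node, e.g. with (device path is an
example):

    ceph-volume lvm list /dev/nvme0n1

which should still list the LVs of the remaining OSDs after one of them
was destroyed with 'cleanup' enabled.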

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---

There are a few other places where we can improve the OSD handling if
multiple OSDs are on one physical disk.

* Disk overview in the Node -> Disks panel: show all OSD IDs present on
  the disk instead of only the last one that was parsed
* Create window: allow selecting disks that already have OSDs on them
  but still have free space available

These things will need additions and changes to the API. Especially for
the (potentially breaking) changes, we probably want to target the
Proxmox VE 8 release.

 PVE/API2/Ceph/OSD.pm | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
index ded35990..43dad56b 100644
--- a/PVE/API2/Ceph/OSD.pm
+++ b/PVE/API2/Ceph/OSD.pm
@@ -275,6 +275,12 @@ __PACKAGE__->register_method ({
 		type => 'string',
 		description => "Set the device class of the OSD in crush."
 	    },
+	    'osds-per-device' => {
+		optional => 1,
+		type => 'number',
+		minimum => 1,
+		description => 'OSD services per physical device. Can improve fast NVMe utilization.',
+	    },
 	},
     },
     returns => { type => 'string' },
@@ -294,6 +300,15 @@ __PACKAGE__->register_method ({
 	# extract parameter info and fail if a device is set more than once
 	my $devs = {};
 
+	# allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
+	# 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMe drives anyway.
+	if ($param->{'osds-per-device'}) {
+	    for my $type (qw(db_dev wal_dev)) {
+		die "Cannot use 'osds-per-device' parameter with '${type}'\n"
+		    if $param->{$type};
+	    }
+	}
+
 	my $ceph_conf = cfs_read_file('ceph.conf');
 
 	my $osd_network = $ceph_conf->{global}->{cluster_network};
@@ -364,8 +379,6 @@ __PACKAGE__->register_method ({
 	my $monstat = $rados->mon_command({ prefix => 'quorum_status' });
 
 	die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};
-	my $fsid = $monstat->{monmap}->{fsid};
-        $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
 
 	my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
 
@@ -470,7 +483,10 @@ __PACKAGE__->register_method ({
 		$test_disk_requirements->($disklist);
 
 		my $dev_class = $param->{'crush-device-class'};
-		my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
+		# 'create' allows detailed configuration of DB and WAL devices,
+		# 'batch' for easy creation of multiple OSDs per device
+		my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
+		my $cmd = ['ceph-volume', 'lvm', $create_mode];
 		push @$cmd, '--crush-device-class', $dev_class if $dev_class;
 
 		my $devname = $devs->{dev}->{name};
@@ -504,8 +520,11 @@ __PACKAGE__->register_method ({
 		    push @$cmd, "--block.$type", $part_or_lv;
 		}
 
-		push @$cmd, '--data', $devpath;
+		push @$cmd, '--data' if $create_mode eq 'create';
+		push @$cmd, $devpath;
 		push @$cmd, '--dmcrypt' if $param->{encrypted};
+		push @$cmd, '--osds-per-device', $param->{'osds-per-device'}, '--yes'
+		    if $create_mode eq 'batch';
 
 		PVE::Diskmanage::wipe_blockdev($devpath);
 
-- 
2.30.2
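
Not part of the patch itself, but for illustration: with this applied,
creating four OSDs on a single NVMe drive via the API could look like
the following (node name and device path are placeholders):

    pvesh create /nodes/<node>/ceph/osd --dev /dev/nvme0n1 --osds-per-device 4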

Thread overview: 4+ messages
2023-04-18 12:26 Aaron Lauterer [this message]
2023-08-21  8:20 ` Fiona Ebner
2023-08-21 10:51   ` Aaron Lauterer
2023-08-21 11:27     ` Fiona Ebner
