public inbox for pve-user@lists.proxmox.com
* [PVE-User] DeviceMapper devices get filtered by Proxmox
@ 2023-07-20 12:21 Uwe Sauter
  2023-07-25  7:24 ` Uwe Sauter
       [not found] ` <dc743429b8e92c12ec74c8844605f4b1@antreich.com>
  0 siblings, 2 replies; 4+ messages in thread
From: Uwe Sauter @ 2023-07-20 12:21 UTC (permalink / raw)
  To: PVE User List

Dear all,

I'd like to use some existing hardware to create a Ceph cluster. Unfortunately, all my external
disks are filtered out by the WebUI and by the pveceph command, which prevents me from using them as OSD disks.

My servers are connected to a SAS JBOD containing 60 SAS HDDs. The connection is made via two
dual-port SAS HBAs, each HBA connecting to both of the JBOD controllers.
This means that every external disk can be reached via four SAS paths, so I see 240 disks in
/dev. (Actually, I see 244 /dev/sd* devices, as there are also 4 internal disks for the OS…)
With multipathd, each disk, on top of its four entries in /dev, is available a fifth time as
/dev/mapper/${WWN}, which is a symlink to /dev/dm-${NUMBER}.

Both /dev/dm-${NUMBER} and /dev/mapper/${WWN} entries are filtered by Proxmox.

Is there a way to remove this filter or to define my own set of filters?

Using the /dev/sd* entries would be possible, but I would lose the redundancy provided by multipathd.

Regards,

	Uwe




* Re: [PVE-User] DeviceMapper devices get filtered by Proxmox
  2023-07-20 12:21 [PVE-User] DeviceMapper devices get filtered by Proxmox Uwe Sauter
@ 2023-07-25  7:24 ` Uwe Sauter
       [not found] ` <dc743429b8e92c12ec74c8844605f4b1@antreich.com>
  1 sibling, 0 replies; 4+ messages in thread
From: Uwe Sauter @ 2023-07-25  7:24 UTC (permalink / raw)
  To: PVE User List

So, I've been looking further into this and indeed, there seem to be very strict filters regarding
the block device names that Proxmox allows to be used.

/usr/share/perl5/PVE/Diskmanage.pm

512         # whitelisting following devices
513         # - hdX         ide block device
514         # - sdX         scsi/sata block device
515         # - vdX         virtIO block device
516         # - xvdX:       xen virtual block device
517         # - nvmeXnY:    nvme devices
518         # - cciss!cXnY  cciss devices
519         print Dumper($dev);
520         return if $dev !~ m/^(h|s|x?v)d[a-z]+$/ &&
521                   $dev !~ m/^nvme\d+n\d+$/ &&
522                   $dev !~ m/^cciss\!c\d+d\d+$/;

I don't understand all the consequences of allowing ALL ^dm-\d+$ devices, but with proper filtering
it should be possible to allow multipath devices (and given that there might be udev rules that
create additional symlinks below /dev, each device's name should be resolved to its canonical name
before checking).
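
Just to illustrate what I mean, here is a rough and untested sketch (the helper name and structure
are made up by me and are not how Diskmanage.pm is actually organized): resolve symlinks to the
canonical kernel name first, then additionally whitelist dm-* nodes:

use Cwd qw(abs_path);
use File::Basename qw(basename);

# Untested illustration only: resolve /dev symlinks (e.g. /dev/mapper/${WWN}
# or other udev-created aliases) to the canonical kernel name, then accept
# dm-* nodes on top of the patterns the current whitelist already allows.
sub is_acceptable_blockdev {
    my ($devpath) = @_;

    my $dev = basename(abs_path($devpath) // $devpath);

    return 1 if $dev =~ m/^(h|s|x?v)d[a-z]+$/;    # hdX, sdX, vdX, xvdX
    return 1 if $dev =~ m/^nvme\d+n\d+$/;         # nvmeXnY
    return 1 if $dev =~ m/^cciss\!c\d+d\d+$/;     # cciss!cXdY
    return 1 if $dev =~ m/^dm-\d+$/;              # device-mapper nodes (would still need a type check)
    return 0;
}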

To give an example, I have sdc, an internal disk that contains a BlueStore OSD:

# ls -l /dev/mapper/ | grep dm-0
lrwxrwxrwx 1 root root       7 Jul 22 23:23
ceph--aa6b72f7--f185--44b4--9922--6ae4e6278d10-osd--block--b0a60266--90c0--444f--ae12--328ecfebd87d
-> ../dm-0

Devices sde, sdaw, sdco, sdeg are the same SAS disk and form multipath device dm-2:

# ls -la /dev/mapper/ | grep dm-2$
lrwxrwxrwx  1 root root       7 Jul 22 23:23 35000cca26a7402e4 -> ../dm-2
# multipath -ll | grep -A6 'dm-2 '
35000cca26a7402e4 dm-2 HGST,HUH721010AL5200
size=9.1T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 1:0:0:0  sde  8:64    active ready running
  |- 1:0:45:0 sdaw 67:0    active ready running
  |- 5:0:0:0  sdco 69:192  active ready running
  `- 5:0:45:0 sdeg 128:128 active ready running

Both dm-0 and dm-2 are device-mapper devices, but it is possible to tell them apart by their target type:

# dmsetup status /dev/dm-0
0 3750739968 linear

# dmsetup status /dev/dm-2
0 19532873728 multipath 2 0 0 0 1 1 A 0 4 2 8:64 A 0 0 1 67:0 A 0 0 1 69:192 A 0 0 1 128:128 A 0 0 1
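
Filtering on that output could look something like this (again just an untested sketch, the
function name is made up by me):

# Sketch: treat a dm device as a multipath map only if dmsetup reports a
# "multipath" target. dmsetup prints "<start> <length> <target type> ..."
# per table line.
sub dm_target_is_multipath {
    my ($dm) = @_;    # kernel name, e.g. 'dm-2'

    my $status = `dmsetup status /dev/$dm 2>/dev/null`;

    return (($status // '') =~ /^\d+\s+\d+\s+multipath\b/m) ? 1 : 0;
}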

Another, possibly less exact, way to filter would be to check whether /sys/block/dm-X/slaves/ contains more than one entry.

# ls -l /sys/block/dm-0/slaves
total 0
lrwxrwxrwx 1 root root 0 Jul 25 09:01 sdc ->
../../../../pci0000:00/0000:00:11.4/ata3/host3/target3:0:0/3:0:0:0/block/sdc

# ls -l /sys/block/dm-2/slaves
total 0
lrwxrwxrwx 1 root root 0 Jul 25 09:03 sdaw ->
../../../../pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:1/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:45/1:0:45:0/block/sdaw
lrwxrwxrwx 1 root root 0 Jul 25 09:03 sdco ->
../../../../pci0000:80/0000:80:03.0/0000:83:00.0/host5/port-5:0/expander-5:0/port-5:0:0/end_device-5:0:0/target5:0:0/5:0:0:0/block/sdco
lrwxrwxrwx 1 root root 0 Jul 25 09:03 sde ->
../../../../pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:0/end_device-1:0:0/target1:0:0/1:0:0:0/block/sde
lrwxrwxrwx 1 root root 0 Jul 25 09:03 sdeg ->
../../../../pci0000:80/0000:80:03.0/0000:83:00.0/host5/port-5:1/expander-5:1/port-5:1:0/end_device-5:1:0/target5:0:45/5:0:45:0/block/sdeg
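
As a sysfs-only variant (also just a sketch, and with the caveat that e.g. a striped LV has more
than one slave as well):

# Sketch: count the entries in /sys/block/dm-X/slaves; a multipath map has
# one slave per path, so more than one slave is a hint (not proof).
sub dm_slave_count {
    my ($dm) = @_;    # kernel name, e.g. 'dm-2'

    opendir(my $dh, "/sys/block/$dm/slaves") or return 0;
    my @slaves = grep { !/^\.\.?$/ } readdir($dh);
    closedir($dh);

    return scalar @slaves;
}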


Is there any chance that multipath devices will be supported by PVE in the near future?
I'd be willing to test on my non-production system…


Regards,

	Uwe


Am 20.07.23 um 14:21 schrieb Uwe Sauter:
> Dear all,
> 
> I'd like to use some existing hardware to create a Ceph cluster. Unfortunately, all my external
> disks are filtered out by the WebUI and by the pveceph command, which prevents me from using them as OSD disks.
> 
> My servers are connected to a SAS JBOD containing 60 SAS HDDs. The connection is made via two
> dual-port SAS HBAs, each HBA connecting to both of the JBOD controllers.
> This means that every external disk can be reached via four SAS paths, so I see 240 disks in
> /dev. (Actually, I see 244 /dev/sd* devices, as there are also 4 internal disks for the OS…)
> With multipathd, each disk, on top of its four entries in /dev, is available a fifth time as
> /dev/mapper/${WWN}, which is a symlink to /dev/dm-${NUMBER}.
> 
> Both /dev/dm-${NUMBER} and /dev/mapper/${WWN} entries are filtered by Proxmox.
> 
> Is there a way to remove this filter or to define my own set of filters?
> 
> Using the /dev/sd* entries would be possible, but I would lose the redundancy provided by multipathd.
> 
> Regards,
> 
> 	Uwe





* Re: [PVE-User] DeviceMapper devices get filtered by Proxmox
       [not found] ` <dc743429b8e92c12ec74c8844605f4b1@antreich.com>
@ 2023-07-25 11:48   ` Uwe Sauter
       [not found]   ` <b75e2575243c3366cdeb52e073947566@antreich.com>
  1 sibling, 0 replies; 4+ messages in thread
From: Uwe Sauter @ 2023-07-25 11:48 UTC (permalink / raw)
  To: Alwin Antreich, Proxmox VE user list

Hi Alwin,

Am 25.07.23 um 12:40 schrieb Alwin Antreich:
> Hi Uwe,
> 
> July 25, 2023 9:24 AM, "Uwe Sauter" <uwe.sauter.de@gmail.com> wrote:
> 
>> So, I've been looking further into this and indeed, there seem to be very strict filters regarding
>> the block device names that Proxmox allows to be used.
>>
>> /usr/share/perl5/PVE/Diskmanage.pm
>>
>> 512 # whitelisting following devices
>> 513 # - hdX ide block device
>> 514 # - sdX scsi/sata block device
>> 515 # - vdX virtIO block device
>> 516 # - xvdX: xen virtual block device
>> 517 # - nvmeXnY: nvme devices
>> 518 # - cciss!cXnY cciss devices
>> 519 print Dumper($dev);
>> 520 return if $dev !~ m/^(h|s|x?v)d[a-z]+$/ &&
>> 521 $dev !~ m/^nvme\d+n\d+$/ &&
>> 522 $dev !~ m/^cciss\!c\d+d\d+$/;
>>
>> I don't understand all the consequences of allowing ALL ^dm-\d+$ devices, but with proper filtering
>> it should be possible to allow multipath devices (and given that there might be udev rules that
>> create additional symlinks below /dev, each device's name should be resolved to its canonical name
>> before checking).
> It is also a matter of Ceph support [0]. Aside from the extra complexity, using that many HDDs is not a good use case for virtualization. And HDDs definitely need the DB/WAL on a separate device (60x disks -> 5x NVMe).

Well, if the documentation is to be trusted, there has been multipath support since Octopus.
My use case is not hyper-converged virtualization; I am simply using Proxmox because of its good UI
and Ceph integration (and because it does not rely on containers to deploy Ceph).

I am aware that HDDs will need some amount of flash but I do have a couple of SAS-SSDs at hand that
I can put into the JBODs. And currently all this is just a proof of concept.

> Best to set it up with ceph-volume directly. See the forum post [1] for the experience of other users.

Thanks for the link, though I have to agree with the forum members' argument that multipath is an
enterprise feature that should be supported by an enterprise-class virtualization solution.


Best,

	Uwe

> Cheers,
> Alwin
> 
> [0] https://docs.ceph.com/en/latest/ceph-volume/lvm/prepare/#multipath-support
> [1] https://forum.proxmox.com/threads/ceph-with-multipath.70813/





* Re: [PVE-User] DeviceMapper devices get filtered by Proxmox
       [not found]   ` <b75e2575243c3366cdeb52e073947566@antreich.com>
@ 2023-07-26  8:37     ` Uwe Sauter
  0 siblings, 0 replies; 4+ messages in thread
From: Uwe Sauter @ 2023-07-26  8:37 UTC (permalink / raw)
  To: Alwin Antreich, Proxmox VE user list

Good morning Alwin,

>> Well, if the documentation is to be trusted, there has been multipath support since Octopus.
>> My use case is not hyper-converged virtualization; I am simply using Proxmox because of its good UI
>> and Ceph integration (and because it does not rely on containers to deploy Ceph).
> I understand, though cephadm isn't that horrible and there are still other ceph solutions out there. ;)

If you have air-gapped systems and no containers in your environment, using cephadm would require a
whole lot of effort just to make the containers available. On the other hand, Proxmox provides this
very nice tool to mirror repositories… just saying.

>> I am aware that HDDs will need some amount of flash but I do have a couple of SAS-SSDs at hand that
>> I can put into the JBODs. And currently all this is just a proof of concept.
> Yeah, but the ratio is 4 DB/WAL per SSD, as opposed to 12:1 for an NVMe.
> To add, you will need 512 GB of RAM for the OSDs alone (60 x 8 GB = 480 GB) and at least a 32C/64T CPU. Probably some 2x 25 GbE NICs too, depending on the use case.

For a PoC, SSDs and a smaller amount of RAM should be OK. The servers do have 40 cores and 2x 25 GbE,
so that isn't the problem. And once we see whether Ceph fits the rest of our environment, we would
invest in properly sized hardware.

> Just saying, there are certain expectations with that many disks in one node.
>  
>> Thanks for the link, though I have to agree with the forum members' argument that multipath is an
>> enterprise feature that should be supported by an enterprise-class virtualization solution.
> Well, multipath is supported. Just not in combination with Ceph. And PVE is not a storage product (yet).

I need to disagree on that one. The WebUI disk overview does not show the multipath devices, only
the member disks. Yes, there is a comment stating that a device is a multipath member, but there is
no way to select the multipath device itself. So, from a user's point of view, PVE does not support
multipath; it merely recognizes multipath members.
And the Create ZFS and Create LVM Volume Group pop-ups show neither multipath members nor
multipath devices.

Thanks

	Uwe




