From: Filip Schauer
To: Wolfgang Bumiller
Cc: pve-devel@lists.proxmox.com
Date: Thu, 2 Nov 2023 15:28:22 +0100
Message-ID: <5e29095a-a07e-ef36-22e9-90b0a2f78f90@proxmox.com>
Subject: Re: [pve-devel] [PATCH v2 container 1/1] Add device passthrough

On 30/10/2023 14:34, Wolfgang Bumiller wrote:
> On Tue, Oct 24, 2023 at 02:55:53PM +0200, Filip Schauer wrote:
>> Add a dev[n] argument to the container config to pass devices through to
>> a container. A device can be passed by its path. Alternatively a mapped
>> USB device can be passed through with usbmapping=.
>>
>> Signed-off-by: Filip Schauer
>> ---
>>   src/PVE/LXC.pm        | 34 +++++++++++++++++++++++-
>>   src/PVE/LXC/Config.pm | 60 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 93 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
>> index c9b5ba7..a3ddb62 100644
>> --- a/src/PVE/LXC.pm
>> +++ b/src/PVE/LXC.pm
>> @@ -5,7 +5,8 @@ use warnings;
>>
>>  use Cwd qw();
>>  use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
>> -use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY);
>> +use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
>> +use File::Basename;
>>  use File::Path;
>>  use File::Spec;
>>  use IO::Poll qw(POLLIN POLLHUP);
>> @@ -639,6 +640,37 @@ sub update_lxc_config {
>>          $raw .= "lxc.mount.auto = sys:mixed\n";
>>      }
>>
>> +    # Clear passthrough directory from previous run
>> +    my $passthrough_dir = "/var/lib/lxc/$vmid/passthrough";
>> +    File::Path::rmtree($passthrough_dir);
> I think we need to make a few changes here.
>
> First: we don't necessarily need this directory.
> Having a device list would certainly be nice, but it makes more sense to
> just have a file we can easily parse (possibly even just a json hash),
> like the `devices` file we already create in the pre-start hook, except
> prepared *for* the pre-start hook, which *should* be able to just
> `mknod` the devices right into the container's `/dev` on startup.

Device nodes created with mknod in the container's /dev directory from the
pre-start hook are not visible inside the container once it has fully
started, whereas creating a device node at a different path inside the
container works fine. It seems that LXC mounts over the /dev directory
after the pre-start hook has run.

This could be solved by calling mknod in lxc-pve-autodev-hook instead, but
that does not work for unprivileged containers, which lack the capability
to mknod device nodes.

So, short of modifying LXC, are bind mounts our only option, or am I
overlooking something?
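For comparison, the bind-mount route only needs two config lines per device, both generated before the container starts (which also covers the `lxc.cgroup2.*` requirement mentioned in the review). A minimal sketch, assuming a 64-bit Linux host: `passthrough_config_lines`, `dev_major` and `dev_minor` are hypothetical names, and the major/minor decoding follows the glibc dev_t layout only so that the sketch runs without the PVE modules:

```perl
use strict;
use warnings;
use Fcntl qw(:mode);

# Decode a Linux dev_t (field 6 of stat()) into major/minor numbers,
# following the bit layout used by glibc's major()/minor() macros.
sub dev_major {
    my ($dev) = @_;
    return (($dev >> 8) & 0xfff) | (($dev >> 32) & 0xfffff000);
}

sub dev_minor {
    my ($dev) = @_;
    return ($dev & 0xff) | (($dev >> 12) & 0xffffff00);
}

# Hypothetical helper: build the cgroup2 allow rule and the bind-mount entry
# for one device. $mode and $rdev are fields 2 and 6 of stat() on the host
# node, $source is the host path to bind-mount and $target is the path
# relative to the container root (e.g. 'dev/ttyUSB0').
sub passthrough_config_lines {
    my ($mode, $rdev, $source, $target) = @_;

    die "$source is not a device node\n"
        if !(S_ISBLK($mode) || S_ISCHR($mode));

    my $type = S_ISBLK($mode) ? 'b' : 'c';
    my $major = dev_major($rdev);
    my $minor = dev_minor($rdev);

    return "lxc.cgroup2.devices.allow = $type $major:$minor rw\n"
        . "lxc.mount.entry = $source $target none bind,create=file\n";
}
```

On Linux, passing `(stat('/dev/null'))[2, 6]` with source `/dev/null` and target `dev/null` should yield an allow rule for `c 1:3`. Binding the host node directly like this avoids the lingering nodes in /var, but in an unprivileged container the node keeps its host ownership and shows up as nobody/nogroup, which is what the mknod + chown step in the patch works around.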

> We'd also avoid "lingering" device nodes with potentially harmful
> uid/permissions in /var, which is certainly better from a security POV.
>
> But note that we do need the `lxc.cgroup2.*` entries before starting the
> container in order to ensure the devices cgroup has the right
> permissions.
>
>> +
>> +    PVE::LXC::Config->foreach_passthrough_device($conf, sub {
>> +        my ($key, $sanitized_path) = @_;
>> +
>> +        my $absolute_path = "/$sanitized_path";
>> +        my ($mode, $rdev) = (stat($absolute_path))[2, 6];
>> +        die "Could not find major and minor ids of device $absolute_path.\n"
>> +            unless ($mode && $rdev);
>> +
>> +        my $major = PVE::Tools::dev_t_major($rdev);
>> +        my $minor = PVE::Tools::dev_t_minor($rdev);
>> +        my $device_type_char = S_ISBLK($mode) ? 'b' : 'c';
>> +        my $passthrough_device_path = "$passthrough_dir/$sanitized_path";
>> +        File::Path::make_path(dirname($passthrough_device_path));
>> +        PVE::Tools::run_command([
>> +            '/usr/bin/mknod',
>> +            '-m', '0660',
>> +            $passthrough_device_path,
>> +            $device_type_char,
>> +            $major,
>> +            $minor
>> +        ]);
> It's probably worth adding a helper for the mknod syscall to
> `PVE::Tools`, there are a bunch of syscalls in there already.
>
>> +        chown 100000, 100000, $passthrough_device_path if ($unprivileged);
> ^ This isn't necessarily the correct id. Users may have custom id
> mappings.
> `PVE::LXC::parse_id_maps($conf)` returns the mapping alongside the root
> uid and gid. (See for example `sub mount_all` for how it's used.)
>
>> +
>> +        $raw .= "lxc.cgroup2.devices.allow = $device_type_char $major:$minor rw\n";
>> +        $raw .= "lxc.mount.entry = $passthrough_device_path $sanitized_path none bind,create=file\n";
>> +    });
>> +
>>      # WARNING: DO NOT REMOVE this without making sure that loop device nodes
>>      # cannot be exposed to the container with r/w access (cgroup perms).
>>      # When this is enabled mounts will still remain in the monitor's namespace
>> diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
>> index 56e1f10..edd813e 100644
>> --- a/src/PVE/LXC/Config.pm
>> +++ b/src/PVE/LXC/Config.pm
>> @@ -29,6 +29,7 @@ mkdir $lockdir;
>>  mkdir "/etc/pve/nodes/$nodename/lxc";
>>  my $MAX_MOUNT_POINTS = 256;
>>  my $MAX_UNUSED_DISKS = $MAX_MOUNT_POINTS;
>> +my $MAX_DEVICES = 256;
>>
>>  # BEGIN implemented abstract methods from PVE::AbstractConfig
>>
>> @@ -908,6 +909,49 @@ for (my $i = 0; $i < $MAX_UNUSED_DISKS; $i++) {
>>      }
>>  }
>>
>> +PVE::JSONSchema::register_format('pve-lxc-dev-string', \&verify_lxc_dev_string);
>> +sub verify_lxc_dev_string {
>> +    my ($dev, $noerr) = @_;
>> +
>> +    if (
>> +        $dev =~m@/\.\.?/@ ||
>> +        $dev =~m@/\.\.?$@ ||
>> +        $dev !~ m!^/dev/!
>> +    ) {
>> +        return undef if $noerr;
>> +        die "$dev is not a valid device path\n";
>> +    }
>> +
>> +    return $dev;
>> +}
>> +
>> +my $dev_desc = {
>> +    path => {
>> +        optional => 1,
>> +        type => 'string',
>> +        default_key => 1,
>> +        format => 'pve-lxc-dev-string',
>> +        format_description => 'Path',
>> +        description => 'Device to pass through to the container',
>> +        verbose_description => 'Path to the device to pass through to the container'
>> +    },
>> +    usbmapping => {
>> +        optional => 1,
>> +        type => 'string',
>> +        format => 'pve-configid',
>> +        format_description => 'mapping-id',
>> +        description => 'The ID of a cluster wide USB mapping.'
>> +    }
>> +};
>> +
>> +for (my $i = 0; $i < $MAX_DEVICES; $i++) {
>> +    $confdesc->{"dev$i"} = {
>> +        optional => 1,
>> +        type => 'string', format => $dev_desc,
>> +        description => "Device to pass through to the container",
>> +    }
>> +}
>> +
>>  sub parse_pct_config {
>>      my ($filename, $raw, $strict) = @_;
>>
>> @@ -1255,6 +1299,22 @@ sub parse_volume {
>>      return;
>>  }
>>
>> +sub parse_device {
>> +    my ($class, $device_string, $noerr) = @_;
>> +
>> +    my $res;
>> +    eval { $res = PVE::JSONSchema::parse_property_string($dev_desc, $device_string) };
>> +    if ($@) {
>> +        return undef if $noerr;
>> +        die $@;
>> +    }
>> +
>> +    die "Either path or usbmapping has to be defined"
>> +        unless (defined($res->{path}) || defined($res->{usbmapping}));
>> +
>> +    return $res;
>> +}
>> +
>>  sub print_volume {
>>      my ($class, $key, $volume) = @_;
>>
>> --
>> 2.39.2
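On the hard-coded 100000 in the chown call: that value only matches the default unprivileged mapping (`u 0 100000 65536`). The standalone sketch below illustrates what resolving container root against custom `lxc.idmap`-style entries amounts to. `map_to_host_id` is a hypothetical helper used for illustration only; in the patch itself, the root uid and gid returned by `PVE::LXC::parse_id_maps($conf)` are what should be used.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PVE): map a container-side uid or gid to
# the host id using lxc.idmap-style entries "<u|g> <ct-start> <host-start> <count>".
# Returns undef if no entry of the requested type covers the id.
sub map_to_host_id {
    my ($entries, $type, $ct_id) = @_;

    for my $entry (@$entries) {
        my ($t, $ct_start, $host_start, $count) = split(' ', $entry);
        next if $t ne $type;
        # The entry covers container ids [$ct_start, $ct_start + $count).
        if ($ct_id >= $ct_start && $ct_id < $ct_start + $count) {
            return $host_start + ($ct_id - $ct_start);
        }
    }

    return undef;
}
```

With the default mapping, container root resolves to host id 100000; with a custom mapping such as `u 0 200000 65536`, it resolves to 200000 instead, which a fixed chown would get wrong.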