public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH container v4 0/4] implement device hotplug
@ 2025-07-28 12:47 Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

For now this only includes adding devices to a running container.
(not removing or editing existing devices)

Changes since v3:
* Rebase onto new master (b3163a6125cb) and adopt new perltidy format
* Validate device with get_device_mode_and_rdev helper

Changes since v2:
* Remove superfluous comments stating the obvious
* Ensure that the path in the container where the device node is placed
  exists
* Split long lines

Changes since v1:
* Reduce code repetition between lxc-pve-prestart-hook and
  device_passthrough_hotplug by extracting passthrough device node
  creation to a helper.

Filip Schauer (4):
  extract apparmor profile & namespace switch to a helper
  extract passthrough device node creation to a helper
  config: support printing a device
  implement device hotplug

 src/PVE/LXC.pm            | 186 +++++++++++++++++++++++++++++++-------
 src/PVE/LXC/Config.pm     |  25 +++++
 src/lxc-pve-prestart-hook |  27 ++----
 3 files changed, 187 insertions(+), 51 deletions(-)

-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation " Filip Schauer
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC.pm | 75 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index 741bb33..47fb86a 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2125,15 +2125,42 @@ sub __mountpoint_mount {
     die "unsupported storage";
 }
 
+my $enter_mnt_ns_and_change_aa_profile = sub {
+    my ($mnt_ns, $aa_profile, $code_pre_changeprofile) = @_;
+
+    # Grab a file descriptor to our apparmor label file so we can change the profile
+    sysopen(my $aa_fd, "/proc/self/attr/current", O_WRONLY)
+        or die "failed to open '/proc/self/attr/current' for writing: $!\n";
+
+    # But switch namespaces first, to make sure the namespace switches aren't blocked by
+    # apparmor.
+    PVE::Tools::setns(fileno($mnt_ns), PVE::Tools::CLONE_NEWNS);
+    chdir('/')
+        or die "failed to change root directory within mount namespace: $!\n";
+
+    $code_pre_changeprofile->() if defined($code_pre_changeprofile);
+
+    # Now switch our apparmor profile:
+    my $data = "changeprofile $aa_profile";
+    my $data_written = syswrite($aa_fd, $data, length($data));
+    if (!defined($data_written) || $data_written != length($data)) {
+        die "failed to change apparmor profile: $!\n";
+    }
+    # Check errors on close as well:
+    close($aa_fd)
+        or die "failed to change apparmor profile (close() failed): $!\n";
+};
+
 sub mountpoint_hotplug : prototype($$$$$) {
     my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
 
     my (undef, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf);
 
-    # We do the rest in a fork with an unshared mount namespace, because:
-    #  -) change our papparmor profile to that of /usr/bin/lxc-start
-    #  -) we're now going to 'stage' # the mountpoint, then grab it, then move into the
-    #     container's namespace, then mount it.
+    # We do the rest in a fork with an unshared mount namespace:
+    #  -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
+    #     with move_mount privileges on every mount.
+    #  -) we're now going to 'stage' the mountpoint, then grab it, then move into the container's
+    #     namespace, then mount it.
 
     PVE::Tools::run_fork(sub {
         # Pin the container pid longer, we also need to get its monitor/parent:
@@ -2146,32 +2173,20 @@ sub mountpoint_hotplug : prototype($$$$$) {
         my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
         my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
 
-        # Grab a file descriptor to our apparmor label file so we can change into the 'lxc-start'
-        # profile to lower our privileges to the same level we have in the start hook:
-        sysopen(my $aa_fd, "/proc/self/attr/current", O_WRONLY)
-            or die "failed to open '/proc/self/attr/current' for writing: $!\n";
-        # But switch namespaces first, to make sure the namespace switches aren't blocked by
-        # apparmor.
-
-        # Change into the monitor's mount namespace. We "pin" the mount into the monitor's
-        # namespace for it to remain active there since the container will be able to unmount
-        # hotplugged mount points and thereby potentially free up loop devices, which is a security
-        # concern.
-        PVE::Tools::setns(fileno($monitor_mnt_ns), PVE::Tools::CLONE_NEWNS);
-        chdir('/')
-            or die "failed to change root directory within the monitor's mount namespace: $!\n";
-
-        my $dir = get_staging_mount_path($opt);
-
-        # Now switch our apparmor profile before mounting:
-        my $data = 'changeprofile pve-container-mounthotplug';
-        my $data_written = syswrite($aa_fd, $data, length($data));
-        if (!defined($data_written) || $data_written != length($data)) {
-            die "failed to change apparmor profile: $!\n";
-        }
-        # Check errors on close as well:
-        close($aa_fd)
-            or die "failed to change apparmor profile (close() failed): $!\n";
+        # -) Change into the monitor's mount namespace. We "pin" the mount into the monitor's
+        #    namespace for it to remain active there since the container will be able to unmount
+        #    hotplugged mount points and thereby potentially free up loop devices, which is a
+        #    security concern.
+        # -) Prepare the staging directory.
+        # -) Switch to the 'pve-container-mounthotplug' apparmor profile.
+        my $dir;
+        $enter_mnt_ns_and_change_aa_profile->(
+            $monitor_mnt_ns,
+            "pve-container-mounthotplug",
+            sub {
+                $dir = get_staging_mount_path($opt);
+            },
+        );
 
         my $mount_fd = mountpoint_stage($mp, $dir, $storage_cfg, undef, $root_uid, $root_gid);
 
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation to a helper
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 3/4] config: support printing a device Filip Schauer
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC.pm            | 27 +++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook | 27 +++++++--------------------
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index 47fb86a..e5c0714 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -6,6 +6,7 @@ use warnings;
 use Cwd qw();
 use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
 use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
+use File::Basename;
 use File::Path;
 use File::Spec;
 use IO::Poll qw(POLLIN POLLHUP);
@@ -1747,6 +1748,32 @@ sub run_with_loopdev {
     return $device;
 }
 
+sub create_passthrough_device_node($$$$$) {
+    my ($passthrough_dir, $device, $mode, $rdev, $id_map) = @_;
+
+    my $passthrough_device_path = $passthrough_dir . $device->{path};
+    File::Path::make_path(dirname($passthrough_device_path));
+    PVE::Tools::mknod($passthrough_device_path, $mode, $rdev)
+        or die("failed to mknod $passthrough_device_path: $!\n");
+
+    # Use chmod because umask could mess with the access mode on mknod
+    my $passthrough_mode = 0660;
+    $passthrough_mode = oct($device->{mode}) if defined($device->{mode});
+    chmod $passthrough_mode, $passthrough_device_path
+        or die "failed to chmod $passthrough_mode $passthrough_device_path: $!\n";
+
+    my $uid = 0;
+    my $gid = 0;
+    $uid = $device->{uid} if defined($device->{uid});
+    $gid = $device->{gid} if defined($device->{gid});
+    $uid = PVE::LXC::map_ct_uid_to_host($uid, $id_map);
+    $gid = PVE::LXC::map_ct_gid_to_host($gid, $id_map);
+    chown $uid, $gid, $passthrough_device_path
+        or die("failed to chown $uid:$gid $passthrough_device_path: $!\n");
+
+    return $passthrough_device_path;
+}
+
 # In scalar mode: returns a file handle to the deepest directory node.
 # In list context: returns a list of:
 #   * the deepest directory node
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index 0c23041..73125e1 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -142,26 +142,13 @@ PVE::LXC::Tools::lxc_hook(
             die "Could not get mode or device ID of $absolute_path\n"
                 if (!defined($mode) || !defined($rdev));
 
-            my $passthrough_device_path = $passthrough_dir . $absolute_path;
-            File::Path::make_path(dirname($passthrough_device_path));
-            PVE::Tools::mknod($passthrough_device_path, $mode, $rdev)
-                or die("failed to mknod $passthrough_device_path: $!\n");
-
-            # Use chmod because umask could mess with the access mode on mknod
-            my $passthrough_mode = 0660;
-            $passthrough_mode = oct($device->{mode}) if defined($device->{mode});
-            chmod $passthrough_mode, $passthrough_device_path
-                or die "failed to chmod $passthrough_mode $passthrough_device_path: $!\n";
-
-            # Set uid and gid of the device node
-            my $uid = 0;
-            my $gid = 0;
-            $uid = $device->{uid} if defined($device->{uid});
-            $gid = $device->{gid} if defined($device->{gid});
-            $uid = PVE::LXC::map_ct_uid_to_host($uid, $id_map);
-            $gid = PVE::LXC::map_ct_gid_to_host($gid, $id_map);
-            chown $uid, $gid, $passthrough_device_path
-                or die("failed to chown $uid:$gid $passthrough_device_path: $!\n");
+            PVE::LXC::create_passthrough_device_node(
+                $passthrough_dir,
+                $device,
+                $mode,
+                $rdev,
+                $id_map,
+            );
 
             push @$passthrough_devices, [$absolute_path, $mode, $rdev];
         };
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 3/4] config: support printing a device
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation " Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
  2025-07-30 11:40 ` [pve-devel] applied-series: [PATCH container v4 0/4] " Thomas Lamprecht
  4 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC/Config.pm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 7bdb8b9..7aa6263 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1435,6 +1435,12 @@ sub print_volume {
     return $class->print_ct_mountpoint($volume, $key eq 'rootfs');
 }
 
+sub print_device {
+    my ($class, $info) = @_;
+
+    return PVE::JSONSchema::print_property_string($info, $dev_desc);
+}
+
 sub volid_key {
     my ($class) = @_;
 
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 4/4] implement device hotplug
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
                   ` (2 preceding siblings ...)
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 3/4] config: support printing a device Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-30 10:59   ` Thomas Lamprecht
  2025-07-30 11:40 ` [pve-devel] applied-series: [PATCH container v4 0/4] " Thomas Lamprecht
  4 siblings, 1 reply; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

This only includes adding devices to a running container. Removing or
editing existing devices is still not implemented.

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC.pm        | 84 ++++++++++++++++++++++++++++++++++++++++++-
 src/PVE/LXC/Config.pm | 19 ++++++++++
 2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index e5c0714..63fb5d1 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -5,7 +5,7 @@ use warnings;
 
 use Cwd qw();
 use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
-use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
+use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY O_CREAT :mode);
 use File::Basename;
 use File::Path;
 use File::Spec;
@@ -2178,6 +2178,88 @@ my $enter_mnt_ns_and_change_aa_profile = sub {
         or die "failed to change apparmor profile (close() failed): $!\n";
 };
 
+sub device_passthrough_hotplug : prototype($$$) {
+    my ($vmid, $conf, $dev) = @_;
+
+    my ($mode, $rdev) = PVE::LXC::Tools::get_device_mode_and_rdev($dev->{path});
+    my $device_type = S_ISBLK($mode) ? 'b' : 'c';
+    my $major = PVE::Tools::dev_t_major($rdev);
+    my $minor = PVE::Tools::dev_t_minor($rdev);
+
+    # We do the rest in a fork with an unshared mount namespace:
+    #  -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
+    #     with move_mount privileges on every mount.
+    #  -) create the device node, then grab it, create a file to bind mount the device node onto in
+    #     the container, switch to the container mount namespace, and move_mount the device node.
+
+    PVE::Tools::run_fork(sub {
+        # Pin the container pid longer, we also need to get its monitor/parent:
+        my ($ct_pid, $ct_pidfd) = open_lxc_pid($vmid)
+            or die "failed to open pidfd of container $vmid\'s init process\n";
+
+        my ($monitor_pid, $monitor_pidfd) = open_ppid($ct_pid)
+            or die "failed to open pidfd of container $vmid\'s monitor process\n";
+
+        my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
+        my $ct_user_ns = $get_container_namespace->($vmid, $ct_pid, 'user');
+        my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
+
+        # Enter monitor mount namespace and switch to 'pve-container-mounthotplug' apparmor profile.
+        $enter_mnt_ns_and_change_aa_profile->(
+            $monitor_mnt_ns, "pve-container-mounthotplug", undef,
+        );
+
+        my $id_map = (PVE::LXC::parse_id_maps($conf))[0];
+        my $passthrough_device_path = create_passthrough_device_node(
+            "/var/lib/lxc/$vmid/passthrough",
+            $dev, $mode, $rdev, $id_map,
+        );
+
+        my $srcfh = PVE::Tools::open_tree(
+            &AT_FDCWD,
+            $passthrough_device_path,
+            &OPEN_TREE_CLOEXEC | &OPEN_TREE_CLONE,
+        ) or die "open_tree() on passthrough device node failed: $!\n";
+
+        if ($conf->{unprivileged}) {
+            PVE::Tools::setns(fileno($ct_user_ns), PVE::Tools::CLONE_NEWUSER)
+                or die "failed to enter user namespace of container $vmid: $!\n";
+
+            POSIX::setuid(0);
+            POSIX::setgid(0);
+        }
+
+        # Create a regular file in the container to bind mount the device node onto.
+        my $device_path = "/proc/$ct_pid/root$dev->{path}";
+        File::Path::make_path(dirname($device_path));
+        sysopen(my $dstfh, $device_path, O_CREAT)
+            or die "failed to create '$device_path': $!\n";
+
+        # Enter the container mount namespace
+        PVE::Tools::setns(fileno($ct_mnt_ns), PVE::Tools::CLONE_NEWNS);
+        chdir('/')
+            or die "failed to change directory within the container's mount namespace: $!\n";
+
+        # Bind mount the device node into the container
+        PVE::Tools::move_mount(
+            fileno($srcfh),
+            '',
+            fileno($dstfh),
+            '',
+            &MOVE_MOUNT_F_EMPTY_PATH | &MOVE_MOUNT_T_EMPTY_PATH,
+        ) or die "move_mount failed: $!\n";
+    });
+
+    # Allow or deny device access with cgroup2
+    run_command(["lxc-cgroup", "-n", $vmid, "devices.deny", "$device_type $major:$minor w"])
+        if ($dev->{'deny-write'});
+
+    my $allow_perms = $dev->{'deny-write'} ? 'r' : 'rw';
+    run_command([
+        "lxc-cgroup", "-n", $vmid, "devices.allow", "$device_type $major:$minor $allow_perms",
+    ]);
+}
+
 sub mountpoint_hotplug : prototype($$$$$) {
     my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
 
diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 7aa6263..de963bc 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1613,6 +1613,13 @@ sub vmconfig_hotplug_pending {
                 $class->apply_pending_mountpoint($vmid, $conf, $opt, $storecfg, 1);
                 # apply_pending_mountpoint modifies the value if it creates a new disk
                 $value = $conf->{pending}->{$opt};
+            } elsif ($opt =~ m/^dev(\d+)$/) {
+                if (exists($conf->{$opt})) {
+                    die "skip\n"; # don't try to hotplug over existing dev
+                }
+
+                $class->apply_pending_device_passthrough($vmid, $conf, $opt, 1);
+                $value = $conf->{pending}->{$opt};
             } else {
                 die "skip\n"; # skip non-hotpluggable
             }
@@ -1732,6 +1739,18 @@ my $rescan_volume = sub {
     warn "Could not rescan volume size - $@\n" if $@;
 };
 
+sub apply_pending_device_passthrough {
+    my ($class, $vmid, $conf, $opt, $running) = @_;
+
+    my $dev = $class->parse_device($conf->{pending}->{$opt});
+    my $old = $conf->{$opt};
+    if ($running) {
+        die "skip\n" if defined($old); # TODO: editing a device passthrough
+        PVE::LXC::device_passthrough_hotplug($vmid, $conf, $dev);
+        $conf->{pending}->{$opt} = $class->print_device($dev);
+    }
+}
+
 sub apply_pending_mountpoint {
     my ($class, $vmid, $conf, $opt, $storecfg, $running) = @_;
 
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH container v4 4/4] implement device hotplug
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
@ 2025-07-30 10:59   ` Thomas Lamprecht
  2025-07-30 12:12     ` Filip Schauer
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Lamprecht @ 2025-07-30 10:59 UTC (permalink / raw)
  To: Proxmox VE development discussion, Filip Schauer

Am 28.07.25 um 14:49 schrieb Filip Schauer:
> This only includes adding devices to a running container. Removing or
> editing existing devices is still not implemented.
> 
> Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
> ---
>  src/PVE/LXC.pm        | 84 ++++++++++++++++++++++++++++++++++++++++++-
>  src/PVE/LXC/Config.pm | 19 ++++++++++
>  2 files changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index e5c0714..63fb5d1 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -5,7 +5,7 @@ use warnings;
>  
>  use Cwd qw();
>  use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
> -use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
> +use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY O_CREAT :mode);
>  use File::Basename;
>  use File::Path;
>  use File::Spec;
> @@ -2178,6 +2178,88 @@ my $enter_mnt_ns_and_change_aa_profile = sub {
>          or die "failed to change apparmor profile (close() failed): $!\n";
>  };
>  
> +sub device_passthrough_hotplug : prototype($$$) {
> +    my ($vmid, $conf, $dev) = @_;
> +
> +    my ($mode, $rdev) = PVE::LXC::Tools::get_device_mode_and_rdev($dev->{path});
> +    my $device_type = S_ISBLK($mode) ? 'b' : 'c';
> +    my $major = PVE::Tools::dev_t_major($rdev);
> +    my $minor = PVE::Tools::dev_t_minor($rdev);
> +
> +    # We do the rest in a fork with an unshared mount namespace:
> +    #  -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
> +    #     with move_mount privileges on every mount.
> +    #  -) create the device node, then grab it, create a file to bind mount the device node onto in
> +    #     the container, switch to the container mount namespace, and move_mount the device node.
> +
> +    PVE::Tools::run_fork(sub {
> +        # Pin the container pid longer, we also need to get its monitor/parent:
> +        my ($ct_pid, $ct_pidfd) = open_lxc_pid($vmid)
> +            or die "failed to open pidfd of container $vmid\'s init process\n";
> +
> +        my ($monitor_pid, $monitor_pidfd) = open_ppid($ct_pid)
> +            or die "failed to open pidfd of container $vmid\'s monitor process\n";
> +
> +        my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
> +        my $ct_user_ns = $get_container_namespace->($vmid, $ct_pid, 'user');
> +        my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
> +
> +        # Enter monitor mount namespace and switch to 'pve-container-mounthotplug' apparmor profile.
> +        $enter_mnt_ns_and_change_aa_profile->(
> +            $monitor_mnt_ns, "pve-container-mounthotplug", undef,
> +        );
> +
> +        my $id_map = (PVE::LXC::parse_id_maps($conf))[0];
> +        my $passthrough_device_path = create_passthrough_device_node(
> +            "/var/lib/lxc/$vmid/passthrough",
> +            $dev, $mode, $rdev, $id_map,
> +        );
> +
> +        my $srcfh = PVE::Tools::open_tree(
> +            &AT_FDCWD,
> +            $passthrough_device_path,
> +            &OPEN_TREE_CLOEXEC | &OPEN_TREE_CLONE,
> +        ) or die "open_tree() on passthrough device node failed: $!\n";
> +
> +        if ($conf->{unprivileged}) {
> +            PVE::Tools::setns(fileno($ct_user_ns), PVE::Tools::CLONE_NEWUSER)
> +                or die "failed to enter user namespace of container $vmid: $!\n";
> +
> +            POSIX::setuid(0);
> +            POSIX::setgid(0);
> +        }
> +
> +        # Create a regular file in the container to bind mount the device node onto.
> +        my $device_path = "/proc/$ct_pid/root$dev->{path}";
> +        File::Path::make_path(dirname($device_path));
> +        sysopen(my $dstfh, $device_path, O_CREAT)
> +            or die "failed to create '$device_path': $!\n";
> +
> +        # Enter the container mount namespace
> +        PVE::Tools::setns(fileno($ct_mnt_ns), PVE::Tools::CLONE_NEWNS);
> +        chdir('/')
> +            or die "failed to change directory within the container's mount namespace: $!\n";
> +
> +        # Bind mount the device node into the container
> +        PVE::Tools::move_mount(
> +            fileno($srcfh),
> +            '',
> +            fileno($dstfh),
> +            '',
> +            &MOVE_MOUNT_F_EMPTY_PATH | &MOVE_MOUNT_T_EMPTY_PATH,
> +        ) or die "move_mount failed: $!\n";
> +    });
> +
> +    # Allow or deny device access with cgroup2
> +    run_command(["lxc-cgroup", "-n", $vmid, "devices.deny", "$device_type $major:$minor w"])
> +        if ($dev->{'deny-write'});
> +
> +    my $allow_perms = $dev->{'deny-write'} ? 'r' : 'rw';
> +    run_command([
> +        "lxc-cgroup", "-n", $vmid, "devices.allow", "$device_type $major:$minor $allow_perms",
> +    ]);
> +}
> +
>  sub mountpoint_hotplug : prototype($$$$$) {
>      my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
>  
> diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
> index 7aa6263..de963bc 100644
> --- a/src/PVE/LXC/Config.pm
> +++ b/src/PVE/LXC/Config.pm
> @@ -1613,6 +1613,13 @@ sub vmconfig_hotplug_pending {
>                  $class->apply_pending_mountpoint($vmid, $conf, $opt, $storecfg, 1);
>                  # apply_pending_mountpoint modifies the value if it creates a new disk
>                  $value = $conf->{pending}->{$opt};
> +            } elsif ($opt =~ m/^dev(\d+)$/) {
> +                if (exists($conf->{$opt})) {
> +                    die "skip\n"; # don't try to hotplug over existing dev
> +                }
> +
> +                $class->apply_pending_device_passthrough($vmid, $conf, $opt, 1);

Below would do well with a comment, and actually I'm not really sure it's needed,
currently the print_device called in apply_pending_device_passthrough is just
printing the property string per the format, which we just parsed before,
so this should be 1:1 the same before after here? Unlike for mpX mountpoints,
where the formatted string might change. Or is this just for future preparation
for more complex handling/devices?

In anyway, can be fine as is now, it doesn't really hurt either, it's just
a little bit confusing and might be unnecessary, so maybe take another look
at this or provide some rationale.

> +                $value = $conf->{pending}->{$opt};
>              } else {
>                  die "skip\n"; # skip non-hotpluggable
>              }
> @@ -1732,6 +1739,18 @@ my $rescan_volume = sub {
>      warn "Could not rescan volume size - $@\n" if $@;
>  };
>  
> +sub apply_pending_device_passthrough {
> +    my ($class, $vmid, $conf, $opt, $running) = @_;
> +
> +    my $dev = $class->parse_device($conf->{pending}->{$opt});

parsed here

> +    my $old = $conf->{$opt};
> +    if ($running) {
> +        die "skip\n" if defined($old); # TODO: editing a device passthrough
> +        PVE::LXC::device_passthrough_hotplug($vmid, $conf, $dev);
> +        $conf->{pending}->{$opt} = $class->print_device($dev);

serialized 1:1 again here

> +    }
> +}
> +
>  sub apply_pending_mountpoint {
>      my ($class, $vmid, $conf, $opt, $storecfg, $running) = @_;
>  



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] applied-series: [PATCH container v4 0/4] implement device hotplug
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
                   ` (3 preceding siblings ...)
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
@ 2025-07-30 11:40 ` Thomas Lamprecht
  4 siblings, 0 replies; 8+ messages in thread
From: Thomas Lamprecht @ 2025-07-30 11:40 UTC (permalink / raw)
  To: pve-devel, Filip Schauer

On Mon, 28 Jul 2025 14:47:48 +0200, Filip Schauer wrote:
> For now this only includes adding devices to a running container.
> (not removing or editing existing devices)
> 
> Changes since v3:
> * Rebase onto new master (b3163a6125cb) and adopt new perltidy format
> * Validate device with get_device_mode_and_rdev helper
> 
> [...]

Applied, thanks! As mentioned in a separate reply, please take a look if
re-serializing the pending option is really sensible/useful here.

[1/4] extract apparmor profile & namespace switch to a helper
      commit: 4721f8e653444c9a069b8b235eb512a293a6bc2f
[2/4] extract passthrough device node creation to a helper
      commit: 2ea0dff57306133f33867ed6c7ce1a4b3a14ce4b
[3/4] config: support printing a device
      commit: a568007b2f1f03bd23aec213b1c1578383ef2cf4
[4/4] implement device hotplug
      commit: c702167e575888072f5cbb34d1a2d1828c26e1f3


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH container v4 4/4] implement device hotplug
  2025-07-30 10:59   ` Thomas Lamprecht
@ 2025-07-30 12:12     ` Filip Schauer
  0 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-30 12:12 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox VE development discussion

On 30/07/2025 12:59, Thomas Lamprecht wrote:
> Below would do well with a comment, and actually I'm not really sure it's needed,
> currently the print_device called in apply_pending_device_passthrough is just
> printing the property string per the format, which we just parsed before,
> so this should be 1:1 the same before after here? Unlike for mpX mountpoints,
> where the formatted string might change. Or is this just for future preparation
> for more complex handling/devices?
>
> In anyway, can be fine as is now, it doesn't really hurt either, it's just
> a little bit confusing and might be unnecessary, so maybe take another look
> at this or provide some rationale.

This is indeed unnecessary. I will send a patch to simplify this.



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-07-30 12:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation " Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 3/4] config: support printing a device Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
2025-07-30 10:59   ` Thomas Lamprecht
2025-07-30 12:12     ` Filip Schauer
2025-07-30 11:40 ` [pve-devel] applied-series: [PATCH container v4 0/4] " Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal