all lists on lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH container v4 0/4] implement device hotplug
@ 2025-07-28 12:47 Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

For now this only includes adding devices to a running container.
(not removing or editing existing devices)

Changes since v3:
* Rebase onto new master (b3163a6125cb) and adopt new perltidy format
* Validate device with get_device_mode_and_rdev helper

Changes since v2:
* Remove superfluous comments stating the obvious
* Ensure that the path in the container where the device node is placed
  exists
* Split long lines

Changes since v1:
* Reduce code repetition between lxc-pve-prestart-hook and
  device_passthrough_hotplug by extracting passthrough device node
  creation to a helper.

Filip Schauer (4):
  extract apparmor profile & namespace switch to a helper
  extract passthrough device node creation to a helper
  config: support printing a device
  implement device hotplug

 src/PVE/LXC.pm            | 186 +++++++++++++++++++++++++++++++-------
 src/PVE/LXC/Config.pm     |  25 +++++
 src/lxc-pve-prestart-hook |  27 ++----
 3 files changed, 187 insertions(+), 51 deletions(-)

-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation " Filip Schauer
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC.pm | 75 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index 741bb33..47fb86a 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2125,15 +2125,42 @@ sub __mountpoint_mount {
     die "unsupported storage";
 }
 
+my $enter_mnt_ns_and_change_aa_profile = sub {
+    my ($mnt_ns, $aa_profile, $code_pre_changeprofile) = @_;
+
+    # Grab a file descriptor to our apparmor label file so we can change the profile
+    sysopen(my $aa_fd, "/proc/self/attr/current", O_WRONLY)
+        or die "failed to open '/proc/self/attr/current' for writing: $!\n";
+
+    # But switch namespaces first, to make sure the namespace switches aren't blocked by
+    # apparmor.
+    PVE::Tools::setns(fileno($mnt_ns), PVE::Tools::CLONE_NEWNS);
+    chdir('/')
+        or die "failed to change root directory within mount namespace: $!\n";
+
+    $code_pre_changeprofile->() if defined($code_pre_changeprofile);
+
+    # Now switch our apparmor profile:
+    my $data = "changeprofile $aa_profile";
+    my $data_written = syswrite($aa_fd, $data, length($data));
+    if (!defined($data_written) || $data_written != length($data)) {
+        die "failed to change apparmor profile: $!\n";
+    }
+    # Check errors on close as well:
+    close($aa_fd)
+        or die "failed to change apparmor profile (close() failed): $!\n";
+};
+
 sub mountpoint_hotplug : prototype($$$$$) {
     my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
 
     my (undef, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf);
 
-    # We do the rest in a fork with an unshared mount namespace, because:
-    #  -) change our papparmor profile to that of /usr/bin/lxc-start
-    #  -) we're now going to 'stage' # the mountpoint, then grab it, then move into the
-    #     container's namespace, then mount it.
+    # We do the rest in a fork with an unshared mount namespace:
+    #  -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
+    #     with move_mount privileges on every mount.
+    #  -) we're now going to 'stage' the mountpoint, then grab it, then move into the container's
+    #     namespace, then mount it.
 
     PVE::Tools::run_fork(sub {
         # Pin the container pid longer, we also need to get its monitor/parent:
@@ -2146,32 +2173,20 @@ sub mountpoint_hotplug : prototype($$$$$) {
         my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
         my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
 
-        # Grab a file descriptor to our apparmor label file so we can change into the 'lxc-start'
-        # profile to lower our privileges to the same level we have in the start hook:
-        sysopen(my $aa_fd, "/proc/self/attr/current", O_WRONLY)
-            or die "failed to open '/proc/self/attr/current' for writing: $!\n";
-        # But switch namespaces first, to make sure the namespace switches aren't blocked by
-        # apparmor.
-
-        # Change into the monitor's mount namespace. We "pin" the mount into the monitor's
-        # namespace for it to remain active there since the container will be able to unmount
-        # hotplugged mount points and thereby potentially free up loop devices, which is a security
-        # concern.
-        PVE::Tools::setns(fileno($monitor_mnt_ns), PVE::Tools::CLONE_NEWNS);
-        chdir('/')
-            or die "failed to change root directory within the monitor's mount namespace: $!\n";
-
-        my $dir = get_staging_mount_path($opt);
-
-        # Now switch our apparmor profile before mounting:
-        my $data = 'changeprofile pve-container-mounthotplug';
-        my $data_written = syswrite($aa_fd, $data, length($data));
-        if (!defined($data_written) || $data_written != length($data)) {
-            die "failed to change apparmor profile: $!\n";
-        }
-        # Check errors on close as well:
-        close($aa_fd)
-            or die "failed to change apparmor profile (close() failed): $!\n";
+        # -) Change into the monitor's mount namespace. We "pin" the mount into the monitor's
+        #    namespace for it to remain active there since the container will be able to unmount
+        #    hotplugged mount points and thereby potentially free up loop devices, which is a
+        #    security concern.
+        # -) Prepare the staging directory.
+        # -) Switch to the 'pve-container-mounthotplug' apparmor profile.
+        my $dir;
+        $enter_mnt_ns_and_change_aa_profile->(
+            $monitor_mnt_ns,
+            "pve-container-mounthotplug",
+            sub {
+                $dir = get_staging_mount_path($opt);
+            },
+        );
 
         my $mount_fd = mountpoint_stage($mp, $dir, $storage_cfg, undef, $root_uid, $root_gid);
 
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation to a helper
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 3/4] config: support printing a device Filip Schauer
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC.pm            | 27 +++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook | 27 +++++++--------------------
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index 47fb86a..e5c0714 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -6,6 +6,7 @@ use warnings;
 use Cwd qw();
 use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
 use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
+use File::Basename;
 use File::Path;
 use File::Spec;
 use IO::Poll qw(POLLIN POLLHUP);
@@ -1747,6 +1748,32 @@ sub run_with_loopdev {
     return $device;
 }
 
+sub create_passthrough_device_node($$$$$) {
+    my ($passthrough_dir, $device, $mode, $rdev, $id_map) = @_;
+
+    my $passthrough_device_path = $passthrough_dir . $device->{path};
+    File::Path::make_path(dirname($passthrough_device_path));
+    PVE::Tools::mknod($passthrough_device_path, $mode, $rdev)
+        or die("failed to mknod $passthrough_device_path: $!\n");
+
+    # Use chmod because umask could mess with the access mode on mknod
+    my $passthrough_mode = 0660;
+    $passthrough_mode = oct($device->{mode}) if defined($device->{mode});
+    chmod $passthrough_mode, $passthrough_device_path
+        or die "failed to chmod $passthrough_mode $passthrough_device_path: $!\n";
+
+    my $uid = 0;
+    my $gid = 0;
+    $uid = $device->{uid} if defined($device->{uid});
+    $gid = $device->{gid} if defined($device->{gid});
+    $uid = PVE::LXC::map_ct_uid_to_host($uid, $id_map);
+    $gid = PVE::LXC::map_ct_gid_to_host($gid, $id_map);
+    chown $uid, $gid, $passthrough_device_path
+        or die("failed to chown $uid:$gid $passthrough_device_path: $!\n");
+
+    return $passthrough_device_path;
+}
+
 # In scalar mode: returns a file handle to the deepest directory node.
 # In list context: returns a list of:
 #   * the deepest directory node
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index 0c23041..73125e1 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -142,26 +142,13 @@ PVE::LXC::Tools::lxc_hook(
             die "Could not get mode or device ID of $absolute_path\n"
                 if (!defined($mode) || !defined($rdev));
 
-            my $passthrough_device_path = $passthrough_dir . $absolute_path;
-            File::Path::make_path(dirname($passthrough_device_path));
-            PVE::Tools::mknod($passthrough_device_path, $mode, $rdev)
-                or die("failed to mknod $passthrough_device_path: $!\n");
-
-            # Use chmod because umask could mess with the access mode on mknod
-            my $passthrough_mode = 0660;
-            $passthrough_mode = oct($device->{mode}) if defined($device->{mode});
-            chmod $passthrough_mode, $passthrough_device_path
-                or die "failed to chmod $passthrough_mode $passthrough_device_path: $!\n";
-
-            # Set uid and gid of the device node
-            my $uid = 0;
-            my $gid = 0;
-            $uid = $device->{uid} if defined($device->{uid});
-            $gid = $device->{gid} if defined($device->{gid});
-            $uid = PVE::LXC::map_ct_uid_to_host($uid, $id_map);
-            $gid = PVE::LXC::map_ct_gid_to_host($gid, $id_map);
-            chown $uid, $gid, $passthrough_device_path
-                or die("failed to chown $uid:$gid $passthrough_device_path: $!\n");
+            PVE::LXC::create_passthrough_device_node(
+                $passthrough_dir,
+                $device,
+                $mode,
+                $rdev,
+                $id_map,
+            );
 
             push @$passthrough_devices, [$absolute_path, $mode, $rdev];
         };
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 3/4] config: support printing a device
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation " Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
  2025-07-30 11:40 ` [pve-devel] applied-series: [PATCH container v4 0/4] " Thomas Lamprecht
  4 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC/Config.pm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 7bdb8b9..7aa6263 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1435,6 +1435,12 @@ sub print_volume {
     return $class->print_ct_mountpoint($volume, $key eq 'rootfs');
 }
 
+sub print_device {
+    my ($class, $info) = @_;
+
+    return PVE::JSONSchema::print_property_string($info, $dev_desc);
+}
+
 sub volid_key {
     my ($class) = @_;
 
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH container v4 4/4] implement device hotplug
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
                   ` (2 preceding siblings ...)
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 3/4] config: support printing a device Filip Schauer
@ 2025-07-28 12:47 ` Filip Schauer
  2025-07-30 10:59   ` Thomas Lamprecht
  2025-07-30 11:40 ` [pve-devel] applied-series: [PATCH container v4 0/4] " Thomas Lamprecht
  4 siblings, 1 reply; 8+ messages in thread
From: Filip Schauer @ 2025-07-28 12:47 UTC (permalink / raw)
  To: pve-devel

This only includes adding devices to a running container. Removing or
editing existing devices is still not implemented.

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
 src/PVE/LXC.pm        | 84 ++++++++++++++++++++++++++++++++++++++++++-
 src/PVE/LXC/Config.pm | 19 ++++++++++
 2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index e5c0714..63fb5d1 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -5,7 +5,7 @@ use warnings;
 
 use Cwd qw();
 use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
-use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
+use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY O_CREAT :mode);
 use File::Basename;
 use File::Path;
 use File::Spec;
@@ -2178,6 +2178,88 @@ my $enter_mnt_ns_and_change_aa_profile = sub {
         or die "failed to change apparmor profile (close() failed): $!\n";
 };
 
+sub device_passthrough_hotplug : prototype($$$) {
+    my ($vmid, $conf, $dev) = @_;
+
+    my ($mode, $rdev) = PVE::LXC::Tools::get_device_mode_and_rdev($dev->{path});
+    my $device_type = S_ISBLK($mode) ? 'b' : 'c';
+    my $major = PVE::Tools::dev_t_major($rdev);
+    my $minor = PVE::Tools::dev_t_minor($rdev);
+
+    # We do the rest in a fork with an unshared mount namespace:
+    #  -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
+    #     with move_mount privileges on every mount.
+    #  -) create the device node, then grab it, create a file to bind mount the device node onto in
+    #     the container, switch to the container mount namespace, and move_mount the device node.
+
+    PVE::Tools::run_fork(sub {
+        # Pin the container pid longer, we also need to get its monitor/parent:
+        my ($ct_pid, $ct_pidfd) = open_lxc_pid($vmid)
+            or die "failed to open pidfd of container $vmid\'s init process\n";
+
+        my ($monitor_pid, $monitor_pidfd) = open_ppid($ct_pid)
+            or die "failed to open pidfd of container $vmid\'s monitor process\n";
+
+        my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
+        my $ct_user_ns = $get_container_namespace->($vmid, $ct_pid, 'user');
+        my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
+
+        # Enter monitor mount namespace and switch to 'pve-container-mounthotplug' apparmor profile.
+        $enter_mnt_ns_and_change_aa_profile->(
+            $monitor_mnt_ns, "pve-container-mounthotplug", undef,
+        );
+
+        my $id_map = (PVE::LXC::parse_id_maps($conf))[0];
+        my $passthrough_device_path = create_passthrough_device_node(
+            "/var/lib/lxc/$vmid/passthrough",
+            $dev, $mode, $rdev, $id_map,
+        );
+
+        my $srcfh = PVE::Tools::open_tree(
+            &AT_FDCWD,
+            $passthrough_device_path,
+            &OPEN_TREE_CLOEXEC | &OPEN_TREE_CLONE,
+        ) or die "open_tree() on passthrough device node failed: $!\n";
+
+        if ($conf->{unprivileged}) {
+            PVE::Tools::setns(fileno($ct_user_ns), PVE::Tools::CLONE_NEWUSER)
+                or die "failed to enter user namespace of container $vmid: $!\n";
+
+            POSIX::setuid(0);
+            POSIX::setgid(0);
+        }
+
+        # Create a regular file in the container to bind mount the device node onto.
+        my $device_path = "/proc/$ct_pid/root$dev->{path}";
+        File::Path::make_path(dirname($device_path));
+        sysopen(my $dstfh, $device_path, O_CREAT)
+            or die "failed to create '$device_path': $!\n";
+
+        # Enter the container mount namespace
+        PVE::Tools::setns(fileno($ct_mnt_ns), PVE::Tools::CLONE_NEWNS);
+        chdir('/')
+            or die "failed to change directory within the container's mount namespace: $!\n";
+
+        # Bind mount the device node into the container
+        PVE::Tools::move_mount(
+            fileno($srcfh),
+            '',
+            fileno($dstfh),
+            '',
+            &MOVE_MOUNT_F_EMPTY_PATH | &MOVE_MOUNT_T_EMPTY_PATH,
+        ) or die "move_mount failed: $!\n";
+    });
+
+    # Allow or deny device access with cgroup2
+    run_command(["lxc-cgroup", "-n", $vmid, "devices.deny", "$device_type $major:$minor w"])
+        if ($dev->{'deny-write'});
+
+    my $allow_perms = $dev->{'deny-write'} ? 'r' : 'rw';
+    run_command([
+        "lxc-cgroup", "-n", $vmid, "devices.allow", "$device_type $major:$minor $allow_perms",
+    ]);
+}
+
 sub mountpoint_hotplug : prototype($$$$$) {
     my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
 
diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 7aa6263..de963bc 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1613,6 +1613,13 @@ sub vmconfig_hotplug_pending {
                 $class->apply_pending_mountpoint($vmid, $conf, $opt, $storecfg, 1);
                 # apply_pending_mountpoint modifies the value if it creates a new disk
                 $value = $conf->{pending}->{$opt};
+            } elsif ($opt =~ m/^dev(\d+)$/) {
+                if (exists($conf->{$opt})) {
+                    die "skip\n"; # don't try to hotplug over existing dev
+                }
+
+                $class->apply_pending_device_passthrough($vmid, $conf, $opt, 1);
+                $value = $conf->{pending}->{$opt};
             } else {
                 die "skip\n"; # skip non-hotpluggable
             }
@@ -1732,6 +1739,18 @@ my $rescan_volume = sub {
     warn "Could not rescan volume size - $@\n" if $@;
 };
 
+sub apply_pending_device_passthrough {
+    my ($class, $vmid, $conf, $opt, $running) = @_;
+
+    my $dev = $class->parse_device($conf->{pending}->{$opt});
+    my $old = $conf->{$opt};
+    if ($running) {
+        die "skip\n" if defined($old); # TODO: editing a device passthrough
+        PVE::LXC::device_passthrough_hotplug($vmid, $conf, $dev);
+        $conf->{pending}->{$opt} = $class->print_device($dev);
+    }
+}
+
 sub apply_pending_mountpoint {
     my ($class, $vmid, $conf, $opt, $storecfg, $running) = @_;
 
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH container v4 4/4] implement device hotplug
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
@ 2025-07-30 10:59   ` Thomas Lamprecht
  2025-07-30 12:12     ` Filip Schauer
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Lamprecht @ 2025-07-30 10:59 UTC (permalink / raw)
  To: Proxmox VE development discussion, Filip Schauer

Am 28.07.25 um 14:49 schrieb Filip Schauer:
> This only includes adding devices to a running container. Removing or
> editing existing devices is still not implemented.
> 
> Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
> ---
>  src/PVE/LXC.pm        | 84 ++++++++++++++++++++++++++++++++++++++++++-
>  src/PVE/LXC/Config.pm | 19 ++++++++++
>  2 files changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index e5c0714..63fb5d1 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -5,7 +5,7 @@ use warnings;
>  
>  use Cwd qw();
>  use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
> -use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
> +use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY O_CREAT :mode);
>  use File::Basename;
>  use File::Path;
>  use File::Spec;
> @@ -2178,6 +2178,88 @@ my $enter_mnt_ns_and_change_aa_profile = sub {
>          or die "failed to change apparmor profile (close() failed): $!\n";
>  };
>  
> +sub device_passthrough_hotplug : prototype($$$) {
> +    my ($vmid, $conf, $dev) = @_;
> +
> +    my ($mode, $rdev) = PVE::LXC::Tools::get_device_mode_and_rdev($dev->{path});
> +    my $device_type = S_ISBLK($mode) ? 'b' : 'c';
> +    my $major = PVE::Tools::dev_t_major($rdev);
> +    my $minor = PVE::Tools::dev_t_minor($rdev);
> +
> +    # We do the rest in a fork with an unshared mount namespace:
> +    #  -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
> +    #     with move_mount privileges on every mount.
> +    #  -) create the device node, then grab it, create a file to bind mount the device node onto in
> +    #     the container, switch to the container mount namespace, and move_mount the device node.
> +
> +    PVE::Tools::run_fork(sub {
> +        # Pin the container pid longer, we also need to get its monitor/parent:
> +        my ($ct_pid, $ct_pidfd) = open_lxc_pid($vmid)
> +            or die "failed to open pidfd of container $vmid\'s init process\n";
> +
> +        my ($monitor_pid, $monitor_pidfd) = open_ppid($ct_pid)
> +            or die "failed to open pidfd of container $vmid\'s monitor process\n";
> +
> +        my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
> +        my $ct_user_ns = $get_container_namespace->($vmid, $ct_pid, 'user');
> +        my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
> +
> +        # Enter monitor mount namespace and switch to 'pve-container-mounthotplug' apparmor profile.
> +        $enter_mnt_ns_and_change_aa_profile->(
> +            $monitor_mnt_ns, "pve-container-mounthotplug", undef,
> +        );
> +
> +        my $id_map = (PVE::LXC::parse_id_maps($conf))[0];
> +        my $passthrough_device_path = create_passthrough_device_node(
> +            "/var/lib/lxc/$vmid/passthrough",
> +            $dev, $mode, $rdev, $id_map,
> +        );
> +
> +        my $srcfh = PVE::Tools::open_tree(
> +            &AT_FDCWD,
> +            $passthrough_device_path,
> +            &OPEN_TREE_CLOEXEC | &OPEN_TREE_CLONE,
> +        ) or die "open_tree() on passthrough device node failed: $!\n";
> +
> +        if ($conf->{unprivileged}) {
> +            PVE::Tools::setns(fileno($ct_user_ns), PVE::Tools::CLONE_NEWUSER)
> +                or die "failed to enter user namespace of container $vmid: $!\n";
> +
> +            POSIX::setuid(0);
> +            POSIX::setgid(0);
> +        }
> +
> +        # Create a regular file in the container to bind mount the device node onto.
> +        my $device_path = "/proc/$ct_pid/root$dev->{path}";
> +        File::Path::make_path(dirname($device_path));
> +        sysopen(my $dstfh, $device_path, O_CREAT)
> +            or die "failed to create '$device_path': $!\n";
> +
> +        # Enter the container mount namespace
> +        PVE::Tools::setns(fileno($ct_mnt_ns), PVE::Tools::CLONE_NEWNS);
> +        chdir('/')
> +            or die "failed to change directory within the container's mount namespace: $!\n";
> +
> +        # Bind mount the device node into the container
> +        PVE::Tools::move_mount(
> +            fileno($srcfh),
> +            '',
> +            fileno($dstfh),
> +            '',
> +            &MOVE_MOUNT_F_EMPTY_PATH | &MOVE_MOUNT_T_EMPTY_PATH,
> +        ) or die "move_mount failed: $!\n";
> +    });
> +
> +    # Allow or deny device access with cgroup2
> +    run_command(["lxc-cgroup", "-n", $vmid, "devices.deny", "$device_type $major:$minor w"])
> +        if ($dev->{'deny-write'});
> +
> +    my $allow_perms = $dev->{'deny-write'} ? 'r' : 'rw';
> +    run_command([
> +        "lxc-cgroup", "-n", $vmid, "devices.allow", "$device_type $major:$minor $allow_perms",
> +    ]);
> +}
> +
>  sub mountpoint_hotplug : prototype($$$$$) {
>      my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
>  
> diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
> index 7aa6263..de963bc 100644
> --- a/src/PVE/LXC/Config.pm
> +++ b/src/PVE/LXC/Config.pm
> @@ -1613,6 +1613,13 @@ sub vmconfig_hotplug_pending {
>                  $class->apply_pending_mountpoint($vmid, $conf, $opt, $storecfg, 1);
>                  # apply_pending_mountpoint modifies the value if it creates a new disk
>                  $value = $conf->{pending}->{$opt};
> +            } elsif ($opt =~ m/^dev(\d+)$/) {
> +                if (exists($conf->{$opt})) {
> +                    die "skip\n"; # don't try to hotplug over existing dev
> +                }
> +
> +                $class->apply_pending_device_passthrough($vmid, $conf, $opt, 1);

Below would do well with a comment, and actually I'm not really sure it's needed,
currently the print_device called in apply_pending_device_passthrough is just
printing the property string per the format, which we just parsed before,
so this should be 1:1 the same before after here? Unlike for mpX mountpoints,
where the formatted string might change. Or is this just for future preparation
for more complex handling/devices?

In anyway, can be fine as is now, it doesn't really hurt either, it's just
a little bit confusing and might be unnecessary, so maybe take another look
at this or provide some rationale.

> +                $value = $conf->{pending}->{$opt};
>              } else {
>                  die "skip\n"; # skip non-hotpluggable
>              }
> @@ -1732,6 +1739,18 @@ my $rescan_volume = sub {
>      warn "Could not rescan volume size - $@\n" if $@;
>  };
>  
> +sub apply_pending_device_passthrough {
> +    my ($class, $vmid, $conf, $opt, $running) = @_;
> +
> +    my $dev = $class->parse_device($conf->{pending}->{$opt});

parsed here

> +    my $old = $conf->{$opt};
> +    if ($running) {
> +        die "skip\n" if defined($old); # TODO: editing a device passthrough
> +        PVE::LXC::device_passthrough_hotplug($vmid, $conf, $dev);
> +        $conf->{pending}->{$opt} = $class->print_device($dev);

serialized 1:1 again here

> +    }
> +}
> +
>  sub apply_pending_mountpoint {
>      my ($class, $vmid, $conf, $opt, $storecfg, $running) = @_;
>  



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] applied-series: [PATCH container v4 0/4] implement device hotplug
  2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
                   ` (3 preceding siblings ...)
  2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
@ 2025-07-30 11:40 ` Thomas Lamprecht
  4 siblings, 0 replies; 8+ messages in thread
From: Thomas Lamprecht @ 2025-07-30 11:40 UTC (permalink / raw)
  To: pve-devel, Filip Schauer

On Mon, 28 Jul 2025 14:47:48 +0200, Filip Schauer wrote:
> For now this only includes adding devices to a running container.
> (not removing or editing existing devices)
> 
> Changes since v3:
> * Rebase onto new master (b3163a6125cb) and adopt new perltidy format
> * Validate device with get_device_mode_and_rdev helper
> 
> [...]

Applied, thanks! As mentioned in a separate reply, please take a look if
re-serializing the pending option is really sensible/useful here.

[1/4] extract apparmor profile & namespace switch to a helper
      commit: 4721f8e653444c9a069b8b235eb512a293a6bc2f
[2/4] extract passthrough device node creation to a helper
      commit: 2ea0dff57306133f33867ed6c7ce1a4b3a14ce4b
[3/4] config: support printing a device
      commit: a568007b2f1f03bd23aec213b1c1578383ef2cf4
[4/4] implement device hotplug
      commit: c702167e575888072f5cbb34d1a2d1828c26e1f3


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH container v4 4/4] implement device hotplug
  2025-07-30 10:59   ` Thomas Lamprecht
@ 2025-07-30 12:12     ` Filip Schauer
  0 siblings, 0 replies; 8+ messages in thread
From: Filip Schauer @ 2025-07-30 12:12 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox VE development discussion

On 30/07/2025 12:59, Thomas Lamprecht wrote:
> Below would do well with a comment, and actually I'm not really sure it's needed,
> currently the print_device called in apply_pending_device_passthrough is just
> printing the property string per the format, which we just parsed before,
> so this should be 1:1 the same before after here? Unlike for mpX mountpoints,
> where the formatted string might change. Or is this just for future preparation
> for more complex handling/devices?
>
> In anyway, can be fine as is now, it doesn't really hurt either, it's just
> a little bit confusing and might be unnecessary, so maybe take another look
> at this or provide some rationale.

This is indeed unnecessary. I will send a patch to simplify this.



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-07-30 12:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-28 12:47 [pve-devel] [PATCH container v4 0/4] implement device hotplug Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 2/4] extract passthrough device node creation " Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 3/4] config: support printing a device Filip Schauer
2025-07-28 12:47 ` [pve-devel] [PATCH container v4 4/4] implement device hotplug Filip Schauer
2025-07-30 10:59   ` Thomas Lamprecht
2025-07-30 12:12     ` Filip Schauer
2025-07-30 11:40 ` [pve-devel] applied-series: [PATCH container v4 0/4] " Thomas Lamprecht

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal