* [pve-devel] [PATCH container v2 0/4] implement device hotplug
@ 2025-04-23 12:56 Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Filip Schauer @ 2025-04-23 12:56 UTC (permalink / raw)
To: pve-devel
For now this only includes adding devices to a running container.
(not removing or editing existing devices)
Changes since v1:
* Reduce code repetition between lxc-pve-prestart-hook and
device_passthrough_hotplug by extracting passthrough device node
creation to a helper.
Filip Schauer (4):
extract apparmor profile & namespace switch to a helper
extract passthrough device node creation to a helper
config: support printing a device
implement device hotplug
src/PVE/LXC.pm | 174 +++++++++++++++++++++++++++++++-------
src/PVE/LXC/Config.pm | 25 ++++++
src/lxc-pve-prestart-hook | 21 +----
3 files changed, 169 insertions(+), 51 deletions(-)
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [pve-devel] [PATCH container v2 1/4] extract apparmor profile & namespace switch to a helper
2025-04-23 12:56 [pve-devel] [PATCH container v2 0/4] implement device hotplug Filip Schauer
@ 2025-04-23 12:56 ` Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 2/4] extract passthrough device node creation " Filip Schauer
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Filip Schauer @ 2025-04-23 12:56 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
src/PVE/LXC.pm | 71 +++++++++++++++++++++++++++++---------------------
1 file changed, 41 insertions(+), 30 deletions(-)
diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index a58c997..d929608 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2010,15 +2010,42 @@ sub __mountpoint_mount {
die "unsupported storage";
}
+my $enter_mnt_ns_and_change_aa_profile = sub {
+ my ($mnt_ns, $aa_profile, $code_pre_changeprofile) = @_;
+
+ # Grab a file descriptor to our apparmor label file so we can change the profile
+ sysopen(my $aa_fd, "/proc/self/attr/current", O_WRONLY)
+ or die "failed to open '/proc/self/attr/current' for writing: $!\n";
+
+ # But switch namespaces first, to make sure the namespace switches aren't blocked by
+ # apparmor.
+ PVE::Tools::setns(fileno($mnt_ns), PVE::Tools::CLONE_NEWNS);
+ chdir('/')
+ or die "failed to change root directory within mount namespace: $!\n";
+
+ $code_pre_changeprofile->() if defined($code_pre_changeprofile);
+
+ # Now switch our apparmor profile:
+ my $data = "changeprofile $aa_profile";
+ my $data_written = syswrite($aa_fd, $data, length($data));
+ if (!defined($data_written) || $data_written != length($data)) {
+ die "failed to change apparmor profile: $!\n";
+ }
+ # Check errors on close as well:
+ close($aa_fd)
+ or die "failed to change apparmor profile (close() failed): $!\n";
+};
+
sub mountpoint_hotplug :prototype($$$$$) {
my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
my (undef, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf);
- # We do the rest in a fork with an unshared mount namespace, because:
- # -) change our papparmor profile to that of /usr/bin/lxc-start
- # -) we're now going to 'stage' # the mountpoint, then grab it, then move into the
- # container's namespace, then mount it.
+ # We do the rest in a fork with an unshared mount namespace:
+ # -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
+ # with move_mount privileges on every mount.
+ # -) we're now going to 'stage' the mountpoint, then grab it, then move into the container's
+ # namespace, then mount it.
PVE::Tools::run_fork(sub {
# Pin the container pid longer, we also need to get its monitor/parent:
@@ -2031,32 +2058,16 @@ sub mountpoint_hotplug :prototype($$$$$) {
my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
- # Grab a file descriptor to our apparmor label file so we can change into the 'lxc-start'
- # profile to lower our privileges to the same level we have in the start hook:
- sysopen(my $aa_fd, "/proc/self/attr/current", O_WRONLY)
- or die "failed to open '/proc/self/attr/current' for writing: $!\n";
- # But switch namespaces first, to make sure the namespace switches aren't blocked by
- # apparmor.
-
- # Change into the monitor's mount namespace. We "pin" the mount into the monitor's
- # namespace for it to remain active there since the container will be able to unmount
- # hotplugged mount points and thereby potentially free up loop devices, which is a security
- # concern.
- PVE::Tools::setns(fileno($monitor_mnt_ns), PVE::Tools::CLONE_NEWNS);
- chdir('/')
- or die "failed to change root directory within the monitor's mount namespace: $!\n";
-
- my $dir = get_staging_mount_path($opt);
-
- # Now switch our apparmor profile before mounting:
- my $data = 'changeprofile pve-container-mounthotplug';
- my $data_written = syswrite($aa_fd, $data, length($data));
- if (!defined($data_written) || $data_written != length($data)) {
- die "failed to change apparmor profile: $!\n";
- }
- # Check errors on close as well:
- close($aa_fd)
- or die "failed to change apparmor profile (close() failed): $!\n";
+ # -) Change into the monitor's mount namespace. We "pin" the mount into the monitor's
+ # namespace for it to remain active there since the container will be able to unmount
+ # hotplugged mount points and thereby potentially free up loop devices, which is a
+ # security concern.
+ # -) Prepare the staging directory.
+ # -) Switch to the 'pve-container-mounthotplug' apparmor profile.
+ my $dir;
+ $enter_mnt_ns_and_change_aa_profile->($monitor_mnt_ns, "pve-container-mounthotplug", sub {
+ $dir = get_staging_mount_path($opt);
+ });
my $mount_fd = mountpoint_stage($mp, $dir, $storage_cfg, undef, $root_uid, $root_gid);
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [pve-devel] [PATCH container v2 2/4] extract passthrough device node creation to a helper
2025-04-23 12:56 [pve-devel] [PATCH container v2 0/4] implement device hotplug Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
@ 2025-04-23 12:56 ` Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 3/4] config: support printing a device Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 4/4] implement device hotplug Filip Schauer
3 siblings, 0 replies; 5+ messages in thread
From: Filip Schauer @ 2025-04-23 12:56 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
src/PVE/LXC.pm | 29 +++++++++++++++++++++++++++++
src/lxc-pve-prestart-hook | 21 +--------------------
2 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index d929608..d985b88 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -6,6 +6,7 @@ use warnings;
use Cwd qw();
use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
+use File::Basename;
use File::Path;
use File::Spec;
use IO::Poll qw(POLLIN POLLHUP);
@@ -1662,6 +1663,34 @@ sub run_with_loopdev {
return $device;
}
+sub create_passthrough_device_node($$$$$) {
+ my ($passthrough_dir, $device, $mode, $rdev, $id_map) = @_;
+
+ # Create the device node
+ my $passthrough_device_path = $passthrough_dir . $device->{path};
+ File::Path::make_path(dirname($passthrough_device_path));
+ PVE::Tools::mknod($passthrough_device_path, $mode, $rdev)
+ or die("failed to mknod $passthrough_device_path: $!\n");
+
+ # Use chmod because umask could mess with the access mode on mknod
+ my $passthrough_mode = 0660;
+ $passthrough_mode = oct($device->{mode}) if defined($device->{mode});
+ chmod $passthrough_mode, $passthrough_device_path
+ or die "failed to chmod $passthrough_mode $passthrough_device_path: $!\n";
+
+ # Set uid and gid of the device node
+ my $uid = 0;
+ my $gid = 0;
+ $uid = $device->{uid} if defined($device->{uid});
+ $gid = $device->{gid} if defined($device->{gid});
+ $uid = PVE::LXC::map_ct_uid_to_host($uid, $id_map);
+ $gid = PVE::LXC::map_ct_gid_to_host($gid, $id_map);
+ chown $uid, $gid, $passthrough_device_path
+ or die("failed to chown $uid:$gid $passthrough_device_path: $!\n");
+
+ return $passthrough_device_path;
+}
+
# In scalar mode: returns a file handle to the deepest directory node.
# In list context: returns a list of:
# * the deepest directory node
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index fdaead2..61dfb3b 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -144,26 +144,7 @@ PVE::LXC::Tools::lxc_hook('pre-start', 'lxc', sub {
die "Could not get mode or device ID of $absolute_path\n"
if (!defined($mode) || !defined($rdev));
- my $passthrough_device_path = $passthrough_dir . $absolute_path;
- File::Path::make_path(dirname($passthrough_device_path));
- PVE::Tools::mknod($passthrough_device_path, $mode, $rdev)
- or die("failed to mknod $passthrough_device_path: $!\n");
-
- # Use chmod because umask could mess with the access mode on mknod
- my $passthrough_mode = 0660;
- $passthrough_mode = oct($device->{mode}) if defined($device->{mode});
- chmod $passthrough_mode, $passthrough_device_path
- or die "failed to chmod $passthrough_mode $passthrough_device_path: $!\n";
-
- # Set uid and gid of the device node
- my $uid = 0;
- my $gid = 0;
- $uid = $device->{uid} if defined($device->{uid});
- $gid = $device->{gid} if defined($device->{gid});
- $uid = PVE::LXC::map_ct_uid_to_host($uid, $id_map);
- $gid = PVE::LXC::map_ct_gid_to_host($gid, $id_map);
- chown $uid, $gid, $passthrough_device_path
- or die("failed to chown $uid:$gid $passthrough_device_path: $!\n");
+ PVE::LXC::create_passthrough_device_node($passthrough_dir, $device, $mode, $rdev, $id_map);
push @$passthrough_devices, [$absolute_path, $mode, $rdev];
};
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [pve-devel] [PATCH container v2 3/4] config: support printing a device
2025-04-23 12:56 [pve-devel] [PATCH container v2 0/4] implement device hotplug Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 2/4] extract passthrough device node creation " Filip Schauer
@ 2025-04-23 12:56 ` Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 4/4] implement device hotplug Filip Schauer
3 siblings, 0 replies; 5+ messages in thread
From: Filip Schauer @ 2025-04-23 12:56 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
src/PVE/LXC/Config.pm | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 0740e8c..555767f 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1364,6 +1364,12 @@ sub print_volume {
return $class->print_ct_mountpoint($volume, $key eq 'rootfs');
}
+sub print_device {
+ my ($class, $info) = @_;
+
+ return PVE::JSONSchema::print_property_string($info, $dev_desc);
+}
+
sub volid_key {
my ($class) = @_;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [pve-devel] [PATCH container v2 4/4] implement device hotplug
2025-04-23 12:56 [pve-devel] [PATCH container v2 0/4] implement device hotplug Filip Schauer
` (2 preceding siblings ...)
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 3/4] config: support printing a device Filip Schauer
@ 2025-04-23 12:56 ` Filip Schauer
3 siblings, 0 replies; 5+ messages in thread
From: Filip Schauer @ 2025-04-23 12:56 UTC (permalink / raw)
To: pve-devel
This only includes adding devices to a running container. Removing or
editing existing devices is still not implemented.
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
---
src/PVE/LXC.pm | 74 ++++++++++++++++++++++++++++++++++++++++++-
src/PVE/LXC/Config.pm | 19 +++++++++++
2 files changed, 92 insertions(+), 1 deletion(-)
diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index d985b88..0c8c2e9 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -5,7 +5,7 @@ use warnings;
use Cwd qw();
use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED EEXIST);
-use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY :mode);
+use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY O_CREAT :mode);
use File::Basename;
use File::Path;
use File::Spec;
@@ -2065,6 +2065,78 @@ my $enter_mnt_ns_and_change_aa_profile = sub {
or die "failed to change apparmor profile (close() failed): $!\n";
};
+sub device_passthrough_hotplug :prototype($$$) {
+ my ($vmid, $conf, $dev) = @_;
+
+ my ($mode, $rdev) = (stat($dev->{path}))[2, 6];
+
+ die "Could not get mode or device ID of $dev->{path}\n"
+ if (!defined($mode) || !defined($rdev));
+
+ # We do the rest in a fork with an unshared mount namespace:
+ # -) change our apparmor profile to 'pve-container-mounthotplug', which is '/usr/bin/lxc-start'
+ # with move_mount privileges on every mount.
+ # -) create the device node, then grab it, create a file to bind mount the device node onto in
+ # the container, switch to the container mount namespace, and move_mount the device node.
+
+ PVE::Tools::run_fork(sub {
+ # Pin the container pid longer, we also need to get its monitor/parent:
+ my ($ct_pid, $ct_pidfd) = open_lxc_pid($vmid)
+ or die "failed to open pidfd of container $vmid\'s init process\n";
+
+ my ($monitor_pid, $monitor_pidfd) = open_ppid($ct_pid)
+ or die "failed to open pidfd of container $vmid\'s monitor process\n";
+
+ my $ct_mnt_ns = $get_container_namespace->($vmid, $ct_pid, 'mnt');
+ my $ct_user_ns = $get_container_namespace->($vmid, $ct_pid, 'user');
+ my $monitor_mnt_ns = $get_container_namespace->($vmid, $monitor_pid, 'mnt');
+
+ # Enter monitor mount namespace and switch to 'pve-container-mounthotplug' apparmor profile.
+ $enter_mnt_ns_and_change_aa_profile->($monitor_mnt_ns, "pve-container-mounthotplug", undef);
+
+ my $id_map = (PVE::LXC::parse_id_maps($conf))[0];
+ my $passthrough_device_path = create_passthrough_device_node(
+ "/var/lib/lxc/$vmid/passthrough", $dev, $mode, $rdev, $id_map);
+
+ my $srcfh = PVE::Tools::open_tree(&AT_FDCWD, $passthrough_device_path, &OPEN_TREE_CLOEXEC | &OPEN_TREE_CLONE)
+ or die "open_tree() on passthrough device node failed: $!\n";
+
+ if ($conf->{unprivileged}) {
+ PVE::Tools::setns(fileno($ct_user_ns), PVE::Tools::CLONE_NEWUSER)
+ or die "failed to enter user namespace of container $vmid: $!\n";
+
+ POSIX::setuid(0);
+ POSIX::setgid(0);
+ }
+
+ # Create a regular file in the container to bind mount the device node onto.
+ sysopen(my $dstfh, "/proc/$ct_pid/root$dev->{path}", O_CREAT)
+ or die "failed to open '/proc/$ct_pid/root$dev->{path}': $!\n";
+
+ # Enter the container mount namespace
+ PVE::Tools::setns(fileno($ct_mnt_ns), PVE::Tools::CLONE_NEWNS);
+ chdir('/')
+ or die "failed to change root directory within the container's mount namespace: $!\n";
+
+ # Bind mount the device node into the container
+ PVE::Tools::move_mount(fileno($srcfh), '', fileno($dstfh), '', &MOVE_MOUNT_F_EMPTY_PATH | &MOVE_MOUNT_T_EMPTY_PATH)
+ or die "move_mount failed: $!\n";
+ });
+
+ # Allow or deny device access with cgroup2
+ my $major = PVE::Tools::dev_t_major($rdev);
+ my $minor = PVE::Tools::dev_t_minor($rdev);
+ my $device_type = S_ISBLK($mode) ? 'b' : 'c';
+
+ run_command(["lxc-cgroup", "-n", $vmid, "devices.deny", "$device_type $major:$minor w"])
+ if ($dev->{'deny-write'});
+
+ my $allow_perms = $dev->{'deny-write'} ? 'r' : 'rw';
+ run_command([
+ "lxc-cgroup", "-n", $vmid, "devices.allow", "$device_type $major:$minor $allow_perms"
+ ]);
+}
+
sub mountpoint_hotplug :prototype($$$$$) {
my ($vmid, $conf, $opt, $mp, $storage_cfg) = @_;
diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 555767f..f6795a5 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1535,6 +1535,13 @@ sub vmconfig_hotplug_pending {
$class->apply_pending_mountpoint($vmid, $conf, $opt, $storecfg, 1);
# apply_pending_mountpoint modifies the value if it creates a new disk
$value = $conf->{pending}->{$opt};
+ } elsif ($opt =~ m/^dev(\d+)$/) {
+ if (exists($conf->{$opt})) {
+ die "skip\n"; # don't try to hotplug over existing dev
+ }
+
+ $class->apply_pending_device_passthrough($vmid, $conf, $opt, 1);
+ $value = $conf->{pending}->{$opt};
} else {
die "skip\n"; # skip non-hotpluggable
}
@@ -1629,6 +1636,18 @@ my $rescan_volume = sub {
warn "Could not rescan volume size - $@\n" if $@;
};
+sub apply_pending_device_passthrough {
+ my ($class, $vmid, $conf, $opt, $running) = @_;
+
+ my $dev = $class->parse_device($conf->{pending}->{$opt});
+ my $old = $conf->{$opt};
+ if ($running) {
+ die "skip\n" if defined($old); # TODO: editing a device passthrough
+ PVE::LXC::device_passthrough_hotplug($vmid, $conf, $dev);
+ $conf->{pending}->{$opt} = $class->print_device($dev);
+ }
+}
+
sub apply_pending_mountpoint {
my ($class, $vmid, $conf, $opt, $storecfg, $running) = @_;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-04-23 12:57 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-23 12:56 [pve-devel] [PATCH container v2 0/4] implement device hotplug Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 1/4] extract apparmor profile & namespace switch to a helper Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 2/4] extract passthrough device node creation " Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 3/4] config: support printing a device Filip Schauer
2025-04-23 12:56 ` [pve-devel] [PATCH container v2 4/4] implement device hotplug Filip Schauer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal