* [pve-devel] [PATCH manager/container] detect containers not supporting pure cgroupv2
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC
To: pve-devel

This series addresses containers that boot with a systemd version too old
(< 232) to support the unified cgroup hierarchy. This includes CentOS 7
and Ubuntu 16.04 containers.

The patch for pve-container simply logs to syslog with level err to notify
the user. Since a container start runs through our stack into systemd (and
back into our stack), I did not see a better option (feedback welcome if
there is one). One alternative might be to mount the container once in
vm_start (or in the API calls), check, and unmount again, but this seemed
a bit expensive to do unconditionally on every start.

The patch for pve6to7 simply loops through all containers and checks for
the condition.

pve-container:

Stoiko Ivanov (1):
  prestart-hook: detect cgroupv2 incompatible systemd version

 src/PVE/LXC/Setup.pm      |  8 ++++++++
 src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook |  7 +++++++
 3 files changed, 51 insertions(+)

pve-manager:

Stoiko Ivanov (1):
  pve6to7: check for containers not supporting pure cgroupv2

 PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

-- 
2.30.2
* [pve-devel] [PATCH container 1/1] prestart-hook: detect cgroupv2 incompatible systemd version
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC
To: pve-devel

Some container OSes (e.g. CentOS 7, Ubuntu 16.04) boot with a systemd
version that cannot run in a pure cgroupv2 (a.k.a. unified hierarchy)
environment.

Detect those in the lxc-pve-prestart-hook, because all mount points are
already set up there.

This approach leaves syslog/journal as the only place to notify the user,
since starting a container eventually runs
`systemctl start pve-container@VMID.service`, where we lose the output to
stdout and stderr (and the RPCEnvironment for logging a warning in the
task log).

The alternative, briefly mounting all container mounts just to obtain the
systemd version before starting the container, seems prohibitively
expensive.

The heuristic that /sbin/init must be a link to something ending in
systemd is taken from the systemd documentation[0] and was verified on a
few of our container templates (Ubuntu, Debian, SUSE, CentOS, Arch).

[0] https://www.freedesktop.org/software/systemd/man/systemd.html

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/PVE/LXC/Setup.pm      |  8 ++++++++
 src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook |  7 +++++++
 3 files changed, 51 insertions(+)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index cf72b03..9abdc85 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -421,4 +421,12 @@ sub get_ct_os_release {
     return &$parse_os_release($data);
 }
 
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    $self->protected_call(sub {
+        $self->{plugin}->unified_cgroupv2_support();
+    });
+}
+
 1;
diff --git a/src/PVE/LXC/Setup/Base.pm b/src/PVE/LXC/Setup/Base.pm
index 663df73..a5b77d3 100644
--- a/src/PVE/LXC/Setup/Base.pm
+++ b/src/PVE/LXC/Setup/Base.pm
@@ -503,6 +503,42 @@ sub clear_machine_id {
     }
 }
 
+# tries to guess the systemd version based on the existence of
+# (/usr)?/lib/systemd/libsystemd-shared<version>.so. It was introduced in v231.
+sub get_systemd_version {
+    my ($self) = @_;
+
+    my $sd_lib_dir = $self->ct_is_directory("/lib/systemd") ?
+        "/lib/systemd" : "/usr/lib/systemd";
+    my $libsd = PVE::Tools::dir_glob_regex($sd_lib_dir, "libsystemd-shared-.+\.so");
+    if (defined($libsd) && $libsd =~ /libsystemd-shared-(\d+)\.so/) {
+        return $1;
+    }
+
+    return undef;
+}
+
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    # https://www.freedesktop.org/software/systemd/man/systemd.html
+    # systemd is installed as symlink to /sbin/init
+    my $systemd = $self->ct_readlink('/sbin/init');
+
+    # assume non-systemd init will run with unified cgroupv2
+    if (!defined($systemd) || $systemd !~ m@/systemd$@) {
+        return 1;
+    }
+
+    # systemd version 232 (e.g. debian stretch) supports the unified hierarchy
+    my $sdver = $self->get_systemd_version();
+    if (!defined($sdver) || $sdver < 232) {
+        return 0;
+    }
+
+    return 1
+}
+
 sub pre_start_hook {
     my ($self, $conf) = @_;
 
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index 8d876a8..fac587e 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -15,6 +15,7 @@ use PVE::LXC::Config;
 use PVE::LXC::Setup;
 use PVE::LXC::Tools;
 use PVE::LXC;
+use PVE::SafeSyslog;
 use PVE::Storage;
 use PVE::Syscall qw(:fsmount);
 use PVE::Tools qw(AT_FDCWD O_PATH);
@@ -126,6 +127,12 @@ PVE::LXC::Tools::lxc_hook('pre-start', 'lxc', sub {
     my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
     $lxc_setup->pre_start_hook();
 
+    if (PVE::CGroup::cgroup_mode() == 2) {
+        if(!$lxc_setup->unified_cgroupv2_support()) {
+            syslog('err', "CT $vmid does not support running in a pure cgroupv2 environment\n");
+        }
+    }
+
     if (@$devices) {
         my $devlist = '';
         foreach my $dev (@$devices) {
-- 
2.30.2
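The detection heuristic above (follow /sbin/init, require a target ending in `/systemd`, then read the version from the `libsystemd-shared-<version>.so` file name and compare it against 232) can be sketched as two pure Perl functions. This is a hypothetical, standalone illustration, not the pve-container code: the names `systemd_version_from_files` and `supports_unified_cgroupv2` are invented here, and they take plain strings (the link target and the file names in the systemd library directory) instead of inspecting a mounted container root file system.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for get_systemd_version() and
# unified_cgroupv2_support(): they operate on plain strings instead of a
# mounted container rootfs.

# qr// keeps the "\." literal, so only a real ".so" suffix matches.
my $LIBSD_RE = qr/^libsystemd-shared-(\d+)\.so$/;

sub systemd_version_from_files {
    my (@files) = @_;
    for my $file (@files) {
        return $1 if $file =~ $LIBSD_RE;
    }
    # libsystemd-shared was introduced in v231, so older systemd has none
    return undef;
}

sub supports_unified_cgroupv2 {
    my ($init_target, @lib_files) = @_;

    # no symlink, or one not ending in "/systemd": assume a non-systemd init
    return 1 if !defined($init_target) || $init_target !~ m@/systemd$@;

    # versions below 232 are treated as too old for the unified hierarchy
    my $version = systemd_version_from_files(@lib_files);
    return (defined($version) && $version >= 232) ? 1 : 0;
}
```

For example, a CentOS 7-like container whose /sbin/init points to `../lib/systemd/systemd` and whose systemd directory contains no `libsystemd-shared-*.so` yields 0. One design note: the sketch compiles the pattern with `qr//` because inside a double-quoted Perl string `"\."` collapses to a plain `.`, which then matches any character once the string is used as a regex.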
* [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC
To: pve-devel

Ordered as much as possible to exit early, still might take quite some
time on systems with many containers (which do support cgroupv2).

needs a versioned bump on pve-container

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
index 60edac11..3d7c67bd 100644
--- a/PVE/CLI/pve6to7.pm
+++ b/PVE/CLI/pve6to7.pm
@@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
 use PVE::QemuConfig;
 use PVE::QemuServer;
 use PVE::VZDump::Common;
+use PVE::LXC;
+use PVE::LXC::Config;
+use PVE::LXC::Setup;
 
 use Term::ANSIColor;
 
@@ -890,6 +893,70 @@ sub check_storage_content {
         log_pass("no problems found");
     }
 }
+sub check_containers_cgroup_compat {
+
+    my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
+    if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
+        log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
+        return;
+    }
+
+    my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
+    if ($@) {
+        log_warn("Failed to retrieve information about this node's CTs - $@");
+        return;
+    }
+
+    if (!defined($cts) || !scalar(@$cts)) {
+        log_skip("No containers on node detected.");
+        return;
+    }
+    my @running_vmids = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;
+    my @offline_vmids = map { $_->{status} ne 'running' ? $_->{vmid} : () } @$cts;
+
+    my $legacy_container=0;
+
+    for my $ctid (@running_vmids) {
+        my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
+        if (my $err = $@) {
+            log_warn("Failed to get PID for running CT $ctid - $err");
+            next;
+        }
+        my $rootdir = "/proc/$pid/root";
+        my $conf = PVE::LXC::Config->load_config($ctid);
+        my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
+        if (!$lxc_setup->unified_cgroupv2_support()) {
+            log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
+                "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
+                "skipping further checks");
+            return;
+        }
+    }
+
+    my $storage_cfg = PVE::Storage::config();
+    for my $ctid (@offline_vmids) {
+        my ($conf, $rootdir, $lxc_setup);
+        eval {
+            $conf = PVE::LXC::Config->load_config($ctid);
+            $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
+            $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
+        };
+        if (my $err = $@) {
+            log_warn("Failed to load config and mount CT $ctid - $err");
+            eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+            next;
+        }
+        if (!$lxc_setup->unified_cgroupv2_support()) {
+            log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
+                "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
+                "skipping further checks");
+            eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+            last;
+        }
+
+        eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+    }
+};
 
 sub check_misc {
     print_header("MISCELLANEOUS CHECKS");
@@ -986,6 +1053,7 @@ sub check_misc {
     check_custom_pool_roles();
     check_description_lengths();
     check_storage_content();
+    check_containers_cgroup_compat();
 }
 
 __PACKAGE__->register_method ({
-- 
2.30.2
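The first step of `check_containers_cgroup_compat` decides from /proc/cmdline whether the host is explicitly booted with the legacy hybrid layout. Below is a hypothetical, standalone sketch of that test that compares whole options and escapes the dot, so an unrelated option that merely contains the string does not match. The function name is invented for illustration. It also assumes systemd parses the value as a boolean and therefore accepts `no`, `n`, `false`, `f` and `off` besides `0`. That assumption should be checked against the systemd documentation before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: does a kernel command line (as read from
# /proc/cmdline) explicitly request the legacy/hybrid cgroup layout?
# Assumption: systemd treats the value as a boolean, so these false
# spellings are all accepted.
my %FALSE_VALUES = map { $_ => 1 } qw(0 no n false f off);

sub legacy_cgroup_requested {
    my ($cmdline) = @_;

    for my $opt (split(/\s+/, $cmdline)) {
        # compare whole options with an escaped dot, so only the real
        # systemd.unified_cgroup_hierarchy=<value> option can match
        if ($opt =~ m/^systemd\.unified_cgroup_hierarchy=(.*)$/) {
            my $value = $1;
            return 1 if $FALSE_VALUES{$value};
        }
    }
    return 0;
}
```

Splitting on whitespace also discards the trailing newline that /proc/cmdline ends with, so the content returned by `PVE::Tools::file_get_contents('/proc/cmdline')` could be passed in unchanged.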
* Re: [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2
From: Thomas Lamprecht @ 2021-07-02 22:32 UTC
To: Proxmox VE development discussion, Stoiko Ivanov

On 02.07.21 20:21, Stoiko Ivanov wrote:
> Ordered as much as possible to exit early, still might take quite some
> time on systems with many containers (which do support cgroupv2).

The early abort once one is found seems like a good idea in general, but I
still do not really like this happening unconditionally. It could be hidden
behind an opt-in CLI option flag, with a single skip log if not taken.
An admin with only bleeding-edge Arch Linux containers could then just
snicker at software from the stone age and continue ;)

Also, you're currently missing some cheap optimizations, like skipping the
devuan/alpine config ostypes early; right now needless work is done for
them.

>
> needs a versioned bump on pve-container

I'd rather copy the required helpers over, as this is mainly required for
stable-6, and it would make things much easier than having versioned
dependency handling for just this in two releases.

>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>  PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
>
> diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
> index 60edac11..3d7c67bd 100644
> --- a/PVE/CLI/pve6to7.pm
> +++ b/PVE/CLI/pve6to7.pm
> @@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
>  use PVE::QemuConfig;
>  use PVE::QemuServer;
>  use PVE::VZDump::Common;
> +use PVE::LXC;
> +use PVE::LXC::Config;
> +use PVE::LXC::Setup;
>
>  use Term::ANSIColor;
>
> @@ -890,6 +893,70 @@ sub check_storage_content {
>          log_pass("no problems found");
>      }
>  }
> +sub check_containers_cgroup_compat {
> +
> +    my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
> +    if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
> +        log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
> +        return;
> +    }
> +
> +    my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
> +    if ($@) {
> +        log_warn("Failed to retrieve information about this node's CTs - $@");
> +        return;
> +    }
> +
> +    if (!defined($cts) || !scalar(@$cts)) {
> +        log_skip("No containers on node detected.");
> +        return;
> +    }
> +    my @running_vmids = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;
> +    my @offline_vmids = map { $_->{status} ne 'running' ? $_->{vmid} : () } @$cts;

nit, but why not grep? It would make this a bit more explicit, so that no
innocent reader thinks map makes this not work and then spends time being
proved otherwise ;-)

> +
> +    my $legacy_container=0;
> +
> +    for my $ctid (@running_vmids) {
> +        my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
> +        if (my $err = $@) {
> +            log_warn("Failed to get PID for running CT $ctid - $err");
> +            next;
> +        }
> +        my $rootdir = "/proc/$pid/root";
> +        my $conf = PVE::LXC::Config->load_config($ctid);
> +        my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> +        if (!$lxc_setup->unified_cgroupv2_support()) {
> +            log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .

Maybe start with "Found at least one CT ($ctid) which does not supp...",
which makes the nature of the check slightly less subtle IMO.

> +                "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
> +                "skipping further checks");
> +            return;
> +        }
> +    }
> +
> +    my $storage_cfg = PVE::Storage::config();
> +    for my $ctid (@offline_vmids) {
> +        my ($conf, $rootdir, $lxc_setup);
> +        eval {
> +            $conf = PVE::LXC::Config->load_config($ctid);
> +            $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
> +            $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> +        };
> +        if (my $err = $@) {
> +            log_warn("Failed to load config and mount CT $ctid - $err");
> +            eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> +            next;
> +        }
> +        if (!$lxc_setup->unified_cgroupv2_support()) {
> +            log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
> +                "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
> +                "skipping further checks");

Maybe factor out the common part of that specific log message.

> +            eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> +            last;
> +        }
> +
> +        eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> +    }
> +};
>
>  sub check_misc {
>      print_header("MISCELLANEOUS CHECKS");
> @@ -986,6 +1053,7 @@ sub check_misc {
>      check_custom_pool_roles();
>      check_description_lengths();
>      check_storage_content();
> +    check_containers_cgroup_compat();
>  }
>
>  __PACKAGE__->register_method ({
>
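Three of the review suggestions could look roughly like the sketch below: grep instead of map for partitioning, skipping devuan/alpine ostypes early, and factoring out the shared log message. It is a hypothetical, standalone illustration and not the follow-up patch. The `{ vmid, status, ostype }` record shape is assumed; in pve6to7 the ostype would more likely come from each container's config. The message wording combines the suggested "Found at least one CT" prefix with the original warning text.

```perl
use strict;
use warnings;

# Hypothetical sketch of the review suggestions; the record shape
# ({ vmid, status, ostype }) is an assumption for illustration.
my %NON_SYSTEMD_OSTYPES = map { $_ => 1 } qw(alpine devuan);

sub partition_cts {
    my ($cts) = @_;

    # cheap early skip: these OS types do not boot with systemd
    my @candidates = grep { !$NON_SYSTEMD_OSTYPES{$_->{ostype} // ''} } @$cts;

    # grep makes the filtering explicit; map then only extracts the IDs
    my @running = map { $_->{vmid} } grep { $_->{status} eq 'running' } @candidates;
    my @offline = map { $_->{vmid} } grep { $_->{status} ne 'running' } @candidates;

    return (\@running, \@offline);
}

# single place for the warning shared by the running and offline loops
sub cgroupv2_incompat_msg {
    my ($ctid) = @_;
    return "Found at least one CT ($ctid) which does not support running in a unified "
        . "cgroup v2 layout - either upgrade it or set systemd.unified_cgroup_hierarchy=0 "
        . "in the kernel cmdline - skipping further checks";
}
```

Both loops would then call something like `log_warn(cgroupv2_incompat_msg($ctid))`, so the wording stays identical for running and offline containers.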