* [pve-devel] [PATCH manger/container] detect containers not supporting pure cgroupv2
@ 2021-07-02 18:21 Stoiko Ivanov
2021-07-02 18:21 ` [pve-devel] [PATCH container 1/1] prestart-hook: detect cgroupv2 incompatible systemd version Stoiko Ivanov
2021-07-02 18:21 ` [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2 Stoiko Ivanov
0 siblings, 2 replies; 4+ messages in thread
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC (permalink / raw)
To: pve-devel
This series addresses containers that boot with a systemd version too old
(< 232) to support the unified cgroup hierarchy. This includes CentOS 7 and
Ubuntu 16.04 containers.
The patch for pve-container simply logs to syslog with level err to notify
the user. Since container start runs through our stack into systemd
(and back into our stack), I did not see a better option (feedback on
better alternatives is welcome).
One alternative might be to mount the container once in vm_start (or the
API calls), check and unmount again - but this seemed a bit expensive to do
unconditionally on every start.
The patch for pve6to7 simply loops through all containers and checks for
this condition.
pve-container:
Stoiko Ivanov (1):
prestart-hook: detect cgroupv2 incompatible systemd version
src/PVE/LXC/Setup.pm | 8 ++++++++
src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
src/lxc-pve-prestart-hook | 7 +++++++
3 files changed, 51 insertions(+)
pve-manager:
Stoiko Ivanov (1):
pve6to7: check for containers not supporting pure cgroupv2
PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
--
2.30.2
* [pve-devel] [PATCH container 1/1] prestart-hook: detect cgroupv2 incompatible systemd version
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC (permalink / raw)
To: pve-devel
Some container OSes (e.g. CentOS 7, Ubuntu 16.04) boot with a systemd
version that cannot run in a pure cgroupv2 (a.k.a. unified hierarchy)
environment.
Detect those in the lxc-pve-prestart-hook, because there we already
have all mount-points set up.
This approach leaves syslog/journal as the only place for notifying the
user, since starting a container eventually runs `systemctl start
pve-container@VMID.service`, where we lose the output to stdout and
stderr (and the RPCEnvironment for warning in the task log).
The alternative of briefly mounting all container mount points just to
obtain the systemd version before starting the container seems
prohibitively expensive.
The heuristic of /sbin/init needing to be a link to something ending
in systemd is taken from the systemd documentation[0] and was verified
on a few of our container-templates (Ubuntu, Debian, SUSE, CentOS, Arch).
[0] https://www.freedesktop.org/software/systemd/man/systemd.html
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PVE/LXC/Setup.pm | 8 ++++++++
src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
src/lxc-pve-prestart-hook | 7 +++++++
3 files changed, 51 insertions(+)
diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index cf72b03..9abdc85 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -421,4 +421,12 @@ sub get_ct_os_release {
return &$parse_os_release($data);
}
+sub unified_cgroupv2_support {
+ my ($self) = @_;
+
+ $self->protected_call(sub {
+ $self->{plugin}->unified_cgroupv2_support();
+ });
+}
+
1;
diff --git a/src/PVE/LXC/Setup/Base.pm b/src/PVE/LXC/Setup/Base.pm
index 663df73..a5b77d3 100644
--- a/src/PVE/LXC/Setup/Base.pm
+++ b/src/PVE/LXC/Setup/Base.pm
@@ -503,6 +503,42 @@ sub clear_machine_id {
}
}
+# tries to guess the systemd version based on the existence of
+# (/usr)?/lib/systemd/libsystemd-shared-<version>.so. It was introduced in v231.
+sub get_systemd_version {
+ my ($self) = @_;
+
+ my $sd_lib_dir = $self->ct_is_directory("/lib/systemd") ?
+ "/lib/systemd" : "/usr/lib/systemd";
+ my $libsd = PVE::Tools::dir_glob_regex($sd_lib_dir, 'libsystemd-shared-.+\.so');
+ if (defined($libsd) && $libsd =~ /libsystemd-shared-(\d+)\.so/) {
+ return $1;
+ }
+
+ return undef;
+}
+
+sub unified_cgroupv2_support {
+ my ($self) = @_;
+
+ # https://www.freedesktop.org/software/systemd/man/systemd.html
+ # systemd is installed as symlink to /sbin/init
+ my $systemd = $self->ct_readlink('/sbin/init');
+
+ # assume non-systemd init will run with unified cgroupv2
+ if (!defined($systemd) || $systemd !~ m@/systemd$@) {
+ return 1;
+ }
+
+ # systemd version 232 (e.g. debian stretch) supports the unified hierarchy
+ my $sdver = $self->get_systemd_version();
+ if (!defined($sdver) || $sdver < 232) {
+ return 0;
+ }
+
+ return 1;
+}
+
sub pre_start_hook {
my ($self, $conf) = @_;
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index 8d876a8..fac587e 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -15,6 +15,7 @@ use PVE::LXC::Config;
use PVE::LXC::Setup;
use PVE::LXC::Tools;
use PVE::LXC;
+use PVE::SafeSyslog;
use PVE::Storage;
use PVE::Syscall qw(:fsmount);
use PVE::Tools qw(AT_FDCWD O_PATH);
@@ -126,6 +127,12 @@ PVE::LXC::Tools::lxc_hook('pre-start', 'lxc', sub {
my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
$lxc_setup->pre_start_hook();
+ if (PVE::CGroup::cgroup_mode() == 2) {
+ if (!$lxc_setup->unified_cgroupv2_support()) {
+ syslog('err', "CT $vmid does not support running in a pure cgroupv2 environment\n");
+ }
+ }
+
if (@$devices) {
my $devlist = '';
foreach my $dev (@$devices) {
--
2.30.2
* [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC (permalink / raw)
To: pve-devel
Ordered as much as possible to exit early, still might take quite some
time on systems with many containers (which do support cgroupv2).
needs a versioned bump on pve-container
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
index 60edac11..3d7c67bd 100644
--- a/PVE/CLI/pve6to7.pm
+++ b/PVE/CLI/pve6to7.pm
@@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
use PVE::QemuConfig;
use PVE::QemuServer;
use PVE::VZDump::Common;
+use PVE::LXC;
+use PVE::LXC::Config;
+use PVE::LXC::Setup;
use Term::ANSIColor;
@@ -890,6 +893,70 @@ sub check_storage_content {
log_pass("no problems found");
}
}
+sub check_containers_cgroup_compat {
+
+ my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
+ if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
+ log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
+ return;
+ }
+
+ my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
+ if ($@) {
+ log_warn("Failed to retrieve information about this node's CTs - $@");
+ return;
+ }
+
+ if (!defined($cts) || !scalar(@$cts)) {
+ log_skip("No containers on node detected.");
+ return;
+ }
+ my @running_vmids = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;
+ my @offline_vmids = map { $_->{status} ne 'running' ? $_->{vmid} : () } @$cts;
+
+ my $legacy_container=0;
+
+ for my $ctid (@running_vmids) {
+ my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
+ if (my $err = $@) {
+ log_warn("Failed to get PID for running CT $ctid - $err");
+ next;
+ }
+ my $rootdir = "/proc/$pid/root";
+ my $conf = PVE::LXC::Config->load_config($ctid);
+ my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
+ if (!$lxc_setup->unified_cgroupv2_support()) {
+ log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
+ "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
+ "skipping further checks");
+ return;
+ }
+ }
+
+ my $storage_cfg = PVE::Storage::config();
+ for my $ctid (@offline_vmids) {
+ my ($conf, $rootdir, $lxc_setup);
+ eval {
+ $conf = PVE::LXC::Config->load_config($ctid);
+ $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
+ $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
+ };
+ if (my $err = $@) {
+ log_warn("Failed to load config and mount CT $ctid - $err");
+ eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+ next;
+ }
+ if (!$lxc_setup->unified_cgroupv2_support()) {
+ log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
+ "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
+ "skipping further checks");
+ eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+ last;
+ }
+
+ eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+ }
+};
sub check_misc {
print_header("MISCELLANEOUS CHECKS");
@@ -986,6 +1053,7 @@ sub check_misc {
check_custom_pool_roles();
check_description_lengths();
check_storage_content();
+ check_containers_cgroup_compat();
}
__PACKAGE__->register_method ({
--
2.30.2
* Re: [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2
From: Thomas Lamprecht @ 2021-07-02 22:32 UTC (permalink / raw)
To: Proxmox VE development discussion, Stoiko Ivanov
On 02.07.21 20:21, Stoiko Ivanov wrote:
> Ordered as much as possible to exit early, still might take quite some
> time on systems with many containers (which do support cgroupv2).
The early abort once one is found seems like a good idea in general, but
I still do not really like that happening unconditionally; this could be hidden
behind an opt-in CLI flag, with a single skip log if it is not given.
An admin with only bleeding-edge Arch Linux containers could then just
snicker over software from the stone age and continue ;)
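
A minimal sketch of such an opt-in gate, for illustration only. It is a
standalone script using Getopt::Long rather than the CLI handling pve6to7
actually uses, and the flag name --check-containers as well as both helper
names are hypothetical:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical flag name; the thread does not settle on one.
sub parse_opts {
    my (@argv) = @_;
    my $opts = { 'check-containers' => 0 };
    GetOptionsFromArray(\@argv, 'check-containers!' => \$opts->{'check-containers'})
        or die "invalid options\n";
    return $opts;
}

# Returns the log lines the check would emit. $check is a code ref standing in
# for the expensive per-container scan.
sub run_container_check {
    my ($opts, $check) = @_;
    if (!$opts->{'check-containers'}) {
        # a single skip line when the flag was not given
        return ("SKIP: container cgroupv2 check not requested (pass --check-containers)");
    }
    return $check->();
}

my @log = run_container_check(
    parse_opts(@ARGV),
    sub { return ("PASS: no legacy containers found") },
);
print "$_\n" for @log;
```

Without the flag, the scan is never entered and exactly one skip line is
logged; with it, the check runs as before.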
Also, you're currently missing some cheap optimizations, like skipping containers
with devuan/alpine config ostypes early, and so doing needless work for them.
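
One way such an early skip could look, as a sketch only. The config hashes
below are stand-ins for what PVE::LXC::Config->load_config would return, the
helper name is hypothetical, and treating exactly alpine and devuan as
non-systemd ostypes is an assumption taken from the sentence above:

```perl
use strict;
use warnings;

# Assumption: these ostype values identify distributions that do not boot systemd.
my %non_systemd_ostype = map { $_ => 1 } qw(alpine devuan);

# $confs maps CT ID => parsed config hash. Returns the IDs that still need the
# expensive mount-and-inspect step, in ascending order.
sub cts_needing_inspection {
    my ($confs) = @_;
    my @ids = grep { !$non_systemd_ostype{ $confs->{$_}->{ostype} // '' } } keys %$confs;
    return sort { $a <=> $b } @ids;
}

my $confs = {
    100 => { ostype => 'alpine' },
    101 => { ostype => 'centos' },
    102 => { ostype => 'devuan' },
    103 => { ostype => 'ubuntu' },
};
my @todo = cts_needing_inspection($confs);
print join(',', @todo), "\n";
```

Only the config needs to be read for this filter, so no container has to be
mounted to rule it out.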
>
> needs a versioned bump on pve-container
I'd prefer copying the required helpers over, as this is mainly required
for stable-6, and that would be much easier than handling a versioned
dependency for just this in two releases.
>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
> PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 68 insertions(+)
>
> diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
> index 60edac11..3d7c67bd 100644
> --- a/PVE/CLI/pve6to7.pm
> +++ b/PVE/CLI/pve6to7.pm
> @@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
> use PVE::QemuConfig;
> use PVE::QemuServer;
> use PVE::VZDump::Common;
> +use PVE::LXC;
> +use PVE::LXC::Config;
> +use PVE::LXC::Setup;
>
> use Term::ANSIColor;
>
> @@ -890,6 +893,70 @@ sub check_storage_content {
> log_pass("no problems found");
> }
> }
> +sub check_containers_cgroup_compat {
> +
> + my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
> + if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
> + log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
> + return;
> + }
> +
> + my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
> + if ($@) {
> + log_warn("Failed to retrieve information about this node's CTs - $@");
> + return;
> + }
> +
> + if (!defined($cts) || !scalar(@$cts)) {
> + log_skip("No containers on node detected.");
> + return;
> + }
> + my @running_vmids = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;
> + my @offline_vmids = map { $_->{status} ne 'running' ? $_->{vmid} : () } @$cts;
nit, but why not grep? It would be a bit more explicit here and keep an
innocent reader from suspecting that map does not work here, only to spend
time being proven otherwise ;-)
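
For illustration, a small self-contained comparison of the two forms. The
sample list is made up and only mimics the vmid/status fields used above:

```perl
use strict;
use warnings;

# Made-up sample data with the same fields the patch reads.
my $cts = [
    { vmid => 100, status => 'running' },
    { vmid => 101, status => 'stopped' },
    { vmid => 102, status => 'running' },
];

# Form used in the patch: map returns an empty list to drop an element.
my @running_map = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;

# Explicit form: grep selects the entries, map then projects the vmid.
my @running_vmids = map { $_->{vmid} } grep { $_->{status} eq 'running' } @$cts;
my @offline_vmids = map { $_->{vmid} } grep { $_->{status} ne 'running' } @$cts;

print "running: @running_vmids\n";
print "offline: @offline_vmids\n";
```

Both forms yield the same lists; the grep variant just separates selection
from projection.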
> +
> + my $legacy_container=0;
> +
> + for my $ctid (@running_vmids) {
> + my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
> + if (my $err = $@) {
> + log_warn("Failed to get PID for running CT $ctid - $err");
> + next;
> + }
> + my $rootdir = "/proc/$pid/root";
> + my $conf = PVE::LXC::Config->load_config($ctid);
> + my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> + if (!$lxc_setup->unified_cgroupv2_support()) {
> + log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
Maybe start with "Found at least one CT ($ctid) which does not supp...", makes the
nature of the check slightly less subtle IMO.
> + "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
> + "skipping further checks");
> + return;
> + }
> + }
> +
> + my $storage_cfg = PVE::Storage::config();
> + for my $ctid (@offline_vmids) {
> + my ($conf, $rootdir, $lxc_setup);
> + eval {
> + $conf = PVE::LXC::Config->load_config($ctid);
> + $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
> + $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> + };
> + if (my $err = $@) {
> + log_warn("Failed to load config and mount CT $ctid - $err");
> + eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> + next;
> + }
> + if (!$lxc_setup->unified_cgroupv2_support()) {
> + log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
> + "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
> + "skipping further checks");
Maybe factor out the common part of that specific log message.
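
A sketch of what a shared helper could look like, also using the "Found at
least one CT" wording suggested further up. The helper name is hypothetical
and the snippet is standalone, so it prints the message instead of calling
log_warn:

```perl
use strict;
use warnings;

# Hypothetical helper; builds the warning shared by the running and offline loops.
sub legacy_ct_warning {
    my ($ctid) = @_;
    return "Found at least one CT ($ctid) which does not support running in a unified "
        . "cgroup v2 layout - either upgrade it or set systemd.unified_cgroup_hierarchy=0 "
        . "in the kernel cmdline - skipping further checks";
}

my $msg = legacy_ct_warning(105);
print "$msg\n";
```

Both loops could then pass the result to log_warn, keeping the wording in
one place.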
> + eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> + last;
> + }
> +
> + eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> + }
> +};
>
> sub check_misc {
> print_header("MISCELLANEOUS CHECKS");
> @@ -986,6 +1053,7 @@ sub check_misc {
> check_custom_pool_roles();
> check_description_lengths();
> check_storage_content();
> + check_containers_cgroup_compat();
> }
>
> __PACKAGE__->register_method ({
>