public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH manger/container] detect containers not supporting pure cgroupv2
@ 2021-07-02 18:21 Stoiko Ivanov
  2021-07-02 18:21 ` [pve-devel] [PATCH container 1/1] prestart-hook: detect cgroupv2 incompatible systemd version Stoiko Ivanov
  2021-07-02 18:21 ` [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2 Stoiko Ivanov
  0 siblings, 2 replies; 4+ messages in thread
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC
  To: pve-devel

This series addresses containers that boot with a systemd version which
is too old (<232) to support the unified cgroup hierarchy. This includes
CentOS 7 and Ubuntu 16.04 containers.

The patch for pve-container simply logs to syslog with level err to notify
the user. Since a container start runs through our stack into systemd
(and back into our stack), I did not see a better option (I am grateful
for feedback if there is one, of course).

One alternative might be to mount the container once in vm_start (or in
the API calls), check, and unmount again, but this seemed a bit expensive
to do unconditionally on every start.

The patch for pve6to7 simply loops through all containers and checks for
this condition.

pve-container:
Stoiko Ivanov (1):
  prestart-hook: detect cgroupv2 incompatible systemd version

 src/PVE/LXC/Setup.pm      |  8 ++++++++
 src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook |  7 +++++++
 3 files changed, 51 insertions(+)

pve-manager:
Stoiko Ivanov (1):
  pve6to7: check for containers not supporting pure cgroupv2

 PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

-- 
2.30.2

* [pve-devel] [PATCH container 1/1] prestart-hook: detect cgroupv2 incompatible systemd version
  2021-07-02 18:21 [pve-devel] [PATCH manger/container] detect containers not supporting pure cgroupv2 Stoiko Ivanov
@ 2021-07-02 18:21 ` Stoiko Ivanov
  2021-07-02 18:21 ` [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2 Stoiko Ivanov
  1 sibling, 0 replies; 4+ messages in thread
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC
  To: pve-devel

Some container OSes (e.g. CentOS 7, Ubuntu 16.04) boot with a systemd
version that cannot run in a pure cgroupv2 (a.k.a. unified hierarchy)
environment.

Detect those in the lxc-pve-prestart-hook, because there we already
have all mount-points set up.

This approach leaves only syslog/journal as a place to notify the
user, since starting a container eventually runs `systemctl start
pve-container@VMID.service`, where we lose the prints to stdout and
stderr (and the RPCEnvironment for warning in the tasklog).

The alternative of briefly mounting all container mounts before
starting the container, just to obtain the systemd version, seems
prohibitively expensive.

The heuristic of /sbin/init needing to be a link to something ending
in systemd is taken from the systemd documentation[0] and was verified
on a few of our container-templates (Ubuntu, Debian, SUSE, CentOS, Arch).
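
The heuristic described above can be sketched outside the PVE code base
roughly as follows. This is a minimal illustration under stated
assumptions, not the patch itself: the helper names
`guess_systemd_version` and `supports_unified_cgroupv2` are made up, and
the sketch resolves paths below an already-mounted root directory
instead of using the `ct_*` helpers from the patch. One consequence is
that absolute symlinks inside the container (e.g. a merged /usr) are not
followed correctly here.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the systemd major version from the file name of
# the private shared library below an already-mounted container root.
sub guess_systemd_version {
    my ($rootdir) = @_;

    for my $dir ("$rootdir/lib/systemd", "$rootdir/usr/lib/systemd") {
        opendir(my $dh, $dir) or next;
        my @names = readdir($dh);
        closedir($dh);
        for my $name (@names) {
            return $1 if $name =~ /^libsystemd-shared-(\d+)\.so$/;
        }
    }
    return undef;
}

# Hypothetical helper: returns 1 if the container is assumed to cope with a
# pure cgroupv2 host, 0 otherwise.
sub supports_unified_cgroupv2 {
    my ($rootdir) = @_;

    my $init = readlink("$rootdir/sbin/init");

    # /sbin/init is not a symlink to .../systemd: assume a non-systemd init
    return 1 if !defined($init) || $init !~ m@/systemd$@;

    # no versioned library found, or a version below 232: treat as unsupported
    my $version = guess_systemd_version($rootdir);
    return 0 if !defined($version) || $version < 232;

    return 1;
}
```

Treating an undetectable version as unsupported mirrors the patch, since
the versioned library file does not exist in the oldest systemd releases.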

[0] https://www.freedesktop.org/software/systemd/man/systemd.html

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/PVE/LXC/Setup.pm      |  8 ++++++++
 src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook |  7 +++++++
 3 files changed, 51 insertions(+)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index cf72b03..9abdc85 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -421,4 +421,12 @@ sub get_ct_os_release {
     return &$parse_os_release($data);
 }
 
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    $self->protected_call(sub {
+	$self->{plugin}->unified_cgroupv2_support();
+    });
+}
+
 1;
diff --git a/src/PVE/LXC/Setup/Base.pm b/src/PVE/LXC/Setup/Base.pm
index 663df73..a5b77d3 100644
--- a/src/PVE/LXC/Setup/Base.pm
+++ b/src/PVE/LXC/Setup/Base.pm
@@ -503,6 +503,42 @@ sub clear_machine_id {
     }
 }
 
+# tries to guess the systemd version based on the existence of
+# (/usr)?/lib/systemd/libsystemd-shared-<version>.so. It was introduced in v231.
+sub get_systemd_version {
+    my ($self) = @_;
+
+    my $sd_lib_dir = $self->ct_is_directory("/lib/systemd") ?
+	"/lib/systemd" : "/usr/lib/systemd";
+    my $libsd = PVE::Tools::dir_glob_regex($sd_lib_dir, 'libsystemd-shared-.+\.so');
+    if (defined($libsd) && $libsd =~ /libsystemd-shared-(\d+)\.so/) {
+	return $1;
+    }
+
+    return undef;
+}
+
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    # https://www.freedesktop.org/software/systemd/man/systemd.html
+    # systemd is installed as symlink to /sbin/init
+    my $systemd = $self->ct_readlink('/sbin/init');
+
+    # assume non-systemd init will run with unified cgroupv2
+    if (!defined($systemd) || $systemd !~ m@/systemd$@) {
+	return 1;
+    }
+
+    # systemd version 232 (e.g. debian stretch) supports the unified hierarchy
+    my $sdver = $self->get_systemd_version();
+    if (!defined($sdver) || $sdver < 232) {
+	return 0;
+    }
+
+    return 1;
+}
+
 sub pre_start_hook {
     my ($self, $conf) = @_;
 
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index 8d876a8..fac587e 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -15,6 +15,7 @@ use PVE::LXC::Config;
 use PVE::LXC::Setup;
 use PVE::LXC::Tools;
 use PVE::LXC;
+use PVE::SafeSyslog;
 use PVE::Storage;
 use PVE::Syscall qw(:fsmount);
 use PVE::Tools qw(AT_FDCWD O_PATH);
@@ -126,6 +127,12 @@ PVE::LXC::Tools::lxc_hook('pre-start', 'lxc', sub {
     my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
     $lxc_setup->pre_start_hook();
 
+    if (PVE::CGroup::cgroup_mode() == 2) {
+	if(!$lxc_setup->unified_cgroupv2_support()) {
+	    syslog('err', "CT $vmid does not support running in a pure cgroupv2 environment\n");
+	}
+    }
+
     if (@$devices) {
 	my $devlist = '';
 	foreach my $dev (@$devices) {
-- 
2.30.2

* [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2
  2021-07-02 18:21 [pve-devel] [PATCH manger/container] detect containers not supporting pure cgroupv2 Stoiko Ivanov
  2021-07-02 18:21 ` [pve-devel] [PATCH container 1/1] prestart-hook: detect cgroupv2 incompatible systemd version Stoiko Ivanov
@ 2021-07-02 18:21 ` Stoiko Ivanov
  2021-07-02 22:32   ` Thomas Lamprecht
  1 sibling, 1 reply; 4+ messages in thread
From: Stoiko Ivanov @ 2021-07-02 18:21 UTC
  To: pve-devel

Ordered as much as possible to exit early, still might take quite some
time on systems with many containers (which do support cgroupv2).

needs a versioned bump on pve-container

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
index 60edac11..3d7c67bd 100644
--- a/PVE/CLI/pve6to7.pm
+++ b/PVE/CLI/pve6to7.pm
@@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
 use PVE::QemuConfig;
 use PVE::QemuServer;
 use PVE::VZDump::Common;
+use PVE::LXC;
+use PVE::LXC::Config;
+use PVE::LXC::Setup;
 
 use Term::ANSIColor;
 
@@ -890,6 +893,70 @@ sub check_storage_content {
 	log_pass("no problems found");
     }
 }
+sub check_containers_cgroup_compat {
+
+    my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
+    if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
+	log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
+	return;
+    }
+
+    my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
+    if ($@) {
+	log_warn("Failed to retrieve information about this node's CTs - $@");
+	return;
+    }
+
+    if (!defined($cts) || !scalar(@$cts)) {
+	log_skip("No containers on node detected.");
+	return;
+    }
+    my @running_vmids = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;
+    my @offline_vmids = map { $_->{status} ne 'running' ? $_->{vmid} : () } @$cts;
+
+    my $legacy_container=0;
+
+    for my $ctid (@running_vmids) {
+	my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
+	if (my $err = $@) {
+	    log_warn("Failed to get PID for running CT $ctid - $err");
+	    next;
+	}
+	my $rootdir = "/proc/$pid/root";
+	my $conf = PVE::LXC::Config->load_config($ctid);
+	my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
+	if (!$lxc_setup->unified_cgroupv2_support()) {
+	    log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
+		"upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - "  .
+		"skipping further checks");
+	    return;
+	}
+    }
+
+    my $storage_cfg = PVE::Storage::config();
+    for my $ctid (@offline_vmids) {
+	my ($conf, $rootdir, $lxc_setup);
+	eval {
+	    $conf = PVE::LXC::Config->load_config($ctid);
+	    $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
+	    $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
+	};
+	if (my $err = $@) {
+	    log_warn("Failed to load config and mount CT $ctid - $err");
+	    eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+	    next;
+	}
+	if (!$lxc_setup->unified_cgroupv2_support()) {
+	    log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
+		"upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - "  .
+		"skipping further checks");
+	    eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+	    last;
+	}
+
+	eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+    }
+};
 
 sub check_misc {
     print_header("MISCELLANEOUS CHECKS");
@@ -986,6 +1053,7 @@ sub check_misc {
     check_custom_pool_roles();
     check_description_lengths();
     check_storage_content();
+    check_containers_cgroup_compat();
 }
 
 __PACKAGE__->register_method ({
-- 
2.30.2

* Re: [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2
  2021-07-02 18:21 ` [pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2 Stoiko Ivanov
@ 2021-07-02 22:32   ` Thomas Lamprecht
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2021-07-02 22:32 UTC
  To: Proxmox VE development discussion, Stoiko Ivanov

On 02.07.21 20:21, Stoiko Ivanov wrote:
> Ordered as much as possible to exit early, still might take quite some
> time on systems with many containers (which do support cgroupv2).

The early abort once one is found seems like a good idea in general, but
I still do not really like that happening unconditionally; this could get hidden
behind an opt-in CLI option flag - with a single skip log if not taken.

An admin with only bleeding-edge Arch Linux containers could then just
snicker over software from the stone age and continue ;)
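
One way to picture such an opt-in gate is sketched below. Everything in
it is hypothetical: the `full` key merely stands in for whatever CLI flag
would be chosen, and `$check` and `$log_skip` are code references passed
in by the caller rather than the real pve6to7 helpers.

```perl
use strict;
use warnings;

# Hypothetical wrapper: only run an expensive check when the caller opted in,
# otherwise emit a single skip message.
sub run_if_opted_in {
    my ($opts, $check, $log_skip) = @_;

    if (!$opts->{full}) {
        $log_skip->("skipping container cgroupv2 check, enable it explicitly to run it");
        return 0;
    }

    $check->();
    return 1;
}
```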

Also, you're currently missing some cheap optimizations, like skipping devuan/alpine
config ostypes early, and so doing needless work for them.
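
A rough sketch of that early filter, assuming the parsed container
configs are available as a hash keyed by vmid. The helper name
`cts_needing_check` and the lookup table are illustrative only; the two
ostypes listed are the ones named above.

```perl
use strict;
use warnings;

# Assumption: ostypes that never boot systemd and can be skipped up front.
my %non_systemd_ostype = map { $_ => 1 } qw(alpine devuan);

# Hypothetical helper: takes a hash of vmid => parsed config and returns the
# sorted vmids that still need the filesystem-level check.
sub cts_needing_check {
    my ($configs) = @_;

    return sort { $a <=> $b } grep {
        my $ostype = $configs->{$_}->{ostype} // '';
        !$non_systemd_ostype{$ostype};
    } keys %$configs;
}
```

Filtering on the config alone avoids mounting those containers at all,
which is where most of the cost of the check lies.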

> 
> needs a versioned bump on pve-container

I'd prefer copying the required helpers over, as this is mainly required
for stable-6, and that would be way easier than handling a versioned
dependency for just this in two releases.

> 
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>  PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
> index 60edac11..3d7c67bd 100644
> --- a/PVE/CLI/pve6to7.pm
> +++ b/PVE/CLI/pve6to7.pm
> @@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
>  use PVE::QemuConfig;
>  use PVE::QemuServer;
>  use PVE::VZDump::Common;
> +use PVE::LXC;
> +use PVE::LXC::Config;
> +use PVE::LXC::Setup;
>  
>  use Term::ANSIColor;
>  
> @@ -890,6 +893,70 @@ sub check_storage_content {
>  	log_pass("no problems found");
>      }
>  }
> +sub check_containers_cgroup_compat {
> +
> +    my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
> +    if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
> +	log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
> +	return;
> +    }
> +
> +    my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
> +    if ($@) {
> +	log_warn("Failed to retrieve information about this node's CTs - $@");
> +	return;
> +    }
> +
> +    if (!defined($cts) || !scalar(@$cts)) {
> +	log_skip("No containers on node detected.");
> +	return;
> +    }
> +    my @running_vmids = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;
> +    my @offline_vmids = map { $_->{status} ne 'running' ? $_->{vmid} : () } @$cts;

nit, but why not grep? It would make this a bit more explicit, avoiding that any
innocent reader thinks map makes this not work and then spends time being proven
otherwise ;-)
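
For illustration, the grep-based variant could look like this. The
sample data is made up and only mimics the `vmid` and `status` fields
used in the patch.

```perl
use strict;
use warnings;

# Sample data shaped like the entries used in the patch (vmid and status only).
my $cts = [
    { vmid => 100, status => 'running' },
    { vmid => 101, status => 'stopped' },
    { vmid => 102, status => 'running' },
];

# grep selects the matching entries, map then extracts the vmid.
my @running_vmids = map { $_->{vmid} } grep { $_->{status} eq 'running' } @$cts;
my @offline_vmids = map { $_->{vmid} } grep { $_->{status} ne 'running' } @$cts;

print "running: @running_vmids\n";   # running: 100 102
print "offline: @offline_vmids\n";   # offline: 101
```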

> +
> +    my $legacy_container=0;
> +
> +    for my $ctid (@running_vmids) {
> +	my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
> +	if (my $err = $@) {
> +	    log_warn("Failed to get PID for running CT $ctid - $err");
> +	    next;
> +	}
> +	my $rootdir = "/proc/$pid/root";
> +	my $conf = PVE::LXC::Config->load_config($ctid);
> +	my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> +	if (!$lxc_setup->unified_cgroupv2_support()) {
> +	    log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .

Maybe start with "Found at least one CT ($ctid) which does not supp...", makes the
nature of the check slightly less subtle IMO.

> +		"upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - "  .
> +		"skipping further checks");

> +	    return;
> +	}
> +    }
> +
> +    my $storage_cfg = PVE::Storage::config();
> +    for my $ctid (@offline_vmids) {
> +	my ($conf, $rootdir, $lxc_setup);
> +	eval {
> +	    $conf = PVE::LXC::Config->load_config($ctid);
> +	    $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
> +	    $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> +	};
> +	if (my $err = $@) {
> +	    log_warn("Failed to load config and mount CT $ctid - $err");
> +	    eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> +	    next;
> +	}
> +	if (!$lxc_setup->unified_cgroupv2_support()) {
> +	    log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
> +		"upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - "  .
> +		"skipping further checks");

maybe factor out the common part of that specific log message
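
A possible shape for that, as a sketch only: `legacy_ct_warning` is a
hypothetical helper name, and the text combines the wording from the
patch with the opening suggested further up in this review.

```perl
use strict;
use warnings;

# Hypothetical helper building the warning text shared by both loops.
sub legacy_ct_warning {
    my ($ctid) = @_;

    return "Found at least one CT ($ctid) which does not support running in a unified "
        . "cgroup v2 layout - either upgrade it or set systemd.unified_cgroup_hierarchy=0 "
        . "in the kernel cmdline - skipping further checks";
}
```

Both loops would then call something like `log_warn(legacy_ct_warning($ctid))`.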

> +	    eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> +	    last;
> +	}
> +
> +	eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> +    }
> +};
>  
>  sub check_misc {
>      print_header("MISCELLANEOUS CHECKS");
> @@ -986,6 +1053,7 @@ sub check_misc {
>      check_custom_pool_roles();
>      check_description_lengths();
>      check_storage_content();
> +    check_containers_cgroup_compat();
>  }
>  
>  __PACKAGE__->register_method ({
>