public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2
@ 2021-07-05 10:57 Stoiko Ivanov
  2021-07-05 10:57 ` [pve-devel] [PATCH container v2 1/2] prestart-hook: detect cgroupv2 incompatible systemd version Stoiko Ivanov
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Stoiko Ivanov @ 2021-07-05 10:57 UTC (permalink / raw)
  To: pve-devel

changes v1->v2:
incorporated Thomas' feedback (huge thx!) by:
* adding extra handling for Alpine and Devuan container (patch 2/2 for
  pve-container)
* copying the helpers directly to pve6to7 to avoid versioned dependency
  bumps
* refactoring the code in pve6to7 a bit (also returning early for
  Alpine/Devuan)
* adding a 'full' parameter (would be grateful for suggestions for a better
  fitting name) and only running this expensive check if it is provided
  (patch 2/2 for pve-manager)

original cover-letter for v1:
This series addresses the issue of running containers, which boot with a
systemd version which is too old (<232) to support the unified cgroup
hierarchy - This includes CentOS 7 and Ubuntu 16.04 containers.

The patch for pve-container simply logs to syslog with level err to notify
the user. Since container start runs through our stack into systemd
(and back into our stack), I did not see a better option (grateful for
feedback if there is of course).

One alternative might be to mount the container once in vm_start (or the
API calls), check and unmount again - but this seemed a bit expensive to do
unconditionally on every start.

The patch for pve6to7 simply loops through all containers and checks for
the condition

pve-container:
Stoiko Ivanov (2):
  prestart-hook: detect cgroupv2 incompatible systemd version
  setup: shortcut cgroupv2 support for non-systemd distros

 src/PVE/LXC/Setup.pm        |  8 ++++++++
 src/PVE/LXC/Setup/Alpine.pm |  7 +++++++
 src/PVE/LXC/Setup/Base.pm   | 36 ++++++++++++++++++++++++++++++++++++
 src/PVE/LXC/Setup/Devuan.pm |  7 +++++++
 src/lxc-pve-prestart-hook   |  7 +++++++
 5 files changed, 65 insertions(+)

pve-manger:
Stoiko Ivanov (2):
  pve6to7: check for containers not supporting pure cgroupv2
  pve6to7: add 'full' parameter for expensive checks

 PVE/CLI/pve6to7.pm | 138 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 135 insertions(+), 3 deletions(-)

-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH container v2 1/2] prestart-hook: detect cgroupv2 incompatible systemd version
  2021-07-05 10:57 [pve-devel] [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Stoiko Ivanov
@ 2021-07-05 10:57 ` Stoiko Ivanov
  2021-07-05 10:57 ` [pve-devel] [PATCH container v2 2/2] setup: shortcut cgroupv2 support for non-systemd distros Stoiko Ivanov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Stoiko Ivanov @ 2021-07-05 10:57 UTC (permalink / raw)
  To: pve-devel

Some container OS (e.g. CentOS 7, Ubuntu 16.04) are booted with
systemd, in a version which is not able to run with a pure cgroupv2
(a.k.a unified hierarchy) environment.

Detect those in the lxc-pve-prestart-hook, because there we already
have all mount-points set up.

This approach only leaves syslog/journal as place for notifying the
user since starting a container eventually runs `systemctl start
pve-container@VMID.service`, where we lose the prints to stdout and
stderr.

The alternative of shortly mounting all container mounts just to
obtain the systemd-version, before starting the container seems
prohibitively expensive.

The heuristic of /sbin/init needing to be a link to something ending
in systemd is taken from the systemd documentation[0] and was verified
on a few of our container-templates.

[0] https://www.freedesktop.org/software/systemd/man/systemd.html
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---

unchanged from v1

 src/PVE/LXC/Setup.pm      |  8 ++++++++
 src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook |  7 +++++++
 3 files changed, 51 insertions(+)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index cf72b03..9abdc85 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -421,4 +421,12 @@ sub get_ct_os_release {
     return &$parse_os_release($data);
 }
 
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    $self->protected_call(sub {
+	$self->{plugin}->unified_cgroupv2_support();
+    });
+}
+
 1;
diff --git a/src/PVE/LXC/Setup/Base.pm b/src/PVE/LXC/Setup/Base.pm
index 663df73..a5b77d3 100644
--- a/src/PVE/LXC/Setup/Base.pm
+++ b/src/PVE/LXC/Setup/Base.pm
@@ -503,6 +503,42 @@ sub clear_machine_id {
     }
 }
 
+# tries to guess the systemd version based on the existence of
+# (/usr)?/lib/systemd/libsystemd-shared<version>.so. It was introduced in v231.
+sub get_systemd_version {
+    my ($self) = @_;
+
+    my $sd_lib_dir = $self->ct_is_directory("/lib/systemd") ?
+	"/lib/systemd" : "/usr/lib/systemd";
+    my $libsd = PVE::Tools::dir_glob_regex($sd_lib_dir, "libsystemd-shared-.+\.so");
+    if (defined($libsd) && $libsd =~ /libsystemd-shared-(\d+)\.so/) {
+	return $1;
+    }
+
+    return undef;
+}
+
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    # https://www.freedesktop.org/software/systemd/man/systemd.html
+    # systemd is installed as symlink to /sbin/init
+    my $systemd = $self->ct_readlink('/sbin/init');
+
+    # assume non-systemd init will run with unified cgroupv2
+    if (!defined($systemd) || $systemd !~ m@/systemd$@) {
+	return 1;
+    }
+
+    # systemd version 232 (e.g. debian stretch) supports the unified hierarchy
+    my $sdver = $self->get_systemd_version();
+    if (!defined($sdver) || $sdver < 232) {
+	return 0;
+    }
+
+    return 1
+}
+
 sub pre_start_hook {
     my ($self, $conf) = @_;
 
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index 8d876a8..fac587e 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -15,6 +15,7 @@ use PVE::LXC::Config;
 use PVE::LXC::Setup;
 use PVE::LXC::Tools;
 use PVE::LXC;
+use PVE::SafeSyslog;
 use PVE::Storage;
 use PVE::Syscall qw(:fsmount);
 use PVE::Tools qw(AT_FDCWD O_PATH);
@@ -126,6 +127,12 @@ PVE::LXC::Tools::lxc_hook('pre-start', 'lxc', sub {
     my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
     $lxc_setup->pre_start_hook();
 
+    if (PVE::CGroup::cgroup_mode() == 2) {
+	if(!$lxc_setup->unified_cgroupv2_support()) {
+	    syslog('err', "CT $vmid does not support running in a pure cgroupv2 environment\n");
+	}
+    }
+
     if (@$devices) {
 	my $devlist = '';
 	foreach my $dev (@$devices) {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH container v2 2/2] setup: shortcut cgroupv2 support for non-systemd distros
  2021-07-05 10:57 [pve-devel] [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Stoiko Ivanov
  2021-07-05 10:57 ` [pve-devel] [PATCH container v2 1/2] prestart-hook: detect cgroupv2 incompatible systemd version Stoiko Ivanov
@ 2021-07-05 10:57 ` Stoiko Ivanov
  2021-07-05 10:57 ` [pve-devel] [PATCH manager v2 1/2] pve6to7: check for containers not supporting pure cgroupv2 Stoiko Ivanov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Stoiko Ivanov @ 2021-07-05 10:57 UTC (permalink / raw)
  To: pve-devel

Alpine and Devuan do not use systemd as init, thus run without
problems in a pure cgroupv2 environment

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/PVE/LXC/Setup/Alpine.pm | 7 +++++++
 src/PVE/LXC/Setup/Devuan.pm | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/src/PVE/LXC/Setup/Alpine.pm b/src/PVE/LXC/Setup/Alpine.pm
index e486971..5caeee7 100644
--- a/src/PVE/LXC/Setup/Alpine.pm
+++ b/src/PVE/LXC/Setup/Alpine.pm
@@ -106,4 +106,11 @@ sub setup_network {
     PVE::LXC::Setup::Debian::setup_network($self, $newconf);
 }
 
+# non systemd based containers work with pure cgroupv2
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    return 1;
+}
+
 1;
diff --git a/src/PVE/LXC/Setup/Devuan.pm b/src/PVE/LXC/Setup/Devuan.pm
index d3d9019..3e15bb2 100644
--- a/src/PVE/LXC/Setup/Devuan.pm
+++ b/src/PVE/LXC/Setup/Devuan.pm
@@ -40,6 +40,13 @@ sub new {
     return bless $self, $class;
 }
 
+# non systemd based containers work with pure cgroupv2
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    return 1;
+}
+
 # the rest gets handled by the Debian plugin, which is compatible with older
 # non-systemd Debian versions for now.
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH manager v2 1/2] pve6to7: check for containers not supporting pure cgroupv2
  2021-07-05 10:57 [pve-devel] [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Stoiko Ivanov
  2021-07-05 10:57 ` [pve-devel] [PATCH container v2 1/2] prestart-hook: detect cgroupv2 incompatible systemd version Stoiko Ivanov
  2021-07-05 10:57 ` [pve-devel] [PATCH container v2 2/2] setup: shortcut cgroupv2 support for non-systemd distros Stoiko Ivanov
@ 2021-07-05 10:57 ` Stoiko Ivanov
  2021-07-05 10:57 ` [pve-devel] [PATCH manager v2 2/2] pve6to7: add 'full' parameter for expensive checks Stoiko Ivanov
  2021-07-05 16:59 ` [pve-devel] applied series: [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Thomas Lamprecht
  4 siblings, 0 replies; 6+ messages in thread
From: Stoiko Ivanov @ 2021-07-05 10:57 UTC (permalink / raw)
  To: pve-devel

Helpers copied from pve-container to avoid versioned bumps.

Early returns when no containers are running, or the containers don't
use systemd, as well as returning after finding the first affected
container to minimize impact and resource usage.

Checking running containers first since following /proc/<pid>/root is
cheaper than mounting all volumes for a container

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 PVE/CLI/pve6to7.pm | 123 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
index 60edac11..b9aeb89c 100644
--- a/PVE/CLI/pve6to7.pm
+++ b/PVE/CLI/pve6to7.pm
@@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
 use PVE::QemuConfig;
 use PVE::QemuServer;
 use PVE::VZDump::Common;
+use PVE::LXC;
+use PVE::LXC::Config;
+use PVE::LXC::Setup;
 
 use Term::ANSIColor;
 
@@ -891,6 +894,126 @@ sub check_storage_content {
     }
 }
 
+sub check_containers_cgroup_compat {
+
+    my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
+    if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
+	log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
+	return;
+    }
+
+    my $supports_cgroupv2 = sub {
+	my ($conf, $rootdir) = @_;
+
+	my $get_systemd_version = sub {
+	    my ($self) = @_;
+
+	    my $sd_lib_dir = -d "/lib/systemd" ? "/lib/systemd" : "/usr/lib/systemd";
+	    my $libsd = PVE::Tools::dir_glob_regex($sd_lib_dir, "libsystemd-shared-.+\.so");
+	    if (defined($libsd) && $libsd =~ /libsystemd-shared-(\d+)\.so/) {
+		return $1;
+	    }
+
+	    return undef;
+	};
+
+	my  $unified_cgroupv2_support = sub {
+	    my ($self) = @_;
+
+	    # https://www.freedesktop.org/software/systemd/man/systemd.html
+	    # systemd is installed as symlink to /sbin/init
+	    my $systemd = CORE::readlink('/sbin/init');
+
+	    # assume non-systemd init will run with unified cgroupv2
+	    if (!defined($systemd) || $systemd !~ m@/systemd$@) {
+		return 1;
+	    }
+
+	    # systemd version 232 (e.g. debian stretch) supports the unified hierarchy
+	    my $sdver = $get_systemd_version->();
+	    if (!defined($sdver) || $sdver < 232) {
+		return 0;
+	    }
+
+	    return 1;
+	};
+
+	my $ostype = $conf->{ostype};
+	if ($ostype eq 'devuan' || $ostype eq 'alpine') {
+	    return 1;
+	}
+
+	my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
+	return $lxc_setup->protected_call($unified_cgroupv2_support);
+    };
+
+    my $log_problem = sub {
+	my ($ctid) = @_;
+	log_warn("Found at least one CT ($ctid) which does not support running in a unified " .
+	    "cgroup v2 layout - either upgrade it or set systemd.unified_cgroup_hierarchy=0 " .
+	    "in the kernel cmdline - skipping further checks"
+	);
+    };
+
+    my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
+    if ($@) {
+	log_warn("Failed to retrieve information about this node's CTs - $@");
+	return;
+    }
+
+    if (!defined($cts) || !scalar(@$cts)) {
+	log_skip("No containers on node detected.");
+	return;
+    }
+
+    my @running_cts = grep { $_->{status} eq 'running' } @$cts;
+    my @offline_cts = grep { $_->{status} ne 'running' } @$cts;
+
+    for my $ct (@running_cts) {
+	my $ctid = $ct->{vmid};
+	my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
+	if (my $err = $@) {
+	    log_warn("Failed to get PID for running CT $ctid - $err");
+	    next;
+	}
+	my $rootdir = "/proc/$pid/root";
+	my $conf = PVE::LXC::Config->load_config($ctid);
+
+	my $ret = eval { $supports_cgroupv2->($conf, $rootdir) };
+	if (my $err = $@) {
+	    log_warn("Failed to get cgroup support status for CT $ctid - $err");
+	    next;
+	}
+	if (!$ret) {
+	    $log_problem->($ctid);
+	    return;
+	}
+    }
+
+    my $storage_cfg = PVE::Storage::config();
+    for my $ct (@offline_cts) {
+	my $ctid = $ct->{vmid};
+	my ($conf, $rootdir, $ret);
+	eval {
+	    $conf = PVE::LXC::Config->load_config($ctid);
+	    $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
+	    $ret = $supports_cgroupv2->($conf, $rootdir);
+	};
+	if (my $err = $@) {
+	    log_warn("Failed to load config and mount CT $ctid - $err");
+	    eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+	    next;
+	}
+	if (!$ret) {
+	    $log_problem->($ctid);
+	    eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+	    last;
+	}
+
+	eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
+    }
+};
+
 sub check_misc {
     print_header("MISCELLANEOUS CHECKS");
     my $ssh_config = eval { PVE::Tools::file_get_contents('/root/.ssh/config') };
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH manager v2 2/2] pve6to7: add 'full' parameter for expensive checks
  2021-07-05 10:57 [pve-devel] [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Stoiko Ivanov
                   ` (2 preceding siblings ...)
  2021-07-05 10:57 ` [pve-devel] [PATCH manager v2 1/2] pve6to7: check for containers not supporting pure cgroupv2 Stoiko Ivanov
@ 2021-07-05 10:57 ` Stoiko Ivanov
  2021-07-05 16:59 ` [pve-devel] applied series: [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Thomas Lamprecht
  4 siblings, 0 replies; 6+ messages in thread
From: Stoiko Ivanov @ 2021-07-05 10:57 UTC (permalink / raw)
  To: pve-devel

and place the container cgroupv2 support checks behind it.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 PVE/CLI/pve6to7.pm | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
index b9aeb89c..5e97a3c5 100644
--- a/PVE/CLI/pve6to7.pm
+++ b/PVE/CLI/pve6to7.pm
@@ -1119,6 +1119,12 @@ __PACKAGE__->register_method ({
     parameters => {
 	additionalProperties => 0,
 	properties => {
+	    full => {
+		description => 'perform additional, expensive checks.',
+		type => 'boolean',
+		optional => 1,
+		default => 0,
+	    },
 	},
     },
     returns => { type => 'null' },
@@ -1131,6 +1137,12 @@ __PACKAGE__->register_method ({
 	check_storage_health();
 	check_misc();
 
+	if ($param->{full}) {
+	    check_containers_cgroup_compat();
+	} else {
+	    log_skip("Expensive checks not performed without 'full' parameter");
+	}
+
 	print_header("SUMMARY");
 
 	my $total = 0;
@@ -1153,7 +1165,4 @@ __PACKAGE__->register_method ({
 
 our $cmddef = [ __PACKAGE__, 'checklist', [], {}];
 
-# for now drop all unknown params and just check
-@ARGV = ();
-
 1;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] applied series: [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2
  2021-07-05 10:57 [pve-devel] [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Stoiko Ivanov
                   ` (3 preceding siblings ...)
  2021-07-05 10:57 ` [pve-devel] [PATCH manager v2 2/2] pve6to7: add 'full' parameter for expensive checks Stoiko Ivanov
@ 2021-07-05 16:59 ` Thomas Lamprecht
  4 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2021-07-05 16:59 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stoiko Ivanov

On 05.07.21 12:57, Stoiko Ivanov wrote:
> changes v1->v2:
> incorporated Thomas' feedback (huge thx!) by:
> * adding extra handling for Alpine and Devuan container (patch 2/2 for
>   pve-container)
> * copying the helpers directly to pve6to7 to avoid versioned dependency
>   bumps
> * refactoring the code in pve6to7 a bit (also returning early for
>   Alpine/Devuan)
> * adding a 'full' parameter (would be grateful for suggestions for a better
>   fitting name) and only running this expensive check if it is provided
>   (patch 2/2 for pve-manager)
> 
> original cover-letter for v1:
> This series addresses the issue of running containers, which boot with a
> systemd version which is too old (<232) to support the unified cgroup
> hierarchy - This includes CentOS 7 and Ubuntu 16.04 containers.
> 
> The patch for pve-container simply logs to syslog with level err to notify
> the user. Since container start runs through our stack into systemd
> (and back into our stack), I did not see a better option (grateful for
> feedback if there is of course).
> 
> One alternative might be to mount the container once in vm_start (or the
> API calls), check and unmount again - but this seemed a bit expensive to do
> unconditionally on every start.
> 
> The patch for pve6to7 simply loops through all containers and checks for
> the condition
> 
> pve-container:
> Stoiko Ivanov (2):
>   prestart-hook: detect cgroupv2 incompatible systemd version
>   setup: shortcut cgroupv2 support for non-systemd distros
> 
>  src/PVE/LXC/Setup.pm        |  8 ++++++++
>  src/PVE/LXC/Setup/Alpine.pm |  7 +++++++
>  src/PVE/LXC/Setup/Base.pm   | 36 ++++++++++++++++++++++++++++++++++++
>  src/PVE/LXC/Setup/Devuan.pm |  7 +++++++
>  src/lxc-pve-prestart-hook   |  7 +++++++
>  5 files changed, 65 insertions(+)
> 
> pve-manger:
> Stoiko Ivanov (2):
>   pve6to7: check for containers not supporting pure cgroupv2
>   pve6to7: add 'full' parameter for expensive checks
> 
>  PVE/CLI/pve6to7.pm | 138 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 135 insertions(+), 3 deletions(-)
> 



applied series, thanks!




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-07-05 17:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-05 10:57 [pve-devel] [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Stoiko Ivanov
2021-07-05 10:57 ` [pve-devel] [PATCH container v2 1/2] prestart-hook: detect cgroupv2 incompatible systemd version Stoiko Ivanov
2021-07-05 10:57 ` [pve-devel] [PATCH container v2 2/2] setup: shortcut cgroupv2 support for non-systemd distros Stoiko Ivanov
2021-07-05 10:57 ` [pve-devel] [PATCH manager v2 1/2] pve6to7: check for containers not supporting pure cgroupv2 Stoiko Ivanov
2021-07-05 10:57 ` [pve-devel] [PATCH manager v2 2/2] pve6to7: add 'full' parameter for expensive checks Stoiko Ivanov
2021-07-05 16:59 ` [pve-devel] applied series: [PATCH manger/container v2 0/2] detect containers not supporting pure cgroupv2 Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal