From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <s.ivanov@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 8444372A4A
 for <pve-devel@lists.proxmox.com>; Fri,  2 Jul 2021 20:22:11 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id C75AFC0F1
 for <pve-devel@lists.proxmox.com>; Fri,  2 Jul 2021 20:22:10 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id A1D3EC0C5
 for <pve-devel@lists.proxmox.com>; Fri,  2 Jul 2021 20:22:09 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 7A6D940060
 for <pve-devel@lists.proxmox.com>; Fri,  2 Jul 2021 20:22:09 +0200 (CEST)
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Fri,  2 Jul 2021 20:21:51 +0200
Message-Id: <20210702182152.485913-2-s.ivanov@proxmox.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210702182152.485913-1-s.ivanov@proxmox.com>
References: <20210702182152.485913-1-s.ivanov@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.388 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 POISEN_SPAM_PILL          0.1 Meta: its spam
 POISEN_SPAM_PILL_1        0.1 random spam to be learned in bayes
 POISEN_SPAM_PILL_3        0.1 random spam to be learned in bayes
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [freedesktop.org, setup.pm, base.pm]
Subject: [pve-devel] [PATCH container 1/1] prestart-hook: detect cgroupv2
 incompatible systemd version
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Fri, 02 Jul 2021 18:22:11 -0000

Some container OSes (e.g. CentOS 7, Ubuntu 16.04) boot with a systemd
version which is not able to run in a pure cgroupv2 (a.k.a. unified
hierarchy) environment.

Detect those in the lxc-pve-prestart-hook, because there we already
have all mount-points set up.

This approach leaves only syslog/journal as the place for notifying the
user, since starting a container eventually runs `systemctl start
pve-container@VMID.service`, where we lose the prints to stdout and
stderr (and the RPCEnvironment for warning in the tasklog).

The alternative of briefly mounting all container mounts before
starting the container, just to obtain the systemd version, seems
prohibitively expensive.

The heuristic of /sbin/init needing to be a link to something ending
in systemd is taken from the systemd documentation[0] and was verified
on a few of our container-templates (Ubuntu, Debian, SUSE, CentOS, Arch).

[0] https://www.freedesktop.org/software/systemd/man/systemd.html

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
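Note (not part of the commit message): the decision logic can be tried
outside of the setup plugin with a small standalone sketch like the one
below. It only mirrors the heuristic described above. The helper names
systemd_version_from_libs and supports_unified_cgroupv2 and the sample
file names are made up for illustration; the actual patch reads the
symlink and the library directory through the container-aware helpers
inside protected_call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the systemd version from a list of file
# names as they would appear in (/usr)?/lib/systemd.
# Returns undef if no libsystemd-shared-<version>.so is present.
sub systemd_version_from_libs {
    my (@files) = @_;
    for my $file (@files) {
        return $1 if $file =~ /libsystemd-shared-(\d+)\.so/;
    }
    return undef;
}

# Hypothetical helper mirroring the decision taken in the patch.
# $init_target is what /sbin/init links to (undef if it is no symlink).
sub supports_unified_cgroupv2 {
    my ($init_target, @files) = @_;

    # assume a non-systemd init will run with unified cgroupv2
    return 1 if !defined($init_target) || $init_target !~ m@/systemd$@;

    # no shared library found means systemd is older than v231
    my $sdver = systemd_version_from_libs(@files);
    return 0 if !defined($sdver) || $sdver < 232;

    return 1;
}

# systemd without the shared library (older than v231)
print supports_unified_cgroupv2('/lib/systemd/systemd'), "\n";
# systemd 232, the first version treated as supported
print supports_unified_cgroupv2('/lib/systemd/systemd', 'libsystemd-shared-232.so'), "\n";
# init that is not a symlink to systemd
print supports_unified_cgroupv2(undef), "\n";
```

A missing library is treated the same as a too-old version, which is
what makes the check work for systemd releases that predate the shared
library.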
 src/PVE/LXC/Setup.pm      |  8 ++++++++
 src/PVE/LXC/Setup/Base.pm | 36 ++++++++++++++++++++++++++++++++++++
 src/lxc-pve-prestart-hook |  7 +++++++
 3 files changed, 51 insertions(+)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index cf72b03..9abdc85 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -421,4 +421,12 @@ sub get_ct_os_release {
     return &$parse_os_release($data);
 }
 
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    $self->protected_call(sub {
+	$self->{plugin}->unified_cgroupv2_support();
+    });
+}
+
 1;
diff --git a/src/PVE/LXC/Setup/Base.pm b/src/PVE/LXC/Setup/Base.pm
index 663df73..a5b77d3 100644
--- a/src/PVE/LXC/Setup/Base.pm
+++ b/src/PVE/LXC/Setup/Base.pm
@@ -503,6 +503,42 @@ sub clear_machine_id {
     }
 }
 
+# tries to guess the systemd version based on the existence of
+# (/usr)?/lib/systemd/libsystemd-shared-<version>.so, which was introduced in v231.
+sub get_systemd_version {
+    my ($self) = @_;
+
+    my $sd_lib_dir = $self->ct_is_directory("/lib/systemd") ?
+	"/lib/systemd" : "/usr/lib/systemd";
+    my $libsd = PVE::Tools::dir_glob_regex($sd_lib_dir, 'libsystemd-shared-.+\.so');
+    if (defined($libsd) && $libsd =~ /libsystemd-shared-(\d+)\.so/) {
+	return $1;
+    }
+
+    return undef;
+}
+
+sub unified_cgroupv2_support {
+    my ($self) = @_;
+
+    # https://www.freedesktop.org/software/systemd/man/systemd.html
+    # systemd is usually installed as the /sbin/init symlink
+    my $systemd = $self->ct_readlink('/sbin/init');
+
+    # assume non-systemd init will run with unified cgroupv2
+    if (!defined($systemd) || $systemd !~ m@/systemd$@) {
+	return 1;
+    }
+
+    # systemd version 232 (e.g. Debian Stretch) and newer support the unified hierarchy
+    my $sdver = $self->get_systemd_version();
+    if (!defined($sdver) || $sdver < 232) {
+	return 0;
+    }
+
+    return 1;
+}
+
 sub pre_start_hook {
     my ($self, $conf) = @_;
 
diff --git a/src/lxc-pve-prestart-hook b/src/lxc-pve-prestart-hook
index 8d876a8..fac587e 100755
--- a/src/lxc-pve-prestart-hook
+++ b/src/lxc-pve-prestart-hook
@@ -15,6 +15,7 @@ use PVE::LXC::Config;
 use PVE::LXC::Setup;
 use PVE::LXC::Tools;
 use PVE::LXC;
+use PVE::SafeSyslog;
 use PVE::Storage;
 use PVE::Syscall qw(:fsmount);
 use PVE::Tools qw(AT_FDCWD O_PATH);
@@ -126,6 +127,12 @@ PVE::LXC::Tools::lxc_hook('pre-start', 'lxc', sub {
     my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
     $lxc_setup->pre_start_hook();
 
+    if (PVE::CGroup::cgroup_mode() == 2) {
+	if (!$lxc_setup->unified_cgroupv2_support()) {
+	    syslog('err', "CT $vmid does not support running in a pure cgroupv2 environment\n");
+	}
+    }
+
     if (@$devices) {
 	my $devlist = '';
 	foreach my $dev (@$devices) {
-- 
2.30.2