From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fabian Ebner
To: pve-devel@lists.proxmox.com
Date: Tue, 8 Sep 2020 13:58:43 +0200
Message-Id: <20200908115843.345-2-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20200908115843.345-1-f.ebner@proxmox.com>
References: <20200908115843.345-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [pve-devel] [PATCH v2 container 2/2] Improve feedback for startup
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion
X-List-Received-Date: Tue, 08 Sep 2020 11:59:20 -0000

Since it was necessary to switch to 'Type=simple' in the systemd
service, see 545d6f0a13ac2bf3a8d3f224c19c0e0def12116d,
'systemctl start' would not wait for the 'lxc-start' command anymore.
Thus every container start was reported as a success and the
'post-start' hook would trigger immediately after the
'systemctl start' command.

Use the monitor socket to get the necessary information and detect
startup failure, and only run the 'post-start' hookscript after the
container is effectively running. If something goes wrong with the
monitor socket, for example if lxc-monitord is not running, fall back
to the old behavior.

Signed-off-by: Fabian Ebner
---

Changes from v1:
    * use monitor socket directly instead of forking off an
      lxc-monitor process
    * use run_with_timeout helper
    * warn instead of die on unexpected message

 src/PVE/LXC.pm | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index db5b8ca..370adda 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -32,6 +32,7 @@ use PVE::LXC::Config;
 use PVE::GuestHelpers qw(safe_string_ne safe_num_ne safe_boolean_ne);
 use PVE::LXC::Tools;
 use PVE::LXC::CGroup;
+use PVE::LXC::Monitor;
 use Time::HiRes qw (gettimeofday);
 
 my $have_sdn;
@@ -2191,10 +2192,47 @@ sub vm_start {
 
     PVE::Storage::activate_volumes($storage_cfg, $vollist);
 
+    my $monitor_socket = eval { PVE::LXC::Monitor::get_monitor_socket(); };
+    warn $@ if $@;
+
+    my $monitor_state_change = sub {
+	die "no monitor socket" if !defined($monitor_socket);
+
+	while (1) {
+	    my ($type, $name, $value) = PVE::LXC::Monitor::read_lxc_message($monitor_socket);
+
+	    die "monitor socket EOF" if !defined($type);
+
+	    next if $name ne "$vmid" || $type ne 'STATE';
+
+	    if ($value eq PVE::LXC::Monitor::STATE_STARTING) {
+		alarm(0); # don't timeout after seeing the starting state
+	    } elsif ($value eq PVE::LXC::Monitor::STATE_ABORTING ||
+		     $value eq PVE::LXC::Monitor::STATE_STOPPING ||
+		     $value eq PVE::LXC::Monitor::STATE_STOPPED) {
+		return 0;
+	    } elsif ($value eq PVE::LXC::Monitor::STATE_RUNNING) {
+		return 1;
+	    } else {
+		warn "unexpected message from monitor socket - " .
+		     "type: '$type' - value: '$value'\n";
+	    }
+	}
+    };
+
     my $cmd = ['systemctl', 'start', "pve-container\@$vmid"];
 
     PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
-    eval { PVE::Tools::run_command($cmd); };
+    eval {
+	PVE::Tools::run_command($cmd);
+
+	my $success = eval { PVE::Tools::run_with_timeout(10, $monitor_state_change); };
+	if (my $err = $@) {
+	    warn "problem with monitor socket: $err - continuing anyway\n";
+	} elsif (!$success) {
+	    die "startup for container '$vmid' failed\n";
+	}
+    };
     if (my $err = $@) {
 	unlink $skiplock_flag_fn;
 	die $err;
-- 
2.20.1