From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 3BA39621F4
 for <pve-devel@lists.proxmox.com>; Thu, 20 Aug 2020 11:37:13 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 28A432C073
 for <pve-devel@lists.proxmox.com>; Thu, 20 Aug 2020 11:36:43 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 34CE72C064
 for <pve-devel@lists.proxmox.com>; Thu, 20 Aug 2020 11:36:41 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 0352F44734
 for <pve-devel@lists.proxmox.com>; Thu, 20 Aug 2020 11:36:41 +0200 (CEST)
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Fabian Ebner <f.ebner@proxmox.com>,
 Wolfgang Bumiller <w.bumiller@proxmox.com>
References: <20200819103037.15143-1-f.ebner@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Message-ID: <1ef68f7f-437a-a160-05e2-f3b111ece024@proxmox.com>
Date: Thu, 20 Aug 2020 11:36:39 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101
 Thunderbird/80.0
MIME-Version: 1.0
In-Reply-To: <20200819103037.15143-1-f.ebner@proxmox.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [pve-devel] [RFC container] Improve feedback for startup
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 20 Aug 2020 09:37:13 -0000

On 19.08.20 12:30, Fabian Ebner wrote:
> Since it was necessary to switch to 'Type=simple' in the systemd
> service (see commit 545d6f0a13ac2bf3a8d3f224c19c0e0def12116d),
> 'systemctl start pve-container@ID' no longer waits for the 'lxc-start'
> command. Thus every container start is reported as a success and the
> 'post-start' hook triggers immediately after the 'systemctl start'
> command.
> 
> Use 'lxc-monitor' to detect startup failures, and only run the
> 'post-start' hookscript once the container is actually running. If
> something goes wrong with the monitor, fall back to the old behavior.
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
>  src/PVE/LXC.pm | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 

Appreciate the effort!
We could also connect directly to the lxc-monitord, which publishes all
state changes, via /run/lxc/var/lib/lxc/monitor-fifo (or its abstract
UNIX socket; not much is gained or different there) and unpack the new
state [0] directly.

[0] https://github.com/lxc/lxc/blob/8bdacc22a48f9c09902a1d2febd71439cb38c082/src/lxc/state.h#L10
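A rough sketch of the decoding side, in Perl like the rest of LXC.pm. It
only covers unpacking messages from an already opened filehandle, not
opening the FIFO or socket. The state names follow the lxc_state_t order
in [0]; the message layout (an int type, a NAME_MAX + 1 byte name, an int
value and an int pid, with type 0 meaning a state change) is an assumption
that would need checking against lxc's monitor.h for the version we ship.
The read_lxc_msg name is made up for illustration:

```perl
use strict;
use warnings;

# State names in the order of lxc_state_t in lxc's src/lxc/state.h; the
# numeric state value in a monitor message indexes into this list.
my @LXC_STATES = qw(STOPPED STARTING RUNNING STOPPING ABORTING FREEZING FROZEN THAWED);

# ASSUMPTION: 'struct lxc_msg' is { int type; char name[NAME_MAX + 1];
# int value; int pid; } with NAME_MAX = 255, native int size, no padding,
# and type 0 marking a state change. Verify against lxc's monitor.h.
use constant LXC_MSG_STATE    => 0;
use constant LXC_MSG_TEMPLATE => 'i Z256 i i';
use constant LXC_MSG_SIZE     => length(pack(LXC_MSG_TEMPLATE, 0, '', 0, 0));

# Hypothetical helper: read one fixed-size message from $fh. Returns undef on
# EOF, read error or a short read; otherwise a hash ref with the message type,
# the container name and, for state-change messages, the decoded state name.
sub read_lxc_msg {
    my ($fh) = @_;
    my $got = read($fh, my $buf, LXC_MSG_SIZE);
    return undef if !$got || $got != LXC_MSG_SIZE;

    my ($type, $name, $value) = unpack(LXC_MSG_TEMPLATE, $buf);
    my $state;
    if ($type == LXC_MSG_STATE) {
        $state = ($value >= 0 && $value < @LXC_STATES)
            ? $LXC_STATES[$value]
            : "UNKNOWN($value)";
    }
    return { type => $type, name => $name, state => $state };
}
```

A caller would keep reading until it sees RUNNING, or one of ABORTING,
STOPPING or STOPPED, for the container's name, much like the lxc-monitor
loop in the patch, but without parsing text output.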

@Wolfgang, what do you think?

> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index db5b8ca..35dc54c 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -2191,10 +2191,44 @@ sub vm_start {
>  
>      PVE::Storage::activate_volumes($storage_cfg, $vollist);
>  
> +    my $monitor_pid = open(my $monitor_fh, '-|', "/usr/bin/lxc-monitor -n $vmid")
> +	or warn "could not open pipe to lxc-monitor\n";
> +
>      my $cmd = ['systemctl', 'start', "pve-container\@$vmid"];
>  
>      PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
> -    eval { PVE::Tools::run_command($cmd); };
> +    eval {
> +	PVE::Tools::run_command($cmd);
> +
> +	my $success;
> +	if ($monitor_pid) {
> +	    eval {
> +		local $SIG{ALRM} = sub { die "got timeout\n" };
> +		alarm(10); # 'STARTING' should appear quickly
> +
> +		while (my $line = <$monitor_fh>) {
> +		    if ($line =~ m/^'$vmid' changed state to \[([A-Z]*)\]$/) {
> +			my $status = $1;
> +			alarm(0);
> +			$success = 1 if $status eq 'RUNNING';
> +			$success = 0 if $status eq 'ABORTING'
> +				     || $status eq 'STOPPING'
> +				     || $status eq 'STOPPED';
> +			if (defined($success)) {
> +			    kill('KILL', $monitor_pid);
> +			    waitpid($monitor_pid, 0);
> +			}
> +		    } else {
> +			die "unexpected output from lxc-monitor: $line\n";
> +		    }
> +		}
> +	    };
> +	    warn "Problem with lxc-monitor: $@" if $@;
> +	    alarm(0);
> +	}
> +	die "'lxc-start' failed for container '$vmid'\n"
> +	    if defined($success) && !$success;
> +    };
>      if (my $err = $@) {
>  	unlink $skiplock_flag_fn;
>  	die $err;
>
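Independent of which source we end up using, the per-line state handling
in the loop could live in a small helper that can be tested without a
running container. A sketch with a made-up name, matching the same
"'ID' changed state to [STATE]" format the patch already expects (only
adding \Q...\E quoting around the ID):

```perl
use strict;
use warnings;

# Hypothetical helper: classify one line of 'lxc-monitor -n $vmid' output the
# same way the loop in the patch does. Returns 1 for RUNNING, 0 for
# ABORTING/STOPPING/STOPPED, undef for any other (intermediate) state, and
# dies on output it does not recognize.
sub classify_monitor_line {
    my ($vmid, $line) = @_;

    die "unexpected output from lxc-monitor: $line\n"
        if $line !~ m/^'\Q$vmid\E' changed state to \[([A-Z]*)\]$/;
    my $status = $1;

    return 1 if $status eq 'RUNNING';
    return 0 if $status =~ m/^(?:ABORTING|STOPPING|STOPPED)$/;
    return undef;
}
```

The alarm, kill and waitpid handling would stay in vm_start; the loop
would only need to stop once the helper returns a defined value.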