From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 532FC61ACC
 for <pve-devel@lists.proxmox.com>; Tue, 15 Sep 2020 09:58:12 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 47EE9174BA
 for <pve-devel@lists.proxmox.com>; Tue, 15 Sep 2020 09:58:12 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id AED3A174AD
 for <pve-devel@lists.proxmox.com>; Tue, 15 Sep 2020 09:58:11 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 6DAA844C20;
 Tue, 15 Sep 2020 09:58:11 +0200 (CEST)
To: Alexandre DERUMIER <aderumier@odiso.com>, dietmar <dietmar@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <216436814.339545.1599142316781.JavaMail.zimbra@odiso.com>
 <9e2974b8-3c39-0fda-6f73-6677e3d796f4@proxmox.com>
 <1928266603.714059.1600059280338.JavaMail.zimbra@odiso.com>
 <803983196.1499.1600067690947@webmail.proxmox.com>
 <2093781647.723563.1600072074707.JavaMail.zimbra@odiso.com>
 <88fe5075-870d-9197-7c84-71ae8a25e9dd@proxmox.com>
 <1775665592.735772.1600098305930.JavaMail.zimbra@odiso.com>
 <487514223.9.1600148741895@webmail.proxmox.com>
 <295606419.745430.1600151269212.JavaMail.zimbra@odiso.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Message-ID: <94ccda38-3f20-3fd5-0e00-d0fd6ef1fc53@proxmox.com>
Date: Tue, 15 Sep 2020 09:58:09 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:81.0) Gecko/20100101
 Thunderbird/81.0
MIME-Version: 1.0
In-Reply-To: <295606419.745430.1600151269212.JavaMail.zimbra@odiso.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.208 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [syslog.target]
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean
 shutdown
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 15 Sep 2020 07:58:12 -0000

On 9/15/20 8:27 AM, Alexandre DERUMIER wrote:
>>> This is by intention - we do not want to stop pmxcfs only because coorosync service stops. 
> 
> Yes, but at shutdown, it could be great to stop pmxcfs before corosync ?
> I ask the question, because the 2 times I have problem, it was when shutting down a server.
> So maybe some strange behaviour occur with both corosync && pmxcfs are stopped at same time ?
> 
> 
> looking at the pve-cluster unit file,
> why do we have "Before=corosync.service" and not "After=corosync.service" ?

We may need to sync the cluster-wide corosync.conf over to the local one, and
that can only happen before corosync starts.

Also, if we shut down pmxcfs before corosync, we may still get corosync events
(file writes, locking, ...) that the node no longer sees locally, while it still
looks quorate to the others; that would not be good.
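
As a rough illustration of the ordering semantics (a simplified sketch, not how
systemd is implemented): a "Before=" edge fixes the start order, and the reverse
order is used when both units are stopped in the same transaction. The BEFORE
table and the two helper functions are hypothetical names made up for this
example; only the unit names come from this thread.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical, simplified model: unit -> units it declares Before=.
# pve-cluster.service has "Before=corosync.service" in its unit file.
BEFORE = {
    "pve-cluster.service": {"corosync.service"},
    "corosync.service": set(),
}


def start_order(before):
    """Return units in start order, honouring the Before= edges."""
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    preds = {unit: set() for unit in before}
    for unit, later_units in before.items():
        for later in later_units:
            preds.setdefault(later, set()).add(unit)
    return list(TopologicalSorter(preds).static_order())


def stop_order(before):
    """Stopping applies the same ordering in reverse."""
    return list(reversed(start_order(before)))


print("start:", start_order(BEFORE))
print("stop: ", stop_order(BEFORE))
```

So with the current unit file, pmxcfs comes up first and should go down last,
which is the order we want for the reasons above.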

> 
> I have tried to change this, but even with that, both are still shutting down in parallel.
> 
> the only way I have found to have clean shutdown, is "Requires=corosync.server" + "After=corosync.service".
> But that mean than if you restart corosync, it's restart pmxcfs too first.
> 
> I have looked at systemd doc, After= should be enough (as at shutdown it's doing the reverse order),
> but I don't known why corosync don't wait than pve-cluster ???
> 
> 
> (Also, I think than pmxcfs is also stopping after syslog, because I never see the pmxcfs "teardown filesystem" logs at shutdown)


Is that true for (persistent) systemd-journald too? IIRC syslog.target is
deprecated and only rsyslog provides it.

As the next Debian will enable the persistent journal by default, and we already
use it for everything (IIRC) where we provide an interface to logs, we will
probably not enable rsyslog by default with PVE 7.x.

But if we can add some ordering to improve this, I'm open to it.