From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Fabian Ebner <f.ebner@proxmox.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pve-devel] [PATCH cluster] fix #3596: handle delnode of offline node
Date: Fri, 12 Nov 2021 13:59:32 +0100
Message-ID: <1636721818.35yhvi2ls0.astroid@nora.none>
In-Reply-To: <28cc8b6a-b34f-4cb4-a5de-9e4b8f5aa4df@proxmox.com>

On November 12, 2021 1:14 pm, Thomas Lamprecht wrote:
> On 12.11.21 12:50, Fabian Ebner wrote:
>> Am 12.11.21 um 09:45 schrieb Fabian Grünbichler:
>>> the recommended way is to first shut down, then delnode, and never let it
>>> come back online, in which case corosync-cfgtool won't be able to kill
>>> the removed (offline) node.
>>>
>>> also, the order was wrong - if we first update corosync.conf to remove
>>> the node entry from the nodelist, corosync doesn't know about the nodeid
>>> anymore, so killing will fail even if the node is still online.
>>>
>>> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>>> ---
>>> data/PVE/API2/ClusterConfig.pm | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/data/PVE/API2/ClusterConfig.pm b/data/PVE/API2/ClusterConfig.pm
>>> index 8f4a5bb..5a6a1ac 100644
>>> --- a/data/PVE/API2/ClusterConfig.pm
>>> +++ b/data/PVE/API2/ClusterConfig.pm
>>> @@ -485,9 +485,13 @@ __PACKAGE__->register_method ({
>>> delete $nodelist->{$node};
>>> - PVE::Corosync::update_nodelist($conf, $nodelist);
>>> + # allowed to fail when node is already shut down!
>>> + eval {
>>> + PVE::Tools::run_command(['corosync-cfgtool','-k', $nodeid])
>>> + if defined($nodeid);
>>> + };
>>>
>>
>> But what if it fails for a different reason than 'CS_ERR_NOT_EXIST'? Shouldn't we match the error?
>
> at least that example is like ENOENT on unlink, an OK error (user could
> have -k'illed it before that).
>
IMHO it's okay to treat all errors as warnings here. If you follow the
instructions, the node is already offline and killing is not possible. If
you didn't follow them and the node is still online, but killing fails
for some reason, you still get the error output, and the node is removed
from corosync.conf on all nodes. Thus no traffic is possible anymore
between the cluster and the separated node (knet rejects traffic from
unknown nodes, i.e., nodes not contained in the nodelist). No traffic
means the separated node is kicked out of the quorum, so it can't do any
harm anymore ;)
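
To make the ordering argument concrete, here is a minimal Perl sketch of a toy model. It is not the PVE code: `kill_nodeid` and `update_nodelist` below are hypothetical stand-ins for `corosync-cfgtool -k` and `PVE::Corosync::update_nodelist()`, and the `%running`/`%online` hashes are assumptions that model which nodeids corosync currently knows and which nodes are up. The kill is attempted first, while the nodeid is still known, and any failure is only reported as a warning before the shortened nodelist is written out.

```perl
use strict;
use warnings;

# Toy model, NOT the PVE implementation:
# %running = nodelist corosync currently knows (node name => nodeid)
# %online  = which nodeids are actually up
my %running = (node1 => 1, node2 => 2, node3 => 3);
my %online  = (1 => 1, 2 => 1, 3 => 0);

# Hypothetical stand-in for `corosync-cfgtool -k $nodeid`: it can only kill
# a node whose nodeid is still known to corosync and which is online.
sub kill_nodeid {
    my ($nodeid) = @_;
    die "nodeid $nodeid not in nodelist\n"
        if !grep { $_ == $nodeid } values %running;
    die "nodeid $nodeid is not online\n" if !$online{$nodeid};
    $online{$nodeid} = 0;
    return;
}

# Hypothetical stand-in for PVE::Corosync::update_nodelist(): afterwards,
# corosync (and thus knet) no longer knows about the removed node.
sub update_nodelist {
    my ($nodelist) = @_;
    %running = %$nodelist;
    return;
}

sub delnode {
    my ($node) = @_;
    my %nodelist = %running;
    my $nodeid = delete $nodelist{$node};

    # 1. kill while corosync still knows the nodeid; any failure (e.g. the
    #    node was already shut down as recommended) is only a warning
    eval { kill_nodeid($nodeid) if defined($nodeid) };
    warn "warning: could not kill node '$node': $@" if $@;

    # 2. only now write out the nodelist without the removed node
    update_nodelist(\%nodelist);
    return;
}
```

With this model, `delnode('node3')` (offline) only prints a warning and still removes the node, while `delnode('node2')` (online) kills it first. Swapping the two steps in `delnode` would make killing an online node fail, because its nodeid is already gone from `%running`.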

Thread overview:
2021-11-12  8:45 Fabian Grünbichler  [PATCH cluster] fix #3596: handle delnode of offline node
2021-11-12 10:04 Thomas Lamprecht    applied: [PATCH cluster] fix #3596: handle delnode of offline node
2021-11-12 11:50 Fabian Ebner
2021-11-12 12:14 Thomas Lamprecht
2021-11-12 12:46 Fabian Ebner
2021-11-12 13:03 Thomas Lamprecht
2021-11-12 12:59 Fabian Grünbichler  (this message)