From: dorsy via pve-user
To: pve-user@lists.proxmox.com
Date: Wed, 20 Aug 2025 00:53:45 +0200
Subject: Re: [PVE-User] Quorum bouncing, VM does not start...
In-Reply-To: <4lgenl-oaf2.ln1@leia.lilliput.linux.it>
I'd suggest a direct link between the hosts as an additional corosync
(quorum) ring, if you have a spare network port. Multiple rings could also
be more resilient than MLAG. But that is just my two cents.

See:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_redundancy
and:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_corosync_over_bonds

On 8/19/2025 5:08 PM, Marco Gaiarin wrote:
> We have a few pairs of servers in some local branches of our organization,
> clustered but deliberately not set up for failover (or 'automatic
> failover'); this is intended.
>
> Most of these branch offices close for the summer holidays, when power
> outages flourish. ;-)
> Rather frequently the whole site gets powered off: the UPSes do their job,
> but sooner or later they shut down the servers (and all other equipment)
> until a local employee goes to the site and powers everything back up.
>
> The servers are organized with two UPSes (one per server); the UPSes also
> power a stack of two Catalyst 2960S switches (again, one UPS per switch).
> Every server interface is a trunk/bond, with one cable on switch1 and one
> cable on switch2 in the stack.
>
>
> We recently upgraded to PVE 8, and found that if the whole site gets
> powered off, sometimes (and fairly often) only some VMs get powered on.
>
>
> Digging for the culprit, we found:
>
> 2025-08-07T10:49:19.997751+02:00 pdpve1 systemd[1]: Starting pve-guests.service - PVE guests...
> 2025-08-07T10:49:20.792333+02:00 pdpve1 pve-guests[2392]: starting task UPID:pdpve1:00000959:0000117F:68946890:startall::root@pam:
> 2025-08-07T10:49:20.794446+02:00 pdpve1 pvesh[2392]: waiting for quorum ...
> 2025-08-07T10:52:18.584607+02:00 pdpve1 pmxcfs[2021]: [status] notice: node has quorum
> 2025-08-07T10:52:18.879944+02:00 pdpve1 pvesh[2392]: got quorum
> 2025-08-07T10:52:18.891461+02:00 pdpve1 pve-guests[2393]: starting task UPID:pdpve1:00000B86:00005711:68946942:qmstart:100:root@pam:
> 2025-08-07T10:52:18.891653+02:00 pdpve1 pve-guests[2950]: start VM 100: UPID:pdpve1:00000B86:00005711:68946942:qmstart:100:root@pam:
> 2025-08-07T10:52:20.103473+02:00 pdpve1 pve-guests[2950]: VM 100 started with PID 2960.
>
> So the servers restart, get quorum and start the VMs in order, but then
> they suddenly lose quorum:
>
> 2025-08-07T10:53:16.128336+02:00 pdpve1 pmxcfs[2021]: [status] notice: node lost quorum
> 2025-08-07T10:53:20.901367+02:00 pdpve1 pve-guests[2393]: cluster not ready - no quorum?
> 2025-08-07T10:53:20.903743+02:00 pdpve1 pvesh[2392]: cluster not ready - no quorum?
> 2025-08-07T10:53:20.905349+02:00 pdpve1 pve-guests[2392]: end task UPID:pdpve1:00000959:0000117F:68946890:startall::root@pam: cluster not ready - no quorum?
> 2025-08-07T10:53:20.922275+02:00 pdpve1 systemd[1]: Finished pve-guests.service - PVE guests.
>
> and the subsequent VMs are not started. After a few seconds quorum comes
> back and everything is normal again, but the remaining VMs have to be
> started by hand.
>
>
> Clearly, if we reboot or power off the two servers while the switches stay
> powered on, everything works as expected.
> We managed to power on the servers and reboot the switches at the same
> time, and that triggered the problem.
>
>
> So it seems that quorum gets lost because the switches stop forwarding
> traffic for a while as they do their own things (e.g. joining the second
> unit to the stack and bringing up the Ethernet bonds), which confuses the
> quorum, bang.
>
> We tried adding:
>
> pvenode config set --startall-onboot-delay 120
>
> on the two nodes and repeated the experiment (i.e. start the servers and
> reboot the switches), and the problem was not triggered.
>
>
> Still, I'm asking for some feedback...
> particularly:
>
> 1) We were on PVE 6: has something changed in the quorum definition from
> PVE 6 to PVE 8? Before upgrading we never hit this...
>
> 2) Is there a better solution for this?
>
>
> Thanks.
>

--
dorsy

_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
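To illustrate why the extra direct-link ring suggested above can help in this two-node setup, here is a minimal Python toy model. It is not corosync's actual code. It assumes default votequorum behaviour: one vote per node, quorum = floor(expected_votes / 2) + 1, and no `two_node` option. It also assumes that knet keeps a peer reachable as long as at least one link to it is up. The link names and the peer name `node2` are hypothetical.

```python
"""Toy model of corosync votequorum with redundant knet links.

Assumptions (illustrative only, not corosync's implementation):
- every node has exactly one vote;
- quorum threshold is floor(expected_votes / 2) + 1 (default votequorum,
  no two_node option);
- a peer is reachable while at least one link to it is up (knet failover).
"""
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    # Hypothetical link name -> whether that link is currently up.
    links: dict[str, bool] = field(default_factory=dict)

    def reachable(self) -> bool:
        # Any single working link is enough to keep talking to the peer.
        return any(self.links.values())


def quorum_threshold(expected_votes: int) -> int:
    # Votes needed for quorum under the assumed default votequorum rule.
    return expected_votes // 2 + 1


def has_quorum(peers: list[Peer], expected_votes: int) -> bool:
    # The local node always counts its own vote, plus one per reachable peer.
    votes = 1 + sum(1 for p in peers if p.reachable())
    return votes >= quorum_threshold(expected_votes)


def scenario(switch_up: bool, direct_link: bool) -> bool:
    # Two-node cluster seen from pdpve1: link0 runs through the switch stack;
    # link1 is an optional direct cable between the hosts (hypothetical).
    links = {"link0_via_switch": switch_up}
    if direct_link:
        links["link1_direct"] = True
    return has_quorum([Peer("node2", links)], expected_votes=2)


if __name__ == "__main__":
    for direct in (False, True):
        for switch in (True, False):
            print(f"direct link={direct!s:5} switch up={switch!s:5} "
                  f"-> quorate={scenario(switch, direct)}")
```

In this model the node loses quorum in only one case: the switch stack is down and there is no direct link. That matches the symptom in the quoted logs. With a second ring on a direct cable, a switch stack reboot alone would no longer drop quorum, under the assumptions above.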