From: Marco Gaiarin <gaio@lilliput.linux.it>
To: pve-user@lists.proxmox.com
Subject: [PVE-User] Quorum bouncing, VM does not start...
Date: Tue, 19 Aug 2025 17:08:54 +0200
Message-ID: <4lgenl-oaf2.ln1@leia.lilliput.linux.it>
We have several pairs of servers in some local branch offices of our
organization. Each pair is clustered, but deliberately not set up for
failover (or 'automatic failover'); this is intended.
Most of these branch offices close for the summer holidays, which is
exactly when power outages flourish. ;-)
Rather frequently the whole site gets powered off: the UPSes do their
job, but sooner or later they shut down the servers (and all other
equipment), which stay off until a local employee goes to the site and
powers everything back up.
The servers are organized with two UPSes (one per server); the UPSes
also power a stack of two Catalyst 2960S switches (again, one UPS per
switch). Every server has a trunk/bond on each interface, with one cable
to switch1 and one cable to switch2 of the stack.
We recently upgraded to PVE 8 and found that, when the whole site gets
powered off, sometimes (but with a decent frequency) only some of the VMs
get powered back on.
Digging for the culprit, we found:
2025-08-07T10:49:19.997751+02:00 pdpve1 systemd[1]: Starting pve-guests.service - PVE guests...
2025-08-07T10:49:20.792333+02:00 pdpve1 pve-guests[2392]: <root@pam> starting task UPID:pdpve1:00000959:0000117F:68946890:startall::root@pam:
2025-08-07T10:49:20.794446+02:00 pdpve1 pvesh[2392]: waiting for quorum ...
2025-08-07T10:52:18.584607+02:00 pdpve1 pmxcfs[2021]: [status] notice: node has quorum
2025-08-07T10:52:18.879944+02:00 pdpve1 pvesh[2392]: got quorum
2025-08-07T10:52:18.891461+02:00 pdpve1 pve-guests[2393]: <root@pam> starting task UPID:pdpve1:00000B86:00005711:68946942:qmstart:100:root@pam:
2025-08-07T10:52:18.891653+02:00 pdpve1 pve-guests[2950]: start VM 100: UPID:pdpve1:00000B86:00005711:68946942:qmstart:100:root@pam:
2025-08-07T10:52:20.103473+02:00 pdpve1 pve-guests[2950]: VM 100 started with PID 2960.
So the servers restart, get quorum and start the VMs in order, but then
they suddenly lose quorum:
2025-08-07T10:53:16.128336+02:00 pdpve1 pmxcfs[2021]: [status] notice: node lost quorum
2025-08-07T10:53:20.901367+02:00 pdpve1 pve-guests[2393]: cluster not ready - no quorum?
2025-08-07T10:53:20.903743+02:00 pdpve1 pvesh[2392]: cluster not ready - no quorum?
2025-08-07T10:53:20.905349+02:00 pdpve1 pve-guests[2392]: <root@pam> end task UPID:pdpve1:00000959:0000117F:68946890:startall::root@pam: cluster not ready - no quorum?
2025-08-07T10:53:20.922275+02:00 pdpve1 systemd[1]: Finished pve-guests.service - PVE guests.
and the remaining VMs are not started, because the whole startall task
ends with that error. After some seconds quorum comes back and everything
returns to normal, but the remaining VMs have to be started by hand.
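To put numbers on the bounce, here is a small Python sketch that only
measures the gaps between the journal timestamps quoted above (the
constant names and the helper seconds_between are illustrative labels,
not anything from PVE). It shows that the node waited about 177.8 s for
quorum, held it for only about 57.5 s, and that the startall task gave
up about 4.8 s after the loss.

```python
from datetime import datetime

# Timestamps copied verbatim from the pdpve1 journal excerpts above.
# The constant names are illustrative labels only.
WAITING_FOR_QUORUM = "2025-08-07T10:49:20.794446+02:00"  # pvesh: waiting for quorum ...
GOT_QUORUM = "2025-08-07T10:52:18.584607+02:00"          # pmxcfs: node has quorum
LOST_QUORUM = "2025-08-07T10:53:16.128336+02:00"         # pmxcfs: node lost quorum
STARTALL_ABORTED = "2025-08-07T10:53:20.905349+02:00"    # pve-guests: end task ... no quorum?


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two ISO 8601 journal timestamps."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()


if __name__ == "__main__":
    print(f"waited for quorum: {seconds_between(WAITING_FOR_QUORUM, GOT_QUORUM):.1f} s")
    print(f"quorum held for:   {seconds_between(GOT_QUORUM, LOST_QUORUM):.1f} s")
    print(f"startall aborted:  {seconds_between(LOST_QUORUM, STARTALL_ABORTED):.1f} s after the loss")
```
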
Of course, if we reboot or power off the two servers while the switches
stay powered on, everything works as expected.
We managed to reproduce the problem by powering on the servers and
rebooting the switches at the same time: the trouble gets triggered.
So it seems that quorum gets lost, probably because the switches stop
forwarding traffic for a while as they do their own things (e.g. joining
the second unit to the stack and bringing up the Ethernet bonds); that
confuses the quorum, and bang.
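Why a short switch outage is enough: with the default simple-majority
rule (one vote per node, quorum = expected votes / 2 + 1, rounded down,
as in corosync votequorum), a two-node cluster needs both votes. A
minimal Python sketch of that arithmetic follows; the helper names
quorum_threshold and has_quorum are hypothetical, and it assumes no
special quorum options are configured.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple-majority rule.

    Assumes one vote per node and no special two-node handling:
    floor(expected_votes / 2) + 1.
    """
    return expected_votes // 2 + 1


def has_quorum(visible_votes: int, expected_votes: int) -> bool:
    """True if the votes a node can currently see reach the threshold."""
    return visible_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 2  # two-node branch cluster, one vote per node
    print("quorum threshold:", quorum_threshold(expected))
    print("both nodes see each other:", has_quorum(2, expected))
    print("switch stack down, node alone:", has_quorum(1, expected))
```

So as soon as the switch stack stops passing the cluster traffic, each
node sees only 1 of 2 votes and loses quorum until the link returns.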
We tried adding:
pvenode config set --startall-onboot-delay 120
on the two nodes, repeated the experiment (i.e. start the servers and
reboot the switches), and the trouble did not trigger.
Still, I'm asking for some feedback, particularly:
1) we were on PVE 6: has something changed in quorum handling from PVE 6
to PVE 8? Before upgrading we never hit this...
2) is there a better solution to this?
Thanks.
--
_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user