From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH docs 2/2] ha-manager: document disarming and arming
Date: Tue, 10 Mar 2026 16:47:30 +0100
Message-ID: <20260310155216.2086316-3-t.lamprecht@proxmox.com>
In-Reply-To: <20260310155216.2086316-1-t.lamprecht@proxmox.com>

Add a new section documenting the disarm-ha and arm-ha CRM commands and
how disarming interacts with offline nodes and maintenance mode. Also
describe the new disarming and disarmed fencing states.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
 ha-manager.adoc | 127 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)
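
For illustration only, the disarm sequence documented in the diff below can
be modelled as a small state machine. This is a toy Python sketch, not the
pve-ha-manager implementation: the class names `Lrm` and `Crm`, the `step()`
loop, and the `active` watchdog label are assumptions made for the example.
Only the `armed`/`disarming`/`disarmed` states, the `released` watchdog
status, and the `freeze`/`ignore` modes are taken from the patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lrm:
    """Hypothetical stand-in for one node's LRM."""
    name: str
    online: bool = True
    active_workers: int = 0
    watchdog: str = "active"  # "active" is an assumed label

    def step(self) -> None:
        # An offline LRM cannot process the disarm request.
        if not self.online:
            return
        # Finish one active worker per step, then release once idle.
        if self.active_workers > 0:
            self.active_workers -= 1
        if self.active_workers == 0:
            self.watchdog = "released"


@dataclass
class Crm:
    """Hypothetical stand-in for the CRM master."""
    lrms: List[Lrm]
    state: str = "armed"
    mode: Optional[str] = None

    def disarm(self, mode: str) -> None:
        if mode not in ("freeze", "ignore"):
            raise ValueError(f"unknown resource mode: {mode}")
        self.mode = mode
        self.state = "disarming"

    def step(self) -> None:
        if self.state != "disarming":
            return
        for lrm in self.lrms:
            lrm.step()
        # Only *online* LRMs have to acknowledge; offline nodes do not block.
        if all(l.watchdog == "released" for l in self.lrms if l.online):
            self.state = "disarmed"

    def arm(self) -> None:
        # The CRM keeps the manager lock, so it can still process arm-ha.
        self.mode = None
        self.state = "armed"
```

In this model a node with two running workers keeps the cluster in
`disarming` for one extra step, while an offline node never delays the
transition to `disarmed`.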

diff --git a/ha-manager.adoc b/ha-manager.adoc
index ee254be..5547f7c 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -1024,6 +1024,19 @@ when no HA resources are configured yet or the cluster just started. The CRM
 watchdog is not open. Fencing automatically transitions to `armed` once a CRM
 takes over as master.
 
+disarming::
+
+A `disarm-ha` command was issued. The CRM is freezing or removing services
+from tracking and waiting for all LRMs to release their watchdogs. The CRM
+watchdog is still active during this phase. Each LRM entry's watchdog status
+changes to `released` as it acknowledges the disarm.
+
+disarmed::
+
+All watchdogs have been released cluster-wide. No automatic fencing,
+failover, or recovery takes place. See
+xref:ha_manager_disarm[Disarming HA for Cluster Maintenance].
+
 NOTE: The `watchdog-mux` service keeps the underlying `/dev/watchdog` device
 open for its entire lifetime, even when no HA client is connected. This
 prevents other processes from claiming the device and ensures the HA stack can
@@ -1281,6 +1294,120 @@ NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or
 immediate node reboot or even reset.
 
 
+[[ha_manager_disarm]]
+Disarming HA for Cluster Maintenance
+-------------------------------------
+
+Certain cluster maintenance tasks, such as reconfiguring the network or the
+cluster communication stack (corosync), can cause temporary quorum loss or
+network partitions. Normally, HA would interpret this as a node failure and
+trigger self-fencing, disrupting services unnecessarily.
+
+The disarm mechanism releases all CRM and LRM watchdogs cluster-wide, allowing
+you to perform such maintenance safely without the risk of nodes being fenced.
+
+IMPORTANT: While disarmed, HA does not protect your services. Failures during
+this period are not automatically recovered. Keep the disarm window as short
+as possible.
+
+.Resource Modes
+
+When disarming HA, you must choose a resource mode that controls how
+HA-managed resources are handled while disarmed. The current state of
+resources is not affected.
+
+freeze::
+
+New commands and state changes are not applied. Services stay in their current
+state, but the HA stack does not react to failures or process new requests.
+This is the safest choice when you expect all nodes to remain running.
+
+ignore::
+
+Resources are removed from HA tracking and can be managed as if they were not
+HA managed. This allows you to manually start, stop, or migrate services
+while HA is disarmed. Use this when you need to manually relocate services
+during maintenance.
+
+.Disarming and Re-Arming
+
+To disarm HA with the desired resource mode:
+
+----
+# ha-manager crm-command disarm-ha freeze
+----
+
+or:
+
+----
+# ha-manager crm-command disarm-ha ignore
+----
+
+To re-arm HA after maintenance is complete:
+
+----
+# ha-manager crm-command arm-ha
+----
+
+You can monitor the current state with:
+
+----
+# ha-manager status
+----
+
+The fencing status line shows the current state of the fencing mechanism (see
+xref:ha_manager_fencing_status[Fencing Status]), including the CRM and LRM
+watchdog states.
+
+.The Disarm Process
+
+After you request disarm, the following sequence happens:
+
+. The CRM freezes all services or removes them from tracking, depending on
+  the chosen resource mode.
+. Each LRM finishes its active workers, then releases its agent lock and
+  watchdog.
+. Once all online LRMs are idle, the CRM releases its own watchdog too.
+
+The CRM keeps the manager lock throughout this process, so it can accept and
+process the `arm-ha` command to reverse it.
+
+If any services are currently being fenced or recovered, the disarm is
+deferred until fencing completes. This ensures that partially fenced services
+do not end up in an inconsistent state.
+
+.Nodes Offline During Disarm
+
+If a node is offline when HA is disarmed, its LRM cannot process the disarm
+request. The CRM proceeds to the disarmed state once all *online* LRMs have
+completed their part. The offline node does not block this.
+
+When the offline node comes back online while HA is still disarmed, its LRM
+picks up the disarm state and releases its watchdog without attempting any
+service recovery.
+
+When you re-arm HA, any services that were on the offline node are handled
+according to normal HA recovery rules: they are fenced and recovered if the
+node is still unreachable, or restarted on the node if it has come back
+online.
+
+.Interaction with Maintenance Mode
+
+If a node is already in maintenance mode when disarm is requested, the
+maintenance migration continues until all services have been moved away. Once
+no active services and workers remain, the LRM releases its lock and watchdog
+as part of the disarm process.
+
+When HA is re-armed, the maintenance mode state is preserved. The node remains
+in maintenance and services are not moved back until maintenance mode is
+explicitly disabled.
+
+CAUTION: While the HA stack is disarmed, no automatic recovery, failover, or
+fencing takes place. A node failure during this window is not detected or
+handled by HA. Keep the disarm window as short as possible and ensure that the
+cluster is in a healthy state before re-arming.
+
+
 [[ha_manager_crs]]
 Cluster Resource Scheduling
 ---------------------------
-- 
2.47.3