From mboxrd@z Thu Jan 1 00:00:00 1970
From: Aaron Lauterer
To: pve-devel@lists.proxmox.com
Date: Tue, 24 Jun 2025 10:53:06 +0200
Message-Id: <20250624085306.3224755-1-a.lauterer@proxmox.com>
X-Mailer: git-send-email 2.39.5
MIME-Version: 1.0
Subject: [pve-devel] [PATCH docs] pveceph: update cluster shutdown procedure
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion
Reply-To: Proxmox VE development discussion
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel"

It seems that the shutdown procedure which we had documented (and many
others as well) can potentially cause problems when the cluster starts
up again. There is a blog article from croit explaining it in detail [0]
if you are interested.

The shorter explanation is that after unsetting 'pause', it might take
half an hour to get out of that state. Unsetting 'nodown' could also
take a long time. During all this, OSDs and MONs might consume a lot of
CPU resources, increasing the CPU load on the host considerably.

'nobackfill' and 'norecover' are not needed, but also not harmful.

Archeology revealed that the procedure with all the other flags (which
this commit removes) originated in the Red Hat documentation, but was
never part of Ceph's shutdown procedure, which is tested by the Ceph
team.

[0] https://web.archive.org/web/20250624082830/https://www.croit.io/blog/how-not-to-shut-down-a-ceph-cluster

Signed-off-by: Aaron Lauterer
---
 pveceph.adoc | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 79aa045..a049612 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -1133,34 +1133,25 @@ or the CLI:
 ceph -s
 ----

-To disable all self-healing actions, and to pause any client IO in the Ceph
-cluster, enable the following OSD flags in the **Ceph -> OSD** panel or via the
-CLI:
+In order to not cause any recovery during the shut down and later power on
+phases, enable the 'noout' OSD flag. Either in the **Ceph -> OSD** panel behind
+the **Manage Global Flags** button or the CLI:

 [source,bash]
 ----
 ceph osd set noout
-ceph osd set norecover
-ceph osd set norebalance
-ceph osd set nobackfill
-ceph osd set nodown
-ceph osd set pause
 ----

 Start powering down your nodes without a monitor (MON). After these nodes are
 down, continue by shutting down nodes with monitors on them.

 When powering on the cluster, start the nodes with monitors (MONs) first. Once
-all nodes are up and running, confirm that all Ceph services are up and running
-before you unset the OSD flags again:
+all nodes are up and running, confirm that all Ceph services are up and running.
+In the end, the only warning you should see for Ceph is that the 'noout' flag
+is still set.
You can disable it via the web UI or via the CLI:

 [source,bash]
 ----
-ceph osd unset pause
-ceph osd unset nodown
-ceph osd unset nobackfill
-ceph osd unset norebalance
-ceph osd unset norecover
 ceph osd unset noout
 ----

-- 
2.39.5

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel