public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH docs] pveceph: update cluster shutdown procedure
@ 2025-06-24  8:53 Aaron Lauterer
From: Aaron Lauterer @ 2025-06-24  8:53 UTC
  To: pve-devel

It seems that the shutdown procedure which we had documented (and many
others as well) can potentially cause problems when the cluster starts
up again.

There is a blog article from croit explaining it in detail [0] if you
are interested.

The shorter explanation is that when unsetting 'pause' it might take
half an hour to get out of that state. Unsetting 'nodown' could also take
a long time. During all this, OSDs and MONs might consume a lot of CPU
resources, increasing the CPU load on the host considerably.

nobackfill and norecover are not needed, but also not harmful.

Archeology revealed that the procedure with all the other flags (which
this commit removes) originated in the Red Hat documentation, but was
never part of Ceph's own shutdown procedure, which is the one tested by
the Ceph team.

[0] https://web.archive.org/web/20250624082830/https://www.croit.io/blog/how-not-to-shut-down-a-ceph-cluster

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
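Side note, not part of the commit: clusters that were shut down with the
old procedure might still have some of the removed flags set. Below is a
minimal sketch of a check for that. It assumes that `ceph osd dump` prints
a line of the form "flags a,b,c" and that 'pause' shows up there as
pauserd/pausewr. The function name leftover_flags is made up for this
example.

```shell
#!/usr/bin/env bash
# Sketch only: report flags from the old shutdown procedure that are still set.
# Assumption: `ceph osd dump` contains a line like "flags noout,sortbitwise,...".
# leftover_flags is a hypothetical helper, not part of any Ceph or PVE tooling.
leftover_flags() {
    # Reads `ceph osd dump` output on stdin and prints one leftover flag per line.
    # 'noout' is intentionally not reported, as the new procedure still uses it.
    awk '$1 == "flags" {
        n = split($2, f, ",")
        for (i = 1; i <= n; i++)
            if (f[i] ~ /^(norecover|norebalance|nobackfill|nodown|pause|pauserd|pausewr)$/)
                print f[i]
    }'
}
```

On a cluster node this could be used as `ceph osd dump | leftover_flags`.
Empty output would mean none of the removed flags are set.
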
 pveceph.adoc | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 79aa045..a049612 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -1133,34 +1133,25 @@ or the CLI:
 ceph -s
 ----
 
-To disable all self-healing actions, and to pause any client IO in the Ceph
-cluster, enable the following OSD flags in the **Ceph -> OSD** panel or via the
-CLI:
+To avoid triggering any recovery during the shutdown and later power-on
+phases, enable the 'noout' OSD flag, either in the **Ceph -> OSD** panel behind
+the **Manage Global Flags** button or via the CLI:
 
 [source,bash]
 ----
 ceph osd set noout
-ceph osd set norecover
-ceph osd set norebalance
-ceph osd set nobackfill
-ceph osd set nodown
-ceph osd set pause
 ----
 
 Start powering down your nodes without a monitor (MON). After these nodes are
 down, continue by shutting down nodes with monitors on them.
 
 When powering on the cluster, start the nodes with monitors (MONs) first. Once
-all nodes are up and running, confirm that all Ceph services are up and running
-before you unset the OSD flags again:
+all nodes are up and running, confirm that all Ceph services are up and running.
+In the end, the only warning you should see for Ceph is that the 'noout' flag
+is still set. You can disable it via the web UI or via the CLI:
 
 [source,bash]
 ----
-ceph osd unset pause
-ceph osd unset nodown
-ceph osd unset nobackfill
-ceph osd unset norebalance
-ceph osd unset norecover
 ceph osd unset noout
 ----
 
-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* [pve-devel] applied: [PATCH docs] pveceph: update cluster shutdown procedure
  2025-06-24  8:53 [pve-devel] [PATCH docs] pveceph: update cluster shutdown procedure Aaron Lauterer
@ 2025-07-16 23:02 ` Thomas Lamprecht
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Lamprecht @ 2025-07-16 23:02 UTC
  To: pve-devel, Aaron Lauterer

On Tue, 24 Jun 2025 10:53:06 +0200, Aaron Lauterer wrote:
> It seems that the shutdown procedure which we had documented (and many
> others as well) can potentially cause problems when the cluster starts
> up again.
> 
> There is a blog article from croit explaining it in detail [0] if you
> are interested.
> 
> [...]

Applied, thanks!

[1/1] pveceph: update cluster shutdown procedure
      commit: b9d31c25944c2ecf4983ebce01b325c817053fe1

