public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH docs v2] pveceph: document cluster shutdown
@ 2024-05-22  8:33 Aaron Lauterer
  2024-05-23 12:23 ` Alexander Zeidler
  2024-06-17  9:02 ` Aaron Lauterer
  0 siblings, 2 replies; 4+ messages in thread
From: Aaron Lauterer @ 2024-05-22  8:33 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
changes since v1:
* incorporated suggested changes in phrasing to fix grammar and
  distinguish the steps on how to power down the nodes better

 pveceph.adoc | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/pveceph.adoc b/pveceph.adoc
index 089ac80..04bf462 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -1080,6 +1080,56 @@ scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-re
 are executed.
 
 
+[[pveceph_shutdown]]
+Shutdown {pve} + Ceph HCI cluster
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To shut down the whole {pve} + Ceph cluster, first stop all Ceph clients. This
+will mainly be VMs and containers. If you have additional clients that might
+access a Ceph FS or an installed RADOS GW, stop these as well.
+Highly available guests will switch their state to 'stopped' when powered down
+via the {pve} tooling.
+
+Once all clients, VMs and containers are off or not accessing the Ceph cluster
+anymore, verify that the Ceph cluster is in a healthy state. Either via the Web UI
+or the CLI:
+
+----
+ceph -s
+----
+
+Then enable the following OSD flags in the Ceph -> OSD panel or the CLI:
+
+----
+ceph osd set noout
+ceph osd set norecover
+ceph osd set norebalance
+ceph osd set nobackfill
+ceph osd set nodown
+ceph osd set pause
+----
+
+This will halt all self-healing actions for Ceph and the 'pause' will stop any client IO.
+
+Start powering down your nodes without a monitor (MON). After these nodes are
+down, continue shutting down hosts with monitors on them.
+
+When powering on the cluster, start the nodes with Monitors (MONs) first. Once
+all nodes are up and running, confirm that all Ceph services are up and running
+before you unset the OSD flags:
+
+----
+ceph osd unset noout
+ceph osd unset norecover
+ceph osd unset norebalance
+ceph osd unset nobackfill
+ceph osd unset nodown
+ceph osd unset pause
+----
+
+You can now start up the guests. Highly available guests will change their state
+to 'started' when they power on.
+
 Ceph Monitoring and Troubleshooting
 -----------------------------------
 
-- 
2.39.2
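The prepare-for-shutdown sequence the patch documents can be sketched as a small script. This is only an illustration, not part of the patch: the commands are `echo`ed so the sequence is visible without a live Ceph cluster; on a real node you would run the `ceph` commands directly.

```shell
#!/bin/sh
# Illustrative sketch of the shutdown preparation steps from the patch.
# Commands are echoed instead of executed so this runs without a live
# Ceph cluster; drop the 'echo' prefix on a real node.
set -eu

# Verify the cluster is healthy before proceeding.
echo "ceph -s"

# Set the OSD flags in the documented order; 'pause' comes last,
# as it stops all client IO.
for flag in noout norecover norebalance nobackfill nodown pause; do
    echo "ceph osd set $flag"
done
```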



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* Re: [pve-devel] [PATCH docs v2] pveceph: document cluster shutdown
  2024-05-22  8:33 [pve-devel] [PATCH docs v2] pveceph: document cluster shutdown Aaron Lauterer
@ 2024-05-23 12:23 ` Alexander Zeidler
  2024-05-28 11:53   ` Aaron Lauterer
  2024-06-17  9:02 ` Aaron Lauterer
  1 sibling, 1 reply; 4+ messages in thread
From: Alexander Zeidler @ 2024-05-23 12:23 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Wed, 2024-05-22 at 10:33 +0200, Aaron Lauterer wrote:
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
> changes since v1:
> * incorporated suggested changes in phrasing to fix grammar and
>   distinguish the steps on how to power down the nodes better
> 
>  pveceph.adoc | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/pveceph.adoc b/pveceph.adoc
> index 089ac80..04bf462 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -1080,6 +1080,56 @@ scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-re
>  are executed.
>  
>  
> +[[pveceph_shutdown]]
> +Shutdown {pve} + Ceph HCI cluster
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +To shut down the whole {pve} + Ceph cluster, first stop all Ceph clients. This
Rather s/This/These/ ?

> +will mainly be VMs and containers. If you have additional clients that might
> +access a Ceph FS or an installed RADOS GW, stop these as well.
> +Highly available guests will switch their state to 'stopped' when powered down
> +via the {pve} tooling.
> +
> +Once all clients, VMs and containers are off or not accessing the Ceph cluster
> +anymore, verify that the Ceph cluster is in a healthy state. Either via the Web UI
> +or the CLI:
> +
> +----
> +ceph -s
> +----
> +
> +Then enable the following OSD flags in the Ceph -> OSD panel or the CLI:
For style consistency: **Ceph -> OSD panel**

Maybe: s/or the CLI/or via CLI/

> +
> +----
> +ceph osd set noout
> +ceph osd set norecover
> +ceph osd set norebalance
> +ceph osd set nobackfill
> +ceph osd set nodown
> +ceph osd set pause
Maybe sort alphabetically as in the UI.

> +----
> +
> +This will halt all self-healing actions for Ceph and the 'pause' will stop any client IO.
Perhaps state the goal/result beforehand, e.g.:
Then enable the following OSD flags in the **Ceph -> OSD panel** or via CLI,
which halt all self-healing actions for Ceph and 'pause' any client IO:

> +
> +Start powering down your nodes without a monitor (MON). After these nodes are
> +down, continue shutting down hosts with monitors on them.
Since the continuation is not meant/true for "hosts with monitors":
s/continue/continue by/

Maybe: s/hosts/nodes/

> +
> +When powering on the cluster, start the nodes with Monitors (MONs) first. Once
s/Monitors/monitors/

> +all nodes are up and running, confirm that all Ceph services are up and running
> +before you unset the OSD flags:
Maybe stay with either enable/disable or set/unset.

s/flags:/flags again:/

> +
> +----
> +ceph osd unset noout
> +ceph osd unset norecover
> +ceph osd unset norebalance
> +ceph osd unset nobackfill
> +ceph osd unset nodown
> +ceph osd unset pause
Above mentioned sorting.

> +----
> +
> +You can now start up the guests. Highly available guests will change their state
> +to 'started' when they power on.
> +
>  Ceph Monitoring and Troubleshooting
>  -----------------------------------
>  




* Re: [pve-devel] [PATCH docs v2] pveceph: document cluster shutdown
  2024-05-23 12:23 ` Alexander Zeidler
@ 2024-05-28 11:53   ` Aaron Lauterer
  0 siblings, 0 replies; 4+ messages in thread
From: Aaron Lauterer @ 2024-05-28 11:53 UTC (permalink / raw)
  To: Proxmox VE development discussion, Alexander Zeidler

Thanks for the review. One comment inline.

On  2024-05-23  14:23, Alexander Zeidler wrote:
> On Wed, 2024-05-22 at 10:33 +0200, Aaron Lauterer wrote:
>> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
>> ---
>> changes since v1:
>> * incorporated suggested changes in phrasing to fix grammar and
>>    distinguish the steps on how to power down the nodes better
>>
>>   pveceph.adoc | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 50 insertions(+)
>>
>> diff --git a/pveceph.adoc b/pveceph.adoc
>> index 089ac80..04bf462 100644
>> --- a/pveceph.adoc
>> +++ b/pveceph.adoc
>> @@ -1080,6 +1080,56 @@ scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-re
>>   are executed.
>>   
>>   
>> +[[pveceph_shutdown]]
>> +Shutdown {pve} + Ceph HCI cluster
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +To shut down the whole {pve} + Ceph cluster, first stop all Ceph clients. This
> Rather s/This/These/ ?
> 
>> +will mainly be VMs and containers. If you have additional clients that might
>> +access a Ceph FS or an installed RADOS GW, stop these as well.
>> +Highly available guests will switch their state to 'stopped' when powered down
>> +via the {pve} tooling.
>> +
>> +Once all clients, VMs and containers are off or not accessing the Ceph cluster
>> +anymore, verify that the Ceph cluster is in a healthy state. Either via the Web UI
>> +or the CLI:
>> +
>> +----
>> +ceph -s
>> +----
>> +
>> +Then enable the following OSD flags in the Ceph -> OSD panel or the CLI:
> For style consistency: **Ceph -> OSD panel**
> 
> Maybe: s/or the CLI/or via CLI/
> 
>> +
>> +----
>> +ceph osd set noout
>> +ceph osd set norecover
>> +ceph osd set norebalance
>> +ceph osd set nobackfill
>> +ceph osd set nodown
>> +ceph osd set pause
> Maybe sort alphabetically as in the UI.

I don't think this is a good idea. The order roughly goes from "should 
be set" to "would be good if set". While the order would not be affected 
in this case, as 'p' comes after 'n', it is still an example of why a 
purely alphabetical order can be problematic: 'pause' should be set 
last, as it halts any IO in the cluster.

With that in mind, I realized that the sorting in the unset part should 
be reversed.
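As a hypothetical sketch (not part of the patch), the reversed unset order would look like this; commands are echoed rather than executed so it runs without a cluster:

```shell
# Sketch of unsetting the OSD flags in the reverse order of setting:
# 'pause' is lifted first, 'noout' last. Drop 'echo' on a real node.
for flag in pause nodown nobackfill norebalance norecover noout; do
    echo "ceph osd unset $flag"
done
```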

> 
>> +----
>> +
>> +This will halt all self-healing actions for Ceph and the 'pause' will stop any client IO.
> Perhaps state the goal/result beforehand, e.g.:
> Then enable the following OSD flags in the **Ceph -> OSD panel** or via CLI,
> which halt all self-healing actions for Ceph and 'pause' any client IO:
> 
>> +
>> +Start powering down your nodes without a monitor (MON). After these nodes are
>> +down, continue shutting down hosts with monitors on them.
> Since the continuation is not meant/true for "hosts with monitors":
> s/continue/continue by/
> 
> Maybe: s/hosts/nodes/
> 
>> +
>> +When powering on the cluster, start the nodes with Monitors (MONs) first. Once
> s/Monitors/monitors/
> 
>> +all nodes are up and running, confirm that all Ceph services are up and running
>> +before you unset the OSD flags:
> Maybe stay with either enable/disable or set/unset.
> 
> s/flags:/flags again:/
> 
>> +
>> +----
>> +ceph osd unset noout
>> +ceph osd unset norecover
>> +ceph osd unset norebalance
>> +ceph osd unset nobackfill
>> +ceph osd unset nodown
>> +ceph osd unset pause
> Above mentioned sorting.
> 
>> +----
>> +
>> +You can now start up the guests. Highly available guests will change their state
>> +to 'started' when they power on.
>> +
>>   Ceph Monitoring and Troubleshooting
>>   -----------------------------------
>>   
> 



* Re: [pve-devel] [PATCH docs v2] pveceph: document cluster shutdown
  2024-05-22  8:33 [pve-devel] [PATCH docs v2] pveceph: document cluster shutdown Aaron Lauterer
  2024-05-23 12:23 ` Alexander Zeidler
@ 2024-06-17  9:02 ` Aaron Lauterer
  1 sibling, 0 replies; 4+ messages in thread
From: Aaron Lauterer @ 2024-06-17  9:02 UTC (permalink / raw)
  To: pve-devel

A new v3 is available:
https://lists.proxmox.com/pipermail/pve-devel/2024-May/064009.html

On  2024-05-22  10:33, Aaron Lauterer wrote:
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
> changes since v1:
> * incorporated suggested changes in phrasing to fix grammar and
>    distinguish the steps on how to power down the nodes better
> 
>   pveceph.adoc | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 50 insertions(+)
> 
> diff --git a/pveceph.adoc b/pveceph.adoc
> index 089ac80..04bf462 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -1080,6 +1080,56 @@ scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-re
>   are executed.
>   
>   
> +[[pveceph_shutdown]]
> +Shutdown {pve} + Ceph HCI cluster
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +To shut down the whole {pve} + Ceph cluster, first stop all Ceph clients. This
> +will mainly be VMs and containers. If you have additional clients that might
> +access a Ceph FS or an installed RADOS GW, stop these as well.
> +Highly available guests will switch their state to 'stopped' when powered down
> +via the {pve} tooling.
> +
> +Once all clients, VMs and containers are off or not accessing the Ceph cluster
> +anymore, verify that the Ceph cluster is in a healthy state. Either via the Web UI
> +or the CLI:
> +
> +----
> +ceph -s
> +----
> +
> +Then enable the following OSD flags in the Ceph -> OSD panel or the CLI:
> +
> +----
> +ceph osd set noout
> +ceph osd set norecover
> +ceph osd set norebalance
> +ceph osd set nobackfill
> +ceph osd set nodown
> +ceph osd set pause
> +----
> +
> +This will halt all self-healing actions for Ceph and the 'pause' will stop any client IO.
> +
> +Start powering down your nodes without a monitor (MON). After these nodes are
> +down, continue shutting down hosts with monitors on them.
> +
> +When powering on the cluster, start the nodes with Monitors (MONs) first. Once
> +all nodes are up and running, confirm that all Ceph services are up and running
> +before you unset the OSD flags:
> +
> +----
> +ceph osd unset noout
> +ceph osd unset norecover
> +ceph osd unset norebalance
> +ceph osd unset nobackfill
> +ceph osd unset nodown
> +ceph osd unset pause
> +----
> +
> +You can now start up the guests. Highly available guests will change their state
> +to 'started' when they power on.
> +
>   Ceph Monitoring and Troubleshooting
>   -----------------------------------
>   


