From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <8088a0e4-2e77-48a3-831e-7ac37d11a73f@proxmox.com>
Date: Tue, 28 May 2024 13:53:44 +0200
MIME-Version: 1.0
Content-Language: en-US
To: Proxmox VE development discussion, Alexander Zeidler
References: <20240522083319.62350-1-a.lauterer@proxmox.com> <0b22bf5afe61914412bbdc5645e3142c092e3859.camel@proxmox.com>
From: Aaron Lauterer
In-Reply-To: <0b22bf5afe61914412bbdc5645e3142c092e3859.camel@proxmox.com>
Subject: Re: [pve-devel] [PATCH docs v2] pveceph: document cluster shutdown

thanks for the review. one comment inline

On 2024-05-23 14:23, Alexander Zeidler wrote:
> On Wed, 2024-05-22 at 10:33 +0200, Aaron Lauterer wrote:
>> Signed-off-by: Aaron Lauterer
>> ---
>> changes since v1:
>> * incorporated suggested changes in phrasing to fix grammar and
>> distinguish the steps on how to power down the nodes better
>>
>> pveceph.adoc | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 50 insertions(+)
>>
>> diff --git a/pveceph.adoc b/pveceph.adoc
>> index 089ac80..04bf462 100644
>> --- a/pveceph.adoc
>> +++ b/pveceph.adoc
>> @@ -1080,6 +1080,56 @@ scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-re
>> are executed.
>>
>>
>> +[[pveceph_shutdown]]
>> +Shutdown {pve} + Ceph HCI cluster
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +To shut down the whole {pve} + Ceph cluster, first stop all Ceph clients. This
> Rather s/This/These/ ?
>
>> +will mainly be VMs and containers. If you have additional clients that might
>> +access a Ceph FS or an installed RADOS GW, stop these as well.
>> +Highly available guests will switch their state to 'stopped' when powered down
>> +via the {pve} tooling.
>> +
>> +Once all clients, VMs and containers are off or not accessing the Ceph cluster
>> +anymore, verify that the Ceph cluster is in a healthy state. Either via the Web UI
>> +or the CLI:
>> +
>> +----
>> +ceph -s
>> +----
>> +
>> +Then enable the following OSD flags in the Ceph -> OSD panel or the CLI:
> For style consistency: **Ceph -> OSD panel**
>
> Maybe: s/or the CLI/or via CLI/
>
>> +
>> +----
>> +ceph osd set noout
>> +ceph osd set norecover
>> +ceph osd set norebalance
>> +ceph osd set nobackfill
>> +ceph osd set nodown
>> +ceph osd set pause
> Maybe sort alphabetically as in the UI.

I don't think this is a good idea. The order goes roughly from "should be
set" to "would be good if set". While the order would not change in this
case, since p comes after n, it is still an example of why a purely
alphabetical order can be problematic by default: 'pause' should be set
last, as it halts any IO in the cluster.

With that in mind, I realized that the sorting in the unset part should be
reversed.

>
>> +----
>> +
>> +This will halt all self-healing actions for Ceph and the 'pause' will stop any client IO.
> Perhaps state the goal/result beforehand, e.g.:
> Then enable the following OSD flags in the **Ceph -> OSD panel** or via CLI,
> which halt all self-healing actions for Ceph and 'pause' any client IO:
>
>> +
>> +Start powering down your nodes without a monitor (MON). After these nodes are
>> +down, continue shutting down hosts with monitors on them.
> Since the continuation is not meant/true for "hosts with monitors":
> s/continue/continue by/
>
> Maybe: s/hosts/nodes/
>
>> +
>> +When powering on the cluster, start the nodes with Monitors (MONs) first. Once
> s/Monitors/monitors/
>
>> +all nodes are up and running, confirm that all Ceph services are up and running
>> +before you unset the OSD flags:
> Maybe stay with either enable/disable or set/unset.
>
> s/flags:/flags again:/
>
>> +
>> +----
>> +ceph osd unset noout
>> +ceph osd unset norecover
>> +ceph osd unset norebalance
>> +ceph osd unset nobackfill
>> +ceph osd unset nodown
>> +ceph osd unset pause
> Above mentioned sorting.
>
>> +----
>> +
>> +You can now start up the guests. Highly available guests will change their state
>> +to 'started' when they power on.
>> +
>> Ceph Monitoring and Troubleshooting
>> -----------------------------------
>>

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
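P.S.: the ordering argument above (set the flags from "should be set" to
"would be good if set", unset them in exact reverse so 'pause' is lifted
first) can be sketched in a few lines. This is purely illustrative and not
part of the patch; the helper names are made up for the example:

```python
# Illustrative sketch (not part of the patch): derive the unset order by
# reversing the set order, so 'pause' -- which halts all client IO -- is
# the last flag set on shutdown and the first one lifted on startup.
FLAGS = ["noout", "norecover", "norebalance", "nobackfill", "nodown", "pause"]

def shutdown_commands():
    # set order: roughly from "should be set" to "would be good if set"
    return [f"ceph osd set {flag}" for flag in FLAGS]

def startup_commands():
    # unset in reverse, per the review discussion above
    return [f"ceph osd unset {flag}" for flag in reversed(FLAGS)]

for cmd in shutdown_commands() + startup_commands():
    print(cmd)
```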