From: Alexander Zeidler <a.zeidler@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information
Date: Mon,  3 Feb 2025 15:27:58 +0100
Message-ID: <20250203142801.3-3-a.zeidler@proxmox.com>
In-Reply-To: <20250203142801.3-1-a.zeidler@proxmox.com>

Existing information is slightly modified and retained.

Add information:
* List which logs are usually helpful for troubleshooting
* Explain how to acknowledge listed Ceph crashes and view details
* List common causes of Ceph problems and link to recommendations for a
  healthy cluster
* Briefly describe the common problem "OSDs down/crashed"

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 pveceph.adoc | 72 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 8 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 90bb975..4e1c1e2 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -1150,22 +1150,78 @@ The following Ceph commands can be used to see if the cluster is healthy
 ('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
 ('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
 below will also give you an overview of the current events and actions to take.
+To stop their execution, press CTRL-C.
 
 ----
-# single time output
-pve# ceph -s
-# continuously output status changes (press CTRL+C to stop)
-pve# ceph -w
+# Continuously watch the cluster status
+pve# watch ceph --status
+
+# Print the cluster status once (not being updated)
+# and continuously append lines of status events
+pve# ceph --watch
 ----
 
+[[pve_ceph_ts]]
+Troubleshooting
+~~~~~~~~~~~~~~~
+
+This section includes frequently used troubleshooting information.
+More information can be found on the official Ceph website under
+Troubleshooting
+footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/].
+
+[[pve_ceph_ts_logs]]
+.Relevant Logs on Affected Node
+
+* xref:_disk_health_monitoring[Disk Health Monitoring]
+* __System -> System Log__ (or, for example,
+  `journalctl --since "2 days ago"`)
+* IPMI and RAID controller logs
+
+Ceph service crashes can be listed and viewed in detail by running
+`ceph crash ls` and `ceph crash info <crash_id>`. Crashes marked as
+new can be acknowledged by running, for example,
+`ceph crash archive-all`.
+
 To get a more detailed view, every Ceph service has a log file under
 `/var/log/ceph/`. If more detail is required, the log level can be
 adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/].
 
-You can find more information about troubleshooting
-footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]
-a Ceph cluster on the official website.
-
+[[pve_ceph_ts_causes]]
+.Common Causes of Ceph Problems
+
+* Network problems such as congestion, a faulty switch, a shut-down
+interface or a blocking firewall. Check whether all {pve} nodes are
+reliably reachable on the xref:_cluster_network[corosync] network and
+on the xref:pve_ceph_install_wizard[configured] Ceph public and
+cluster network.
+
+* Disks or their connecting parts that are:
+** defective
+** not firmly mounted
+** lacking I/O performance under higher load (e.g. when using HDDs,
+consumer hardware or xref:pve_ceph_recommendation_raid[inadvisable]
+RAID controllers)
+
+* Not fulfilling the xref:pve_ceph_recommendation[recommendations] for
+a healthy Ceph cluster.
+
+[[pve_ceph_ts_problems]]
+.Common Ceph Problems
+ ::
+
+OSDs `down`/crashed:::
+A faulty OSD is reported as `down` and, in most cases, automatically
+marked `out` 10 minutes later. Depending on the cause, it can also
+become `up` and `in` again on its own. To try a manual activation via
+the web interface, go to __Any node -> Ceph -> OSD__, select the OSD
+and click on **Start**, **In** and **Reload**. When using the shell,
+run `ceph-volume lvm activate --all` on the affected node.
++
+To activate a failed OSD, it may be necessary to
+xref:ha_manager_node_maintenance[safely] reboot the respective node
+or, as a last resort, to
+xref:pve_ceph_osd_replace[recreate or replace] the OSD.
 
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
-- 
2.39.5
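
Note: as a rough illustration of the log and crash checks documented in
the patch above, the read-only commands could be bundled as sketched
below. The function names `run` and `ceph_triage` and the `DRY_RUN`
variable are hypothetical and not part of the patch or of Proxmox VE;
the commands themselves are taken from the diff. By default the sketch
only prints what it would run.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: collect the first-look
# troubleshooting commands mentioned in the documentation change.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 runs it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command instead of executing it.
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

ceph_triage() {
    # Overall cluster health (HEALTH_OK / HEALTH_WARN / HEALTH_ERR)
    run ceph --status
    # List recorded Ceph service crashes
    run ceph crash ls
    # Recent system log entries on the affected node
    run journalctl --since "2 days ago"
    # Per-service Ceph log files
    run ls -l /var/log/ceph/
}
```

Calling `ceph_triage` prints the four commands, while
`DRY_RUN=0 ceph_triage` would execute them on the affected node. The
dry-run output does not preserve the quotes around `2 days ago`.
Acknowledging crashes (`ceph crash archive-all`) and activating OSDs
(`ceph-volume lvm activate --all`) are deliberately left out, since
they change cluster state.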
