public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section
@ 2025-02-03 14:27 Alexander Zeidler
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 2/6] ceph: correct heading capitalization Alexander Zeidler
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-03 14:27 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 pveceph.adoc | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/pveceph.adoc b/pveceph.adoc
index da39e7f..93c2f8d 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -82,6 +82,7 @@ and vocabulary
 footnote:[Ceph glossary {cephdocs-url}/glossary].
 
 
+[[pve_ceph_recommendation]]
 Recommendations for a Healthy Ceph Cluster
 ------------------------------------------
 
@@ -95,6 +96,7 @@ NOTE: The recommendations below should be seen as a rough guidance for choosing
 hardware. Therefore, it is still essential to adapt it to your specific needs.
 You should test your setup and monitor health and performance continuously.
 
+[[pve_ceph_recommendation_cpu]]
 .CPU
 Ceph services can be classified into two categories:
 
@@ -122,6 +124,7 @@ IOPS load over 100'000 with sub millisecond latency, each OSD can use multiple
 CPU threads, e.g., four to six CPU threads utilized per NVMe backed OSD is
 likely for very high performance disks.
 
+[[pve_ceph_recommendation_memory]]
 .Memory
 Especially in a hyper-converged setup, the memory consumption needs to be
 carefully planned out and monitored. In addition to the predicted memory usage
@@ -137,6 +140,7 @@ normal operation, but rather leave some headroom to cope with outages.
 The OSD service itself will use additional memory. The Ceph BlueStore backend of
 the daemon requires by default **3-5 GiB of memory** (adjustable).
 
+[[pve_ceph_recommendation_network]]
 .Network
 We recommend a network bandwidth of at least 10 Gbps, or more, to be used
 exclusively for Ceph traffic. A meshed network setup
@@ -172,6 +176,7 @@ high-performance setups:
 * one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync
   cluster communication.
 
+[[pve_ceph_recommendation_disk]]
 .Disks
 When planning the size of your Ceph cluster, it is important to take the
 recovery time into consideration. Especially with small clusters, recovery
@@ -197,6 +202,7 @@ You also need to balance OSD count and single OSD capacity. More capacity
 allows you to increase storage density, but it also means that a single OSD
 failure forces Ceph to recover more data at once.
 
+[[pve_ceph_recommendation_raid]]
 .Avoid RAID
 As Ceph handles data object redundancy and multiple parallel writes to disks
 (OSDs) on its own, using a RAID controller normally doesn’t improve
@@ -1018,6 +1024,7 @@ to act as standbys.
 Ceph maintenance
 ----------------
 
+[[pve_ceph_osd_replace]]
 Replace OSDs
 ~~~~~~~~~~~~
 
@@ -1131,6 +1138,7 @@ ceph osd unset noout
 You can now start up the guests. Highly available guests will change their state
 to 'started' when they power on.
 
+[[pve_ceph_mon_and_ts]]
 Ceph Monitoring and Troubleshooting
 -----------------------------------
 
-- 
2.39.5
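
A quick way to verify such explicit anchors after building the docs locally is
to grep for them in the source and in the generated single-page guide (a
minimal sketch; the file name pve-admin-guide.html is taken from the review
further down the thread, and the exact build output may differ):

----
# explicit anchors added to the Ceph chapter
grep -n '^\[\[pve_ceph_' pveceph.adoc

# check that an anchor id ends up in the built single-page guide
grep -c 'id="pve_ceph_osd_replace"' pve-admin-guide.html
----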





* [pve-devel] [PATCH docs 2/6] ceph: correct heading capitalization
  2025-02-03 14:27 [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Alexander Zeidler
@ 2025-02-03 14:27 ` Alexander Zeidler
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information Alexander Zeidler
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-03 14:27 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 pveceph.adoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 93c2f8d..90bb975 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -768,7 +768,7 @@ Nautilus: PG merging and autotuning].
 
 
 [[pve_ceph_device_classes]]
-Ceph CRUSH & device classes
+Ceph CRUSH & Device Classes
 ---------------------------
 
 [thumbnail="screenshot/gui-ceph-config.png"]
@@ -1021,7 +1021,7 @@ After these steps, the CephFS should be completely removed and if you have
 other CephFS instances, the stopped metadata servers can be started again
 to act as standbys.
 
-Ceph maintenance
+Ceph Maintenance
 ----------------
 
 [[pve_ceph_osd_replace]]
@@ -1089,7 +1089,7 @@ are executed.
 
 
 [[pveceph_shutdown]]
-Shutdown {pve} + Ceph HCI cluster
+Shutdown {pve} + Ceph HCI Cluster
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To shut down the whole {pve} + Ceph cluster, first stop all Ceph clients. These
-- 
2.39.5






* [pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information
  2025-02-03 14:27 [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Alexander Zeidler
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 2/6] ceph: correct heading capitalization Alexander Zeidler
@ 2025-02-03 14:27 ` Alexander Zeidler
  2025-02-03 16:19   ` Max Carrara
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 4/6] ceph: osd: revise and expand the section "Destroy OSDs" Alexander Zeidler
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-03 14:27 UTC (permalink / raw)
  To: pve-devel

Existing information is slightly modified and retained.

Add information:
* List which logs are usually helpful for troubleshooting
* Explain how to acknowledge listed Ceph crashes and view details
* List common causes of Ceph problems and link to recommendations for a
  healthy cluster
* Briefly describe the common problem "OSDs down/crashed"

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 pveceph.adoc | 72 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 8 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 90bb975..4e1c1e2 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -1150,22 +1150,78 @@ The following Ceph commands can be used to see if the cluster is healthy
 ('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
 ('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
 below will also give you an overview of the current events and actions to take.
+To stop their execution, press CTRL-C.
 
 ----
-# single time output
-pve# ceph -s
-# continuously output status changes (press CTRL+C to stop)
-pve# ceph -w
+# Continuously watch the cluster status
+pve# watch ceph --status
+
+# Print the cluster status once (not being updated)
+# and continuously append lines of status events
+pve# ceph --watch
 ----
 
+[[pve_ceph_ts]]
+Troubleshooting
+~~~~~~~~~~~~~~~
+
+This section includes frequently used troubleshooting information.
+More information can be found on the official Ceph website under
+Troubleshooting
+footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/].
+
+[[pve_ceph_ts_logs]]
+.Relevant Logs on Affected Node
+
+* xref:_disk_health_monitoring[Disk Health Monitoring]
+* __System -> System Log__ (or, for example,
+  `journalctl --since "2 days ago"`)
+* IPMI and RAID controller logs
+
+Ceph service crashes can be listed and viewed in detail by running
+`ceph crash ls` and `ceph crash info <crash_id>`. Crashes marked as
+new can be acknowledged by running, for example,
+`ceph crash archive-all`.
+
 To get a more detailed view, every Ceph service has a log file under
 `/var/log/ceph/`. If more detail is required, the log level can be
 adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/].
 
-You can find more information about troubleshooting
-footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]
-a Ceph cluster on the official website.
-
+[[pve_ceph_ts_causes]]
+.Common Causes of Ceph Problems
+
+* Network problems like congestion, a faulty switch, a shut down
+interface or a blocking firewall. Check whether all {pve} nodes are
+reliably reachable on the xref:_cluster_network[corosync] network and
+on the xref:pve_ceph_install_wizard[configured] Ceph public and
+cluster network.
+
+* Disk or connection parts which are:
+** defective
+** not firmly mounted
+** lacking I/O performance under higher load (e.g. when using HDDs,
+consumer hardware or xref:pve_ceph_recommendation_raid[inadvisable]
+RAID controllers)
+
+* Not fulfilling the xref:pve_ceph_recommendation[recommendations] for
+a healthy Ceph cluster.
+
+[[pve_ceph_ts_problems]]
+.Common Ceph Problems
+ ::
+
+OSDs `down`/crashed:::
+A faulty OSD will be reported as `down` and mostly (auto) `out` 10
+minutes later. Depending on the cause, it can also automatically
+become `up` and `in` again. To try a manual activation via web
+interface, go to __Any node -> Ceph -> OSD__, select the OSD and click
+on **Start**, **In** and **Reload**. When using the shell, run on the
+affected node `ceph-volume lvm activate --all`.
++
+To activate a failed OSD, it may be necessary to
+xref:ha_manager_node_maintenance[safely] reboot the respective node
+or, as a last resort, to
+xref:pve_ceph_osd_replace[recreate or replace] the OSD.
 
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
-- 
2.39.5
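
Condensed into a shell session, the troubleshooting flow added above might look
as follows (a sketch only; <crash_id> and the journal time span are
placeholders, and the commands are meant to be run on the affected node):

----
# watch the cluster status continuously
watch ceph --status

# recent system log of the affected node
journalctl --since "2 days ago"

# list recorded Ceph crashes, inspect one, then acknowledge them
ceph crash ls
ceph crash info <crash_id>
ceph crash archive-all

# try to activate all local OSDs again
ceph-volume lvm activate --all
----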






* [pve-devel] [PATCH docs 4/6] ceph: osd: revise and expand the section "Destroy OSDs"
  2025-02-03 14:27 [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Alexander Zeidler
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 2/6] ceph: correct heading capitalization Alexander Zeidler
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information Alexander Zeidler
@ 2025-02-03 14:27 ` Alexander Zeidler
  2025-02-03 16:19   ` Max Carrara
  2025-02-03 14:28 ` [pve-devel] [PATCH docs 5/6] ceph: maintenance: revise and expand section "Replace OSDs" Alexander Zeidler
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-03 14:27 UTC (permalink / raw)
  To: pve-devel

Existing information is slightly modified and retained.

Add information:
* Mention and link to the sections "Troubleshooting" and "Replace OSDs"
* CLI commands (pveceph) must be executed on the affected node
* Check in advance the "Used (%)" of OSDs to avoid blocked I/O
* Check and wait until the OSD can be stopped safely
* Use `pveceph stop` instead of `systemctl stop ceph-osd@<ID>.service`
* Explain cleanup option a bit more

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 pveceph.adoc | 58 ++++++++++++++++++++++++++++------------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 4e1c1e2..754c401 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -502,33 +502,37 @@ ceph-volume lvm create --filestore --data /dev/sd[X] --journal /dev/sd[Y]
 Destroy OSDs
 ~~~~~~~~~~~~
 
-To remove an OSD via the GUI, first select a {PVE} node in the tree view and go
-to the **Ceph -> OSD** panel. Then select the OSD to destroy and click the **OUT**
-button. Once the OSD status has changed from `in` to `out`, click the **STOP**
-button. Finally, after the status has changed from `up` to `down`, select
-**Destroy** from the `More` drop-down menu.
-
-To remove an OSD via the CLI run the following commands.
-
-[source,bash]
-----
-ceph osd out <ID>
-systemctl stop ceph-osd@<ID>.service
-----
-
-NOTE: The first command instructs Ceph not to include the OSD in the data
-distribution. The second command stops the OSD service. Until this time, no
-data is lost.
-
-The following command destroys the OSD. Specify the '-cleanup' option to
-additionally destroy the partition table.
-
-[source,bash]
-----
-pveceph osd destroy <ID>
-----
-
-WARNING: The above command will destroy all data on the disk!
+If you experience problems with an OSD or its disk, try to
+xref:pve_ceph_mon_and_ts[troubleshoot] them first to decide if a
+xref:pve_ceph_osd_replace[replacement] is needed.
+
+To destroy an OSD:
+
+. Either open the web interface and select any {pve} node in the tree
+view, or open a shell on the node where the OSD to be deleted is
+located.
+
+. Go to the __Ceph -> OSD__ panel (`ceph osd df tree`). If the OSD to
+be deleted is still `up` and `in` (non-zero value at `AVAIL`), make
+sure that all OSDs have their `Used (%)` value well below the
+`nearfull_ratio` of default `85%`. In this way you can reduce the risk
+from the upcoming rebalancing, which may cause OSDs to run full and
+thereby blocking I/O on Ceph pools.
+
+. If the deletable OSD is not `out` yet, select the OSD and click on
+**Out** (`ceph osd out <id>`). This will exclude it from data
+distribution and starts a rebalance.
+
+. Click on **Stop**, and if a warning appears, click on **Cancel** and
+try again shortly afterwards. When using the shell, check if it is
+safe to stop by reading the output from `ceph osd ok-to-stop <id>`,
+once true, run `pveceph stop --service osd.<id>` .
+
+. **Attention, this step removes the OSD from Ceph and deletes all
+disk data.** To continue, first click on **More -> Destroy**. Use the
+cleanup option to clean up the partition table and similar, enabling
+an immediate reuse of the disk in {pve}. Finally, click on **Remove**
+(`pveceph osd destroy <id> [--cleanup]`).
 
 
 [[pve_ceph_pools]]
-- 
2.39.5
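
For the CLI-only path, the steps above roughly map to the following commands
(run on the node hosting the OSD; <id> is the OSD number and --cleanup is
optional; this is a sketch of the procedure described in the patch, not a
replacement for it):

----
# check OSD utilization before taking anything out
ceph osd df tree

# exclude the OSD from data distribution and let the rebalance run
ceph osd out <id>

# stop the OSD only once Ceph reports it is safe to do so
ceph osd ok-to-stop <id>
pveceph stop --service osd.<id>

# WARNING: removes the OSD from Ceph and deletes all data on its disk
pveceph osd destroy <id> --cleanup
----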






* [pve-devel] [PATCH docs 5/6] ceph: maintenance: revise and expand section "Replace OSDs"
  2025-02-03 14:27 [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Alexander Zeidler
                   ` (2 preceding siblings ...)
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 4/6] ceph: osd: revise and expand the section "Destroy OSDs" Alexander Zeidler
@ 2025-02-03 14:28 ` Alexander Zeidler
  2025-02-03 14:28 ` [pve-devel] [PATCH docs 6/6] pvecm: remove node: mention Ceph and its steps for safe removal Alexander Zeidler
  2025-02-03 16:19 ` [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Max Carrara
  5 siblings, 0 replies; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-03 14:28 UTC (permalink / raw)
  To: pve-devel

Remove redundant information that is already described in section
“Destroy OSDs” and link to it.

Mention and link to the troubleshooting section, as replacing the OSD
may not fix the underlying problem.

Mention that the replacement disk should be of the same type and size
and comply with the recommendations.

Mention how to acknowledge warnings of crashed OSDs.

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 pveceph.adoc | 45 +++++++++++++--------------------------------
 1 file changed, 13 insertions(+), 32 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 754c401..010f48c 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -1032,43 +1032,24 @@ Ceph Maintenance
 Replace OSDs
 ~~~~~~~~~~~~
 
-One of the most common maintenance tasks in Ceph is to replace the disk of an
-OSD. If a disk is already in a failed state, then you can go ahead and run
-through the steps in xref:pve_ceph_osd_destroy[Destroy OSDs]. Ceph will recreate
-those copies on the remaining OSDs if possible. This rebalancing will start as
-soon as an OSD failure is detected or an OSD was actively stopped.
+With the following steps you can replace the disk of an OSD, which is
+one of the most common maintenance tasks in Ceph. If there is a
+problem with an OSD while its disk still seems to be healthy, read the
+xref:pve_ceph_mon_and_ts[troubleshooting] section first.
 
-NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when
-`size + 1` nodes are available. The reason for this is that the Ceph object
-balancer xref:pve_ceph_device_classes[CRUSH] defaults to a full node as
-`failure domain'.
+. If the disk failed, get a
+xref:pve_ceph_recommendation_disk[recommended] replacement disk of the
+same type and size.
 
-To replace a functioning disk from the GUI, go through the steps in
-xref:pve_ceph_osd_destroy[Destroy OSDs]. The only addition is to wait until
-the cluster shows 'HEALTH_OK' before stopping the OSD to destroy it.
+. xref:pve_ceph_osd_destroy[Destroy] the OSD in question.
 
-On the command line, use the following commands:
+. Detach the old disk from the server and attach the new one.
 
-----
-ceph osd out osd.<id>
-----
-
-You can check with the command below if the OSD can be safely removed.
-
-----
-ceph osd safe-to-destroy osd.<id>
-----
-
-Once the above check tells you that it is safe to remove the OSD, you can
-continue with the following commands:
-
-----
-systemctl stop ceph-osd@<id>.service
-pveceph osd destroy <id>
-----
+. xref:pve_ceph_osd_create[Create] the OSD again.
 
-Replace the old disk with the new one and use the same procedure as described
-in xref:pve_ceph_osd_create[Create OSDs].
+. After automatic rebalancing, the cluster status should switch back
+to `HEALTH_OK`. Any still listed crashes can be acknowledged by
+running, for example, `ceph crash archive-all`.
 
 Trim/Discard
 ~~~~~~~~~~~~
-- 
2.39.5
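
As a rough accompaniment to the steps above, the health checks around a disk
swap could look like this (a sketch; recreating the OSD itself is done via the
web interface or pveceph osd create, as described in the linked section):

----
# before destroying and after recreating the OSD: overall health and usage
ceph --status
ceph osd df tree

# once the cluster is back to HEALTH_OK, acknowledge remaining crash reports
ceph crash archive-all
----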





* [pve-devel] [PATCH docs 6/6] pvecm: remove node: mention Ceph and its steps for safe removal
  2025-02-03 14:27 [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Alexander Zeidler
                   ` (3 preceding siblings ...)
  2025-02-03 14:28 ` [pve-devel] [PATCH docs 5/6] ceph: maintenance: revise and expand section "Replace OSDs" Alexander Zeidler
@ 2025-02-03 14:28 ` Alexander Zeidler
  2025-02-03 16:19 ` [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Max Carrara
  5 siblings, 0 replies; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-03 14:28 UTC (permalink / raw)
  To: pve-devel

as this has been missed in the past, or the proper procedure was
not known.

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 pvecm.adoc | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/pvecm.adoc b/pvecm.adoc
index 15dda4e..8026de4 100644
--- a/pvecm.adoc
+++ b/pvecm.adoc
@@ -320,6 +320,53 @@ replication automatically switches direction if a replicated VM is migrated, so
 by migrating a replicated VM from a node to be deleted, replication jobs will be
 set up to that node automatically.
 
+If the node to be removed has been configured for
+xref:chapter_pveceph[Ceph]:
+
+. Ensure that sufficient {pve} nodes with running OSDs (`up` and `in`)
+continue to exist.
++
+NOTE: By default, Ceph pools have a `size/min_size` of `3/2` and a
+full node as `failure domain` at the object balancer
+xref:pve_ceph_device_classes[CRUSH]. So if less than `size` (`3`)
+nodes with running OSDs are online, data redundancy will be degraded.
+If less than `min_size` are online, pool I/O will be blocked and
+affected guests may crash.
+
+. Ensure that sufficient xref:pve_ceph_monitors[monitors],
+xref:pve_ceph_manager[managers] and, if using CephFS,
+xref:pveceph_fs_mds[metadata servers] remain available.
+
+. To maintain data redundancy, each destruction of an OSD, especially
+the last one on a node, will trigger a data rebalance. Therefore,
+ensure that the OSDs on the remaining nodes have sufficient free space
+left.
+
+. To remove Ceph from the node to be deleted, start by
+xref:pve_ceph_osd_destroy[destroying] its OSDs, one after the other.
+
+. Once the xref:pve_ceph_mon_and_ts[CEPH status] is `HEALTH_OK` again,
+proceed by:
+
+[arabic]
+.. destroying its xref:pveceph_fs_mds[metadata server] via web
+interface at __Ceph -> CephFS__ or by running:
++
+----
+# pveceph mds destroy <local hostname>
+----
+
+.. xref:pveceph_destroy_mon[destroying its monitor]
+
+.. xref:pveceph_destroy_mgr[destroying its manager]
+
+. Finally, remove the now empty bucket ({pve} node to be removed) from
+the CRUSH hierarchy by running:
++
+----
+# ceph osd crush remove <hostname>
+----
+
 In the following example, we will remove the node hp4 from the cluster.
 
 Log in to a *different* cluster node (not hp4), and issue a `pvecm nodes`
-- 
2.39.5
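
Summarized as commands, the Ceph-related part of removing a node might look
roughly like this (a sketch; hostnames and ids are placeholders, monitor and
manager removal can also be done via the web interface, and the exact pveceph
subcommands should be double-checked with pveceph help):

----
# on the node being removed: destroy its OSDs one after the other
pveceph osd destroy <id> --cleanup

# wait until the cluster reports HEALTH_OK again
watch ceph --status

# remove the remaining Ceph services of this node
pveceph mds destroy <local hostname>
pveceph mon destroy <monid>
pveceph mgr destroy <id>

# finally, remove the now empty CRUSH bucket
ceph osd crush remove <hostname>
----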






* Re: [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section
  2025-02-03 14:27 [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Alexander Zeidler
                   ` (4 preceding siblings ...)
  2025-02-03 14:28 ` [pve-devel] [PATCH docs 6/6] pvecm: remove node: mention Ceph and its steps for safe removal Alexander Zeidler
@ 2025-02-03 16:19 ` Max Carrara
  2025-02-04  9:22   ` Alexander Zeidler
  5 siblings, 1 reply; 12+ messages in thread
From: Max Carrara @ 2025-02-03 16:19 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote:
> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
> ---

Some high-level feedback (see comments inline and in patches otherwise):

- The writing style is IMO quite clear and straightforward, nice work!

- In patch 03, the "_disk_health_monitoring" anchor reference seems to
  break my build for some reason. Does this also happen on your end? The
  single-page docs ("pve-admin-guide.html") seem to build just fine
  otherwise.

- Regarding implicitly / auto-generated anchors, is it fine to break
  those in general or not? See my other comments inline here.

- There are a few tiny style things I personally would correct, but if
  you disagree with them, feel free to leave them as they are.

All in all this seems pretty solid; the stuff regarding the anchors
needs to be clarified first (whether it's okay to break auto-generated
ones & the one anchor that makes my build fail). Otherwise, pretty good!

>  pveceph.adoc | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/pveceph.adoc b/pveceph.adoc
> index da39e7f..93c2f8d 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -82,6 +82,7 @@ and vocabulary
>  footnote:[Ceph glossary {cephdocs-url}/glossary].
>  
>  
> +[[pve_ceph_recommendation]]
>  Recommendations for a Healthy Ceph Cluster
>  ------------------------------------------

AsciiDoc automatically generated an anchor for the heading above
already, and it's "_recommendations_for_a_healthy_ceph_cluster"
apparently. So, there's no need to provide one here explicitly, since it
already exists; it also might break old links that refer to the
documentation.

Though, perhaps in a separate series, you could look for all implicitly
defined anchors and set them explicitly..? Not sure if that's something
we want, though.

>  
> @@ -95,6 +96,7 @@ NOTE: The recommendations below should be seen as a rough guidance for choosing
>  hardware. Therefore, it is still essential to adapt it to your specific needs.
>  You should test your setup and monitor health and performance continuously.
>  
> +[[pve_ceph_recommendation_cpu]]
>  .CPU
>  Ceph services can be classified into two categories:
>  
> @@ -122,6 +124,7 @@ IOPS load over 100'000 with sub millisecond latency, each OSD can use multiple
>  CPU threads, e.g., four to six CPU threads utilized per NVMe backed OSD is
>  likely for very high performance disks.
>  
> +[[pve_ceph_recommendation_memory]]
>  .Memory
>  Especially in a hyper-converged setup, the memory consumption needs to be
>  carefully planned out and monitored. In addition to the predicted memory usage
> @@ -137,6 +140,7 @@ normal operation, but rather leave some headroom to cope with outages.
>  The OSD service itself will use additional memory. The Ceph BlueStore backend of
>  the daemon requires by default **3-5 GiB of memory** (adjustable).
>  
> +[[pve_ceph_recommendation_network]]
>  .Network
>  We recommend a network bandwidth of at least 10 Gbps, or more, to be used
>  exclusively for Ceph traffic. A meshed network setup
> @@ -172,6 +176,7 @@ high-performance setups:
>  * one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync
>    cluster communication.
>  
> +[[pve_ceph_recommendation_disk]]
>  .Disks
>  When planning the size of your Ceph cluster, it is important to take the
>  recovery time into consideration. Especially with small clusters, recovery
> @@ -197,6 +202,7 @@ You also need to balance OSD count and single OSD capacity. More capacity
>  allows you to increase storage density, but it also means that a single OSD
>  failure forces Ceph to recover more data at once.
>  
> +[[pve_ceph_recommendation_raid]]
>  .Avoid RAID
>  As Ceph handles data object redundancy and multiple parallel writes to disks
>  (OSDs) on its own, using a RAID controller normally doesn’t improve
> @@ -1018,6 +1024,7 @@ to act as standbys.
>  Ceph maintenance
>  ----------------
>  
> +[[pve_ceph_osd_replace]]
>  Replace OSDs
>  ~~~~~~~~~~~~

This one here is also implicitly defined already, unfortunately.

>  
> @@ -1131,6 +1138,7 @@ ceph osd unset noout
>  You can now start up the guests. Highly available guests will change their state
>  to 'started' when they power on.
>  
> +[[pve_ceph_mon_and_ts]]
>  Ceph Monitoring and Troubleshooting
>  -----------------------------------
>  

So is this one.

Actually, now I do wonder: I think it's better to define them in the
AsciiDoc code directly, but how would we do that with existing anchors?
Just use the automatically generated anchor name? Or are we fine with
breaking links? Would be nice if someone could chime in here.

(Personally, I think it's fine to break these things, but I stand
corrected if that's a no-go.)





* Re: [pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 3/6] ceph: troubleshooting: revise and add frequently needed information Alexander Zeidler
@ 2025-02-03 16:19   ` Max Carrara
  0 siblings, 0 replies; 12+ messages in thread
From: Max Carrara @ 2025-02-03 16:19 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote:
> Existing information is slightly modified and retained.
>
> Add information:
> * List which logs are usually helpful for troubleshooting
> * Explain how to acknowledge listed Ceph crashes and view details
> * List common causes of Ceph problems and link to recommendations for a
>   healthy cluster
> * Briefly describe the common problem "OSDs down/crashed"
>
> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
> ---
>  pveceph.adoc | 72 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 64 insertions(+), 8 deletions(-)
>
> diff --git a/pveceph.adoc b/pveceph.adoc
> index 90bb975..4e1c1e2 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -1150,22 +1150,78 @@ The following Ceph commands can be used to see if the cluster is healthy
>  ('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
>  ('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
>  below will also give you an overview of the current events and actions to take.
> +To stop their execution, press CTRL-C.
>  
>  ----
> -# single time output
> -pve# ceph -s
> -# continuously output status changes (press CTRL+C to stop)
> -pve# ceph -w
> +# Continuously watch the cluster status
> +pve# watch ceph --status
> +
> +# Print the cluster status once (not being updated)
> +# and continuously append lines of status events
> +pve# ceph --watch
>  ----
>  
> +[[pve_ceph_ts]]
> +Troubleshooting
> +~~~~~~~~~~~~~~~
> +
> +This section includes frequently used troubleshooting information.
> +More information can be found on the official Ceph website under
> +Troubleshooting
> +footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/].
> +
> +[[pve_ceph_ts_logs]]
> +.Relevant Logs on Affected Node
> +
> +* xref:_disk_health_monitoring[Disk Health Monitoring]

For some reason, the "_disk_health_monitoring" anchor above breaks
building the docs for me -- "make update" exits with an error,
complaining that it can't find the anchor. The one-page docs
("pve-admin-guide.html") seems to build just fine, though. The anchor
works there too, so I'm not sure what's going wrong there exactly.

> +* __System -> System Log__ (or, for example,
> +  `journalctl --since "2 days ago"`)
> +* IPMI and RAID controller logs
> +
> +Ceph service crashes can be listed and viewed in detail by running
> +`ceph crash ls` and `ceph crash info <crash_id>`. Crashes marked as
> +new can be acknowledged by running, for example,
> +`ceph crash archive-all`.
> +
>  To get a more detailed view, every Ceph service has a log file under
>  `/var/log/ceph/`. If more detail is required, the log level can be
>  adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/].
>  
> -You can find more information about troubleshooting
> -footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]
> -a Ceph cluster on the official website.
> -
> +[[pve_ceph_ts_causes]]
> +.Common Causes of Ceph Problems
> +
> +* Network problems like congestion, a faulty switch, a shut down
> +interface or a blocking firewall. Check whether all {pve} nodes are
> +reliably reachable on the xref:_cluster_network[corosync] network and

Would personally prefer "xref:_cluster_network[corosync network]" above,
but no hard opinions there.

> +on the xref:pve_ceph_install_wizard[configured] Ceph public and
> +cluster network.

Would also prefer [configured Ceph public and cluster network] as a
whole here.

> +
> +* Disk or connection parts which are:
> +** defective
> +** not firmly mounted
> +** lacking I/O performance under higher load (e.g. when using HDDs,
> +consumer hardware or xref:pve_ceph_recommendation_raid[inadvisable]
> +RAID controllers)

Same here; I would prefer to highlight [inadvisable RAID controllers] as
a whole.

> +
> +* Not fulfilling the xref:pve_ceph_recommendation[recommendations] for
> +a healthy Ceph cluster.
> +
> +[[pve_ceph_ts_problems]]
> +.Common Ceph Problems
> + ::
> +
> +OSDs `down`/crashed:::
> +A faulty OSD will be reported as `down` and mostly (auto) `out` 10
> +minutes later. Depending on the cause, it can also automatically
> +become `up` and `in` again. To try a manual activation via web
> +interface, go to __Any node -> Ceph -> OSD__, select the OSD and click
> +on **Start**, **In** and **Reload**. When using the shell, run on the
> +affected node `ceph-volume lvm activate --all`.
> ++
> +To activate a failed OSD, it may be necessary to
> +xref:ha_manager_node_maintenance[safely] reboot the respective node

And again here: Would personally prefer [safely reboot] in the anchor
ref.

> +or, as a last resort, to
> +xref:pve_ceph_osd_replace[recreate or replace] the OSD.
>  
>  ifdef::manvolnum[]
>  include::pve-copyright.adoc[]

Note: The only thing that really stood out to me was the
"_disk_health_monitoring" refusing to build on my system; the other
comments here are just tiny style suggestions. If you disagree with
them, no hard feelings at all! :P






* Re: [pve-devel] [PATCH docs 4/6] ceph: osd: revise and expand the section "Destroy OSDs"
  2025-02-03 14:27 ` [pve-devel] [PATCH docs 4/6] ceph: osd: revise and expand the section "Destroy OSDs" Alexander Zeidler
@ 2025-02-03 16:19   ` Max Carrara
  0 siblings, 0 replies; 12+ messages in thread
From: Max Carrara @ 2025-02-03 16:19 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote:
> Existing information is slightly modified and retained.
>
> Add information:
> * Mention and link to the sections "Troubleshooting" and "Replace OSDs"
> * CLI commands (pveceph) must be executed on the affected node
> * Check in advance the "Used (%)" of OSDs to avoid blocked I/O
> * Check and wait until the OSD can be stopped safely
> * Use `pveceph stop` instead of `systemctl stop ceph-osd@<ID>.service`
> * Explain cleanup option a bit more
>
> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
> ---
>  pveceph.adoc | 58 ++++++++++++++++++++++++++++------------------------
>  1 file changed, 31 insertions(+), 27 deletions(-)
>
> diff --git a/pveceph.adoc b/pveceph.adoc
> index 4e1c1e2..754c401 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -502,33 +502,37 @@ ceph-volume lvm create --filestore --data /dev/sd[X] --journal /dev/sd[Y]
>  Destroy OSDs
>  ~~~~~~~~~~~~
>  
> -To remove an OSD via the GUI, first select a {PVE} node in the tree view and go
> -to the **Ceph -> OSD** panel. Then select the OSD to destroy and click the **OUT**
> -button. Once the OSD status has changed from `in` to `out`, click the **STOP**
> -button. Finally, after the status has changed from `up` to `down`, select
> -**Destroy** from the `More` drop-down menu.
> -
> -To remove an OSD via the CLI run the following commands.
> -
> -[source,bash]
> -----
> -ceph osd out <ID>
> -systemctl stop ceph-osd@<ID>.service
> -----
> -
> -NOTE: The first command instructs Ceph not to include the OSD in the data
> -distribution. The second command stops the OSD service. Until this time, no
> -data is lost.
> -
> -The following command destroys the OSD. Specify the '-cleanup' option to
> -additionally destroy the partition table.
> -
> -[source,bash]
> -----
> -pveceph osd destroy <ID>
> -----
> -
> -WARNING: The above command will destroy all data on the disk!
> +If you experience problems with an OSD or its disk, try to
> +xref:pve_ceph_mon_and_ts[troubleshoot] them first to decide if a
> +xref:pve_ceph_osd_replace[replacement] is needed.
> +
> +To destroy an OSD:
> +
> +. Either open the web interface and select any {pve} node in the tree
> +view, or open a shell on the node where the OSD to be deleted is
> +located.
> +
> +. Go to the __Ceph -> OSD__ panel (`ceph osd df tree`). If the OSD to
> +be deleted is still `up` and `in` (non-zero value at `AVAIL`), make
> +sure that all OSDs have their `Used (%)` value well below the
> +`nearfull_ratio` of default `85%`. In this way you can reduce the risk
> +from the upcoming rebalancing, which may cause OSDs to run full and
> +thereby blocking I/O on Ceph pools.
> +
> +. If the deletable OSD is not `out` yet, select the OSD and click on
> +**Out** (`ceph osd out <id>`). This will exclude it from data
> +distribution and starts a rebalance.
> +
> +. Click on **Stop**, and if a warning appears, click on **Cancel** and
> +try again shortly afterwards. When using the shell, check if it is

What kind of warning can appear in this case here, though? Is there
something that the user could perhaps miss, if they just proceed to
click on Cancel?

> +safe to stop by reading the output from `ceph osd ok-to-stop <id>`,
> +once true, run `pveceph stop --service osd.<id>` .
> +
> +. **Attention, this step removes the OSD from Ceph and deletes all

Would prefer to keep the "WARNING: " here instead of "Attention, ",
personally.

> +disk data.** To continue, first click on **More -> Destroy**. Use the
> +cleanup option to clean up the partition table and similar, enabling
> +an immediate reuse of the disk in {pve}. Finally, click on **Remove**
> +(`pveceph osd destroy <id> [--cleanup]`).
>  
>  
>  [[pve_ceph_pools]]






* Re: [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section
  2025-02-03 16:19 ` [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section Max Carrara
@ 2025-02-04  9:22   ` Alexander Zeidler
  2025-02-04  9:52     ` Max Carrara
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-04  9:22 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Mon Feb 3, 2025 at 5:19 PM CET, Max Carrara wrote:
> On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote:
>> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
>> ---
>
> Some high-level feedback (see comments inline and in patches otherwise):
>
> - The writing style is IMO quite clear and straightforward, nice work!
Thank you for the review!

>
> - In patch 03, the "_disk_health_monitoring" anchor reference seems to
>   break my build for some reason. Does this also happen on your end? The
>   single-page docs ("pve-admin-guide.html") seem to build just fine
>   otherwise.
Same for me, I will fix it.

>
> - Regarding implicitly / auto-generated anchors, is it fine to break
>   those in general or not? See my other comments inline here.
>
> - There are a few tiny style things I personally would correct, but if
>   you disagree with them, feel free to leave them as they are.
I will look into it! Using longer link texts sounds good!

>
> All in all this seems pretty solid; the stuff regarding the anchors
> needs to be clarified first (whether it's okay to break auto-generated
> ones & the one anchor that makes my build fail). Otherwise, pretty good!
See my two comments below.

>
>>  pveceph.adoc | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/pveceph.adoc b/pveceph.adoc
>> index da39e7f..93c2f8d 100644
>> --- a/pveceph.adoc
>> +++ b/pveceph.adoc
>> @@ -82,6 +82,7 @@ and vocabulary
>>  footnote:[Ceph glossary {cephdocs-url}/glossary].
>>  
>>  
>> +[[pve_ceph_recommendation]]
>>  Recommendations for a Healthy Ceph Cluster
>>  ------------------------------------------
>
> AsciiDoc automatically generated an anchor for the heading above
> already, and it's "_recommendations_for_a_healthy_ceph_cluster"
> apparently. So, there's no need to provide one here explicitly, since it
> already exists; it also might break old links that refer to the
> documentation.
For this I searched our forum before, it shows 12 results, the heading
was only added about a year ago. But apart from this specific anchor,
IMHO it can be okay to break such links in certain cases:

* The main reasons for not using the auto generated ones are, that those
  are not stable (in case of changing the title) and can also be very
  long when using it with xref:...[...]. Such lines get even longer (and
  an awkward combined name) when using it as a prefix for sub sections
  (as often done).
* Since with the break there might have been added new or updated
  information in those chapters/sections, old forum posts may no longer
  be accurate anyway.
* In the Ceph chapter for example, we have been using the explicit
  "pve_ceph_" or "pveceph_" for years, so IMHO it should (almost
  always?) be added when adding a new section.

>
> Though, perhaps in a separate series, you could look for all implicitly
> defined anchors and set them explicitly..? Not sure if that's something
> we want, though.
This would break a lot of links at the same time, so far I am not aware
of a notable benefit.

>
>>  
>> @@ -95,6 +96,7 @@ NOTE: The recommendations below should be seen as a rough guidance for choosing
>>  hardware. Therefore, it is still essential to adapt it to your specific needs.
>>  You should test your setup and monitor health and performance continuously.
>>  
>> +[[pve_ceph_recommendation_cpu]]
>>  .CPU
>>  Ceph services can be classified into two categories:
>>  
>> @@ -122,6 +124,7 @@ IOPS load over 100'000 with sub millisecond latency, each OSD can use multiple
>>  CPU threads, e.g., four to six CPU threads utilized per NVMe backed OSD is
>>  likely for very high performance disks.
>>  
>> +[[pve_ceph_recommendation_memory]]
>>  .Memory
>>  Especially in a hyper-converged setup, the memory consumption needs to be
>>  carefully planned out and monitored. In addition to the predicted memory usage
>> @@ -137,6 +140,7 @@ normal operation, but rather leave some headroom to cope with outages.
>>  The OSD service itself will use additional memory. The Ceph BlueStore backend of
>>  the daemon requires by default **3-5 GiB of memory** (adjustable).
>>  
>> +[[pve_ceph_recommendation_network]]
>>  .Network
>>  We recommend a network bandwidth of at least 10 Gbps, or more, to be used
>>  exclusively for Ceph traffic. A meshed network setup
>> @@ -172,6 +176,7 @@ high-performance setups:
>>  * one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync
>>    cluster communication.
>>  
>> +[[pve_ceph_recommendation_disk]]
>>  .Disks
>>  When planning the size of your Ceph cluster, it is important to take the
>>  recovery time into consideration. Especially with small clusters, recovery
>> @@ -197,6 +202,7 @@ You also need to balance OSD count and single OSD capacity. More capacity
>>  allows you to increase storage density, but it also means that a single OSD
>>  failure forces Ceph to recover more data at once.
>>  
>> +[[pve_ceph_recommendation_raid]]
>>  .Avoid RAID
>>  As Ceph handles data object redundancy and multiple parallel writes to disks
>>  (OSDs) on its own, using a RAID controller normally doesn’t improve
>> @@ -1018,6 +1024,7 @@ to act as standbys.
>>  Ceph maintenance
>>  ----------------
>>  
>> +[[pve_ceph_osd_replace]]
>>  Replace OSDs
>>  ~~~~~~~~~~~~
>
> This one here is also implicitly defined already, unfortunately.
>
>>  
>> @@ -1131,6 +1138,7 @@ ceph osd unset noout
>>  You can now start up the guests. Highly available guests will change their state
>>  to 'started' when they power on.
>>  
>> +[[pve_ceph_mon_and_ts]]
>>  Ceph Monitoring and Troubleshooting
>>  -----------------------------------
>>  
>
> So is this one.
>
> Actually, now I do wonder: I think it's better to define them in the
> AsciiDoc code directly, but how would we do that with existing anchors?
> Just use the automatically generated anchor name? Or are we fine with
> breaking links? Would be nice if someone could chime in here.
>
> (Personally, I think it's fine to break these things, but I stand
> corrected if that's a no-go.)
>
>
>





* Re: [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section
  2025-02-04  9:22   ` Alexander Zeidler
@ 2025-02-04  9:52     ` Max Carrara
  2025-02-05 10:10       ` Alexander Zeidler
  0 siblings, 1 reply; 12+ messages in thread
From: Max Carrara @ 2025-02-04  9:52 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Tue Feb 4, 2025 at 10:22 AM CET, Alexander Zeidler wrote:
> On Mon Feb 3, 2025 at 5:19 PM CET, Max Carrara wrote:
> > On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote:
> >> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
> >> ---
> >
> > Some high-level feedback (see comments inline and in patches otherwise):
> >
> > - The writing style is IMO quite clear and straightforward, nice work!
> Thank you for the review!
>
> >
> > - In patch 03, the "_disk_health_monitoring" anchor reference seems to
> >   break my build for some reason. Does this also happen on your end? The
> >   single-page docs ("pve-admin-guide.html") seem to build just fine
> >   otherwise.
> Same for me, I will fix it.
>
> >
> > - Regarding implicitly / auto-generated anchors, is it fine to break
> >   those in general or not? See my other comments inline here.
> >
> > - There are a few tiny style things I personally would correct, but if
> >   you disagree with them, feel free to leave them as they are.
> I will look into it! Using longer link texts sounds good!
>
> >
> > All in all this seems pretty solid; the stuff regarding the anchors
> > needs to be clarified first (whether it's okay to break auto-generated
> > ones & the one anchor that makes my build fail). Otherwise, pretty good!
> See my two comments below.
>
> >
> >>  pveceph.adoc | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/pveceph.adoc b/pveceph.adoc
> >> index da39e7f..93c2f8d 100644
> >> --- a/pveceph.adoc
> >> +++ b/pveceph.adoc
> >> @@ -82,6 +82,7 @@ and vocabulary
> >>  footnote:[Ceph glossary {cephdocs-url}/glossary].
> >>  
> >>  
> >> +[[pve_ceph_recommendation]]
> >>  Recommendations for a Healthy Ceph Cluster
> >>  ------------------------------------------
> >
> > AsciiDoc automatically generated an anchor for the heading above
> > already, and it's "_recommendations_for_a_healthy_ceph_cluster"
> > apparently. So, there's no need to provide one here explicitly, since it
> > already exists; it also might break old links that refer to the
> > documentation.
> For this I searched our forum before, it shows 12 results, the heading
> was only added about a year ago. But apart from this specific anchor,
> IMHO it can be okay to break such links in certain cases:
>
> * The main reasons for not using the auto generated ones are, that those
>   are not stable (in case of changing the title) and can also be very
>   long when using it with xref:...[...]. Such lines get even longer (and
>   an awkward combined name) when using it as a prefix for sub sections
>   (as often done).
> * Since with the break there might have been added new or updated
>   information in those chapters/sections, old forum posts may no longer
>   be accurate anyway.
> * In the Ceph chapter for example, we have been using the explicit
>   "pve_ceph_" or "pveceph_" for years, so IMHO it should (almost
>   always?) be added when adding a new section.
>
> >
> > Though, perhaps in a separate series, you could look for all implicitly
> > defined anchors and set them explicitly..? Not sure if that's something
> > we want, though.
> This would break a lot of links at the same time, so far I am not aware
> of a notable benefit.
>

I agree with all of your points made here; so, all in all, great work!
Ping me when you shoot out v2, then I'll have one last look. :)






* Re: [pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section
  2025-02-04  9:52     ` Max Carrara
@ 2025-02-05 10:10       ` Alexander Zeidler
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Zeidler @ 2025-02-05 10:10 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Tue Feb 4, 2025 at 10:52 AM CET, Max Carrara wrote:
> On Tue Feb 4, 2025 at 10:22 AM CET, Alexander Zeidler wrote:
>> On Mon Feb 3, 2025 at 5:19 PM CET, Max Carrara wrote:
>> > On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote:
>> >> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
>> >> ---
>> >
>> > Some high-level feedback (see comments inline and in patches otherwise):
>> >
>> > - The writing style is IMO quite clear and straightforward, nice work!
>> Thank you for the review!
>>
>> >
>> > - In patch 03, the "_disk_health_monitoring" anchor reference seems to
>> >   break my build for some reason. Does this also happen on your end? The
>> >   single-page docs ("pve-admin-guide.html") seem to build just fine
>> >   otherwise.
>> Same for me, I will fix it.
>>
>> >
>> > - Regarding implicitly / auto-generated anchors, is it fine to break
>> >   those in general or not? See my other comments inline here.
>> >
>> > - There are a few tiny style things I personally would correct, but if
>> >   you disagree with them, feel free to leave them as they are.
>> I will look into it! Using longer link texts sounds good!
>>
>> >
>> > All in all this seems pretty solid; the stuff regarding the anchors
>> > needs to be clarified first (whether it's okay to break auto-generated
>> > ones & the one anchor that makes my build fail). Otherwise, pretty good!
>> See my two comments below.
>>
>> >
>> >>  pveceph.adoc | 8 ++++++++
>> >>  1 file changed, 8 insertions(+)
>> >>
>> >> diff --git a/pveceph.adoc b/pveceph.adoc
>> >> index da39e7f..93c2f8d 100644
>> >> --- a/pveceph.adoc
>> >> +++ b/pveceph.adoc
>> >> @@ -82,6 +82,7 @@ and vocabulary
>> >>  footnote:[Ceph glossary {cephdocs-url}/glossary].
>> >>  
>> >>  
>> >> +[[pve_ceph_recommendation]]
>> >>  Recommendations for a Healthy Ceph Cluster
>> >>  ------------------------------------------
>> >
>> > AsciiDoc automatically generated an anchor for the heading above
>> > already, and it's "_recommendations_for_a_healthy_ceph_cluster"
>> > apparently. So, there's no need to provide one here explicitly, since it
>> > already exists; it also might break old links that refer to the
>> > documentation.
>> For this I searched our forum before, it shows 12 results, the heading
>> was only added about a year ago. But apart from this specific anchor,
>> IMHO it can be okay to break such links in certain cases:
>>
>> * The main reasons for not using the auto generated ones are, that those
>>   are not stable (in case of changing the title) and can also be very
>>   long when using it with xref:...[...]. Such lines get even longer (and
>>   an awkward combined name) when using it as a prefix for sub sections
>>   (as often done).
>> * Since with the break there might have been added new or updated
>>   information in those chapters/sections, old forum posts may no longer
>>   be accurate anyway.
>> * In the Ceph chapter for example, we have been using the explicit
>>   "pve_ceph_" or "pveceph_" for years, so IMHO it should (almost
>>   always?) be added when adding a new section.
>>
>> >
>> > Though, perhaps in a separate series, you could look for all implicitly
>> > defined anchors and set them explicitly..? Not sure if that's something
>> > we want, though.
>> This would break a lot of links at the same time, so far I am not aware
>> of a notable benefit.
>>
>
> I agree with all of your points made here; so, all in all, great work!
> Ping me when you shoot out v2, then I'll have one last look. :)
v2: https://lore.proxmox.com/pve-devel/20250205100850.3-1-a.zeidler@proxmox.com/T/#t

>
>
>





