public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition
@ 2024-12-18 16:19 Alexander Zeidler
  2024-12-18 16:19 ` [pve-devel] [PATCH manager v2 2/2] replication: update help button reference Alexander Zeidler
  2025-01-08 10:34 ` [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition Daniel Kral
  0 siblings, 2 replies; 4+ messages in thread
From: Alexander Zeidler @ 2024-12-18 16:19 UTC (permalink / raw)
  To: pve-devel

* restructure and revise the introduction
* add subchapter "Recommendations"
* remove the subchapter "Schedule Format" with its one line of content
  and link where appropriate directly to the copy under "25. Appendix D:
  Calendar Events". The help button for adding/editing a job now links
  to the subchapter "Managing Jobs".
* provide details on job removal and how to enforce it if necessary
* add more helpful CLI examples and improve existing ones
* restructure and revise the subchapter "Error Handling"

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
v2:
* no changes, only add missing pve-manager patch

 pvecm.adoc |   2 +
 pvesr.adoc | 402 ++++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 279 insertions(+), 125 deletions(-)

diff --git a/pvecm.adoc b/pvecm.adoc
index 15dda4e..4028e92 100644
--- a/pvecm.adoc
+++ b/pvecm.adoc
@@ -486,6 +486,7 @@ authentication. You should fix this by removing the respective keys from the
 '/etc/pve/priv/authorized_keys' file.
 
 
+[[pvecm_quorum]]
 Quorum
 ------
 
@@ -963,6 +964,7 @@ case $- in
 esac
 ----
 
+[[pvecm_external_vote]]
 Corosync External Vote Support
 ------------------------------
 
diff --git a/pvesr.adoc b/pvesr.adoc
index 9ad02f5..de29240 100644
--- a/pvesr.adoc
+++ b/pvesr.adoc
@@ -24,48 +24,65 @@ Storage Replication
 :pve-toplevel:
 endif::manvolnum[]
 
-The `pvesr` command-line tool manages the {PVE} storage replication
-framework. Storage replication brings redundancy for guests using
-local storage and reduces migration time.
-
-It replicates guest volumes to another node so that all data is available
-without using shared storage. Replication uses snapshots to minimize traffic
-sent over the network. Therefore, new data is sent only incrementally after
-the initial full sync. In the case of a node failure, your guest data is
-still available on the replicated node.
-
-The replication is done automatically in configurable intervals.
-The minimum replication interval is one minute, and the maximal interval
-once a week. The format used to specify those intervals is a subset of
-`systemd` calendar events, see
-xref:pvesr_schedule_time_format[Schedule Format] section:
-
-It is possible to replicate a guest to multiple target nodes,
-but not twice to the same target node.
-
-Each replications bandwidth can be limited, to avoid overloading a storage
-or server.
-
-Only changes since the last replication (so-called `deltas`) need to be
-transferred if the guest is migrated to a node to which it already is
-replicated. This reduces the time needed significantly. The replication
-direction automatically switches if you migrate a guest to the replication
-target node.
-
-For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
-You migrate it to `nodeB`, so now it gets automatically replicated back from
-`nodeB` to `nodeA`.
-
-If you migrate to a node where the guest is not replicated, the whole disk
-data must send over. After the migration, the replication job continues to
-replicate this guest to the configured nodes.
+Storage replication is particularly interesting for small clusters if
+guest volumes are placed on a local storage instead of a shared one.
+By replicating the volumes to other cluster nodes, guest migration to
+those nodes will become significantly faster.
+
+In the event of a node or local storage failure, the volume data as of
+the latest completed replication runs is still available on the
+replication target nodes.
 
 [IMPORTANT]
 ====
-High-Availability is allowed in combination with storage replication, but there
-may be some data loss between the last synced time and the time a node failed.
+While a replication-enabled guest can be configured for
+xref:chapter_ha_manager[high availability], or
+xref:pvesr_node_failed[manually moved] while its origin node is not
+available, read about the involved
+xref:pvesr_risk_of_data_loss[risk of data loss] and how to avoid it.
 ====
 
+.Replication requires …
+
+* at least one other cluster node as a replication target
+* a common local storage entry in the datacenter that is functional
+on both nodes
+* that the local storage type is
+xref:pvesr_supported_storage[supported by replication]
+* that guest volumes are stored on that local storage
+
+.Replication …
+
+* allows fast migration to nodes where the guest is being replicated
+* provides guest volume redundancy in a cluster where using a shared
+storage type is not an option
+* is configured as a job for a guest, with multiple jobs enabling
+multiple replication targets
+* runs jobs one after the other at their configured interval (the
+shortest is every minute)
+* uses snapshots to regularly transmit only changed volume data
+(so-called deltas)
+* can be limited in network bandwidth per job, smoothing the storage
+and network utilization
+* keeps its targets basically the same when migrating the guest to
+another node
+* reverses the direction of a job when moving the guest to its
+configured replication target
+
+.Example:
+
+A guest runs on node `A` and has replication jobs to nodes `B` and
+`C`, both with a set interval of every five minutes (`*/5`). Now we
+migrate the guest from `A` to `B`, which also automatically updates
+the replication targets for this guest to be `A` and `C`. The
+migration completes quickly, as only the volume data changed since the
+last replication run has to be transmitted.
+
+In the event that node `B` or its local storage fails, the guest can
+be restarted on `A` or `C`, with the risk of some data loss as
+described in this chapter.
+
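+The two jobs from this example could, for instance, be created via the
+CLI as sketched below (this assumes the guest has the ID `100` and
+that the nodes are literally named `B` and `C`):
+
+----
+# pvesr create-local-job 100-0 B --schedule "*/5"
+# pvesr create-local-job 100-1 C --schedule "*/5"
+----
+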
+[[pvesr_supported_storage]]
 Supported Storage Types
 -----------------------
 
@@ -76,147 +93,282 @@ Supported Storage Types
 |ZFS (local)    |zfspool     |yes      |yes
 |=============================================
 
-[[pvesr_schedule_time_format]]
-Schedule Format
+[[pvesr_recommendations]]
+Recommendations
 ---------------
-Replication uses xref:chapter_calendar_events[calendar events] for
-configuring the schedule.
 
-Error Handling
---------------
+[[pvesr_risk_of_data_loss]]
+Risk of Data Loss
+~~~~~~~~~~~~~~~~~
+
+If a node suddenly becomes unavailable for a longer period of time, it
+may become necessary to run a guest on a replication target node
+instead. The guest will then use the latest replicated volume data
+available on the chosen target node. That volume state will also be
+replicated to other nodes with the next replication runs, since the
+replication directions are automatically updated for related jobs.
+This also means that any newer volume state on the failed node will be
+removed after the node becomes available again.
+
+A more resilient solution may be to use a shared
+xref:chapter_storage[storage type] instead. If that is not an option,
+consider setting the replication job intervals short enough and avoid
+moving replication-configured guests while their origin node is not
+available. Instead of configuring those guests for high availability,
+xref:qm_startup_and_shutdown[start at boot] could be a sufficient
+alternative.
+
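+For example, start at boot can be enabled per guest under
+__Guest -> Options__, or via the CLI as sketched here for a
+hypothetical VM `100` and container `200`:
+
+----
+# qm set 100 --onboot 1
+# pct set 200 --onboot 1
+----
+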
+[[pvesr_replication_network]]
+Network for Replication Traffic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Replication traffic is routed via the
+xref:pvecm_migration_network[migration network]. If it is not set, the
+management network is used by default, which can have a negative
+impact on corosync and therefore on cluster availability. To specify
+the migration network, navigate to
+__Datacenter -> Options -> Migration Settings__, or set it via CLI in
+the xref:datacenter_configuration_file[`datacenter.cfg`].
+
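+A minimal example entry in `datacenter.cfg`, assuming that migration
+and replication traffic should use the network `10.1.2.0/24`:
+
+----
+migration: secure,network=10.1.2.0/24
+----
+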
+[[pvesr_cluster_size]]
+Cluster Size
+~~~~~~~~~~~~
+
+With a 2-node cluster in particular, the failure of one node can leave
+the other node without a xref:pvecm_quorum[quorum]. In order to keep
+the cluster functional at all times, it is therefore crucial to
+xref:pvecm_join_node_to_cluster[expand] to a 3-node cluster in advance
+or to configure a xref:pvecm_external_vote[QDevice] for the third
+vote.
+
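+As a sketch, assuming an external host with the IP `192.168.22.90`
+already runs the `corosync-qnetd` service, it could be added as a
+QDevice like this (see the referenced chapter for the full procedure):
+
+----
+# pvecm qdevice setup 192.168.22.90
+----
+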
+[[pvesr_managing_jobs]]
+Managing Jobs
+-------------
 
-If a replication job encounters problems, it is placed in an error state.
-In this state, the configured replication intervals get suspended
-temporarily. The failed replication is repeatedly tried again in a
-30 minute interval.
-Once this succeeds, the original schedule gets activated again.
+[thumbnail="screenshot/gui-qemu-add-replication-job.png"]
 
-Possible issues
-~~~~~~~~~~~~~~~
+Replication jobs can easily be created, modified and removed via web
+interface, or by using the CLI tool `pvesr`.
 
-Some of the most common issues are in the following list. Depending on your
-setup there may be another cause.
+To manage all replication jobs in one place, go to
+__Datacenter -> Replication__. Additional functionality is
+available under __Node -> Replication__ and __Guest -> Replication__.
+Go there to view logs, schedule a one-time run of a job right away, or
+benefit from preset fields when configuring a job.
 
-* Network is not working.
+Enabled replication jobs will automatically run at their set interval,
+one after the other. The default interval is every 15 minutes
+(`*/15`), and it can be set to run as often as every minute (`*/1`),
+see the xref:chapter_calendar_events[schedule format].
 
-* No free space left on the replication target storage.
+If the replication jobs result in significant I/O load on the target
+node, the network bandwidth of individual jobs can be limited to
+reduce it to an acceptable level.
 
-* Storage with the same storage ID is not available on the target node.
+Shortly after job creation, a first snapshot is taken and sent to the
+target node. Subsequent snapshots are taken at the set interval and
+only contain modified volume data, allowing a significantly shorter
+transfer time.
 
-NOTE: You can always use the replication log to find out what is causing the problem.
+If you remove a replication job, the snapshots on the target node are
+also deleted by default. The removal takes place at the next possible
+point in time and requires the job to be enabled. If the target node
+is permanently unreachable, the cleanup can be skipped by forcing the
+job deletion via CLI.
 
-Migrating a guest in case of Error
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-// FIXME: move this to better fitting chapter (sysadmin ?) and only link to
-// it here
+When not using the web interface, the cluster-wide unique replication
+job ID has to be specified. For example, `100-0`, which is composed of
+the guest ID, a hyphen and an arbitrary job number.
 
-In the case of a grave error, a virtual guest may get stuck on a failed
-node. You then need to move it manually to a working node again.
+[[pvesr_cli_examples]]
+CLI Examples
+------------
 
-Example
-~~~~~~~
+Create a replication job for guest `100` and give it the job number
+`0`. Replicate to node `pve2` every five minutes (`*/5`), at a maximum
+network bandwidth of `10` MBps (megabytes per second).
 
-Let's assume that you have two guests (VM 100 and CT 200) running on node A
-and replicate to node B.
-Node A failed and can not get back online. Now you have to migrate the guest
-to Node B manually.
+----
+# pvesr create-local-job 100-0 pve2 --schedule "*/5" --rate 10
+----
 
-- connect to node B over ssh or open its shell via the web UI
+List replication jobs from all nodes.
 
-- check if that the cluster is quorate
-+
 ----
-# pvecm status
+# pvesr list
 ----
 
-- If you have no quorum, we strongly advise to fix this first and make the
-  node operable again. Only if this is not possible at the moment, you may
-  use the following command to enforce quorum on the current node:
-+
+List the job statuses from all local guests, or only from a specific
+local guest.
+
 ----
-# pvecm expected 1
+# pvesr status [--guest 100]
 ----
 
-WARNING: Avoid changes which affect the cluster if `expected votes` are set
-(for example adding/removing nodes, storages, virtual guests) at all costs.
-Only use it to get vital guests up and running again or to resolve the quorum
-issue itself.
+Read the configuration of job `100-0`.
 
-- move both guest configuration files form the origin node A to node B:
-+
 ----
-# mv /etc/pve/nodes/A/qemu-server/100.conf /etc/pve/nodes/B/qemu-server/100.conf
-# mv /etc/pve/nodes/A/lxc/200.conf /etc/pve/nodes/B/lxc/200.conf
+# pvesr read 100-0
 ----
 
-- Now you can start the guests again:
-+
+Update the configuration of job `100-0`, for example, to change the
+schedule interval to every full hour.
+
 ----
-# qm start 100
-# pct start 200
+# pvesr update 100-0 --schedule "*/00"
 ----
 
-Remember to replace the VMIDs and node names with your respective values.
+To run the job `100-0` once as soon as possible, regardless of its
+configured interval, schedule it now.
 
-Managing Jobs
--------------
+----
+# pvesr schedule-now 100-0
+----
 
-[thumbnail="screenshot/gui-qemu-add-replication-job.png"]
+Disable (or `enable`) the job `100-0`.
+
+----
+# pvesr disable 100-0
+----
+
+Delete the job `100-0`. If the target node is permanently unreachable,
+`--force` can be used to skip the failing cleanup.
 
-You can use the web GUI to create, modify, and remove replication jobs
-easily. Additionally, the command-line interface (CLI) tool `pvesr` can be
-used to do this.
+----
+# pvesr delete 100-0 [--force]
+----
 
-You can find the replication panel on all levels (datacenter, node, virtual
-guest) in the web GUI. They differ in which jobs get shown:
-all, node- or guest-specific jobs.
+[[pvesr_error_handling]]
+Error Handling
+--------------
 
-When adding a new job, you need to specify the guest if not already selected
-as well as the target node. The replication
-xref:pvesr_schedule_time_format[schedule] can be set if the default of `all
-15 minutes` is not desired. You may impose a rate-limit on a replication
-job. The rate limit can help to keep the load on the storage acceptable.
+[[pvesr_job_failed]]
+Job Failed
+~~~~~~~~~~
 
-A replication job is identified by a cluster-wide unique ID. This ID is
-composed of the VMID in addition to a job number.
-This ID must only be specified manually if the CLI tool is used.
+In the event that a replication job fails, it is temporarily placed in
+an error state and a notification is sent. Retries are scheduled after
+5, 10 and 15 minutes, and then every 30 minutes. As soon as the job
+has run successfully again, the error state is cleared and the
+configured interval is resumed.
 
-Network
--------
+.Troubleshooting Job Failures
 
-Replication traffic will use the same network as the live guest migration. By
-default, this is the management network. To use a different network for the
-migration, configure the `Migration Network` in the web interface under
-`Datacenter -> Options -> Migration Settings` or in the `datacenter.cfg`. See
-xref:pvecm_migration_network[Migration Network] for more details.
+To find out exactly why a job failed, read the log available under
+__Node -> Replication__.
 
-Command-line Interface Examples
--------------------------------
+Common causes are:
 
-Create a replication job which runs every 5 minutes with a limited bandwidth
-of 10 Mbps (megabytes per second) for the guest with ID 100.
+* The network is not working properly.
+* The storage (ID) in use is restricted to certain nodes, excluding
+the target node.
+* The storage is not set up correctly on the target node (e.g.
+different pool name).
+* The storage on the target node has no free space left.
+
+[[pvesr_node_failed]]
+Origin Node Failed
+~~~~~~~~~~~~~~~~~~
+// FIXME: move this to better fitting chapter (sysadmin ?) and only link to
+// it here
 
+In the event that a node running replicated guests fails suddenly and
+for too long, it may become necessary to restart these guests on their
+replication target nodes. If replicated guests are configured for high
+availability (HA), simply wait until they are recovered on other
+nodes, keeping in mind the involved
+xref:pvesr_risk_of_data_loss[risk of data loss]. Replicated guests
+which are not configured for HA can be moved manually as explained
+below, which carries the same risk of data loss.
+
+[[pvesr_find_latest_replicas]]
+.Step 1: Optionally Decide on a Specific Replication Target Node
+
+To minimize the data loss of an important guest, you can find the
+target node on which the most recent successful replication took
+place. If the origin node is healthy enough to access its web
+interface, go to __Node -> Replication__ and see the 'Last Sync'
+column. Alternatively, you can carry out the following steps.
+
+. To list all target nodes of an important guest, for example one with
+the ID `1000`, go to the CLI of any node and run:
++
 ----
-# pvesr create-local-job 100-0 pve1 --schedule "*/5" --rate 10
+# pvesr list | grep -e Job -e ^1000
 ----
 
-Disable an active job with ID `100-0`.
+. Open the CLI on all listed target nodes.
 
+. Adapt the following command with your VMID to find the most recent
+snapshots among your target nodes. If snapshots were taken in the same
+minute, look for the highest number at the end of the name, which is
+the Unix timestamp.
++
 ----
-# pvesr disable 100-0
+# zfs list -t snapshot -o name,creation | grep -e -1000-disk
 ----
 
-Enable a deactivated job with ID `100-0`.
+[[pvesr_verify_cluster_health]]
+.Step 2: Verify Cluster Health
+
+Go to the CLI of any replication target node and run `pvecm status`.
+If the output contains `Quorate: Yes`, then the cluster/corosync is
+healthy enough and you can proceed with
+xref:pvesr_move_a_guest[Step 3: Move a guest].
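+
+For a quick check, the relevant line can be filtered, for example:
+
+----
+# pvecm status | grep Quorate
+----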
 
+WARNING: If the cluster is not quorate and consists of 3 or more
+nodes/votes, we strongly recommend solving the underlying problem
+first so that at least a majority of nodes/votes is available again.
+
+If the cluster is not quorate and consists of only 2 nodes without an
+additional xref:pvecm_external_vote[QDevice], you may want to proceed
+with the following steps to temporarily make the cluster functional
+again.
+
+. Check whether the expected votes are `2`.
++
 ----
-# pvesr enable 100-0
+# pvecm status | grep votes
 ----
 
-Change the schedule interval of the job with ID `100-0` to once per hour.
+. Now you can enforce quorum on the one remaining node by running:
++
+----
+# pvecm expected 1
+----
++
+WARNING: Avoid making changes to the cluster in this state at all
+costs, for example adding or removing nodes, storages or guests. Delay
+such changes until the second node is available again and the expected
+votes have been automatically restored to `2`.
+
+[[pvesr_move_a_guest]]
+.Step 3: Move a Guest
 
+. Use SSH to connect to any node that is part of the cluster majority.
+Alternatively, go to the web interface and open the shell of such a
+node in a separate window or browser tab.
++
+. The following example commands move the VM `1000` and the container
+`2000` from the node named `pve-failed` to a still available
+replication target node named `pve-replicated`.
++
+----
+# cd /etc/pve/nodes/
+# mv pve-failed/qemu-server/1000.conf pve-replicated/qemu-server/
+# mv pve-failed/lxc/2000.conf pve-replicated/lxc/
+----
++
+. Now you can start those guests again:
++
 ----
-# pvesr update 100-0 --schedule '*/00'
+# qm start 1000
+# pct start 2000
 ----
++
+. If it was necessary to enforce quorum, as described when verifying
+the cluster health, keep in mind the warning there about avoiding
+changes to the cluster.
 
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
-- 
2.39.5




* [pve-devel] [PATCH manager v2 2/2] replication: update help button reference
  2024-12-18 16:19 [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition Alexander Zeidler
@ 2024-12-18 16:19 ` Alexander Zeidler
  2025-01-08 10:34 ` [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition Daniel Kral
  1 sibling, 0 replies; 4+ messages in thread
From: Alexander Zeidler @ 2024-12-18 16:19 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
---
 www/manager6/grid/Replication.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/www/manager6/grid/Replication.js b/www/manager6/grid/Replication.js
index 79824b9b..51aa9fde 100644
--- a/www/manager6/grid/Replication.js
+++ b/www/manager6/grid/Replication.js
@@ -64,7 +64,7 @@ Ext.define('PVE.window.ReplicaEdit', {
 	    {
 		xtype: 'inputpanel',
 		itemId: 'ipanel',
-		onlineHelp: 'pvesr_schedule_time_format',
+		onlineHelp: 'pvesr_managing_jobs',
 
 		onGetValues: function(values) {
 		    let win = this.up('window');
-- 
2.39.5




* Re: [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition
  2024-12-18 16:19 [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition Alexander Zeidler
  2024-12-18 16:19 ` [pve-devel] [PATCH manager v2 2/2] replication: update help button reference Alexander Zeidler
@ 2025-01-08 10:34 ` Daniel Kral
  2025-01-10 17:00   ` Alexander Zeidler
  1 sibling, 1 reply; 4+ messages in thread
From: Daniel Kral @ 2025-01-08 10:34 UTC (permalink / raw)
  To: Proxmox VE development discussion, Alexander Zeidler

Thanks a lot for taking the time to rewrite this! Your changed text 
reads great and it feels like the right information is pointed to at the 
right time. I've added a few notes inline below.

On 12/18/24 17:19, Alexander Zeidler wrote:
> * restructure and revise the introduction
> * add subchapter "Recommendations"
> * remove the subchapter "Schedule Format" with its one line of content
>    and link where appropriate directly to the copy under "25. Appendix D:
>    Calendar Events". The help button at adding/editing a job links now to
>    the subchapter "Managing Jobs".
> * provide details on job removal and how to enforce it if necessary
> * add more helpful CLI examples and improve existing ones
> * restructure and revise the subchapter "Error Handling"

Since these changes seem all pretty independent from each other, it 
would be great for the git history to have each of them in separate 
commits with the reasoning of the change in the body.

This also makes it easier to follow where old sections have been moved 
to or have been removed (with a reason why it was removed). It would 
also clarify why some hyperlink references (e.g. `pvecm_quorum`) were 
created in this patch series.

> 
> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
> ---
> v2:
> * no changes, only add missing pve-manager patch
> 
>   pvecm.adoc |   2 +
>   pvesr.adoc | 402 ++++++++++++++++++++++++++++++++++++-----------------
>   2 files changed, 279 insertions(+), 125 deletions(-)
> 
> diff --git a/pvecm.adoc b/pvecm.adoc
> index 15dda4e..4028e92 100644
> --- a/pvecm.adoc
> +++ b/pvecm.adoc
> @@ -486,6 +486,7 @@ authentication. You should fix this by removing the respective keys from the
>   '/etc/pve/priv/authorized_keys' file.
>   
>   
> +[[pvecm_quorum]]
>   Quorum
>   ------
>   
> @@ -963,6 +964,7 @@ case $- in
>   esac
>   ----
>   
> +[[pvecm_external_vote]]
>   Corosync External Vote Support
>   ------------------------------
>   
> diff --git a/pvesr.adoc b/pvesr.adoc
> index 9ad02f5..de29240 100644
> --- a/pvesr.adoc
> +++ b/pvesr.adoc
> @@ -24,48 +24,65 @@ Storage Replication
>   :pve-toplevel:
>   endif::manvolnum[]
>   
> -The `pvesr` command-line tool manages the {PVE} storage replication
> -framework. Storage replication brings redundancy for guests using
> -local storage and reduces migration time.
> -
> -It replicates guest volumes to another node so that all data is available
> -without using shared storage. Replication uses snapshots to minimize traffic
> -sent over the network. Therefore, new data is sent only incrementally after
> -the initial full sync. In the case of a node failure, your guest data is
> -still available on the replicated node.
> -
> -The replication is done automatically in configurable intervals.
> -The minimum replication interval is one minute, and the maximal interval
> -once a week. The format used to specify those intervals is a subset of
> -`systemd` calendar events, see
> -xref:pvesr_schedule_time_format[Schedule Format] section:
> -
> -It is possible to replicate a guest to multiple target nodes,
> -but not twice to the same target node.
> -
> -Each replications bandwidth can be limited, to avoid overloading a storage
> -or server.
> -
> -Only changes since the last replication (so-called `deltas`) need to be
> -transferred if the guest is migrated to a node to which it already is
> -replicated. This reduces the time needed significantly. The replication
> -direction automatically switches if you migrate a guest to the replication
> -target node.
> -
> -For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
> -You migrate it to `nodeB`, so now it gets automatically replicated back from
> -`nodeB` to `nodeA`.
> -
> -If you migrate to a node where the guest is not replicated, the whole disk
> -data must send over. After the migration, the replication job continues to
> -replicate this guest to the configured nodes.
> +Storage replication is particularly interesting for small clusters if
> +guest volumes are placed on a local storage instead of a shared one.
> +By replicating the volumes to other cluster nodes, guest migration to
> +those nodes will become significantly faster.

Hm, I like the new paragraph as it's very clear to the reader if the 
following is relevant to their needs right from the start. Still, if the 
reader has no clue about what storage replication could be about, I 
think the first sentence should still be a "Storage replication is 
[short summary description]. It is particularly interesting for...".

> +
> +In the event of a node or local storage failure, the volume data as of
> +the latest completed replication runs are still available on the
> +replication target nodes.
>   
>   [IMPORTANT]
>   ====
> -High-Availability is allowed in combination with storage replication, but there
> -may be some data loss between the last synced time and the time a node failed.
> +While a replication-enabled guest can be configured for
> +xref:chapter_ha_manager[high availability], or
> +xref:pvesr_node_failed[manually moved] while its origin node is not
> +available, read about the involved
> +xref:pvesr_risk_of_data_loss[risk of data loss] and how to avoid it.
>   ====

nit: if possible, make this sentence a little shorter as the new text
      wraps just for the word "it.". Just a nit of course ;).

>   
> +.Replication requires …
> +
> +* at least one other cluster node as a replication target
> +* one common local storage entry in the datacenter, being functional
> +on both nodes
> +* that the local storage type is
> +xref:pvesr_supported_storage[supported by replication]
> +* that guest volumes are stored on that local storage
> +
> +.Replication …
> +
> +* allows a fast migration to nodes where the guest is being replicated
> +* provides guest volume redundancy in a cluster where using a shared
> +storage type is not an option
> +* is configured as a job for a guest, with multiple jobs enabling
> +multiple replication targets
> +* jobs run one after the other at their configured interval (shortest
> +is every minute)
> +* uses snapshots to regularly transmit only changed volume data
> +(so-called deltas)
> +* network bandwidth can be limited per job, smoothing the storage and
> +network utilization
> +* targets stay basically the same when migrating the guest to another
> +node
> +* direction of a job reverses when moving the guest to its configured
> +replication target
> +
> +.Example:
> +
> +A guest runs on node `A` and has replication jobs to node `B` and `C`,
> +both with a set interval of every five minutes (`*/5`). Now we migrate
> +the guest from `A` to `B`, which also automatically updates the
> +replication targets for this guest to be `A` and `C`. Migration was
> +completed fast, as only the changed volume data since the last
> +replication run has been transmitted.
> +
> +In the event that node `B` or its local storage fails, the guest can
> +be restarted on `A` or `C`, with the risk of some data loss as
> +described in this chapter.
> +
> +[[pvesr_supported_storage]]
>   Supported Storage Types
>   -----------------------
>   
> @@ -76,147 +93,282 @@ Supported Storage Types
>   |ZFS (local)    |zfspool     |yes      |yes
>   |=============================================
>   
> -[[pvesr_schedule_time_format]]
> -Schedule Format
> +[[pvesr_recommendations]]
> +Recommendations
>   ---------------

Hm... Would "Considerations" be a better title for this section? All of 
the subsections are more focused about warning the reader of things that 
could happen or what to consider if they should implement storage 
replication in their setup, so the text makes it visible to the reader 
what they should think about and optionally we recommend them something 
they should do.

> -Replication uses xref:chapter_calendar_events[calendar events] for
> -configuring the schedule.
>   
> -Error Handling
> ---------------
> +[[pvesr_risk_of_data_loss]]
> +Risk of Data Loss
> +~~~~~~~~~~~~~~~~~
> +
> +If a node should suddenly become unavailable for a longer period of
> +time, it may become neccessary to run a guest on a replication target
> +node instead. Thereby the guest will use the latest replicated volume
> +data available on the chosen target node. That volume state will then
> +also be replicated to other nodes with the next replication runs,
> +since the replication directions are automatically updated for related
> +jobs. This also means, that the once newer volume state on the failed
> +node will be removed after it becomes available again.
> +
> +A more resilient solution may be to use a shared
> +xref:chapter_storage[storage type] instead. If that is not an option,
> +consider setting the replication job intervals short enough and avoid
> +moving replication-configured guests while their origin node is not
> +available. Instead of configuring those guests for high availability,
> +xref:qm_startup_and_shutdown[start at boot] could be a sufficient
> +alternative.
> +
> +[[pvesr_replication_network]]
> +Network for Replication Traffic
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Replication traffic is routed via the
> +xref:pvecm_migration_network[migration network]. If it is not set, the
> +management network is used by default, which can have a negative
> +impact on corosync and therefore on cluster availability. To specify
> +the migration network, navigate to
> +__Datacenter -> Options -> Migration Settings__, or set it via CLI in
> +the xref:datacenter_configuration_file[`datacenter.cfg`].
> +
> +[[pvesr_cluster_size]]
> +Cluster Size
> +~~~~~~~~~~~~
> +
> +With a 2-node cluster in particular, the failure of one node can leave
> +the other node without a xref:pvecm_quorum[quorum]. In order to keep
> +the cluster functional at all times, it is therefore crucial to
> +xref:pvecm_join_node_to_cluster[expand] to a 3-node cluster in advance
> +or to configure a xref:pvecm_external_vote[QDevice] for the third
> +vote.
> +
> +[[pvesr_managing_jobs]]
> +Managing Jobs
> +-------------
>   
> -If a replication job encounters problems, it is placed in an error state.
> -In this state, the configured replication intervals get suspended
> -temporarily. The failed replication is repeatedly tried again in a
> -30 minute interval.
> -Once this succeeds, the original schedule gets activated again.
> +[thumbnail="screenshot/gui-qemu-add-replication-job.png"]
>   
> -Possible issues
> -~~~~~~~~~~~~~~~
> +Replication jobs can easily be created, modified and removed via web
> +interface, or by using the CLI tool `pvesr`.
>   
> -Some of the most common issues are in the following list. Depending on your
> -setup there may be another cause.
> +To manage all replication jobs in one place, go to
> +__Datacenter -> Replication__. Additional functionalities are
> +available under __Node -> Replication__ and __Guest -> Replication__.
> +Go there to view logs, schedule a job once for now, or benefit from
> +preset fields when configuring a job.
>   
> -* Network is not working.
> +Enabled replication jobs will automatically run at their set interval,
> +one after the other. The default interval is at every quarter of an
> +hour (`*/15`), and can be set to as often as every minute (`*/1`), see

use "every 15 minutes" here as the placeholder text is in the "Create: 
Replication Job" modal in the WebGUI.

> +xref:chapter_calendar_events[schedule format].
>   
> -* No free space left on the replication target storage.
> +Optionally, the network bandwidth can be limited, which also helps to
> +keep the storage load on the target node acceptable.

This sentence / paragraph could be restructured to prioritize why 
someone would like to limit the bandwidth. Something like:

"If the storage replication jobs result in significant I/O load on the 
target node, the network bandwidth of individual jobs can be limited to 
reduce it to an acceptable level."

>   
> -* Storage with the same storage ID is not available on the target node.
> +Shortly after job creation, a first snapshot is taken and sent to the
> +target node. Subsequent snapshots are taken at the set interval and
> +only contain modified volume data, allowing a significantly shorter
> +transfer time.
>   
> -NOTE: You can always use the replication log to find out what is causing the problem.
> +If you remove a replication job, the snapshots on the target node are
> +also getting deleted again by default. The removal takes place at the
> +next possible point in time and requires the job to be enabled. If the
> +target node is permanently unreachable, the cleanup can be skipped by
> +forcing a job deletion via CLI.
>   

[ ... ]

I have only glossed over whether all information from the previous 
article has been preserved and haven't found anything missing so far. 
All being said, this looks great, consider this:

Reviewed-by: Daniel Kral <d.kral@proxmox.com>




* Re: [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition
  2025-01-08 10:34 ` [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition Daniel Kral
@ 2025-01-10 17:00   ` Alexander Zeidler
  0 siblings, 0 replies; 4+ messages in thread
From: Alexander Zeidler @ 2025-01-10 17:00 UTC (permalink / raw)
  To: Daniel Kral, Proxmox VE development discussion

On Wed Jan 8, 2025 at 11:34 AM CET, Daniel Kral wrote:
> Thanks a lot for taking the time to rewrite this! Your changed text 
> reads great and it feels like the right information is pointed to at the 
> right time. I've added a few notes inline below.
Thank you for the review! Most suggestions are implemented in the new
version, others were discussed off-list, see comments below.
v3: https://lore.proxmox.com/pve-devel/20250110165807.3-1-a.zeidler@proxmox.com

>
> On 12/18/24 17:19, Alexander Zeidler wrote:
>> * restructure and revise the introduction
>> * add subchapter "Recommendations"
>> * remove the subchapter "Schedule Format" with its one line of content
>>    and link where appropriate directly to the copy under "25. Appendix D:
>>    Calendar Events". The help button at adding/editing a job links now to
>>    the subchapter "Managing Jobs".
>> * provide details on job removal and how to enforce it if necessary
>> * add more helpful CLI examples and improve existing ones
>> * restructure and revise the subchapter "Error Handling"
>
> Since these changes seem all pretty independent from each other, it 
> would be great for the git history to have each of them in separate 
> commits with the reasoning of the change in the body.
>
> This also makes it easier to follow where old sections have been moved 
> to or have been removed (with a reason why it was removed). It would 
> also clarify why some hyperlink references (e.g. `pvecm_quorum`) were 
> created in this patch series.
Partly implemented in new version. While the bullet points appear pretty
independent from each other, it would not be worth the time to actually
create separate commits this time due to the extensive chapter revision.

I updated the commit message to mention that basically no information is
removed by this commit, beside the superfluous "Schedule Format"
subchapter as already explained in the commit message.

Anchors like `pvecm_quorum` have been created because they are either
now already used or will very likely be used in the future in the
enterprise support, forum and similar.

>
>> 
>> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
>> ---
>> v2:
>> * no changes, only add missing pve-manager patch
>> 
>>   pvecm.adoc |   2 +
>>   pvesr.adoc | 402 ++++++++++++++++++++++++++++++++++++-----------------
>>   2 files changed, 279 insertions(+), 125 deletions(-)
>> 
>> diff --git a/pvecm.adoc b/pvecm.adoc
>> index 15dda4e..4028e92 100644
>> --- a/pvecm.adoc
>> +++ b/pvecm.adoc
>> @@ -486,6 +486,7 @@ authentication. You should fix this by removing the respective keys from the
>>   '/etc/pve/priv/authorized_keys' file.
>>   
>>   
>> +[[pvecm_quorum]]
>>   Quorum
>>   ------
>>   
>> @@ -963,6 +964,7 @@ case $- in
>>   esac
>>   ----
>>   
>> +[[pvecm_external_vote]]
>>   Corosync External Vote Support
>>   ------------------------------
>>   
>> diff --git a/pvesr.adoc b/pvesr.adoc
>> index 9ad02f5..de29240 100644
>> --- a/pvesr.adoc
>> +++ b/pvesr.adoc
>> @@ -24,48 +24,65 @@ Storage Replication
>>   :pve-toplevel:
>>   endif::manvolnum[]
>>   
>> -The `pvesr` command-line tool manages the {PVE} storage replication
>> -framework. Storage replication brings redundancy for guests using
>> -local storage and reduces migration time.
>> -
>> -It replicates guest volumes to another node so that all data is available
>> -without using shared storage. Replication uses snapshots to minimize traffic
>> -sent over the network. Therefore, new data is sent only incrementally after
>> -the initial full sync. In the case of a node failure, your guest data is
>> -still available on the replicated node.
>> -
>> -The replication is done automatically in configurable intervals.
>> -The minimum replication interval is one minute, and the maximal interval
>> -once a week. The format used to specify those intervals is a subset of
>> -`systemd` calendar events, see
>> -xref:pvesr_schedule_time_format[Schedule Format] section:
>> -
>> -It is possible to replicate a guest to multiple target nodes,
>> -but not twice to the same target node.
>> -
>> -Each replications bandwidth can be limited, to avoid overloading a storage
>> -or server.
>> -
>> -Only changes since the last replication (so-called `deltas`) need to be
>> -transferred if the guest is migrated to a node to which it already is
>> -replicated. This reduces the time needed significantly. The replication
>> -direction automatically switches if you migrate a guest to the replication
>> -target node.
>> -
>> -For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
>> -You migrate it to `nodeB`, so now it gets automatically replicated back from
>> -`nodeB` to `nodeA`.
>> -
>> -If you migrate to a node where the guest is not replicated, the whole disk
>> -data must send over. After the migration, the replication job continues to
>> -replicate this guest to the configured nodes.
>> +Storage replication is particularly interesting for small clusters if
>> +guest volumes are placed on a local storage instead of a shared one.
>> +By replicating the volumes to other cluster nodes, guest migration to
>> +those nodes will become significantly faster.
>
> Hm, I like the new paragraph as it's very clear to the reader if the 
> following is relevant to their needs right from the start. Still, if the 
> reader has no clue about what storage replication could be about, I 
> think the first sentence should still be a "Storage replication is 
> [short summary description]. It is particularly interesting for...".
Implemented in new version.

>
>> +
>> +In the event of a node or local storage failure, the volume data as of
>> +the latest completed replication runs are still available on the
>> +replication target nodes.
>>   
>>   [IMPORTANT]
>>   ====
>> -High-Availability is allowed in combination with storage replication, but there
>> -may be some data loss between the last synced time and the time a node failed.
>> +While a replication-enabled guest can be configured for
>> +xref:chapter_ha_manager[high availability], or
>> +xref:pvesr_node_failed[manually moved] while its origin node is not
>> +available, read about the involved
>> +xref:pvesr_risk_of_data_loss[risk of data loss] and how to avoid it.
>>   ====
>
> nit: if possible, make this sentence a little shorter as the new text
>       wraps just for the word "it.". Just a nit of course ;).
Not implemented in new version. The displayed line length depends on
various factors, so IMO it is rather not practical to shorten a sentence
for it. For example, one of my browsers shows no third line at all while
another has a third line with "how to avoid it.".

>
>>   
>> +.Replication requires …
>> +
>> +* at least one other cluster node as a replication target
>> +* one common local storage entry in the datacenter, being functional
>> +on both nodes
>> +* that the local storage type is
>> +xref:pvesr_supported_storage[supported by replication]
>> +* that guest volumes are stored on that local storage
>> +
>> +.Replication …
>> +
>> +* allows a fast migration to nodes where the guest is being replicated
>> +* provides guest volume redundancy in a cluster where using a shared
>> +storage type is not an option
>> +* is configured as a job for a guest, with multiple jobs enabling
>> +multiple replication targets
>> +* jobs run one after the other at their configured interval (shortest
>> +is every minute)
>> +* uses snapshots to regularly transmit only changed volume data
>> +(so-called deltas)
>> +* network bandwidth can be limited per job, smoothing the storage and
>> +network utilization
>> +* targets stay basically the same when migrating the guest to another
>> +node
>> +* direction of a job reverses when moving the guest to its configured
>> +replication target
>> +
>> +.Example:
>> +
>> +A guest runs on node `A` and has replication jobs to node `B` and `C`,
>> +both with a set interval of every five minutes (`*/5`). Now we migrate
>> +the guest from `A` to `B`, which also automatically updates the
>> +replication targets for this guest to be `A` and `C`. Migration was
>> +completed fast, as only the changed volume data since the last
>> +replication run has been transmitted.
>> +
>> +In the event that node `B` or its local storage fails, the guest can
>> +be restarted on `A` or `C`, with the risk of some data loss as
>> +described in this chapter.
>> +
>> +[[pvesr_supported_storage]]
>>   Supported Storage Types
>>   -----------------------
>>   
>> @@ -76,147 +93,282 @@ Supported Storage Types
>>   |ZFS (local)    |zfspool     |yes      |yes
>>   |=============================================
>>   
>> -[[pvesr_schedule_time_format]]
>> -Schedule Format
>> +[[pvesr_recommendations]]
>> +Recommendations
>>   ---------------
>
> Hm... Would "Considerations" be a better title for this section? All of 
> the subsections are more focused about warning the reader of things that 
> could happen or what to consider if they should implement storage 
> replication in their setup, so the text makes it visible to the reader 
> what they should think about and optionally we recommend them something 
> they should do.
Implemented in new version.

>
>> -Replication uses xref:chapter_calendar_events[calendar events] for
>> -configuring the schedule.
>>   
>> -Error Handling
>> ---------------
>> +[[pvesr_risk_of_data_loss]]
>> +Risk of Data Loss
>> +~~~~~~~~~~~~~~~~~
>> +
>> +If a node should suddenly become unavailable for a longer period of
>> +time, it may become neccessary to run a guest on a replication target
>> +node instead. Thereby the guest will use the latest replicated volume
>> +data available on the chosen target node. That volume state will then
>> +also be replicated to other nodes with the next replication runs,
>> +since the replication directions are automatically updated for related
>> +jobs. This also means, that the once newer volume state on the failed
>> +node will be removed after it becomes available again.
>> +
>> +A more resilient solution may be to use a shared
>> +xref:chapter_storage[storage type] instead. If that is not an option,
>> +consider setting the replication job intervals short enough and avoid
>> +moving replication-configured guests while their origin node is not
>> +available. Instead of configuring those guests for high availability,
>> +xref:qm_startup_and_shutdown[start at boot] could be a sufficient
>> +alternative.
>> +
>> +[[pvesr_replication_network]]
>> +Network for Replication Traffic
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Replication traffic is routed via the
>> +xref:pvecm_migration_network[migration network]. If it is not set, the
>> +management network is used by default, which can have a negative
>> +impact on corosync and therefore on cluster availability. To specify
>> +the migration network, navigate to
>> +__Datacenter -> Options -> Migration Settings__, or set it via CLI in
>> +the xref:datacenter_configuration_file[`datacenter.cfg`].
>> +
>> +[[pvesr_cluster_size]]
>> +Cluster Size
>> +~~~~~~~~~~~~
>> +
>> +With a 2-node cluster in particular, the failure of one node can leave
>> +the other node without a xref:pvecm_quorum[quorum]. In order to keep
>> +the cluster functional at all times, it is therefore crucial to
>> +xref:pvecm_join_node_to_cluster[expand] to a 3-node cluster in advance
>> +or to configure a xref:pvecm_external_vote[QDevice] for the third
>> +vote.
>> +
>> +[[pvesr_managing_jobs]]
>> +Managing Jobs
>> +-------------
>>   
>> -If a replication job encounters problems, it is placed in an error state.
>> -In this state, the configured replication intervals get suspended
>> -temporarily. The failed replication is repeatedly tried again in a
>> -30 minute interval.
>> -Once this succeeds, the original schedule gets activated again.
>> +[thumbnail="screenshot/gui-qemu-add-replication-job.png"]
>>   
>> -Possible issues
>> -~~~~~~~~~~~~~~~
>> +Replication jobs can easily be created, modified and removed via web
>> +interface, or by using the CLI tool `pvesr`.
>>   
>> -Some of the most common issues are in the following list. Depending on your
>> -setup there may be another cause.
>> +To manage all replication jobs in one place, go to
>> +__Datacenter -> Replication__. Additional functionalities are
>> +available under __Node -> Replication__ and __Guest -> Replication__.
>> +Go there to view logs, schedule a job once for now, or benefit from
>> +preset fields when configuring a job.
>>   
>> -* Network is not working.
>> +Enabled replication jobs will automatically run at their set interval,
>> +one after the other. The default interval is at every quarter of an
>> +hour (`*/15`), and can be set to as often as every minute (`*/1`), see
>
> use "every 15 minutes" here as the placeholder text is in the "Create: 
> Replication Job" modal in the WebGUI.
Implemented in new version.

>
>> +xref:chapter_calendar_events[schedule format].
>>   
>> -* No free space left on the replication target storage.
>> +Optionally, the network bandwidth can be limited, which also helps to
>> +keep the storage load on the target node acceptable.
>
> This sentence / paragraph could be restructured to prioritize why 
> someone would like to limit the bandwidth. Something like:
>
> "If the storage replication jobs result in significant I/O load on the 
> target node, the network bandwidth of individual jobs can be limited to 
> reduce it to an acceptable level."
Implemented in new version.

>
>>   
>> -* Storage with the same storage ID is not available on the target node.
>> +Shortly after job creation, a first snapshot is taken and sent to the
>> +target node. Subsequent snapshots are taken at the set interval and
>> +only contain modified volume data, allowing a significantly shorter
>> +transfer time.
>>   
>> -NOTE: You can always use the replication log to find out what is causing the problem.
>> +If you remove a replication job, the snapshots on the target node are
>> +also getting deleted again by default. The removal takes place at the
>> +next possible point in time and requires the job to be enabled. If the
>> +target node is permanently unreachable, the cleanup can be skipped by
>> +forcing a job deletion via CLI.
>>   
>
> [ ... ]
>
> I have only glossed over whether all information from the previous 
> article has been preserved and haven't found anything missing so far. 
> All being said, this looks great, consider this:
>
> Reviewed-by: Daniel Kral <d.kral@proxmox.com>


