From: "Alexander Zeidler" <a.zeidler@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>,
	"Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH docs v2 1/2] pvesr: update the chapter and bring it into good condition
Date: Fri, 10 Jan 2025 18:00:29 +0100	[thread overview]
Message-ID: <D6YJYBDHQ0ZO.IGBZBNF0MXUX@proxmox.com> (raw)
In-Reply-To: <f43f93a4-299f-4101-916c-66ae299fcfb7@proxmox.com>

On Wed Jan 8, 2025 at 11:34 AM CET, Daniel Kral wrote:
> Thanks a lot for taking the time to rewrite this! Your changed text 
> reads great and it feels like the right information is pointed to at the 
> right time. I've added a few notes inline below.
Thank you for the review! Most suggestions are implemented in the new
version; others were discussed off-list, see the comments below.
v3: https://lore.proxmox.com/pve-devel/20250110165807.3-1-a.zeidler@proxmox.com

>
> On 12/18/24 17:19, Alexander Zeidler wrote:
>> * restructure and revise the introduction
>> * add subchapter "Recommendations"
>> * remove the subchapter "Schedule Format" with its one line of content
>>    and link where appropriate directly to the copy under "25. Appendix D:
>>    Calendar Events". The help button at adding/editing a job links now to
>>    the subchapter "Managing Jobs".
>> * provide details on job removal and how to enforce it if necessary
>> * add more helpful CLI examples and improve existing ones
>> * restructure and revise the subchapter "Error Handling"
>
> Since these changes seem all pretty independent from each other, it 
> would be great for the git history to have each of them in separate 
> commits with the reasoning of the change in the body.
>
> This also makes it easier to follow where old sections have been moved 
> to or have been removed (with a reason why it was removed). It would 
> also clarify why some hyperlink references (e.g. `pvecm_quorum`) were 
> created in this patch series.
Partly implemented in the new version. While the bullet points appear
fairly independent of each other, splitting them into separate commits
would not have been worth the effort this time, given how extensive the
chapter revision is.

I updated the commit message to mention that basically no information
is removed by this commit, apart from the superfluous "Schedule Format"
subchapter, as already explained there.

Anchors like `pvecm_quorum` have been created because they are either
already in use or will very likely be used in the future, for example
in enterprise support, the forum and similar places.

>
>> 
>> Signed-off-by: Alexander Zeidler <a.zeidler@proxmox.com>
>> ---
>> v2:
>> * no changes, only add missing pve-manager patch
>> 
>>   pvecm.adoc |   2 +
>>   pvesr.adoc | 402 ++++++++++++++++++++++++++++++++++++-----------------
>>   2 files changed, 279 insertions(+), 125 deletions(-)
>> 
>> diff --git a/pvecm.adoc b/pvecm.adoc
>> index 15dda4e..4028e92 100644
>> --- a/pvecm.adoc
>> +++ b/pvecm.adoc
>> @@ -486,6 +486,7 @@ authentication. You should fix this by removing the respective keys from the
>>   '/etc/pve/priv/authorized_keys' file.
>>   
>>   
>> +[[pvecm_quorum]]
>>   Quorum
>>   ------
>>   
>> @@ -963,6 +964,7 @@ case $- in
>>   esac
>>   ----
>>   
>> +[[pvecm_external_vote]]
>>   Corosync External Vote Support
>>   ------------------------------
>>   
>> diff --git a/pvesr.adoc b/pvesr.adoc
>> index 9ad02f5..de29240 100644
>> --- a/pvesr.adoc
>> +++ b/pvesr.adoc
>> @@ -24,48 +24,65 @@ Storage Replication
>>   :pve-toplevel:
>>   endif::manvolnum[]
>>   
>> -The `pvesr` command-line tool manages the {PVE} storage replication
>> -framework. Storage replication brings redundancy for guests using
>> -local storage and reduces migration time.
>> -
>> -It replicates guest volumes to another node so that all data is available
>> -without using shared storage. Replication uses snapshots to minimize traffic
>> -sent over the network. Therefore, new data is sent only incrementally after
>> -the initial full sync. In the case of a node failure, your guest data is
>> -still available on the replicated node.
>> -
>> -The replication is done automatically in configurable intervals.
>> -The minimum replication interval is one minute, and the maximal interval
>> -once a week. The format used to specify those intervals is a subset of
>> -`systemd` calendar events, see
>> -xref:pvesr_schedule_time_format[Schedule Format] section:
>> -
>> -It is possible to replicate a guest to multiple target nodes,
>> -but not twice to the same target node.
>> -
>> -Each replications bandwidth can be limited, to avoid overloading a storage
>> -or server.
>> -
>> -Only changes since the last replication (so-called `deltas`) need to be
>> -transferred if the guest is migrated to a node to which it already is
>> -replicated. This reduces the time needed significantly. The replication
>> -direction automatically switches if you migrate a guest to the replication
>> -target node.
>> -
>> -For example: VM100 is currently on `nodeA` and gets replicated to `nodeB`.
>> -You migrate it to `nodeB`, so now it gets automatically replicated back from
>> -`nodeB` to `nodeA`.
>> -
>> -If you migrate to a node where the guest is not replicated, the whole disk
>> -data must send over. After the migration, the replication job continues to
>> -replicate this guest to the configured nodes.
>> +Storage replication is particularly interesting for small clusters if
>> +guest volumes are placed on a local storage instead of a shared one.
>> +By replicating the volumes to other cluster nodes, guest migration to
>> +those nodes will become significantly faster.
>
> Hm, I like the new paragraph as it's very clear to the reader if the 
> following is relevant to their needs right from the start. Still, if the 
> reader has no clue about what storage replication could be about, I 
> think the first sentence should still be a "Storage replication is 
> [short summary description]. It is particularly interesting for...".
Implemented in new version.

>
>> +
>> +In the event of a node or local storage failure, the volume data as of
>> +the latest completed replication runs are still available on the
>> +replication target nodes.
>>   
>>   [IMPORTANT]
>>   ====
>> -High-Availability is allowed in combination with storage replication, but there
>> -may be some data loss between the last synced time and the time a node failed.
>> +While a replication-enabled guest can be configured for
>> +xref:chapter_ha_manager[high availability], or
>> +xref:pvesr_node_failed[manually moved] while its origin node is not
>> +available, read about the involved
>> +xref:pvesr_risk_of_data_loss[risk of data loss] and how to avoid it.
>>   ====
>
> nit: if possible, make this sentence a little shorter as the new text
>       wraps just for the word "it.". Just a nit of course ;).
Not implemented in the new version. The displayed line length depends
on various factors, so IMO it is not really practical to shorten a
sentence just for that. For example, one of my browsers shows no third
line at all, while another shows a third line with "how to avoid it.".

>
>>   
>> +.Replication requires …
>> +
>> +* at least one other cluster node as a replication target
>> +* one common local storage entry in the datacenter, being functional
>> +on both nodes
>> +* that the local storage type is
>> +xref:pvesr_supported_storage[supported by replication]
>> +* that guest volumes are stored on that local storage
>> +
>> +.Replication …
>> +
>> +* allows a fast migration to nodes where the guest is being replicated
>> +* provides guest volume redundancy in a cluster where using a shared
>> +storage type is not an option
>> +* is configured as a job for a guest, with multiple jobs enabling
>> +multiple replication targets
>> +* jobs run one after the other at their configured interval (shortest
>> +is every minute)
>> +* uses snapshots to regularly transmit only changed volume data
>> +(so-called deltas)
>> +* network bandwidth can be limited per job, smoothing the storage and
>> +network utilization
>> +* targets stay basically the same when migrating the guest to another
>> +node
>> +* direction of a job reverses when moving the guest to its configured
>> +replication target
>> +
>> +.Example:
>> +
>> +A guest runs on node `A` and has replication jobs to node `B` and `C`,
>> +both with a set interval of every five minutes (`*/5`). Now we migrate
>> +the guest from `A` to `B`, which also automatically updates the
>> +replication targets for this guest to be `A` and `C`. Migration was
>> +completed fast, as only the changed volume data since the last
>> +replication run has been transmitted.
>> +
>> +In the event that node `B` or its local storage fails, the guest can
>> +be restarted on `A` or `C`, with the risk of some data loss as
>> +described in this chapter.
>> +
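
As a side note, not part of this patch: assuming the guest in the
example has VMID 100, the two jobs could be created via CLI roughly
like this:

    # create replication jobs for guest 100 to nodes B and C,
    # both running every five minutes
    pvesr create-local-job 100-0 B --schedule "*/5"
    pvesr create-local-job 100-1 C --schedule "*/5"
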
>> +[[pvesr_supported_storage]]
>>   Supported Storage Types
>>   -----------------------
>>   
>> @@ -76,147 +93,282 @@ Supported Storage Types
>>   |ZFS (local)    |zfspool     |yes      |yes
>>   |=============================================
>>   
>> -[[pvesr_schedule_time_format]]
>> -Schedule Format
>> +[[pvesr_recommendations]]
>> +Recommendations
>>   ---------------
>
> Hm... Would "Considerations" be a better title for this section? All of 
> the subsections are more focused about warning the reader of things that 
> could happen or what to consider if they should implement storage 
> replication in their setup, so the text makes it visible to the reader 
> what they should think about and optionally we recommend them something 
> they should do.
Implemented in new version.

>
>> -Replication uses xref:chapter_calendar_events[calendar events] for
>> -configuring the schedule.
>>   
>> -Error Handling
>> ---------------
>> +[[pvesr_risk_of_data_loss]]
>> +Risk of Data Loss
>> +~~~~~~~~~~~~~~~~~
>> +
>> +If a node should suddenly become unavailable for a longer period of
>> +time, it may become necessary to run a guest on a replication target
>> +node instead. Thereby the guest will use the latest replicated volume
>> +data available on the chosen target node. That volume state will then
>> +also be replicated to other nodes with the next replication runs,
>> +since the replication directions are automatically updated for related
>> +jobs. This also means, that the once newer volume state on the failed
>> +node will be removed after it becomes available again.
>> +
>> +A more resilient solution may be to use a shared
>> +xref:chapter_storage[storage type] instead. If that is not an option,
>> +consider setting the replication job intervals short enough and avoid
>> +moving replication-configured guests while their origin node is not
>> +available. Instead of configuring those guests for high availability,
>> +xref:qm_startup_and_shutdown[start at boot] could be a sufficient
>> +alternative.
>> +
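
Purely as an illustration, not part of the patch: enabling start at
boot for a VM via CLI, with VMID 100 being an assumption:

    # start VM 100 automatically when the node boots
    qm set 100 --onboot 1
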
>> +[[pvesr_replication_network]]
>> +Network for Replication Traffic
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Replication traffic is routed via the
>> +xref:pvecm_migration_network[migration network]. If it is not set, the
>> +management network is used by default, which can have a negative
>> +impact on corosync and therefore on cluster availability. To specify
>> +the migration network, navigate to
>> +__Datacenter -> Options -> Migration Settings__, or set it via CLI in
>> +the xref:datacenter_configuration_file[`datacenter.cfg`].
>> +
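
For illustration only, not part of the patch: the corresponding setting
could look like the following, with the network being a placeholder:

    # relevant line in /etc/pve/datacenter.cfg (network is an example value)
    migration: secure,network=10.1.2.0/24
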
>> +[[pvesr_cluster_size]]
>> +Cluster Size
>> +~~~~~~~~~~~~
>> +
>> +With a 2-node cluster in particular, the failure of one node can leave
>> +the other node without a xref:pvecm_quorum[quorum]. In order to keep
>> +the cluster functional at all times, it is therefore crucial to
>> +xref:pvecm_join_node_to_cluster[expand] to a 3-node cluster in advance
>> +or to configure a xref:pvecm_external_vote[QDevice] for the third
>> +vote.
>> +
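
Again just for reference, not part of the patch: adding the third vote
with a QDevice could roughly look like this, assuming an external
corosync-qnetd server is already reachable at 192.0.2.10:

    # add the external QDevice vote to the cluster
    pvecm qdevice setup 192.0.2.10
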
>> +[[pvesr_managing_jobs]]
>> +Managing Jobs
>> +-------------
>>   
>> -If a replication job encounters problems, it is placed in an error state.
>> -In this state, the configured replication intervals get suspended
>> -temporarily. The failed replication is repeatedly tried again in a
>> -30 minute interval.
>> -Once this succeeds, the original schedule gets activated again.
>> +[thumbnail="screenshot/gui-qemu-add-replication-job.png"]
>>   
>> -Possible issues
>> -~~~~~~~~~~~~~~~
>> +Replication jobs can easily be created, modified and removed via web
>> +interface, or by using the CLI tool `pvesr`.
>>   
>> -Some of the most common issues are in the following list. Depending on your
>> -setup there may be another cause.
>> +To manage all replication jobs in one place, go to
>> +__Datacenter -> Replication__. Additional functionalities are
>> +available under __Node -> Replication__ and __Guest -> Replication__.
>> +Go there to view logs, schedule a job once for now, or benefit from
>> +preset fields when configuring a job.
>>   
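
Not part of this patch, just for illustration: rough CLI counterparts
for getting an overview and for running a job once, with the job ID
100-0 being an assumption:

    # show all configured replication jobs
    pvesr list
    # show the replication status of guests on the current node
    pvesr status
    # schedule job 100-0 to run once as soon as possible
    pvesr schedule-now 100-0
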
>> -* Network is not working.
>> +Enabled replication jobs will automatically run at their set interval,
>> +one after the other. The default interval is at every quarter of an
>> +hour (`*/15`), and can be set to as often as every minute (`*/1`), see
>
> use "every 15 minutes" here as the placeholder text is in the "Create: 
> Replication Job" modal in the WebGUI.
Implemented in new version.

>
>> +xref:chapter_calendar_events[schedule format].
>>   
>> -* No free space left on the replication target storage.
>> +Optionally, the network bandwidth can be limited, which also helps to
>> +keep the storage load on the target node acceptable.
>
> This sentence / paragraph could be restructured to prioritize why 
> someone would like to limit the bandwidth. Something like:
>
> "If the storage replication jobs result in significant I/O load on the 
> target node, the network bandwidth of individual jobs can be limited to 
> reduce it to an acceptable level."
Implemented in new version.
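
For illustration only, not something the patch adds: limiting the
bandwidth of an existing job via CLI could look roughly like this, with
job ID and rate being assumptions:

    # limit replication job 100-0 to 10 MB/s
    pvesr update 100-0 --rate 10
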

>
>>   
>> -* Storage with the same storage ID is not available on the target node.
>> +Shortly after job creation, a first snapshot is taken and sent to the
>> +target node. Subsequent snapshots are taken at the set interval and
>> +only contain modified volume data, allowing a significantly shorter
>> +transfer time.
>>   
>> -NOTE: You can always use the replication log to find out what is causing the problem.
>> +If you remove a replication job, the snapshots on the target node are
>> +also getting deleted again by default. The removal takes place at the
>> +next possible point in time and requires the job to be enabled. If the
>> +target node is permanently unreachable, the cleanup can be skipped by
>> +forcing a job deletion via CLI.
>>   
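
Again only a sketch, not part of the patch, with the job ID being an
assumption:

    # remove job 100-0 without cleaning up its snapshots on the
    # (unreachable) target node
    pvesr delete 100-0 --force
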
>
> [ ... ]
>
> I have only glossed over whether all information from the previous 
> article has been preserved and haven't found anything missing so far. 
> All being said, this looks great, consider this:
>
> Reviewed-by: Daniel Kral <d.kral@proxmox.com>


