From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow
Date: Fri, 05 Jul 2024 11:41:07 +0200 [thread overview]
Message-ID: <172017246796.125161.25996190556527554@yuna.proxmox.com> (raw)
In-Reply-To: <20240610125942.116985-1-f.ebner@proxmox.com>
Quoting Fiona Ebner (2024-06-10 14:59:35)
> A long-standing issue with VM backups in Proxmox VE is that a slow or
> unreachable target would lead to a copy-before-write (cbw) operation
> to break the guest write rather than abort the backup. This is
> unexpected to users and the will end up without a successful backup
> and without a working guest in such cases. This series prevents the
> latter by changing the behavior to fail the backup instead of the
> guest write.
>
> This is done by re-using the already existing 'on-cbw-error' and
> 'cbw-timeout' options that are already used for fleecing and having
> regular backup also check for the cbw's snapshot_error (unfortunately
> this becomes a bit of a misnomer). If a given copy-before-write
> operation cannot complete within 45 seconds, it's extremely likely
> that aborting the backup is the better choice than keeping the guest
> IO blocked.
>
> Just checking for the error already makes it work (i.e. without the
> last two patches), but backup can only check the error at the end. To
> abort backup immediately, an error callback for the copy-before-write
> node is introduced. A potential alternative would be give the
> block-copy operation a pointer to the snapshot_error and have it check
> it during its operation, but my initial attempt failed. Likely I
> missed adapting certain logic that checks for whether the block-copy
> operation failed and it's questionable if this approach would be
> cleaner. An error callback is nice and explicit.
>
> Note for testers: if e.g. the PBS is compeletly unreachable, the
> backup job still will need to wait until the in-flight request is
> aborted after 15 minutes. But the guest writes should be fast again.
>
> Should it really be required to make the option more flexible, i.e.
> allow users to specify a custom timeout or go back to the old behavior
> then the 'backup' QMP call can be extended with those parameters.
>
> Unfortunately, this is a non-trivial amount of code to make it work,
> but there is quite a bit of boilerplate and some comments, so
> hopefully the logic is straight-forward enough.
this sentence here made me except a lot worse ;) code seems very
straight-forward and clean, two small comments inline. not sure whether we want
to entangle this with 9.0, but I think this could be applied soonish after some
more in-depth testing, since it should solve a pretty big pain point user
consistenly run into..
I am sure we will have users clamoring for a configurable timeout soon after
though ;)
>
> The first patch can be applied regardless of whether we want to go
> with this or not.
>
>
> Fiona Ebner (7):
> PVE backup: fleecing: properly set minimum cluster size
> block/copy-before-write: allow passing additional options for
> bdrv_cbw_append()
> block/backup: allow passing additional options for copy-before-write
> upon job creation
> block/backup: make cbw error also fail backup that does not use
> fleecing
> fix #3231+#3631: PVE backup: add timeout for copy-before-write
> operations and fail backup instead of guest writes
> block/copy-before-write: allow specifying error callback
> block/backup: set callback for cbw errors
>
> block/backup.c | 57 +++++++++++++++++++++++++-
> block/copy-before-write.c | 41 +++++++++++++++---
> block/copy-before-write.h | 9 +++-
> block/replication.c | 2 +-
> blockdev.c | 2 +-
> include/block/block_int-global-state.h | 2 +
> pve-backup.c | 13 +++++-
> 7 files changed, 115 insertions(+), 11 deletions(-)
>
> --
> 2.39.2
>
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
prev parent reply other threads:[~2024-07-05 9:41 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-10 12:59 Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 2/7] block/copy-before-write: allow passing additional options for bdrv_cbw_append() Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation Fiona Ebner
2024-07-05 9:30 ` Fabian Grünbichler
2024-06-10 12:59 ` [pve-devel] [RFC qemu 4/7] block/backup: make cbw error also fail backup that does not use fleecing Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 6/7] block/copy-before-write: allow specifying error callback Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors Fiona Ebner
2024-07-05 9:37 ` Fabian Grünbichler
2024-07-05 9:41 ` Fabian Grünbichler [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=172017246796.125161.25996190556527554@yuna.proxmox.com \
--to=f.gruenbichler@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.