public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow
Date: Fri, 05 Jul 2024 11:41:07 +0200	[thread overview]
Message-ID: <172017246796.125161.25996190556527554@yuna.proxmox.com> (raw)
In-Reply-To: <20240610125942.116985-1-f.ebner@proxmox.com>

Quoting Fiona Ebner (2024-06-10 14:59:35)
> A long-standing issue with VM backups in Proxmox VE is that a slow or
> unreachable target would lead to a copy-before-write (cbw) operation
> to break the guest write rather than abort the backup. This is
> unexpected to users and the will end up without a successful backup
> and without a working guest in such cases. This series prevents the
> latter by changing the behavior to fail the backup instead of the
> guest write.
> 
> This is done by re-using the already existing 'on-cbw-error' and
> 'cbw-timeout' options that are already used for fleecing and having
> regular backup also check for the cbw's snapshot_error (unfortunately
> this becomes a bit of a misnomer). If a given copy-before-write
> operation cannot complete within 45 seconds, it's extremely likely
> that aborting the backup is the better choice than keeping the guest
> IO blocked.
> 
> Just checking for the error already makes it work (i.e. without the
> last two patches), but backup can only check the error at the end. To
> abort backup immediately, an error callback for the copy-before-write
> node is introduced. A potential alternative would be give the
> block-copy operation a pointer to the snapshot_error and have it check
> it during its operation, but my initial attempt failed. Likely I
> missed adapting certain logic that checks for whether the block-copy
> operation failed and it's questionable if this approach would be
> cleaner. An error callback is nice and explicit.
> 
> Note for testers: if e.g. the PBS is compeletly unreachable, the
> backup job still will need to wait until the in-flight request is
> aborted after 15 minutes. But the guest writes should be fast again.
> 
> Should it really be required to make the option more flexible, i.e.
> allow users to specify a custom timeout or go back to the old behavior
> then the 'backup' QMP call can be extended with those parameters.
> 
> Unfortunately, this is a non-trivial amount of code to make it work,
> but there is quite a bit of boilerplate and some comments, so
> hopefully the logic is straight-forward enough.

this sentence here made me except a lot worse ;) code seems very
straight-forward and clean, two small comments inline. not sure whether we want
to entangle this with 9.0, but I think this could be applied soonish after some
more in-depth testing, since it should solve a pretty big pain point user
consistenly run into..

I am sure we will have users clamoring for a configurable timeout soon after
though ;)

> 
> The first patch can be applied regardless of whether we want to go
> with this or not.
> 
> 
> Fiona Ebner (7):
>   PVE backup: fleecing: properly set minimum cluster size
>   block/copy-before-write: allow passing additional options for
>     bdrv_cbw_append()
>   block/backup: allow passing additional options for copy-before-write
>     upon job creation
>   block/backup: make cbw error also fail backup that does not use
>     fleecing
>   fix #3231+#3631: PVE backup: add timeout for copy-before-write
>     operations and fail backup instead of guest writes
>   block/copy-before-write: allow specifying error callback
>   block/backup: set callback for cbw errors
> 
>  block/backup.c                         | 57 +++++++++++++++++++++++++-
>  block/copy-before-write.c              | 41 +++++++++++++++---
>  block/copy-before-write.h              |  9 +++-
>  block/replication.c                    |  2 +-
>  blockdev.c                             |  2 +-
>  include/block/block_int-global-state.h |  2 +
>  pve-backup.c                           | 13 +++++-
>  7 files changed, 115 insertions(+), 11 deletions(-)
> 
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
>


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


      parent reply	other threads:[~2024-07-05  9:41 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-10 12:59 Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 2/7] block/copy-before-write: allow passing additional options for bdrv_cbw_append() Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation Fiona Ebner
2024-07-05  9:30   ` Fabian Grünbichler
2024-06-10 12:59 ` [pve-devel] [RFC qemu 4/7] block/backup: make cbw error also fail backup that does not use fleecing Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 6/7] block/copy-before-write: allow specifying error callback Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors Fiona Ebner
2024-07-05  9:37   ` Fabian Grünbichler
2024-07-05  9:41 ` Fabian Grünbichler [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=172017246796.125161.25996190556527554@yuna.proxmox.com \
    --to=f.gruenbichler@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal