public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH manager/storage] fix #4289: wait for backup verification to finish before updating volume attribute
@ 2023-01-02 12:36 Christoph Heiss
  2023-01-02 12:36 ` [pve-devel] [PATCH manager] vzdump: pass logfunc down into storage plugin when " Christoph Heiss
  2023-01-02 12:36 ` [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before " Christoph Heiss
  0 siblings, 2 replies; 8+ messages in thread
From: Christoph Heiss @ 2023-01-02 12:36 UTC (permalink / raw)
  To: pve-devel

When creating a backup to a PBS datastore which has the 'Verify New'
flag set, the backup will fail with an "unable to set protected flag"
error. This is due to the volume being immediately locked by the PBS
server for verifying, before PVE has a chance to set the 'protected'
flag.

Fix this by waiting on the verification job to finish before attempting
to set the volume flag.

[ This is really more of an RFC on whether it can even be done in this 'naive'
way. It feels a bit hacky, esp. the matching of the `worker_id` value, since
this can be an arbitrary string according to the documentation. Maybe there is
a better way to check for that? Also, should this use some timeout just to be
safe? ]

Christoph Heiss (1):
      vzdump: pass logfunc down into storage plugin when updating volume attribute

 PVE/VZDump.pm | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Christoph Heiss (1):
      fix #4289: pbs: wait for backup verification to finish before updating volume attribute

 PVE/Storage.pm           |  4 ++--
 PVE/Storage/PBSPlugin.pm | 27 ++++++++++++++++++++++++++-
 2 files changed, 28 insertions(+), 3 deletions(-)
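
On the timeout question raised in the bracketed note above: one possible
shape is a bounded polling loop. This is only a minimal sketch, not part of
the posted patches. The helper name `wait_with_timeout` and its parameters are
hypothetical, and the predicate stands in for the PBS task query.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): poll $is_running until it returns
# false or until $timeout seconds have elapsed.
# Returns 1 if the task finished, 0 if the timeout was hit first.
sub wait_with_timeout {
    my ($is_running, $timeout, $interval) = @_;
    $interval //= 1;

    my $deadline = time() + $timeout;
    while ($is_running->()) {
        return 0 if time() >= $deadline;
        sleep($interval) if $interval > 0;
    }
    return 1;
}

# Usage with a stand-in predicate that reports "still running" twice.
my $polls = 0;
my $finished = wait_with_timeout(sub { return ++$polls < 3 }, 10, 0);
print $finished ? "verification finished\n" : "timed out\n";
```

Whether a timeout should abort the backup task or merely emit a warning is a
policy decision the sketch leaves open.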





* [pve-devel] [PATCH manager] vzdump: pass logfunc down into storage plugin when updating volume attribute
  2023-01-02 12:36 [pve-devel] [PATCH manager/storage] fix #4289: wait for backup verification to finish before updating volume attribute Christoph Heiss
@ 2023-01-02 12:36 ` Christoph Heiss
  2023-01-02 12:36 ` [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before " Christoph Heiss
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Heiss @ 2023-01-02 12:36 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 PVE/VZDump.pm | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index a04837e7..6086e80a 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -1079,6 +1079,8 @@ sub exec_backup_task {
 	    debugmsg ('info', "archive file size: $cs", $logfd);
 	}

+	my $logfunc = sub { debugmsg($_[0], $_[1], $logfd) };
+
 	# Mark as protected before pruning.
 	if (my $storeid = $opts->{storage}) {
 	    my $volname = $opts->{pbs} ? $task->{target} : basename($task->{target});
@@ -1090,14 +1092,14 @@ sub exec_backup_task {
 		if (my $err = $@) {
 		    debugmsg('warn', "unable to add notes - $err", $logfd);
 		} else {
-		    eval { PVE::Storage::update_volume_attribute($cfg, $volid, 'notes', $notes) };
+		    eval { PVE::Storage::update_volume_attribute($cfg, $volid, 'notes', $notes, $logfunc) };
 		    debugmsg('warn', "unable to add notes - $@", $logfd) if $@;
 		}
 	    }

 	    if ($opts->{protected}) {
 		debugmsg('info', "marking backup as protected", $logfd);
-		eval { PVE::Storage::update_volume_attribute($cfg, $volid, 'protected', 1) };
+		eval { PVE::Storage::update_volume_attribute($cfg, $volid, 'protected', 1, $logfunc) };
 		die "unable to set protected flag - $@\n" if $@;
 	    }
 	}
@@ -1126,7 +1128,7 @@ sub exec_backup_task {
 		    $vmid,
 		    $vmtype,
 		    0,
-		    sub { debugmsg($_[0], $_[1], $logfd) },
+		    $logfunc,
 		);
 		$pruned = scalar(grep { $_->{mark} eq 'remove' } $pruned_list->@*);
 	    }
--
2.30.2






* [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute
  2023-01-02 12:36 [pve-devel] [PATCH manager/storage] fix #4289: wait for backup verification to finish before updating volume attribute Christoph Heiss
  2023-01-02 12:36 ` [pve-devel] [PATCH manager] vzdump: pass logfunc down into storage plugin when " Christoph Heiss
@ 2023-01-02 12:36 ` Christoph Heiss
  2023-01-04 10:50   ` Fiona Ebner
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Heiss @ 2023-01-02 12:36 UTC (permalink / raw)
  To: pve-devel

This fixes an "unable to set protected flag" error when backing up to a
PBS datastore that has the 'Verify New' flag set.

This happens due to the volume being locked for verifying after the
backup completes, but before the request to update the 'protected'
volume flag is made.

Thus delay the volume flag update until the verify job is finished.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 PVE/Storage.pm           |  4 ++--
 PVE/Storage/PBSPlugin.pm | 27 ++++++++++++++++++++++++++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index 89c7116..47cb266 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -248,7 +248,7 @@ sub get_volume_attribute {
 }

 sub update_volume_attribute {
-    my ($cfg, $volid, $attribute, $value) = @_;
+    my ($cfg, $volid, $attribute, $value, $logfunc) = @_;

     my ($storeid, $volname) = parse_volume_id($volid);
     my $scfg = storage_config($cfg, $storeid);
@@ -278,7 +278,7 @@ sub update_volume_attribute {
 	}
     }

-    return $plugin->update_volume_attribute($scfg, $storeid, $volname, $attribute, $value);
+    return $plugin->update_volume_attribute($scfg, $storeid, $volname, $attribute, $value, $logfunc);
 }

 sub volume_size_info {
diff --git a/PVE/Storage/PBSPlugin.pm b/PVE/Storage/PBSPlugin.pm
index 4320974..1cdbc11 100644
--- a/PVE/Storage/PBSPlugin.pm
+++ b/PVE/Storage/PBSPlugin.pm
@@ -906,8 +906,30 @@ sub get_volume_attribute {
     return;
 }

+sub wait_for_verify_finish {
+    my ($conn, $node, $datastore, $attrs) = @_;
+
+    my $param = {
+	running => 'true',
+	since => $attrs->{'backup-time'},
+	store => $datastore,
+	typefilter => 'verify',
+    };
+
+    my $taskname = sprintf('%s:%s/%s/%X',
+	$datastore,
+        @{$attrs}{qw(backup-type backup-id backup-time)},
+    );
+
+    while (1) {
+	my $res = eval { $conn->get("/api2/json/nodes/$node/tasks", $param); };
+	last if !grep { $_->{worker_id} eq $taskname } @$res;
+	sleep(1);
+    }
+}
+
 sub update_volume_attribute {
-    my ($class, $scfg, $storeid, $volname, $attribute, $value) = @_;
+    my ($class, $scfg, $storeid, $volname, $attribute, $value, $logfunc) = @_;

     if ($attribute eq 'notes') {
 	return $class->update_volume_notes($scfg, $storeid, $volname, $value);
@@ -921,6 +943,9 @@ sub update_volume_attribute {
 	my $conn = pbs_api_connect($scfg, $password);
 	my $datastore = $scfg->{datastore};

+	$logfunc->('info', 'waiting for server to finish backup verification...') if $logfunc;
+	wait_for_verify_finish($conn, $scfg->{server}, $datastore, $param);
+
 	eval { $conn->put("/api2/json/admin/datastore/$datastore/$attribute", $param); };
 	if (my $err = $@) {
 	    die "Server is not recent enough to support feature '$attribute'\n"
--
2.30.2






* Re: [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute
  2023-01-02 12:36 ` [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before " Christoph Heiss
@ 2023-01-04 10:50   ` Fiona Ebner
  2023-01-10 11:11     ` Christoph Heiss
  0 siblings, 1 reply; 8+ messages in thread
From: Fiona Ebner @ 2023-01-04 10:50 UTC (permalink / raw)
  To: pve-devel, c.heiss

Am 02.01.23 um 13:36 schrieb Christoph Heiss:
> diff --git a/PVE/Storage/PBSPlugin.pm b/PVE/Storage/PBSPlugin.pm
> index 4320974..1cdbc11 100644
> --- a/PVE/Storage/PBSPlugin.pm
> +++ b/PVE/Storage/PBSPlugin.pm
> @@ -906,8 +906,30 @@ sub get_volume_attribute {
>      return;
>  }
> 
> +sub wait_for_verify_finish {
> +    my ($conn, $node, $datastore, $attrs) = @_;
> +
> +    my $param = {
> +	running => 'true',
> +	since => $attrs->{'backup-time'},
> +	store => $datastore,
> +	typefilter => 'verify',
> +    };
> +
> +    my $taskname = sprintf('%s:%s/%s/%X',
> +	$datastore,
> +        @{$attrs}{qw(backup-type backup-id backup-time)},
> +    );

I don't think it's likely that the task name format here will change
often, but as you already mentioned in the cover letter, it's not ideal
to have it hard-coded here.

> +
> +    while (1) {
> +	my $res = eval { $conn->get("/api2/json/nodes/$node/tasks", $param); };
> +	last if !grep { $_->{worker_id} eq $taskname } @$res;
> +	sleep(1);
> +    }
> +}
> +
> @@ -921,6 +943,9 @@ sub update_volume_attribute {
>  	my $conn = pbs_api_connect($scfg, $password);
>  	my $datastore = $scfg->{datastore};
> 
> +	$logfunc->('info', 'waiting for server to finish backup verification...') if $logfunc;

Should only be printed if there is actually a verification we need to
wait for.

> +	wait_for_verify_finish($conn, $scfg->{server}, $datastore, $param);

To me, it feels out of place to be concerned with waiting on
verification in (the rather low-level) update_volume_attribute(), which
is a rather specific thing to do. I'd say it's fine to fail there when
the snapshot is locked by verification or some other operation.

Waiting for verification also can increase the backup duration/time
holding the vzdump lock on the PVE side quite a bit. It might not seem
that big of a deal, because usually only manual backups use 'protected'.
But by doing it in update_volume_attribute(), you also do it for
'notes', where it's not needed and which is relevant to backup jobs
where the increased wait might be very noticeable. So at least, it
should only be done for 'protected' if doing it in
update_volume_attribute().

It would be better if the protected flag could be specified upon
creation already. Would also fix the following race I guess:
1. backup finishes
2. prune running on PBS
3. protected status set from PVE

If going for the waiting approach after all, I think it should rather be
done in vzdump, before calling update_volume_attribute(). And the helper
to wait on verification should likely be part of PBSClient.pm (would
need to teach it to use an API connection first).
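
The review points above (wait only for 'protected', log only when a
verification is actually running) could be sketched roughly as follows. This
is an illustration under assumptions, not the reworked patch: the helper names
`verify_task_name`, `verify_is_running` and `maybe_wait_for_verify` are
hypothetical, the task list is passed in through a callback instead of a real
PBS API connection, and the task name format is copied from the patch, where
it is hard-coded. The sketch also treats a failed API request (an undefined
result, as the patch's `eval` would yield) as "not running".

```perl
use strict;
use warnings;

# Task name format as used in the patch under review (hard-coded there too).
sub verify_task_name {
    my ($datastore, $attrs) = @_;
    return sprintf('%s:%s/%s/%X',
        $datastore, @{$attrs}{qw(backup-type backup-id backup-time)});
}

# $tasks is the decoded task list, or undef if the API request failed.
sub verify_is_running {
    my ($tasks, $taskname) = @_;
    return 0 if ref($tasks) ne 'ARRAY';
    return scalar(grep { ($_->{worker_id} // '') eq $taskname } @$tasks) ? 1 : 0;
}

# Hypothetical wrapper: waits only for 'protected' and logs once, and only if
# a matching verification task is actually running. Returns the poll count.
sub maybe_wait_for_verify {
    my ($attribute, $fetch_tasks, $taskname, $logfunc, $interval) = @_;
    return 0 if $attribute ne 'protected';
    $interval //= 1;

    my $waited = 0;
    while (verify_is_running($fetch_tasks->(), $taskname)) {
        $logfunc->('info', 'waiting for server to finish backup verification...')
            if $logfunc && !$waited;
        $waited++;
        sleep($interval) if $interval > 0;
    }
    return $waited;
}

# Usage with canned responses standing in for the tasks API.
my $attrs = { 'backup-type' => 'vm', 'backup-id' => '100', 'backup-time' => 0x63B2D3B0 };
my $name = verify_task_name('store1', $attrs);
my @responses = ([{ worker_id => $name }], []);
my @log;
my $waited = maybe_wait_for_verify(
    'protected', sub { shift @responses }, $name, sub { push @log, $_[1] }, 0);
print "polled $waited time(s) before the task list was empty\n";
```

A bounded wait would still be needed on top of this, since the loop itself has
no timeout.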





* Re: [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute
  2023-01-04 10:50   ` Fiona Ebner
@ 2023-01-10 11:11     ` Christoph Heiss
  2023-01-10 12:34       ` Fiona Ebner
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Heiss @ 2023-01-10 11:11 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

Thanks for the review!

On Wed, Jan 04, 2023 at 11:50:38AM +0100, Fiona Ebner wrote:
> Am 02.01.23 um 13:36 schrieb Christoph Heiss:
> > diff --git a/PVE/Storage/PBSPlugin.pm b/PVE/Storage/PBSPlugin.pm
> > index 4320974..1cdbc11 100644
> > --- a/PVE/Storage/PBSPlugin.pm
> > +++ b/PVE/Storage/PBSPlugin.pm
> > @@ -906,8 +906,30 @@ sub get_volume_attribute {
> >      return;
> >  }
> >
> > +sub wait_for_verify_finish {
> > +    my ($conn, $node, $datastore, $attrs) = @_;
> > +
> > +    my $param = {
> > +	running => 'true',
> > +	since => $attrs->{'backup-time'},
> > +	store => $datastore,
> > +	typefilter => 'verify',
> > +    };
> > +
> > +    my $taskname = sprintf('%s:%s/%s/%X',
> > +	$datastore,
> > +        @{$attrs}{qw(backup-type backup-id backup-time)},
> > +    );
>
> I don't think it's likely that the task name format here will change
> often, but as you already mentioned in the cover letter, it's not ideal
> to have it hard-coded here.
>
> > +
> > +    while (1) {
> > +	my $res = eval { $conn->get("/api2/json/nodes/$node/tasks", $param); };
> > +	last if !grep { $_->{worker_id} eq $taskname } @$res;
> > +	sleep(1);
> > +    }
> > +}
> > +
> > @@ -921,6 +943,9 @@ sub update_volume_attribute {
> >  	my $conn = pbs_api_connect($scfg, $password);
> >  	my $datastore = $scfg->{datastore};
> >
> > +	$logfunc->('info', 'waiting for server to finish backup verification...') if $logfunc;
>
> Should only be printed if there is actually a verification we need to
> wait for.
Makes sense.

>
> > +	wait_for_verify_finish($conn, $scfg->{server}, $datastore, $param);
>
> To me, it feels out of place to be concerned with waiting on
> verification in (the rather low-level) update_volume_attribute(), which
> is a rather specific thing to do. I'd say it's fine to fail there when
> the snapshot is locked by verification or some other operation.
>
> Waiting for verification also can increase the backup duration/time
> holding the vzdump lock on the PVE side quite a bit.
That was one of my concerns too. Especially for very big VMs this can
probably delay the task quite a bit.

> It might not seem that big of a deal, because usually only manual
> backups use 'protected'.  But by doing it in
> update_volume_attribute(), you also do it for 'notes', where it's not
> needed and which is relevant to backup jobs where the increased wait
> might be very noticeable. So at least, it should only be done for
> 'protected' if doing it in update_volume_attribute().
That is actually the case now - updating notes takes a different path
through update_volume_notes().

>
> It would be better if the protected flag could be specified upon
> creation already. Would also fix the following race I guess:
It definitely would be a lot cleaner. I'll see what I can do and rework
the whole series.
Probably involves adding a new parameter to the `proxmox-backup-client
backup` command and API(?) AFAICS. But this would not be all that bad
of a feature for the backup client in general, I think.

And I guess I need to figure out how to detect whether the new
parameter is supported or not?
In case this is not supported, just keeping the current behavior (i.e.
best-effort via the API and maybe failing) is probably the sensible way.

> 1. backup finishes
> 2. prune running on PBS
> 3. protected status set from PVE
>
> If going for the waiting approach after all, I think it should rather be
> done in vzdump, before calling update_volume_attribute(). And the helper
> to wait on verification should likely be part of PBSClient.pm (would
> need to teach it to use an API connection first).






* Re: [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute
  2023-01-10 11:11     ` Christoph Heiss
@ 2023-01-10 12:34       ` Fiona Ebner
  2023-01-10 12:44         ` Christoph Heiss
  0 siblings, 1 reply; 8+ messages in thread
From: Fiona Ebner @ 2023-01-10 12:34 UTC (permalink / raw)
  To: Christoph Heiss; +Cc: pve-devel

Am 10.01.23 um 12:11 schrieb Christoph Heiss:
> On Wed, Jan 04, 2023 at 11:50:38AM +0100, Fiona Ebner wrote:
>> It might not seem that big of a deal, because usually only manual
>> backups use 'protected'.  But by doing it in
>> update_volume_attribute(), you also do it for 'notes', where it's not
>> needed and which is relevant to backup jobs where the increased wait
>> might be very noticeable. So at least, it should only be done for
>> 'protected' if doing it in update_volume_attribute().
> That is actually the case now - updating notes takes a different path
> through update_volume_notes().
> 

Sorry, I missed that.

>>
>> It would be better if the protected flag could be specified upon
>> creation already. Would also fix the following race I guess:
> It definitely would be a lot cleaner. I'll see what I can do and rework
> the whole series.
> Probably involves adding a new parameter to the `proxmox-backup-client
> backup` command and API(?) AFAICS. But this would not be all that bad
> of a feature for the backup client in general, I think.

I think you also need to add support in QEMU (new parameter for the
'backup' QMP command) and the proxmox-backup-qemu library (to handle the
parameter).

Regarding the API, maybe it can be its own endpoint in the backup API
(alongside endpoints like 'blob' and 'finish')? As long as we protect
the backup before marking it as finished it should be good. Just an
idea, not sure if it would be better.

> And I guess I need to figure out a way how to detect whether the new
> parameter is supported or not?

If there is no straightforward way to make that information available in
VZDump.pm, we could also just base the decision off of the PBS version.

One way to decide if the current behavior should be used as a fallback
would be to check the protected status after finishing the backup. That
is slightly racy though, because something else could've already changed
the protection between finishing and the check.

> In case this is not supported, just keeping the current behavior (i.e.
> best-effort via the API and maybe failing) is probably the sensible way.

Yes, to not break existing setups. Also note that non-PBS backup
storages need the current behavior too.





* Re: [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute
  2023-01-10 12:34       ` Fiona Ebner
@ 2023-01-10 12:44         ` Christoph Heiss
       [not found]           ` <159837ba-f916-7b03-2cab-8e486b38b6bb@proxmox.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Heiss @ 2023-01-10 12:44 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

On Tue, Jan 10, 2023 at 01:34:14PM +0100, Fiona Ebner wrote:
> Am 10.01.23 um 12:11 schrieb Christoph Heiss:
> > On Wed, Jan 04, 2023 at 11:50:38AM +0100, Fiona Ebner wrote:
> >> It might not seem that big of a deal, because usually only manual
> >> backups use 'protected'.  But by doing it in
> >> update_volume_attribute(), you also do it for 'notes', where it's not
> >> needed and which is relevant to backup jobs where the increased wait
> >> might be very noticeable. So at least, it should only be done for
> >> 'protected' if doing it in update_volume_attribute().
> > That is actually the case now - updating notes takes a different path
> > through update_volume_notes().
> >
>
> Sorry, I missed that.
>
> >>
> >> It would be better if the protected flag could be specified upon
> >> creation already. Would also fix the following race I guess:
> > It definitely would be a lot cleaner. I'll see what I can do and rework
> > the whole series.
> > Probably involves adding a new parameter to the `proxmox-backup-client
> > backup` command and API(?) AFAICS. But this would not be all that bad
> > of a feature for the backup client in general, I think.
>
> I think you also need to add support in QEMU (new parameter for the
> 'backup' QMP command) and the proxmox-backup-qemu library (to handle the
> parameter).
Thanks for the pointers!

>
> Regarding the API, maybe it can be its own endpoint in the backup API
> (alongside endpoints like 'blob' and 'finish')? As long as we protect
> the backup before marking it as finished it should be good. Just an
> idea, not sure if it would be better.
After looking into it, my first thought was maybe to add a (boolean)
parameter to the `finish` endpoint.
But creating a separate endpoint and calling that before `finish` sounds
very reasonable as well.
Any thoughts on what would be more idiomatic/reasonable?

>
> > And I guess I need to figure out a way how to detect whether the new
> > parameter is supported or not?
>
> If there is no straightforward way to make that information available in
> VZDump.pm, we could also just base the decision off of the PBS version.
Thanks for the idea, that may be doable!

>
> One way to decide if the current behavior should be used as a fallback
> would be to check the protected status after finishing the backup. That
> is slightly racy though, because something else could've already changed
> the protection between finishing and the check.
I'd base it off the decision from above - if the `proxmox-backup-client`
version supports setting it directly, use that, otherwise simply fall
back.

>
> > In case this is not supported, just keeping the current behavior (i.e.
> > best-effort via the API and maybe failing) is probably the sensible way.
>
> Yes, to not break existing setups. Also note that non-PBS backup
> storages need the current behavior too.






* Re: [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute
       [not found]           ` <159837ba-f916-7b03-2cab-8e486b38b6bb@proxmox.com>
@ 2023-01-10 13:21             ` Fiona Ebner
  0 siblings, 0 replies; 8+ messages in thread
From: Fiona Ebner @ 2023-01-10 13:21 UTC (permalink / raw)
  To: pve-devel, Christoph Heiss

Am 10.01.23 um 14:06 schrieb Fiona Ebner:
> Am 10.01.23 um 13:44 schrieb Christoph Heiss:
>> On Tue, Jan 10, 2023 at 01:34:14PM +0100, Fiona Ebner wrote:
>>> One way to decide if the current behavior should be used as a fallback
>>> would be to check the protected status after finishing the backup. That
>>> is slightly racy though, because something else could've already changed
>>> the protection between finishing and the check.
>> I'd base it off the decision from above - if the `proxmox-backup-client`
>> version supports setting it directly, use that, otherwise simply fall
>> back.
> It's not just the client, but the server that needs to support it too.
> To make sure that the client/QEMU/etc. support it, we can just have
> pve-manager depend on a recent enough version. For the server, there is
> a /version API endpoint we can query.

For QEMU, we don't want to force specific package versions, so using
package dependency is not good there. Instead, we can use the
'query-proxmox-support' QMP command to see if it's supported. That also
makes the check work for already running VMs.
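
The two checks discussed above (server version plus QEMU capability) could be
combined along these lines. This is a sketch under stated assumptions only:
the minimum version '3.0' and the key 'example-feature' are placeholders, since
the thread names neither a version nor a field in the 'query-proxmox-support'
result, and the helper names are hypothetical. The inputs are plain values, so
no real API or QMP call is made here.

```perl
use strict;
use warnings;

# Compare dotted version strings component by component, numerically.
sub version_at_least {
    my ($have, $want) = @_;
    my @h = split(/\./, $have);
    my @w = split(/\./, $want);
    while (@h || @w) {
        my $x = shift(@h) // 0;
        my $y = shift(@w) // 0;
        return ($x > $y ? 1 : 0) if $x != $y;
    }
    return 1;
}

# Hypothetical decision helper: use the new "protect on creation" path only if
# both the server version and the QEMU capability allow it; otherwise the
# caller falls back to the current best-effort behavior.
sub can_protect_on_create {
    my ($server_version, $qmp_support, $min_version, $feature_key) = @_;
    return 0 if !version_at_least($server_version, $min_version);
    return $qmp_support->{$feature_key} ? 1 : 0;
}

# Placeholder values, for illustration only.
my $use_new_path = can_protect_on_create(
    '3.1', { 'example-feature' => 1 }, '3.0', 'example-feature');
print $use_new_path ? "set protected on creation\n" : "fall back to current behavior\n";
```

For non-PBS storages the fallback path would always be taken, as noted earlier
in the thread.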



