[pve-devel] [PATCH v2 pve-guest-common/pve-docs] Add pre/post/failed-snapshot hooks


* [pve-devel] [PATCH v2 pve-guest-common/pve-docs] Add pre/post/failed-snapshot hooks
@ 2022-12-12 13:43 Stefan Hanreich
  2022-12-12 13:43 ` [pve-devel] [PATCH pve-docs 1/1] examples: add pre/post/failed-snapshot hooks to example hookscript Stefan Hanreich
  2022-12-12 13:43 ` [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks Stefan Hanreich
  0 siblings, 2 replies; 7+ messages in thread
From: Stefan Hanreich @ 2022-12-12 13:43 UTC (permalink / raw)
  To: pve-devel

This patch series introduces the pre/post/failed-snapshot hooks that run before/
after a snapshot is taken, or after failing to take a snapshot.

I used the new example script from pve-docs as a template for my test hookscripts.

What I tested:
- Normal snapshotting, without VM state, without hookscript
  - snapshot works, no hooks executed
- Normal snapshotting, with VM state, without hookscript
  - snapshot works, no hooks executed
- Normal snapshotting, without VM state, with hookscript
  - snapshot works, pre/post hooks work
- Normal snapshotting, with VM state, with hookscript
  - snapshot works, pre/post hooks work
- Taking snapshot with existing name, with hookscript
  - fails, no hookscripts get executed
- Failed at wrong storage config, with hookscript
  - pre/failed get executed, lock gets released
- Failed at taking RAM Snapshot (simulated with monkey-patched die), with hookscript
  - pre/failed get executed, lock gets released
- Add hookscript that attaches/detaches unsnapshottable disk (without --skiplock)
  - snapshotting fails, attach/detach fails, pre/failed get executed, lock released
- Add hookscript that attaches/detaches unsnapshottable disk (with --skiplock)
  - snapshotting works, attach/detach works, pre/post get executed
  - restoring works, detached disk is detached after restoring
- pre-snapshot hookscript exits with code > 0
  - only pre-snapshot gets executed and fails, snapshot fails, lock gets released
- Taking snapshot of template
  - fails, no hookscripts get executed
- execute commands in VM in pre/post via qm guest exec during snapshotting
  - snapshot succeeds, commands get executed, pre/post executed

Changes from v1:
- added failed-snapshot hook that runs after a failed snapshot
  - this enables users to revert any changes made in pre-snapshot hooks
  in case of errors
- running cfs_update() after every hookscript invocation
- adjusted the call sites of exec_hookscript()
  - particularly interesting for pre-snapshot since some checks now run
  before the hook runs
- VM/CT config is now locked during hookscript execution
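
To illustrate the revert use case mentioned above, here is a minimal
sketch of a hookscript that records its preparation in pre-snapshot and
undoes it in post-snapshot or failed-snapshot. The marker file, the
handle_phase() helper and the HOOK_STATE_DIR variable are made up for
this example and are not part of the patch; only the phase names and
the (vmid, phase) arguments come from the example hookscript.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tracks "preparation" with a marker file so that
# a later phase can tell whether there is anything to revert.
sub handle_phase {
    my ($vmid, $phase, $dir) = @_;
    my $marker = "$dir/snapshot-prep-$vmid";

    if ($phase eq 'pre-snapshot') {
        # Record that preparation happened.
        open(my $fh, '>', $marker) or die "cannot write $marker: $!\n";
        print $fh "prepared\n";
        close($fh);
        return 'prepared';
    } elsif ($phase eq 'post-snapshot' || $phase eq 'failed-snapshot') {
        # Success and failure need the same cleanup in this sketch.
        return 'nothing to revert' if !-e $marker;
        unlink($marker) or die "cannot remove $marker: $!\n";
        return 'reverted';
    }

    return 'ignored';
}

# Hookscripts are invoked with the guest ID and the phase as arguments.
my ($vmid, $phase) = @ARGV;
if (defined($phase)) {
    # HOOK_STATE_DIR is an assumption of this sketch, not a PVE variable.
    my $dir = $ENV{HOOK_STATE_DIR} // '/var/tmp';
    print "$vmid $phase: " . handle_phase($vmid, $phase, $dir) . "\n";
}
```

Since failed-snapshot can run without the marker being present, the
cleanup branch is written to be a no-op in that case.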

Thanks to Fiona and Fabian for their valuable input/help!


pve-guest-common:

Stefan Hanreich (1):
  partially fix #2530: snapshots: add pre/post/failed-snapshot hooks

 src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 6 deletions(-)


pve-docs:

Stefan Hanreich (1):
  examples: add pre/post/failed-snapshot hooks to example hookscript

 examples/guest-example-hookscript.pl | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

-- 
2.30.2





* [pve-devel] [PATCH pve-docs 1/1] examples: add pre/post/failed-snapshot hooks to example hookscript
  2022-12-12 13:43 [pve-devel] [PATCH v2 pve-guest-common/pve-docs] Add pre/post/failed-snapshot hooks Stefan Hanreich
@ 2022-12-12 13:43 ` Stefan Hanreich
  2022-12-12 13:43 ` [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks Stefan Hanreich
  1 sibling, 0 replies; 7+ messages in thread
From: Stefan Hanreich @ 2022-12-12 13:43 UTC (permalink / raw)
  To: pve-devel

Added a section for each new snapshot hook to the example hookscript,
as well as a short comment explaining when the respective section gets
executed.

Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
---
 examples/guest-example-hookscript.pl | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/examples/guest-example-hookscript.pl b/examples/guest-example-hookscript.pl
index adeed59..1167c85 100755
--- a/examples/guest-example-hookscript.pl
+++ b/examples/guest-example-hookscript.pl
@@ -54,6 +54,27 @@ if ($phase eq 'pre-start') {
 
     print "$vmid stopped. Doing cleanup.\n";
 
+} elsif ($phase eq 'pre-snapshot') {
+
+    # Phase 'pre-snapshot' will be executed before taking a snapshot of
+    # the guest (via UI or CLI)
+
+    print "$vmid will be snapshotted.\n";
+
+} elsif ($phase eq 'post-snapshot') {
+
+    # Phase 'post-snapshot' will be executed after taking a snapshot of
+    # the guest (via UI or CLI)
+
+    print "$vmid has been successfully snapshotted.\n";
+
+} elsif ($phase eq 'failed-snapshot') {
+
+    # Phase 'failed-snapshot' will be executed when taking a snapshot of
+    # the guest fails and 'pre-snapshot' already ran (via UI or CLI)
+
+    print "$vmid snapshot failed.\n";
+
 } else {
     die "got unknown phase '$phase'\n";
 }
-- 
2.30.2





* [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks
  2022-12-12 13:43 [pve-devel] [PATCH v2 pve-guest-common/pve-docs] Add pre/post/failed-snapshot hooks Stefan Hanreich
  2022-12-12 13:43 ` [pve-devel] [PATCH pve-docs 1/1] examples: add pre/post/failed-snapshot hooks to example hookscript Stefan Hanreich
@ 2022-12-12 13:43 ` Stefan Hanreich
  2022-12-21 10:44   ` Fabian Grünbichler
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Hanreich @ 2022-12-12 13:43 UTC (permalink / raw)
  To: pve-devel

This commit adds hooks to the snapshotting process, which can be used
to run additional setup scripts to prepare the VM for snapshotting.

Example use cases include:
* forcing processes to flush their writes
* blocking processes from writing
* altering the configuration of the VM to make snapshotting possible

The prepare step has been split into two parts, so the configuration
can be locked a bit earlier during the snapshotting process. Doing it
this way ensures that the configuration is already locked during the
pre-snapshot hook. Because of this split, the VM config gets written
in two stages now, rather than one.

In case of failure during the preparation step - after the lock is
written - error handling has been added so the lock gets released
properly. The failed-snapshot hook runs when the snapshot fails, if
the pre-snapshot hook ran already. This enables users to revert any
changes done during the pre-snapshot hookscript.

The preparation step assumes that the hook does not convert the
current VM into a template, which is why the basic checks are not
re-run after the pre-snapshot hook. The storage check runs after the
pre-snapshot hook, because the hook might get used to set up the
storage for snapshotting. If the hook ran after the storage checks,
this would be impossible.

cfs_update() gets called after every invocation of a hookscript, since
it is impossible to know which changes get made by the hookscript.
Doing this ensures that we see the updated state of the CFS after the
hookscript got invoked.

Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
---
 src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index a0c0bc6..3bff600 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -710,8 +710,7 @@ sub __snapshot_prepare {
 
     my $snap;
 
-    my $updatefn =  sub {
-
+    my $run_checks = sub {
 	my $conf = $class->load_config($vmid);
 
 	die "you can't take a snapshot if it's a template\n"
@@ -721,15 +720,21 @@ sub __snapshot_prepare {
 
 	$conf->{lock} = 'snapshot';
 
-	my $snapshots = $conf->{snapshots};
-
 	die "snapshot name '$snapname' already used\n"
-	    if defined($snapshots->{$snapname});
+	    if defined($conf->{snapshots}->{$snapname});
+
+	$class->write_config($vmid, $conf);
+    };
 
+    my $updatefn = sub {
+	my $conf = $class->load_config($vmid);
 	my $storecfg = PVE::Storage::config();
+
 	die "snapshot feature is not available\n"
 	    if !$class->has_feature('snapshot', $conf, $storecfg, undef, undef, $snapname eq 'vzdump');
 
+	my $snapshots = $conf->{snapshots};
+
 	for my $snap (sort keys %$snapshots) {
 	    my $parent_name = $snapshots->{$snap}->{parent} // '';
 	    if ($snapname eq $parent_name) {
@@ -753,7 +758,32 @@ sub __snapshot_prepare {
 	$class->write_config($vmid, $conf);
     };
 
-    $class->lock_config($vmid, $updatefn);
+    $class->lock_config($vmid, $run_checks);
+
+    eval {
+	my $conf = $class->load_config($vmid);
+	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
+    };
+    my $err = $@;
+
+    PVE::Cluster::cfs_update();
+
+    if ($err) {
+	$class->remove_lock($vmid, 'snapshot');
+	die $err;
+    }
+
+    eval {
+	$class->lock_config($vmid, $updatefn);
+    };
+    if (my $err = $@) {
+	my $conf = $class->load_config($vmid);
+	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
+	PVE::Cluster::cfs_update();
+
+	$class->remove_lock($vmid, 'snapshot');
+	die $err;
+    }
 
     return $snap;
 }
@@ -837,11 +867,18 @@ sub snapshot_create {
 
     if ($err) {
 	warn "snapshot create failed: starting cleanup\n";
+
+	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
+	PVE::Cluster::cfs_update();
+
 	eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
 	warn "$@" if $@;
 	die "$err\n";
     }
 
+    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");
+    PVE::Cluster::cfs_update();
+
     $class->__snapshot_commit($vmid, $snapname);
 }
 
-- 
2.30.2





* Re: [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks
  2022-12-12 13:43 ` [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks Stefan Hanreich
@ 2022-12-21 10:44   ` Fabian Grünbichler
  2022-12-21 11:26     ` Stefan Hanreich
  0 siblings, 1 reply; 7+ messages in thread
From: Fabian Grünbichler @ 2022-12-21 10:44 UTC (permalink / raw)
  To: Proxmox VE development discussion

this is v2, right? ;)

On December 12, 2022 2:43 pm, Stefan Hanreich wrote:
> This commit adds hooks to the snapshotting process, which can be used
> to run additional setup scripts to prepare the VM for snapshotting.
> 
> Examples for use cases include:
> * forcing processes to flush their writes
> * blocking processes from writing
> * altering the configuration of the VM to make snapshotting possible
> 
> The prepare step has been split into two parts, so the configuration
> can be locked a bit earlier during the snapshotting process. Doing it
> this way ensures that the configuration is already locked during the
> pre-snapshot hook. Because of this split, the VM config gets written
> in two stages now, rather than one.
> 
> In case of failure during the preparation step - after the lock is
> written - error handling has been added so the lock gets released
> properly. The failed-snapshot hook runs when the snapshot fails, if
> the pre-snapshot hook ran already. This enables users to revert any
> changes done during the pre-snapshot hookscript.

see below
 
> The preparation step assumes that the hook does not convert the
> current VM into a template, which is why the basic checks are not
> re-run after the pre-snapshot hook. The storage check runs after the
> pre-snapshot hook, because the hook might get used to setup the
> storage for snapshotting. If the hook would run after the storage
> checks, this becomes impossible.
> 
> cfs_update() gets called after every invocation of a hookscript, since
> it is impossible to know which changes get made by the hookscript.
> Doing this ensures that we see the updated state of the CFS after the
> hookscript got invoked.
> 
> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
> ---
>  src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
> index a0c0bc6..3bff600 100644
> --- a/src/PVE/AbstractConfig.pm
> +++ b/src/PVE/AbstractConfig.pm
> @@ -710,8 +710,7 @@ sub __snapshot_prepare {
>  
>      my $snap;
>  
> -    my $updatefn =  sub {
> -
> +    my $run_checks = sub {
>  	my $conf = $class->load_config($vmid);
>  
>  	die "you can't take a snapshot if it's a template\n"
> @@ -721,15 +720,21 @@ sub __snapshot_prepare {
>  
>  	$conf->{lock} = 'snapshot';
>  
> -	my $snapshots = $conf->{snapshots};
> -
>  	die "snapshot name '$snapname' already used\n"
> -	    if defined($snapshots->{$snapname});
> +	    if defined($conf->{snapshots}->{$snapname});
> +
> +	$class->write_config($vmid, $conf);
> +    };
>  
> +    my $updatefn = sub {
> +	my $conf = $class->load_config($vmid);
>  	my $storecfg = PVE::Storage::config();
> +
>  	die "snapshot feature is not available\n"
>  	    if !$class->has_feature('snapshot', $conf, $storecfg, undef, undef, $snapname eq 'vzdump');
>  
> +	my $snapshots = $conf->{snapshots};
> +
>  	for my $snap (sort keys %$snapshots) {
>  	    my $parent_name = $snapshots->{$snap}->{parent} // '';
>  	    if ($snapname eq $parent_name) {
> @@ -753,7 +758,32 @@ sub __snapshot_prepare {
>  	$class->write_config($vmid, $conf);
>      };
>  
> -    $class->lock_config($vmid, $updatefn);
> +    $class->lock_config($vmid, $run_checks);
> +
> +    eval {
> +	my $conf = $class->load_config($vmid);
> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
> +    };
> +    my $err = $@;
> +
> +    PVE::Cluster::cfs_update();
> +
> +    if ($err) {
> +	$class->remove_lock($vmid, 'snapshot');
> +	die $err;
> +    }
> +

I wonder if we don't also want to call the 'failed-snapshot' phase when just the
pre-snapshot invocation failed? might be possible to combine the error handling
then, although I am not sure it makes it more readable if combined..

> +   
> +    if (my $err = $@) {
> +	my $conf = $class->load_config($vmid);
> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");

this exec_hookscript needs to be inside an eval {}, with warn in case it fails..

also, this call here happens when preparing for making the snapshot, after
possibly saving the VM state, but before taking the volume snapshots..

> +	PVE::Cluster::cfs_update();
> +
> +	$class->remove_lock($vmid, 'snapshot');
> +	die $err;
> +    }
>  
>      return $snap;
>  }
> @@ -837,11 +867,18 @@ sub snapshot_create {
>  
>      if ($err) {
>  	warn "snapshot create failed: starting cleanup\n";
> +
> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");

eval + warn as well

this call here happens when the volume snapshots might or might not have been
created already (depending on what exactly the error cause is).

> +	PVE::Cluster::cfs_update();
> +
>  	eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
>  	warn "$@" if $@;
>  	die "$err\n";
>      }
>  
> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");

and here we have a similar issue (no eval), what should happen if post-snapshot
fails?

A die immediately (very likely wrong, current)
B eval + warn but proceed with commit (possibly leaving leftover hook changes around)
C eval + warn, call failed-snapshot but proceed with commit (gives the
  hookscript a chance to cleanup, but how does it differentiate between the
  different failed-snapshot call sites?)
D eval + delete snapshot (seems suboptimal)
E eval + call failed-snapshot + delete snapshot (same, and also the issue of the
  hookscript being able to know what's going on again)

B and C seem most sensible to me, but C adds to the issue of "missing
failed-snapshot context", depending on what the hookscript is doing..

one way to pass information is via the environment, we do that for the migration
case already (setting PVE_MIGRATED_FROM, so that the pre-start/post-start
hookscript can know the start happens in a migration context, and where to
(possibly) find the guest config)..

for example, we could set PVE_SNAPSHOT_PHASE here, and have prepare/commit/post
as sub-phases, or even pass a list of volumes already snapshotted (or created,
in case of vmstate), or ..

obviously setting the environment is only allowed in a forked worker context,
else it would affect the next API endpoint handled by the pveproxy/pvedaemon/..
process, so it might be worth double-checking and cleaning up to avoid
side-effects with replication/migration/.. if we go down that route..
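
a rough sketch of how the environment could be scoped so it does not
leak into the long-running process: Perl's `local` on an %ENV element
restores the previous state when the enclosing sub returns or dies.
PVE_SNAPSHOT_PHASE and its 'prepare' value are only the names proposed
above, not an existing interface, and the helper names here are made up;
the child process stands in for the hookscript.

```perl
use strict;
use warnings;

# Stand-in for a hookscript: a child process that reports what it sees
# in its environment.
sub child_sees {
    open(my $fh, '-|', $^X, '-e', 'print $ENV{PVE_SNAPSHOT_PHASE} // "unset"')
        or die "cannot spawn child: $!\n";
    my $out = do { local $/; <$fh> };
    close($fh);
    return $out;
}

# Hypothetical wrapper: the variable is only set while $code runs.
sub with_snapshot_phase {
    my ($subphase, $code) = @_;
    # Restored (or removed again) automatically on return or die.
    local $ENV{PVE_SNAPSHOT_PHASE} = $subphase;
    return $code->();
}

my $during = with_snapshot_phase('prepare', \&child_sees);
my $after = child_sees();

print "during hook: $during\n";
print "after hook: $after\n";
```

this only covers cleanup within one process, whether the call sites are
always inside a forked worker would still need to be checked separately.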

> +    PVE::Cluster::cfs_update();
> +
>      $class->__snapshot_commit($vmid, $snapname);
>  }
>  
> -- 
> 2.30.2
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 





* Re: [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks
  2022-12-21 10:44   ` Fabian Grünbichler
@ 2022-12-21 11:26     ` Stefan Hanreich
  2022-12-21 12:41       ` Fabian Grünbichler
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hanreich @ 2022-12-21 11:26 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler



On 12/21/22 11:44, Fabian Grünbichler wrote:
> this is v2, right? ;)

Oh no - for some reason it's only in the cover letter..

> 
> On December 12, 2022 2:43 pm, Stefan Hanreich wrote:
>> This commit adds hooks to the snapshotting process, which can be used
>> to run additional setup scripts to prepare the VM for snapshotting.
>>
>> Examples for use cases include:
>> * forcing processes to flush their writes
>> * blocking processes from writing
>> * altering the configuration of the VM to make snapshotting possible
>>
>> The prepare step has been split into two parts, so the configuration
>> can be locked a bit earlier during the snapshotting process. Doing it
>> this way ensures that the configuration is already locked during the
>> pre-snapshot hook. Because of this split, the VM config gets written
>> in two stages now, rather than one.
>>
>> In case of failure during the preparation step - after the lock is
>> written - error handling has been added so the lock gets released
>> properly. The failed-snapshot hook runs when the snapshot fails, if
>> the pre-snapshot hook ran already. This enables users to revert any
>> changes done during the pre-snapshot hookscript.
> 
> see below
>   
>> The preparation step assumes that the hook does not convert the
>> current VM into a template, which is why the basic checks are not
>> re-run after the pre-snapshot hook. The storage check runs after the
>> pre-snapshot hook, because the hook might get used to setup the
>> storage for snapshotting. If the hook would run after the storage
>> checks, this becomes impossible.
>>
>> cfs_update() gets called after every invocation of a hookscript, since
>> it is impossible to know which changes get made by the hookscript.
>> Doing this ensures that we see the updated state of the CFS after the
>> hookscript got invoked.
>>
>> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
>> ---
>>   src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 43 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
>> index a0c0bc6..3bff600 100644
>> --- a/src/PVE/AbstractConfig.pm
>> +++ b/src/PVE/AbstractConfig.pm
>> @@ -710,8 +710,7 @@ sub __snapshot_prepare {
>>   
>>       my $snap;
>>   
>> -    my $updatefn =  sub {
>> -
>> +    my $run_checks = sub {
>>   	my $conf = $class->load_config($vmid);
>>   
>>   	die "you can't take a snapshot if it's a template\n"
>> @@ -721,15 +720,21 @@ sub __snapshot_prepare {
>>   
>>   	$conf->{lock} = 'snapshot';
>>   
>> -	my $snapshots = $conf->{snapshots};
>> -
>>   	die "snapshot name '$snapname' already used\n"
>> -	    if defined($snapshots->{$snapname});
>> +	    if defined($conf->{snapshots}->{$snapname});
>> +
>> +	$class->write_config($vmid, $conf);
>> +    };
>>   
>> +    my $updatefn = sub {
>> +	my $conf = $class->load_config($vmid);
>>   	my $storecfg = PVE::Storage::config();
>> +
>>   	die "snapshot feature is not available\n"
>>   	    if !$class->has_feature('snapshot', $conf, $storecfg, undef, undef, $snapname eq 'vzdump');
>>   
>> +	my $snapshots = $conf->{snapshots};
>> +
>>   	for my $snap (sort keys %$snapshots) {
>>   	    my $parent_name = $snapshots->{$snap}->{parent} // '';
>>   	    if ($snapname eq $parent_name) {
>> @@ -753,7 +758,32 @@ sub __snapshot_prepare {
>>   	$class->write_config($vmid, $conf);
>>       };
>>   
>> -    $class->lock_config($vmid, $updatefn);
>> +    $class->lock_config($vmid, $run_checks);
>> +
>> +    eval {
>> +	my $conf = $class->load_config($vmid);
>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
>> +    };
>> +    my $err = $@;
>> +
>> +    PVE::Cluster::cfs_update();
>> +
>> +    if ($err) {
>> +	$class->remove_lock($vmid, 'snapshot');
>> +	die $err;
>> +    }
>> +
> 
> I wonder if we don't also want to call the 'failed-snapshot' phase when just the
> pre-snapshot invocation failed? might be possible to combine the error handling
> then, although I am not sure it makes it more readable if combined..
> 

I thought about it, but my reasoning was that if the user dies in their
Perl script, they should be able to run any cleanup code before that.
This doesn't account for problems in the hookscript that the user did
not foresee, though, so I think your approach is better, since it is
easier to use and places less burden on the author of the hookscript.
It might make the code a bit more convoluted (depending on how we want
to handle errors in failed-snapshot), but the upsides outweigh that imo.

One thing that would be easier if the user did their cleanup in
pre-snapshot is that the pre-snapshot hook knows exactly what failed
in pre-snapshot, so cleanup code could use that information to skip
certain steps. But again, it assumes that pre-snapshot will properly
handle any possible error, which might be a bit much to assume.
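
A combined error path could look roughly like the sketch below. This is
a standalone simulation, not the actual AbstractConfig code: run_hook(),
prepare_sketch() and the pre_fails/check_fails options are invented for
the example, and 'remove_lock' is just recorded as a step.

```perl
use strict;
use warnings;

my @calls;

# Stand-in for exec_hookscript(): records the phase, optionally fails.
sub run_hook {
    my ($phase, $fail) = @_;
    push @calls, $phase;
    die "$phase failed\n" if $fail;
    return;
}

sub prepare_sketch {
    my (%opts) = @_;

    eval {
        run_hook('pre-snapshot', $opts{pre_fails});
        die "storage check failed\n" if $opts{check_fails};
    };
    if (my $err = $@) {
        # Single error path: failed-snapshot runs whether the hook
        # itself or a later check failed.
        eval { run_hook('failed-snapshot') };
        warn $@ if $@;
        push @calls, 'remove_lock';
        die $err;
    }

    return 1;
}

eval { prepare_sketch(pre_fails => 1) };
my $outer_err = $@;
print "calls: @calls\n";
print "error: $outer_err";
```

With this shape, a hookscript that already cleaned up in pre-snapshot
would simply see an extra failed-snapshot call.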

>> +
>> +    if (my $err = $@) {
>> +	my $conf = $class->load_config($vmid);
>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
> 
> this exec_hookscript needs to be inside an eval {}, with warn in case it fails..

Isn't this already handled by the exec_hookscript function, since I am
not passing $stop_on_error? It should then warn instead of die. Maybe
I am misunderstanding something.

See:
https://git.proxmox.com/?p=pve-guest-common.git;a=blob;f=src/PVE/GuestHelpers.pm;h=b4ccbaa73a3fd08ba5d34350ebd57ee31355035b;hb=HEAD#l125
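
To make the point concrete, here is a simplified stand-in for the
behaviour I mean. It is not the real exec_hookscript() (the name
exec_hook_sketch and the coderef argument are made up), it only mirrors
how I read the $stop_on_error handling: errors are caught, and then
either re-raised or downgraded to a warning.

```perl
use strict;
use warnings;

# Simplified stand-in, NOT PVE::GuestHelpers::exec_hookscript.
sub exec_hook_sketch {
    my ($hook, $phase, $stop_on_error) = @_;

    eval { $hook->($phase) };
    if (my $err = $@) {
        my $msg = "hookscript error on $phase: $err";
        die $msg if $stop_on_error;
        warn $msg;
    }

    return;
}

my $failing = sub { die "boom\n" };

# Collect warnings so the effect is visible in the output.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Without $stop_on_error the caller continues, only a warning is issued.
exec_hook_sketch($failing, 'post-snapshot');

# With $stop_on_error the error propagates to the caller.
my $died = !eval { exec_hook_sketch($failing, 'pre-snapshot', 1); 1 };

print "warnings: " . scalar(@warnings) . "\n";
print "died with stop_on_error: " . ($died ? "yes" : "no") . "\n";
```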

> 
> also, this call here happens when preparing for making the snapshot, after
> possibly saving the VM state, but before taking the volume snapshots..
> 

This should be alleviated by the envvars you proposed below, because 
then we could pass that information to the hookscript and the user 
decides what to do with this information, right?

>> +	PVE::Cluster::cfs_update();
>> +
>> +	$class->remove_lock($vmid, 'snapshot');
>> +	die $err;
>> +    }
>>   
>>       return $snap;
>>   }
>> @@ -837,11 +867,18 @@ sub snapshot_create {
>>   
>>       if ($err) {
>>   	warn "snapshot create failed: starting cleanup\n";
>> +
>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
> 
> eval + warn as well

see above

> 
> this call here happens when the volume snapshots might or might not have been
> created already (depending on what exactly the error cause is).
>

same here - should be alleviated by adding envvars, right?

>> +	PVE::Cluster::cfs_update();
>> +
>>   	eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
>>   	warn "$@" if $@;
>>   	die "$err\n";
>>       }
>>   
>> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");
> 
> and here we have a similar issue (no eval), what should happen if post-snapshot
> fails?
> 
> A die immediately (very likely wrong, current)
> B eval + warn but proceed with commit (possibly leaving leftover hook changes around)
> C eval + warn, call failed-snapshot but proceed with commit (gives the
>    hookscript a chance to cleanup, but how does it differentiate between the
>    different failed-snapshot call sites?)
> D eval + delete snapshot (seems suboptimal)
> E eval + call failed-snapshot + delete snapshot (same, and also the issue of the
>    hookscript being able to know what's going on again)
> 
> B and C seem most sensible to me, but C adds to the issue of "missing
> failed-snapshot context", depending on what the hookscript is doing..

again, see above - I think it currently actually behaves like B because
of how exec_hookscript works, if I understand correctly.

This follows the same idea as not running the failed-snapshot hook if
pre-snapshot fails: I thought that the user should be aware that their
hookscript failed at some point and run any cleanup code before
returning. As I said above, that's probably a worse idea than just
running failed-snapshot. Running it also enables the user to have all
the cleanup handled by failed-snapshot instead of having to add it to
pre/post/failed.

> 
> one way to pass information is via the environment, we do that for the migration
> case already (setting PVE_MIGRATED_FROM, so that the pre-start/post-start
> hookscript can know the start happens in a migration context, and where to
> (possibly) find the guest config..
> 
> for example, we could set PVE_SNAPSHOT_PHASE here, and have prepare/commit/post
> as sub-phases, or even pass a list of volumes already snapshotted (or created,
> in case of vmstate), or 

That's a good idea, I'll look into sensible values for 
PVE_SNAPSHOT_PHASE as well as look into how we could pass the 
information about volumes to the hookscript best.

> obviously setting the environment is only allowed in a forked worker context,
> else it would affect the next API endpoint handled by the pveproxy/pvedaemon/..
> process, so it might be worth double-checking and cleaning up to avoid
> side-effects with replication/migration/.. if we go down that route..
> 

very good remark - thanks. I would not have thought of it even though it 
is kinda obvious now you pointed it out.

>> +    PVE::Cluster::cfs_update();
>> +
>>       $class->__snapshot_commit($vmid, $snapname);
>>   }
>>   
>> -- 
>> 2.30.2
>>
>>
>>
>>
>>
> 
> 
> 
> 

Many thanks for the review!





* Re: [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks
  2022-12-21 11:26     ` Stefan Hanreich
@ 2022-12-21 12:41       ` Fabian Grünbichler
  2022-12-21 12:57         ` Stefan Hanreich
  0 siblings, 1 reply; 7+ messages in thread
From: Fabian Grünbichler @ 2022-12-21 12:41 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Hanreich

On December 21, 2022 12:26 pm, Stefan Hanreich wrote:
> 
> 
> On 12/21/22 11:44, Fabian Grünbichler wrote:
>> this is v2, right? ;)
> 
> Oh no - for some reason it's only in the cover letter..
> 
>> 
>> On December 12, 2022 2:43 pm, Stefan Hanreich wrote:
>>> This commit adds hooks to the snapshotting process, which can be used
>>> to run additional setup scripts to prepare the VM for snapshotting.
>>>
>>> Examples for use cases include:
>>> * forcing processes to flush their writes
>>> * blocking processes from writing
>>> * altering the configuration of the VM to make snapshotting possible
>>>
>>> The prepare step has been split into two parts, so the configuration
>>> can be locked a bit earlier during the snapshotting process. Doing it
>>> this way ensures that the configuration is already locked during the
>>> pre-snapshot hook. Because of this split, the VM config gets written
>>> in two stages now, rather than one.
>>>
>>> In case of failure during the preparation step - after the lock is
>>> written - error handling has been added so the lock gets released
>>> properly. The failed-snapshot hook runs when the snapshot fails, if
>>> the pre-snapshot hook ran already. This enables users to revert any
>>> changes done during the pre-snapshot hookscript.
>> 
>> see below
>>   
>>> The preparation step assumes that the hook does not convert the
>>> current VM into a template, which is why the basic checks are not
>>> re-run after the pre-snapshot hook. The storage check runs after the
>>> pre-snapshot hook, because the hook might get used to setup the
>>> storage for snapshotting. If the hook would run after the storage
>>> checks, this becomes impossible.
>>>
>>> cfs_update() gets called after every invocation of a hookscript, since
>>> it is impossible to know which changes get made by the hookscript.
>>> Doing this ensures that we see the updated state of the CFS after the
>>> hookscript got invoked.
>>>
>>> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
>>> ---
>>>   src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
>>>   1 file changed, 43 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
>>> index a0c0bc6..3bff600 100644
>>> --- a/src/PVE/AbstractConfig.pm
>>> +++ b/src/PVE/AbstractConfig.pm
>>> @@ -710,8 +710,7 @@ sub __snapshot_prepare {
>>>   
>>>       my $snap;
>>>   
>>> -    my $updatefn =  sub {
>>> -
>>> +    my $run_checks = sub {
>>>   	my $conf = $class->load_config($vmid);
>>>   
>>>   	die "you can't take a snapshot if it's a template\n"
>>> @@ -721,15 +720,21 @@ sub __snapshot_prepare {
>>>   
>>>   	$conf->{lock} = 'snapshot';
>>>   
>>> -	my $snapshots = $conf->{snapshots};
>>> -
>>>   	die "snapshot name '$snapname' already used\n"
>>> -	    if defined($snapshots->{$snapname});
>>> +	    if defined($conf->{snapshots}->{$snapname});
>>> +
>>> +	$class->write_config($vmid, $conf);
>>> +    };
>>>   
>>> +    my $updatefn = sub {
>>> +	my $conf = $class->load_config($vmid);
>>>   	my $storecfg = PVE::Storage::config();
>>> +
>>>   	die "snapshot feature is not available\n"
>>>   	    if !$class->has_feature('snapshot', $conf, $storecfg, undef, undef, $snapname eq 'vzdump');
>>>   
>>> +	my $snapshots = $conf->{snapshots};
>>> +
>>>   	for my $snap (sort keys %$snapshots) {
>>>   	    my $parent_name = $snapshots->{$snap}->{parent} // '';
>>>   	    if ($snapname eq $parent_name) {
>>> @@ -753,7 +758,32 @@ sub __snapshot_prepare {
>>>   	$class->write_config($vmid, $conf);
>>>       };
>>>   
>>> -    $class->lock_config($vmid, $updatefn);
>>> +    $class->lock_config($vmid, $run_checks);
>>> +
>>> +    eval {
>>> +	my $conf = $class->load_config($vmid);
>>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
>>> +    };
>>> +    my $err = $@;
>>> +
>>> +    PVE::Cluster::cfs_update();
>>> +
>>> +    if ($err) {
>>> +	$class->remove_lock($vmid, 'snapshot');
>>> +	die $err;
>>> +    }
>>> +
>> 
>> I wonder if we don't also want to call the 'failed-snapshot' phase when just the
>> pre-snapshot invocation failed? might be possible to combine the error handling
>> then, although I am not sure it makes it more readable if combined..
>> 
> 
> I thought about it, but I thought that if the user calls die in his perl 
> script he should be able to run any cleanup code before that. This 
> doesn't consider any problems in the hookscript unforeseen by the user 
> though, so I think your approach is better, since it is easier to use. 
> This places less burden on the author of the hookscript. Might make the 
> code a bit more convoluted though (depending on how we want to handle 
> errors in failed-snapshot), but the upsides are way better imo.
> 
> One thing that would be easier with making the user do his cleanup in 
> pre-snapshot would be that the pre-snapshot hook knows exactly what 
> failed in pre-snapshot, so cleanup-code could use that information to 
> skip certain steps. But again, it assumes that pre-snapshot will 
> properly handle any possible error, which might be a bit much to assume.

yes, there is always the question of whether the hookscript does (proper) error
handling.. but if it does, an additional call to failed-snapshot shouldn't hurt
since that should be covered as well (in this case, if we pass the information
that prepare failed, it could be a no-op for example).

>>> +
>>> +    if (my $err = $@) {
>>> +	my $conf = $class->load_config($vmid);
>>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
>> 
>> this exec_hookscript needs to be inside an eval {}, with warn in case it fails..
> 
> Isn't this already handled by the exec_hookscript function, since I am 
> not passing $stop_on_error ? It should exit with warn instead of die 
> then. Maybe I am misunderstanding something.
> 
> See:
> https://git.proxmox.com/?p=pve-guest-common.git;a=blob;f=src/PVE/GuestHelpers.pm;h=b4ccbaa73a3fd08ba5d34350ebd57ee31355035b;hb=HEAD#l125

ah yeah, missed that - thanks for pointing it out :)
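
For readers following along, the warn-versus-die behaviour being discussed can
be sketched roughly like this. It is a simplified stand-in and not the actual
PVE::GuestHelpers::exec_hookscript code; run_hook and its callback argument are
hypothetical names used only for illustration, and only the $stop_on_error idea
is taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for exec_hookscript: runs $code for the given phase
# and either propagates a failure (when $stop_on_error is set) or downgrades
# it to a warning so the caller can carry on.
sub run_hook {
    my ($phase, $code, $stop_on_error) = @_;

    eval { $code->($phase) };
    if (my $err = $@) {
        my $msg = "hookscript error for phase '$phase': $err";
        die $msg if $stop_on_error;
        warn $msg;
        return 0;
    }
    return 1;
}
```

Under this sketch, a call without the third argument never dies, which is why
an extra eval around the failed-snapshot call would be redundant.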

>> 
>> also, this call here happens when preparing for making the snapshot, after
>> possibly saving the VM state, but before taking the volume snapshots..
>> 
> 
> This should be alleviated by the envvars you proposed below, because 
> then we could pass that information to the hookscript and the user 
> decides what to do with this information, right?

yes exactly, this is part of the issue that could be solved by passing more
information to the hook script.

>>> +	PVE::Cluster::cfs_update();
>>> +
>>> +	$class->remove_lock($vmid, 'snapshot');
>>> +	die $err;
>>> +    }
>>>   
>>>       return $snap;
>>>   }
>>> @@ -837,11 +867,18 @@ sub snapshot_create {
>>>   
>>>       if ($err) {
>>>   	warn "snapshot create failed: starting cleanup\n";
>>> +
>>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
>> 
>> eval + warn as well
> 
> see above

indeed

>> 
>> this call here happens when the volume snapshots might or might not have been
>> created already (depending on what exactly the error cause is).
>>
> 
> same here - should be alleviated by adding envvars, right?
> 
>>> +	PVE::Cluster::cfs_update();
>>> +
>>>   	eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
>>>   	warn "$@" if $@;
>>>   	die "$err\n";
>>>       }
>>>   
>>> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");
>> 
>> and here we have a similar issue (no eval), what should happen if post-snapshot
>> fails?
>> 
>> A die immediately (very likely wrong, current)
>> B eval + warn but proceed with commit (possibly leaving leftover hook changes around)
>> C eval + warn, call failed-snapshot but proceed with commit (gives the
>>    hookscript a chance to cleanup, but how does it differentiate between the
>>    different failed-snapshot call sites?)
>> D eval + delete snapshot (seems suboptimal)
>> E eval + call failed-snapshot + delete snapshot (same, and also the issue of the
>>    hookscript being able to know what's going on again)
>> 
>> B and C seem most sensible to me, but C adds to the issue of "missing
>> failed-snapshot context", depending on what the hookscript is doing..
> 
> again, see above - I think it currently actually behaves like B because 
> of how exec_hookscript works if I understand correctly.

yes. but the question is whether that is good ;) I guess switching to C would
require passing stop_on_error and wrapping in eval..
 
> Similar idea to not running the failed-snapshot hook if pre-snapshot 
> fails. I thought that the user should be aware that his hookscript 
> failed at some point and run possible cleanup code before returning. As 
> I said above that's probably a worse idea than just running 
> failed-snapshot. It also enables the user to just have all the cleanup 
> handled by failed-snapshot instead of having to add it to pre/post/failed.

like mentioned above, doing

if !pre {
    failed
}

if !snapshot {
    failed
}

if !post {
    failed
}

doesn't stop the user from handling all errors in the phase itself, and then
doing `exit 1` to fail the hookscript, with the failed phase only handling
actual "snapshotting didn't work" errors - as long as we pass along *why* we are
calling the failed phase.
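
A rough sketch of that flow, following option C for the post-snapshot case.
Everything here is illustrative: snapshot_with_hooks, the $hook and
$take_snapshot callbacks, and the idea of passing the failing step as a second
argument are assumptions for the example, not the code from the patch.

```perl
use strict;
use warnings;

# Hypothetical driver. $hook->($phase, $reason) and $take_snapshot->() are
# illustrative callbacks that die on failure.
sub snapshot_with_hooks {
    my ($hook, $take_snapshot) = @_;

    my @steps = (
        [ 'pre-snapshot',  sub { $hook->('pre-snapshot') } ],
        [ 'snapshot',      $take_snapshot ],
        [ 'post-snapshot', sub { $hook->('post-snapshot') } ],
    );

    for my $step (@steps) {
        my ($name, $code) = @$step;

        eval { $code->() };
        if (my $err = $@) {
            # Tell the failed phase *why* it is being called.
            eval { $hook->('failed-snapshot', $name) };
            warn "failed-snapshot hook failed: $@" if $@;

            # pre-snapshot and snapshot failures abort the operation;
            # a post-snapshot failure only warns (option C).
            die $err if $name ne 'post-snapshot';
            warn "post-snapshot hook failed: $err";
        }
    }
    return 1;
}
```

With this shape, a hookscript that cleans up in its own phase and then exits
non-zero can treat the following failed-snapshot call as a no-op, because it is
told which step failed.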

>> one way to pass information is via the environment, we do that for the migration
>> case already (setting PVE_MIGRATED_FROM, so that the pre-start/post-start
>> hookscript can know the start happens in a migration context, and where to
>> (possibly) find the guest config)..
>> 
>> for example, we could set PVE_SNAPSHOT_PHASE here, and have prepare/commit/post
>> as sub-phases, or even pass a list of volumes already snapshotted (or created,
>> in case of vmstate), or 
> 
> That's a good idea, I'll look into sensible values for 
> PVE_SNAPSHOT_PHASE as well as look into how we could pass the 
> information about volumes to the hookscript best.
> 
>> obviously setting the environment is only allowed in a forked worker context,
>> else it would affect the next API endpoint handled by the pveproxy/pvedaemon/..
>> process, so it might be worth double-checking and cleaning up to avoid
>> side-effects with replication/migration/.. if we go down that route..
>> 
> 
> very good remark - thanks. I would not have thought of it even though it 
> is kinda obvious now you pointed it out.

well, I just realized that with replication/migration we are not doing a full
guest snapshot anyway, so that should be irrelevant ;) it just leaves container
backups in snapshot mode where we need to be careful to clean the env so that
the next container's hookscript execution in a single vzdump job doesn't see
wrong information.
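
One possible way to keep the environment clean between guests is Perl's
"local" on %ENV, sketched below. PVE_SNAPSHOT_PHASE is only the name proposed
in this thread and does not exist yet, and with_snapshot_env is a hypothetical
helper, so treat this as an assumption-laden sketch rather than a proposal for
the final code.

```perl
use strict;
use warnings;

# Hypothetical helper: PVE_SNAPSHOT_PHASE is the variable name proposed in
# this thread, not an existing one.
sub with_snapshot_env {
    my ($phase, $code) = @_;

    # 'local' restores the previous value (or removes the key again) when
    # this sub returns or dies, so nothing leaks into the next guest's
    # hookscript execution within the same process.
    local $ENV{PVE_SNAPSHOT_PHASE} = $phase;
    return $code->();
}
```

Any hookscript spawned from inside the callback would inherit the variable,
while the calling process sees its original environment again afterwards.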





* Re: [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks
  2022-12-21 12:41       ` Fabian Grünbichler
@ 2022-12-21 12:57         ` Stefan Hanreich
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Hanreich @ 2022-12-21 12:57 UTC (permalink / raw)
  To: Fabian Grünbichler, Proxmox VE development discussion



On 12/21/22 13:41, Fabian Grünbichler wrote:
> On December 21, 2022 12:26 pm, Stefan Hanreich wrote:
>>
>>
>> On 12/21/22 11:44, Fabian Grünbichler wrote:
>>> this is v2, right? ;)
>>
>> Oh no - for some reason it's only in the cover letter..
>>
>>>
>>> On December 12, 2022 2:43 pm, Stefan Hanreich wrote:
>>>> This commit adds hooks to the snapshotting process, which can be used
>>>> to run additional setup scripts to prepare the VM for snapshotting.
>>>>
>>>> Examples for use cases include:
>>>> * forcing processes to flush their writes
>>>> * blocking processes from writing
>>>> * altering the configuration of the VM to make snapshotting possible
>>>>
>>>> The prepare step has been split into two parts, so the configuration
>>>> can be locked a bit earlier during the snapshotting process. Doing it
>>>> this way ensures that the configuration is already locked during the
>>>> pre-snapshot hook. Because of this split, the VM config gets written
>>>> in two stages now, rather than one.
>>>>
>>>> In case of failure during the preparation step - after the lock is
>>>> written - error handling has been added so the lock gets released
>>>> properly. The failed-snapshot hook runs when the snapshot fails, if
>>>> the pre-snapshot hook ran already. This enables users to revert any
>>>> changes done during the pre-snapshot hookscript.
>>>
>>> see below
>>>    
>>>> The preparation step assumes that the hook does not convert the
>>>> current VM into a template, which is why the basic checks are not
>>>> re-run after the pre-snapshot hook. The storage check runs after the
>>>> pre-snapshot hook, because the hook might get used to set up the
>>>> storage for snapshotting. If the hook ran after the storage
>>>> checks, this would be impossible.
>>>>
>>>> cfs_update() gets called after every invocation of a hookscript, since
>>>> it is impossible to know which changes get made by the hookscript.
>>>> Doing this ensures that we see the updated state of the CFS after the
>>>> hookscript got invoked.
>>>>
>>>> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
>>>> ---
>>>>    src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
>>>>    1 file changed, 43 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
>>>> index a0c0bc6..3bff600 100644
>>>> --- a/src/PVE/AbstractConfig.pm
>>>> +++ b/src/PVE/AbstractConfig.pm
>>>> @@ -710,8 +710,7 @@ sub __snapshot_prepare {
>>>>    
>>>>        my $snap;
>>>>    
>>>> -    my $updatefn =  sub {
>>>> -
>>>> +    my $run_checks = sub {
>>>>    	my $conf = $class->load_config($vmid);
>>>>    
>>>>    	die "you can't take a snapshot if it's a template\n"
>>>> @@ -721,15 +720,21 @@ sub __snapshot_prepare {
>>>>    
>>>>    	$conf->{lock} = 'snapshot';
>>>>    
>>>> -	my $snapshots = $conf->{snapshots};
>>>> -
>>>>    	die "snapshot name '$snapname' already used\n"
>>>> -	    if defined($snapshots->{$snapname});
>>>> +	    if defined($conf->{snapshots}->{$snapname});
>>>> +
>>>> +	$class->write_config($vmid, $conf);
>>>> +    };
>>>>    
>>>> +    my $updatefn = sub {
>>>> +	my $conf = $class->load_config($vmid);
>>>>    	my $storecfg = PVE::Storage::config();
>>>> +
>>>>    	die "snapshot feature is not available\n"
>>>>    	    if !$class->has_feature('snapshot', $conf, $storecfg, undef, undef, $snapname eq 'vzdump');
>>>>    
>>>> +	my $snapshots = $conf->{snapshots};
>>>> +
>>>>    	for my $snap (sort keys %$snapshots) {
>>>>    	    my $parent_name = $snapshots->{$snap}->{parent} // '';
>>>>    	    if ($snapname eq $parent_name) {
>>>> @@ -753,7 +758,32 @@ sub __snapshot_prepare {
>>>>    	$class->write_config($vmid, $conf);
>>>>        };
>>>>    
>>>> -    $class->lock_config($vmid, $updatefn);
>>>> +    $class->lock_config($vmid, $run_checks);
>>>> +
>>>> +    eval {
>>>> +	my $conf = $class->load_config($vmid);
>>>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
>>>> +    };
>>>> +    my $err = $@;
>>>> +
>>>> +    PVE::Cluster::cfs_update();
>>>> +
>>>> +    if ($err) {
>>>> +	$class->remove_lock($vmid, 'snapshot');
>>>> +	die $err;
>>>> +    }
>>>> +
>>>
>>> I wonder if we don't also want to call the 'failed-snapshot' phase when just the
>>> pre-snapshot invocation failed? might be possible to combine the error handling
>>> then, although I am not sure it makes it more readable if combined..
>>>
>>
>> I thought about it, but I thought that if the user calls die in his perl
>> script he should be able to run any cleanup code before that. This
>> doesn't consider any problems in the hookscript unforeseen by the user
>> though, so I think your approach is better, since it is easier to use.
>> This places less burden on the author of the hookscript. Might make the
>> code a bit more convoluted though (depending on how we want to handle
>> errors in failed-snapshot), but the upsides are way better imo.
>>
>> One thing that would be easier with making the user do his cleanup in
>> pre-snapshot would be that the pre-snapshot hook knows exactly what
>> failed in pre-snapshot, so cleanup-code could use that information to
>> skip certain steps. But again, it assumes that pre-snapshot will
>> properly handle any possible error, which might be a bit much to assume.
> 
> yes, there is always the question of whether the hookscript does (proper) error
> handling.. but if it does, an additional call to failed-snapshot shouldn't hurt
> since that should be covered as well (in this case, if we pass the information
> that prepare failed, it could be a no-op for example).
> 
>>>> +
>>>> +    if (my $err = $@) {
>>>> +	my $conf = $class->load_config($vmid);
>>>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
>>>
>>> this exec_hookscript needs to be inside an eval {}, with warn in case it fails..
>>
>> Isn't this already handled by the exec_hookscript function, since I am
>> not passing $stop_on_error ? It should exit with warn instead of die
>> then. Maybe I am misunderstanding something.
>>
>> See:
>> https://git.proxmox.com/?p=pve-guest-common.git;a=blob;f=src/PVE/GuestHelpers.pm;h=b4ccbaa73a3fd08ba5d34350ebd57ee31355035b;hb=HEAD#l125
> 
> ah yeah, missed that - thanks for pointing it out :)
> 
>>>
>>> also, this call here happens when preparing for making the snapshot, after
>>> possibly saving the VM state, but before taking the volume snapshots..
>>>
>>
>> This should be alleviated by the envvars you proposed below, because
>> then we could pass that information to the hookscript and the user
>> decides what to do with this information, right?
> 
> yes exactly, this is part of the issue that could be solved by passing more
> information to the hook script.
> 
>>>> +	PVE::Cluster::cfs_update();
>>>> +
>>>> +	$class->remove_lock($vmid, 'snapshot');
>>>> +	die $err;
>>>> +    }
>>>>    
>>>>        return $snap;
>>>>    }
>>>> @@ -837,11 +867,18 @@ sub snapshot_create {
>>>>    
>>>>        if ($err) {
>>>>    	warn "snapshot create failed: starting cleanup\n";
>>>> +
>>>> +	PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
>>>
>>> eval + warn as well
>>
>> see above
> 
> indeed
> 
>>>
>>> this call here happens when the volume snapshots might or might not have been
>>> created already (depending on what exactly the error cause is).
>>>
>>
>> same here - should be alleviated by adding envvars, right?
>>
>>>> +	PVE::Cluster::cfs_update();
>>>> +
>>>>    	eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
>>>>    	warn "$@" if $@;
>>>>    	die "$err\n";
>>>>        }
>>>>    
>>>> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");
>>>
>>> and here we have a similar issue (no eval), what should happen if post-snapshot
>>> fails?
>>>
>>> A die immediately (very likely wrong, current)
>>> B eval + warn but proceed with commit (possibly leaving leftover hook changes around)
>>> C eval + warn, call failed-snapshot but proceed with commit (gives the
>>>     hookscript a chance to cleanup, but how does it differentiate between the
>>>     different failed-snapshot call sites?)
>>> D eval + delete snapshot (seems suboptimal)
>>> E eval + call failed-snapshot + delete snapshot (same, and also the issue of the
>>>     hookscript being able to know what's going on again)
>>>
>>> B and C seem most sensible to me, but C adds to the issue of "missing
>>> failed-snapshot context", depending on what the hookscript is doing..
>>
>> again, see above - I think it currently actually behaves like B because
>> of how exec_hookscript works if I understand correctly.
> 
> yes. but the question is whether that is good ;) I guess switching to C would
> require passing stop_on_error and wrapping in eval..
> 
>> Similar idea to not running the failed-snapshot hook if pre-snapshot
>> fails. I thought that the user should be aware that his hookscript
>> failed at some point and run possible cleanup code before returning. As
>> I said above that's probably a worse idea than just running
>> failed-snapshot. It also enables the user to just have all the cleanup
>> handled by failed-snapshot instead of having to add it to pre/post/failed.
> 
> like mentioned above, doing
> 
> if !pre {
>      failed
> }
> 
> if !snapshot {
>      failed
> }
> 
> if !post {
>      failed
> }
> 
> doesn't stop the user from handling all errors in the phase itself, and then
> doing `exit 1` to fail the hookscript, with the failed phase only handling
> actual "snapshotting didn't work" errors - as long as we pass along *why* we are
> calling the failed phase.
> 

This sounds like the best way to implement this (and would then behave 
like you described in option C). I will implement it this way together 
with the envvars you proposed. Again, thanks for the feedback!
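
From the hookscript author's side, this could end up looking roughly like the
sketch below. handle_phase is a hypothetical helper, the 'prepare' sub-phase
value is just one of the candidates mentioned in this thread, and the returned
strings stand in for real work.

```perl
use strict;
use warnings;

# Hypothetical dispatcher for a hookscript body. $sub_phase would come from
# the proposed (not yet existing) PVE_SNAPSHOT_PHASE environment variable.
sub handle_phase {
    my ($vmid, $phase, $sub_phase) = @_;
    $sub_phase //= 'unknown';

    if ($phase eq 'pre-snapshot') {
        return "$vmid: preparing guest for snapshot";
    } elsif ($phase eq 'post-snapshot') {
        return "$vmid: snapshot done, reverting preparation";
    } elsif ($phase eq 'failed-snapshot') {
        # If preparation itself failed and already cleaned up, do nothing.
        return "$vmid: nothing to clean up" if $sub_phase eq 'prepare';
        return "$vmid: cleaning up after failure in '$sub_phase'";
    }
    die "got unknown phase '$phase'\n";
}
```

A hookscript could then call something like
handle_phase($ARGV[0], $ARGV[1], $ENV{PVE_SNAPSHOT_PHASE}), assuming the vmid
and phase arrive as the first two arguments as in the example hookscript.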

>>> one way to pass information is via the environment, we do that for the migration
>>> case already (setting PVE_MIGRATED_FROM, so that the pre-start/post-start
>>> hookscript can know the start happens in a migration context, and where to
>>> (possibly) find the guest config)..
>>>
>>> for example, we could set PVE_SNAPSHOT_PHASE here, and have prepare/commit/post
>>> as sub-phases, or even pass a list of volumes already snapshotted (or created,
>>> in case of vmstate), or
>>
>> That's a good idea, I'll look into sensible values for
>> PVE_SNAPSHOT_PHASE as well as look into how we could pass the
>> information about volumes to the hookscript best.
>>
>>> obviously setting the environment is only allowed in a forked worker context,
>>> else it would affect the next API endpoint handled by the pveproxy/pvedaemon/..
>>> process, so it might be worth double-checking and cleaning up to avoid
>>> side-effects with replication/migration/.. if we go down that route..
>>>
>>
>> very good remark - thanks. I would not have thought of it even though it
>> is kinda obvious now you pointed it out.
> 
> well, I just realized that with replication/migration we are not doing a full
> guest snapshot anyway, so that should be irrelevant ;) it just leaves container
> backups in snapshot mode where we need to be careful to clean the env so that
> the next container's hookscript execution in a single vzdump job doesn't see
> wrong information.





end of thread, other threads:[~2022-12-21 12:57 UTC | newest]

Thread overview: 7+ messages
2022-12-12 13:43 [pve-devel] [PATCH v2 pve-guest-common/pve-docs] Add pre/post/failed-snapshot hooks Stefan Hanreich
2022-12-12 13:43 ` [pve-devel] [PATCH pve-docs 1/1] examples: add pre/post/failed-snapshot hooks to example hookscript Stefan Hanreich
2022-12-12 13:43 ` [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks Stefan Hanreich
2022-12-21 10:44   ` Fabian Grünbichler
2022-12-21 11:26     ` Stefan Hanreich
2022-12-21 12:41       ` Fabian Grünbichler
2022-12-21 12:57         ` Stefan Hanreich
