Date: Wed, 21 Dec 2022 11:44:34 +0100
From: Fabian Grünbichler
To: Proxmox VE development discussion
References: <20221212134312.257662-1-s.hanreich@proxmox.com>
 <20221212134312.257662-3-s.hanreich@proxmox.com>
In-Reply-To: <20221212134312.257662-3-s.hanreich@proxmox.com>
Subject: Re: [pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks

this is v2, right? ;)

On December 12, 2022 2:43 pm, Stefan Hanreich wrote:
> This commit adds hooks to the snapshotting process, which can be used
> to run additional setup scripts to prepare the VM for snapshotting.
>
> Examples for use cases include:
> * forcing processes to flush their writes
> * blocking processes from writing
> * altering the configuration of the VM to make snapshotting possible
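as a small aside, a hookscript covering these use cases could then look
roughly like the sketch below - this assumes the new phases keep the usual
two-argument (vmid, phase) calling convention of the existing
pre-start/post-start/.. phases, and the quiesce/resume commands are just
made-up placeholders:

    #!/usr/bin/perl

    use strict;
    use warnings;

    # hookscripts get the guest ID and the phase as arguments
    my ($vmid, $phase) = @ARGV;

    if ($phase eq 'pre-snapshot') {
        # tell whatever runs in/around the guest to flush and pause writes -
        # a non-zero exit fails the hook and aborts the snapshot
        system('/usr/local/bin/myapp-quiesce', $vmid) == 0
            or die "quiescing guest $vmid failed\n";
    } elsif ($phase eq 'post-snapshot' || $phase eq 'failed-snapshot') {
        # undo whatever pre-snapshot changed, on success as well as on failure
        system('/usr/local/bin/myapp-resume', $vmid) == 0
            or warn "resuming writes for guest $vmid failed\n";
    }

    exit(0);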
>
> The prepare step has been split into two parts, so the configuration
> can be locked a bit earlier during the snapshotting process. Doing it
> this way ensures that the configuration is already locked during the
> pre-snapshot hook. Because of this split, the VM config gets written
> in two stages now, rather than one.
>
> In case of failure during the preparation step - after the lock is
> written - error handling has been added so the lock gets released
> properly. The failed-snapshot hook runs when the snapshot fails, if
> the pre-snapshot hook ran already. This enables users to revert any
> changes done during the pre-snapshot hookscript.

see below

> The preparation step assumes that the hook does not convert the
> current VM into a template, which is why the basic checks are not
> re-run after the pre-snapshot hook. The storage check runs after the
> pre-snapshot hook, because the hook might get used to setup the
> storage for snapshotting. If the hook would run after the storage
> checks, this becomes impossible.
>
> cfs_update() gets called after every invocation of a hookscript, since
> it is impossible to know which changes get made by the hookscript.
> Doing this ensures that we see the updated state of the CFS after the
> hookscript got invoked.
>
> Signed-off-by: Stefan Hanreich
> ---
>  src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
> index a0c0bc6..3bff600 100644
> --- a/src/PVE/AbstractConfig.pm
> +++ b/src/PVE/AbstractConfig.pm
> @@ -710,8 +710,7 @@ sub __snapshot_prepare {
>
>      my $snap;
>
> -    my $updatefn = sub {
> -
> +    my $run_checks = sub {
>         my $conf = $class->load_config($vmid);
>
>         die "you can't take a snapshot if it's a template\n"
> @@ -721,15 +720,21 @@ sub __snapshot_prepare {
>
>         $conf->{lock} = 'snapshot';
>
> -        my $snapshots = $conf->{snapshots};
> -
>         die "snapshot name '$snapname' already used\n"
> -            if defined($snapshots->{$snapname});
> +            if defined($conf->{snapshots}->{$snapname});
> +
> +        $class->write_config($vmid, $conf);
> +    };
>
> +    my $updatefn = sub {
> +        my $conf = $class->load_config($vmid);
>         my $storecfg = PVE::Storage::config();
> +
>         die "snapshot feature is not available\n"
>             if !$class->has_feature('snapshot', $conf, $storecfg, undef, undef, $snapname eq 'vzdump');
>
> +        my $snapshots = $conf->{snapshots};
> +
>         for my $snap (sort keys %$snapshots) {
>             my $parent_name = $snapshots->{$snap}->{parent} // '';
>             if ($snapname eq $parent_name) {
> @@ -753,7 +758,32 @@ sub __snapshot_prepare {
>         $class->write_config($vmid, $conf);
>     };
>
> -    $class->lock_config($vmid, $updatefn);
> +    $class->lock_config($vmid, $run_checks);
> +
> +    eval {
> +        my $conf = $class->load_config($vmid);
> +        PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
> +    };
> +    my $err = $@;
> +
> +    PVE::Cluster::cfs_update();
> +
> +    if ($err) {
> +        $class->remove_lock($vmid, 'snapshot');
> +        die $err;
> +    }
> +

I wonder if we don't also want to call the 'failed-snapshot' phase when just
the pre-snapshot invocation failed? might be possible to combine the error
handling then, although I am not sure it makes it more readable if combined..

> +
> +    if (my $err = $@) {
> +        my $conf = $class->load_config($vmid);
> +        PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");

this exec_hookscript needs to be inside an eval {}, with warn in case it
fails..
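i.e., roughly like this (just a sketch based on the hunk above):

    if (my $err = $@) {
        my $conf = $class->load_config($vmid);

        # the cleanup hook must not be able to abort the cleanup itself,
        # so just warn if it fails instead of dying
        eval { PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot") };
        warn "failed-snapshot hook failed - $@" if $@;

        PVE::Cluster::cfs_update();

        $class->remove_lock($vmid, 'snapshot');
        die $err;
    }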
also, this call here happens when preparing for making the snapshot, after
possibly saving the VM state, but before taking the volume snapshots..

> +        PVE::Cluster::cfs_update();
> +
> +        $class->remove_lock($vmid, 'snapshot');
> +        die $err;
> +    }
>
>     return $snap;
> }
> @@ -837,11 +867,18 @@ sub snapshot_create {
>
>     if ($err) {
>         warn "snapshot create failed: starting cleanup\n";
> +
> +        PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");

eval + warn as well

this call here happens when the volume snapshots might or might not have been
created already (depending on what exactly the error cause is).

> +        PVE::Cluster::cfs_update();
> +
>         eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
>         warn "$@" if $@;
>         die "$err\n";
>     }
>
> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");

and here we have a similar issue (no eval), what should happen if
post-snapshot fails?

A die immediately (very likely wrong, current)
B eval + warn but proceed with commit (possibly leaving leftover hook changes
around)
C eval + warn, call failed-snapshot but proceed with commit (gives the
hookscript a chance to cleanup, but how does it differentiate between the
different failed-snapshot call sites?)
D eval + delete snapshot (seems suboptimal)
E eval + call failed-snapshot + delete snapshot (same, and also the issue of
the hookscript being able to know what's going on again)

B and C seem most sensible to me, but C adds to the issue of "missing
failed-snapshot context", depending on what the hookscript is doing..

one way to pass information is via the environment, we do that for the
migration case already (setting PVE_MIGRATED_FROM, so that the
pre-start/post-start hookscript can know the start happens in a migration
context, and where to (possibly) find the guest config..

for example, we could set PVE_SNAPSHOT_PHASE here, and have
prepare/commit/post as sub-phases, or even pass a list of volumes already
snapshotted (or created, in case of vmstate), or .. (see the sketch at the
bottom of this mail)

obviously setting the environment is only allowed in a forked worker context,
else it would affect the next API endpoint handled by the
pveproxy/pvedaemon/.. process, so it might be worth double-checking and
cleaning up to avoid side-effects with replication/migration/.. if we go down
that route..

> +    PVE::Cluster::cfs_update();
> +
>     $class->__snapshot_commit($vmid, $snapname);
> }
>
> --
> 2.30.2
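to make the environment idea above a bit more concrete, a rough sketch - note
that PVE_SNAPSHOT_PHASE and the 'commit' sub-phase are just made-up examples,
and that this assumes we are running inside a forked worker (same constraint
as the existing PVE_MIGRATED_FROM handling):

    # wrapper around exec_hookscript that exposes the sub-phase to the script
    my $run_snapshot_hook = sub {
        my ($class, $vmid, $phase, $subphase) = @_;

        # 'local' restores the previous value when the sub returns, so the
        # worker's environment does not leak into unrelated code paths
        local $ENV{PVE_SNAPSHOT_PHASE} = $subphase;

        my $conf = $class->load_config($vmid);
        eval { PVE::GuestHelpers::exec_hookscript($conf, $vmid, $phase) };
        warn "$phase hook failed - $@" if $@;
    };

    # e.g. in snapshot_create's error path:
    # $run_snapshot_hook->($class, $vmid, 'failed-snapshot', 'commit');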