From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <s.hanreich@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id A25859148
 for <pve-devel@lists.proxmox.com>; Thu, 17 Nov 2022 12:27:36 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 81B5F2B7DA
 for <pve-devel@lists.proxmox.com>; Thu, 17 Nov 2022 12:27:36 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Thu, 17 Nov 2022 12:27:35 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id C2A8E43E03
 for <pve-devel@lists.proxmox.com>; Thu, 17 Nov 2022 12:27:34 +0100 (CET)
Message-ID: <5aab529e-d71f-ca87-10da-abe586f22eb8@proxmox.com>
Date: Thu, 17 Nov 2022 12:27:33 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.0
From: Stefan Hanreich <s.hanreich@proxmox.com>
To: Fiona Ebner <f.ebner@proxmox.com>, pve-devel@lists.proxmox.com
References: <20220922115421.1406405-1-s.hanreich@proxmox.com>
 <20220922115421.1406405-3-s.hanreich@proxmox.com>
 <b6cd4181-8476-69a4-5f47-e16bea643e58@proxmox.com>
Content-Language: en-US
In-Reply-To: <b6cd4181-8476-69a4-5f47-e16bea643e58@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results: =?UTF-8?Q?0=0A=09?=AWL 0.567 Adjusted
 score from AWL reputation of From: =?UTF-8?Q?address=0A=09?=BAYES_00 -1.9
 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict
 =?UTF-8?Q?Alignment=0A=09?=NICE_REPLY_A -0.001 Looks like a legit reply (A)
 SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF
 =?UTF-8?Q?Record=0A=09?=SPF_PASS -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH pve-guest-common 1/1] add pre/post-snapshot
 hooks
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 17 Nov 2022 11:27:36 -0000


On 11/14/22 09:51, Fiona Ebner wrote:
> Am 22.09.22 um 13:54 schrieb Stefan Hanreich:
>> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
>> ---
>>
> 
> Should there be a third hook that's called when the snapshot fails? That
> would allow doing cleanup in all cases. Could still be added later when
> actually requested by users of course.
> 
> What are the common use cases you have in mind for these hooks?
> 

Quoting from the user in #2530:
Imagine hooks for pre- and post-snapshot too, it will helps with making 
consistent backups, clones, replication greatly, reduces recovery time 
after firing HA or backup restore… If one will be able to ensure that 
your databases and other stuff oflloads data correctly before snapshoting.

Apparently they want to use it to force DBs and similar applications to 
flush all their writes and maybe block new writes in the meanwhile.
Adding a failed-snapshot hook might be a good idea then, because in 
those cases one would still have to clean up some stuff that has been 
done in the pre-snapshot hook...

>> I have opted to include the snapshot hooks in the abstract class, since this
>> seemed like the more straightforward way.
>> The other option would have been duplicating the calls of the hooks into the
>> respective Backends, but I couldn't see any upsides to this.
>>
>> This hook runs before the preparation steps, since otherwise in case of a
>> failing pre-snapshot hook the VM/CT is left in a locked state, which I would
>> need to clean up, which would add unnecessary complexity imo.
>>
>> Otoh, there is a case to be made that the hook should only run after we checked
>> every precondition and are as certain as we can be that the snapshot will be
>> successfully created. This would be more convenient from a user's pov.
>> This can be particularly convenient as it would avoid errors with user scripts
>> that are not idempotent. Although those would still fail in case of a failure
>> during the snapshotting process.
> 
> Calling the hook script only after setting the lock in the config would
> add protection against other modifications happening at the same time.
> Stupid example: if we add another such hook outside a lock, and both
> that and the 'pre-snapshot' hook would modify the config, they could
> interfere when happening right after another. But it doesn't need to be
> another hook script, the config modification could also come from
> somewhere else.
> 
> Although this approach does requires users to pass the --skiplock option
> to modify the config if using our API/CLI. And we need to repeat the
> checks after calling the hook script, because they might not apply any
> more ;)
>

Yes, from this POV it's probably a smarter idea to do it this way. Not 
sure if we can trust the users to properly lock the config file in their 
hook scripts when editing the config. Definitely the safer option. 
Especially if you think about the case that the hookscript config option 
could also change sometime inbetween. That could lead to some weird 
behaviour.

I can actually see some use-cases for editing the config with regards to 
PCI passthrough. One could script removing / adding PCI-E devices in 
order to be able to take snapshots. Together with the restore hooks, one 
could even try scripting automatically detaching/attaching PCI-E devices 
when snapshotting/restoring. Not sure if this is practical though, 
because of the issues it could cause.

>>
>> What do you think would be the better solution?
>>
>>   src/PVE/AbstractConfig.pm | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
>> index a0c0bc6..8052fde 100644
>> --- a/src/PVE/AbstractConfig.pm
>> +++ b/src/PVE/AbstractConfig.pm
>> @@ -799,11 +799,15 @@ sub __snapshot_activate_storages {
>>   sub snapshot_create {
>>       my ($class, $vmid, $snapname, $save_vmstate, $comment) = @_;
>>   
>> +    my $conf = $class->load_config($vmid);
>> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
>> +
>>       my $snap = $class->__snapshot_prepare($vmid, $snapname, $save_vmstate, $comment);
>>   
>>       $save_vmstate = 0 if !$snap->{vmstate};
>>   
>> -    my $conf = $class->load_config($vmid);
>> +    # reload config after changes in snapshot preparation step
> 
> I think there should be a cfs_update() call?

Yes, definitely. If I understand correctly load_config uses a cache to 
get already loaded configurations instead of hitting the disk everytime? 
That's why this is required?

> 
>> +    $conf = $class->load_config($vmid);
>>   
>>       my ($running, $freezefs) = $class->__snapshot_check_freeze_needed($vmid, $conf, $snap->{vmstate});
>>   
>> @@ -843,6 +847,8 @@ sub snapshot_create {
>>       }
>>   
>>       $class->__snapshot_commit($vmid, $snapname);
>> +
>> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");
>>   }
>>   
>>   # Check if the snapshot might still be needed by a replication job.