From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 745D49334F
 for <pve-devel@lists.proxmox.com>; Wed,  4 Jan 2023 11:50:58 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 549F31DD3A
 for <pve-devel@lists.proxmox.com>; Wed,  4 Jan 2023 11:50:58 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Wed,  4 Jan 2023 11:50:57 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 6EAA744E3B
 for <pve-devel@lists.proxmox.com>; Wed,  4 Jan 2023 11:50:57 +0100 (CET)
Message-ID: <dff207ed-4116-2010-1be0-d3b263469ea9@proxmox.com>
Date: Wed, 4 Jan 2023 11:50:38 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.5.0
Content-Language: en-US
To: pve-devel@lists.proxmox.com, c.heiss@proxmox.com
References: <20230102123633.2493599-1-c.heiss@proxmox.com>
 <20230102123633.2493599-3-c.heiss@proxmox.com>
From: Fiona Ebner <f.ebner@proxmox.com>
In-Reply-To: <20230102123633.2493599-3-c.heiss@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.880 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -1.708 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH storage] fix #4289: pbs: wait for backup
 verification to finish before updating volume attribute
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Wed, 04 Jan 2023 10:50:58 -0000

On 02.01.23 at 13:36, Christoph Heiss wrote:
> diff --git a/PVE/Storage/PBSPlugin.pm b/PVE/Storage/PBSPlugin.pm
> index 4320974..1cdbc11 100644
> --- a/PVE/Storage/PBSPlugin.pm
> +++ b/PVE/Storage/PBSPlugin.pm
> @@ -906,8 +906,30 @@ sub get_volume_attribute {
>      return;
>  }
> 
> +sub wait_for_verify_finish {
> +    my ($conn, $node, $datastore, $attrs) = @_;
> +
> +    my $param = {
> +	running => 'true',
> +	since => $attrs->{'backup-time'},
> +	store => $datastore,
> +	typefilter => 'verify',
> +    };
> +
> +    my $taskname = sprintf('%s:%s/%s/%X',
> +	$datastore,
> +        @{$attrs}{qw(backup-type backup-id backup-time)},
> +    );

I don't think it's likely that the task name format here will change
often, but as you already mentioned in the cover letter, it's not ideal
to have it hard-coded here.
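
One way to at least contain the hard-coding would be to keep the format
in a single small helper. A rough sketch, where the helper name
verify_worker_id() is made up for illustration and the format string is
simply the one from the patch above:

```perl
use strict;
use warnings;

# Hypothetical helper: builds the worker ID of a verify task for a snapshot.
# The format string is copied from the patch; it is an assumption that PBS
# keeps using it.
sub verify_worker_id {
    my ($datastore, $attrs) = @_;

    return sprintf(
        '%s:%s/%s/%X',
        $datastore,
        @{$attrs}{qw(backup-type backup-id backup-time)},
    );
}
```

If the format ever changes, only this one place would need adapting.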

> +
> +    while (1) {
> +	my $res = eval { $conn->get("/api2/json/nodes/$node/tasks", $param); };
> +	last if !grep { $_->{worker_id} eq $taskname } @$res;
> +	sleep(1);
> +    }
> +}
> +
> @@ -921,6 +943,9 @@ sub update_volume_attribute {
>  	my $conn = pbs_api_connect($scfg, $password);
>  	my $datastore = $scfg->{datastore};
> 
> +	$logfunc->('info', 'waiting for server to finish backup verification...') if $logfunc;

Should only be printed if there is actually a verification we need to
wait for.
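
Roughly what I have in mind, as an untested sketch: the message is
logged once, and only after a matching running task was actually seen.
Here $fetch_tasks is a hypothetical code reference standing in for the
$conn->get(...) call from the patch, and the $interval parameter is
likewise made up:

```perl
use strict;
use warnings;

# Sketch only: $fetch_tasks must return an array reference of running verify
# tasks, each a hash with a 'worker_id' key, like the task list in the patch.
sub wait_for_verify_finish {
    my ($fetch_tasks, $taskname, $logfunc, $interval) = @_;
    $interval //= 1;

    my $logged = 0;
    while (1) {
        # Errors from the callback propagate to the caller here, unlike the
        # eval in the patch. That is a choice made for this sketch.
        my $res = $fetch_tasks->();
        last if !grep { defined($_->{worker_id}) && $_->{worker_id} eq $taskname } @$res;

        # Log on the first iteration that actually found a verification.
        if (!$logged) {
            $logfunc->('info', 'waiting for server to finish backup verification...')
                if $logfunc;
            $logged = 1;
        }
        sleep($interval);
    }

    # True if we had to wait at all.
    return $logged;
}
```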

> +	wait_for_verify_finish($conn, $scfg->{server}, $datastore, $param);

To me, waiting on verification feels out of place in the rather
low-level update_volume_attribute(), because the wait is a rather
specific thing to do. I'd say it's fine to fail there when the snapshot
is locked by verification or some other operation.

Waiting for verification can also increase the backup duration, and
with it the time the vzdump lock is held on the PVE side, quite a bit.
It might not seem like a big deal, because usually only manual backups
use 'protected'. But by doing it in update_volume_attribute(), you also
do it for 'notes', where it's not needed and which is relevant to backup
jobs, where the increased wait might be very noticeable. So if it is
done in update_volume_attribute(), it should at least only be done for
'protected'.
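
As a minimal illustration of that restriction (the function name and
both callbacks are made up and stand in for the wait helper and the
actual API call; none of this is existing code):

```perl
use strict;
use warnings;

# Hypothetical dispatch: $wait and $update are code references standing in
# for the verification wait and the real attribute update.
sub update_attribute_with_optional_wait {
    my ($attribute, $value, $wait, $update) = @_;

    # 'notes' does not need the wait, so only 'protected' triggers it.
    $wait->() if $attribute eq 'protected';

    return $update->($attribute, $value);
}
```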

It would be better if the protected flag could already be specified
upon creation. I guess that would also fix the following race:
1. backup finishes
2. prune running on PBS
3. protected status set from PVE

If going for the waiting approach after all, I think it should rather be
done in vzdump, before calling update_volume_attribute(). The helper to
wait on verification should likely be part of PBSClient.pm (which would
first need to be taught to use an API connection).