From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 45E1E76791
 for <pve-devel@lists.proxmox.com>; Tue, 19 Oct 2021 09:55:48 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 3907A29C75
 for <pve-devel@lists.proxmox.com>; Tue, 19 Oct 2021 09:55:17 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 7DA0F298F9
 for <pve-devel@lists.proxmox.com>; Tue, 19 Oct 2021 09:55:09 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id B000F468D2
 for <pve-devel@lists.proxmox.com>; Tue, 19 Oct 2021 09:55:04 +0200 (CEST)
From: Fabian Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Tue, 19 Oct 2021 09:54:55 +0200
Message-Id: <20211019075459.14328-7-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20211019075459.14328-1-f.ebner@proxmox.com>
References: <20211019075459.14328-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.267 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: [pve-devel] [PATCH guest-common 3/4] replication: find common
 snapshot: use additional information
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 19 Oct 2021 07:55:48 -0000

which is now available from the storage back-end.

The benefits are:

1. Ability to detect different snapshots even if they have the same
name. Rather hard to reach, but for example with:
Snapshots A -> B -> C -> __replicationXYZ
Remove B, rollback to C (causes __replicationXYZ to be removed),
create a new snapshot B. Previously, B was selected as replication
base, but it didn't match on source and target. Now, C is correctly
selected.
2. Smaller delta in some cases by not prefering replication snapshots
over config snapshots, but using the most recent possible one from
both types of snapshots.
3. Less code complexity for snapshot selection.

If the remote side is old, it's not possible to detect mismatch of
distinct snapshots with the same name, but the timestamps from the
local side can still be used.

Still limit to our snapshots (config and replication), because we
don't have guarantees for other snapshots (could be deleted in the
meantime, name might not fit import/export regex, etc.).

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/PVE/Replication.pm | 55 ++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 31 deletions(-)

diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index 4d4f62f..051dfd9 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -31,6 +31,7 @@ sub find_common_replication_snapshot {
     my ($ssh_info, $jobid, $vmid, $storecfg, $volumes, $storeid_list, $last_sync, $guest_conf, $logfunc) = @_;
 
     my $parent_snapname = $guest_conf->{parent};
+    my $conf_snapshots = $guest_conf->{snapshots};
 
     my $last_sync_snapname =
 	PVE::ReplicationState::replication_snapshot_name($jobid, $last_sync);
@@ -57,24 +58,35 @@ sub find_common_replication_snapshot {
     my $base_snapshots = {};
 
     foreach my $volid (@$volumes) {
-	if (defined($last_snapshots->{$volid}) && defined($remote_snapshots->{$volid})) {
-	    if ($last_snapshots->{$volid}->{$last_sync_snapname} &&
-		$remote_snapshots->{$volid}->{$last_sync_snapname}) {
+	my $local_info = $last_snapshots->{$volid};
+	my $remote_info = $remote_snapshots->{$volid};
+
+	if (defined($local_info) && defined($remote_info)) {
+	    my $common_snapshot = sub {
+		my ($snap) = @_;
+
+		return 0 if !$local_info->{$snap} || !$remote_info->{$snap};
+
+		# Check for ID if remote side supports it already.
+		return $local_info->{$snap}->{id} eq $remote_info->{$snap}->{id}
+		    if ref($remote_info->{$snap}) eq 'HASH';
+
+		return 1;
+	    };
+
+	    if ($common_snapshot->($last_sync_snapname)) {
 		$base_snapshots->{$volid} = $last_sync_snapname;
-	    } elsif (defined($parent_snapname) &&
-		     ($last_snapshots->{$volid}->{$parent_snapname} &&
-		      $remote_snapshots->{$volid}->{$parent_snapname})) {
+	    } elsif (defined($parent_snapname) && $common_snapshot->($parent_snapname)) {
 		$base_snapshots->{$volid} = $parent_snapname;
 	    } else {
-		# First, try all replication snapshots.
 		my $most_recent = [0, undef];
-		for my $remote_snap (keys $remote_snapshots->{$volid}->%*) {
-		    next if !defined($last_snapshots->{$volid}->{$remote_snap});
+		for my $snapshot (keys $local_info->%*) {
+		    next if !$common_snapshot->($snapshot);
+		    next if !$conf_snapshots->{$snapshot} && !is_replication_snapshot($snapshot);
 
-		    my $timestamp = ($remote_snap =~ /__replicate_\Q$vmid\E-(?:\d+)_(\d+)_/)[0];
-		    next if !$timestamp;
+		    my $timestamp = $local_info->{$snapshot}->{timestamp};
 
-		    $most_recent = [$timestamp, $remote_snap] if $timestamp > $most_recent->[0];
+		    $most_recent = [$timestamp, $snapshot] if $timestamp > $most_recent->[0];
 		}
 
 		if ($most_recent->[1]) {
@@ -82,25 +94,6 @@ sub find_common_replication_snapshot {
 		    next;
 		}
 
-		# Then, try config snapshots ($parent_snapname was already tested for above).
-		my $snapname = $parent_snapname // '';
-
-		# Be robust against loop, just in case.
-		# https://forum.proxmox.com/threads/snapshot-not-working.69511/post-312281
-		my $max_count = scalar(keys $guest_conf->{snapshots}->%*);
-		for (my $count = 0; $count < $max_count; $count++) {
-		    last if defined($base_snapshots->{$volid});
-
-		    my $snap_conf = $guest_conf->{snapshots}->{$snapname} || last;
-		    $snapname = $snap_conf->{parent} // last;
-
-		    if ($last_snapshots->{$volid}->{$snapname} &&
-			$remote_snapshots->{$volid}->{$snapname})
-		    {
-			$base_snapshots->{$volid} = $snapname;
-		    }
-		}
-
 		# The volume exists on the remote side, so trying a full sync won't work.
 		# Die early with a clean error.
 		die "No common base to restore the job state\n".
-- 
2.30.2