From: Fabian Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH v3 guest-common 6/7] partially fix #3111: replication: be less picky when selecting incremental base
Date: Thu, 12 Aug 2021 13:01:10 +0200	[thread overview]
Message-ID: <20210812110111.73883-12-f.ebner@proxmox.com> (raw)
In-Reply-To: <20210812110111.73883-1-f.ebner@proxmox.com>

After rollback, it might be necessary to start the replication from an
earlier, possibly non-replication, snapshot, because the replication
snapshot might have been removed from the source node. Previously,
replication could only recover in case the current parent snapshot was
already replicated.

To get into the bad situation (with no replication happening between
the steps):
1. have existing replication
2. take new snapshot
3. rollback to that snapshot
In case the partial fix to only remove blocking replication snapshots
for rollback was already applied, an additional step is necessary to
get into the bad situation:
4. take a second new snapshot

Since non-replication snapshots, for which no timestamp is readily
available, are now also included, it is necessary to filter them out
when probing for replication snapshots.
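The filtering and newest-first ordering of replication snapshots can be
sketched as follows. This is an illustrative Python re-implementation of
the map/grep/sort pipeline in the diff below, not the actual Perl code;
the snapshot-name format is inferred from the regex in the patch, and
the example names are made up:

```python
import re

def replication_snapshots_desc(snapshot_names, vmid):
    # Keep only replication snapshots, i.e. names embedding a timestamp
    # in the __replicate_<vmid>-<jobnum>_<timestamp>_ format. Other
    # snapshots parse to timestamp 0 and are dropped, mirroring the
    # "grep { $_->[0] != 0 }" in the patch.
    parsed = []
    for name in snapshot_names:
        m = re.search(r'__replicate_%s-(?:\d+)_(\d+)_' % re.escape(str(vmid)), name)
        ts = int(m.group(1)) if m else 0
        if ts != 0:  # only consider replication snapshots
            parsed.append((ts, name))
    # Newest first, so the most recent common snapshot wins.
    return [name for ts, name in sorted(parsed, reverse=True)]
```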

If no common replication snapshot is present, iterate backwards
through the config snapshots.
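The backwards walk through the config snapshots can be sketched like
this. Again an illustrative Python version of the loop added in the
patch, with hypothetical argument names; the iteration bound guards
against a corrupted parent chain forming a loop, mirroring the
$max_count guard below:

```python
def find_config_base(snapshots_conf, start, local_snaps, remote_snaps):
    # Walk the parent chain backwards from `start` (which was already
    # tested by the caller) and return the first snapshot present on
    # both sides, or None. Bounded by the number of config snapshots
    # to be robust against a loop in the parent pointers.
    snapname = start or ''
    for _ in range(len(snapshots_conf)):
        conf = snapshots_conf.get(snapname)
        if conf is None:
            return None
        snapname = conf.get('parent')
        if snapname is None:
            return None
        if snapname in local_snaps and snapname in remote_snaps:
            return snapname
    return None
```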

The changes are backwards compatible:

If one side is running the old code and the other the new code, the
fact that one of the two prepare() calls does not return the new
additional snapshot candidates means that no new match is possible.
Since the new prepare() returns a superset, no previously possible
match becomes impossible.

The branch with @desc_sorted_snap is now taken more often, but
it can still only be taken when the volume exists on the remote side
(and has snapshots). In such cases, it is safe to die if no
incremental base snapshot can be found, because a full sync would not
be possible.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

$parent_snapname could now slowly be dropped from prepare(), but I'll
save that for later (it'll take at least until 8.x because of
backwards compatibility anyway).

 src/PVE/Replication.pm | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index 4056ea2..2609ad6 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -65,10 +65,12 @@ sub find_common_replication_snapshot {
 		     ($last_snapshots->{$volid}->{$parent_snapname} &&
 		      $remote_snapshots->{$volid}->{$parent_snapname})) {
 		$base_snapshots->{$volid} = $parent_snapname;
-	    } elsif ($last_sync == 0) {
+	    } else {
+		# First, try all replication snapshots.
 		my @desc_sorted_snap =
 		    map { $_->[1] } sort { $b->[0] <=> $a->[0] }
-		    map { [ ($_ =~ /__replicate_\Q$jobid\E_(\d+)_/)[0] || 0, $_ ] }
+		    grep { $_->[0] != 0 } # only consider replication snapshots
+		    map { [ ($_ =~ /__replicate_\Q$vmid\E-(?:\d+)_(\d+)_/)[0] || 0, $_ ] }
 		    keys %{$remote_snapshots->{$volid}};
 
 		foreach my $remote_snap (@desc_sorted_snap) {
@@ -77,6 +79,28 @@ sub find_common_replication_snapshot {
 			last;
 		    }
 		}
+
+		# Then, try config snapshots ($parent_snapname was already tested for above).
+		my $snapname = $parent_snapname // '';
+
+		# Be robust against loop, just in case.
+		# https://forum.proxmox.com/threads/snapshot-not-working.69511/post-312281
+		my $max_count = scalar(keys $guest_conf->{snapshots}->%*);
+		for (my $count = 0; $count < $max_count; $count++) {
+		    last if defined($base_snapshots->{$volid});
+
+		    my $snap_conf = $guest_conf->{snapshots}->{$snapname} || last;
+		    $snapname = $snap_conf->{parent} // last;
+
+		    if ($last_snapshots->{$volid}->{$snapname} &&
+			$remote_snapshots->{$volid}->{$snapname})
+		    {
+			$base_snapshots->{$volid} = $snapname;
+		    }
+		}
+
+		# The volume exists on the remote side, so trying a full sync won't work.
+		# Die early with a clean error.
 		die "No common base to restore the job state\n".
 		    "please delete jobid: $jobid and create the job again\n"
 		    if !defined($base_snapshots->{$volid});
@@ -182,6 +206,9 @@ sub prepare {
 		} else {
 		    $last_snapshots->{$volid}->{$snap} = 1;
 		}
+	    # Other snapshots might need to serve as replication base after rollback
+	    } else {
+		$last_snapshots->{$volid}->{$snap} = 1;
 	    }
 	}
     }
-- 
2.30.2
