public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication
@ 2021-08-12 11:00 Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 1/3] zfspool: add zfs_get_sorted_snapshot_list helper Fabian Ebner
                   ` (13 more replies)
  0 siblings, 14 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:00 UTC (permalink / raw)
  To: pve-devel

For more context, see also:
https://lists.proxmox.com/pipermail/pve-devel/2021-August/049694.html

Changes from v2:
    * Many new patches, as the approach is different:
      For one, only replication snapshots that are blocking rollback
      are removed. Second, consider more snapshot candidates when
      probing for an incremental replication base. Last, instead of
      directly running replication after rollback, prevent snapshot
      deletion if it might be the current incremental replication
      base.

Many thanks to Fabian G. for discussing those ideas with me!

All patches are new in v3, except guest-common patch #1, which
hasn't changed much, and would fix some issues already by itself.

I think nothing requires an explicit dependency bump, but some things,
like "remove only the real blockers", will only start working when all
the pieces are in place.


storage:

Fabian Ebner (3):
  zfspool: add zfs_get_sorted_snapshot_list helper
  zfspool: add blockers parameter to volume_snapshot_is_possible
  test: zfspool: extend some rollback is possible tests with new
    blockers parameter

 PVE/Storage.pm                 |  4 +--
 PVE/Storage/BTRFSPlugin.pm     |  2 +-
 PVE/Storage/Plugin.pm          |  5 ++-
 PVE/Storage/ZFSPoolPlugin.pm   | 65 +++++++++++++++-------------------
 test/run_test_zfspoolplugin.pl | 65 +++++++++++++++++++++++++---------
 5 files changed, 85 insertions(+), 56 deletions(-)


container:

Fabian Ebner (1):
  config: rollback is possible: add blockers parameter

 src/PVE/LXC/Config.pm | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


qemu-server:

Fabian Ebner (1):
  config: rollback is possible: add blockers parameter

 PVE/QemuConfig.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


guest-common:

Fabian Ebner (7):
  partially fix #3111: snapshot rollback: improve removing replication
    snapshots
  config: rollback: factor out helper for removing replication snapshots
  partially fix #3111: further improve removing replication snapshots
  replication: remove unused variable and style fixes
  replication: pass guest config to find_common_replication_snapshot
  partially fix #3111: replication: be less picky when selecting
    incremental base
  fix #3111 config: snapshot delete: check if replication still needs it

 src/PVE/AbstractConfig.pm    | 120 +++++++++++++++++++++++++++++++----
 src/PVE/Replication.pm       |  66 ++++++++++++++++----
 src/PVE/ReplicationConfig.pm |  14 +++++
 3 files changed, 177 insertions(+), 23 deletions(-)

-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 storage 1/3] zfspool: add zfs_get_sorted_snapshot_list helper
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 2/3] zfspool: add blockers parameter to volume_snapshot_is_possible Fabian Ebner
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

replacing the current zfs_get_latest_snapshot. For
volume_snapshot_list, ignore errors as before.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/Storage/ZFSPoolPlugin.pm | 44 +++++++++++-------------------------
 1 file changed, 13 insertions(+), 31 deletions(-)

diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
index c4be70f..c2a0385 100644
--- a/PVE/Storage/ZFSPoolPlugin.pm
+++ b/PVE/Storage/ZFSPoolPlugin.pm
@@ -395,25 +395,21 @@ sub zfs_list_zvol {
     return $list;
 }
 
-sub zfs_get_latest_snapshot {
-    my ($class, $scfg, $volname) = @_;
+sub zfs_get_sorted_snapshot_list {
+    my ($class, $scfg, $volname, $sort_params) = @_;
 
     my $vname = ($class->parse_volname($volname))[1];
 
-    # abort rollback if snapshot is not the latest
-    my @params = ('-t', 'snapshot', '-o', 'name', '-s', 'creation');
+    my @params = ('-H', '-t', 'snapshot', '-o', 'name', $sort_params->@*, "$scfg->{pool}\/$vname");
     my $text = $class->zfs_request($scfg, undef, 'list', @params);
     my @snapshots = split(/\n/, $text);
 
-    my $recentsnap;
-    foreach (@snapshots) {
-        if (/$scfg->{pool}\/$vname/) {
-            s/^.*@//;
-            $recentsnap = $_;
-        }
+    my $snap_names = [];
+    for my $snapshot (@snapshots) {
+	(my $snap_name = $snapshot) =~ s/^.*@//;
+	push $snap_names->@*, $snap_name;
     }
-
-    return $recentsnap;
+    return $snap_names;
 }
 
 sub status {
@@ -486,7 +482,10 @@ sub volume_snapshot_rollback {
 sub volume_rollback_is_possible {
     my ($class, $scfg, $storeid, $volname, $snap) = @_;
 
-    my $recentsnap = $class->zfs_get_latest_snapshot($scfg, $volname);
+    # can't use '-S creation', because zfs list won't reverse the order when the
+    # creation time is the same second, breaking at least our tests.
+    my $snapshots = $class->zfs_get_sorted_snapshot_list($scfg, $volname, ['-s', 'creation']);
+    my $recentsnap = $snapshots->[-1];
 
     die "can't rollback, no snapshots exist at all\n"
 	if !defined($recentsnap);
@@ -500,26 +499,9 @@ sub volume_rollback_is_possible {
 sub volume_snapshot_list {
     my ($class, $scfg, $storeid, $volname) = @_;
 
-    my ($vtype, $name, $vmid) = $class->parse_volname($volname);
-
-    my $zpath = "$scfg->{pool}/$name";
-
     my $snaps = [];
-
-    my $cmd = ['zfs', 'list', '-r', '-H', '-S', 'name', '-t', 'snap', '-o',
-	       'name', $zpath];
-
-    my $outfunc = sub {
-	my $line = shift;
-
-	if ($line =~ m/^\Q$zpath\E@(.*)$/) {
-	    push @$snaps, $1;
-	}
-    };
-
-    eval { run_command( [$cmd], outfunc => $outfunc , errfunc => sub{}); };
-
     # return an empty array if dataset does not exist.
+    eval { $snaps = $class->zfs_get_sorted_snapshot_list($scfg, $volname, ['-S', 'name']); };
     return $snaps;
 }
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 storage 2/3] zfspool: add blockers parameter to volume_snapshot_is_possible
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 1/3] zfspool: add zfs_get_sorted_snapshot_list helper Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-13  6:54   ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 3/3] test: zfspool: extend some rollback is possible tests with new blockers parameter Fabian Ebner
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

useful for rollback, so that only the required replication snapshots
can be removed, and it's possible to abort early without deleting any
replication snapshots if there are other non-replication snasphots
blocking rollback.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/Storage.pm               |  4 ++--
 PVE/Storage/BTRFSPlugin.pm   |  2 +-
 PVE/Storage/Plugin.pm        |  5 ++++-
 PVE/Storage/ZFSPoolPlugin.pm | 23 +++++++++++++++++------
 4 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index c04b5a2..3b3ce93 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -269,13 +269,13 @@ sub volume_resize {
 }
 
 sub volume_rollback_is_possible {
-    my ($cfg, $volid, $snap) = @_;
+    my ($cfg, $volid, $snap, $blockers) = @_;
 
     my ($storeid, $volname) = parse_volume_id($volid, 1);
     if ($storeid) {
         my $scfg = storage_config($cfg, $storeid);
         my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
-        return $plugin->volume_rollback_is_possible($scfg, $storeid, $volname, $snap);
+        return $plugin->volume_rollback_is_possible($scfg, $storeid, $volname, $snap, $blockers);
     } elsif ($volid =~ m|^(/.+)$| && -e $volid) {
         die "snapshot rollback file/device '$volid' is not possible\n";
     } else {
diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
index 411cab9..60fb027 100644
--- a/PVE/Storage/BTRFSPlugin.pm
+++ b/PVE/Storage/BTRFSPlugin.pm
@@ -513,7 +513,7 @@ sub volume_snapshot {
 }
 
 sub volume_rollback_is_possible {
-    my ($class, $scfg, $storeid, $volname, $snap) = @_; 
+    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
 
     return 1; 
 }
diff --git a/PVE/Storage/Plugin.pm b/PVE/Storage/Plugin.pm
index b1865cb..c265e53 100644
--- a/PVE/Storage/Plugin.pm
+++ b/PVE/Storage/Plugin.pm
@@ -900,8 +900,11 @@ sub volume_snapshot {
     return undef;
 }
 
+# Asserts that a rollback to $snap on $volname is possible.
+# If certain snapshots are preventing the rollback and $blockers is an array
+# reference, the snapshot names can be pushed onto $blockers prior to dying.
 sub volume_rollback_is_possible {
-    my ($class, $scfg, $storeid, $volname, $snap) = @_;
+    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
 
     return 1;
 }
diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
index c2a0385..8d79f38 100644
--- a/PVE/Storage/ZFSPoolPlugin.pm
+++ b/PVE/Storage/ZFSPoolPlugin.pm
@@ -480,18 +480,29 @@ sub volume_snapshot_rollback {
 }
 
 sub volume_rollback_is_possible {
-    my ($class, $scfg, $storeid, $volname, $snap) = @_;
+    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
 
     # can't use '-S creation', because zfs list won't reverse the order when the
     # creation time is the same second, breaking at least our tests.
     my $snapshots = $class->zfs_get_sorted_snapshot_list($scfg, $volname, ['-s', 'creation']);
-    my $recentsnap = $snapshots->[-1];
 
-    die "can't rollback, no snapshots exist at all\n"
-	if !defined($recentsnap);
+    my $found;
+    $blockers //= []; # not guaranteed to be set by caller
+    for my $snapshot ($snapshots->@*) {
+	if ($snapshot eq $snap) {
+	    $found = 1;
+	} elsif ($found) {
+	    push $blockers->@*, $snapshot;
+	}
+    }
+
+    my $volid = "${storeid}:${volname}";
+
+    die "can't rollback, snapshot '$snap' does not exist on '$volid'\n"
+	if !$found;
 
-    die "can't rollback, '$snap' is not most recent snapshot\n"
-	if $snap ne $recentsnap;
+    die "can't rollback, '$snap' is not most recent snapshot on '$volid'\n"
+	if scalar($blockers->@*) > 0;
 
     return 1;
 }
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 storage 3/3] test: zfspool: extend some rollback is possible tests with new blockers parameter
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 1/3] zfspool: add zfs_get_sorted_snapshot_list helper Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 2/3] zfspool: add blockers parameter to volume_snapshot_is_possible Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 container 1/1] config: rollback is possible: add " Fabian Ebner
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

and fix a few typos.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 test/run_test_zfspoolplugin.pl | 65 +++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 16 deletions(-)

diff --git a/test/run_test_zfspoolplugin.pl b/test/run_test_zfspoolplugin.pl
index 2f63f1b..095ccb3 100755
--- a/test/run_test_zfspoolplugin.pl
+++ b/test/run_test_zfspoolplugin.pl
@@ -1362,10 +1362,21 @@ my $test6 = sub {
 
     eval {
 	PVE::Storage::volume_snapshot($cfg, "$storagename:$vmdisk", 'snap1');
-	if ( 1 !=
-	     PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$vmdisk", 'snap1')) {
+
+	my $blockers = [];
+	my $res = PVE::Storage::volume_rollback_is_possible(
+	    $cfg,
+	    "$storagename:$vmdisk",
+	    'snap1',
+	    $blockers,
+	);
+	if ($res != 1) {
+	    $count++;
+	    warn "Test6 a: Rollback should be possible";
+	}
+	if (scalar($blockers->@*) != 0) {
 	    $count++;
-	    warn "Test6 a: Rollback sould possible"
+	    warn "Test6 a: 'blockers' should be empty";
 	}
     };
     if ($@) {
@@ -1378,7 +1389,7 @@ my $test6 = sub {
 	if ( 1 !=
 	     PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$vmbase", 'snap1')) {
 	    $count++;
-	    warn "Test6 b: Rollback sould possible"
+	    warn "Test6 b: Rollback should be possible";
 	}
     };
     if ($@) {
@@ -1391,7 +1402,7 @@ my $test6 = sub {
 	if ( 1 !=
 	     PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$vmbase\/$vmlinked", 'snap1')) {
 	    $count++;
-	    warn "Test6 c: Rollback sould possible"
+	    warn "Test6 c: Rollback should be possible";
 	}
     };
     if ($@) {
@@ -1404,7 +1415,7 @@ my $test6 = sub {
 	if ( 1 !=
 	     PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$ctdisk", 'snap1')) {
 	    $count++;
-	    warn "Test6 d: Rollback sould possible"
+	    warn "Test6 d: Rollback should be possible";
 	}
     };
     if ($@) {
@@ -1417,7 +1428,7 @@ my $test6 = sub {
 	if ( 1 !=
 	     PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$ctbase", 'snap1')) {
 	    $count++;
-	    warn "Test6 e: Rollback sould possible"
+	    warn "Test6 e: Rollback should be possible";
 	}
     };
     if ($@) {
@@ -1430,7 +1441,7 @@ my $test6 = sub {
 	if ( 1 !=
 	     PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$ctbase\/$ctlinked", 'snap1')) {
 	    $count++;
-	    warn "Test6 f: Rollback sould possible"
+	    warn "Test6 f: Rollback should be possible";
 	}
     };
     if ($@) {
@@ -1438,23 +1449,45 @@ my $test6 = sub {
 	warn "Test6 f: $@";
     }
 
+    my $blockers = [];
     eval {
 	PVE::Storage::volume_snapshot($cfg, "$storagename:$vmdisk", 'snap2');
-	PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$vmdisk", 'snap1');
+	PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$vmdisk", 'snap1', $blockers);
     };
     if (!$@) {
 	$count++;
-	warn "Test6 g: Rollback should not possible";
+	warn "Test6 g: Rollback should not be possible";
+    } elsif (scalar($blockers->@*) != 1 || $blockers->[0] ne 'snap2') {
+	$count++;
+	warn "Test6 g: 'blockers' should be ['snap2']";
     }
+    undef $blockers;
 
+    $blockers = [];
     eval {
 	PVE::Storage::volume_snapshot($cfg, "$storagename:$vmbase", 'snap2');
-	PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$vmbase", 'snap1');
+	PVE::Storage::volume_snapshot($cfg, "$storagename:$vmbase", 'snap3');
+	PVE::Storage::volume_rollback_is_possible($cfg, "$storagename:$vmbase", 'snap1', $blockers);
     };
     if (!$@) {
 	$count++;
-	warn "Test6 h: Rollback should not possible";
+	warn "Test6 h: Rollback should not be possible";
+    } else {
+	if (scalar($blockers->@*) != 2) {
+	    $count++;
+	    warn "Test6 g: 'blockers' should contain two elements";
+	}
+	my $blockers_hash = { map { $_ => 1 } $blockers->@* };
+	if (!$blockers_hash->{'snap2'}) {
+	    $count++;
+	    warn "Test6 g: 'blockers' should contain 'snap2'";
+	}
+	if (!$blockers_hash->{'snap3'}) {
+	    $count++;
+	    warn "Test6 g: 'blockers' should contain 'snap3'";
+	}
     }
+    undef $blockers;
 
     eval {
 	PVE::Storage::volume_snapshot($cfg, "$storagename:$vmlinked", 'snap2');
@@ -1462,7 +1495,7 @@ my $test6 = sub {
     };
     if (!$@) {
 	$count++;
-	warn "Test6 j: Rollback should not possible";
+	warn "Test6 j: Rollback should not be possible";
     }
 
     eval {
@@ -1471,7 +1504,7 @@ my $test6 = sub {
     };
     if (!$@) {
 	$count++;
-	warn "Test6 k: Rollback should not possible";
+	warn "Test6 k: Rollback should not be possible";
     }
 
     eval {
@@ -1480,7 +1513,7 @@ my $test6 = sub {
     };
     if (!$@) {
 	$count++;
-	warn "Test6 l: Rollback should not possible";
+	warn "Test6 l: Rollback should not be possible";
     }
 
     eval {
@@ -1489,7 +1522,7 @@ my $test6 = sub {
     };
     if (!$@) {
 	$count++;
-	warn "Test6 m: Rollback should not possible";
+	warn "Test6 m: Rollback should not be possible";
     }
 };
 $tests->{6} = $test6;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 container 1/1] config: rollback is possible: add blockers parameter
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (2 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 3/3] test: zfspool: extend some rollback is possible tests with new blockers parameter Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 qemu-server " Fabian Ebner
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/PVE/LXC/Config.pm | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 8557e4c..814cbad 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -214,10 +214,15 @@ sub __snapshot_delete_vol_snapshot {
 }
 
 sub __snapshot_rollback_vol_possible {
-    my ($class, $mountpoint, $snapname) = @_;
+    my ($class, $mountpoint, $snapname, $blockers) = @_;
 
     my $storecfg = PVE::Storage::config();
-    PVE::Storage::volume_rollback_is_possible($storecfg, $mountpoint->{volume}, $snapname);
+    PVE::Storage::volume_rollback_is_possible(
+	$storecfg,
+	$mountpoint->{volume},
+	$snapname,
+	$blockers,
+    );
 }
 
 sub __snapshot_rollback_vol_rollback {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 qemu-server 1/1] config: rollback is possible: add blockers parameter
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (3 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 container 1/1] config: rollback is possible: add " Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 1/7] partially fix #3111: snapshot rollback: improve removing replication snapshots Fabian Ebner
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/QemuConfig.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index 7ee8876..b993378 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -430,14 +430,14 @@ sub __snapshot_rollback_hook {
 }
 
 sub __snapshot_rollback_vol_possible {
-    my ($class, $drive, $snapname) = @_;
+    my ($class, $drive, $snapname, $blockers) = @_;
 
     return if PVE::QemuServer::drive_is_cdrom($drive);
 
     my $storecfg = PVE::Storage::config();
     my $volid = $drive->{file};
 
-    PVE::Storage::volume_rollback_is_possible($storecfg, $volid, $snapname);
+    PVE::Storage::volume_rollback_is_possible($storecfg, $volid, $snapname, $blockers);
 }
 
 sub __snapshot_rollback_vol_rollback {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 guest-common 1/7] partially fix #3111: snapshot rollback: improve removing replication snapshots
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (4 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 qemu-server " Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 2/7] config: rollback: factor out helper for " Fabian Ebner
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

Get the replicatable volumes from the snapshot config rather than the
current config. And filter those volumes further to those that will
actually be rolled back.

Previously, a volume that only had replication snapshots (e.g. because
it was added after the snapshot was taken, or the vmstate volume)
would lose them.  Then, on the next replication run, such a volume
would lead to an error, because replication tried to do a full sync,
but the target volume still exists.

This is not a complete fix. It is still possible to run into problems:
- by removing the last (non-replication) snapshots after a rollback
  before replication can run once.
- by creating a snapshot and making a rollback before replication can
  run once.

The list of volumes is not required to be sorted for prepare(), but it
is sorted by how foreach_volume() iterates now, so not random.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

Changes from v2:
    * mention other possible way to run into issue in commit message
    * remove redundant comment

 src/PVE/AbstractConfig.pm | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index 3348d8a..493bf97 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -974,13 +974,22 @@ sub snapshot_rollback {
 	if ($prepare) {
 	    my $repl_conf = PVE::ReplicationConfig->new();
 	    if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
-		# remove all replication snapshots
-		my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $conf, 1);
-		my $sorted_volids = [ sort keys %$volumes ];
+		my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $snap, 1);
+
+		# filter by what we actually iterate over below (excludes vmstate!)
+		my $volids = [];
+		$class->foreach_volume($snap, sub {
+		    my ($vs, $volume) = @_;
+
+		    my $volid_key = $class->volid_key();
+		    my $volid = $volume->{$volid_key};
+
+		    push @{$volids}, $volid if $volumes->{$volid};
+		});
 
 		# remove all local replication snapshots (jobid => undef)
 		my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
-		PVE::Replication::prepare($storecfg, $sorted_volids, undef, 1, undef, $logfunc);
+		PVE::Replication::prepare($storecfg, $volids, undef, 1, undef, $logfunc);
 	    }
 
 	    $class->foreach_volume($snap, sub {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 guest-common 2/7] config: rollback: factor out helper for removing replication snapshots
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (5 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 1/7] partially fix #3111: snapshot rollback: improve removing replication snapshots Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 3/7] partially fix #3111: further improve " Fabian Ebner
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index 493bf97..0d1c7ca 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -943,14 +943,39 @@ sub snapshot_delete {
     });
 }
 
+# Remove replication snapshots to make a rollback possible.
+my $rollback_remove_replication_snapshots = sub {
+    my ($class, $vmid, $snap) = @_;
+
+    my $storecfg = PVE::Storage::config();
+
+    my $repl_conf = PVE::ReplicationConfig->new();
+    return if !$repl_conf->check_for_existing_jobs($vmid, 1);
+
+    my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $snap, 1);
+
+    # filter by what we actually iterate over below (excludes vmstate!)
+    my $volids = [];
+    $class->foreach_volume($snap, sub {
+	my ($vs, $volume) = @_;
+
+	my $volid_key = $class->volid_key();
+	my $volid = $volume->{$volid_key};
+
+	push @{$volids}, $volid if $volumes->{$volid};
+    });
+
+    # remove all local replication snapshots (jobid => undef)
+    my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
+    PVE::Replication::prepare($storecfg, $volids, undef, 1, undef, $logfunc);
+};
+
 # Rolls back to a given snapshot.
 sub snapshot_rollback {
     my ($class, $vmid, $snapname) = @_;
 
     my $prepare = 1;
 
-    my $storecfg = PVE::Storage::config();
-
     my $data = {};
 
     my $get_snapshot_config = sub {
@@ -972,25 +997,7 @@ sub snapshot_rollback {
 	$snap = $get_snapshot_config->($conf);
 
 	if ($prepare) {
-	    my $repl_conf = PVE::ReplicationConfig->new();
-	    if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
-		my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $snap, 1);
-
-		# filter by what we actually iterate over below (excludes vmstate!)
-		my $volids = [];
-		$class->foreach_volume($snap, sub {
-		    my ($vs, $volume) = @_;
-
-		    my $volid_key = $class->volid_key();
-		    my $volid = $volume->{$volid_key};
-
-		    push @{$volids}, $volid if $volumes->{$volid};
-		});
-
-		# remove all local replication snapshots (jobid => undef)
-		my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
-		PVE::Replication::prepare($storecfg, $volids, undef, 1, undef, $logfunc);
-	    }
+	    $rollback_remove_replication_snapshots->($class, $vmid, $snap);
 
 	    $class->foreach_volume($snap, sub {
 	       my ($vs, $volume) = @_;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 guest-common 3/7] partially fix #3111: further improve removing replication snapshots
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (6 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 2/7] config: rollback: factor out helper for " Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 4/7] replication: remove unused variable and style fixes Fabian Ebner
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

by using the new $blocker parameter. No longer remove all replication
snapshots from affected volumes unconditionally, but check first if
all blocking snapshots are replication snapshots. If they are, remove
them and proceed with rollback. If they are not, die without removing
any.

For backwards compatibility, it's still necessary to remove all
replication snapshots if $blockers is not available.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/PVE/AbstractConfig.pm | 47 +++++++++++++++++++++++++++++++++++----
 src/PVE/Replication.pm    |  6 +++++
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index 0d1c7ca..a5a15bf 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -945,7 +945,7 @@ sub snapshot_delete {
 
 # Remove replication snapshots to make a rollback possible.
 my $rollback_remove_replication_snapshots = sub {
-    my ($class, $vmid, $snap) = @_;
+    my ($class, $vmid, $snap, $snapname) = @_;
 
     my $storecfg = PVE::Storage::config();
 
@@ -954,17 +954,56 @@ my $rollback_remove_replication_snapshots = sub {
 
     my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $snap, 1);
 
-    # filter by what we actually iterate over below (excludes vmstate!)
+    # For these, all replication snapshots need to be removed for backwards compatibility.
     my $volids = [];
+
+    # For these, we know more and can remove only the required replication snapshots.
+    my $blocking_snapshots = {};
+
+    # filter by what we actually iterate over below (excludes vmstate!)
     $class->foreach_volume($snap, sub {
 	my ($vs, $volume) = @_;
 
 	my $volid_key = $class->volid_key();
 	my $volid = $volume->{$volid_key};
 
-	push @{$volids}, $volid if $volumes->{$volid};
+	return if !$volumes->{$volid};
+
+	my $blockers = [];
+	eval { $class->__snapshot_rollback_vol_possible($volume, $snapname, $blockers); };
+	if (my $err = $@) {
+	    # FIXME die instead, once $blockers is required by the storage plugin API
+	    # and the guest plugins are required to be new enough to support it too.
+	    # Currently, it's not possible to distinguish between blockers being empty
+	    # because the plugin is old versus because there is a different error.
+	    if (scalar($blockers->@*) == 0) {
+		push @{$volids}, $volid;
+		return;
+	    }
+
+	    for my $blocker ($blockers->@*) {
+		die $err if !PVE::Replication::is_replication_snapshot($blocker);
+	    }
+
+	    $blocking_snapshots->{$volid} = $blockers;
+	}
     });
 
+    my $removed_repl_snapshot;
+    for my $volid (sort keys $blocking_snapshots->%*) {
+	my $blockers = $blocking_snapshots->{$volid};
+
+	for my $blocker ($blockers->@*) {
+	    warn "WARN: removing replication snapshot '$volid\@$blocker'\n";
+	    $removed_repl_snapshot = 1;
+	    eval { PVE::Storage::volume_snapshot_delete($storecfg, $volid, $blocker); };
+	    die $@ if $@;
+	}
+    }
+    warn "WARN: you shouldn't remove '$snapname' before running the next replication!\n"
+	if $removed_repl_snapshot;
+
+    # Need to keep using a hammer for backwards compatibility...
     # remove all local replication snapshots (jobid => undef)
     my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
     PVE::Replication::prepare($storecfg, $volids, undef, 1, undef, $logfunc);
@@ -997,7 +1036,7 @@ sub snapshot_rollback {
 	$snap = $get_snapshot_config->($conf);
 
 	if ($prepare) {
-	    $rollback_remove_replication_snapshots->($class, $vmid, $snap);
+	    $rollback_remove_replication_snapshots->($class, $vmid, $snap, $snapname);
 
 	    $class->foreach_volume($snap, sub {
 	       my ($vs, $volume) = @_;
diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index ae0f145..3da79be 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -435,4 +435,10 @@ sub run_replication {
     return $volumes;
 }
 
+sub is_replication_snapshot {
+    my ($snapshot_name) = @_;
+
+    return $snapshot_name =~ m/^__replicate_/ ? 1 : 0;
+}
+
 1;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 guest-common 4/7] replication: remove unused variable and style fixes
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (7 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 3/7] partially fix #3111: further improve " Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 5/7] replication: pass guest config to find_common_replication_snapshot Fabian Ebner
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/PVE/Replication.pm | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index 3da79be..83fc642 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -36,18 +36,25 @@ sub find_common_replication_snapshot {
     # test if we have a replication_ snapshot from last sync
     # and remove all other/stale replication snapshots
 
-    my $last_snapshots = prepare(
-	$storecfg, $volumes, $jobid, $last_sync, $parent_snapname, $logfunc);
+    my $last_snapshots =
+	prepare($storecfg, $volumes, $jobid, $last_sync, $parent_snapname, $logfunc);
 
     # prepare remote side
     my $remote_snapshots = remote_prepare_local_job(
-	$ssh_info, $jobid, $vmid, $volumes, $storeid_list, $last_sync, $parent_snapname, 0, $logfunc);
+	$ssh_info,
+	$jobid,
+	$vmid,
+	$volumes,
+	$storeid_list,
+	$last_sync,
+	$parent_snapname,
+	0,
+	$logfunc,
+    );
 
     my $base_snapshots = {};
 
     foreach my $volid (@$volumes) {
-	my $base_snapname;
-
 	if (defined($last_snapshots->{$volid}) && defined($remote_snapshots->{$volid})) {
 	    if ($last_snapshots->{$volid}->{$last_sync_snapname} &&
 		$remote_snapshots->{$volid}->{$last_sync_snapname}) {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 guest-common 5/7] replication: pass guest config to find_common_replication_snapshot
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (8 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 4/7] replication: remove unused variable and style fixes Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 6/7] partially fix #3111: replication: be less picky when selecting incremental base Fabian Ebner
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

in preparation to iterate over all config snapshots when necessary.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/PVE/Replication.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index 83fc642..4056ea2 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -28,7 +28,9 @@ sub get_log_time {
 # Find common base replication snapshot, available on local and remote side.
 # Note: this also removes stale replication snapshots
 sub find_common_replication_snapshot {
-    my ($ssh_info, $jobid, $vmid, $storecfg, $volumes, $storeid_list, $last_sync, $parent_snapname, $logfunc) = @_;
+    my ($ssh_info, $jobid, $vmid, $storecfg, $volumes, $storeid_list, $last_sync, $guest_conf, $logfunc) = @_;
+
+    my $parent_snapname = $guest_conf->{parent};
 
     my $last_sync_snapname =
 	PVE::ReplicationState::replication_snapshot_name($jobid, $last_sync);
@@ -270,10 +272,8 @@ sub replicate {
 
     my $ssh_info = PVE::SSHInfo::get_ssh_info($jobcfg->{target}, $migration_network);
 
-    my $parent_snapname = $conf->{parent};
-
     my ($base_snapshots, $last_snapshots, $last_sync_snapname) = find_common_replication_snapshot(
-	$ssh_info, $jobid, $vmid, $storecfg, $sorted_volids, $state->{storeid_list}, $last_sync, $parent_snapname, $logfunc);
+	$ssh_info, $jobid, $vmid, $storecfg, $sorted_volids, $state->{storeid_list}, $last_sync, $conf, $logfunc);
 
     my $storeid_hash = {};
     foreach my $volid (@$sorted_volids) {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [PATCH v3 guest-common 6/7] partially fix #3111: replication: be less picky when selecting incremental base
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (9 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 5/7] replication: pass guest config to find_common_replication_snapshot Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-08-12 11:01 ` [pve-devel] [RFC v3 guest-common 7/7] fix #3111: config: snapshot delete: check if replication still needs it Fabian Ebner
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

After rollback, it might be necessary to start the replication from an
earlier, possibly non-replication, snapshot, because the replication
snapshot might have been removed from the source node. Previously,
replication could only recover in case the current parent snapshot was
already replicated.

To get into the bad situation (with no replication happening between
the steps):
1. have existing replication
2. take new snapshot
3. rollback to that snapshot
In case the partial fix to only remove blocking replication snapshots
for rollback was already applied, an additional step is necessary to
get into the bad situation:
4. take a second new snapshot

Since non-replication snapshots are now also included, where no
timestamp is readily available, it is necessary to filter them out
when probing for replication snapshots.

If no common replication snapshot is present, iterate backwards
through the config snapshots.

The changes are backwards compatible:

If one side is running the old code, and the other the new code,
the fact that one of the two prepare() calls does not return the
new additional snapshot candidates, means that no new match is
possible. Since the new prepare() returns a superset, no previously
possible match is now impossible.

The branch with @desc_sorted_snap is now taken more often, but
it can still only be taken when the volume exists on the remote side
(and has snapshots). In such cases, it is safe to die if no
incremental base snapshot can be found, because a full sync would not
be possible.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

$parent_snapname could now slowly be dropped from prepare(), but I'll
save that for later (it'll take at least until 8.x because of
backwards compatibility anyways).

 src/PVE/Replication.pm | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index 4056ea2..2609ad6 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -65,10 +65,12 @@ sub find_common_replication_snapshot {
 		     ($last_snapshots->{$volid}->{$parent_snapname} &&
 		      $remote_snapshots->{$volid}->{$parent_snapname})) {
 		$base_snapshots->{$volid} = $parent_snapname;
-	    } elsif ($last_sync == 0) {
+	    } else {
+		# First, try all replication snapshots.
 		my @desc_sorted_snap =
 		    map { $_->[1] } sort { $b->[0] <=> $a->[0] }
-		    map { [ ($_ =~ /__replicate_\Q$jobid\E_(\d+)_/)[0] || 0, $_ ] }
+		    grep { $_->[0] != 0 } # only consider replication snapshots
+		    map { [ ($_ =~ /__replicate_\Q$vmid\E-(?:\d+)_(\d+)_/)[0] || 0, $_ ] }
 		    keys %{$remote_snapshots->{$volid}};
 
 		foreach my $remote_snap (@desc_sorted_snap) {
@@ -77,6 +79,28 @@ sub find_common_replication_snapshot {
 			last;
 		    }
 		}
+
+		# Then, try config snapshots ($parent_snapname was already tested for above).
+		my $snapname = $parent_snapname // '';
+
+		# Be robust against loop, just in case.
+		# https://forum.proxmox.com/threads/snapshot-not-working.69511/post-312281
+		my $max_count = scalar(keys $guest_conf->{snapshots}->%*);
+		for (my $count = 0; $count < $max_count; $count++) {
+		    last if defined($base_snapshots->{$volid});
+
+		    my $snap_conf = $guest_conf->{snapshots}->{$snapname} || last;
+		    $snapname = $snap_conf->{parent} // last;
+
+		    if ($last_snapshots->{$volid}->{$snapname} &&
+			$remote_snapshots->{$volid}->{$snapname})
+		    {
+			$base_snapshots->{$volid} = $snapname;
+		    }
+		}
+
+		# The volume exists on the remote side, so trying a full sync won't work.
+		# Die early with a clean error.
 		die "No common base to restore the job state\n".
 		    "please delete jobid: $jobid and create the job again\n"
 		    if !defined($base_snapshots->{$volid});
@@ -182,6 +206,9 @@ sub prepare {
 		} else {
 		    $last_snapshots->{$volid}->{$snap} = 1;
 		}
+	    # Other snapshots might need to serve as replication base after rollback
+	    } else {
+		$last_snapshots->{$volid}->{$snap} = 1;
 	    }
 	}
     }
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] [RFC v3 guest-common 7/7] fix #3111: config: snapshot delete: check if replication still needs it
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (10 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 6/7] partially fix #3111: replication: be less picky when selecting incremental base Fabian Ebner
@ 2021-08-12 11:01 ` Fabian Ebner
  2021-10-06  6:59 ` [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
  2021-11-09 16:49 ` [pve-devel] applied-series: " Fabian Grünbichler
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-12 11:01 UTC (permalink / raw)
  To: pve-devel

and abort if it does and --force is not specified.

After rollback, the rollback snapshot might still be needed as the
base for incremental replication, because rollback removes (blocking)
replication snapshots.

It's not enough to limit the check to the most recent snapshot,
because new snapshots might've been created between rollback and
remove.

It's not enough to limit the check to snapshots without a parent (i.e.
in case of ZFS, the oldest), because some volumes might've been added
only after that, meaning the oldest snapshot is not an incremental
replication base for them.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

Sent as RFC, because I feel like this is quite a bit of code just to
prevent a corner case that's now already warned about upon rollback.
Arguably the warning in the UI is not very visible, but improving that
by either using the new task warnings or showing the task viewer upon
rollback is an alternative that might be preferable.

 src/PVE/AbstractConfig.pm    | 41 ++++++++++++++++++++++++++++++++++++
 src/PVE/Replication.pm       |  6 +++++-
 src/PVE/ReplicationConfig.pm | 14 ++++++++++++
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index a5a15bf..39f1cc8 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -824,6 +824,44 @@ sub snapshot_create {
     $class->__snapshot_commit($vmid, $snapname);
 }
 
+# Check if the snapshot might still be needed by a replication job.
+my $snapshot_delete_assert_not_needed_by_replication = sub {
+    my ($class, $vmid, $conf, $snap, $snapname) = @_;
+
+    my $repl_conf = PVE::ReplicationConfig->new();
+    return if !$repl_conf->check_for_existing_jobs($vmid, 1);
+
+    my $storecfg = PVE::Storage::config();
+
+    # Current config's volumes are relevant for replication.
+    my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $conf, 1);
+
+    my $replication_jobs = $repl_conf->list_guests_local_replication_jobs($vmid);
+
+    $class->foreach_volume($snap, sub {
+	my ($vs, $volume) = @_;
+
+	my $volid_key = $class->volid_key();
+	my $volid = $volume->{$volid_key};
+
+	return if !$volumes->{$volid};
+
+	my $snapshots = PVE::Storage::volume_snapshot_list($storecfg, $volid);
+
+	for my $job ($replication_jobs->@*) {
+	    my $jobid = $job->{id};
+
+	    my @jobs_snapshots = grep {
+		PVE::Replication::is_replication_snapshot($_, $jobid)
+	    } $snapshots->@*;
+
+	    next if scalar(@jobs_snapshots) > 0;
+
+	    die "snapshot '$snapname' needed by replication job '$jobid' - run replication first\n";
+	}
+    });
+};
+
 # Deletes a snapshot.
 # Note: $drivehash is only set when called from snapshot_create.
 sub snapshot_delete {
@@ -838,6 +876,9 @@ sub snapshot_delete {
 
     die "snapshot '$snapname' does not exist\n" if !defined($snap);
 
+    $snapshot_delete_assert_not_needed_by_replication->($class, $vmid, $conf, $snap, $snapname)
+	if !$drivehash && !$force;
+
     $class->set_lock($vmid, 'snapshot-delete')
 	if (!$drivehash); # doesn't already have a 'snapshot' lock
 
diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index 2609ad6..098ac00 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -470,7 +470,11 @@ sub run_replication {
 }
 
 sub is_replication_snapshot {
-    my ($snapshot_name) = @_;
+    my ($snapshot_name, $jobid) = @_;
+
+    if (defined($jobid)) {
+	return $snapshot_name =~ m/^__replicate_\Q$jobid\E/ ? 1 : 0;
+    }
 
     return $snapshot_name =~ m/^__replicate_/ ? 1 : 0;
 }
diff --git a/src/PVE/ReplicationConfig.pm b/src/PVE/ReplicationConfig.pm
index fd856a0..78f55bb 100644
--- a/src/PVE/ReplicationConfig.pm
+++ b/src/PVE/ReplicationConfig.pm
@@ -228,6 +228,20 @@ sub find_local_replication_job {
     return undef;
 }
 
+sub list_guests_local_replication_jobs {
+    my ($cfg, $vmid) = @_;
+
+    my $jobs = [];
+
+    for my $job (values %{$cfg->{ids}}) {
+	next if $job->{type} ne 'local' || $job->{guest} != $vmid;
+
+	push @{$jobs}, $job;
+    }
+
+    return $jobs;
+}
+
 # makes old_target the new source for all local jobs of this guest
 # makes new_target the target for the single local job with target old_target
 sub switch_replication_job_target_nolock {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pve-devel] [PATCH v3 storage 2/3] zfspool: add blockers parameter to volume_snapshot_is_possible
  2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 2/3] zfspool: add blockers parameter to volume_snapshot_is_possible Fabian Ebner
@ 2021-08-13  6:54   ` Fabian Ebner
  0 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-08-13  6:54 UTC (permalink / raw)
  To: pve-devel

Am 12.08.21 um 13:01 schrieb Fabian Ebner:
> useful for rollback, so that only the required replication snapshots
> can be removed, and it's possible to abort early without deleting any
> replication snapshots if there are other non-replication snasphots
> blocking rollback.
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---

Forgot to mention that this requires an APIVER+APIAGE bump.

>   PVE/Storage.pm               |  4 ++--
>   PVE/Storage/BTRFSPlugin.pm   |  2 +-
>   PVE/Storage/Plugin.pm        |  5 ++++-
>   PVE/Storage/ZFSPoolPlugin.pm | 23 +++++++++++++++++------
>   4 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index c04b5a2..3b3ce93 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -269,13 +269,13 @@ sub volume_resize {
>   }
>   
>   sub volume_rollback_is_possible {
> -    my ($cfg, $volid, $snap) = @_;
> +    my ($cfg, $volid, $snap, $blockers) = @_;
>   
>       my ($storeid, $volname) = parse_volume_id($volid, 1);
>       if ($storeid) {
>           my $scfg = storage_config($cfg, $storeid);
>           my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
> -        return $plugin->volume_rollback_is_possible($scfg, $storeid, $volname, $snap);
> +        return $plugin->volume_rollback_is_possible($scfg, $storeid, $volname, $snap, $blockers);
>       } elsif ($volid =~ m|^(/.+)$| && -e $volid) {
>           die "snapshot rollback file/device '$volid' is not possible\n";
>       } else {
> diff --git a/PVE/Storage/BTRFSPlugin.pm b/PVE/Storage/BTRFSPlugin.pm
> index 411cab9..60fb027 100644
> --- a/PVE/Storage/BTRFSPlugin.pm
> +++ b/PVE/Storage/BTRFSPlugin.pm
> @@ -513,7 +513,7 @@ sub volume_snapshot {
>   }
>   
>   sub volume_rollback_is_possible {
> -    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>   
>       return 1;
>   }
> diff --git a/PVE/Storage/Plugin.pm b/PVE/Storage/Plugin.pm
> index b1865cb..c265e53 100644
> --- a/PVE/Storage/Plugin.pm
> +++ b/PVE/Storage/Plugin.pm
> @@ -900,8 +900,11 @@ sub volume_snapshot {
>       return undef;
>   }
>   
> +# Asserts that a rollback to $snap on $volname is possible.
> +# If certain snapshots are preventing the rollback and $blockers is an array
> +# reference, the snapshot names can be pushed onto $blockers prior to dying.
>   sub volume_rollback_is_possible {
> -    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>   
>       return 1;
>   }
> diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
> index c2a0385..8d79f38 100644
> --- a/PVE/Storage/ZFSPoolPlugin.pm
> +++ b/PVE/Storage/ZFSPoolPlugin.pm
> @@ -480,18 +480,29 @@ sub volume_snapshot_rollback {
>   }
>   
>   sub volume_rollback_is_possible {
> -    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>   
>       # can't use '-S creation', because zfs list won't reverse the order when the
>       # creation time is the same second, breaking at least our tests.
>       my $snapshots = $class->zfs_get_sorted_snapshot_list($scfg, $volname, ['-s', 'creation']);
> -    my $recentsnap = $snapshots->[-1];
>   
> -    die "can't rollback, no snapshots exist at all\n"
> -	if !defined($recentsnap);
> +    my $found;
> +    $blockers //= []; # not guaranteed to be set by caller
> +    for my $snapshot ($snapshots->@*) {
> +	if ($snapshot eq $snap) {
> +	    $found = 1;
> +	} elsif ($found) {
> +	    push $blockers->@*, $snapshot;
> +	}
> +    }
> +
> +    my $volid = "${storeid}:${volname}";
> +
> +    die "can't rollback, snapshot '$snap' does not exist on '$volid'\n"
> +	if !$found;
>   
> -    die "can't rollback, '$snap' is not most recent snapshot\n"
> -	if $snap ne $recentsnap;
> +    die "can't rollback, '$snap' is not most recent snapshot on '$volid'\n"
> +	if scalar($blockers->@*) > 0;
>   
>       return 1;
>   }
> 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (11 preceding siblings ...)
  2021-08-12 11:01 ` [pve-devel] [RFC v3 guest-common 7/7] fix #3111: config: snapshot delete: check if replication still needs it Fabian Ebner
@ 2021-10-06  6:59 ` Fabian Ebner
  2021-11-09 16:49 ` [pve-devel] applied-series: " Fabian Grünbichler
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Ebner @ 2021-10-06  6:59 UTC (permalink / raw)
  To: pve-devel

Any feedback for this?

Am 12.08.21 um 13:00 schrieb Fabian Ebner:
> For more context, see also:
> https://lists.proxmox.com/pipermail/pve-devel/2021-August/049694.html
> 
> Changes from v2:
>      * Many new patches, as the approach is different:
>        For one, only replication snapshots that are blocking rollback
>        are removed. Second, consider more snapshot candidates when
>        probing for an incremental replication base. Last, instead of
>        directly running replication after rollback, prevent snapshot
>        deletion if it might be the current incremental replication
>        base.
> 
> Many thanks to Fabian G. for discussing those ideas with me!
> 
> All patches are new in v3, except guest-common patch #1, which
> hasn't changed much, and would fix some issues already by itself.
> 
> I think nothing requires an explicit dependency bump, but some things,
> like "remove only the real blockers", will only start working when all
> the pieces are in place.
> 
> 
> storage:
> 
> Fabian Ebner (3):
>    zfspool: add zfs_get_sorted_snapshot_list helper
>    zfspool: add blockers parameter to volume_snapshot_is_possible
>    test: zfspool: extend some rollback is possible tests with new
>      blockers parameter
> 
>   PVE/Storage.pm                 |  4 +--
>   PVE/Storage/BTRFSPlugin.pm     |  2 +-
>   PVE/Storage/Plugin.pm          |  5 ++-
>   PVE/Storage/ZFSPoolPlugin.pm   | 65 +++++++++++++++-------------------
>   test/run_test_zfspoolplugin.pl | 65 +++++++++++++++++++++++++---------
>   5 files changed, 85 insertions(+), 56 deletions(-)
> 
> 
> container:
> 
> Fabian Ebner (1):
>    config: rollback is possible: add blockers parameter
> 
>   src/PVE/LXC/Config.pm | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> 
> qemu-server:
> 
> Fabian Ebner (1):
>    config: rollback is possible: add blockers parameter
> 
>   PVE/QemuConfig.pm | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 
> guest-common:
> 
> Fabian Ebner (7):
>    partially fix #3111: snapshot rollback: improve removing replication
>      snapshots
>    config: rollback: factor out helper for removing replication snapshots
>    partially fix #3111: further improve removing replication snapshots
>    replication: remove unused variable and style fixes
>    replication: pass guest config to find_common_replication_snapshot
>    partially fix #3111: replication: be less picky when selecting
>      incremental base
>    fix #3111 config: snapshot delete: check if replication still needs it
> 
>   src/PVE/AbstractConfig.pm    | 120 +++++++++++++++++++++++++++++++----
>   src/PVE/Replication.pm       |  66 ++++++++++++++++----
>   src/PVE/ReplicationConfig.pm |  14 +++++
>   3 files changed, 177 insertions(+), 23 deletions(-)
> 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pve-devel] applied-series: [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication
  2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
                   ` (12 preceding siblings ...)
  2021-10-06  6:59 ` [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
@ 2021-11-09 16:49 ` Fabian Grünbichler
  13 siblings, 0 replies; 16+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 16:49 UTC (permalink / raw)
  To: Proxmox VE development discussion

On August 12, 2021 1:00 pm, Fabian Ebner wrote:
> For more context, see also:
> https://lists.proxmox.com/pipermail/pve-devel/2021-August/049694.html
> 
> Changes from v2:
>     * Many new patches, as the approach is different:
>       For one, only replication snapshots that are blocking rollback
>       are removed. Second, consider more snapshot candidates when
>       probing for an incremental replication base. Last, instead of
>       directly running replication after rollback, prevent snapshot
>       deletion if it might be the current incremental replication
>       base.
> 
> Many thanks to Fabian G. for discussing those ideas with me!
> 
> All patches are new in v3, except guest-common patch #1, which
> hasn't changed much, and would fix some issues already by itself.
> 
> I think nothing requires an explicit dependency bump, but some things,
> like "remove only the real blockers", will only start working when all
> the pieces are in place.
> 
> 
> storage:
> 
> Fabian Ebner (3):
>   zfspool: add zfs_get_sorted_snapshot_list helper
>   zfspool: add blockers parameter to volume_snapshot_is_possible
>   test: zfspool: extend some rollback is possible tests with new
>     blockers parameter
> 
>  PVE/Storage.pm                 |  4 +--
>  PVE/Storage/BTRFSPlugin.pm     |  2 +-
>  PVE/Storage/Plugin.pm          |  5 ++-
>  PVE/Storage/ZFSPoolPlugin.pm   | 65 +++++++++++++++-------------------
>  test/run_test_zfspoolplugin.pl | 65 +++++++++++++++++++++++++---------
>  5 files changed, 85 insertions(+), 56 deletions(-)
> 
> 
> container:
> 
> Fabian Ebner (1):
>   config: rollback is possible: add blockers parameter
> 
>  src/PVE/LXC/Config.pm | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> 
> qemu-server:
> 
> Fabian Ebner (1):
>   config: rollback is possible: add blockers parameter
> 
>  PVE/QemuConfig.pm | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 
> guest-common:
> 
> Fabian Ebner (7):
>   partially fix #3111: snapshot rollback: improve removing replication
>     snapshots
>   config: rollback: factor out helper for removing replication snapshots
>   partially fix #3111: further improve removing replication snapshots
>   replication: remove unused variable and style fixes
>   replication: pass guest config to find_common_replication_snapshot
>   partially fix #3111: replication: be less picky when selecting
>     incremental base
>   fix #3111 config: snapshot delete: check if replication still needs it
> 
>  src/PVE/AbstractConfig.pm    | 120 +++++++++++++++++++++++++++++++----
>  src/PVE/Replication.pm       |  66 ++++++++++++++++----
>  src/PVE/ReplicationConfig.pm |  14 +++++
>  3 files changed, 177 insertions(+), 23 deletions(-)
> 
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-11-09 16:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-12 11:00 [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 1/3] zfspool: add zfs_get_sorted_snapshot_list helper Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 2/3] zfspool: add blockers parameter to volume_snapshot_is_possible Fabian Ebner
2021-08-13  6:54   ` Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 storage 3/3] test: zfspool: extend some rollback is possible tests with new blockers parameter Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 container 1/1] config: rollback is possible: add " Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 qemu-server " Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 1/7] partially fix #3111: snapshot rollback: improve removing replication snapshots Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 2/7] config: rollback: factor out helper for " Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 3/7] partially fix #3111: further improve " Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 4/7] replication: remove unused variable and style fixes Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 5/7] replication: pass guest config to find_common_replication_snapshot Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [PATCH v3 guest-common 6/7] partially fix #3111: replication: be less picky when selecting incremental base Fabian Ebner
2021-08-12 11:01 ` [pve-devel] [RFC v3 guest-common 7/7] fix #3111: config: snapshot delete: check if replication still needs it Fabian Ebner
2021-10-06  6:59 ` [pve-devel] [PATCH-SERIES v3] fix #3111: fix interaction of snapshot operations with replication Fabian Ebner
2021-11-09 16:49 ` [pve-devel] applied-series: " Fabian Grünbichler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal