From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 433A9723DE
 for <pve-devel@lists.proxmox.com>; Mon, 12 Apr 2021 13:37:53 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 3A3CD1E286
 for <pve-devel@lists.proxmox.com>; Mon, 12 Apr 2021 13:37:23 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 073011E260
 for <pve-devel@lists.proxmox.com>; Mon, 12 Apr 2021 13:37:22 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id BEE19420B2
 for <pve-devel@lists.proxmox.com>; Mon, 12 Apr 2021 13:37:21 +0200 (CEST)
From: Fabian Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Mon, 12 Apr 2021 13:37:17 +0200
Message-Id: <20210412113717.25356-3-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210412113717.25356-1-f.ebner@proxmox.com>
References: <20210412113717.25356-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.007 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: [pve-devel] [RFC guest-common 3/3] fix #3111: replicate guest on
 rollback if there are replication jobs for it
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 12 Apr 2021 11:37:53 -0000

so that there will be a valid replication snapshot again.

Otherwise, replication will be broken after a rollback if the last
(non-replication) snapshot is removed before replication can run again:
without a common snapshot left on source and target, the next replication
run would require a full sync.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

I'm not a huge fan of this, but the alternatives I could come up with don't
seem much better IMHO:

1. Invalidate/remove the replicated volumes after a rollback altogether and
require a full sync on the next replication run.

2. Disallow removing the last non-replication snapshot if:
    * a replication job is configured
    * no replication snapshot for that job currently exists (which likely
      means it was removed by a previous rollback operation, but can also
      happen for a new job that hasn't run yet).
   A rough sketch of this check follows below.

3. Hope that not many people delete their snapshots immediately after a
rollback.

Pick a favorite or suggest your own ;)
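
To make alternative 2 a bit more concrete, a guard along the following lines
could run before snapshot removal. This is only a sketch, not a proposal for
the final code: check_for_existing_jobs() already exists, but
count_non_replication_snapshots() and has_any_replication_snapshot() are
hypothetical placeholders that would still need to be implemented.

    # Sketch for alternative 2 (not part of this patch). The two helpers
    # marked "hypothetical" below do not exist; only
    # check_for_existing_jobs() is an existing function.
    use PVE::ReplicationConfig;

    sub assert_snapshot_removal_allowed {
        my ($class, $storecfg, $vmid, $snapname) = @_;

        my $repl_conf = PVE::ReplicationConfig->new();

        # nothing to protect if no replication job is configured (noerr=1)
        return if !$repl_conf->check_for_existing_jobs($vmid, 1);

        # hypothetical: non-replication snapshots remaining after removal
        my $remaining = count_non_replication_snapshots($class, $vmid, $snapname);

        # hypothetical: does any replication snapshot currently exist?
        my $has_repl_snap = has_any_replication_snapshot($storecfg, $vmid);

        die "refusing to remove snapshot '$snapname': it is the last"
            . " non-replication snapshot and no replication snapshot exists\n"
            if $remaining == 0 && !$has_repl_snap;
    }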

 PVE/AbstractConfig.pm    | 19 +++++++++++++++++--
 PVE/ReplicationConfig.pm | 14 ++++++++++++++
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/PVE/AbstractConfig.pm b/PVE/AbstractConfig.pm
index c4d1d6c..b169b8b 100644
--- a/PVE/AbstractConfig.pm
+++ b/PVE/AbstractConfig.pm
@@ -951,6 +951,9 @@ sub snapshot_rollback {
 
     my $storecfg = PVE::Storage::config();
 
+    my $repl_conf = PVE::ReplicationConfig->new();
+    my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
+
     my $data = {};
 
     my $get_snapshot_config = sub {
@@ -972,7 +975,6 @@ sub snapshot_rollback {
 	$snap = $get_snapshot_config->($conf);
 
 	if ($prepare) {
-	    my $repl_conf = PVE::ReplicationConfig->new();
 	    if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
 		# remove replication snapshots on volumes affected by rollback *only*!
 		my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $snap, 1);
@@ -988,7 +990,6 @@ sub snapshot_rollback {
 		});
 
 		# remove all local replication snapshots (jobid => undef)
-		my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
 		PVE::Replication::prepare($storecfg, $volids, undef, 1, undef, $logfunc);
 	    }
 
@@ -1046,6 +1047,20 @@ sub snapshot_rollback {
 
     $prepare = 0;
     $class->lock_config($vmid, $updatefn);
+
+    my $replication_jobs = $repl_conf->list_guests_replication_jobs($vmid);
+    for my $job (@{$replication_jobs}) {
+	my $target = $job->{target};
+	$logfunc->("replicating rolled back guest to node '$target'");
+
+	my $start_time = time();
+	eval {
+	    PVE::Replication::run_replication($class, $job, $start_time, $start_time, $logfunc);
+	};
+	if (my $err = $@) {
+	    warn "unable to replicate rolled back guest to node '$target' - $err";
+	}
+    }
 }
 
 # bash completion helper
diff --git a/PVE/ReplicationConfig.pm b/PVE/ReplicationConfig.pm
index fd856a0..84a718f 100644
--- a/PVE/ReplicationConfig.pm
+++ b/PVE/ReplicationConfig.pm
@@ -228,6 +228,20 @@ sub find_local_replication_job {
     return undef;
 }
 
+sub list_guests_replication_jobs {
+    my ($cfg, $vmid) = @_;
+
+    my $jobs = [];
+
+    for my $job (values %{$cfg->{ids}}) {
+	next if $job->{type} ne 'local' || $job->{guest} != $vmid;
+
+	push @{$jobs}, $job;
+    }
+
+    return $jobs;
+}
+
 # makes old_target the new source for all local jobs of this guest
 # makes new_target the target for the single local job with target old_target
 sub switch_replication_job_target_nolock {
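
For illustration, a minimal standalone use of the new helper might look like
this. It is not part of the patch, vmid 100 is just an example value, and it
only relies on the 'target' field that the rollback code above also uses.

    # Illustration only: list the target nodes of a guest's local
    # replication jobs via the new helper.
    use PVE::ReplicationConfig;

    my $vmid = 100;
    my $repl_conf = PVE::ReplicationConfig->new();
    my $jobs = $repl_conf->list_guests_replication_jobs($vmid);

    for my $job (@{$jobs}) {
        print "guest $vmid has a local replication job to node '$job->{target}'\n";
    }
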
-- 
2.20.1