From: Fabian Grünbichler
To: Proxmox VE development discussion
Date: Tue, 22 Jun 2021 09:41:35 +0200
Subject: Re: [pve-devel] [PATCH v2 guest-common 2/2] fix 3111: replicate guest on rollback if there are replication jobs for it
Message-Id: <1624346775.lqzf5lnz7c.astroid@nora.none>
In-Reply-To: <20210609091858.27219-2-f.ebner@proxmox.com>
References: <20210609091858.27219-1-f.ebner@proxmox.com> <20210609091858.27219-2-f.ebner@proxmox.com>

On June 9, 2021 11:18 am, Fabian Ebner wrote:
> so that there will be a valid replication snapshot again.
> 
> Otherwise, replication will be broken after a rollback if the last
> (non-replication) snapshot is removed before replication can run again.

I still see issues with these two patches applied..

A: snapshot 'test2', successful replication afterwards, then rollback to 'test2':

delete stale replication snapshot '__replicate_102-1_1624346101__' on local-zfs:vm-102-disk-0
delete stale replication snapshot '__replicate_102-0_1624346407__' on local-zfs:vm-102-disk-0
replicating rolled back guest to node 'pve64test02'
start replication job
guest => VM 102, running => 0
volumes => local-zfs:vm-102-disk-0
create snapshot '__replicate_102-0_1624346600__' on local-zfs:vm-102-disk-0
using secure transmission, rate limit: none
incremental sync 'local-zfs:vm-102-disk-0' (test2 => __replicate_102-0_1624346600__)
send from @test2 to rpool/data/vm-102-disk-0@__replicate_102-0_1624346600__ estimated size is 624B
total estimated size is 624B
successfully imported 'local-zfs:vm-102-disk-0'
delete previous replication snapshot '__replicate_102-0_1624346407__' on local-zfs:vm-102-disk-0
zfs error: could not find any snapshots to destroy; check snapshot names.
end replication job
replicating rolled back guest to node 'pve64test03'
start replication job
guest => VM 102, running => 0
volumes => local-zfs:vm-102-disk-0
create snapshot '__replicate_102-1_1624346602__' on local-zfs:vm-102-disk-0
using secure transmission, rate limit: none
incremental sync 'local-zfs:vm-102-disk-0' (test2 => __replicate_102-1_1624346602__)
send from @test2 to rpool/data/vm-102-disk-0@__replicate_102-0_1624346600__ estimated size is 624B
send from @__replicate_102-0_1624346600__ to rpool/data/vm-102-disk-0@__replicate_102-1_1624346602__ estimated size is 624B
total estimated size is 1.22K
successfully imported 'local-zfs:vm-102-disk-0'
delete previous replication snapshot '__replicate_102-1_1624346101__' on local-zfs:vm-102-disk-0
zfs error: could not find any snapshots to destroy; check snapshot names.
end replication job

two misleading errors from attempting to delete already-cleaned up
snapshots (this is just confusing, likely caused by the state file being
outdated after prepare(), replication is still working as expected
afterwards)
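
just to illustrate the kind of guard that would avoid the misleading
errors in A - the sub name, its parameters and the $existing_snapshots
hash below are made up for this sketch, it is not existing
PVE::Replication code:

    use strict;
    use warnings;

    use PVE::Storage;

    # only destroy the previous replication snapshot recorded in the job
    # state if it actually still exists - after a rollback, prepare() may
    # already have removed it, and trying again just surfaces
    # "could not find any snapshots to destroy"
    sub cleanup_previous_repl_snapshot {
        my ($storecfg, $volid, $prev_snap, $existing_snapshots, $logfunc) = @_;

        return if !defined($prev_snap);

        if ($existing_snapshots->{$prev_snap}) {
            PVE::Storage::volume_snapshot_delete($storecfg, $volid, $prev_snap);
        } else {
            $logfunc->("previous replication snapshot '$prev_snap' already removed, skipping cleanup");
        }
    }

updating the state file so it no longer references the removed snapshot
(see II further down) would address the same thing at the source.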

B: successful replication, snapshot 'test3', rollback to 'test3'

delete stale replication snapshot '__replicate_102-1_1624346602__' on local-zfs:vm-102-disk-0
delete stale replication snapshot '__replicate_102-0_1624346600__' on local-zfs:vm-102-disk-0
replicating rolled back guest to node 'pve64test02'
start replication job
guest => VM 102, running => 0
volumes => local-zfs:vm-102-disk-0
create snapshot '__replicate_102-0_1624346683__' on local-zfs:vm-102-disk-0
using secure transmission, rate limit: none
full sync 'local-zfs:vm-102-disk-0' (__replicate_102-0_1624346683__)
full send of rpool/data/vm-102-disk-0@test estimated size is 5.59G
send from @test to rpool/data/vm-102-disk-0@test2 estimated size is 624B
send from @test2 to rpool/data/vm-102-disk-0@test3 estimated size is 624B
send from @test3 to rpool/data/vm-102-disk-0@__replicate_102-0_1624346683__ estimated size is 624B
total estimated size is 5.59G
volume 'rpool/data/vm-102-disk-0' already exists
command 'zfs send -Rpv -- rpool/data/vm-102-disk-0@__replicate_102-0_1624346683__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
delete previous replication snapshot '__replicate_102-0_1624346683__' on local-zfs:vm-102-disk-0
end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_102-0_1624346683__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve64test02' root@10.0.0.172 -- pvesm import local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 255
unable to replicate rolled back guest to node 'pve64test02' - command 'set -o pipefail && pvesm export local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_102-0_1624346683__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve64test02' root@10.0.0.172 -- pvesm import local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 255
replicating rolled back guest to node 'pve64test03'
start replication job
guest => VM 102, running => 0
volumes => local-zfs:vm-102-disk-0
create snapshot '__replicate_102-1_1624346685__' on local-zfs:vm-102-disk-0
using secure transmission, rate limit: none
full sync 'local-zfs:vm-102-disk-0' (__replicate_102-1_1624346685__)
full send of rpool/data/vm-102-disk-0@test estimated size is 5.59G
send from @test to rpool/data/vm-102-disk-0@test2 estimated size is 624B
send from @test2 to rpool/data/vm-102-disk-0@test3 estimated size is 624B
send from @test3 to rpool/data/vm-102-disk-0@__replicate_102-1_1624346685__ estimated size is 624B
total estimated size is 5.59G
volume 'rpool/data/vm-102-disk-0' already exists
command 'zfs send -Rpv -- rpool/data/vm-102-disk-0@__replicate_102-1_1624346685__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
delete previous replication snapshot '__replicate_102-1_1624346685__' on local-zfs:vm-102-disk-0
end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_102-1_1624346685__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve64test03' root@10.0.0.173 -- pvesm import local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 255
unable to replicate rolled back guest to node 'pve64test03' - command 'set -o pipefail && pvesm export local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_102-1_1624346685__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve64test03' root@10.0.0.173 -- pvesm import local-zfs:vm-102-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 255

now the replication is broken and requires manual intervention to be
fixed:

source: no replication snapshots, regular snapshots test, test2, test3
target: one replication snapshot, regular snapshots test, test2 (test3 is missing)

> 
> Signed-off-by: Fabian Ebner
> ---
> 
> No changes from v1 (except rebase).
> 
> Not a huge fan of this, but the alternatives I could come up with don't seem
> much better IMHO:
> 
> 1. Invalidate/remove replicated volumes after a rollback altogether and require
>    a full sync on the next replication job afterwards.
> 
> 2. Another one is to disallow removing the last non-replication snapshot if:
>      * there is a replication job configured
>      * no replication snapshot for that job currently exists
>        (which likely means it was removed by a previous rollback operation, but
>        can also happen for a new job that didn't run yet).
> 
> 3. Hope not very many people immediately delete their snapshots after rollback.
> 
> Pick a favorite or suggest your own ;)
> 
>  src/PVE/AbstractConfig.pm    | 19 +++++++++++++++++--
>  src/PVE/ReplicationConfig.pm | 14 ++++++++++++++
>  2 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
> index 6542ae4..6cc0537 100644
> --- a/src/PVE/AbstractConfig.pm
> +++ b/src/PVE/AbstractConfig.pm
> @@ -951,6 +951,9 @@ sub snapshot_rollback {
>  
>      my $storecfg = PVE::Storage::config();
>  
> +    my $repl_conf = PVE::ReplicationConfig->new();
> +    my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
> +
>      my $data = {};
>  
>      my $get_snapshot_config = sub {
> @@ -972,7 +975,6 @@ sub snapshot_rollback {
>          $snap = $get_snapshot_config->($conf);
>  
>          if ($prepare) {
> -            my $repl_conf = PVE::ReplicationConfig->new();
>              if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
>                  # remove replication snapshots on volumes affected by rollback *only*!
>                  my $volumes = $class->get_replicatable_volumes($storecfg, $vmid, $snap, 1);
> @@ -989,7 +991,6 @@ sub snapshot_rollback {
>              });
>  
>              # remove all local replication snapshots (jobid => undef)
> -            my $logfunc = sub { my $line = shift; chomp $line; print "$line\n"; };
>              PVE::Replication::prepare($storecfg, $volids, undef, 1, undef, $logfunc);

I) maybe prepare(), or at least this calling context, should hold the
replication lock so that it can't race with concurrent replication runs?

II) maybe prepare() should update the state file in general - if the last
snapshot referenced there is stale and gets removed, the next replication
run gets confused. this might/should fix A.

III) maybe prepare() and/or run_replication() need to learn how to deal
with the case where only regular snapshots, but not the last one, match.
we could match on name+guid to ensure the common base really is a
previously replicated snapshot and not just one with the same name but
different content, and construct a send stream from that shared snapshot
instead of attempting a full sync where the target already (partially)
exists (see the sketch below). that would fix B, and improve replication
robustness in general by allowing a resume from a partially shared state
instead of having to remove the target volume and start over from
scratch..

implementing III would also remove the need to run a replication right
after rollback - the next scheduled replication would handle the
situation just fine, unless ALL previously shared snapshots are removed,
and we could check for that in the snapshot removal code path though..
or we could just schedule a replication here instead of running it
directly: rollback is an almost instant action (as long as no vmstate is
involved), while a replication can take a long time, so it seems a bit
strange to conflate the two..
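
to make the name+guid matching in III a bit more concrete, here is a
rough, self-contained sketch - it is not an existing PVE::Replication
helper, and it assumes the caller passes in hashes of snapshot name =>
guid for both sides (e.g. collected via 'zfs get guid') plus the source
snapshots in creation order:

    use strict;
    use warnings;

    # return the most recent snapshot that exists on both sides with the
    # same guid, i.e. one that is safe to use as an incremental base
    sub find_common_base_snapshot {
        my ($source_snaps, $target_snaps, $source_order) = @_;

        for my $snap (reverse @{$source_order}) {
            my $target_guid = $target_snaps->{$snap};
            next if !defined($target_guid);

            # the name alone is not enough - after a rollback a snapshot
            # can be re-created with the same name but different content,
            # so the guid has to match as well
            return $snap if $source_snaps->{$snap} eq $target_guid;
        }

        return undef; # nothing shared, only a full sync can help
    }

run_replication() could then send an incremental stream from that
snapshot (after rolling the target back to it) instead of ending up in
the "volume 'rpool/data/vm-102-disk-0' already exists" full-sync failure
seen in B.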

>          }
>  
> @@ -1047,6 +1048,20 @@ sub snapshot_rollback {
>  
>      $prepare = 0;
>      $class->lock_config($vmid, $updatefn);
> +
> +    my $replication_jobs = $repl_conf->list_guests_replication_jobs($vmid);
> +    for my $job (@{$replication_jobs}) {
> +        my $target = $job->{target};
> +        $logfunc->("replicating rolled back guest to node '$target'");
> +
> +        my $start_time = time();
> +        eval {
> +            PVE::Replication::run_replication($class, $job, $start_time, $start_time, $logfunc);
> +        };
> +        if (my $err = $@) {
> +            warn "unable to replicate rolled back guest to node '$target' - $err";
> +        }
> +    }
>  }
>  
>  # bash completion helper
> diff --git a/src/PVE/ReplicationConfig.pm b/src/PVE/ReplicationConfig.pm
> index fd856a0..84a718f 100644
> --- a/src/PVE/ReplicationConfig.pm
> +++ b/src/PVE/ReplicationConfig.pm
> @@ -228,6 +228,20 @@ sub find_local_replication_job {
>      return undef;
>  }
>  
> +sub list_guests_replication_jobs {
> +    my ($cfg, $vmid) = @_;
> +
> +    my $jobs = [];
> +
> +    for my $job (values %{$cfg->{ids}}) {
> +        next if $job->{type} ne 'local' || $job->{guest} != $vmid;
> +
> +        push @{$jobs}, $job;
> +    }
> +
> +    return $jobs;
> +}
> +
>  # makes old_target the new source for all local jobs of this guest
>  # makes new_target the target for the single local job with target old_target
>  sub switch_replication_job_target_nolock {
> -- 
> 2.30.2