From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.gruenbichler@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 56E8E68D71
 for <pve-devel@lists.proxmox.com>; Mon,  3 Aug 2020 09:49:51 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 4B40C2B423
 for <pve-devel@lists.proxmox.com>; Mon,  3 Aug 2020 09:49:51 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 89AC42B419
 for <pve-devel@lists.proxmox.com>; Mon,  3 Aug 2020 09:49:50 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 5181643092
 for <pve-devel@lists.proxmox.com>; Mon,  3 Aug 2020 09:49:50 +0200 (CEST)
Date: Mon, 03 Aug 2020 09:49:41 +0200
From: Fabian =?iso-8859-1?q?Gr=FCnbichler?= <f.gruenbichler@proxmox.com>
To: Fabian Ebner <f.ebner@proxmox.com>, pve-devel@lists.proxmox.com
References: <20200730112915.15402-1-f.ebner@proxmox.com>
In-Reply-To: <20200730112915.15402-1-f.ebner@proxmox.com>
MIME-Version: 1.0
User-Agent: astroid/0.15.0 (https://github.com/astroidmail/astroid)
Message-Id: <1596440280.7cx8h2dp61.astroid@nora.none>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.047 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH manager 1/3] Hold the guest migration lock
 when changing the replication config
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 03 Aug 2020 07:49:51 -0000

On July 30, 2020 1:29 pm, Fabian Ebner wrote:
> The guest migration lock is already held when running replications,
> but it also makes sense to hold it when updating the replication
> config itself. Otherwise, it can happen that the migration does
> not know the de-facto state of replication.
>=20
> For example:
> 1. migration starts
> 2. replication job is deleted
> 3. migration reads the replication config
> 4. migration runs the replication which causes the
>    replicated disks to be removed, because the job
>    is marked for removal
> 5. migration will continue without replication
>=20
> Note that the migration doesn't actually fail, but it's probably
> not the desired behavior either.
>=20
> Suggested-by: Fabian Gr=C3=BCnbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
>  PVE/API2/ReplicationConfig.pm | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>=20
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.p=
m
> index 2b4ecd10..e5262068 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -9,6 +9,7 @@ use PVE::JSONSchema qw(get_standard_option);
>  use PVE::RPCEnvironment;
>  use PVE::ReplicationConfig;
>  use PVE::Cluster;
> +use PVE::GuestHelpers;
> =20
>  use PVE::RESTHandler;
> =20
> @@ -144,7 +145,9 @@ __PACKAGE__->register_method ({
>  	    $cfg->write();
>  	};
> =20
> -	PVE::ReplicationConfig::lock($code);
> +	PVE::GuestHelpers::guest_migration_lock($guest, 10, sub {
> +	    PVE::ReplicationConfig::lock($code);
> +	});

it might make sense to have a single wrapper for this, or add the guest=20
ID as parameter to ReplicationConfig::lock (to not miss it or get the=20
order wrong).

what about the calls to lock within ReplicationConfig? they are all=20
job/guest ID specific, and should also get this additional protection,=20
right?

from a quick glance, there seems to be only a single call to=20
ReplicationConfig::lock that spans more than one job (job_status in=20
ReplicationState), but that immediately iterates over jobs, so we could=20
either move the lock into the loop (expensive, since it involves a=20
cfs_lock), or split the cfs and flock just for this instance?

(side note, that code and possibly other stuff in ReplicationConfig is=20
buggy since it does not re-read the config after locking)

> =20
>  	return undef;
>      }});
> @@ -167,6 +170,7 @@ __PACKAGE__->register_method ({
>  	my $id =3D extract_param($param, 'id');
>  	my $digest =3D extract_param($param, 'digest');
>  	my $delete =3D extract_param($param, 'delete');
> +	my ($guest_id) =3D PVE::ReplicationConfig::parse_replication_job_id($id=
);
> =20
>  	my $code =3D sub {
>  	    my $cfg =3D PVE::ReplicationConfig->new();
> @@ -199,7 +203,9 @@ __PACKAGE__->register_method ({
>  	    $cfg->write();
>  	};
> =20
> -	PVE::ReplicationConfig::lock($code);
> +	PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	    PVE::ReplicationConfig::lock($code);
> +	});
> =20
>  	return undef;
>      }});
> @@ -237,10 +243,12 @@ __PACKAGE__->register_method ({
> =20
>  	my $rpcenv =3D PVE::RPCEnvironment::get();
> =20
> +	my $id =3D extract_param($param, 'id');
> +	my ($guest_id) =3D PVE::ReplicationConfig::parse_replication_job_id($id=
);
> +
>  	my $code =3D sub {
>  	    my $cfg =3D PVE::ReplicationConfig->new();
> =20
> -	    my $id =3D $param->{id};
>  	    if ($param->{force}) {
>  		raise_param_exc({ 'keep' =3D> "conflicts with parameter 'force'" }) if=
 $param->{keep};
>  		delete $cfg->{ids}->{$id};
> @@ -262,7 +270,9 @@ __PACKAGE__->register_method ({
>  	    $cfg->write();
>  	};
> =20
> -	PVE::ReplicationConfig::lock($code);
> +	PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	    PVE::ReplicationConfig::lock($code);
> +	});
> =20
>  	return undef;
>      }});
> --=20
> 2.20.1
>=20
>=20
=