Date: Mon, 03 Aug 2020 09:49:41 +0200
From: Fabian Grünbichler
To: Fabian Ebner, pve-devel@lists.proxmox.com
Message-Id: <1596440280.7cx8h2dp61.astroid@nora.none>
In-Reply-To: <20200730112915.15402-1-f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH manager 1/3] Hold the guest migration lock when changing the replication config

On July 30, 2020 1:29 pm, Fabian Ebner wrote:
> The guest migration lock is already held when running replications,
> but it also makes sense to hold it when updating the replication
> config itself. Otherwise, it can happen that the migration does
> not know the de-facto state of replication.
>
> For example:
> 1. migration starts
> 2. replication job is deleted
> 3. migration reads the replication config
> 4. migration runs the replication which causes the
>    replicated disks to be removed, because the job
>    is marked for removal
> 5. migration will continue without replication
>
> Note that the migration doesn't actually fail, but it's probably
> not the desired behavior either.
>
> Suggested-by: Fabian Grünbichler
> Signed-off-by: Fabian Ebner
> ---
>  PVE/API2/ReplicationConfig.pm | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
> index 2b4ecd10..e5262068 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -9,6 +9,7 @@ use PVE::JSONSchema qw(get_standard_option);
>  use PVE::RPCEnvironment;
>  use PVE::ReplicationConfig;
>  use PVE::Cluster;
> +use PVE::GuestHelpers;
>
>  use PVE::RESTHandler;
>
> @@ -144,7 +145,9 @@ __PACKAGE__->register_method ({
>  	$cfg->write();
>      };
>
> -    PVE::ReplicationConfig::lock($code);
> +    PVE::GuestHelpers::guest_migration_lock($guest, 10, sub {
> +	PVE::ReplicationConfig::lock($code);
> +    });

it might make sense to have a single wrapper for this, or add the guest
ID as parameter to ReplicationConfig::lock (to not miss it or get the
order wrong).
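for illustration, such a wrapper could look roughly like this (name and
placement are hypothetical, untested sketch against the interfaces used
in the patch):

```perl
# hypothetical combined helper, e.g. in PVE::ReplicationConfig -- takes
# both locks in a fixed order so callers cannot forget one or swap them
sub lock_for_guest {
    my ($guest_id, $code) = @_;

    # per-guest migration flock first, then the cfs lock on the
    # replication config, matching the order used in the patch
    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
	PVE::ReplicationConfig::lock($code);
    });
}
```

callers would then pass the guest ID along with the code ref in a single
call, which also makes it harder to take only one of the two locks.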
what about the calls to lock within ReplicationConfig? they are all
job/guest ID specific, and should also get this additional protection,
right?

from a quick glance, there seems to be only a single call to
ReplicationConfig::lock that spans more than one job (job_status in
ReplicationState), but that immediately iterates over jobs, so we could
either move the lock into the loop (expensive, since it involves a
cfs_lock), or split the cfs and flock just for this instance?

(side note, that code and possibly other stuff in ReplicationConfig is
buggy since it does not re-read the config after locking)

>
>      return undef;
>  }});
> @@ -167,6 +170,7 @@ __PACKAGE__->register_method ({
>  	my $id = extract_param($param, 'id');
>  	my $digest = extract_param($param, 'digest');
>  	my $delete = extract_param($param, 'delete');
> +	my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
>
>  	my $code = sub {
>  	    my $cfg = PVE::ReplicationConfig->new();
> @@ -199,7 +203,9 @@ __PACKAGE__->register_method ({
>  	$cfg->write();
>      };
>
> -    PVE::ReplicationConfig::lock($code);
> +    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	PVE::ReplicationConfig::lock($code);
> +    });
>
>      return undef;
>  }});
> @@ -237,10 +243,12 @@ __PACKAGE__->register_method ({
>
>      my $rpcenv = PVE::RPCEnvironment::get();
>
> +    my $id = extract_param($param, 'id');
> +    my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
> +
>      my $code = sub {
>  	my $cfg = PVE::ReplicationConfig->new();
>
> -	my $id = $param->{id};
>  	if ($param->{force}) {
>  	    raise_param_exc({ 'keep' => "conflicts with parameter 'force'" }) if $param->{keep};
>  	    delete $cfg->{ids}->{$id};
> @@ -262,7 +270,9 @@ __PACKAGE__->register_method ({
>  	$cfg->write();
>      };
>
> -    PVE::ReplicationConfig::lock($code);
> +    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	PVE::ReplicationConfig::lock($code);
> +    });
>
>      return undef;
>  }});
> --
> 2.20.1