From mboxrd@z Thu Jan 1 00:00:00 1970
To: pve-devel@lists.proxmox.com
References: <20200730112915.15402-1-f.ebner@proxmox.com>
From: Fabian Ebner
Message-ID: <96e549ce-75a9-7e0b-d8f1-36e2d9b22740@proxmox.com>
Date: Mon, 3 Aug 2020 09:11:06 +0200
In-Reply-To: <20200730112915.15402-1-f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH manager 1/3] Hold the guest migration lock when changing the replication config
List-Id: Proxmox VE development discussion

On 30.07.20 13:29, Fabian Ebner wrote:
> The guest migration lock is already held when running replications,
> but it also makes sense to hold it when updating the replication
> config itself. Otherwise, it can happen that the migration does
> not know the de-facto state of replication.
>
> For example:
> 1. migration starts
> 2. replication job is deleted
> 3. migration reads the replication config
> 4. migration runs the replication, which causes the
>    replicated disks to be removed, because the job
>    is marked for removal
> 5. migration will continue without replication
>

This situation can still happen even with the locking from this patch:
1. replication job is deleted
2. migration starts before the replication was run, so the job is
   still marked for removal in the replication config
3.-5. same as above

So we probably want to check during migration whether the replication
job that we want to use is marked for removal. If it is, we could
either:
- leave the situation as is, i.e. the replication job will be removed
  during migration and migration will continue without replication
- fail the migration (principle of least surprise?)
- run replication without the removal mark during migration; then the
  replication job would be removed the next time replication runs
  after migration, and hence after the target was switched

Also: if we only read the replication config once during a migration,
the locking from this patch shouldn't even be necessary.
switch_replication_job_target does read the config once more, but that
would still be compatible with allowing other changes to the
replication config during migration. But of course this locking might
make things more future-proof.

> Note that the migration doesn't actually fail, but it's probably
> not the desired behavior either.
>
> Suggested-by: Fabian Grünbichler
> Signed-off-by: Fabian Ebner
> ---
>  PVE/API2/ReplicationConfig.pm | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
> index 2b4ecd10..e5262068 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -9,6 +9,7 @@ use PVE::JSONSchema qw(get_standard_option);
>  use PVE::RPCEnvironment;
>  use PVE::ReplicationConfig;
>  use PVE::Cluster;
> +use PVE::GuestHelpers;
>
>  use PVE::RESTHandler;
>
> @@ -144,7 +145,9 @@ __PACKAGE__->register_method ({
> 	$cfg->write();
>      };
>
> -    PVE::ReplicationConfig::lock($code);
> +    PVE::GuestHelpers::guest_migration_lock($guest, 10, sub {
> +	PVE::ReplicationConfig::lock($code);
> +    });
>
>      return undef;
>      }});
> @@ -167,6 +170,7 @@ __PACKAGE__->register_method ({
>      my $id = extract_param($param, 'id');
>      my $digest = extract_param($param, 'digest');
>      my $delete = extract_param($param, 'delete');
> +    my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
>
>      my $code = sub {
>  	my $cfg = PVE::ReplicationConfig->new();
> @@ -199,7 +203,9 @@ __PACKAGE__->register_method ({
>  	$cfg->write();
>      };
>
> -    PVE::ReplicationConfig::lock($code);
> +    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	PVE::ReplicationConfig::lock($code);
> +    });
>
>      return undef;
>      }});
> @@ -237,10 +243,12 @@ __PACKAGE__->register_method ({
>
>      my $rpcenv = PVE::RPCEnvironment::get();
>
> +    my $id = extract_param($param, 'id');
> +    my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
> +
>      my $code = sub {
>  	my $cfg = PVE::ReplicationConfig->new();
>
> -	my $id = $param->{id};
>  	if ($param->{force}) {
>  	    raise_param_exc({ 'keep' => "conflicts with parameter 'force'" }) if $param->{keep};
>  	    delete $cfg->{ids}->{$id};
> @@ -262,7 +270,9 @@ __PACKAGE__->register_method ({
>  	$cfg->write();
>      };
>
> -    PVE::ReplicationConfig::lock($code);
> +    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	PVE::ReplicationConfig::lock($code);
> +    });
>
>      return undef;
>      }});
>
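The removal-mark check suggested above could look roughly like this. A
minimal self-contained sketch: the helper name check_job_usable is made
up, the config hash is mocked, and the assumption is that jobs marked
for removal carry a 'remove_job' property as in PVE::ReplicationConfig.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper a migration could call before relying on a
# replication job: fail early if the job is already marked for removal
# (the third option above would instead run it without the mark).
sub check_job_usable {
    my ($cfg, $jobid) = @_;
    my $jobcfg = $cfg->{ids}->{$jobid}
	or die "replication job '$jobid' does not exist\n";
    die "replication job '$jobid' is marked for removal\n"
	if $jobcfg->{remove_job};
    return $jobcfg;
}

# Mocked replication config, standing in for PVE::ReplicationConfig->new()
my $cfg = {
    ids => {
	'100-0' => { target => 'nodeB' },
	'100-1' => { target => 'nodeC', remove_job => 'full' },
    },
};

check_job_usable($cfg, '100-0'); # usable job, passes
eval { check_job_usable($cfg, '100-1') };
print $@ if $@; # job marked for removal, error is reported
```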
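For reference, the nesting the patch introduces (guest migration lock
outside, replication config lock inside, in the same order at every
call site, so writers cannot deadlock against a running migration) can
be sketched generically. The with_lock helper and the flock'd temp
files are stand-ins for PVE's cluster-wide locks, not the real
implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Run $code while holding an exclusive lock on $path; release on exit,
# re-raising any error from $code afterwards.
sub with_lock {
    my ($path, $code) = @_;
    open(my $fh, '>>', $path) or die "cannot open $path: $!\n";
    flock($fh, LOCK_EX) or die "cannot lock $path: $!\n";
    my $res = eval { $code->() };
    my $err = $@;
    flock($fh, LOCK_UN);
    close($fh);
    die $err if $err;
    return $res;
}

my (undef, $migration_lock) = tempfile(UNLINK => 1);
my (undef, $config_lock) = tempfile(UNLINK => 1);

# Outer lock plays the role of guest_migration_lock, inner one of
# PVE::ReplicationConfig::lock, mirroring the structure in the patch.
my $res = with_lock($migration_lock, sub {
    with_lock($config_lock, sub {
	return 'replication config updated';
    });
});
print "$res\n";
```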