Date: Thu, 13 Aug 2020 12:06:03 +0200
From: Fabian Grünbichler
To: Fabian Ebner, pve-devel@lists.proxmox.com
Message-Id: <1597312701.72ffjl19l1.astroid@nora.none>
References: <20200810123557.22618-1-f.ebner@proxmox.com>
 <20200810123557.22618-6-f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node

On August 11, 2020 11:20 am, Fabian Ebner wrote:
> There is another minor issue (with and without my patches):
> If there is a job 123-0 for a guest on pve0 with source=target=pve0 and
> a job 123-4 with source=pve0 and target=pve1, and we migrate to pve1,
> then switching source and target for job 123-4 is not possible, because
> there already is a job with target pve0. Thus cfg->write() will fail,
> and by extension job_status. (Also the switch_replication_job_target
> call during migration fails for the same reason).

this patch also breaks replication_test2.pl in pve-manager..

>
> Possible solutions:
>
> 1. Instead of making such jobs (i.e. jobs with target=source) visible,
> in the hope that a user would remove/fix them, we could automatically
> remove them ourselves (this could be done as part of the
> switch_replication_job_target function as well). Under normal
> conditions, there shouldn't be any such jobs anyways.

I guess making them visible is fine, as long as they get filtered out /
warned about early on when actually doing a replication. We'd need to
look at the call sites to make sure that everybody handles this
correctly (and probably also adapt the test case, see above).
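
something along these lines at the call sites maybe - just a rough,
self-contained sketch with a hard-coded job hash standing in for what
job_status() would return, not actual pve-manager code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # stand-in data: the local node name and a job list as a caller
    # would see it once such jobs are no longer filtered in job_status()
    my $local_node = 'pve0';
    my $jobs = {
        '123-0' => { target => 'pve0', remove_job => 0 }, # broken: target == local node
        '123-4' => { target => 'pve1', remove_job => 0 }, # fine
    };

    foreach my $jobid (sort keys %$jobs) {
        my $jobcfg = $jobs->{$jobid};

        # skip (and warn about) jobs that would replicate to the node the
        # guest is already on, unless they are only scheduled for removal
        if ($jobcfg->{target} eq $local_node && !$jobcfg->{remove_job}) {
            warn "skipping replication job '$jobid' - target is the local node\n";
            next;
        }

        print "would run replication job '$jobid' to '$jobcfg->{target}'\n";
    }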
I would not remove them altogether/automatically - they could be the
result of an admin misediting the config, and we don't want to throw
away a potentially existing replication state if we don't have to..

>
> 2. Alternatively (or additionally), we could also add checks in the
> create/update API paths to ensure that the target is not the node the
> guest is on.
>
> Option 2 would add a reason for using guest_migration locks in the
> create/update paths. But I'm not sure we'd want that. The ability to
> update job configurations while a replication is running is a feature
> IMHO, and I think stealing guests might still lead to a bad
> configuration. Therefore, I'd prefer option 1, which just adds a bit to
> the automatic fixing we already do.

Yes, I'd check that source == current node and target != current node
(see my patch series ;)).

We could leave out the guest lock - worst case, it's possible to
add/modify a replication config such that it represents the
pre-migration source->target pair, which would get cleaned up on the
next run by job_status anyway, unless I am missing something?

>
> @Fabian G.: Opinions?
>
> Am 10.08.20 um 14:35 schrieb Fabian Ebner:
>> even if not scheduled for removal, while adapting
>> replicate to die gracefully except for the removal case.
>>
>> Like this such invalid jobs are not hidden from the user anymore
>> (at least via the API, the GUI still hides them)
>>
>> Signed-off-by: Fabian Ebner
>> ---
>>
>> I think it's a bit weird that such jobs only show up once
>> they are scheduled for removal. I'll send a patch for the
>> GUI too if we do want the new behavior.
>>
>>  PVE/Replication.pm      | 3 +++
>>  PVE/ReplicationState.pm | 5 +----
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
>> index ae0f145..b5835bd 100644
>> --- a/PVE/Replication.pm
>> +++ b/PVE/Replication.pm
>> @@ -207,6 +207,9 @@ sub replicate {
>>  
>>      die "not implemented - internal error" if $jobcfg->{type} ne 'local';
>>  
>> +    die "job target is local node\n" if $jobcfg->{target} eq $local_node
>> +	&& !$jobcfg->{remove_job};
>> +
>>      my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
>>  
>>      my $migration_network;
>> diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
>> index e486bc7..0b751bb 100644
>> --- a/PVE/ReplicationState.pm
>> +++ b/PVE/ReplicationState.pm
>> @@ -261,10 +261,6 @@ sub job_status {
>>  	$cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
>>  	    if $local_node ne $jobcfg->{source};
>>  
>> -	my $target = $jobcfg->{target};
>> -	# never sync to local node
>> -	next if !$jobcfg->{remove_job} && $target eq $local_node;
>> -
>>  	next if !$get_disabled && $jobcfg->{disable};
>>  
>>  	my $state = extract_job_state($stateobj, $jobcfg);
>> @@ -280,6 +276,7 @@ sub job_status {
>>  	} else {
>>  	    if (my $fail_count = $state->{fail_count}) {
>>  		my $members = PVE::Cluster::get_members();
>> +		my $target = $jobcfg->{target};
>>  		if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
>>  		    $next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
>>  		}
>>
>
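
for reference, the create/update-time check mentioned above could look
roughly like this - an untested sketch with placeholder variable names
($source, $target, $guest_node), not the actual API handler code:

    # die early if a job's nodes don't make sense for the guest's current node
    sub assert_job_nodes {
        my ($source, $target, $guest_node) = @_;

        die "source '$source' does not match the node the guest is on ('$guest_node')\n"
            if defined($source) && $source ne $guest_node;

        die "target '$target' must not be the node the guest is on ('$guest_node')\n"
            if $target eq $guest_node;
    }

    # e.g. for the scenario above (guest on pve0, job with target pve0):
    # assert_job_nodes('pve0', 'pve0', 'pve0') would die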