public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job
@ 2020-08-10 12:35 Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 2/6] job_status: read only after acquiring the lock Fabian Ebner
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Fabian Ebner @ 2020-08-10 12:35 UTC (permalink / raw)
  To: pve-devel

If source is missing, pvesr will set it via job_status
on the next run. But the info is already present here,
so it doesn't hurt to use it.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/API2/ReplicationConfig.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
index 2b4ecd10..b85e5804 100644
--- a/PVE/API2/ReplicationConfig.pm
+++ b/PVE/API2/ReplicationConfig.pm
@@ -138,6 +138,7 @@ __PACKAGE__->register_method ({
 	    my $opts = $plugin->check_config($id, $param, 1, 1);
 
 	    $opts->{guest} = $guest;
+	    $opts->{source} //= $vmlist->{ids}->{$guest}->{node};
 
 	    $cfg->{ids}->{$id} = $opts;
 
-- 
2.20.1
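
As a concrete illustration (hypothetical guest ID and node names): creating a
job for guest 100, which currently resides on node pve0, e.g. via

    pvesr create-local-job 100-0 pve1

now records "source pve0" in the new job entry right away, instead of leaving
it unset until the next pvesr run fills it in through job_status.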






* [pve-devel] [PATCH guest-common 2/6] job_status: read only after acquiring the lock
  2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
@ 2020-08-10 12:35 ` Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 3/6] Clarify what the source property is used for in a replication job Fabian Ebner
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2020-08-10 12:35 UTC (permalink / raw)
  To: pve-devel

to get the current replication config and have the VM list
and state object as recent as possible.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/ReplicationState.pm | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
index 057cae6..0c92778 100644
--- a/PVE/ReplicationState.pm
+++ b/PVE/ReplicationState.pm
@@ -234,13 +234,11 @@ sub job_status {
 
     my $jobs = {};
 
-    my $stateobj = read_state();
-
-    my $cfg = PVE::ReplicationConfig->new();
-
-    my $vms = PVE::Cluster::get_vmlist();
-
     my $func = sub {
+	my $cfg = PVE::ReplicationConfig->new();
+	my $stateobj = read_state();
+	my $vms = PVE::Cluster::get_vmlist();
+
 	foreach my $jobid (sort keys %{$cfg->{ids}}) {
 	    my $jobcfg = $cfg->{ids}->{$jobid};
 	    my $vmid = $jobcfg->{guest};
-- 
2.20.1
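
For context, the reads now happen inside the code reference that job_status
runs under the replication config lock. A simplified sketch of the resulting
pattern (assuming the usual lock wrapper from PVE::ReplicationConfig; not the
literal job_status body):

    my $func = sub {
        # all three reads happen while the lock is held, so they cannot
        # race with a concurrent config update
        my $cfg      = PVE::ReplicationConfig->new();  # replication job config
        my $stateobj = read_state();                   # replication state file
        my $vms      = PVE::Cluster::get_vmlist();     # cluster-wide guest list

        # ... iterate over the jobs in $cfg->{ids} ...
    };

    PVE::ReplicationConfig::lock($func);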






* [pve-devel] [PATCH guest-common 3/6] Clarify what the source property is used for in a replication job
  2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 2/6] job_status: read only after acquiring the lock Fabian Ebner
@ 2020-08-10 12:35 ` Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 4/6] Also update sources in switch_replication_job_target Fabian Ebner
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2020-08-10 12:35 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/ReplicationConfig.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PVE/ReplicationConfig.pm b/PVE/ReplicationConfig.pm
index 66ef842..65436ea 100644
--- a/PVE/ReplicationConfig.pm
+++ b/PVE/ReplicationConfig.pm
@@ -80,7 +80,7 @@ my $defaultData = {
 	    optional => 1,
 	},
 	source => {
-	    description => "Source of the replication.",
+	    description => "For internal use, to detect if the guest was stolen.",
 	    type => 'string', format => 'pve-node',
 	    optional => 1,
 	},
-- 
2.20.1






* [pve-devel] [PATCH guest-common 4/6] Also update sources in switch_replication_job_target
  2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 2/6] job_status: read only after acquiring the lock Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 3/6] Clarify what the source property is used for in a replication job Fabian Ebner
@ 2020-08-10 12:35 ` Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 5/6] job_status: simplify fixup of jobs for stolen guests Fabian Ebner
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2020-08-10 12:35 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/ReplicationConfig.pm | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/PVE/ReplicationConfig.pm b/PVE/ReplicationConfig.pm
index 65436ea..fc3e792 100644
--- a/PVE/ReplicationConfig.pm
+++ b/PVE/ReplicationConfig.pm
@@ -228,23 +228,27 @@ sub find_local_replication_job {
     return undef;
 }
 
-# switch local replication job target
+# makes old_target the new source for all local jobs of this guest
+# makes new_target the target for the single local job with target old_target
 sub switch_replication_job_target {
     my ($vmid, $old_target, $new_target) = @_;
 
-    my $transfer_job = sub {
+    my $update_jobs = sub {
 	my $cfg = PVE::ReplicationConfig->new();
-	my $jobcfg = find_local_replication_job($cfg, $vmid, $old_target);
-
-	return if !$jobcfg;
+	foreach my $id (keys %{$cfg->{ids}}) {
+	    my $jobcfg = $cfg->{ids}->{$id};
 
-	$jobcfg->{target} = $new_target;
+	    next if $jobcfg->{guest} ne $vmid;
+	    next if $jobcfg->{type} ne 'local';
 
+	    $jobcfg->{target} = $new_target if $jobcfg->{target} eq $old_target;
+	    $jobcfg->{source} = $old_target;
+	}
 	$cfg->write();
     };
 
-    lock($transfer_job);
-};
+    lock($update_jobs);
+}
 
 sub delete_job {
     my ($jobid) = @_;
-- 
2.20.1






* [pve-devel] [PATCH/RFC guest-common 5/6] job_status: simplify fixup of jobs for stolen guests
  2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
                   ` (2 preceding siblings ...)
  2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 4/6] Also update sources in switch_replication_job_target Fabian Ebner
@ 2020-08-10 12:35 ` Fabian Ebner
  2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node Fabian Ebner
  2020-08-11 12:31 ` [pve-devel] applied: [PATCH manager 1/6] Set source when creating a new replication job Fabian Grünbichler
  5 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2020-08-10 12:35 UTC (permalink / raw)
  To: pve-devel

by re-using switch_replication_job_target, or rather
the new nolock variant operating on the config itself.

If a job is scheduled for removal and the guest was
stolen, it still makes sense to correct the job entry,
which didn't happen previously.

AFAICT, this was the only user of swap_source_target_nolock, so it can be dropped.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/ReplicationConfig.pm | 40 ++++++++++++++++------------------------
 PVE/ReplicationState.pm  | 32 +++++++++++++-------------------
 2 files changed, 29 insertions(+), 43 deletions(-)

diff --git a/PVE/ReplicationConfig.pm b/PVE/ReplicationConfig.pm
index fc3e792..815c9c0 100644
--- a/PVE/ReplicationConfig.pm
+++ b/PVE/ReplicationConfig.pm
@@ -230,23 +230,28 @@ sub find_local_replication_job {
 
 # makes old_target the new source for all local jobs of this guest
 # makes new_target the target for the single local job with target old_target
+sub switch_replication_job_target_nolock {
+    my ($cfg, $vmid, $old_target, $new_target) = @_;
+
+    foreach my $id (keys %{$cfg->{ids}}) {
+	my $jobcfg = $cfg->{ids}->{$id};
+
+	next if $jobcfg->{guest} ne $vmid;
+	next if $jobcfg->{type} ne 'local';
+
+	$jobcfg->{target} = $new_target if $jobcfg->{target} eq $old_target;
+	$jobcfg->{source} = $old_target;
+    }
+    $cfg->write();
+}
+
 sub switch_replication_job_target {
     my ($vmid, $old_target, $new_target) = @_;
 
     my $update_jobs = sub {
 	my $cfg = PVE::ReplicationConfig->new();
-	foreach my $id (keys %{$cfg->{ids}}) {
-	    my $jobcfg = $cfg->{ids}->{$id};
-
-	    next if $jobcfg->{guest} ne $vmid;
-	    next if $jobcfg->{type} ne 'local';
-
-	    $jobcfg->{target} = $new_target if $jobcfg->{target} eq $old_target;
-	    $jobcfg->{source} = $old_target;
-	}
-	$cfg->write();
+	$cfg->switch_replication_job_target_nolock($vmid, $old_target, $new_target);
     };
-
     lock($update_jobs);
 }
 
@@ -276,19 +281,6 @@ sub remove_vmid_jobs {
     lock($code);
 }
 
-sub swap_source_target_nolock {
-    my ($jobid) = @_;
-
-    my $cfg = __PACKAGE__->new();
-    my $job = $cfg->{ids}->{$jobid};
-    my $tmp = $job->{source};
-    $job->{source} = $job->{target};
-    $job->{target} = $tmp;
-    $cfg->write();
-
-    return $cfg->{ids}->{$jobid};
-}
-
 package PVE::ReplicationConfig::Cluster;
 
 use base qw(PVE::ReplicationConfig);
diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
index 0c92778..e486bc7 100644
--- a/PVE/ReplicationState.pm
+++ b/PVE/ReplicationState.pm
@@ -251,22 +251,21 @@ sub job_status {
 	    # only consider guest on local node
 	    next if $vms->{ids}->{$vmid}->{node} ne $local_node;
 
+	    # source is optional in schema, but we set it automatically
+	    if (!defined($jobcfg->{source})) {
+		$jobcfg->{source} = $local_node;
+		$cfg->write();
+	    }
+
+	    # fix jobs for stolen guest
+	    $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
+		if $local_node ne $jobcfg->{source};
+
 	    my $target = $jobcfg->{target};
-	    if (!$jobcfg->{remove_job}) {
-		# check if vm was stolen (swapped source target)
-		if ($target eq $local_node) {
-		    my $source = $jobcfg->{source};
-		    if (defined($source) && $source ne $target) {
-			$jobcfg = PVE::ReplicationConfig::swap_source_target_nolock($jobid);
-			$cfg->{ids}->{$jobid} = $jobcfg;
-		    } else {
-			# never sync to local node
-			next;
-		    }
-		}
+	    # never sync to local node
+	    next if !$jobcfg->{remove_job} && $target eq $local_node;
 
-		next if !$get_disabled && $jobcfg->{disable};
-	    }
+	    next if !$get_disabled && $jobcfg->{disable};
 
 	    my $state = extract_job_state($stateobj, $jobcfg);
 	    $jobcfg->{state} = $state;
@@ -293,11 +292,6 @@ sub job_status {
 
 	    $jobcfg->{next_sync} = $next_sync;
 
-	    if (!defined($jobcfg->{source}) || $jobcfg->{source} ne $local_node) {
-		$jobcfg->{source} = $cfg->{ids}->{$jobid}->{source} = $local_node;
-		PVE::ReplicationConfig::write($cfg);
-	    }
-
 	    $jobs->{$jobid} = $jobcfg;
 	}
     };
-- 
2.20.1






* [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node
  2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
                   ` (3 preceding siblings ...)
  2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 5/6] job_status: simplify fixup of jobs for stolen guests Fabian Ebner
@ 2020-08-10 12:35 ` Fabian Ebner
  2020-08-11  9:20   ` Fabian Ebner
  2020-08-11 12:31 ` [pve-devel] applied: [PATCH manager 1/6] Set source when creating a new replication job Fabian Grünbichler
  5 siblings, 1 reply; 9+ messages in thread
From: Fabian Ebner @ 2020-08-10 12:35 UTC (permalink / raw)
  To: pve-devel

even if not scheduled for removal, while adapting
replicate to die gracefully except for the removal case.

This way, such invalid jobs are no longer hidden from the user
(at least via the API; the GUI still hides them).

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

I think it's a bit weird that such jobs only show up once
they are scheduled for removal. I'll send a patch for the
GUI too if we do want the new behavior.

 PVE/Replication.pm      | 3 +++
 PVE/ReplicationState.pm | 5 +----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/PVE/Replication.pm b/PVE/Replication.pm
index ae0f145..b5835bd 100644
--- a/PVE/Replication.pm
+++ b/PVE/Replication.pm
@@ -207,6 +207,9 @@ sub replicate {
 
     die "not implemented - internal error" if $jobcfg->{type} ne 'local';
 
+    die "job target is local node\n" if $jobcfg->{target} eq $local_node
+				     && !$jobcfg->{remove_job};
+
     my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
 
     my $migration_network;
diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
index e486bc7..0b751bb 100644
--- a/PVE/ReplicationState.pm
+++ b/PVE/ReplicationState.pm
@@ -261,10 +261,6 @@ sub job_status {
 	    $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
 		if $local_node ne $jobcfg->{source};
 
-	    my $target = $jobcfg->{target};
-	    # never sync to local node
-	    next if !$jobcfg->{remove_job} && $target eq $local_node;
-
 	    next if !$get_disabled && $jobcfg->{disable};
 
 	    my $state = extract_job_state($stateobj, $jobcfg);
@@ -280,6 +276,7 @@ sub job_status {
 	    } else  {
 		if (my $fail_count = $state->{fail_count}) {
 		    my $members = PVE::Cluster::get_members();
+		    my $target = $jobcfg->{target};
 		    if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
 			$next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
 		    }
-- 
2.20.1






* Re: [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node
  2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node Fabian Ebner
@ 2020-08-11  9:20   ` Fabian Ebner
  2020-08-13 10:06     ` Fabian Grünbichler
  0 siblings, 1 reply; 9+ messages in thread
From: Fabian Ebner @ 2020-08-11  9:20 UTC (permalink / raw)
  To: pve-devel

There is another minor issue (with and without my patches):
If there is a job 123-0 for a guest on pve0 with source=target=pve0 and 
a job 123-4 with source=pve0 and target=pve1, and we migrate to pve1, 
then switching source and target for job 123-4 is not possible, because 
there already is a job with target pve0. Thus cfg->write() will fail, 
and by extension so will job_status. (The switch_replication_job_target 
call during migration also fails for the same reason.)
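
For illustration, the problematic situation above would look roughly like this
in the replication config (hypothetical entries, assuming the usual
/etc/pve/replication.cfg layout):

    local: 123-0
        target pve0
        source pve0

    local: 123-4
        target pve1
        source pve0

After migrating guest 123 to pve1, fixing up job 123-4 would require setting
its target to pve0, which collides with job 123-0 already having target pve0.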

Possible solutions:

1. Instead of making such jobs (i.e. jobs with target=source) visible, 
in the hope that a user would remove/fix them, we could automatically 
remove them ourselves (this could be done as part of the 
switch_replication_job_target function as well; see the sketch below). 
Under normal conditions, there shouldn't be any such jobs anyway.

2. Alternatively (or additionally), we could also add checks in the 
create/update API paths to ensure that the target is not the node the 
guest is on.
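
A rough sketch of what option 1 could look like, based on the
switch_replication_job_target_nolock helper from patch 5 (illustrative only,
not part of this series):

    foreach my $id (keys %{$cfg->{ids}}) {
        my $jobcfg = $cfg->{ids}->{$id};

        next if $jobcfg->{guest} ne $vmid;
        next if $jobcfg->{type} ne 'local';

        $jobcfg->{target} = $new_target if $jobcfg->{target} eq $old_target;
        $jobcfg->{source} = $old_target;

        # option 1: a job that would sync the guest to its own source node
        # can never be carried out, so drop it instead of keeping it around
        delete $cfg->{ids}->{$id} if $jobcfg->{target} eq $jobcfg->{source};
    }
    $cfg->write();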

Option 2 would add a reason for using guest_migration locks in the 
create/update paths. But I'm not sure we'd want that. The ability to 
update job configurations while a replication is running is a feature 
IMHO, and I think stealing guests might still lead to a bad 
configuration. Therefore, I'd prefer option 1, which just adds a bit to 
the automatic fixing we already do.

@Fabian G.: Opinions?

Am 10.08.20 um 14:35 schrieb Fabian Ebner:
> even if not scheduled for removal, while adapting
> replicate to die gracefully except for the removal case.
> 
> This way, such invalid jobs are no longer hidden from the user
> (at least via the API; the GUI still hides them).
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> I think it's a bit weird that such jobs only show up once
> they are scheduled for removal. I'll send a patch for the
> GUI too if we do want the new behavior.
> 
>   PVE/Replication.pm      | 3 +++
>   PVE/ReplicationState.pm | 5 +----
>   2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> index ae0f145..b5835bd 100644
> --- a/PVE/Replication.pm
> +++ b/PVE/Replication.pm
> @@ -207,6 +207,9 @@ sub replicate {
>   
>       die "not implemented - internal error" if $jobcfg->{type} ne 'local';
>   
> +    die "job target is local node\n" if $jobcfg->{target} eq $local_node
> +				     && !$jobcfg->{remove_job};
> +
>       my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
>   
>       my $migration_network;
> diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
> index e486bc7..0b751bb 100644
> --- a/PVE/ReplicationState.pm
> +++ b/PVE/ReplicationState.pm
> @@ -261,10 +261,6 @@ sub job_status {
>   	    $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
>   		if $local_node ne $jobcfg->{source};
>   
> -	    my $target = $jobcfg->{target};
> -	    # never sync to local node
> -	    next if !$jobcfg->{remove_job} && $target eq $local_node;
> -
>   	    next if !$get_disabled && $jobcfg->{disable};
>   
>   	    my $state = extract_job_state($stateobj, $jobcfg);
> @@ -280,6 +276,7 @@ sub job_status {
>   	    } else  {
>   		if (my $fail_count = $state->{fail_count}) {
>   		    my $members = PVE::Cluster::get_members();
> +		    my $target = $jobcfg->{target};
>   		    if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
>   			$next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
>   		    }
> 





* [pve-devel] applied: [PATCH manager 1/6] Set source when creating a new replication job
  2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
                   ` (4 preceding siblings ...)
  2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node Fabian Ebner
@ 2020-08-11 12:31 ` Fabian Grünbichler
  5 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2020-08-11 12:31 UTC (permalink / raw)
  To: Proxmox VE development discussion

and sent some additional checks/refactors as a follow-up series

On August 10, 2020 2:35 pm, Fabian Ebner wrote:
> If source is missing, pvesr will set it via job_status
> on the next run. But the info is already present here,
> so it doesn't hurt to use it.
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
>  PVE/API2/ReplicationConfig.pm | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
> index 2b4ecd10..b85e5804 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -138,6 +138,7 @@ __PACKAGE__->register_method ({
>  	    my $opts = $plugin->check_config($id, $param, 1, 1);
>  
>  	    $opts->{guest} = $guest;
> +	    $opts->{source} //= $vmlist->{ids}->{$guest}->{node};
>  
>  	    $cfg->{ids}->{$id} = $opts;
>  
> -- 
> 2.20.1

* Re: [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node
  2020-08-11  9:20   ` Fabian Ebner
@ 2020-08-13 10:06     ` Fabian Grünbichler
  0 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2020-08-13 10:06 UTC (permalink / raw)
  To: Fabian Ebner, pve-devel

On August 11, 2020 11:20 am, Fabian Ebner wrote:
> There is another minor issue (with and without my patches):
> If there is a job 123-0 for a guest on pve0 with source=target=pve0 and 
> a job 123-4 with source=pve0 and target=pve1, and we migrate to pve1, 
> then switching source and target for job 123-4 is not possible, because 
> there already is a job with target pve0. Thus cfg->write() will fail, 
> and by extension job_status. (Also the switch_replication_job_target 
> call during migration fails for the same reason).

This patch also breaks replication_test2.pl in pve-manager.

> 
> Possible solutions:
> 
> 1. Instead of making such jobs (i.e. jobs with target=source) visible, 
> in the hope that a user would remove/fix them, we could automatically 
> remove them ourselves (this could be done as part of the 
> switch_replication_job_target function as well). Under normal 
> conditions, there shouldn't be any such jobs anyways.

I guess making them visible is fine, as long as they get filtered 
out/warned about early on when actually doing a replication. We'd need to 
look at the call sites to make sure that everybody handles this 
correctly (and probably also adapt the test case, see above).

I would not remove them altogether/automatically; they could be the result 
of an admin misediting the config, and we don't want to throw away a 
potentially existing replication state if we don't have to.

> 
> 2. Alternatively (or additionally), we could also add checks in the 
> create/update API paths to ensure that the target is not the node the 
> guest is on.
> 
> Option 2 would add a reason for using guest_migration locks in the 
> create/update paths. But I'm not sure we'd want that. The ability to 
> update job configurations while a replication is running is a feature 
> IMHO, and I think stealing guests might still lead to a bad 
> configuration. Therefore, I'd prefer option 1, which just adds a bit to 
> the automatic fixing we already do.

Yes, I'd check that source == current node and target != current node 
(see my patch series ;)).
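
A minimal sketch of such a check in the create/update API handler (variable
names assumed, not taken from the actual follow-up series):

    my $vmlist = PVE::Cluster::get_vmlist();
    my $guest_node = $vmlist->{ids}->{$guest}->{node};

    die "source must be the node the guest currently resides on\n"
        if defined($param->{source}) && $param->{source} ne $guest_node;
    die "target must not be the node the guest currently resides on\n"
        if $param->{target} eq $guest_node;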

We could leave out the guest lock - worst case, it's possible to 
add/modify a replication config such that it represents the 
pre-migration source->target pair, which would get cleaned up on the 
next run by job_status anyway, unless I am mistaken about something?

> 
> @Fabian G.: Opinions?
> 
> Am 10.08.20 um 14:35 schrieb Fabian Ebner:
>> even if not scheduled for removal, while adapting
>> replicate to die gracefully except for the removal case.
>> 
>> This way, such invalid jobs are no longer hidden from the user
>> (at least via the API; the GUI still hides them).
>> 
>> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>> ---
>> 
>> I think it's a bit weird that such jobs only show up once
>> they are scheduled for removal. I'll send a patch for the
>> GUI too if we do want the new behavior.
>> 
>>   PVE/Replication.pm      | 3 +++
>>   PVE/ReplicationState.pm | 5 +----
>>   2 files changed, 4 insertions(+), 4 deletions(-)
>> 
>> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
>> index ae0f145..b5835bd 100644
>> --- a/PVE/Replication.pm
>> +++ b/PVE/Replication.pm
>> @@ -207,6 +207,9 @@ sub replicate {
>>   
>>       die "not implemented - internal error" if $jobcfg->{type} ne 'local';
>>   
>> +    die "job target is local node\n" if $jobcfg->{target} eq $local_node
>> +				     && !$jobcfg->{remove_job};
>> +
>>       my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
>>   
>>       my $migration_network;
>> diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
>> index e486bc7..0b751bb 100644
>> --- a/PVE/ReplicationState.pm
>> +++ b/PVE/ReplicationState.pm
>> @@ -261,10 +261,6 @@ sub job_status {
>>   	    $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
>>   		if $local_node ne $jobcfg->{source};
>>   
>> -	    my $target = $jobcfg->{target};
>> -	    # never sync to local node
>> -	    next if !$jobcfg->{remove_job} && $target eq $local_node;
>> -
>>   	    next if !$get_disabled && $jobcfg->{disable};
>>   
>>   	    my $state = extract_job_state($stateobj, $jobcfg);
>> @@ -280,6 +276,7 @@ sub job_status {
>>   	    } else  {
>>   		if (my $fail_count = $state->{fail_count}) {
>>   		    my $members = PVE::Cluster::get_members();
>> +		    my $target = $jobcfg->{target};
>>   		    if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
>>   			$next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
>>   		    }
>> 
> 





Thread overview: 9+ messages
2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 2/6] job_status: read only after acquiring the lock Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 3/6] Clarify what the source property is used for in a replication job Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 4/6] Also update sources in switch_replication_job_target Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 5/6] job_status: simplify fixup of jobs for stolen guests Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node Fabian Ebner
2020-08-11  9:20   ` Fabian Ebner
2020-08-13 10:06     ` Fabian Grünbichler
2020-08-11 12:31 ` [pve-devel] applied: [PATCH manager 1/6] Set source when creating a new replication job Fabian Grünbichler
