From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Fabian Ebner <f.ebner@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node
Date: Thu, 13 Aug 2020 12:06:03 +0200
Message-ID: <1597312701.72ffjl19l1.astroid@nora.none>
In-Reply-To: <cd685c4e-64d2-0fdd-038a-5e5ea02dec82@proxmox.com>

On August 11, 2020 11:20 am, Fabian Ebner wrote:
> There is another minor issue (with and without my patches):
> If there is a job 123-0 for a guest on pve0 with source=target=pve0 and 
> a job 123-4 with source=pve0 and target=pve1, and we migrate to pve1, 
> then switching source and target for job 123-4 is not possible, because 
> there already is a job with target pve0. Thus cfg->write() will fail, 
> and by extension job_status. (Also the switch_replication_job_target 
> call during migration fails for the same reason).
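
for illustration, the conflicting configuration would look roughly like
this (a sketch only - the property layout and schedule values are made
up for the example, not taken from a real cluster):

    local: 123-0
        source pve0
        target pve0
        schedule */15

    local: 123-4
        source pve0
        target pve1
        schedule */15

after migrating the guest to pve1, fixing up 123-4 would require setting
its target to pve0, which collides with the target of the (broken) job
123-0, so cfg->write() fails.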

this patch also breaks replication_test2.pl in pve-manager..

> 
> Possible solutions:
> 
> 1. Instead of making such jobs (i.e. jobs with target=source) visible, 
> in the hope that a user would remove/fix them, we could automatically 
> remove them ourselves (this could be done as part of the 
> switch_replication_job_target function as well). Under normal 
> conditions, there shouldn't be any such jobs anyways.

I guess making them visible is fine, as long as they get filtered 
out/warned about early on when actually doing a replication. would need 
to look at the call sites to make sure that everybody handles this 
correctly (and probably also adapt the test case, see above).

I would not remove them altogether/automatically - such a job could be 
the result of an admin misediting the config, and we don't want to 
throw away potentially existing replication state if we don't have to..
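
something like the following at the consuming call sites could work (a 
rough sketch only, assuming $jobcfg and $local_node as used in the 
existing code, and that the job hash carries an id field like the one 
job_status sets):

    # keep the job visible in the config, but never sync to the local
    # node; warn so the admin notices the broken job
    if ($jobcfg->{target} eq $local_node && !$jobcfg->{remove_job}) {
        warn "replication job '$jobcfg->{id}': target is the local node, skipping\n";
        next;
    }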

> 
> 2. Alternatively (or additionally), we could also add checks in the 
> create/update API paths to ensure that the target is not the node the 
> guest is on.
> 
> Option 2 would add a reason for using guest_migration locks in the 
> create/update paths. But I'm not sure we'd want that. The ability to 
> update job configurations while a replication is running is a feature 
> IMHO, and I think stealing guests might still lead to a bad 
> configuration. Therefore, I'd prefer option 1, which just adds a bit to 
> the automatic fixing we already do.

Yes, I'd check that source == current node and target != current node 
(see my patch series ;)).

we could leave out the guest lock - worst case, it's possible to 
add/modify a replication config such that it represents the 
pre-migration source->target pair, which would get cleaned up on the 
next run by job_status anyway, unless I am missing something?
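
i.e., roughly the following in the create/update handlers (a sketch 
under the assumption that the guest's current node can be looked up via 
the cluster-wide vmlist; the $param keys are illustrative):

    my $guest_node = PVE::Cluster::get_vmlist()->{ids}->{$vmid}->{node};
    die "source must be the guest's current node '$guest_node'\n"
        if defined($param->{source}) && $param->{source} ne $guest_node;
    die "target must not be the guest's current node '$guest_node'\n"
        if $param->{target} eq $guest_node;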

> 
> @Fabian G.: Opinions?
> 
> On 10.08.20 at 14:35, Fabian Ebner wrote:
>> even if not scheduled for removal, while adapting
>> replicate to die gracefully except for the removal case.
>> 
>> This way, such invalid jobs are no longer hidden from the user
>> (at least via the API; the GUI still hides them)
>> 
>> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>> ---
>> 
>> I think it's a bit weird that such jobs only show up once
>> they are scheduled for removal. I'll send a patch for the
>> GUI too if we do want the new behavior.
>> 
>>   PVE/Replication.pm      | 3 +++
>>   PVE/ReplicationState.pm | 5 +----
>>   2 files changed, 4 insertions(+), 4 deletions(-)
>> 
>> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
>> index ae0f145..b5835bd 100644
>> --- a/PVE/Replication.pm
>> +++ b/PVE/Replication.pm
>> @@ -207,6 +207,9 @@ sub replicate {
>>   
>>       die "not implemented - internal error" if $jobcfg->{type} ne 'local';
>>   
>> +    die "job target is local node\n" if $jobcfg->{target} eq $local_node
>> +				     && !$jobcfg->{remove_job};
>> +
>>       my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
>>   
>>       my $migration_network;
>> diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
>> index e486bc7..0b751bb 100644
>> --- a/PVE/ReplicationState.pm
>> +++ b/PVE/ReplicationState.pm
>> @@ -261,10 +261,6 @@ sub job_status {
>>   	    $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
>>   		if $local_node ne $jobcfg->{source};
>>   
>> -	    my $target = $jobcfg->{target};
>> -	    # never sync to local node
>> -	    next if !$jobcfg->{remove_job} && $target eq $local_node;
>> -
>>   	    next if !$get_disabled && $jobcfg->{disable};
>>   
>>   	    my $state = extract_job_state($stateobj, $jobcfg);
>> @@ -280,6 +276,7 @@ sub job_status {
>>   	    } else  {
>>   		if (my $fail_count = $state->{fail_count}) {
>>   		    my $members = PVE::Cluster::get_members();
>> +		    my $target = $jobcfg->{target};
>>   		    if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
>>   			$next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
>>   		    }
>> 
> 