From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Fabian Ebner <f.ebner@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node
Date: Thu, 13 Aug 2020 12:06:03 +0200 [thread overview]
Message-ID: <1597312701.72ffjl19l1.astroid@nora.none> (raw)
In-Reply-To: <cd685c4e-64d2-0fdd-038a-5e5ea02dec82@proxmox.com>
On August 11, 2020 11:20 am, Fabian Ebner wrote:
> There is another minor issue (with and without my patches):
> If there is a job 123-0 for a guest on pve0 with source=target=pve0 and
> a job 123-4 with source=pve0 and target=pve1, and we migrate to pve1,
> then switching source and target for job 123-4 is not possible, because
> there already is a job with target pve0. Thus cfg->write() will fail,
> and by extension job_status. (Also the switch_replication_job_target
> call during migration fails for the same reason).
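To make the failure mode concrete: a minimal Python sketch (hypothetical names; the real logic is the per-guest target-uniqueness check that the Perl replication config code enforces on cfg->write()) of why the swap for job 123-4 fails:

```python
# Hypothetical sketch: the replication config allows at most one job per
# (guest, target) pair, so swapping source/target on job 123-4 would create
# a second job for guest 123 with target pve0 and is rejected.
def switch_target(jobs, job_id, new_source, new_target):
    vmid = job_id.split('-')[0]
    for other_id, other in jobs.items():
        if other_id != job_id and other_id.split('-')[0] == vmid \
                and other['target'] == new_target:
            raise ValueError(
                f"guest {vmid} already has a job with target {new_target}")
    jobs[job_id]['source'] = new_source
    jobs[job_id]['target'] = new_target

jobs = {
    '123-0': {'source': 'pve0', 'target': 'pve0'},  # broken: source == target
    '123-4': {'source': 'pve0', 'target': 'pve1'},
}
# after migrating guest 123 to pve1, job_status tries to swap 123-4:
try:
    switch_target(jobs, '123-4', 'pve1', 'pve0')
except ValueError as err:
    print(err)  # → guest 123 already has a job with target pve0
```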
this patch also breaks replication_test2.pl in pve-manager..
>
> Possible solutions:
>
> 1. Instead of making such jobs (i.e. jobs with target=source) visible,
> in the hope that a user would remove/fix them, we could automatically
> remove them ourselves (this could be done as part of the
> switch_replication_job_target function as well). Under normal
> conditions, there shouldn't be any such jobs anyways.
I guess making them visible is fine, as long as they get filtered out/warned
about early on when actually doing a replication. would need to
look at the call sites to make sure that everybody handles this
correctly (and probably also adapt the test case, see above).
I would not remove them altogether/automatically - they could be the result
of an admin misediting the config, and we don't want to throw away a
potentially existing replication state if we don't have to..
>
> 2. Alternatively (or additionally), we could also add checks in the
> create/update API paths to ensure that the target is not the node the
> guest is on.
>
> Option 2 would add a reason for using guest_migration locks in the
> create/update paths. But I'm not sure we'd want that. The ability to
> update job configurations while a replication is running is a feature
> IMHO, and I think stealing guests might still lead to a bad
> configuration. Therefore, I'd prefer option 1, which just adds a bit to
> the automatic fixing we already do.
Yes, I'd check that source == current node and target != current node
(see my patch series ;)).
we could leave out the guest lock - worst case, it's possible to
add/modify a replication config such that it represents the
pre-migration source->target pair, which would get cleaned up on the
next run by job_status anyway, unless I am mistaken?
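The create/update-time check could look like this (Python sketch with hypothetical names; the real implementation would live in the Perl create/update API handlers):

```python
# Hypothetical sketch of the suggested create/update validation:
# accept a job only if its source is the node the guest currently sits on
# and its target is a different node.
def check_job_config(job, guest_node):
    if job['source'] != guest_node:
        raise ValueError(
            f"source '{job['source']}' does not match guest node '{guest_node}'")
    if job['target'] == guest_node:
        raise ValueError("job target is local node")

check_job_config({'source': 'pve0', 'target': 'pve1'}, 'pve0')  # accepted
try:
    check_job_config({'source': 'pve0', 'target': 'pve0'}, 'pve0')
except ValueError as err:
    print(err)  # → job target is local node
```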
>
> @Fabian G.: Opinions?
>
> Am 10.08.20 um 14:35 schrieb Fabian Ebner:
>> even if not scheduled for removal, while adapting
>> replicate to die gracefully except for the removal case.
>>
>> This way, such invalid jobs are no longer hidden from the user
>> (at least via the API; the GUI still hides them)
>>
>> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>> ---
>>
>> I think it's a bit weird that such jobs only show up once
>> they are scheduled for removal. I'll send a patch for the
>> GUI too if we do want the new behavior.
>>
>> PVE/Replication.pm      | 3 +++
>> PVE/ReplicationState.pm | 5 +----
>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
>> index ae0f145..b5835bd 100644
>> --- a/PVE/Replication.pm
>> +++ b/PVE/Replication.pm
>> @@ -207,6 +207,9 @@ sub replicate {
>>  
>>      die "not implemented - internal error" if $jobcfg->{type} ne 'local';
>>  
>> +    die "job target is local node\n" if $jobcfg->{target} eq $local_node
>> +        && !$jobcfg->{remove_job};
>> +
>>      my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
>>  
>>      my $migration_network;
>> diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
>> index e486bc7..0b751bb 100644
>> --- a/PVE/ReplicationState.pm
>> +++ b/PVE/ReplicationState.pm
>> @@ -261,10 +261,6 @@ sub job_status {
>>          $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
>>              if $local_node ne $jobcfg->{source};
>>  
>> -        my $target = $jobcfg->{target};
>> -        # never sync to local node
>> -        next if !$jobcfg->{remove_job} && $target eq $local_node;
>> -
>>          next if !$get_disabled && $jobcfg->{disable};
>>  
>>          my $state = extract_job_state($stateobj, $jobcfg);
>> @@ -280,6 +276,7 @@
>>          } else {
>>              if (my $fail_count = $state->{fail_count}) {
>>                  my $members = PVE::Cluster::get_members();
>> +                my $target = $jobcfg->{target};
>>                  if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
>>                      $next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
>>                  }
>>
>