From: Fabian Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node
Date: Tue, 11 Aug 2020 11:20:55 +0200 [thread overview]
Message-ID: <cd685c4e-64d2-0fdd-038a-5e5ea02dec82@proxmox.com>
In-Reply-To: <20200810123557.22618-6-f.ebner@proxmox.com>
There is another minor issue (with and without my patches):
If there is a job 123-0 for a guest on pve0 with source=target=pve0, and
a job 123-4 with source=pve0 and target=pve1, and we migrate the guest
to pve1, then switching source and target for job 123-4 is not possible,
because there already is a job with target pve0. Thus cfg->write() will
fail, and by extension so will job_status. (The
switch_replication_job_target call during migration fails for the same
reason.)
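For illustration, the conflicting state could look something like this
(a hypothetical replication.cfg sketch; the source property is the one
handled by this series):

    local: 123-0
        source pve0
        target pve0

    local: 123-4
        source pve0
        target pve1

After the guest moves to pve1, fixing up job 123-4 means setting its
target to pve0, but job 123-0 already has target pve0, so the write is
rejected because of the duplicate target.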
Possible solutions:
1. Instead of making such jobs (i.e. jobs with target=source) visible
in the hope that a user would remove or fix them, we could automatically
remove them ourselves (this could also be done as part of the
switch_replication_job_target function; see the first sketch after this
list). Under normal conditions, there shouldn't be any such jobs anyway.
2. Alternatively (or additionally), we could add checks to the
create/update API paths to ensure that the target is not the node the
guest is on (see the second sketch after this list).
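For option 1, a minimal sketch of the cleanup, assuming it runs where
the fixup for stolen/migrated guests already happens (hypothetical
code, not the actual implementation):

    use PVE::ReplicationConfig;

    # Hypothetical: would run under the replication config lock, like
    # the existing fixup in job_status.
    my $cfg = PVE::ReplicationConfig->new();
    foreach my $jobid (sort keys %{$cfg->{ids}}) {
        my $jobcfg = $cfg->{ids}->{$jobid};
        next if $jobcfg->{type} ne 'local';
        # such a job can never replicate, and it blocks switching
        # source/target of the guest's other jobs after a migration
        delete $cfg->{ids}->{$jobid}
            if $jobcfg->{target} eq $jobcfg->{source};
    }
    $cfg->write();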
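For option 2, the check in the create/update API paths could look
roughly like this (hypothetical code; $vmid and $target stand for the
request parameters):

    use PVE::Cluster;

    # Refuse a replication target that equals the guest's current node.
    my $vmlist = PVE::Cluster::get_vmlist();
    my $guest_node = $vmlist->{ids}->{$vmid}->{node};
    die "replication target '$target' is the node the guest is on\n"
        if defined($guest_node) && $target eq $guest_node;

Note that without a guest_migration lock, the guest could move between
this check and the config write, which is the complication mentioned
below.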
Option 2 would add a reason for using guest_migration locks in the
create/update paths, but I'm not sure we'd want that. The ability to
update job configurations while a replication is running is a feature
IMHO, and I think stealing guests might still lead to a bad
configuration anyway. Therefore, I'd prefer option 1, which just adds a
bit to the automatic fixing we already do.
@Fabian G.: Opinions?
Am 10.08.20 um 14:35 schrieb Fabian Ebner:
> even if not scheduled for removal, while adapting
> replicate to die gracefully except for the removal case.
>
> This way, such invalid jobs are no longer hidden from the user
> (at least via the API; the GUI still hides them)
>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
>
> I think it's a bit weird that such jobs only show up once
> they are scheduled for removal. I'll send a patch for the
> GUI too if we do want the new behavior.
>
> PVE/Replication.pm | 3 +++
> PVE/ReplicationState.pm | 5 +----
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> index ae0f145..b5835bd 100644
> --- a/PVE/Replication.pm
> +++ b/PVE/Replication.pm
> @@ -207,6 +207,9 @@ sub replicate {
>
> die "not implemented - internal error" if $jobcfg->{type} ne 'local';
>
> + die "job target is local node\n" if $jobcfg->{target} eq $local_node
> + && !$jobcfg->{remove_job};
> +
> my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
>
> my $migration_network;
> diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
> index e486bc7..0b751bb 100644
> --- a/PVE/ReplicationState.pm
> +++ b/PVE/ReplicationState.pm
> @@ -261,10 +261,6 @@ sub job_status {
> $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
> if $local_node ne $jobcfg->{source};
>
> - my $target = $jobcfg->{target};
> - # never sync to local node
> - next if !$jobcfg->{remove_job} && $target eq $local_node;
> -
> next if !$get_disabled && $jobcfg->{disable};
>
> my $state = extract_job_state($stateobj, $jobcfg);
> @@ -280,6 +276,7 @@ sub job_status {
> } else {
> if (my $fail_count = $state->{fail_count}) {
> my $members = PVE::Cluster::get_members();
> + my $target = $jobcfg->{target};
> if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
> $next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
> }
>
Thread overview: 9+ messages
2020-08-10 12:35 [pve-devel] [PATCH manager 1/6] Set source when creating a new replication job Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 2/6] job_status: read only after acquiring the lock Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 3/6] Clarify what the source property is used for in a replication job Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH guest-common 4/6] Also update sources in switch_replication_job_target Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 5/6] job_status: simplify fixup of jobs for stolen guests Fabian Ebner
2020-08-10 12:35 ` [pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node Fabian Ebner
2020-08-11 9:20 ` Fabian Ebner [this message]
2020-08-13 10:06 ` Fabian Grünbichler
2020-08-11 12:31 ` [pve-devel] applied: [PATCH manager 1/6] Set source when creating a new replication job Fabian Grünbichler