* [pve-devel] [PATCH qemu-server 2/3] Repeat check for replication target in locked section
From: Fabian Ebner @ 2020-07-30 11:29 UTC
To: pve-devel
The check is now repeated inside the locked section in QemuMigrate::prepare(),
so there is no need to warn twice; the warning from the outer check in
PVE/API2/Qemu.pm was removed.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
PVE/API2/Qemu.pm | 11 +++--------
PVE/QemuMigrate.pm | 13 +++++++++++++
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 8da616a..bc67666 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -3539,14 +3539,9 @@ __PACKAGE__->register_method({
my $repl_conf = PVE::ReplicationConfig->new();
my $is_replicated = $repl_conf->check_for_existing_jobs($vmid, 1);
my $is_replicated_to_target = defined($repl_conf->find_local_replication_job($vmid, $target));
- if ($is_replicated && !$is_replicated_to_target) {
- if ($param->{force}) {
- warn "WARNING: Node '$target' is not a replication target. Existing replication " .
- "jobs will fail after migration!\n";
- } else {
- die "Cannot live-migrate replicated VM to node '$target' - not a replication target." .
- " Use 'force' to override.\n";
- }
+ if (!$param->{force} && $is_replicated && !$is_replicated_to_target) {
+ die "Cannot live-migrate replicated VM to node '$target' - not a replication " .
+ "target. Use 'force' to override.\n";
}
} else {
warn "VM isn't running. Doing offline migration instead.\n" if $param->{online};
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index b699b67..a20e1c7 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -227,6 +227,19 @@ sub prepare {
die "can't migrate running VM without --online\n" if !$online;
$running = $pid;
+ my $repl_conf = PVE::ReplicationConfig->new();
+ my $is_replicated = $repl_conf->check_for_existing_jobs($vmid, 1);
+ my $is_replicated_to_target = defined($repl_conf->find_local_replication_job($vmid, $self->{node}));
+ if ($is_replicated && !$is_replicated_to_target) {
+ if ($self->{opts}->{force}) {
+ $self->log('warn', "WARNING: Node '$self->{node}' is not a replication target. Existing " .
+ "replication jobs will fail after migration!\n");
+ } else {
+ die "Cannot live-migrate replicated VM to node '$self->{node}' - not a replication " .
+ "target. Use 'force' to override.\n";
+ }
+ }
+
$self->{forcemachine} = PVE::QemuServer::Machine::qemu_machine_pxe($vmid, $conf);
# To support custom CPU types, we keep QEMU's "-cpu" parameter intact.
--
2.20.1
* [pve-devel] [PATCH/RFC qemu-server 3/3] Fix checks for transferring replication state/switching job target
From: Fabian Ebner @ 2020-07-30 11:29 UTC
To: pve-devel
When there are offline disks, $self->{replicated_volumes} will be
auto-vivified to {} by the check:
    next if $self->{replicated_volumes}->{$volid}
in sync_disks(), and {} then evaluates to true in a boolean context.
Now the replication job information is retrieved once in prepare(), and
that job information, rather than whether any volumes were actually
replicated, is used to decide whether to make the calls or not.
For offline migration to a non-replication target, there are no
$self->{replicated_volumes}, but the state should be transferred nonetheless.
For online migration to a non-replication target, replication is broken
afterwards anyway, so it doesn't make much of a difference whether the
state is transferred or not.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
Hope I'm not misinterpreting when these calls should or shouldn't
be made.
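
To make the auto-vivification pitfall described above concrete, here is a
minimal standalone Perl snippet (illustration only, not part of the patch;
the volume ID is made up):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $self = {};
    my $volid = 'local-zfs:vm-100-disk-0';

    # Merely reading the nested key auto-vivifies the intermediate hashref ...
    print "already replicated\n" if $self->{replicated_volumes}->{$volid};

    # ... so this check is now true even though nothing was replicated:
    print "would transfer replication state\n" if $self->{replicated_volumes};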
PVE/QemuMigrate.pm | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a20e1c7..6097ef2 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -220,6 +220,10 @@ sub prepare {
# test if VM exists
my $conf = $self->{vmconf} = PVE::QemuConfig->load_config($vmid);
+ my $repl_conf = PVE::ReplicationConfig->new();
+ $self->{replication_jobcfg} = $repl_conf->find_local_replication_job($vmid, $self->{node});
+ $self->{is_replicated} = $repl_conf->check_for_existing_jobs($vmid, 1);
+
PVE::QemuConfig->check_lock($conf);
my $running = 0;
@@ -227,10 +231,7 @@ sub prepare {
die "can't migrate running VM without --online\n" if !$online;
$running = $pid;
- my $repl_conf = PVE::ReplicationConfig->new();
- my $is_replicated = $repl_conf->check_for_existing_jobs($vmid, 1);
- my $is_replicated_to_target = defined($repl_conf->find_local_replication_job($vmid, $self->{node}));
- if ($is_replicated && !$is_replicated_to_target) {
+ if ($self->{is_replicated} && !$self->{replication_jobcfg}) {
if ($self->{opts}->{force}) {
$self->log('warn', "WARNING: Node '$self->{node}' is not a replication target. Existing " .
"replication jobs will fail after migration!\n");
@@ -362,9 +363,7 @@ sub sync_disks {
});
}
- my $rep_cfg = PVE::ReplicationConfig->new();
- my $replication_jobcfg = $rep_cfg->find_local_replication_job($vmid, $self->{node});
- my $replicatable_volumes = !$replication_jobcfg ? {}
+ my $replicatable_volumes = !$self->{replication_jobcfg} ? {}
: PVE::QemuConfig->get_replicatable_volumes($storecfg, $vmid, $conf, 0, 1);
my $test_volid = sub {
@@ -489,7 +488,7 @@ sub sync_disks {
}
}
- if ($replication_jobcfg) {
+ if ($self->{replication_jobcfg}) {
if ($self->{running}) {
my $version = PVE::QemuServer::kvm_user_version();
@@ -523,7 +522,7 @@ sub sync_disks {
my $start_time = time();
my $logfunc = sub { $self->log('info', shift) };
$self->{replicated_volumes} = PVE::Replication::run_replication(
- 'PVE::QemuConfig', $replication_jobcfg, $start_time, $start_time, $logfunc);
+ 'PVE::QemuConfig', $self->{replication_jobcfg}, $start_time, $start_time, $logfunc);
}
# sizes in config have to be accurate for remote node to correctly
@@ -1193,7 +1192,7 @@ sub phase3_cleanup {
}
# transfer replication state before move config
- $self->transfer_replication_state() if $self->{replicated_volumes};
+ $self->transfer_replication_state() if $self->{is_replicated};
# move config to remote node
my $conffile = PVE::QemuConfig->config_file($vmid);
@@ -1202,7 +1201,7 @@ sub phase3_cleanup {
die "Failed to move config to node '$self->{node}' - rename failed: $!\n"
if !rename($conffile, $newconffile);
- $self->switch_replication_job_target() if $self->{replicated_volumes};
+ $self->switch_replication_job_target() if $self->{replication_jobcfg};
if ($self->{livemigration}) {
if ($self->{stopnbd}) {
--
2.20.1
* Re: [pve-devel] [PATCH manager 1/3] Hold the guest migration lock when changing the replication config
From: Fabian Ebner @ 2020-08-03 7:11 UTC
To: pve-devel
On 30.07.20 at 13:29, Fabian Ebner wrote:
> The guest migration lock is already held when running replications,
> but it also makes sense to hold it when updating the replication
> config itself. Otherwise, it can happen that the migration does
> not know the de-facto state of replication.
>
> For example:
> 1. migration starts
> 2. replication job is deleted
> 3. migration reads the replication config
> 4. migration runs the replication which causes the
> replicated disks to be removed, because the job
> is marked for removal
> 5. migration will continue without replication
>
This situation can still happen even with the locking from this patch:
1. replication job is deleted
2. migration starts before the replication was run, so the job is still
marked for removal in the replication config
3.-5. same as above
So we probably want to check during migration whether the replication
job we want to use is marked for removal (a rough sketch of such a check
follows the list below). If it is, we could:
- leave the situation as is, i.e. the replication job will be removed
during migration and migration will continue without replication
- fail the migration (principle of least surprise?)
- run replication without the removal mark during migration. Then the
replication job would be removed the next time replication runs after
migration and hence after the target was switched.
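
A rough sketch of such a check early in QemuMigrate::prepare(), here for
the fail-the-migration variant (untested; it assumes the job entry carries
a 'remove_job' property while it is marked for removal):

    my $repl_conf = PVE::ReplicationConfig->new();
    my $jobcfg = $repl_conf->find_local_replication_job($vmid, $self->{node});

    # refuse to migrate while the matching job is scheduled for removal
    if ($jobcfg && $jobcfg->{remove_job}) {
        die "replication job for VM $vmid to '$self->{node}' is marked for " .
            "removal - remove or finish the job before migrating\n";
    }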
Also: If we only read the replication config once during a migration,
the locking from this patch shouldn't even be necessary.
switch_replication_job_target does read the config once more, but that
would still be compatible with allowing other changes to the replication
config during migration. But of course this locking might make things
more future-proof.
> Note that the migration doesn't actually fail, but it's probably
> not the desired behavior either.
>
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> PVE/API2/ReplicationConfig.pm | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
> index 2b4ecd10..e5262068 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -9,6 +9,7 @@ use PVE::JSONSchema qw(get_standard_option);
> use PVE::RPCEnvironment;
> use PVE::ReplicationConfig;
> use PVE::Cluster;
> +use PVE::GuestHelpers;
>
> use PVE::RESTHandler;
>
> @@ -144,7 +145,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
>
> return undef;
> }});
> @@ -167,6 +170,7 @@ __PACKAGE__->register_method ({
> my $id = extract_param($param, 'id');
> my $digest = extract_param($param, 'digest');
> my $delete = extract_param($param, 'delete');
> + my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
>
> my $code = sub {
> my $cfg = PVE::ReplicationConfig->new();
> @@ -199,7 +203,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
>
> return undef;
> }});
> @@ -237,10 +243,12 @@ __PACKAGE__->register_method ({
>
> my $rpcenv = PVE::RPCEnvironment::get();
>
> + my $id = extract_param($param, 'id');
> + my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
> +
> my $code = sub {
> my $cfg = PVE::ReplicationConfig->new();
>
> - my $id = $param->{id};
> if ($param->{force}) {
> raise_param_exc({ 'keep' => "conflicts with parameter 'force'" }) if $param->{keep};
> delete $cfg->{ids}->{$id};
> @@ -262,7 +270,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
>
> return undef;
> }});
>
* Re: [pve-devel] [PATCH manager 1/3] Hold the guest migration lock when changing the replication config
From: Fabian Grünbichler @ 2020-08-03 7:49 UTC
To: Fabian Ebner, pve-devel
On July 30, 2020 1:29 pm, Fabian Ebner wrote:
> The guest migration lock is already held when running replications,
> but it also makes sense to hold it when updating the replication
> config itself. Otherwise, it can happen that the migration does
> not know the de-facto state of replication.
>
> For example:
> 1. migration starts
> 2. replication job is deleted
> 3. migration reads the replication config
> 4. migration runs the replication which causes the
> replicated disks to be removed, because the job
> is marked for removal
> 5. migration will continue without replication
>
> Note that the migration doesn't actually fail, but it's probably
> not the desired behavior either.
>
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> PVE/API2/ReplicationConfig.pm | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
> index 2b4ecd10..e5262068 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -9,6 +9,7 @@ use PVE::JSONSchema qw(get_standard_option);
> use PVE::RPCEnvironment;
> use PVE::ReplicationConfig;
> use PVE::Cluster;
> +use PVE::GuestHelpers;
>
> use PVE::RESTHandler;
>
> @@ -144,7 +145,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
it might make sense to have a single wrapper for this, or to add the guest
ID as a parameter to ReplicationConfig::lock (to not miss it or get the
order wrong).
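
e.g. something along these lines (untested sketch, name and placement just
for illustration):

    # always take the guest migration lock before the replication config
    # lock, so callers cannot get the order wrong or forget the outer lock
    sub lock_replication_config_for_guest {
        my ($guest_id, $timeout, $code) = @_;

        return PVE::GuestHelpers::guest_migration_lock($guest_id, $timeout, sub {
            return PVE::ReplicationConfig::lock($code);
        });
    }

the three call sites touched by this patch could then simply call
lock_replication_config_for_guest($guest_id, 10, $code);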
what about the calls to lock within ReplicationConfig? they are all
job/guest ID specific, and should also get this additional protection,
right?
from a quick glance, there seems to be only a single call to
ReplicationConfig::lock that spans more than one job (job_status in
ReplicationState), but that immediately iterates over jobs, so we could
either move the lock into the loop (expensive, since it involves a
cfs_lock), or split the cfs and flock just for this instance?
(side note, that code and possibly other stuff in ReplicationConfig is
buggy since it does not re-read the config after locking)
>
> return undef;
> }});
> @@ -167,6 +170,7 @@ __PACKAGE__->register_method ({
> my $id = extract_param($param, 'id');
> my $digest = extract_param($param, 'digest');
> my $delete = extract_param($param, 'delete');
> + my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
>
> my $code = sub {
> my $cfg = PVE::ReplicationConfig->new();
> @@ -199,7 +203,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
>
> return undef;
> }});
> @@ -237,10 +243,12 @@ __PACKAGE__->register_method ({
>
> my $rpcenv = PVE::RPCEnvironment::get();
>
> + my $id = extract_param($param, 'id');
> + my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
> +
> my $code = sub {
> my $cfg = PVE::ReplicationConfig->new();
>
> - my $id = $param->{id};
> if ($param->{force}) {
> raise_param_exc({ 'keep' => "conflicts with parameter 'force'" }) if $param->{keep};
> delete $cfg->{ids}->{$id};
> @@ -262,7 +270,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
>
> return undef;
> }});
> --
> 2.20.1
>
>