public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version
@ 2021-04-08 10:33 Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 1/6] create vmstate_size helper Fabian Ebner
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Fabian Ebner @ 2021-04-08 10:33 UTC (permalink / raw)
  To: pve-devel

The code is in a very early state; I'm just sending this to discuss the idea.
I haven't done a whole lot of testing yet, but it does seem to work.

The idea is rather simple:
1. save the state to ramfs
2. stop the VM
3. start the VM loading the state
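
Condensed, the flow (see patch 3/6 for the real thing) looks roughly
like this:

    run_command(['mount', '-t', 'ramfs', 'ramfs', $ramfs]);   # 1. RAM-backed state file
    mon_cmd($vmid, "savevm-start", statefile => $statefile);
    savevm_monitor($vmid);                                    #    wait until saved
    mon_cmd($vmid, "quit");                                   # 2. stop the old binary
    vm_start_nolock($storecfg, $vmid, $conf,
        { statefile => $statefile, forcemachine => $forcemachine }, {});  # 3. resume on the new one
    run_command(['umount', $ramfs]);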

This approach solves the problem that our stack is (currently) not designed to
have multiple instances with the same VM ID running. To do so, we'd need to
handle config locking, sockets, pid file, passthrough resources?, etc.

Another nice feature of this approach is that it doesn't require touching the
vm_start or migration code at all, avoiding further bloating.


Thanks to Fabian G. and Stefan for inspiring this idea:

Fabian G. suggested using the suspend to disk + start route if the required
changes to our stack would turn out to be infeasible.

Stefan suggested migrating to a dummy VM (outside our stack) which just holds
the state and migrating back right away. It seems that dummy VM is in fact not
even needed ;) If we really really care about the smallest possible downtime, this
approach might still be the best, and we'd need to start the dummy VM while the
backwards migration runs (resulting in two times the migration downtime). But
it does have more moving parts and requires some migration/startup changes.


Fabian Ebner (6):
  create vmstate_size helper
  create savevm_monitor helper
  draft of upgrade_qemu function
  draft of qemuupgrade API call
  add timing for testing
  add usleep parameter to savevm_monitor

 PVE/API2/Qemu.pm  |  60 ++++++++++++++++++++++
 PVE/QemuConfig.pm |  10 +---
 PVE/QemuServer.pm | 125 +++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 170 insertions(+), 25 deletions(-)

-- 
2.20.1






* [pve-devel] [POC qemu-server 1/6] create vmstate_size helper
  2021-04-08 10:33 [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Fabian Ebner
@ 2021-04-08 10:33 ` Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 2/6] create savevm_monitor helper Fabian Ebner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2021-04-08 10:33 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
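
As a concrete sizing example (numbers only; assuming memory: 4096, and
glossing over the fact that load_defaults() needs the full environment):

    my $size = PVE::QemuServer::vmstate_size({ memory => 4096 });
    # 4096 * 2 + 500 = 8692 (MB): twice the RAM content plus driver state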
---
 PVE/QemuConfig.pm | 10 ++--------
 PVE/QemuServer.pm | 13 +++++++++++++
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index 7ee8876..01c51b0 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -208,14 +208,8 @@ sub __snapshot_save_vmstate {
 	$target = PVE::QemuServer::find_vmstate_storage($conf, $storecfg);
     }
 
-    my $defaults = PVE::QemuServer::load_defaults();
-    my $mem_size = $conf->{memory} // $defaults->{memory};
-    my $driver_state_size = 500; # assume 500MB is enough to save all driver state;
-    # our savevm-start does live-save of the memory until the space left in the
-    # volume is just enough for the remaining memory content + internal state
-    # then it stops the vm and copies the rest so we reserve twice the
-    # memory content + state to minimize vm downtime
-    my $size = $mem_size*2 + $driver_state_size;
+    my $size = PVE::QemuServer::vmstate_size($conf);
+
     my $scfg = PVE::Storage::storage_config($storecfg, $target);
 
     my $name = "vm-$vmid-state-$snapname";
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index fdb2ac9..5a89853 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7710,4 +7710,17 @@ sub vm_is_paused {
     return $qmpstatus && $qmpstatus->{status} eq "paused";
 }
 
+sub vmstate_size {
+    my ($conf) = @_;
+
+    my $defaults = PVE::QemuServer::load_defaults();
+    my $mem_size = $conf->{memory} // $defaults->{memory};
+    my $driver_state_size = 500; # assume 500MB is enough to save all driver state;
+    # our savevm-start does live-save of the memory until the space left in the
+    # volume is just enough for the remaining memory content + internal state
+    # then it stops the vm and copies the rest so we reserve twice the
+    # memory content + state to minimize vm downtime
+    return $mem_size*2 + $driver_state_size;
+}
+
 1;
-- 
2.20.1






* [pve-devel] [POC qemu-server 2/6] create savevm_monitor helper
  2021-04-08 10:33 [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 1/6] create vmstate_size helper Fabian Ebner
@ 2021-04-08 10:33 ` Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 3/6] draft of upgrade_qemu function Fabian Ebner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2021-04-08 10:33 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
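
For reference, the intended call pattern, as used by vm_suspend below
and by upgrade_qemu in patch 3/6:

    eval {
        mon_cmd($vmid, "savevm-start", statefile => $path);
        savevm_monitor($vmid);    # blocks while 'active', dies on failure
    };
    if (my $err = $@) {
        eval { mon_cmd($vmid, "savevm-end"); };    # best-effort cleanup
        warn $@ if $@;
        die $err;
    }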
---
 PVE/QemuServer.pm | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 5a89853..983fb2f 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5621,6 +5621,27 @@ sub vm_reboot {
    });
 }
 
+sub savevm_monitor {
+    my ($vmid) = @_;
+
+    for(;;) {
+	my $state = mon_cmd($vmid, "query-savevm");
+	if (!$state->{status}) {
+	    die "savevm not active\n";
+	} elsif ($state->{status} eq 'active') {
+	    sleep(1);
+	    next;
+	} elsif ($state->{status} eq 'completed') {
+	    print "State saved, quitting\n";
+	    return;
+	} elsif ($state->{status} eq 'failed' && $state->{error}) {
+	    die "query-savevm failed with error '$state->{error}'\n"
+	} else {
+	    die "query-savevm returned status '$state->{status}'\n";
+	}
+    }
+}
+
 # note: if using the statestorage parameter, the caller has to check privileges
 sub vm_suspend {
     my ($vmid, $skiplock, $includestate, $statestorage) = @_;
@@ -5672,22 +5693,7 @@ sub vm_suspend {
 	eval {
 	    set_migration_caps($vmid, 1);
 	    mon_cmd($vmid, "savevm-start", statefile => $path);
-	    for(;;) {
-		my $state = mon_cmd($vmid, "query-savevm");
-		if (!$state->{status}) {
-		    die "savevm not active\n";
-		} elsif ($state->{status} eq 'active') {
-		    sleep(1);
-		    next;
-		} elsif ($state->{status} eq 'completed') {
-		    print "State saved, quitting\n";
-		    last;
-		} elsif ($state->{status} eq 'failed' && $state->{error}) {
-		    die "query-savevm failed with error '$state->{error}'\n"
-		} else {
-		    die "query-savevm returned status '$state->{status}'\n";
-		}
-	    }
+	    savevm_monitor($vmid);
 	};
 	my $err = $@;
 
-- 
2.20.1






* [pve-devel] [POC qemu-server 3/6] draft of upgrade_qemu function
  2021-04-08 10:33 [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 1/6] create vmstate_size helper Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 2/6] create savevm_monitor helper Fabian Ebner
@ 2021-04-08 10:33 ` Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 4/6] draft of qemuupgrade API call Fabian Ebner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2021-04-08 10:33 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
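
One hypothetical shape for the 'save state to disk if there's an error
with start' TODO below (names invented, just a sketch): since the ramfs
is unmounted unconditionally, copying the state file out on a failed
start would at least keep it recoverable:

    use File::Copy qw(copy);    # core module

    if (my $start_err = $@) {
        # hypothetical: preserve the state before the ramfs goes away
        my $backup = "/var/tmp/vm-${vmid}-state-upgrade";
        copy($statefile, $backup) or warn "state backup failed: $!\n";
        die "start failed, state preserved at $backup: $start_err";
    }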
---
 PVE/QemuServer.pm | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 983fb2f..fa2aad9 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7729,4 +7729,54 @@ sub vmstate_size {
     return $mem_size*2 + $driver_state_size;
 }
 
+# TODO ensure that qmeventd is happy with what we do
+sub upgrade_qemu {
+    my ($vmid, $conf, $param) = @_;
+
+    my $storecfg = PVE::Storage::config();
+    my $forcemachine = PVE::QemuServer::Machine::qemu_machine_pxe($vmid, $conf);
+
+    # TODO is it worth setting a lock in the config?
+
+    my $size = vmstate_size($conf);
+
+    my $ramfs = "/run/pve/${vmid}_ramfs";
+    my $statefile = "${ramfs}/state";
+
+    mkpath($ramfs);
+
+    run_command(['mount', '-t', 'ramfs', 'ramfs', $ramfs]);
+    run_command(['truncate', '-s', "${size}M", $statefile]);
+
+    eval {
+	eval {
+	    set_migration_caps($vmid, 1); #TODO needed here?
+	    mon_cmd($vmid, "savevm-start", statefile => $statefile);
+	    savevm_monitor($vmid);
+	};
+	if (my $err = $@) {
+	    eval { mon_cmd($vmid, "savevm-end"); };
+	    warn $@ if $@;
+	    die $err;
+	}
+
+	mon_cmd($vmid, "quit");
+
+	my $start_params = {
+	    statefile => $statefile,
+	    forcemachine => $forcemachine,
+	};
+
+	# TODO forcecpu, spice ticket?
+	vm_start_nolock($storecfg, $vmid, $conf, $start_params, {});
+
+	# TODO save state to disk if there's an error with start?
+    };
+    my $err = $@;
+
+    run_command(['umount', $ramfs]);
+
+    die $err if $err;
+}
+
 1;
-- 
2.20.1






* [pve-devel] [POC qemu-server 4/6] draft of qemuupgrade API call
  2021-04-08 10:33 [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Fabian Ebner
                   ` (2 preceding siblings ...)
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 3/6] draft of upgrade_qemu function Fabian Ebner
@ 2021-04-08 10:33 ` Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 5/6] add timing for testing Fabian Ebner
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2021-04-08 10:33 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
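
Assuming the registration below, the endpoint should be reachable in
the usual way, e.g. (untested):

    pvesh create /nodes/<node>/qemu/<vmid>/upgradeqemu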
---
 PVE/API2/Qemu.pm | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index c56b609..f20cd76 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -836,6 +836,7 @@ __PACKAGE__->register_method({
 	    { subdir => 'spiceproxy' },
 	    { subdir => 'sendkey' },
 	    { subdir => 'firewall' },
+	    { subdir => 'upgradeqemu' },
 	    ];
 
 	return $res;
@@ -4395,4 +4396,63 @@ __PACKAGE__->register_method({
 	return PVE::QemuServer::Cloudinit::dump_cloudinit_config($conf, $param->{vmid}, $param->{type});
     }});
 
+__PACKAGE__->register_method({
+    name => 'upgrade_qemu',
+    path => '{vmid}/upgradeqemu',
+    method => 'POST',
+    protected => 1,
+    proxyto => 'node',
+    description => "Upgrade the running QEMU version of the VM to the currently installed one.",
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.PowerMgmt' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid',
+		{ completion => \&PVE::QemuServer::complete_vmid }),
+	},
+    },
+    returns => {
+	type => 'string',
+	description => "the task ID.",
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $node = extract_param($param, 'node');
+	my $vmid = extract_param($param, 'vmid');
+
+	my $check_and_load_config = sub {
+	    PVE::QemuConfig::assert_config_exists_on_node($vmid, $node);
+	    PVE::QemuServer::Helpers::vm_running_locally($vmid) or die "VM is not running\n";
+
+	    # TODO check if running version is actually outdated
+
+	    my $conf = PVE::QemuConfig->load_config($vmid);
+	    PVE::QemuConfig->check_lock($conf);
+
+	    return $conf;
+	};
+
+	$check_and_load_config->();
+
+	# TODO ensure HA is happy with what we do
+
+	my $realcmd = sub {
+	    my $conf = $check_and_load_config->();
+	    PVE::QemuServer::upgrade_qemu($vmid, $conf);
+	};
+
+	my $worker = sub {
+	    return PVE::QemuConfig->lock_config($vmid, $realcmd);
+	};
+
+	return $rpcenv->fork_worker('qemuupgrade', $vmid, $authuser, $worker);
+    }});
+
 1;
-- 
2.20.1






* [pve-devel] [POC qemu-server 5/6] add timing for testing
  2021-04-08 10:33 [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Fabian Ebner
                   ` (3 preceding siblings ...)
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 4/6] draft of qemuupgrade API call Fabian Ebner
@ 2021-04-08 10:33 ` Fabian Ebner
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 6/6] add usleep parameter to savevm_monitor Fabian Ebner
  2021-04-08 16:44 ` [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Thomas Lamprecht
  6 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2021-04-08 10:33 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index fa2aad9..0287a80 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -22,7 +22,7 @@ use JSON;
 use MIME::Base64;
 use POSIX;
 use Storable qw(dclone);
-use Time::HiRes qw(gettimeofday);
+use Time::HiRes qw(gettimeofday tv_interval);
 use URI::Escape;
 use UUID;
 
@@ -7743,16 +7743,34 @@ sub upgrade_qemu {
     my $ramfs = "/run/pve/${vmid}_ramfs";
     my $statefile = "${ramfs}/state";
 
+    # for testing
+    my $start;
+
+    my $print_elapsed = sub {
+	my ($what) = @_;
+	my $end = [gettimeofday()];
+	my $elapsed = tv_interval($start, $end);
+	print "elapsed: '$elapsed' for $what\n";
+	$start = [gettimeofday()];
+    };
+
+    $start = [gettimeofday()];
+
     mkpath($ramfs);
 
     run_command(['mount', '-t', 'ramfs', 'ramfs', $ramfs]);
     run_command(['truncate', '-s', "${size}M", $statefile]);
 
+    $print_elapsed->("preparing ramfs");
+
     eval {
 	eval {
 	    set_migration_caps($vmid, 1); #TODO needed here?
+	    $print_elapsed->("setting migration caps");
 	    mon_cmd($vmid, "savevm-start", statefile => $statefile);
+	    $print_elapsed->("issuing savevm-start");
 	    savevm_monitor($vmid);
+	    $print_elapsed->("saving state");
 	};
 	if (my $err = $@) {
 	    eval { mon_cmd($vmid, "savevm-end"); };
@@ -7761,6 +7779,7 @@ sub upgrade_qemu {
 	}
 
 	mon_cmd($vmid, "quit");
+	$print_elapsed->("issuing quit");
 
 	my $start_params = {
 	    statefile => $statefile,
@@ -7769,6 +7788,7 @@ sub upgrade_qemu {
 
 	# TODO forcecpu, spice ticket?
 	vm_start_nolock($storecfg, $vmid, $conf, $start_params, {});
+	$print_elapsed->("starting vm");
 
 	# TODO save state to disk if there's an error with start?
     };
-- 
2.20.1






* [pve-devel] [POC qemu-server 6/6] add usleep parameter to savevm_monitor
  2021-04-08 10:33 [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Fabian Ebner
                   ` (4 preceding siblings ...)
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 5/6] add timing for testing Fabian Ebner
@ 2021-04-08 10:33 ` Fabian Ebner
  2021-04-08 16:44 ` [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Thomas Lamprecht
  6 siblings, 0 replies; 9+ messages in thread
From: Fabian Ebner @ 2021-04-08 10:33 UTC (permalink / raw)
  To: pve-devel

and potentially save a large part of a second: polling every 10 ms
instead of every second means completion is noticed almost immediately,
rather than up to a second late

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 0287a80..a8caa35 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -22,7 +22,7 @@ use JSON;
 use MIME::Base64;
 use POSIX;
 use Storable qw(dclone);
-use Time::HiRes qw(gettimeofday tv_interval);
+use Time::HiRes qw(gettimeofday tv_interval usleep);
 use URI::Escape;
 use UUID;
 
@@ -5622,14 +5622,16 @@ sub vm_reboot {
 }
 
 sub savevm_monitor {
-    my ($vmid) = @_;
+    my ($vmid, $usleep) = @_;
+
+    $usleep //= 1000 * 1000;
 
     for(;;) {
 	my $state = mon_cmd($vmid, "query-savevm");
 	if (!$state->{status}) {
 	    die "savevm not active\n";
 	} elsif ($state->{status} eq 'active') {
-	    sleep(1);
+	    usleep($usleep);
 	    next;
 	} elsif ($state->{status} eq 'completed') {
 	    print "State saved, quitting\n";
@@ -7769,7 +7771,7 @@ sub upgrade_qemu {
 	    $print_elapsed->("setting migration caps");
 	    mon_cmd($vmid, "savevm-start", statefile => $statefile);
 	    $print_elapsed->("issuing savevm-start");
-	    savevm_monitor($vmid);
+	    savevm_monitor($vmid, 10 * 1000);
 	    $print_elapsed->("saving state");
 	};
 	if (my $err = $@) {
-- 
2.20.1






* Re: [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version
  2021-04-08 10:33 [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Fabian Ebner
                   ` (5 preceding siblings ...)
  2021-04-08 10:33 ` [pve-devel] [POC qemu-server 6/6] add usleep parameter to savevm_monitor Fabian Ebner
@ 2021-04-08 16:44 ` Thomas Lamprecht
  2021-06-23 17:56   ` Laurent GUERBY
  6 siblings, 1 reply; 9+ messages in thread
From: Thomas Lamprecht @ 2021-04-08 16:44 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Ebner

On 08.04.21 12:33, Fabian Ebner wrote:
> The code is in a very early state; I'm just sending this to discuss the idea.
> I haven't done a whole lot of testing yet, but it does seem to work.
> 
> The idea is rather simple:
> 1. save the state to ramfs
> 2. stop the VM
> 3. start the VM loading the state

For the record, as we (Dietmar, you and I) discussed this a bit off-list:

The issue we see here is that one temporarily requires a potentially big chunk
of free memory, i.e., a second time the amount of memory the guest is assigned.
So tens to hundreds of GiB, which (educated guess) > 90 % of our users just do
not have available, at least for the bigger VMs of theirs.

So, it would be nicer if we could make this more QEMU internal, e.g., just save
the state out (as that one may not be compatible 1:1 for reuse with the new QEMU
version) and re-use the guest memory directly, e.g., start the new QEMU process,
migrate the state and map over the guest memory, then pause the old one, cont
the new one and be done (very condensed).
That may have its own difficulties/edge-cases, but it would not require having
so much extra memory freely available...
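
QEMU does have a building block pointing in that direction already:
guest RAM can live in a shareable, file-backed memory object that a
second process could map as well. Very hand-wavy (the options exist,
everything around them is the open question):

    -object memory-backend-file,id=ram0,size=16G,mem-path=/dev/shm/vm100-ram,share=on \
    -machine memory-backend=ram0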

> 
> This approach solves the problem that our stack is (currently) not designed to
> have multiple instances with the same VM ID running. To do so, we'd need to
> handle config locking, sockets, pid file, passthrough resources?, etc.
> 
> Another nice feature of this approach is that it doesn't require touching the
> vm_start or migration code at all, avoiding further bloating.
> 
> 
> Thanks to Fabian G. and Stefan for inspiring this idea:
> 
> Fabian G. suggested using the suspend to disk + start route if the required
> changes to our stack would turn out to be infeasible.
> 
> Stefan suggested migrating to a dummy VM (outside our stack) which just holds
> the state and migrating back right away. It seems that dummy VM is in fact not
> even needed ;) If we really really care about the smallest possible downtime, this
> approach might still be the best, and we'd need to start the dummy VM while the
> backwards migration runs (resulting in two times the migration downtime). But
> it does have more moving parts and requires some migration/startup changes.
> 
> 
> Fabian Ebner (6):
>   create vmstate_size helper
>   create savevm_monitor helper
>   draft of upgrade_qemu function
>   draft of qemuupgrade API call
>   add timing for testing
>   add usleep parameter to savevm_monitor
> 
>  PVE/API2/Qemu.pm  |  60 ++++++++++++++++++++++
>  PVE/QemuConfig.pm |  10 +---
>  PVE/QemuServer.pm | 125 +++++++++++++++++++++++++++++++++++++++-------
>  3 files changed, 170 insertions(+), 25 deletions(-)
> 






* Re: [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version
  2021-04-08 16:44 ` [pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version Thomas Lamprecht
@ 2021-06-23 17:56   ` Laurent GUERBY
  0 siblings, 0 replies; 9+ messages in thread
From: Laurent GUERBY @ 2021-06-23 17:56 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Ebner

On Thu, 2021-04-08 at 18:44 +0200, Thomas Lamprecht wrote:
> On 08.04.21 12:33, Fabian Ebner wrote:
> > The code is in a very early state; I'm just sending this to discuss
> > the idea. I haven't done a whole lot of testing yet, but it does
> > seem to work.
> > 
> > The idea is rather simple:
> > 1. save the state to ramfs
> > 2. stop the VM
> > 3. start the VM loading the state
> 
> For the record, as we (Dietmar, you and I) discussed this a bit
> off-list:
> 
> The issue we see here is that one temporarily requires a potentially big
> chunk of free memory, i.e., a second time the amount of memory the guest
> is assigned. So tens to hundreds of GiB, which (educated guess) > 90 % of
> our users just do not have available, at least for the bigger VMs of
> theirs.
> 
> So, it would be nicer if we could make this more QEMU internal, e.g.,
> just save the state out (as that one may not be compatible 1:1 for reuse
> with the new QEMU version) and re-use the guest memory directly, e.g.,
> start the new QEMU process, migrate the state and map over the guest
> memory, then pause the old one, cont the new one and be done (very
> condensed).
> That may have its own difficulties/edge-cases, but it would not require
> having so much extra memory freely available...

Hi,

I'm wondering how much KSM would help reduce the extra memory
requirement during same-host migration.

Maybe there's a sweet spot: make KSM more aggressive just before
starting the migration, and slow the migration down using the bandwidth
control parameter, so that all new pages created by the migration
process end up shared quickly? And return ksmtuned to its defaults
after it's done.
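
For reference, the knobs I have in mind are the standard KSM sysfs
ones, e.g. (values invented, just to illustrate):

    # scan more pages, more often, while the local migration runs
    echo 20   > /sys/kernel/mm/ksm/sleep_millisecs
    echo 2500 > /sys/kernel/mm/ksm/pages_to_scan
    echo 1    > /sys/kernel/mm/ksm/run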

Or maybe lowering the migration bandwidth alone will be enough, with
KSM settings unchanged (it still has to be faster than the page
mutation rate though, so it can't be set too low).

I assume for most users, even if the migration to the same host is
slow, it's fine, since it will not consume network resources, just a
bit more CPU.

Sincerely,

Laurent

PS: thanks Stefan_R for pointing this thread
https://forum.proxmox.com/threads/upgrade-of-pve-qemu-kvm-and-running-vm.91236/

> > 
> > This approach solves the problem that our stack is (currently) not
> > designed to have multiple instances with the same VM ID running. To
> > do so, we'd need to handle config locking, sockets, pid file,
> > passthrough resources?, etc.
> > 
> > Another nice feature of this approach is that it doesn't require
> > touching the vm_start or migration code at all, avoiding further
> > bloating.
> > 
> > 
> > Thanks to Fabian G. and Stefan for inspiring this idea:
> > 
> > Fabian G. suggested using the suspend to disk + start route if the
> > required changes to our stack would turn out to be infeasible.
> > 
> > Stefan suggested migrating to a dummy VM (outside our stack) which
> > just holds the state and migrating back right away. It seems that
> > dummy VM is in fact not even needed ;) If we really really care about
> > the smallest possible downtime, this approach might still be the
> > best, and we'd need to start the dummy VM while the backwards
> > migration runs (resulting in two times the migration downtime). But
> > it does have more moving parts and requires some migration/startup
> > changes.
> > 
> > 
> > Fabian Ebner (6):
> >   create vmstate_size helper
> >   create savevm_monitor helper
> >   draft of upgrade_qemu function
> >   draft of qemuupgrade API call
> >   add timing for testing
> >   add usleep parameter to savevm_monitor
> > 
> >  PVE/API2/Qemu.pm  |  60 ++++++++++++++++++++++
> >  PVE/QemuConfig.pm |  10 +---
> >  PVE/QemuServer.pm | 125 +++++++++++++++++++++++++++++++++++++++-------
> >  3 files changed, 170 insertions(+), 25 deletions(-)
> > 



