From: Christoph Heiss <c.heiss@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server 08/14] fix #5180: migrate: integrate helper for live-migrating conntrack info
Date: Mon, 17 Mar 2025 15:11:45 +0100
Message-ID: <20250317141152.1247324-9-c.heiss@proxmox.com>
In-Reply-To: <20250317141152.1247324-1-c.heiss@proxmox.com>

Fixes #5180 [0].

This implements the following for live migration:
a) the dbus-vmstate helper is started on the target side, together with the VM
b) the dbus-vmstate helper is started on the source side
c) everything is cleaned up properly in all cases

It is currently off by default and must be enabled via the optional
`with-conntrack-state` migration parameter.
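
When enabled, a live migration could be invoked like this (sketch; `qm
migrate` passes the parameter through to the API endpoint extended
below, VM id and node name are placeholders):

    # live-migrate VM 100 to node 'pve2', including conntrack entries
    qm migrate 100 pve2 --online --with-conntrack-state 1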

The conntrack entry migration is done in such a way that it can
soft-fail without impacting the actual migration, i.e. it is treated as
"best-effort".

A failed conntrack entry migration has no real impact on functionality,
other than that the problems outlined in the issue [0] may surface.

For remote migrations, only a warning is emitted for now. Cross-cluster
migration has stricter requirements and is not a one-size-fits-all case.
The most prominent issue arises when the network segmentation differs
between clusters, which would render the conntrack entries useless or
require careful rewriting.

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Depends on patch #4 to pve-common and a corresponding dependency bump.
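
Whether a node supports this can be checked beforehand via the migrate
preconditions endpoint extended in this patch (sketch; placeholders are
hypothetical):

    # reports "has-dbus-vmstate": 1 on nodes carrying this series
    pvesh get /nodes/<node>/qemu/<vmid>/migrate --output-format json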

 PVE/API2/Qemu.pm              |  72 ++++++++++++++++++++
 PVE/CLI/qm.pm                 |   5 ++
 PVE/QemuMigrate.pm            |  64 ++++++++++++++++++
 PVE/QemuServer.pm             |   6 ++
 PVE/QemuServer/DBusVMState.pm | 124 ++++++++++++++++++++++++++++++++++
 PVE/QemuServer/Makefile       |   1 +
 6 files changed, 272 insertions(+)
 create mode 100644 PVE/QemuServer/DBusVMState.pm

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 156b1c7b..4d7b8196 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -39,6 +39,7 @@ use PVE::QemuServer::MetaInfo;
 use PVE::QemuServer::PCI;
 use PVE::QemuServer::QMPHelpers;
 use PVE::QemuServer::USB;
+use PVE::QemuServer::DBusVMState;
 use PVE::QemuMigrate;
 use PVE::RPCEnvironment;
 use PVE::AccessControl;
@@ -3035,6 +3036,12 @@ __PACKAGE__->register_method({
 		default => 'max(30, vm memory in GiB)',
 		optional => 1,
 	    },
+	    'with-conntrack-state' => {
+		type => 'boolean',
+		optional => 1,
+		default => 0,
+		description => 'Whether to migrate conntrack entries for running VMs.',
+	    }
 	},
     },
     returns => {
@@ -3065,6 +3072,7 @@ __PACKAGE__->register_method({
 	my $migration_network = $get_root_param->('migration_network');
 	my $targetstorage = $get_root_param->('targetstorage');
 	my $force_cpu = $get_root_param->('force-cpu');
+	my $with_conntrack_state = $get_root_param->('with-conntrack-state');
 
 	my $storagemap;
 
@@ -3136,6 +3144,7 @@ __PACKAGE__->register_method({
 		    nbd_proto_version => $nbd_protocol_version,
 		    replicated_volumes => $replicated_volumes,
 		    offline_volumes => $offline_volumes,
+		    with_conntrack_state => $with_conntrack_state,
 		};
 
 		my $params = {
@@ -4675,6 +4684,11 @@ __PACKAGE__->register_method({
 		},
 		description => "List of mapped resources e.g. pci, usb"
 	    },
+	    'has-dbus-vmstate' => {
+		type => 'boolean',
+		description => 'Whether the VM host supports migrating additional VM state, '
+		    . 'such as conntrack entries.',
+	    }
 	},
     },
     code => sub {
@@ -4739,6 +4753,7 @@ __PACKAGE__->register_method({
 
 	$res->{local_resources} = $local_resources;
 	$res->{'mapped-resources'} = $mapped_resources;
+	$res->{'has-dbus-vmstate'} = 1;
 
 	return $res;
 
@@ -4800,6 +4815,12 @@ __PACKAGE__->register_method({
 		minimum => '0',
 		default => 'migrate limit from datacenter or storage config',
 	    },
+	    'with-conntrack-state' => {
+		type => 'boolean',
+		optional => 1,
+		default => 0,
+		description => 'Whether to migrate conntrack entries for running VMs.',
+	    }
 	},
     },
     returns => {
@@ -4855,6 +4876,7 @@ __PACKAGE__->register_method({
 	} else {
 	    warn "VM isn't running. Doing offline migration instead.\n" if $param->{online};
 	    $param->{online} = 0;
+	    $param->{'with-conntrack-state'} = 0;
 	}
 
 	my $storecfg = PVE::Storage::config();
@@ -6126,6 +6148,7 @@ __PACKAGE__->register_method({
 			    warn $@ if $@;
 			}
 
+			PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($state->{vmid});
 			PVE::QemuServer::destroy_vm($state->{storecfg}, $state->{vmid}, 1);
 		    }
 
@@ -6299,4 +6322,53 @@ __PACKAGE__->register_method({
 	return { socket => $socket };
     }});
 
+__PACKAGE__->register_method({
+    name => 'dbus_vmstate',
+    path => '{vmid}/dbus-vmstate',
+    method => 'POST',
+    proxyto => 'node',
+    description => 'Stop the dbus-vmstate helper for the given VM if running.',
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
+	    action => {
+		type => 'string',
+		enum => [qw(start stop)],
+		description => 'Action to perform on the DBus VMState helper.',
+		optional => 0,
+	    },
+	},
+    },
+    returns => {
+	type => 'null',
+    },
+    code => sub {
+	my ($param) = @_;
+	my ($node, $vmid, $action) = $param->@{qw(node vmid action)};
+
+	my $nodename = PVE::INotify::nodename();
+	if ($node ne 'localhost' && $node ne $nodename) {
+	    raise_param_exc({ node => "node needs to be 'localhost' or local hostname '$nodename'" });
+	}
+
+	if (!PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+	    raise_param_exc({ node => "VM $vmid not running locally on node '$nodename'" });
+	}
+
+	if ($action eq 'start') {
+	    syslog('info', "starting dbus-vmstate helper for VM $vmid\n");
+	    PVE::QemuServer::DBusVMState::qemu_add_dbus_vmstate($vmid);
+	} elsif ($action eq 'stop') {
+	    syslog('info', "stopping dbus-vmstate helper for VM $vmid\n");
+	    PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+	} else {
+	    die "unknown action $action\n";
+	}
+    }});
+
 1;
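
The new endpoint can also be exercised manually, e.g. to stop a leftover
helper; this mirrors the call phase3_cleanup makes below (sketch, node
and VM id are placeholders):

    pvesh create /nodes/<target>/qemu/<vmid>/dbus-vmstate --action stop
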
diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index 3e3a4c91..32c7629c 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -36,6 +36,7 @@ use PVE::QemuServer::Agent qw(agent_available);
 use PVE::QemuServer::ImportDisk;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::QMPHelpers;
+use PVE::QemuServer::DBusVMState;
 use PVE::QemuServer;
 
 use PVE::CLIHandler;
@@ -965,6 +966,10 @@ __PACKAGE__->register_method({
 		# vm was shutdown from inside the guest or crashed, doing api cleanup
 		PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
 	    }
+
+	    # ensure that no dbus-vmstate helper is left running in any case
+	    PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+
 	    PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
 
 	    $restart = eval { PVE::QemuServer::clear_reboot_request($vmid) };
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index c2e36334..7704a38e 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -32,6 +32,7 @@ use PVE::QemuServer::Machine;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::QMPHelpers;
+use PVE::QemuServer::DBusVMState;
 use PVE::QemuServer;
 
 use PVE::AbstractMigrate;
@@ -224,6 +225,21 @@ sub prepare {
 	# Do not treat a suspended VM as paused, as it might wake up
 	# during migration and remain paused after migration finishes.
 	$self->{vm_was_paused} = 1 if PVE::QemuServer::vm_is_paused($vmid, 0);
+
+	if ($self->{opts}->{'with-conntrack-state'}) {
+	    if ($self->{opts}->{remote}) {
+		# shouldn't be reached under normal circumstances anyway, as we
+		# prevent it at the API level
+		$self->log('warn', 'conntrack state migration not supported for remote migrations, '
+		    . 'active connections might get dropped');
+		$self->{opts}->{'with-conntrack-state'} = 0;
+	    } else {
+		PVE::QemuServer::DBusVMState::qemu_add_dbus_vmstate($vmid);
+	    }
+	} else {
+	    $self->log('warn', 'conntrack state migration not supported or enabled, '
+		. 'active connections might get dropped');
+	}
     }
 
     my ($loc_res, $mapped_res, $missing_mappings_by_node) = PVE::QemuServer::check_local_resources($conf, $running, 1);
@@ -859,6 +875,14 @@ sub phase1_cleanup {
     if (my $err =$@) {
 	$self->log('err', $err);
     }
+
+    if ($self->{running} && $self->{opts}->{'with-conntrack-state'}) {
+	# if the VM is running, that means we also tried to migrate additional
+	# state via our dbus-vmstate helper; it only needs to be stopped locally,
+	# as the VM cleanup on the target will handle the rest
+	PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+    }
 }
 
 sub phase2_start_local_cluster {
@@ -905,6 +929,10 @@ sub phase2_start_local_cluster {
 	push @$cmd, '--targetstorage', ($self->{opts}->{targetstorage} // '1');
     }
 
+    if ($self->{opts}->{'with-conntrack-state'}) {
+	push @$cmd, '--with-conntrack-state';
+    }
+
     my $spice_port;
     my $input = "nbd_protocol_version: $migrate->{nbd_proto_version}\n";
 
@@ -1434,6 +1462,13 @@ sub phase2_cleanup {
 	$self->log('err', $err);
     }
 
+    if ($self->{running} && $self->{opts}->{'with-conntrack-state'}) {
+	# if the VM is running, that means we also tried to migrate additional
+	# state via our dbus-vmstate helper; it only needs to be stopped locally,
+	# as the VM cleanup on the target will handle the rest
+	PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+    }
 
     if ($self->{tunnel}) {
 	eval { PVE::Tunnel::finish_tunnel($self->{tunnel});  };
@@ -1556,6 +1591,35 @@ sub phase3_cleanup {
 		$self->log('info', "skipping guest fstrim, because VM is paused");
 	    }
 	}
+
+	if ($self->{running} && $self->{opts}->{'with-conntrack-state'}) {
+	    # if the VM is running, that means we also migrated additional
+	    # state via our dbus-vmstate helper
+	    $self->log('info', 'stopping migration dbus-vmstate helpers');
+
+	    # first locally
+	    my $num = PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+	    if (defined($num)) {
+		my $plural = $num == 1 ? "entry" : "entries";
+		$self->log('info', "migrated $num conntrack state $plural");
+	    }
+
+	    # ... and then remotely
+	    my $targetnode = $self->{node};
+	    eval {
+		# FIXME: introduce proper way to call API methods on another node?
+		# See also e.g. pve-network/src/PVE/API2/Network/SDN.pm, which
+		# does something similar.
+		PVE::Tools::run_command([
+		    'pvesh', 'create',
+		    "/nodes/$targetnode/qemu/$vmid/dbus-vmstate",
+		    '--action', 'stop',
+		]);
+	    };
+	    if (my $err = $@) {
+		$self->log('warn', "failed to stop dbus-vmstate on $targetnode: $err\n");
+	    }
+	}
     }
 
     # close tunnel on successful migration, on error phase2_cleanup closed it
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index ffd5d56b..211e02ad 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -62,6 +62,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::PCI qw(print_pci_addr print_pcie_addr print_pcie_root_port parse_hostpci);
 use PVE::QemuServer::QMPHelpers qw(qemu_deviceadd qemu_devicedel qemu_objectadd qemu_objectdel);
 use PVE::QemuServer::USB;
+use PVE::QemuServer::DBusVMState;
 
 my $have_sdn;
 eval {
@@ -5559,6 +5560,7 @@ sub vm_start {
 #   replicated_volumes => which volids should be re-used with bitmaps for nbd migration
 #   offline_volumes => new volids of offline migrated disks like tpmstate and cloudinit, not yet
 #       contained in config
+#   with_conntrack_state => whether to start the dbus-vmstate helper for conntrack state migration
 sub vm_start_nolock {
     my ($storecfg, $vmid, $conf, $params, $migrate_opts) = @_;
 
@@ -5956,6 +5958,10 @@ sub vm_start_nolock {
 	    }
 	}
 
+	# conntrack migration is only supported for intra-cluster migrations
+	if ($migrate_opts->{with_conntrack_state} && !$migrate_opts->{remote_node}) {
+	    PVE::QemuServer::DBusVMState::qemu_add_dbus_vmstate($vmid);
+	}
     } else {
 	mon_cmd($vmid, "balloon", value => $conf->{balloon}*1024*1024)
 	    if !$statefile && $conf->{balloon};
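
For reference, the flag reaches the target-side start path via the
migrate_opts hash consumed by vm_start_nolock(); a minimal sketch of the
relevant key (all other keys omitted, $sourcenode is hypothetical):

    my $migrate_opts = {
        migratedfrom => $sourcenode,
        # start the dbus-vmstate helper alongside the VM on the target
        with_conntrack_state => 1,
    };
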
diff --git a/PVE/QemuServer/DBusVMState.pm b/PVE/QemuServer/DBusVMState.pm
new file mode 100644
index 00000000..b2e14b7f
--- /dev/null
+++ b/PVE/QemuServer/DBusVMState.pm
@@ -0,0 +1,124 @@
+package PVE::QemuServer::DBusVMState;
+
+use strict;
+use warnings;
+
+use PVE::SafeSyslog;
+use PVE::Systemd;
+use PVE::Tools;
+
+use constant {
+    DBUS_VMSTATE_EXE => '/usr/libexec/qemu-server/dbus-vmstate',
+};
+
+# Retrieves a property from an object on a specific interface.
+# In contrast to accessing the property directly via $obj->Property, this
+# actually respects the owner of the object and thus can be used for
+# interfaces which might have multiple (queued) owners on the D-Bus.
+my sub dbus_get_property {
+    my ($obj, $interface, $name) = @_;
+
+    my $con = $obj->{service}->get_bus()->get_connection();
+
+    my $call = $con->make_method_call_message(
+        $obj->{service}->get_service_name(),
+        $obj->{object_path},
+        'org.freedesktop.DBus.Properties',
+        'Get',
+    );
+
+    $call->set_destination($obj->get_service()->get_owner_name());
+    $call->append_args_list($interface, $name);
+
+    my @reply = $con->send_with_reply_and_block($call, 10 * 1000)->get_args_list();
+    return $reply[0];
+}
+
+# Starts the dbus-vmstate helper D-Bus service daemon and adds the needed
+# object to the appropriate QEMU instance for the specified VM.
+sub qemu_add_dbus_vmstate {
+    my ($vmid) = @_;
+
+    if (!PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+        die "VM $vmid must be running locally\n";
+    }
+
+    # In case some leftover, previous instance is running, stop it. Otherwise
+    # we run into errors, as a systemd scope is unique.
+    if (defined(qemu_del_dbus_vmstate($vmid, quiet => 1))) {
+        warn "stopped previously running dbus-vmstate helper for VM $vmid\n";
+    }
+
+    # This also ensures that at most one instance can run at a time
+    PVE::Systemd::enter_systemd_scope(
+        "pve-dbus-vmstate-$vmid",
+        "Proxmox VE dbus-vmstate helper (VM $vmid)",
+    );
+
+    PVE::Tools::run_fork_detached(sub {
+        exec {DBUS_VMSTATE_EXE} DBUS_VMSTATE_EXE, $vmid;
+        die "exec failed: $!\n";
+    });
+}
+
+# Stops the dbus-vmstate helper D-Bus service daemon and removes the associated
+# object from QEMU for the specified VM.
+#
+# Returns the number of migrated conntrack entries, or undef in case of error.
+sub qemu_del_dbus_vmstate {
+    my ($vmid, %params) = @_;
+
+    my $num_entries = undef;
+    my $dbus = Net::DBus->system();
+    my $dbus_obj = $dbus->get_bus_object();
+
+    my $owners = eval { $dbus_obj->ListQueuedOwners('org.qemu.VMState1') };
+    if (my $err = $@) {
+        syslog('warn', "failed to retrieve org.qemu.VMState1 owners: $err\n")
+            if !$params{quiet};
+        return undef;
+    }
+
+    # Iterate through all name owners for 'org.qemu.VMState1' and compare
+    # the ID. If we find the one corresponding to $vmid, call our `Quit` method.
+    # Any D-Bus interaction might die/croak, so be careful here and swallow
+    # any hard errors.
+    foreach my $owner (@$owners) {
+        my $service = eval { Net::DBus::RemoteService->new($dbus, $owner, 'org.qemu.VMState1') };
+        if (my $err = $@) {
+            syslog('warn', "failed to get org.qemu.VMState1 service from D-Bus $owner: $err\n")
+                if !$params{quiet};
+            next;
+        }
+
+        my $object = eval { $service->get_object('/org/qemu/VMState1') };
+        if (my $err = $@) {
+            syslog('warn', "failed to get /org/qemu/VMState1 object from D-Bus $owner: $err\n")
+                if !$params{quiet};
+            next;
+        }
+
+        my $id = eval { dbus_get_property($object, 'org.qemu.VMState1', 'Id') };
+        if (defined($id) && $id eq "pve-vmstate-$vmid") {
+            my $helperobj = eval { $service->get_object('/org/qemu/VMState1', 'com.proxmox.VMStateHelper') };
+            if (my $err = $@) {
+                syslog('warn', "found dbus-vmstate helper, but does not implement com.proxmox.VMStateHelper? ($err)\n")
+                    if !$params{quiet};
+                last;
+            }
+
+            $num_entries = eval { dbus_get_property($object, 'com.proxmox.VMStateHelper', 'NumMigratedEntries') };
+            eval { $object->Quit() };
+            if (my $err = $@) {
+                syslog('warn', "failed to call quit on dbus-vmstate for VM $vmid: $err\n")
+                    if !$params{quiet};
+            }
+
+            last;
+        }
+    }
+
+    return $num_entries;
+}
+
+1;
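
A minimal consumer of the new module could look like this (sketch only;
the error handling done by the callers above is omitted):

    use PVE::SafeSyslog;
    use PVE::QemuServer::DBusVMState;

    # start the helper for a locally running VM ...
    PVE::QemuServer::DBusVMState::qemu_add_dbus_vmstate($vmid);

    # ... later, stop it again and log how many entries were migrated
    my $num = PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
    syslog('info', "migrated $num conntrack entries") if defined($num);
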
diff --git a/PVE/QemuServer/Makefile b/PVE/QemuServer/Makefile
index 18fd13ea..8226bd2f 100644
--- a/PVE/QemuServer/Makefile
+++ b/PVE/QemuServer/Makefile
@@ -3,6 +3,7 @@ SOURCES=PCI.pm		\
 	Memory.pm	\
 	ImportDisk.pm	\
 	Cloudinit.pm	\
+	DBusVMState.pm	\
 	Agent.pm	\
 	Helpers.pm	\
 	Monitor.pm	\
-- 
2.48.1





Thread overview: 15+ messages
2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH proxmox-ve-rs 01/14] config: guest: allow access to raw Vmid value Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH proxmox-firewall 02/14] firewall: add connmark rule with VMID to all guest chains Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH pve-firewall 03/14] " Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH common 04/14] tools: add run_fork_detached() for spawning daemons Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 05/14] qmp helpers: allow passing structured args via qemu_objectadd() Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 06/14] api2: qemu: add module exposing node migration capabilities Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 07/14] fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating conntrack Christoph Heiss
2025-03-17 14:11 ` Christoph Heiss [this message]
2025-03-17 14:11 ` [pve-devel] [PATCH manager 09/14] api2: capabilities: explicitly import CPU capabilities module Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH manager 10/14] api2: capabilities: proxy index endpoints to respective nodes Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH manager 11/14] api2: capabilities: expose new qemu/migration endpoint Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [PATCH manager 12/14] ui: window: Migrate: add checkbox for migrating VM conntrack state Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [RFC PATCH firewall 13/14] firewall: helpers: add sub for flushing conntrack entries by mark Christoph Heiss
2025-03-17 14:11 ` [pve-devel] [RFC PATCH qemu-server 14/14] migrate: flush old VM conntrack entries after successful migration Christoph Heiss
