From: Christoph Heiss <c.heiss@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server v2 7/13] fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating conntrack
Date: Thu, 24 Apr 2025 13:19:27 +0200
Message-ID: <20250424111941.730528-8-c.heiss@proxmox.com>
In-Reply-To: <20250424111941.730528-1-c.heiss@proxmox.com>

First part of fixing #5180 [0].

Adds a simple D-Bus server which implements the `org.qemu.VMState1`
interface as specified in the QEMU documentation [1].

Using the built-in QEMU VMState machinery saves us from having to worry
about transfer and convergence of the data and lets QEMU take care of
it.

Any object on the D-Bus path `/org/qemu/VMState1` implementing that
interface will be called by QEMU during live migration, but only if its
`Id` property is listed in the `id-list` of the `dbus-vmstate` QEMU
object registered for that specific VM.
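For illustration, registering the helper roughly corresponds to a QMP `object-add` command for a `dbus-vmstate` object whose `id-list` contains the helper's `Id`. The sketch below only builds that message with the core JSON::PP module; it assumes `qemu_objectadd()` emits a flat `object-add` as accepted by current QEMU versions, and `build_dbus_vmstate_cmd` is a hypothetical helper, not part of this patch.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper (not part of the patch): builds the QMP command that
# registers a dbus-vmstate object for the given VM. Property names follow
# the QEMU dbus-vmstate documentation (`addr`, `id-list`).
sub build_dbus_vmstate_cmd {
    my ($vmid) = @_;
    return {
        execute   => 'object-add',
        arguments => {
            'qom-type' => 'dbus-vmstate',
            id         => 'pve-vmstate',
            # Same system bus socket the daemon is connected to
            addr       => 'unix:path=/run/dbus/system_bus_socket',
            # Ids of the helpers QEMU should involve in the migration
            'id-list'  => "pve-vmstate-$vmid",
        },
    };
}

# canonical() sorts the keys so the printed JSON is stable
my $json = JSON::PP->new->canonical(1)->encode(build_dbus_vmstate_cmd(100));
print "$json\n";
```

The `id-list` value matches what the daemon sets as its own `Id` property (`pve-vmstate-<vmid>`), which is what ties the D-Bus object to this particular VM.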

The actual state saving/restoring is done via the conntrack(8) tool, a
small tool which already implements the hard parts of interacting with
the conntrack subsystem via netlink.

Filtering is done on CONNMARK, which the firewall sets to the VMID for
all packets of the respective guest.
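The two conntrack(8) invocations used by the daemon, plus the parsing of the entry count it reports on stderr, are sketched below. The helper names are hypothetical, and the summary line is an assumed sample of the format the daemon's `errfunc` matches, not captured output.

```perl
use strict;
use warnings;

# Hypothetical helper: dump only the entries carrying the VM's CONNMARK,
# in the `save` format that `--load-file` understands
sub conntrack_dump_cmd {
    my ($vmid) = @_;
    return ['conntrack', '--dump', '--mark', $vmid, '--output', 'save'];
}

# Hypothetical helper: restore entries read from stdin
sub conntrack_load_cmd {
    return ['conntrack', '--load-file', '-'];
}

# Extract the number of entries from conntrack's stderr summary line
sub parse_entry_count {
    my ($stderr_line) = @_;
    return $stderr_line =~ /(\d+) flow entries/ ? $1 : undef;
}

my $dump = conntrack_dump_cmd(100);
my $load = conntrack_load_cmd();
print join(' ', @$dump), "\n";
print join(' ', @$load), "\n";

# Assumed sample of the summary line
my $count = parse_entry_count('conntrack v1.4.7 (conntrack-tools): 42 flow entries have been shown.');
print "entries: $count\n";
```

Note the `+` in the pattern: with a bare `(\d)`, a summary of "42 flow entries" would yield only `2`.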

Additionally, the object implements a custom `com.proxmox.VMStateHelper`
interface, which adds a `NumMigratedEntries` property and a small `Quit`
method for cleanly shutting down the daemon via the D-Bus API.

For all of this to work, D-Bus needs a policy describing who is allowed
to access the interface [2].

Currently, QEMU enforces a hard limit of 1 MiB of state. Typical
conntrack state entries as dumped by conntrack(8) in the `save` output
format are plain-text ASCII lines of mostly around 150-200 characters
each. That translates to roughly 5200 entries (at 200 bytes each) that
can be migrated.

Such a typical line looks like:

  -A -t 431974 -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 \
  -r 10.1.1.20 -q 10.1.0.1 -p tcp --sport 48550 --dport 22 \
  --reply-port-src 22 --reply-port-dst 48550 --state ESTABLISHED
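As a rough worked example of the limit above: the sample entry is 179 bytes once joined into a single line, and `Save` additionally appends ` --mark <vmid>` plus a newline to every entry. The sketch below assumes VMID 100 and computes how many entries of that size fit below 1 MiB.

```perl
use strict;
use warnings;

# Mirrors DBUS_VMSTATE_SIZE_LIMIT from the daemon (1 MiB)
use constant DBUS_VMSTATE_SIZE_LIMIT => 1024 * 1024;

# The sample entry from above, joined into a single line
my $line = '-A -t 431974 -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 '
    . '-r 10.1.1.20 -q 10.1.0.1 -p tcp --sport 48550 --dport 22 '
    . '--reply-port-src 22 --reply-port-dst 48550 --state ESTABLISHED';

# `Save` re-adds the mark to each entry; VMID 100 is an assumption
my $vmid = 100;
my $entry = "$line --mark $vmid\n";

my $max_entries = int(DBUS_VMSTATE_SIZE_LIMIT / length($entry));
printf "line: %d bytes, entry: %d bytes, fits: %d entries\n",
    length($line), length($entry), $max_entries;
```

For this particular sample that comes out at about 5490 entries; longer lines (e.g. with IPv6 addresses) lower the number accordingly.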

In the future, these entries could be compressed before being sent to
QEMU, which should increase the above number considerably, since they
compress well.
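A minimal sketch of that idea, assuming the core Compress::Zlib module (this patch does not pick any compression scheme): it builds 1000 synthetic entries modeled on the sample line, compresses them and converts the result to and from the byte list handed over D-Bus. Since compressed data is binary, it uses the unsigned `C*` template for pack/unpack, matching the unsigned D-Bus `byte` type.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

# Synthetic dump: entries differ in timeout and source port, as real ones do
my $text = '';
for my $i (0 .. 999) {
    my $sport = 40000 + $i;
    my $timeout = 431974 - $i;
    $text .= "-A -t $timeout -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 "
        . "-r 10.1.1.20 -q 10.1.0.1 -p tcp --sport $sport --dport 22 "
        . "--reply-port-src 22 --reply-port-dst $sport --state ESTABLISHED --mark 100\n";
}

# Sender side: compress, then turn into a list of unsigned bytes for D-Bus
my $compressed = compress($text);
die "compression failed" unless defined $compressed;
my @bytes = unpack('C*', $compressed);

# Receiver side: rebuild the binary string and decompress
my $restored = uncompress(pack('C*', @bytes));
die "decompression failed" unless defined $restored;

printf "%d bytes of state compressed to %d bytes\n", length($text), scalar(@bytes);
```

The receiving `Load` would then need to decompress before passing the text to conntrack(8), and the 1 MiB limit would apply to the compressed size.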

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180
[1] https://www.qemu.org/docs/master/interop/dbus-vmstate.html
[2] https://dbus.freedesktop.org/doc/dbus-daemon.1.html#configuration_file

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Changes v1 -> v2:
  * convert dbus-vmstate to instanced systemd service
  * fix plural for zero entries in migration log

 Makefile                               |   7 +-
 dbus-vmstate/Makefile                  |   7 ++
 dbus-vmstate/dbus-vmstate              | 168 +++++++++++++++++++++++++
 dbus-vmstate/org.qemu.VMState1.conf    |  11 ++
 dbus-vmstate/pve-dbus-vmstate@.service |  10 ++
 debian/control                         |   7 +-
 6 files changed, 208 insertions(+), 2 deletions(-)
 create mode 100644 dbus-vmstate/Makefile
 create mode 100755 dbus-vmstate/dbus-vmstate
 create mode 100644 dbus-vmstate/org.qemu.VMState1.conf
 create mode 100644 dbus-vmstate/pve-dbus-vmstate@.service

diff --git a/Makefile b/Makefile
index ed67fe0a..2591c2d0 100644
--- a/Makefile
+++ b/Makefile
@@ -3,7 +3,7 @@ include /usr/share/dpkg/default.mk
 PACKAGE=qemu-server
 BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION_UPSTREAM)
 
-DESTDIR=
+export DESTDIR=
 PREFIX=/usr
 SBINDIR=$(PREFIX)/sbin
 LIBDIR=$(PREFIX)/lib/$(PACKAGE)
@@ -16,6 +16,10 @@ ZSHCOMPLDIR=$(PREFIX)/share/zsh/vendor-completions/
 export PERLDIR=$(PREFIX)/share/perl5
 PERLINCDIR=$(PERLDIR)/asm-x86_64
 
+export LIBSYSTEMDDIR=$(PREFIX)/lib/systemd
+export LIBEXECDIR=$(PREFIX)/libexec/$(PACKAGE)
+export DBUSDIR=$(PREFIX)/share/dbus-1
+
 GITVERSION:=$(shell git rev-parse HEAD)
 
 DEB=$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION)_$(DEB_BUILD_ARCH).deb
@@ -68,6 +72,7 @@ install: $(PKGSOURCES)
 	$(MAKE) -C query-machine-capabilities install
 	$(MAKE) -C qemu-configs install
 	$(MAKE) -C vm-network-scripts install
+	$(MAKE) -C dbus-vmstate install
 	install -m 0755 qm $(DESTDIR)$(SBINDIR)
 	install -m 0755 qmrestore $(DESTDIR)$(SBINDIR)
 	install -D -m 0644 modules-load.conf $(DESTDIR)/etc/modules-load.d/qemu-server.conf
diff --git a/dbus-vmstate/Makefile b/dbus-vmstate/Makefile
new file mode 100644
index 00000000..177bbbc1
--- /dev/null
+++ b/dbus-vmstate/Makefile
@@ -0,0 +1,7 @@
+all:
+
+.PHONY: install
+install:
+	install -D -m 0755 dbus-vmstate $(DESTDIR)/$(LIBEXECDIR)/dbus-vmstate
+	install -D -m 0644 pve-dbus-vmstate@.service $(DESTDIR)/$(LIBSYSTEMDDIR)/system/pve-dbus-vmstate@.service
+	install -D -m 0644 org.qemu.VMState1.conf $(DESTDIR)/$(DBUSDIR)/system.d/org.qemu.VMState1.conf
diff --git a/dbus-vmstate/dbus-vmstate b/dbus-vmstate/dbus-vmstate
new file mode 100755
index 00000000..04a1b53d
--- /dev/null
+++ b/dbus-vmstate/dbus-vmstate
@@ -0,0 +1,168 @@
+#!/usr/bin/perl
+
+# Exports a D-Bus object implementing
+# https://www.qemu.org/docs/master/interop/dbus-vmstate.html
+
+package PVE::QemuServer::DBusVMState;
+
+use warnings;
+use strict;
+
+use Carp;
+use Net::DBus;
+use Net::DBus::Exporter qw(org.qemu.VMState1);
+use Net::DBus::Reactor;
+use PVE::QemuServer::Helpers;
+use PVE::QemuServer::QMPHelpers qw(qemu_objectadd qemu_objectdel);
+use PVE::SafeSyslog;
+use PVE::Tools;
+
+use base qw(Net::DBus::Object);
+
+use Class::MethodMaker [ scalar => [ qw(Id NumMigratedEntries) ]];
+dbus_property('Id', 'string', 'read');
+dbus_property('NumMigratedEntries', 'uint32', 'read', 'com.proxmox.VMStateHelper');
+
+sub new {
+    my ($class, $service, $vmid) = @_;
+
+    my $self = $class->SUPER::new($service, '/org/qemu/VMState1');
+    $self->{vmid} = $vmid;
+    $self->Id("pve-vmstate-$vmid");
+    $self->NumMigratedEntries(0);
+
+    bless $self, $class;
+    return $self;
+}
+
+sub Load {
+    my ($self, $bytes) = @_;
+
+    my $len = scalar(@$bytes);
+    return if $len <= 1; # see also the `Save` method
+
+    my $text = pack('c*', @$bytes);
+
+    eval {
+	PVE::Tools::run_command(
+	    ['conntrack', '--load-file', '-'],
+	    input => $text,
+	);
+    };
+    if (my $err = $@) {
+	syslog('warn', "failed to restore conntrack state: $err\n");
+    } else {
+	syslog('info', "restored $len bytes of conntrack state\n");
+    }
+}
+dbus_method('Load', [['array', 'byte']], []);
+
+use constant {
+    # From the documentation:
+    #   https://www.qemu.org/docs/master/interop/dbus-vmstate.html
+    # > For now, the data amount to be transferred is arbitrarily limited to 1Mb.
+    #
+    # See also qemu/backends/dbus-vmstate.c:DBUS_VMSTATE_SIZE_LIMIT
+    DBUS_VMSTATE_SIZE_LIMIT => 1024 * 1024,
+};
+
+sub Save {
+    my ($self) = @_;
+
+    my $text = '';
+    my $truncated = 0;
+    my $num_entries = 0;
+    eval {
+	PVE::Tools::run_command(
+	    ['conntrack', '--dump', '--mark', $self->{vmid}, '--output', 'save'],
+	    outfunc => sub {
+		my ($line) = @_;
+		return if $truncated;
+
+		# conntrack(8) apparently does not preserve the `--mark` option, so add it back
+		my $entry = "$line --mark $self->{vmid}\n";
+		if ((length($text) + length($entry)) > DBUS_VMSTATE_SIZE_LIMIT) {
+		    syslog('warn', 'conntrack state too large, ignoring further entries');
+		    $truncated = 1;
+		    return;
+		}
+
+		$text .= $entry;
+	    },
+	    errfunc => sub {
+		my ($line) = @_;
+
+		if ($line =~ /(\d+) flow entries/) {
+		    syslog('info', "received $1 conntrack entries");
+		    # conntrack reports the number of displayed entries on stderr,
+		    # which shouldn't be considered an error.
+		    $self->NumMigratedEntries($1);
+		    return;
+		}
+		syslog('err', $line);
+	    }
+	);
+    };
+    if (my $err = $@) {
+	syslog('warn', "failed to save conntrack state: $err\n");
+
+	# Apparently Net::DBus does not handle zero-sized (byte) arrays
+	# correctly - returning [] yields QEMU failing with
+	#
+	#   "kvm: dbus_save_state_proxy: Failed to Save: not a byte array"
+	#
+	# Thus, just return an array with a single element and detect that
+	# appropriately in the `Load`. A valid conntrack state can *never* be
+	# just a single byte, so it is safe to rely on that.
+	return [0];
+    }
+
+    my @bytes = unpack('c*', $text);
+    my $len = scalar(@bytes);
+
+    syslog('info', "transferring $len bytes of conntrack state\n");
+
+    # Same as above w.r.t. returning as single-element array.
+    return $len == 0 ? [0] : \@bytes;
+}
+dbus_method('Save', [], [['array', 'byte']]);
+
+# Additional method for cleanly shutting down the service.
+sub Quit {
+    my ($self) = @_;
+
+    syslog('info', "shutting down gracefully ..\n");
+
+    # On the source side, the VM won't exist anymore, so no need to remove
+    # anything.
+    if (PVE::QemuServer::Helpers::vm_running_locally($self->{vmid})) {
+	eval { qemu_objectdel($self->{vmid}, 'pve-vmstate') };
+	if (my $err = $@) {
+	    syslog('warn', "failed to remove object: $err\n");
+	}
+    }
+
+    Net::DBus::Reactor->main()->shutdown();
+}
+dbus_method('Quit', [], [], 'com.proxmox.VMStateHelper', { no_return => 1 });
+
+my $vmid = shift;
+
+my $dbus = Net::DBus->system();
+my $service = $dbus->export_service('org.qemu.VMState1');
+my $obj = PVE::QemuServer::DBusVMState->new($service, $vmid);
+
+$SIG{TERM} = sub {
+    $obj->Quit();
+};
+
+my $addr = $dbus->get_unique_name();
+syslog('info', "pve-vmstate-$vmid listening on $addr\n");
+
+# Inform QEMU about our running dbus-vmstate helper
+qemu_objectadd($vmid, 'pve-vmstate', 'dbus-vmstate',
+    addr => 'unix:path=/run/dbus/system_bus_socket',
+    'id-list' => "pve-vmstate-$vmid",
+);
+
+Net::DBus::Reactor->main()->run();
diff --git a/dbus-vmstate/org.qemu.VMState1.conf b/dbus-vmstate/org.qemu.VMState1.conf
new file mode 100644
index 00000000..cfedcae4
--- /dev/null
+++ b/dbus-vmstate/org.qemu.VMState1.conf
@@ -0,0 +1,11 @@
+<?xml version="1.0"?>
+<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
+        "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
+<busconfig>
+  <policy user="root">
+    <allow own="org.qemu.VMState1" />
+    <allow send_destination="org.qemu.VMState1" />
+    <allow receive_sender="org.qemu.VMState1" />
+    <allow send_destination="com.proxmox.VMStateHelper" />
+  </policy>
+</busconfig>
diff --git a/dbus-vmstate/pve-dbus-vmstate@.service b/dbus-vmstate/pve-dbus-vmstate@.service
new file mode 100644
index 00000000..56b4e285
--- /dev/null
+++ b/dbus-vmstate/pve-dbus-vmstate@.service
@@ -0,0 +1,10 @@
+[Unit]
+Description=PVE DBus VMState Helper (VM %i)
+Requires=dbus.socket
+After=dbus.socket
+PartOf=%i.scope
+
+[Service]
+Slice=qemu.slice
+Type=simple
+ExecStart=/usr/libexec/qemu-server/dbus-vmstate %i
diff --git a/debian/control b/debian/control
index d6c20040..ee1ca177 100644
--- a/debian/control
+++ b/debian/control
@@ -3,9 +3,11 @@ Section: admin
 Priority: optional
 Maintainer: Proxmox Support Team <support@proxmox.com>
 Build-Depends: debhelper-compat (= 13),
+               libclass-methodmaker-perl,
                libglib2.0-dev,
                libio-multiplex-perl,
                libjson-c-dev,
+               libnet-dbus-perl,
                libpve-apiclient-perl,
                libpve-cluster-perl,
                libpve-common-perl (>= 8.0.2),
@@ -28,11 +30,14 @@ Homepage: https://www.proxmox.com
 
 Package: qemu-server
 Architecture: any
-Depends: dbus,
+Depends: conntrack,
+         dbus,
          genisoimage,
+         libclass-methodmaker-perl,
          libio-multiplex-perl,
          libjson-perl,
          libjson-xs-perl,
+         libnet-dbus-perl,
          libnet-ssleay-perl,
          libpve-access-control (>= 8.0.0~),
          libpve-apiclient-perl,
-- 
2.49.0



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

