From: Stefan Hanreich
Date: Tue, 3 Jun 2025 11:34:13 +0200
Subject: Re: [pve-devel] [PATCH qemu-server v2 7/13] fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating conntrack
To: Proxmox VE development discussion, Christoph Heiss
In-Reply-To: <20250424111941.730528-8-c.heiss@proxmox.com>

On 4/24/25 13:19, Christoph Heiss wrote:
> First part of fixing #5180 [0].
>
> Adds a simple D-Bus server which implements the `org.qemu.VMState1`
> interface as specified in the QEMU documentation [1].
>
> Using the built-in QEMU VMState machinery saves us from having to worry
> about transfer and convergence of the data and lets QEMU take care of
> it.
>
> Any object on the D-Bus path `/org/qemu/VMState1` implementing that
> interface will be called by QEMU during live-migration, if and only if
> its `Id` property is registered within the `dbus-vmstate` QEMU object
> for a specific VM.
>
> The actual state saving/restoring is done via the conntrack(8) tool, a
> small tool which already implements the hard parts of interacting with
> the conntrack subsystem via netlink.
>
> Filtering is done on CONNMARK, which is set to the specific VMID for all
> packets by the firewall.
>
> Additionally, a custom `com.proxmox.VMStateHelper` interface is
> implemented by the object, adding a small `Quit` method for cleanly
> shutting down the daemon via the D-Bus API.
>
> For all this to work, D-Bus needs a policy describing who is allowed to
> access the interface. [2]
>
> Currently, there is a hard limit of 1 MiB of state enforced by QEMU.
> Typical conntrack state entries as dumped by conntrack(8) in the `save`
> output format are just plaintext ASCII lines, mostly around
> 150-200 characters long. That translates to roughly 5200 entries
> (1 MiB / 200 bytes) that can be migrated.
>
> Such a typical line looks like:
>
>   -A -t 431974 -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 \
>   -r 10.1.1.20 -q 10.1.0.1 -p tcp --sport 48550 --dport 22 \
>   --reply-port-src 22 --reply-port-dst 48550 --state ESTABLISHED
>
> In the future, these entries could be compressed before being sent to
> QEMU, which should increase the above number quite a bit, since they
> are highly compressible.
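
Regarding compression: 1 MiB is 1,048,576 bytes, so at 150-200 bytes per line the uncompressed limit works out to roughly 5,200-7,000 entries. A minimal sketch of how the state could be compressed, assuming zlib via Compress::Zlib (part of core Perl) is acceptable; `encode_state`/`decode_state` are hypothetical helper names and nothing here is wired into Net::DBus. Note that it uses unsigned 'C*' instead of the patch's signed 'c*': compressed data contains bytes >= 0x80, which 'c*' would turn into negative values.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Compress::Zlib is part of the IO-Compress distribution shipped with core Perl.
use Compress::Zlib;

# Same value as the DBUS_VMSTATE_SIZE_LIMIT constant in the patch.
use constant DBUS_VMSTATE_SIZE_LIMIT => 1024 * 1024;

# Hypothetical helper: zlib-compress the conntrack(8) `save` output and turn
# it into a list of *unsigned* byte values ('C*'), as the compressed data is
# binary and may contain bytes >= 0x80.
sub encode_state {
    my ($text) = @_;

    my $compressed = compress($text);
    die "compression failed\n" if !defined($compressed);
    die "compressed state exceeds QEMU limit\n"
        if length($compressed) > DBUS_VMSTATE_SIZE_LIMIT;

    return [ unpack('C*', $compressed) ];
}

# Hypothetical inverse, as it could be used on the receiving side in `Load`.
sub decode_state {
    my ($bytes) = @_;

    my $text = uncompress(pack('C*', @$bytes));
    die "decompression failed\n" if !defined($text);
    return $text;
}

# Synthetic sample modelled on the typical line from the commit message, with
# varying timeouts and ports; ~190 bytes * 10000 entries is well above 1 MiB.
my $text = join('', map {
    sprintf("-A -t %d -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20"
        . " -r 10.1.1.20 -q 10.1.0.1 -p tcp --sport %d --dport 22"
        . " --reply-port-src 22 --reply-port-dst %d --state ESTABLISHED --mark 100\n",
        431974 - $_, 40000 + $_, 40000 + $_)
} 1 .. 10000);

my $bytes = encode_state($text);
printf("%d bytes raw (limit %d), %d bytes compressed\n",
    length($text), DBUS_VMSTATE_SIZE_LIMIT, scalar(@$bytes));
die "round-trip mismatch\n" if decode_state($bytes) ne $text;
```

zlib output is always several bytes long, even for empty input, so the single-byte `[0]` sentinel that `Save` and `Load` use for "no state" would stay unambiguous. The sample data is synthetic; the ratio for real conntrack dumps would need measuring.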
>
> [0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180
> [1] https://www.qemu.org/docs/master/interop/dbus-vmstate.html
> [2] https://dbus.freedesktop.org/doc/dbus-daemon.1.html#configuration_file
>
> Signed-off-by: Christoph Heiss
> ---
> Changes v1 -> v2:
>   * convert dbus-vmstate to instanced systemd service
>   * fix plural for zero entries in migration log
>
>  Makefile                               |   7 +-
>  dbus-vmstate/Makefile                  |   7 ++
>  dbus-vmstate/dbus-vmstate              | 168 +++++++++++++++++++++++++
>  dbus-vmstate/org.qemu.VMState1.conf    |  11 ++
>  dbus-vmstate/pve-dbus-vmstate@.service |  10 ++
>  debian/control                         |   7 +-
>  6 files changed, 208 insertions(+), 2 deletions(-)
>  create mode 100644 dbus-vmstate/Makefile
>  create mode 100755 dbus-vmstate/dbus-vmstate
>  create mode 100644 dbus-vmstate/org.qemu.VMState1.conf
>  create mode 100644 dbus-vmstate/pve-dbus-vmstate@.service
>
> diff --git a/Makefile b/Makefile
> index ed67fe0a..2591c2d0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -3,7 +3,7 @@ include /usr/share/dpkg/default.mk
>  PACKAGE=qemu-server
>  BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION_UPSTREAM)
>
> -DESTDIR=
> +export DESTDIR=
>  PREFIX=/usr
>  SBINDIR=$(PREFIX)/sbin
>  LIBDIR=$(PREFIX)/lib/$(PACKAGE)
> @@ -16,6 +16,10 @@ ZSHCOMPLDIR=$(PREFIX)/share/zsh/vendor-completions/
>  export PERLDIR=$(PREFIX)/share/perl5
>  PERLINCDIR=$(PERLDIR)/asm-x86_64
>
> +export LIBSYSTEMDDIR=$(PREFIX)/lib/systemd
> +export LIBEXECDIR=$(PREFIX)/libexec/$(PACKAGE)
> +export DBUSDIR=$(PREFIX)/share/dbus-1
> +
>  GITVERSION:=$(shell git rev-parse HEAD)
>
>  DEB=$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION)_$(DEB_BUILD_ARCH).deb
> @@ -68,6 +72,7 @@ install: $(PKGSOURCES)
>  	$(MAKE) -C query-machine-capabilities install
>  	$(MAKE) -C qemu-configs install
>  	$(MAKE) -C vm-network-scripts install
> +	$(MAKE) -C dbus-vmstate install
>  	install -m 0755 qm $(DESTDIR)$(SBINDIR)
>  	install -m 0755 qmrestore $(DESTDIR)$(SBINDIR)
>  	install -D -m 0644 modules-load.conf $(DESTDIR)/etc/modules-load.d/qemu-server.conf
> diff --git a/dbus-vmstate/Makefile b/dbus-vmstate/Makefile
> new file mode 100644
> index 00000000..177bbbc1
> --- /dev/null
> +++ b/dbus-vmstate/Makefile
> @@ -0,0 +1,7 @@
> +all:
> +
> +.PHONY: install
> +install:
> +	install -D -m 0755 dbus-vmstate $(DESTDIR)/$(LIBEXECDIR)/dbus-vmstate
> +	install -D -m 0644 pve-dbus-vmstate@.service $(DESTDIR)/$(LIBSYSTEMDDIR)/system/pve-dbus-vmstate@.service
> +	install -D -m 0644 org.qemu.VMState1.conf $(DESTDIR)/$(DBUSDIR)/system.d/org.qemu.VMState1.conf
> diff --git a/dbus-vmstate/dbus-vmstate b/dbus-vmstate/dbus-vmstate
> new file mode 100755
> index 00000000..04a1b53d
> --- /dev/null
> +++ b/dbus-vmstate/dbus-vmstate
> @@ -0,0 +1,168 @@
> +#!/usr/bin/perl
> +
> +# Exports a D-Bus object implementing
> +# https://www.qemu.org/docs/master/interop/dbus-vmstate.html
> +
> +package PVE::QemuServer::DBusVMState;
> +
> +use warnings;
> +use strict;
> +
> +use Carp;
> +use Net::DBus;
> +use Net::DBus::Exporter qw(org.qemu.VMState1);
> +use Net::DBus::Reactor;
> +use PVE::QemuServer::Helpers;
> +use PVE::QemuServer::QMPHelpers qw(qemu_objectadd qemu_objectdel);
> +use PVE::SafeSyslog;
> +use PVE::Tools;
> +
> +use base qw(Net::DBus::Object);
> +
> +use Class::MethodMaker [ scalar => [ qw(Id NumMigratedEntries) ]];
> +dbus_property('Id', 'string', 'read');
> +dbus_property('NumMigratedEntries', 'uint32', 'read', 'com.proxmox.VMStateHelper');
> +
> +sub new {
> +    my ($class, $service, $vmid) = @_;
> +
> +    my $self = $class->SUPER::new($service, '/org/qemu/VMState1');
> +    $self->{vmid} = $vmid;
> +    $self->Id("pve-vmstate-$vmid");
> +    $self->NumMigratedEntries(0);
> +
> +    bless $self, $class;
> +    return $self;
> +}
> +
> +sub Load {
> +    my ($self, $bytes) = @_;
> +
> +    my $len = scalar(@$bytes);
> +    return if $len <= 1; # see also the `Save` method
> +
> +    my $text = pack('c*', @$bytes);
> +
> +    eval {
> +        PVE::Tools::run_command(
> +            ['conntrack', '--load-file', '-'],
> +            input => $text,
> +        );
> +    };
> +    if (my $err = $@) {

nit: could just use $@ directly here? There are some additional
occurrences below.

> +        syslog('warn', "failed to restore conntrack state: $err\n");
> +    } else {
> +        syslog('info', "restored $len bytes of conntrack state\n");
> +    }
> +}
> +dbus_method('Load', [['array', 'byte']], []);
> +
> +use constant {
> +    # From the documentation:
> +    # https://www.qemu.org/docs/master/interop/dbus-vmstate.html
> +    # > For now, the data amount to be transferred is arbitrarily limited to 1Mb.
> +    #
> +    # See also qemu/backends/dbus-vmstate.c:DBUS_VMSTATE_SIZE_LIMIT
> +    DBUS_VMSTATE_SIZE_LIMIT => 1024 * 1024,
> +};
> +
> +sub Save {
> +    my ($self) = @_;
> +
> +    my $text = '';
> +    my $truncated = 0;
> +    my $num_entries = 0;
> +    eval {
> +        PVE::Tools::run_command(
> +            ['conntrack', '--dump', '--mark', $self->{vmid}, '--output', 'save'],
> +            outfunc => sub {
> +                my ($line) = @_;
> +                return if $truncated;
> +
> +                if ((length($text) + length($line)) > DBUS_VMSTATE_SIZE_LIMIT) {
> +                    syslog('warn', 'conntrack state too large, ignoring further entries');
> +                    $truncated = 1;
> +                    return;
> +                }
> +
> +                # conntrack(8) apparently does not preserve the `--mark` option,
> +                # so just add it back ourselves
> +                $text .= "$line --mark $self->{vmid}\n";
> +            },
> +            errfunc => sub {
> +                my ($line) = @_;
> +
> +                if ($line =~ /(\d+) flow entries/) {
> +                    syslog('info', "received $1 conntrack entries");
> +                    # conntrack reports the number of displayed entries on stderr,
> +                    # which shouldn't be considered an error.
> +                    $self->NumMigratedEntries($1);
> +                    return;
> +                }
> +                syslog('err', $line);
> +            }
> +        );
> +    };
> +    if (my $err = $@) {

here

> +        syslog('warn', "failed to save conntrack state: $err\n");
> +
> +        # Apparently Net::DBus does not handle zero-sized (byte) arrays
> +        # correctly - returning [] yields QEMU failing with
> +        #
> +        #   "kvm: dbus_save_state_proxy: Failed to Save: not a byte array"
> +        #
> +        # Thus, just return an array with a single element and detect that
> +        # appropriately in `Load`. A valid conntrack state can *never* be
> +        # just a single byte, so it is safe to rely on that.
> +        return [0];
> +    }
> +
> +    my @bytes = unpack('c*', $text);
> +    my $len = scalar(@bytes);
> +
> +    syslog('info', "transferring $len bytes of conntrack state\n");
> +
> +    # Same as above w.r.t. returning as single-element array.
> +    return $len == 0 ? [0] : \@bytes;
> +}
> +dbus_method('Save', [], [['array', 'byte']]);
> +
> +# Additional method for cleanly shutting down the service.
> +sub Quit {
> +    my ($self) = @_;
> +
> +    syslog('info', "shutting down gracefully ..\n");
> +
> +    # On the source side, the VM won't exist anymore, so no need to remove
> +    # anything.
> +    if (PVE::QemuServer::Helpers::vm_running_locally($self->{vmid})) {
> +        eval { qemu_objectdel($self->{vmid}, 'pve-vmstate') };
> +        if (my $err = $@) {

here

> +            syslog('warn', "failed to remove object: $err\n");
> +        }
> +    }
> +
> +    Net::DBus::Reactor->main()->shutdown();
> +}
> +dbus_method('Quit', [], [], 'com.proxmox.VMStateHelper', { no_return => 1 });
> +
> +my $vmid = shift;
> +
> +my $dbus = Net::DBus->system();
> +my $service = $dbus->export_service('org.qemu.VMState1');
> +my $obj = PVE::QemuServer::DBusVMState->new($service, $vmid);
> +
> +$SIG{TERM} = sub {
> +    $obj->Quit();
> +};
> +
> +my $addr = $dbus->get_unique_name();
> +syslog('info', "pve-vmstate-$vmid listening on $addr\n");
> +
> +# Inform QEMU about our running dbus-vmstate helper
> +qemu_objectadd($vmid, 'pve-vmstate', 'dbus-vmstate',
> +    addr => 'unix:path=/run/dbus/system_bus_socket',
> +    'id-list' => "pve-vmstate-$vmid",
> +);
> +
> +Net::DBus::Reactor->main()->run();
> diff --git a/dbus-vmstate/org.qemu.VMState1.conf b/dbus-vmstate/org.qemu.VMState1.conf
> new file mode 100644
> index 00000000..cfedcae4
> --- /dev/null
> +++ b/dbus-vmstate/org.qemu.VMState1.conf
> @@ -0,0 +1,11 @@
> +
> + + "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
> +
> +
> +
> +
> +
> +
> +
> +
> diff --git a/dbus-vmstate/pve-dbus-vmstate@.service b/dbus-vmstate/pve-dbus-vmstate@.service
> new file mode 100644
> index 00000000..56b4e285
> --- /dev/null
> +++ b/dbus-vmstate/pve-dbus-vmstate@.service
> @@ -0,0 +1,10 @@
> +[Unit]
> +Description=PVE DBus VMState Helper (VM %i)
> +Requires=dbus.socket
> +After=dbus.socket
> +PartOf=%i.scope
> +
> +[Service]
> +Slice=qemu.slice
> +Type=simple
> +ExecStart=/usr/libexec/qemu-server/dbus-vmstate %i
> diff --git a/debian/control b/debian/control
> index d6c20040..ee1ca177 100644
> --- a/debian/control
> +++ b/debian/control
> @@ -3,9 +3,11 @@ Section: admin
>  Priority: optional
>  Maintainer: Proxmox Support Team
>  Build-Depends: debhelper-compat (= 13),
> +               libclass-methodmaker-perl,
>                 libglib2.0-dev,
>                 libio-multiplex-perl,
>                 libjson-c-dev,
> +               libnet-dbus-perl,
>                 libpve-apiclient-perl,
>                 libpve-cluster-perl,
>                 libpve-common-perl (>= 8.0.2),
> @@ -28,11 +30,14 @@ Homepage: https://www.proxmox.com
>
>  Package: qemu-server
>  Architecture: any
> -Depends: dbus,
> +Depends: conntrack,
> +         dbus,
>           genisoimage,
> +         libclass-methodmaker-perl,
>           libio-multiplex-perl,
>           libjson-perl,
>           libjson-xs-perl,
> +         libnet-dbus-perl,
>           libnet-ssleay-perl,
>           libpve-access-control (>= 8.0.0~),
>           libpve-apiclient-perl,

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel