From mboxrd@z Thu Jan  1 00:00:00 1970
From: Christoph Heiss <c.heiss@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Mon, 17 Mar 2025 15:11:44 +0100
Message-ID: <20250317141152.1247324-8-c.heiss@proxmox.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250317141152.1247324-1-c.heiss@proxmox.com>
References: <20250317141152.1247324-1-c.heiss@proxmox.com>
MIME-Version: 1.0
Subject: [pve-devel] [PATCH qemu-server 07/14] fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating conntrack
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

First part to fixing #5180 [0].

Adds a simple D-Bus server which implements the `org.qemu.VMState1`
interface as specified in the QEMU documentation [1].

Using the built-in QEMU VMState machinery saves us from having to worry
about transfer and convergence of the data and lets QEMU take care of
it.

Any object on the D-Bus path `/org/qemu/VMState1` implementing that
interface will be called by QEMU during live-migration, if and only if
the `Id` property is registered within the `dbus-vmstate` QEMU object
for a specific VM.

The actual state saving/restoring is done via the conntrack(8) tool, a
small tool which already implements the hard parts of interacting with
the conntrack subsystem via netlink.

Filtering is done on CONNMARK, which is set to the specific VMID for all
packets by the firewall.

Additionally, a custom `com.proxmox.VMStateHelper` interface is
implemented by the object, adding a small `Quit` method for cleanly
shutting down the daemon via the D-Bus API.

For all of this to work, D-Bus needs a policy describing who is allowed
to access the interface [2].

Currently, there is a hard limit of 1 MiB of state enforced by QEMU.
Typical conntrack state entries as dumped by conntrack(8) in the `save`
output format are plain-text ASCII lines, mostly around 150-200
characters long. At 200 bytes per entry, that translates to about 5200
entries that can be migrated.

Such a typical line looks like:

  -A -t 431974 -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 \
    -r 10.1.1.20 -q 10.1.0.1 -p tcp --sport 48550 --dport 22 \
    --reply-port-src 22 --reply-port-dst 48550 --state ESTABLISHED

In the future, compression could be implemented for these before sending
them to QEMU, which should increase the above number quite a bit, since
these entries are nicely compressible.
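
A rough sketch of the size estimate above and of what compression might
buy, assuming gzip via the core IO::Compress::Gzip module. The helpers
`max_entries`, `compress_state`, `decompress_state` and `sample_state`
are hypothetical and not part of this patch, and the sample data is
synthetic, merely modelled on the example line above.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Same value as DBUS_VMSTATE_SIZE_LIMIT in the daemon (1 MiB).
use constant SIZE_LIMIT => 1024 * 1024;

# Upper bound on the number of entries that fit into the limit, given an
# average entry length in bytes (including the trailing newline).
sub max_entries {
    my ($avg_len) = @_;
    return int(SIZE_LIMIT / $avg_len);
}

# Hypothetical helper: gzip the dumped text before handing it to QEMU.
sub compress_state {
    my ($text) = @_;
    my $out;
    gzip(\$text => \$out) or die "gzip failed: $GzipError\n";
    return $out;
}

# Hypothetical counterpart for the receiving side.
sub decompress_state {
    my ($data) = @_;
    my $out;
    gunzip(\$data => \$out) or die "gunzip failed: $GunzipError\n";
    return $out;
}

# Synthetic entries (assumption: only the source port differs per entry);
# this is not real conntrack output.
sub sample_state {
    my ($count, $vmid) = @_;
    my $text = '';
    for my $i (1 .. $count) {
        my $sport = 40000 + $i;
        $text .= "-A -t 431974 -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 "
            . "-r 10.1.1.20 -q 10.1.0.1 -p tcp --sport $sport --dport 22 "
            . "--reply-port-src 22 --reply-port-dst $sport --state ESTABLISHED "
            . "--mark $vmid\n";
    }
    return $text;
}

my $text = sample_state(1000, 100);
my $packed = compress_state($text);

printf "200 bytes/entry: at most %d entries\n", max_entries(200);
printf "%d bytes raw, %d bytes compressed\n", length($text), length($packed);
```

With these definitions, `max_entries(200)` yields 5242 and
`max_entries(150)` yields 6990. The compression ratio on real conntrack
dumps will differ from the synthetic sample and would need to be
measured.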

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180
[1] https://www.qemu.org/docs/master/interop/dbus-vmstate.html
[2] https://dbus.freedesktop.org/doc/dbus-daemon.1.html#configuration_file

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Depends on patch #2 & #3 (iptables & nftables connmark support,
respectively) being applied first, plus appropriate dependency bumps.

 Makefile               |   3 +
 debian/control         |   7 +-
 libexec/dbus-vmstate   | 164 +++++++++++++++++++++++++++++++++++++++++
 org.qemu.VMState1.conf |  11 +++
 4 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100755 libexec/dbus-vmstate
 create mode 100644 org.qemu.VMState1.conf

diff --git a/Makefile b/Makefile
index ed67fe0a..1cd28d2b 100644
--- a/Makefile
+++ b/Makefile
@@ -7,6 +7,7 @@ DESTDIR=
 PREFIX=/usr
 SBINDIR=$(PREFIX)/sbin
 LIBDIR=$(PREFIX)/lib/$(PACKAGE)
+LIBEXECDIR=$(PREFIX)/libexec/$(PACKAGE)
 MANDIR=$(PREFIX)/share/man
 DOCDIR=$(PREFIX)/share/doc
 MAN1DIR=$(MANDIR)/man1/
@@ -71,6 +72,8 @@ install: $(PKGSOURCES)
 	install -m 0755 qm $(DESTDIR)$(SBINDIR)
 	install -m 0755 qmrestore $(DESTDIR)$(SBINDIR)
 	install -D -m 0644 modules-load.conf $(DESTDIR)/etc/modules-load.d/qemu-server.conf
+	install -D -m 0644 org.qemu.VMState1.conf $(DESTDIR)/etc/dbus-1/system.d/org.qemu.VMState1.conf
+	install -D -m 0755 libexec/dbus-vmstate $(DESTDIR)$(LIBEXECDIR)/dbus-vmstate
 	install -m 0755 qmextract $(DESTDIR)$(LIBDIR)
 	install -m 0644 qm.1 $(DESTDIR)/$(MAN1DIR)
 	install -m 0644 qmrestore.1 $(DESTDIR)/$(MAN1DIR)
diff --git a/debian/control b/debian/control
index 81f0fad6..be488381 100644
--- a/debian/control
+++ b/debian/control
@@ -3,9 +3,11 @@ Section: admin
 Priority: optional
 Maintainer: Proxmox Support Team <support@proxmox.com>
 Build-Depends: debhelper-compat (= 13),
+               libclass-methodmaker-perl,
                libglib2.0-dev,
                libio-multiplex-perl,
                libjson-c-dev,
+               libnet-dbus-perl,
                libpve-apiclient-perl,
                libpve-cluster-perl,
                libpve-common-perl (>= 8.0.2),
@@ -28,11 +30,14 @@ Homepage: https://www.proxmox.com
 
 Package: qemu-server
 Architecture: any
-Depends: dbus,
+Depends: conntrack,
+         dbus,
          genisoimage,
+         libclass-methodmaker-perl,
          libio-multiplex-perl,
          libjson-perl,
          libjson-xs-perl,
+         libnet-dbus-perl,
          libnet-ssleay-perl,
          libpve-access-control (>= 8.0.0~),
          libpve-apiclient-perl,
diff --git a/libexec/dbus-vmstate b/libexec/dbus-vmstate
new file mode 100755
index 00000000..52e51a32
--- /dev/null
+++ b/libexec/dbus-vmstate
@@ -0,0 +1,164 @@
+#!/usr/bin/perl
+
+# Exports a D-Bus object implementing
+# https://www.qemu.org/docs/master/interop/dbus-vmstate.html
+
+package PVE::QemuServer::DBusVMState;
+
+use warnings;
+use strict;
+
+use Carp;
+use Net::DBus;
+use Net::DBus::Exporter qw(org.qemu.VMState1);
+use Net::DBus::Reactor;
+use PVE::QemuServer::Helpers;
+use PVE::QemuServer::QMPHelpers qw(qemu_objectadd qemu_objectdel);
+use PVE::SafeSyslog;
+use PVE::Tools;
+
+use base qw(Net::DBus::Object);
+
+use Class::MethodMaker [ scalar => [ qw(Id NumMigratedEntries) ]];
+dbus_property('Id', 'string', 'read');
+dbus_property('NumMigratedEntries', 'uint32', 'read', 'com.proxmox.VMStateHelper');
+
+sub new {
+    my ($class, $service, $vmid) = @_;
+
+    my $self = $class->SUPER::new($service, '/org/qemu/VMState1');
+    $self->{vmid} = $vmid;
+    $self->Id("pve-vmstate-$vmid");
+    $self->NumMigratedEntries(0);
+
+    bless $self, $class;
+    return $self;
+}
+
+sub Load {
+    my ($self, $bytes) = @_;
+
+    my $len = scalar(@$bytes);
+    return if $len <= 1; # see also the `Save` method
+
+    my $text = pack('c*', @$bytes);
+
+    eval {
+        PVE::Tools::run_command(
+            ['conntrack', '--load-file', '-'],
+            input => $text,
+        );
+    };
+    if (my $err = $@) {
+        syslog('warn', "failed to restore conntrack state: $err\n");
+    } else {
+        syslog('info', "restored $len bytes of conntrack state\n");
+    }
+}
+dbus_method('Load', [['array', 'byte']], []);
+
+use constant {
+    # From the documentation
+    # (https://www.qemu.org/docs/master/interop/dbus-vmstate.html):
+    # > For now, the data amount to be transferred is arbitrarily limited to 1Mb.
+    #
+    # See also qemu/backends/dbus-vmstate.c:DBUS_VMSTATE_SIZE_LIMIT
+    DBUS_VMSTATE_SIZE_LIMIT => 1024 * 1024,
+};
+
+sub Save {
+    my ($self) = @_;
+
+    my $text = '';
+    my $truncated = 0;
+    my $num_entries = 0;
+    eval {
+        PVE::Tools::run_command(
+            ['conntrack', '--dump', '--mark', $self->{vmid}, '--output', 'save'],
+            outfunc => sub {
+                my ($line) = @_;
+                return if $truncated;
+
+                if ((length($text) + length($line)) > DBUS_VMSTATE_SIZE_LIMIT) {
+                    syslog('warn', 'conntrack state too large, ignoring further entries');
+                    $truncated = 1;
+                    return;
+                }
+
+                # conntrack(8) apparently does not preserve the `--mark` option,
+                # so just add it back ourselves
+                $text .= "$line --mark $self->{vmid}\n";
+            },
+            errfunc => sub {
+                my ($line) = @_;
+
+                if ($line =~ /(\d+) flow entries/) {
+                    syslog('info', "received $1 conntrack entries");
+                    # conntrack reports the number of displayed entries on stderr,
+                    # which shouldn't be considered an error.
+                    $self->NumMigratedEntries($1);
+                    return;
+                }
+                syslog('err', $line);
+            }
+        );
+    };
+    if (my $err = $@) {
+        syslog('warn', "failed to save conntrack state: $err\n");
+
+        # Apparently Net::DBus does not handle zero-sized (byte) arrays
+        # correctly - returning [] yields QEMU failing with
+        #
+        # "kvm: dbus_save_state_proxy: Failed to Save: not a byte array"
+        #
+        # Thus, just return an array with a single element and detect that
+        # appropriately in `Load`. A valid conntrack state can *never* be
+        # just a single byte, so it is safe to rely on that.
+        return [0];
+    }
+
+    my @bytes = unpack('c*', $text);
+    my $len = scalar(@bytes);
+
+    syslog('info', "transferring $len bytes of conntrack state\n");
+
+    # Same as above w.r.t. returning a single-element array.
+    return $len == 0 ? [0] : \@bytes;
+}
+dbus_method('Save', [], [['array', 'byte']]);
+
+# Additional method for cleanly shutting down the service.
+sub Quit {
+    my ($self) = @_;
+
+    syslog('info', "shutting down gracefully ..\n");
+
+    # On the source side, the VM won't exist anymore, so no need to remove
+    # anything.
+    if (PVE::QemuServer::Helpers::vm_running_locally($self->{vmid})) {
+        eval { qemu_objectdel($self->{vmid}, 'pve-vmstate') };
+        if (my $err = $@) {
+            syslog('warn', "failed to remove object: $err\n");
+        }
+    }
+
+    Net::DBus::Reactor->main()->shutdown();
+}
+dbus_method('Quit', [], [], 'com.proxmox.VMStateHelper', { no_return => 1 });
+
+my $vmid = shift;
+
+my $dbus = Net::DBus->system();
+my $service = $dbus->export_service('org.qemu.VMState1');
+my $obj = PVE::QemuServer::DBusVMState->new($service, $vmid);
+
+my $addr = $dbus->get_unique_name();
+syslog('info', "pve-vmstate-$vmid listening on $addr\n");
+
+# Inform QEMU about our running dbus-vmstate helper
+qemu_objectadd($vmid, 'pve-vmstate', 'dbus-vmstate',
+    addr => 'unix:path=/run/dbus/system_bus_socket',
+    'id-list' => "pve-vmstate-$vmid",
+);
+
+Net::DBus::Reactor->main()->run();
diff --git a/org.qemu.VMState1.conf b/org.qemu.VMState1.conf
new file mode 100644
index 00000000..cfedcae4
--- /dev/null
+++ b/org.qemu.VMState1.conf
@@ -0,0 +1,11 @@
+<?xml version="1.0"?>
+<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
+  "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
+<busconfig>
+  <policy user="root">
+    <allow own="org.qemu.VMState1" />
+    <allow send_destination="org.qemu.VMState1" />
+    <allow receive_sender="org.qemu.VMState1" />
+    <allow send_destination="com.proxmox.VMStateHelper" />
+  </policy>
+</busconfig>
-- 
2.48.1