public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration
@ 2025-03-17 14:11 Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH proxmox-ve-rs 01/14] config: guest: allow access to raw Vmid value Christoph Heiss
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Fixes #5180 [0].

This implements migration of per-VM conntrack state on live-migration.

The core of the implementation is in patches #7 & #8. See there for more
details.

Patches #1 - #3 implement CONNMARK'ing of all VM traffic with the guest's
unique VMID. This is needed later on to filter conntrack entries for the
migration. These three patches can be applied independently, since
CONNMARK'ing traffic does not have any visible impact.

Patches #13 & #14 are marked RFC, as I'm not sure whether we need/should
implement that. But it's working well and cleanup of old resources is
always good IMHO.

Currently, remote/inter-cluster migration is not supported and indicated
to the user with a warning. See also patch #8 for a bit more in-depth
explanation.

Needed dependency bumps between packages are indicated in the notes
appropriately.

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180

Testing
=======

I've primarily tested intra-cluster live-migrations, with both the
iptables-based and nftables-based firewall, using the reproducer as
described in #5180. I further verified that the D-Bus servers get
started as expected and are _always_ stopped, even in the case of some
migration error.

Finally, I also checked using `conntrack -L -m <vmid>` that the
conntrack entries are
a) added/updated on the target node and
b) removed from the source node (w/ patches #13/#14 applied).

Also tested was the migration from/to an "old" (unpatched) node, which
results in the issue as per #5180 & appropriate warnings in the UI.

For remote migrations, I only tested that the warning is logged as
expected.

Diffstat
========

pve-firewall:

Christoph Heiss (2):
  firewall: add connmark rule with VMID to all guest chains
  firewall: helpers: add sub for flushing conntrack entries by mark

 debian/control              |  3 ++-
 src/PVE/Firewall.pm         |  7 +++++--
 src/PVE/Firewall/Helpers.pm | 11 +++++++++++
 3 files changed, 18 insertions(+), 3 deletions(-)

proxmox-firewall:

Christoph Heiss (1):
  firewall: add connmark rule with VMID to all guest chains

 proxmox-firewall/src/firewall.rs              | 14 ++-
 .../integration_tests__firewall.snap          | 85 ++++++++++++++++++-
 proxmox-nftables/src/expression.rs            |  9 ++
 proxmox-nftables/src/statement.rs             | 10 ++-
 4 files changed, 114 insertions(+), 4 deletions(-)

proxmox-ve-rs:

Christoph Heiss (1):
  config: guest: allow access to raw Vmid value

 proxmox-ve-config/src/guest/types.rs | 4 ++++
 1 file changed, 4 insertions(+)

qemu-server:

Christoph Heiss (5):
  qmp helpers: allow passing structured args via qemu_objectadd()
  api2: qemu: add module exposing node migration capabilities
  fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating
    conntrack
  fix #5180: migrate: integrate helper for live-migrating conntrack info
  migrate: flush old VM conntrack entries after successful migration

 Makefile                      |   3 +
 PVE/API2/Qemu.pm              |  72 +++++++++++++++
 PVE/API2/Qemu/Makefile        |   2 +-
 PVE/API2/Qemu/Migration.pm    |  46 ++++++++++
 PVE/CLI/qm.pm                 |   5 ++
 PVE/QemuMigrate.pm            |  69 ++++++++++++++
 PVE/QemuServer.pm             |   6 ++
 PVE/QemuServer/DBusVMState.pm | 124 +++++++++++++++++++++++++
 PVE/QemuServer/Makefile       |   1 +
 PVE/QemuServer/QMPHelpers.pm  |   4 +-
 debian/control                |   7 +-
 libexec/dbus-vmstate          | 164 ++++++++++++++++++++++++++++++++++
 org.qemu.VMState1.conf        |  11 +++
 13 files changed, 510 insertions(+), 4 deletions(-)
 create mode 100644 PVE/API2/Qemu/Migration.pm
 create mode 100644 PVE/QemuServer/DBusVMState.pm
 create mode 100755 libexec/dbus-vmstate
 create mode 100644 org.qemu.VMState1.conf

pve-common:

Christoph Heiss (1):
  tools: add run_fork_detached() for spawning daemons

 src/PVE/Tools.pm | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

pve-manager:

Christoph Heiss (4):
  api2: capabilities: explicitly import CPU capabilities module
  api2: capabilities: proxy index endpoints to respective nodes
  api2: capabilities: expose new qemu/migration endpoint
  ui: window: Migrate: add checkbox for migrating VM conntrack state

 PVE/API2/Capabilities.pm       |  9 +++++
 www/manager6/window/Migrate.js | 73 ++++++++++++++++++++++++++++++++--
 2 files changed, 78 insertions(+), 4 deletions(-)

-- 
2.47.1



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* [pve-devel] [PATCH proxmox-ve-rs 01/14] config: guest: allow access to raw Vmid value
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH proxmox-firewall 02/14] firewall: add connmark rule with VMID to all guest chains Christoph Heiss
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Needed in proxmox-nftables/-firewall to generate rules depending on the
numeric vmid.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 proxmox-ve-config/src/guest/types.rs | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/proxmox-ve-config/src/guest/types.rs b/proxmox-ve-config/src/guest/types.rs
index ed6a48c..a0fb67d 100644
--- a/proxmox-ve-config/src/guest/types.rs
+++ b/proxmox-ve-config/src/guest/types.rs
@@ -13,6 +13,10 @@ impl Vmid {
     pub fn new(id: u32) -> Self {
         Vmid(id)
     }
+
+    pub fn raw_value(&self) -> u32 {
+        self.0
+    }
 }
 
 impl From<u32> for Vmid {
-- 
2.48.1




* [pve-devel] [PATCH proxmox-firewall 02/14] firewall: add connmark rule with VMID to all guest chains
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH proxmox-ve-rs 01/14] config: guest: allow access to raw Vmid value Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH pve-firewall 03/14] " Christoph Heiss
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Adds a connmark attribute containing the VMID to anything flowing
in/out of the guest, which is also carried over to all conntrack entries.

This enables differentiating conntrack entries between VMs for
live-migration.
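
As a rough illustration (not part of the patch), the rule that ends up
first in each guest chain can be sketched as below. The helper name
`guest_ct_mark_rule` is hypothetical and the JSON is assembled by hand
using only std; the patch itself goes through `Mangle::ct_mark()` in
proxmox-nftables. The layout follows the updated test snapshot.

```rust
/// Hypothetical helper: builds the nftables JSON command that sets the
/// conntrack mark to the VMID for one guest chain ("in" or "out").
fn guest_ct_mark_rule(vmid: u32, direction: &str) -> String {
    let chain = format!("guest-{vmid}-{direction}");

    let mut json = String::new();
    // Table family and name as they appear in the test snapshot.
    json.push_str(r#"{"add":{"rule":{"family":"bridge","table":"proxmox-firewall-guests","#);
    json.push_str(&format!(r#""chain":"{chain}","#));
    // `mangle` statement with a `ct mark` key and the numeric VMID as value.
    json.push_str(r#""expr":[{"mangle":{"key":{"ct":{"key":"mark"}},"#);
    json.push_str(&format!(r#""value":{vmid}"#));
    json.push_str("}}]}}}");
    json
}
```

Calling `guest_ct_mark_rule(100, "in")` yields the compact form of the
`guest-100-in` entry added to the snapshot.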

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Depends on patch #1 being applied first to proxmox-ve-rs & an appropriate
crate bump.

 proxmox-firewall/src/firewall.rs              | 14 ++-
 .../integration_tests__firewall.snap          | 85 ++++++++++++++++++-
 proxmox-nftables/src/expression.rs            |  9 ++
 proxmox-nftables/src/statement.rs             | 10 ++-
 4 files changed, 114 insertions(+), 4 deletions(-)

diff --git a/proxmox-firewall/src/firewall.rs b/proxmox-firewall/src/firewall.rs
index 88fb460..9f7df56 100644
--- a/proxmox-firewall/src/firewall.rs
+++ b/proxmox-firewall/src/firewall.rs
@@ -6,7 +6,9 @@ use anyhow::{bail, Error};
 use proxmox_nftables::command::{Add, Commands, Delete, Flush};
 use proxmox_nftables::expression::{Meta, Payload};
 use proxmox_nftables::helper::NfVec;
-use proxmox_nftables::statement::{AnonymousLimit, Log, LogLevel, Match, Set, SetOperation};
+use proxmox_nftables::statement::{
+    AnonymousLimit, Log, LogLevel, Mangle, Match, Set, SetOperation,
+};
 use proxmox_nftables::types::{
     AddElement, AddRule, ChainPart, MapValue, RateTimescale, SetName, TableFamily, TableName,
     TablePart, Verdict,
@@ -934,7 +936,15 @@ impl Firewall {
             vmid: Some(vmid),
         };
 
-        commands.reserve(config.rules().len());
+        commands.reserve(config.rules().len() + 1);
+
+        // Add a connmark to anything in/out the guest, to be able to later
+        // track/filter per guest, e.g. in the pve-conntrack-tool.
+        // Needs to be first, such that it is always applied.
+        commands.push(Add::rule(AddRule::from_statement(
+            chain.clone(),
+            Mangle::ct_mark(vmid),
+        )));
 
         for config_rule in config.rules() {
             for rule in NftRule::from_config_rule(config_rule, &env)? {
diff --git a/proxmox-firewall/tests/snapshots/integration_tests__firewall.snap b/proxmox-firewall/tests/snapshots/integration_tests__firewall.snap
index 9194fc6..aa29e6e 100644
--- a/proxmox-firewall/tests/snapshots/integration_tests__firewall.snap
+++ b/proxmox-firewall/tests/snapshots/integration_tests__firewall.snap
@@ -1,7 +1,6 @@
 ---
 source: proxmox-firewall/tests/integration_tests.rs
 expression: "firewall.full_host_fw().expect(\"firewall can be generated\")"
-snapshot_kind: text
 ---
 {
   "nftables": [
@@ -4373,6 +4372,27 @@ snapshot_kind: text
         }
       }
     },
+    {
+      "add": {
+        "rule": {
+          "family": "bridge",
+          "table": "proxmox-firewall-guests",
+          "chain": "guest-100-in",
+          "expr": [
+            {
+              "mangle": {
+                "key": {
+                  "ct": {
+                    "key": "mark"
+                  }
+                },
+                "value": 100
+              }
+            }
+          ]
+        }
+      }
+    },
     {
       "add": {
         "rule": {
@@ -4648,6 +4668,27 @@ snapshot_kind: text
         }
       }
     },
+    {
+      "add": {
+        "rule": {
+          "family": "bridge",
+          "table": "proxmox-firewall-guests",
+          "chain": "guest-100-out",
+          "expr": [
+            {
+              "mangle": {
+                "key": {
+                  "ct": {
+                    "key": "mark"
+                  }
+                },
+                "value": 100
+              }
+            }
+          ]
+        }
+      }
+    },
     {
       "add": {
         "rule": {
@@ -5034,6 +5075,27 @@ snapshot_kind: text
         }
       }
     },
+    {
+      "add": {
+        "rule": {
+          "family": "bridge",
+          "table": "proxmox-firewall-guests",
+          "chain": "guest-101-in",
+          "expr": [
+            {
+              "mangle": {
+                "key": {
+                  "ct": {
+                    "key": "mark"
+                  }
+                },
+                "value": 101
+              }
+            }
+          ]
+        }
+      }
+    },
     {
       "add": {
         "rule": {
@@ -5096,6 +5158,27 @@ snapshot_kind: text
         }
       }
     },
+    {
+      "add": {
+        "rule": {
+          "family": "bridge",
+          "table": "proxmox-firewall-guests",
+          "chain": "guest-101-out",
+          "expr": [
+            {
+              "mangle": {
+                "key": {
+                  "ct": {
+                    "key": "mark"
+                  }
+                },
+                "value": 101
+              }
+            }
+          ]
+        }
+      }
+    },
     {
       "add": {
         "rule": {
diff --git a/proxmox-nftables/src/expression.rs b/proxmox-nftables/src/expression.rs
index e9ef94f..cbafe85 100644
--- a/proxmox-nftables/src/expression.rs
+++ b/proxmox-nftables/src/expression.rs
@@ -12,6 +12,8 @@ use proxmox_ve_config::firewall::types::port::{PortEntry, PortList};
 use proxmox_ve_config::firewall::types::rule_match::{IcmpCode, IcmpType, Icmpv6Code, Icmpv6Type};
 #[cfg(feature = "config-ext")]
 use proxmox_ve_config::firewall::types::Cidr;
+#[cfg(feature = "config-ext")]
+use proxmox_ve_config::guest::types::Vmid;
 
 #[derive(Clone, Debug, Deserialize, Serialize)]
 #[serde(rename_all = "lowercase")]
@@ -267,6 +269,13 @@ impl From<&BridgeName> for Expression {
     }
 }
 
+#[cfg(feature = "config-ext")]
+impl From<Vmid> for Expression {
+    fn from(value: Vmid) -> Self {
+        Expression::Number(value.raw_value().into())
+    }
+}
+
 #[derive(Clone, Debug, Deserialize, Serialize)]
 pub struct Meta {
     key: String,
diff --git a/proxmox-nftables/src/statement.rs b/proxmox-nftables/src/statement.rs
index 5483368..3264e6c 100644
--- a/proxmox-nftables/src/statement.rs
+++ b/proxmox-nftables/src/statement.rs
@@ -10,6 +10,7 @@ use proxmox_ve_config::firewall::types::rule::Verdict as ConfigVerdict;
 #[cfg(feature = "config-ext")]
 use proxmox_ve_config::guest::types::Vmid;
 
+use crate::expression::Ct;
 use crate::expression::Meta;
 use crate::helper::{NfVec, Null};
 use crate::types::{RateTimescale, RateUnit, Verdict};
@@ -370,12 +371,19 @@ pub struct Mangle {
 }
 
 impl Mangle {
-    pub fn set_mark(value: impl Into<Expression>) -> Self {
+    pub fn meta_mark(value: impl Into<Expression>) -> Self {
         Self {
             key: Meta::new("mark").into(),
             value: value.into(),
         }
     }
+
+    pub fn ct_mark(value: impl Into<Expression>) -> Self {
+        Self {
+            key: Ct::new("mark", None).into(),
+            value: value.into(),
+        }
+    }
 }
 
 #[derive(Clone, Copy, Debug, Deserialize, Serialize)]
-- 
2.48.1




* [pve-devel] [PATCH pve-firewall 03/14] firewall: add connmark rule with VMID to all guest chains
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH proxmox-ve-rs 01/14] config: guest: allow access to raw Vmid value Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH proxmox-firewall 02/14] firewall: add connmark rule with VMID to all guest chains Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH common 04/14] tools: add run_fork_detached() for spawning daemons Christoph Heiss
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Adds a connmark attribute containing the VMID to anything flowing
in/out of the guest, which is also carried over to all conntrack entries.

This enables differentiating conntrack entries between VMs for
live-migration.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 src/PVE/Firewall.pm | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/PVE/Firewall.pm b/src/PVE/Firewall.pm
index 533f2a2..5f7f72d 100644
--- a/src/PVE/Firewall.pm
+++ b/src/PVE/Firewall.pm
@@ -2468,11 +2468,14 @@ sub ruleset_chain_add_input_filters {
 }
 
 sub ruleset_create_vm_chain {
-    my ($ruleset, $chain, $ipversion, $options, $macaddr, $ipfilter_ipset, $direction) = @_;
+    my ($ruleset, $chain, $ipversion, $options, $macaddr, $ipfilter_ipset, $direction, $vmid) = @_;
 
     ruleset_create_chain($ruleset, $chain);
     my $accept = generate_nfqueue($options);
 
+    # needs to be first, to ensure that it gets always applied
+    ruleset_addrule($ruleset, $chain, "", "-j CONNMARK --set-mark $vmid");
+
     if (!(defined($options->{dhcp}) && $options->{dhcp} == 0)) {
 	if ($ipversion == 4) {
 	    if ($direction eq 'OUT') {
@@ -2619,7 +2622,7 @@ sub generate_tap_rules_direction {
 
     if ($options->{enable}) {
 	# create chain with mac and ip filter
-	ruleset_create_vm_chain($ruleset, $tapchain, $ipversion, $options, $macaddr, $ipfilter_ipset, $direction);
+	ruleset_create_vm_chain($ruleset, $tapchain, $ipversion, $options, $macaddr, $ipfilter_ipset, $direction, $vmid);
 
 	ruleset_generate_vm_rules($ruleset, $rules, $cluster_conf, $vmfw_conf, $tapchain, $netid, $direction, $options, $ipversion, $vmid);
 
-- 
2.48.1




* [pve-devel] [PATCH common 04/14] tools: add run_fork_detached() for spawning daemons
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (2 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH pve-firewall 03/14] " Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 05/14] qmp helpers: allow passing structured args via qemu_objectadd() Christoph Heiss
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

This essentially just does a fork() + setsid().
Needed to e.g. properly spawn background processes.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Something similar is already used in e.g. pve-storage to spawn fuse
mounts. If and when this is applied, I'd migrate these sites to this sub
too.

 src/PVE/Tools.pm | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm
index 0325f53..f5bf24a 100644
--- a/src/PVE/Tools.pm
+++ b/src/PVE/Tools.pm
@@ -1117,6 +1117,36 @@ sub run_fork {
     return run_fork_with_timeout(undef, $code, $opts);
 }
 
+sub run_fork_detached {
+    my ($fn) = @_;
+
+    pipe(my $rd, my $wr) or die "failed to create pipe: $!\n";
+
+    my $pid = fork();
+    die "fork failed: $!\n" if !defined($pid);
+
+    if (!$pid) {
+	undef $rd;
+	POSIX::setsid();
+
+	eval { $fn->(); };
+	if (my $err = $@) {
+	    print {$wr} "ERROR: $err";
+	}
+	POSIX::_exit(1);
+    };
+    undef $wr;
+
+    my $result = do { local $/ = undef; <$rd> };
+    if ($result =~ /^ERROR: (.*)$/) {
+	die "$1\n";
+    }
+
+    if (waitpid($pid, POSIX::WNOHANG) == $pid) {
+	die "failed to spawn process, process exited with status $?\n";
+    }
+}
+
 # NOTE: NFS syscall can't be interrupted, so alarm does
 # not work to provide timeouts.
 # from 'man nfs': "Only SIGKILL can interrupt a pending NFS operation"
-- 
2.48.1




* [pve-devel] [PATCH qemu-server 05/14] qmp helpers: allow passing structured args via qemu_objectadd()
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (3 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH common 04/14] tools: add run_fork_detached() for spawning daemons Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 06/14] api2: qemu: add module exposing node migration capabilities Christoph Heiss
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

No functional changes for existing code.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 PVE/QemuServer/QMPHelpers.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/PVE/QemuServer/QMPHelpers.pm b/PVE/QemuServer/QMPHelpers.pm
index 5f73b01e..c6a8f166 100644
--- a/PVE/QemuServer/QMPHelpers.pm
+++ b/PVE/QemuServer/QMPHelpers.pm
@@ -36,9 +36,9 @@ sub qemu_devicedel {
 }
 
 sub qemu_objectadd {
-    my ($vmid, $objectid, $qomtype) = @_;
+    my ($vmid, $objectid, $qomtype, %args) = @_;
 
-    mon_cmd($vmid, "object-add", id => $objectid, "qom-type" => $qomtype);
+    mon_cmd($vmid, "object-add", id => $objectid, "qom-type" => $qomtype, %args);
 
     return 1;
 }
-- 
2.48.1




* [pve-devel] [PATCH qemu-server 06/14] api2: qemu: add module exposing node migration capabilities
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (4 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 05/14] qmp helpers: allow passing structured args via qemu_objectadd() Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 07/14] fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating conntrack Christoph Heiss
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Similar to the already existing ones for CPU and QEMU machine support.

Very simple for now, it only provides one property:

  'has-dbus-vmstate' - Whether the dbus-vmstate helper is available/installed
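
For illustration only, the check behind that property boils down to a
file-existence test. A minimal sketch follows; the function names are
hypothetical and the response is hand-written JSON, whereas the actual
endpoint is the Perl module in the diff below.

```rust
use std::path::Path;

/// Helper location as used by the patch.
const DBUS_VMSTATE_HELPER: &str = "/usr/libexec/qemu-server/dbus-vmstate";

/// The helper counts as available if a regular file exists at `helper`.
fn has_dbus_vmstate(helper: &Path) -> bool {
    helper.is_file()
}

/// Hypothetical rendering of the endpoint's response body.
fn capabilities_json(helper: &Path) -> String {
    format!(r#"{{"has-dbus-vmstate":{}}}"#, has_dbus_vmstate(helper))
}
```

On a node, this would be called as
`capabilities_json(Path::new(DBUS_VMSTATE_HELPER))`.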

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 PVE/API2/Qemu/Makefile     |  2 +-
 PVE/API2/Qemu/Migration.pm | 46 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 PVE/API2/Qemu/Migration.pm

diff --git a/PVE/API2/Qemu/Makefile b/PVE/API2/Qemu/Makefile
index 5d4abda6..15f7217a 100644
--- a/PVE/API2/Qemu/Makefile
+++ b/PVE/API2/Qemu/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Agent.pm CPU.pm Machine.pm
+SOURCES=Agent.pm CPU.pm Machine.pm Migration.pm
 
 .PHONY: install
 install:
diff --git a/PVE/API2/Qemu/Migration.pm b/PVE/API2/Qemu/Migration.pm
new file mode 100644
index 00000000..34125a15
--- /dev/null
+++ b/PVE/API2/Qemu/Migration.pm
@@ -0,0 +1,46 @@
+package PVE::API2::Qemu::Migration;
+
+use strict;
+use warnings;
+
+use JSON;
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::RESTHandler;
+
+use base qw(PVE::RESTHandler);
+
+__PACKAGE__->register_method({
+    name => 'capabilities',
+    path => '',
+    method => 'GET',
+    proxyto => 'node',
+    description => 'Get migration capabilities of the node.'
+	. " Requires the 'Sys.Audit' permission on '/nodes/<node>'.",
+    permissions => {
+	check => ['perm', '/nodes/{node}', [ 'Sys.Audit' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	},
+    },
+    returns => {
+	type => 'object',
+	additionalProperties => 0,
+	properties => {
+	    'has-dbus-vmstate' => {
+		type => 'boolean',
+		description => 'Whether the host supports live-migrating additional'
+		    . ' VM state via the dbus-vmstate helper.',
+	    },
+	},
+    },
+    code => sub {
+	return {
+	    'has-dbus-vmstate' => -f '/usr/libexec/qemu-server/dbus-vmstate' ? JSON::true : JSON::false,
+	};
+    }
+});
+
+1;
-- 
2.48.1




* [pve-devel] [PATCH qemu-server 07/14] fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating conntrack
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (5 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 06/14] api2: qemu: add module exposing node migration capabilities Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 08/14] fix #5180: migrate: integrate helper for live-migrating conntrack info Christoph Heiss
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

First part of fixing #5180 [0].

Adds a simple D-Bus server which implements the `org.qemu.VMState1`
interface as specified in the QEMU documentation [1].

Using the built-in QEMU VMState machinery saves us from having to worry
about transfer and convergence of the data and lets QEMU take care of
it.

Any object on the D-Bus path `/org/qemu/VMState1` implementing that
interface will be called by QEMU during live-migration, if and only if
its `Id` property is registered within the `dbus-vmstate` QEMU object for
a specific VM.
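
A small sketch of that matching, under the assumption that `id-list` is
a comma-separated list of Ids (as I read the QEMU documentation [1]).
Both function names are hypothetical; only the `pve-vmstate-<vmid>`
naming is taken from the helper below.

```rust
/// Id exported by the helper for a given guest, `pve-vmstate-<vmid>`.
fn vmstate_id(vmid: u32) -> String {
    format!("pve-vmstate-{vmid}")
}

/// Assumption: `id_list` is comma-separated. Entries are compared for
/// exact equality, so VMID 100 does not match an entry for VMID 1000.
fn is_registered(id_list: &str, id: &str) -> bool {
    id_list.split(',').any(|entry| entry.trim() == id)
}
```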

The actual state loading/restoring is done via the conntrack(8) tool, a
small tool which already implements hard parts of interacting with the
conntrack subsystem via netlink.

Filtering is done on CONNMARK, which is set to the specific VMID for all
packets by the firewall.
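
The dump side can be sketched roughly as follows. The conntrack(8)
options are the ones used by the helper in this patch; the function
names and the `limit` parameter are made up for the example. Unlike the
Perl code, this sketch counts the re-added ` --mark <vmid>` suffix
against the limit, so the result never exceeds it.

```rust
/// Arguments for dumping only the entries marked with `vmid`.
fn conntrack_dump_args(vmid: u32) -> Vec<String> {
    vec![
        "--dump".to_string(),
        "--mark".to_string(),
        vmid.to_string(),
        "--output".to_string(),
        "save".to_string(),
    ]
}

/// Re-adds the mark to every dumped line and stops once `limit` bytes
/// would be exceeded. Returns the state text and whether it was truncated.
fn build_state(lines: &[&str], vmid: u32, limit: usize) -> (String, bool) {
    let mut text = String::new();
    for line in lines {
        let tagged = format!("{line} --mark {vmid}\n");
        if text.len() + tagged.len() > limit {
            return (text, true);
        }
        text.push_str(&tagged);
    }
    (text, false)
}
```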

Additionally, a custom `com.proxmox.VMStateHelper` interface is
implemented by the object, adding a small `Quit` method for cleanly
shutting down the daemon via the D-Bus API.

For all of this to work, D-Bus needs a policy describing who is allowed
to access the interface. [2]

Currently, there is a hard limit of 1 MiB of state enforced by QEMU.
Typical conntrack state entries as dumped by conntrack(8) in the `save`
output format are plain ASCII text lines, mostly around 150-200
characters long. At 200 characters per line, that translates to roughly
5200 entries that can be migrated.

Such a typical line looks like:

  -A -t 431974 -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 \
  -r 10.1.1.20 -q 10.1.0.1 -p tcp --sport 48550 --dport 22 \
  --reply-port-src 22 --reply-port-dst 48550 --state ESTABLISHED

In the future, compression could be implemented for these before sending
them to QEMU, which should increase the above number quite a bit - since
these entries are nicely compressible.

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180
[1] https://www.qemu.org/docs/master/interop/dbus-vmstate.html
[2] https://dbus.freedesktop.org/doc/dbus-daemon.1.html#configuration_file

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Depends on patches #2 & #3 (nftables & iptables connmark support,
respectively) being applied first, with appropriate dependency bumps.

 Makefile               |   3 +
 debian/control         |   7 +-
 libexec/dbus-vmstate   | 164 +++++++++++++++++++++++++++++++++++++++++
 org.qemu.VMState1.conf |  11 +++
 4 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100755 libexec/dbus-vmstate
 create mode 100644 org.qemu.VMState1.conf

diff --git a/Makefile b/Makefile
index ed67fe0a..1cd28d2b 100644
--- a/Makefile
+++ b/Makefile
@@ -7,6 +7,7 @@ DESTDIR=
 PREFIX=/usr
 SBINDIR=$(PREFIX)/sbin
 LIBDIR=$(PREFIX)/lib/$(PACKAGE)
+LIBEXECDIR=$(PREFIX)/libexec/$(PACKAGE)
 MANDIR=$(PREFIX)/share/man
 DOCDIR=$(PREFIX)/share/doc
 MAN1DIR=$(MANDIR)/man1/
@@ -71,6 +72,8 @@ install: $(PKGSOURCES)
 	install -m 0755 qm $(DESTDIR)$(SBINDIR)
 	install -m 0755 qmrestore $(DESTDIR)$(SBINDIR)
 	install -D -m 0644 modules-load.conf $(DESTDIR)/etc/modules-load.d/qemu-server.conf
+	install -D -m 0644 org.qemu.VMState1.conf $(DESTDIR)/etc/dbus-1/system.d/org.qemu.VMState1.conf
+	install -D -m 0644 libexec/dbus-vmstate $(DESTDIR)$(LIBEXECDIR)/dbus-vmstate
 	install -m 0755 qmextract $(DESTDIR)$(LIBDIR)
 	install -m 0644 qm.1 $(DESTDIR)/$(MAN1DIR)
 	install -m 0644 qmrestore.1 $(DESTDIR)/$(MAN1DIR)
diff --git a/debian/control b/debian/control
index 81f0fad6..be488381 100644
--- a/debian/control
+++ b/debian/control
@@ -3,9 +3,11 @@ Section: admin
 Priority: optional
 Maintainer: Proxmox Support Team <support@proxmox.com>
 Build-Depends: debhelper-compat (= 13),
+               libclass-methodmaker-perl,
                libglib2.0-dev,
                libio-multiplex-perl,
                libjson-c-dev,
+               libnet-dbus-perl,
                libpve-apiclient-perl,
                libpve-cluster-perl,
                libpve-common-perl (>= 8.0.2),
@@ -28,11 +30,14 @@ Homepage: https://www.proxmox.com
 
 Package: qemu-server
 Architecture: any
-Depends: dbus,
+Depends: conntrack,
+         dbus,
          genisoimage,
+         libclass-methodmaker-perl,
          libio-multiplex-perl,
          libjson-perl,
          libjson-xs-perl,
+         libnet-dbus-perl,
          libnet-ssleay-perl,
          libpve-access-control (>= 8.0.0~),
          libpve-apiclient-perl,
diff --git a/libexec/dbus-vmstate b/libexec/dbus-vmstate
new file mode 100755
index 00000000..52e51a32
--- /dev/null
+++ b/libexec/dbus-vmstate
@@ -0,0 +1,164 @@
+#!/usr/bin/perl
+
+# Exports a D-Bus object implementing
+# https://www.qemu.org/docs/master/interop/dbus-vmstate.html
+
+package PVE::QemuServer::DBusVMState;
+
+use warnings;
+use strict;
+
+use Carp;
+use Net::DBus;
+use Net::DBus::Exporter qw(org.qemu.VMState1);
+use Net::DBus::Reactor;
+use PVE::QemuServer::Helpers;
+use PVE::QemuServer::QMPHelpers qw(qemu_objectadd qemu_objectdel);
+use PVE::SafeSyslog;
+use PVE::Tools;
+
+use base qw(Net::DBus::Object);
+
+use Class::MethodMaker [ scalar => [ qw(Id NumMigratedEntries) ]];
+dbus_property('Id', 'string', 'read');
+dbus_property('NumMigratedEntries', 'uint32', 'read', 'com.proxmox.VMStateHelper');
+
+sub new {
+    my ($class, $service, $vmid) = @_;
+
+    my $self = $class->SUPER::new($service, '/org/qemu/VMState1');
+    $self->{vmid} = $vmid;
+    $self->Id("pve-vmstate-$vmid");
+    $self->NumMigratedEntries(0);
+
+    bless $self, $class;
+    return $self;
+}
+
+sub Load {
+    my ($self, $bytes) = @_;
+
+    my $len = scalar(@$bytes);
+    return if $len <= 1; # see also the `Save` method
+
+    my $text = pack('c*', @$bytes);
+
+    eval {
+	PVE::Tools::run_command(
+	    ['conntrack', '--load-file', '-'],
+	    input => $text,
+	);
+    };
+    if (my $err = $@) {
+	syslog('warn', "failed to restore conntrack state: $err\n");
+    } else {
+	syslog('info', "restored $len bytes of conntrack state\n");
+    }
+}
+dbus_method('Load', [['array', 'byte']], []);
+
+use constant {
+    # From the documentation:
+    #   https://www.qemu.org/docs/master/interop/dbus-vmstate.html
+    # > For now, the data amount to be transferred is arbitrarily limited to 1Mb.
+    #
+    # See also qemu/backends/dbus-vmstate.c:DBUS_VMSTATE_SIZE_LIMIT
+    DBUS_VMSTATE_SIZE_LIMIT => 1024 * 1024,
+};
+
+sub Save {
+    my ($self) = @_;
+
+    my $text = '';
+    my $truncated = 0;
+    my $num_entries = 0;
+    eval {
+	PVE::Tools::run_command(
+	    ['conntrack', '--dump', '--mark', $self->{vmid}, '--output', 'save'],
+	    outfunc => sub {
+		my ($line) = @_;
+		return if $truncated;
+
+		if ((length($text) + length($line)) > DBUS_VMSTATE_SIZE_LIMIT) {
+		   syslog('warn', 'conntrack state too large, ignoring further entries');
+		   $truncated = 1;
+		   return;
+		}
+
+		# conntrack(8) does not preserve the `--mark` option, apparently
+		# just add it back ourselves
+		$text .= "$line --mark $self->{vmid}\n";
+	    },
+	    errfunc => sub {
+		my ($line) = @_;
+
+		if ($line =~ /(\d+) flow entries/) {
+		    syslog('info', "received $1 conntrack entries");
+		    # conntrack reports the number of displayed entries on stderr,
+		    # which shouldn't be considered an error.
+		    $self->NumMigratedEntries($1);
+		    return;
+		}
+		syslog('err', $line);
+	    }
+	);
+    };
+    if (my $err = $@) {
+	syslog('warn', "failed to save conntrack state: $err\n");
+
+	# Apparently, Net::DBus does not handle zero-sized (byte) arrays
+	# correctly - returning [] yields QEMU failing with
+	#
+	#   "kvm: dbus_save_state_proxy: Failed to Save: not a byte array"
+	#
+	# Thus, just return an array with a single element and detect that
+	# appropriately in the `Load`. A valid conntrack state can *never* be
+	# just a single byte, so it is safe to rely on that.
+	return [0];
+    }
+
+    my @bytes = unpack('c*', $text);
+    my $len = scalar(@bytes);
+
+    syslog('info', "transferring $len bytes of conntrack state\n");
+
+    # Same as above w.r.t. returning a single-element array.
+    return $len == 0 ? [0] : \@bytes;
+}
+dbus_method('Save', [], [['array', 'byte']]);
+
+# Additional method for cleanly shutting down the service.
+sub Quit {
+    my ($self) = @_;
+
+    syslog('info', "shutting down gracefully ..\n");
+
+    # On the source side, the VM won't exist anymore, so no need to remove
+    # anything.
+    if (PVE::QemuServer::Helpers::vm_running_locally($self->{vmid})) {
+	eval { qemu_objectdel($self->{vmid}, 'pve-vmstate') };
+	if (my $err = $@) {
+	    syslog('warn', "failed to remove object: $err\n");
+	}
+    }
+
+    Net::DBus::Reactor->main()->shutdown();
+}
+dbus_method('Quit', [], [], 'com.proxmox.VMStateHelper', { no_return => 1 });
+
+my $vmid = shift;
+
+my $dbus = Net::DBus->system();
+my $service = $dbus->export_service('org.qemu.VMState1');
+my $obj = PVE::QemuServer::DBusVMState->new($service, $vmid);
+
+my $addr = $dbus->get_unique_name();
+syslog('info', "pve-vmstate-$vmid listening on $addr\n");
+
+# Inform QEMU about our running dbus-vmstate helper
+qemu_objectadd($vmid, 'pve-vmstate', 'dbus-vmstate',
+    addr => 'unix:path=/run/dbus/system_bus_socket',
+    'id-list' => "pve-vmstate-$vmid",
+);
+
+Net::DBus::Reactor->main()->run();
diff --git a/org.qemu.VMState1.conf b/org.qemu.VMState1.conf
new file mode 100644
index 00000000..cfedcae4
--- /dev/null
+++ b/org.qemu.VMState1.conf
@@ -0,0 +1,11 @@
+<?xml version="1.0"?>
+<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
+        "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
+<busconfig>
+  <policy user="root">
+    <allow own="org.qemu.VMState1" />
+    <allow send_destination="org.qemu.VMState1" />
+    <allow receive_sender="org.qemu.VMState1" />
+    <allow send_destination="com.proxmox.VMStateHelper" />
+  </policy>
+</busconfig>
-- 
2.48.1
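
Note on the `Save`/`Load` encoding above: the size limit and the
single-byte sentinel can be illustrated in isolation. The following is only a
minimal sketch, not the helper's actual code. `encode_state` and
`decode_state` are hypothetical names, the input lines are placeholders
rather than real `conntrack` output, and the size check here deliberately
counts the re-added `--mark` suffix as well.

```perl
use strict;
use warnings;

# Same value as DBUS_VMSTATE_SIZE_LIMIT in the patch (1 MiB).
use constant SIZE_LIMIT => 1024 * 1024;

# Hypothetical helper: turn a list of saved conntrack lines into the byte
# array handed to QEMU, re-adding the mark and honouring the size limit.
sub encode_state {
    my ($vmid, $limit, @lines) = @_;

    my $text = '';
    for my $line (@lines) {
        my $entry = "$line --mark $vmid\n";
        # Check the full entry, including the `--mark` suffix, and stop at
        # the first entry that would exceed the limit.
        last if length($text) + length($entry) > $limit;
        $text .= $entry;
    }

    my @bytes = unpack('c*', $text);
    # An empty array cannot be transferred, so a single zero byte is used
    # as a sentinel for "no state".
    return @bytes ? \@bytes : [0];
}

# Hypothetical counterpart: returns the text to restore, or undef if only
# the sentinel (or nothing) was received.
sub decode_state {
    my ($bytes) = @_;
    return undef if scalar(@$bytes) <= 1;
    return pack('c*', @$bytes);
}

my $state = encode_state(100, SIZE_LIMIT, 'entry-one', 'entry-two');
my $restored = decode_state($state);
```

Since a valid conntrack state is never a single byte, treating any array of
length <= 1 as "nothing to load" keeps both sides symmetric.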



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* [pve-devel] [PATCH qemu-server 08/14] fix #5180: migrate: integrate helper for live-migrating conntrack info
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (6 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 07/14] fix #5180: libexec: add QEMU dbus-vmstate daemon for migrating conntrack Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 09/14] api2: capabilities: explicitly import CPU capabilities module Christoph Heiss
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Fixes #5180 [0].

For live migration, this implements the following:
a) the dbus-vmstate helper is started on the target side, together with the VM
b) the dbus-vmstate helper is started on the source side
c) everything is cleaned up properly, in any case

It is currently off-by-default and must be enabled via the optional
`with-conntrack-state` migration parameter.

The conntrack entry migration is done in such a way that it can
soft-fail, w/o impacting the actual migration, i.e. considering it
"best-effort".

A failed conntrack entry migration does not have any real impact on
functionality, other than that the VM might exhibit the problems outlined
in the issue [0].

For remote migrations, only a warning is logged for now. Cross-cluster
migration has stricter requirements and is not "one-size-fits-all".
E.g. the most prominent issue arises if the network segmentation is
different, which would make the conntrack entries useless or require
careful rewriting.

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Depends on patch #4 to pve-common & a dependency bump of it.
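
The "best-effort" handling described in the commit message follows the usual
eval-and-warn pattern. A minimal sketch, assuming a hypothetical wrapper
named `best_effort` (the patch itself open-codes the `eval` blocks and logs
through `$self->log` or `syslog` instead of `warn`):

```perl
use strict;
use warnings;

# Hypothetical wrapper: run $code, report any failure as a warning and carry
# on. Returns the result of $code, or undef if it died.
sub best_effort {
    my ($what, $code) = @_;

    my $res = eval { $code->() };
    if (my $err = $@) {
        chomp $err;
        warn "conntrack migration: $what failed: $err\n";
        return undef;
    }
    return $res;
}

# A failing step only produces a warning, the caller continues regardless.
my $num = best_effort('stopping helper', sub { die "no such service\n" });
print defined($num)
    ? "migrated $num conntrack entries\n"
    : "continuing without conntrack state\n";
```

The point of the design is that the VM migration itself never depends on the
return value; it is only used for logging.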

 PVE/API2/Qemu.pm              |  72 ++++++++++++++++++++
 PVE/CLI/qm.pm                 |   5 ++
 PVE/QemuMigrate.pm            |  64 ++++++++++++++++++
 PVE/QemuServer.pm             |   6 ++
 PVE/QemuServer/DBusVMState.pm | 124 ++++++++++++++++++++++++++++++++++
 PVE/QemuServer/Makefile       |   1 +
 6 files changed, 272 insertions(+)
 create mode 100644 PVE/QemuServer/DBusVMState.pm

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 156b1c7b..4d7b8196 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -39,6 +39,7 @@ use PVE::QemuServer::MetaInfo;
 use PVE::QemuServer::PCI;
 use PVE::QemuServer::QMPHelpers;
 use PVE::QemuServer::USB;
+use PVE::QemuServer::DBusVMState;
 use PVE::QemuMigrate;
 use PVE::RPCEnvironment;
 use PVE::AccessControl;
@@ -3035,6 +3036,12 @@ __PACKAGE__->register_method({
 		default => 'max(30, vm memory in GiB)',
 		optional => 1,
 	    },
+	    'with-conntrack-state' => {
+		type => 'boolean',
+		optional => 1,
+		default => 0,
+		description => 'Whether to migrate conntrack entries for running VMs.',
+	    }
 	},
     },
     returns => {
@@ -3065,6 +3072,7 @@ __PACKAGE__->register_method({
 	my $migration_network = $get_root_param->('migration_network');
 	my $targetstorage = $get_root_param->('targetstorage');
 	my $force_cpu = $get_root_param->('force-cpu');
+	my $with_conntrack_state = $get_root_param->('with-conntrack-state');
 
 	my $storagemap;
 
@@ -3136,6 +3144,7 @@ __PACKAGE__->register_method({
 		    nbd_proto_version => $nbd_protocol_version,
 		    replicated_volumes => $replicated_volumes,
 		    offline_volumes => $offline_volumes,
+		    with_conntrack_state => $with_conntrack_state,
 		};
 
 		my $params = {
@@ -4675,6 +4684,11 @@ __PACKAGE__->register_method({
 		},
 		description => "List of mapped resources e.g. pci, usb"
 	    },
+	    'has-dbus-vmstate' => {
+		type => 'boolean',
+		description => 'Whether the VM host supports migrating additional VM state, '
+		    . 'such as conntrack entries.',
+	    }
 	},
     },
     code => sub {
@@ -4739,6 +4753,7 @@ __PACKAGE__->register_method({
 
 	$res->{local_resources} = $local_resources;
 	$res->{'mapped-resources'} = $mapped_resources;
+	$res->{'has-dbus-vmstate'} = 1;
 
 	return $res;
 
@@ -4800,6 +4815,12 @@ __PACKAGE__->register_method({
 		minimum => '0',
 		default => 'migrate limit from datacenter or storage config',
 	    },
+	    'with-conntrack-state' => {
+		type => 'boolean',
+		optional => 1,
+		default => 0,
+		description => 'Whether to migrate conntrack entries for running VMs.',
+	    }
 	},
     },
     returns => {
@@ -4855,6 +4876,7 @@ __PACKAGE__->register_method({
 	} else {
 	    warn "VM isn't running. Doing offline migration instead.\n" if $param->{online};
 	    $param->{online} = 0;
+	    $param->{'with-conntrack-state'} = 0;
 	}
 
 	my $storecfg = PVE::Storage::config();
@@ -6126,6 +6148,7 @@ __PACKAGE__->register_method({
 			    warn $@ if $@;
 			}
 
+			PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($state->{vmid});
 			PVE::QemuServer::destroy_vm($state->{storecfg}, $state->{vmid}, 1);
 		    }
 
@@ -6299,4 +6322,53 @@ __PACKAGE__->register_method({
 	return { socket => $socket };
     }});
 
+__PACKAGE__->register_method({
+    name => 'dbus_vmstate',
+    path => '{vmid}/dbus-vmstate',
+    method => 'POST',
+    proxyto => 'node',
+    description => 'Start or stop the dbus-vmstate helper for the given VM.',
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
+	    action => {
+		type => 'string',
+		enum => [qw(start stop)],
+		description => 'Action to perform on the DBus VMState helper.',
+		optional => 0,
+	    },
+	},
+    },
+    returns => {
+	type => 'null',
+    },
+    code => sub {
+	my ($param) = @_;
+	my ($node, $vmid, $action) = $param->@{qw(node vmid action)};
+
+	my $nodename = PVE::INotify::nodename();
+	if ($node ne 'localhost' && $node ne $nodename) {
+	    raise_param_exc({ node => "node needs to be 'localhost' or local hostname '$nodename'" });
+	}
+
+	if (!PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+	    raise_param_exc({ node => "VM $vmid not running locally on node '$nodename'" });
+	}
+
+	if ($action eq 'start') {
+	   syslog('info', "starting dbus-vmstate helper for VM $vmid\n");
+	   PVE::QemuServer::DBusVMState::qemu_add_dbus_vmstate($vmid);
+	} elsif ($action eq 'stop') {
+	   syslog('info', "stopping dbus-vmstate helper for VM $vmid\n");
+	   PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+	} else {
+	    die "unknown action $action\n";
+	}
+    }});
+
 1;
diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index 3e3a4c91..32c7629c 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -36,6 +36,7 @@ use PVE::QemuServer::Agent qw(agent_available);
 use PVE::QemuServer::ImportDisk;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::QMPHelpers;
+use PVE::QemuServer::DBusVMState;
 use PVE::QemuServer;
 
 use PVE::CLIHandler;
@@ -965,6 +966,10 @@ __PACKAGE__->register_method({
 		# vm was shutdown from inside the guest or crashed, doing api cleanup
 		PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
 	    }
+
+	    # ensure that no dbus-vmstate helper is left running in any case
+	    PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+
 	    PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
 
 	    $restart = eval { PVE::QemuServer::clear_reboot_request($vmid) };
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index c2e36334..7704a38e 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -32,6 +32,7 @@ use PVE::QemuServer::Machine;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::QMPHelpers;
+use PVE::QemuServer::DBusVMState;
 use PVE::QemuServer;
 
 use PVE::AbstractMigrate;
@@ -224,6 +225,21 @@ sub prepare {
 	# Do not treat a suspended VM as paused, as it might wake up
 	# during migration and remain paused after migration finishes.
 	$self->{vm_was_paused} = 1 if PVE::QemuServer::vm_is_paused($vmid, 0);
+
+	if ($self->{opts}->{'with-conntrack-state'}) {
+	    if ($self->{opts}->{remote}) {
+		# shouldn't be reached in normal circumstances anyway, as we prevent it on
+		# an API level
+		$self->log('warn', 'conntrack state migration not supported for remote migrations, '
+		    . 'active connections might get dropped');
+		$self->{opts}->{'with-conntrack-state'} = 0;
+	    } else {
+		PVE::QemuServer::DBusVMState::qemu_add_dbus_vmstate($vmid);
+	    }
+	} else {
+	    $self->log('warn', 'conntrack state migration not supported or enabled, '
+		. 'active connections might get dropped');
+	}
     }
 
     my ($loc_res, $mapped_res, $missing_mappings_by_node) = PVE::QemuServer::check_local_resources($conf, $running, 1);
@@ -859,6 +875,14 @@ sub phase1_cleanup {
     if (my $err =$@) {
 	$self->log('err', $err);
     }
+
+    if ($self->{running} && $self->{opts}->{'with-conntrack-state'}) {
+	# if the VM is running, that means we also tried to migrate additional
+	# state via our dbus-vmstate helper
+	# only need to locally stop it, on the target the VM cleanup will
+	# handle it
+	PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+    }
 }
 
 sub phase2_start_local_cluster {
@@ -905,6 +929,10 @@ sub phase2_start_local_cluster {
 	push @$cmd, '--targetstorage', ($self->{opts}->{targetstorage} // '1');
     }
 
+    if ($self->{opts}->{'with-conntrack-state'}) {
+	push @$cmd, '--with-conntrack-state';
+    }
+
     my $spice_port;
     my $input = "nbd_protocol_version: $migrate->{nbd_proto_version}\n";
 
@@ -1434,6 +1462,13 @@ sub phase2_cleanup {
 	$self->log('err', $err);
     }
 
+    if ($self->{running} && $self->{opts}->{'with-conntrack-state'}) {
+	# if the VM is running, that means we also tried to migrate additional
+	# state via our dbus-vmstate helper
+	# only need to locally stop it, on the target the VM cleanup will
+	# handle it
+	PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+    }
 
     if ($self->{tunnel}) {
 	eval { PVE::Tunnel::finish_tunnel($self->{tunnel});  };
@@ -1556,6 +1591,35 @@ sub phase3_cleanup {
 		$self->log('info', "skipping guest fstrim, because VM is paused");
 	    }
 	}
+
+	if ($self->{running} && $self->{opts}->{'with-conntrack-state'}) {
+	    # if the VM is running, that means we also migrated additional
+	    # state via our dbus-vmstate helper
+	    $self->log('info', 'stopping migration dbus-vmstate helpers');
+
+	    # first locally
+	    my $num = PVE::QemuServer::DBusVMState::qemu_del_dbus_vmstate($vmid);
+	    if (defined($num)) {
+		my $plural = $num > 1 ? "entries" : "entry";
+		$self->log('info', "migrated $num conntrack state $plural");
+	    }
+
+	    # .. and then remote
+	    my $targetnode = $self->{node};
+	    eval {
+		# FIXME: introduce proper way to call API methods on another node?
+		# See also e.g. pve-network/src/PVE/API2/Network/SDN.pm, which
+		# does something similar.
+		PVE::Tools::run_command([
+		    'pvesh', 'create',
+		    "/nodes/$targetnode/qemu/$vmid/dbus-vmstate",
+		    '--action', 'stop',
+		]);
+	    };
+	    if (my $err = $@) {
+		$self->log('warn', "failed to stop dbus-vmstate on $targetnode: $err\n");
+	    }
+	}
     }
 
     # close tunnel on successful migration, on error phase2_cleanup closed it
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index ffd5d56b..211e02ad 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -62,6 +62,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::PCI qw(print_pci_addr print_pcie_addr print_pcie_root_port parse_hostpci);
 use PVE::QemuServer::QMPHelpers qw(qemu_deviceadd qemu_devicedel qemu_objectadd qemu_objectdel);
 use PVE::QemuServer::USB;
+use PVE::QemuServer::DBusVMState;
 
 my $have_sdn;
 eval {
@@ -5559,6 +5560,7 @@ sub vm_start {
 #   replicated_volumes => which volids should be re-used with bitmaps for nbd migration
 #   offline_volumes => new volids of offline migrated disks like tpmstate and cloudinit, not yet
 #       contained in config
+#   with_conntrack_state => whether to start the dbus-vmstate helper for conntrack state migration
 sub vm_start_nolock {
     my ($storecfg, $vmid, $conf, $params, $migrate_opts) = @_;
 
@@ -5956,6 +5958,10 @@ sub vm_start_nolock {
 	    }
 	}
 
+        # conntrack migration is only supported for intra-cluster migrations
+	if ($migrate_opts->{with_conntrack_state} && !$migrate_opts->{remote_node}) {
+	    PVE::QemuServer::DBusVMState::qemu_add_dbus_vmstate($vmid);
+	}
     } else {
 	mon_cmd($vmid, "balloon", value => $conf->{balloon}*1024*1024)
 	    if !$statefile && $conf->{balloon};
diff --git a/PVE/QemuServer/DBusVMState.pm b/PVE/QemuServer/DBusVMState.pm
new file mode 100644
index 00000000..b2e14b7f
--- /dev/null
+++ b/PVE/QemuServer/DBusVMState.pm
@@ -0,0 +1,124 @@
+package PVE::QemuServer::DBusVMState;
+
+use strict;
+use warnings;
+
+use PVE::SafeSyslog;
+use PVE::Systemd;
+use PVE::Tools;
+
+use constant {
+    DBUS_VMSTATE_EXE => '/usr/libexec/qemu-server/dbus-vmstate',
+};
+
+# Retrieves a property from an object from a specific interface name.
+# In contrast to accessing the property directly by using $obj->Property, this
+# actually respects the owner of the object and thus can be used for interfaces
+# which might have multiple (queued) owners on the DBus.
+my sub dbus_get_property {
+    my ($obj, $interface, $name) = @_;
+
+    my $con = $obj->{service}->get_bus()->get_connection();
+
+    my $call = $con->make_method_call_message(
+        $obj->{service}->get_service_name(),
+        $obj->{object_path},
+        'org.freedesktop.DBus.Properties',
+        'Get',
+    );
+
+    $call->set_destination($obj->get_service()->get_owner_name());
+    $call->append_args_list($interface, $name);
+
+    my @reply = $con->send_with_reply_and_block($call, 10 * 1000)->get_args_list();
+    return $reply[0];
+}
+
+# Starts the dbus-vmstate helper D-Bus service daemon and adds the needed
+# object to the appropriate QEMU instance for the specified VM.
+sub qemu_add_dbus_vmstate {
+    my ($vmid) = @_;
+
+    if (!PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+        die "VM $vmid must be running locally\n";
+    }
+
+    # In case some leftover, previous instance is running, stop it. Otherwise
+    # we run into errors, as a systemd scope is unique.
+    if (defined(qemu_del_dbus_vmstate($vmid, quiet => 1))) {
+        warn "stopped previously running dbus-vmstate helper for VM $vmid\n";
+    }
+
+    # This also ensures that only ever one instance can run
+    PVE::Systemd::enter_systemd_scope(
+        "pve-dbus-vmstate-$vmid",
+        "Proxmox VE dbus-vmstate helper (VM $vmid)",
+    );
+
+    PVE::Tools::run_fork_detached(sub {
+        exec {DBUS_VMSTATE_EXE} DBUS_VMSTATE_EXE, $vmid;
+        die "exec failed: $!\n";
+    });
+}
+
+# Stops the dbus-vmstate helper D-Bus service daemon and removes the associated
+# object from QEMU for the specified VM.
+#
+# Returns the number of migrated conntrack entries, or undef in case of error.
+sub qemu_del_dbus_vmstate {
+    my ($vmid, %params) = @_;
+
+    my $num_entries = undef;
+    my $dbus = Net::DBus->system();
+    my $dbus_obj = $dbus->get_bus_object();
+
+    my $owners = eval { $dbus_obj->ListQueuedOwners('org.qemu.VMState1') };
+    if (my $err = $@) {
+        syslog('warn', "failed to retrieve org.qemu.VMState1 owners: $err\n")
+            if !$params{quiet};
+        return undef;
+    }
+
+    # Iterate through all name owners for 'org.qemu.VMState1' and compare
+    # the ID. If we found the corresponding one for $vmid, call our `Quit` method.
+    # Any D-Bus interaction might die/croak, so try to be careful here and swallow
+    # any hard errors.
+    foreach my $owner (@$owners) {
+        my $service = eval { Net::DBus::RemoteService->new($dbus, $owner, 'org.qemu.VMState1') };
+        if (my $err = $@) {
+            syslog('warn', "failed to get org.qemu.VMState1 service from D-Bus $owner: $err\n")
+                if !$params{quiet};
+            next;
+        }
+
+        my $object = eval { $service->get_object('/org/qemu/VMState1') };
+        if (my $err = $@) {
+            syslog('warn', "failed to get /org/qemu/VMState1 object from D-Bus $owner: $err\n")
+                if !$params{quiet};
+            next;
+        }
+
+        my $id = eval { dbus_get_property($object, 'org.qemu.VMState1', 'Id') };
+        if (defined($id) && $id eq "pve-vmstate-$vmid") {
+            my $helperobj = eval { $service->get_object('/org/qemu/VMState1', 'com.proxmox.VMStateHelper') };
+            if (my $err = $@) {
+                syslog('warn', "found dbus-vmstate helper, but does not implement com.proxmox.VMStateHelper? ($err)\n")
+                    if !$params{quiet};
+                last;
+            }
+
+            $num_entries = eval { dbus_get_property($object, 'com.proxmox.VMStateHelper', 'NumMigratedEntries') };
+            eval { $object->Quit() };
+            if (my $err = $@) {
+                syslog('warn', "failed to call quit on dbus-vmstate for VM $vmid: $err\n")
+                    if !$params{quiet};
+            }
+
+            last;
+        }
+    }
+
+    return $num_entries;
+}
+
+1;
diff --git a/PVE/QemuServer/Makefile b/PVE/QemuServer/Makefile
index 18fd13ea..8226bd2f 100644
--- a/PVE/QemuServer/Makefile
+++ b/PVE/QemuServer/Makefile
@@ -3,6 +3,7 @@ SOURCES=PCI.pm		\
 	Memory.pm	\
 	ImportDisk.pm	\
 	Cloudinit.pm	\
+	DBusVMState.pm	\
 	Agent.pm	\
 	Helpers.pm	\
 	Monitor.pm	\
-- 
2.48.1




* [pve-devel] [PATCH manager 09/14] api2: capabilities: explicitly import CPU capabilities module
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (7 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH qemu-server 08/14] fix #5180: migrate: integrate helper for live-migrating conntrack info Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 10/14] api2: capabilities: proxy index endpoints to respective nodes Christoph Heiss
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

This currently works only by pure chance, as it seems to be already
imported somewhere else.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 PVE/API2/Capabilities.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/API2/Capabilities.pm b/PVE/API2/Capabilities.pm
index c88c6c46f..7e447b7da 100644
--- a/PVE/API2/Capabilities.pm
+++ b/PVE/API2/Capabilities.pm
@@ -6,6 +6,7 @@ use warnings;
 use PVE::JSONSchema qw(get_standard_option);
 use PVE::RESTHandler;
 
+use PVE::API2::Qemu::CPU;
 use PVE::API2::Qemu::Machine;
 
 use base qw(PVE::RESTHandler);
-- 
2.48.1




* [pve-devel] [PATCH manager 10/14] api2: capabilities: proxy index endpoints to respective nodes
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (8 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 09/14] api2: capabilities: explicitly import CPU capabilities module Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 11/14] api2: capabilities: expose new qemu/migration endpoint Christoph Heiss
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Nodes might have different capabilities, depending on their version.
This ensures that the requested node is always the one actually queried.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 PVE/API2/Capabilities.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/PVE/API2/Capabilities.pm b/PVE/API2/Capabilities.pm
index 7e447b7da..95b13d3d9 100644
--- a/PVE/API2/Capabilities.pm
+++ b/PVE/API2/Capabilities.pm
@@ -27,6 +27,7 @@ __PACKAGE__->register_method ({
     path => '',
     method => 'GET',
     permissions => { user => 'all' },
+    proxyto => 'node',
     description => "Node capabilities index.",
     parameters => {
 	additionalProperties => 0,
@@ -59,6 +60,7 @@ __PACKAGE__->register_method ({
     path => 'qemu',
     method => 'GET',
     permissions => { user => 'all' },
+    proxyto => 'node',
     description => "QEMU capabilities index.",
     parameters => {
 	additionalProperties => 0,
-- 
2.48.1




* [pve-devel] [PATCH manager 11/14] api2: capabilities: expose new qemu/migration endpoint
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (9 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 10/14] api2: capabilities: proxy index endpoints to respective nodes Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 12/14] ui: window: Migrate: add checkbox for migrating VM conntrack state Christoph Heiss
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

This endpoint provides information about migration capabilities of the
node. Currently, only support for dbus-vmstate is indicated.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Depends on patch #6 to qemu-server & appropriate dependency bump.

 PVE/API2/Capabilities.pm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/PVE/API2/Capabilities.pm b/PVE/API2/Capabilities.pm
index 95b13d3d9..469e23264 100644
--- a/PVE/API2/Capabilities.pm
+++ b/PVE/API2/Capabilities.pm
@@ -8,6 +8,7 @@ use PVE::RESTHandler;
 
 use PVE::API2::Qemu::CPU;
 use PVE::API2::Qemu::Machine;
+use PVE::API2::Qemu::Migration;
 
 use base qw(PVE::RESTHandler);
 
@@ -21,6 +22,10 @@ __PACKAGE__->register_method ({
     path => 'qemu/machines',
 });
 
+__PACKAGE__->register_method ({
+    subclass => 'PVE::API2::Qemu::Migration',
+    path => 'qemu/migration',
+});
 
 __PACKAGE__->register_method ({
     name => 'index',
@@ -82,6 +87,7 @@ __PACKAGE__->register_method ({
 	my $result = [
 	    { name => 'cpu' },
 	    { name => 'machines' },
+	    { name => 'migration' },
 	];
 
 	return $result;
-- 
2.48.1




* [pve-devel] [PATCH manager 12/14] ui: window: Migrate: add checkbox for migrating VM conntrack state
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (10 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 11/14] api2: capabilities: expose new qemu/migration endpoint Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [RFC PATCH firewall 13/14] firewall: helpers: add sub for flushing conntrack entries by mark Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [RFC PATCH qemu-server 14/14] migrate: flush old VM conntrack entries after successful migration Christoph Heiss
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

Adds a new checkbox to the migration dialog, if it is a
live/online-migration and both the source and target nodes have support
for our dbus-vmstate helper.

If the checkbox is active, it passes along the `with-conntrack-state`
parameter to the migrate API call.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
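The warning conditions added to the precondition check can be restated
compactly. This is only an illustrative sketch in Perl (the actual change is
in the ExtJS code below); `conntrack_warnings` and its parameter names are
hypothetical, and the warning texts are shortened.

```perl
use strict;
use warnings;

# Hypothetical re-statement of the UI checks. Parameters:
#   running    - VM is running (live migration)
#   with_state - checkbox value, i.e. conntrack migration requested
#   source_ok  - source node reports `has-dbus-vmstate`
#   target_ok  - target node reports `has-dbus-vmstate`
sub conntrack_warnings {
    my (%p) = @_;

    # Offline migrations never carry conntrack state, so nothing to warn about.
    return () if !$p{running};

    my @warnings;
    if ($p{with_state}) {
        push @warnings, 'source node is lacking support' if !$p{source_ok};
        push @warnings, 'target node is lacking support' if !$p{target_ok};
    } elsif ($p{source_ok} && $p{target_ok}) {
        push @warnings, 'conntrack state migration disabled';
    }
    return @warnings;
}

print "$_\n" for conntrack_warnings(
    running => 1, with_state => 1, source_ok => 1, target_ok => 0,
);
```

In every warning case the consequence for the user is the same: active
network connections might get dropped after the migration.
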
 www/manager6/window/Migrate.js | 73 ++++++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 4 deletions(-)

diff --git a/www/manager6/window/Migrate.js b/www/manager6/window/Migrate.js
index 78d03921e..e8a68eed9 100644
--- a/www/manager6/window/Migrate.js
+++ b/www/manager6/window/Migrate.js
@@ -28,8 +28,9 @@ Ext.define('PVE.window.Migrate', {
 		allowedNodes: undefined,
 		overwriteLocalResourceCheck: false,
 		hasLocalResources: false,
+		withConntrackState: true,
+		bothHaveDbusVmstate: false,
 	    },
-
 	},
 
 	formulas: {
@@ -59,6 +60,9 @@ Ext.define('PVE.window.Migrate', {
 		    return false;
 		}
 	    },
+	    conntrackStateCheckboxHidden: get =>
+		!get('running') || get('vmtype') !== 'qemu' ||
+		!get('migration.bothHaveDbusVmstate'),
 	},
     },
 
@@ -133,6 +137,10 @@ Ext.define('PVE.window.Migrate', {
 		params.force = 1;
 	    }
 
+	    if (vm.get('migration.bothHaveDbusVmstate') && vm.get('migration.withConntrackState')) {
+		params['with-conntrack-state'] = 1;
+	    }
+
 	    Proxmox.Utils.API2Request({
 		params: params,
 		url: '/nodes/' + vm.get('nodename') + '/' + vm.get('vmtype') + '/' + vm.get('vmid') + '/migrate',
@@ -199,12 +207,28 @@ Ext.define('PVE.window.Migrate', {
 		    method: 'GET',
 		});
 		migrateStats = result.data;
-		me.fetchingNodeMigrateInfo = false;
 	    } catch (error) {
 		Ext.Msg.alert(gettext('Error'), error.htmlStatus);
+		me.fetchingNodeMigrateInfo = false;
 		return;
 	    }
 
+	    const target = me.lookup('pveNodeSelector').value;
+	    let targetCapabilities = {};
+
+	    try {
+		const { result } = await Proxmox.Async.api2({
+		    url: `/nodes/${target}/capabilities/qemu/migration`,
+		    method: 'GET',
+		});
+		targetCapabilities = result.data;
+	    } catch (err) {
+		// In the case the target node does not (yet) support the
+		// `capabilities/qemu/migration` endpoint, just ignore it.
+	    }
+
+	    me.fetchingNodeMigrateInfo = false;
+
 	    if (migrateStats.running) {
 		vm.set('running', true);
 	    }
@@ -217,7 +241,6 @@ Ext.define('PVE.window.Migrate', {
 
 	    if (migrateStats.allowed_nodes) {
 		migration.allowedNodes = migrateStats.allowed_nodes;
-		let target = me.lookup('pveNodeSelector').value;
 		if (target.length && !migrateStats.allowed_nodes.includes(target)) {
 		    let disallowed = migrateStats.not_allowed_nodes[target] ?? {};
 		    if (disallowed.unavailable_storages !== undefined) {
@@ -303,6 +326,29 @@ Ext.define('PVE.window.Migrate', {
 		});
 	    }
 
+	    migration.bothHaveDbusVmstate = migrateStats['has-dbus-vmstate'] && targetCapabilities['has-dbus-vmstate'];
+	    if (vm.get('running')) {
+		if (migration.withConntrackState && !migrateStats['has-dbus-vmstate']) {
+		    migration.preconditions.push({
+			text: gettext('Cannot migrate conntrack state, source node is lacking support. Active network connections might get dropped.'),
+			severity: 'warning',
+		    });
+		}
+		if (migration.withConntrackState && !targetCapabilities['has-dbus-vmstate']) {
+		    migration.preconditions.push({
+			text: gettext('Cannot migrate conntrack state, target node is lacking support. Active network connections might get dropped.'),
+			severity: 'warning',
+		    });
+		}
+
+		if (migration.bothHaveDbusVmstate && !migration.withConntrackState) {
+		    migration.preconditions.push({
+			text: gettext('Conntrack state migration disabled. Active network connections might get dropped.'),
+			severity: 'warning',
+		    });
+		}
+	    }
+
 	    vm.set('migration', migration);
 	},
 	checkLxcPreconditions: function(resetMigrationPossible) {
@@ -394,7 +440,26 @@ Ext.define('PVE.window.Migrate', {
 				extraArg: true,
 			    },
 			},
-		}],
+		    },
+		    {
+			xtype: 'proxmoxcheckbox',
+			name: 'withConntrackState',
+			fieldLabel: gettext('Conntrack state'),
+			autoEl: {
+			    tag: 'div',
+			    'data-qtip': gettext('Enables live migration of conntrack entries for this VM.'),
+			},
+			bind: {
+			    hidden: '{conntrackStateCheckboxHidden}',
+			    value: '{migration.withConntrackState}',
+			},
+			listeners: {
+			    change: {
+				fn: 'checkMigratePreconditions',
+				extraArg: true,
+			    },
+			},
+		    }],
 		},
 	    ],
 	},
-- 
2.48.1




* [pve-devel] [RFC PATCH firewall 13/14] firewall: helpers: add sub for flushing conntrack entries by mark
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (11 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [PATCH manager 12/14] ui: window: Migrate: add checkbox for migrating VM conntrack state Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  2025-03-17 14:11 ` [pve-devel] [RFC PATCH qemu-server 14/14] migrate: flush old VM conntrack entries after successful migration Christoph Heiss
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

A small helper routine for flushing all conntrack table entries which
are marked with a specific value.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
 debian/control              |  3 ++-
 src/PVE/Firewall/Helpers.pm | 11 +++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/debian/control b/debian/control
index 2e8e528..59c45af 100644
--- a/debian/control
+++ b/debian/control
@@ -17,7 +17,8 @@ Standards-Version: 4.6.2
 Package: pve-firewall
 Architecture: any
 Conflicts: ulogd,
-Depends: ebtables,
+Depends: conntrack,
+         ebtables,
          ipset,
          iptables,
          libpve-access-control,
diff --git a/src/PVE/Firewall/Helpers.pm b/src/PVE/Firewall/Helpers.pm
index 0b465ae..1c1692c 100644
--- a/src/PVE/Firewall/Helpers.pm
+++ b/src/PVE/Firewall/Helpers.pm
@@ -16,6 +16,7 @@ lock_vmfw_conf
 remove_vmfw_conf
 clone_vmfw_conf
 collect_refs
+flush_fw_ct_entries_by_mark
 );
 
 my $pvefw_conf_dir = "/etc/pve/firewall";
@@ -181,4 +182,14 @@ sub collect_refs {
     return $res;
 }
 
+# Flushes all conntrack table entries which are CONNMARK'd with the specified value.
+sub flush_fw_ct_entries_by_mark {
+    my ($mark) = @_;
+
+    PVE::Tools::run_command(
+	['conntrack', '--delete', '--mark', $mark],
+	noerr => 1, quiet => 1,
+    );
+}
+
 1;
-- 
2.48.1




* [pve-devel] [RFC PATCH qemu-server 14/14] migrate: flush old VM conntrack entries after successful migration
  2025-03-17 14:11 [pve-devel] [PATCH many 00/14] fix #5180: migrate conntrack state on live migration Christoph Heiss
                   ` (12 preceding siblings ...)
  2025-03-17 14:11 ` [pve-devel] [RFC PATCH firewall 13/14] firewall: helpers: add sub for flushing conntrack entries by mark Christoph Heiss
@ 2025-03-17 14:11 ` Christoph Heiss
  13 siblings, 0 replies; 15+ messages in thread
From: Christoph Heiss @ 2025-03-17 14:11 UTC (permalink / raw)
  To: pve-devel

After a successful live-migration, the old VM-specific conntrack entries
are not needed anymore on the source node and can thus be flushed.

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Depends on patch #13 for pve-firewall & a bump thereof.

Marked RFC since this isn't strictly needed. But as we can now easily
identify conntrack entries on a per-VM basis thanks to the connmark, we
can flush the stale ones on the source node. It's a nice-to-have,
optional cleanup, IMHO.

 PVE/QemuMigrate.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 7d32fc00..12723b4e 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -11,6 +11,7 @@ use Time::HiRes qw( usleep );
 use PVE::AccessControl;
 use PVE::Cluster;
 use PVE::Format qw(render_bytes);
+use PVE::Firewall::Helpers;
 use PVE::GuestHelpers qw(safe_boolean_ne safe_string_ne);
 use PVE::INotify;
 use PVE::JSONSchema;
@@ -1614,6 +1615,10 @@ sub phase3_cleanup {
 	    if (my $err = $@) {
 		$self->log('warn', "failed to stop dbus-vmstate on $targetnode: $err\n");
 	    }
+
+	    # also flush now-old local conntrack entries for the migrated VM
+	    $self->log('info', 'flushing conntrack state for guest on source');
+	    PVE::Firewall::Helpers::flush_fw_ct_entries_by_mark($vmid);
 	}
     }
 
-- 
2.48.1




