* [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
@ 2024-04-19 12:45 Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 1/4] mapping: pci: rework properties check Dominik Csapak
` (24 more replies)
0 siblings, 25 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
This series implements experimental vGPU live migration and some useful cleanups.
Resending even though there was not much feedback, because I worked in some
minor fixes/changes in the meantime.
A user tested the previous patch series and only found one issue with
the UI; see the comments on bug #5175:
https://bugzilla.proxmox.com/show_bug.cgi?id=5175
This is implemented for mapped resources. It requires driver and
hardware support, but aside from NVIDIA vGPUs there don't seem to be
many drivers (if any) that support it.
QEMU already supports this for vfio-pci devices, so nothing needs to be
done there besides actively enabling it.
Since we currently can't properly test it here and it very much depends
on hardware/driver support, it is marked as experimental everywhere
(docs/api/gui).
(Though I tested the live-migration part manually here by using
"exec:cat > /tmp/test" for the migration target, and "exec: cat
/tmp/test" as the 'incoming' parameter for a new VM start, which worked ;) )
I opted for marking them migratable at the mapping level, but we could
theoretically also put it in the hostpciX config instead
(though IMHO it fits better in the cluster-wide resource mapping config).
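As a rough illustration, a marked mapping in the cluster-wide config
could then look something like this (values are made up; the syntax
follows the existing /etc/pve/mapping/pci.cfg format):

    some-gpu
        map id=10de:1234,iommugroup=4,node=node1,path=0000:01:00.0
        map id=10de:1234,iommugroup=2,node=node2,path=0000:41:00.0
        mdev 1
        live-migration-capable 1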
Also, the naming/texts could probably be improved, but I think
'live-migration-capable' is very descriptive and I didn't want to
use an overly short name for it (which can be confusing, see the
'shared' flag for storages).
guest-common 2/4 semi-breaks pve-manager without pve-manager 1/5,
and qemu-server without qemu-server 3/10
(the check would always fail for devices capable of mediated devices).
guest-common 1,2; qemu-server 1-6; pve-manager 1,2
are mostly preparations/cleanups and could be applied independently.
changes from v2:
* rebased on master
* rework the rework of the properties check (pve-guest-common 1/4)
* properly check mdev in the gui (pve-manager 1/5)
remaining patches are unchanged
pve-guest-common:
Dominik Csapak (4):
mapping: pci: rework properties check
mapping: pci: check the mdev configuration on the device too
mapping: pci: add 'live-migration-capable' flag to mappings
mapping: remove find_on_current_node
src/PVE/Mapping/PCI.pm | 55 +++++++++++++++++++++++-------------------
src/PVE/Mapping/USB.pm | 9 -------
2 files changed, 30 insertions(+), 34 deletions(-)
qemu-server:
Dominik Csapak (10):
usb: mapping: move implementation of find_on_current_node here
pci: mapping: move implementation of find_on_current_node here
pci: mapping: check mdev config against hardware
stop cleanup: remove unnecessary tpmstate cleanup
vm_stop_cleanup: add noerr parameter
migrate: call vm_stop_cleanup after stopping in phase3_cleanup
pci: set 'enable-migration' to on for live-migration marked mapped
devices
check_local_resources: add more info per mapped device and return as
hash
api: enable live migration for marked mapped pci devices
api: include not mapped resources for running vms in migrate
preconditions
PVE/API2/Qemu.pm | 48 ++++++++++++++++++++++--------------
PVE/CLI/qm.pm | 2 +-
PVE/QemuMigrate.pm | 28 ++++++++++++---------
PVE/QemuServer.pm | 38 ++++++++++++++--------------
PVE/QemuServer/PCI.pm | 14 +++++++++--
PVE/QemuServer/USB.pm | 5 +++-
test/MigrationTest/Shared.pm | 3 +++
7 files changed, 84 insertions(+), 54 deletions(-)
pve-manager:
Dominik Csapak (5):
mapping: pci: include mdev in config checks
bulk migrate: improve precondition checks
bulk migrate: include checks for live-migratable local resources
ui: adapt migration window to precondition api change
fix #5175: ui: allow configuring and live migration of mapped pci
resources
PVE/API2/Cluster/Mapping/PCI.pm | 2 +-
PVE/API2/Nodes.pm | 27 ++++++++++++++--
www/manager6/dc/PCIMapView.js | 6 ++++
www/manager6/window/Migrate.js | 51 ++++++++++++++++++++-----------
www/manager6/window/PCIMapEdit.js | 12 ++++++++
5 files changed, 76 insertions(+), 22 deletions(-)
pve-docs:
Dominik Csapak (2):
qm: resource mapping: add description for `mdev` option
qm: resource mapping: document `live-migration-capable` setting
qm.adoc | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
--
2.39.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* [pve-devel] [PATCH guest-common v3 1/4] mapping: pci: rework properties check
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 11:37 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 2/4] mapping: pci: check the mdev configuration on the device too Dominik Csapak
` (23 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
rename '$cfg' to '$mapping', 'correct' to 'expected'
reword the error messages
also check keys from the configured props, not only the expected ones:
previously we only checked the keys from the 'correct_props' hash,
but that was unintended. We now check the keys from both, but extract
the relevant properties first.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v2:
* don't refactor the properties check out
* use properties from both configured and expected hashes
* extract the relevant configured properties from the mapping
instead of using all (previously we only used the expected ones
by accident)
src/PVE/Mapping/PCI.pm | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/src/PVE/Mapping/PCI.pm b/src/PVE/Mapping/PCI.pm
index 725e106..ef1bd8d 100644
--- a/src/PVE/Mapping/PCI.pm
+++ b/src/PVE/Mapping/PCI.pm
@@ -131,9 +131,9 @@ sub options {
# checks if the given config is valid for the current node
sub assert_valid {
- my ($name, $cfg) = @_;
+ my ($name, $mapping) = @_;
- my @paths = split(';', $cfg->{path} // '');
+ my @paths = split(';', $mapping->{path} // '');
my $idx = 0;
for my $path (@paths) {
@@ -148,32 +148,36 @@ sub assert_valid {
my $info = PVE::SysFSTools::pci_device_info($path, 1);
die "pci device '$path' not found\n" if !defined($info);
- my $correct_props = {
+ my $expected_props = {
id => "$info->{vendor}:$info->{device}",
iommugroup => $info->{iommugroup},
};
if (defined($info->{'subsystem_vendor'}) && defined($info->{'subsystem_device'})) {
- $correct_props->{'subsystem-id'} = "$info->{'subsystem_vendor'}:$info->{'subsystem_device'}";
+ $expected_props->{'subsystem-id'} = "$info->{'subsystem_vendor'}:$info->{'subsystem_device'}";
}
- for my $prop (sort keys %$correct_props) {
+ my $configured_props = { $mapping->%{qw(id iommugroup subsystem-id)} };
+
+ my $merged = { %$expected_props, %$configured_props }; # just for the keys
+ for my $prop (sort keys %$merged) {
next if $prop eq 'iommugroup' && $idx > 0; # check iommu only on the first device
- next if !defined($correct_props->{$prop}) && !defined($cfg->{$prop});
- die "no '$prop' for device '$path'\n"
- if defined($correct_props->{$prop}) && !defined($cfg->{$prop});
- die "'$prop' configured but should not be\n"
- if !defined($correct_props->{$prop}) && defined($cfg->{$prop});
+ next if !defined($expected_props->{$prop}) && !defined($configured_props->{$prop});
+ die "missing expected property '$prop' for device '$path'\n"
+ if defined($expected_props->{$prop}) && !defined($configured_props->{$prop});
+ die "unexpected property '$prop' configured for device '$path'\n"
+ if !defined($expected_props->{$prop}) && defined($configured_props->{$prop});
- my $correct_prop = $correct_props->{$prop};
- $correct_prop =~ s/0x//g;
- my $configured_prop = $cfg->{$prop};
+ my $expected_prop = $expected_props->{$prop};
+ $expected_prop =~ s/0x//g;
+ my $configured_prop = $configured_props->{$prop};
$configured_prop =~ s/0x//g;
- die "'$prop' does not match for '$name' ($correct_prop != $configured_prop)\n"
- if $correct_prop ne $configured_prop;
+ die "'$prop' does not match for '$name' ($expected_prop != $configured_prop)\n"
+ if $expected_prop ne $configured_prop;
}
+
$idx++;
}
--
2.39.2
* [pve-devel] [PATCH guest-common v3 2/4] mapping: pci: check the mdev configuration on the device too
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 1/4] mapping: pci: rework properties check Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 12:02 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 3/4] mapping: pci: add 'live-migration-capable' flag to mappings Dominik Csapak
` (22 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
but that lives in the 'global' part of the mapping config, not in a
specific mapping. To check that, add it to the $configured_props from
there.
this requires all call sites to be adapted, otherwise the check will
always fail for devices that are capable of mediated devices
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v2:
* adapt to changes of previous patch
src/PVE/Mapping/PCI.pm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/PVE/Mapping/PCI.pm b/src/PVE/Mapping/PCI.pm
index ef1bd8d..b412c4d 100644
--- a/src/PVE/Mapping/PCI.pm
+++ b/src/PVE/Mapping/PCI.pm
@@ -131,7 +131,7 @@ sub options {
# checks if the given config is valid for the current node
sub assert_valid {
- my ($name, $mapping) = @_;
+ my ($name, $mapping, $cfg) = @_;
my @paths = split(';', $mapping->{path} // '');
@@ -151,6 +151,7 @@ sub assert_valid {
my $expected_props = {
id => "$info->{vendor}:$info->{device}",
iommugroup => $info->{iommugroup},
+ mdev => $info->{mdev},
};
if (defined($info->{'subsystem_vendor'}) && defined($info->{'subsystem_device'})) {
@@ -158,6 +159,8 @@ sub assert_valid {
}
my $configured_props = { $mapping->%{qw(id iommugroup subsystem-id)} };
+ # check mdev from global mapping config
+ $configured_props->{mdev} = $cfg->{mdev};
my $merged = { %$expected_props, %$configured_props }; # just for the keys
for my $prop (sort keys %$merged) {
--
2.39.2
* [pve-devel] [PATCH guest-common v3 3/4] mapping: pci: add 'live-migration-capable' flag to mappings
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 1/4] mapping: pci: rework properties check Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 2/4] mapping: pci: check the mdev configuration on the device too Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 4/4] mapping: remove find_on_current_node Dominik Csapak
` (21 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
so that we can decide in qemu-server whether to allow live migration.
The driver and QEMU must be capable of that, and it's the
admin's responsibility to know and configure that.
Mark the option as experimental in the description.
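As a usage sketch (assuming the existing cluster mapping API endpoint;
the mapping name below is a placeholder), the flag could then be set with:

    pvesh set /cluster/mapping/pci/<mapping> --live-migration-capable 1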
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PVE/Mapping/PCI.pm | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/src/PVE/Mapping/PCI.pm b/src/PVE/Mapping/PCI.pm
index b412c4d..b525c07 100644
--- a/src/PVE/Mapping/PCI.pm
+++ b/src/PVE/Mapping/PCI.pm
@@ -105,6 +105,13 @@ my $defaultData = {
optional => 1,
default => 0,
},
+ 'live-migration-capable' => {
+ description => "Marks the device(s) as being able to be live-migrated (Experimental)."
+ ." This needs hardware and driver support to work.",
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ },
map => {
type => 'array',
description => 'A list of maps for the cluster nodes.',
@@ -125,6 +132,7 @@ sub options {
return {
description => { optional => 1 },
mdev => { optional => 1 },
+ 'live-migration-capable' => { optional => 1 },
map => {},
};
}
--
2.39.2
* [pve-devel] [PATCH guest-common v3 4/4] mapping: remove find_on_current_node
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (2 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 3/4] mapping: pci: add 'live-migration-capable' flag to mappings Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 12:09 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 01/10] usb: mapping: move implementation of find_on_current_node here Dominik Csapak
` (20 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
they only have one user each (where we can inline the implementation).
It's easy enough to recreate should we need to.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PVE/Mapping/PCI.pm | 10 ----------
src/PVE/Mapping/USB.pm | 9 ---------
2 files changed, 19 deletions(-)
diff --git a/src/PVE/Mapping/PCI.pm b/src/PVE/Mapping/PCI.pm
index b525c07..9c2b8b2 100644
--- a/src/PVE/Mapping/PCI.pm
+++ b/src/PVE/Mapping/PCI.pm
@@ -214,16 +214,6 @@ sub write_pci_config {
cfs_write_file($FILENAME, $cfg);
}
-sub find_on_current_node {
- my ($id) = @_;
-
- my $cfg = PVE::Mapping::PCI::config();
- my $node = PVE::INotify::nodename();
-
- # ignore errors
- return get_node_mapping($cfg, $id, $node);
-}
-
sub get_node_mapping {
my ($cfg, $id, $nodename) = @_;
diff --git a/src/PVE/Mapping/USB.pm b/src/PVE/Mapping/USB.pm
index 34bc3cb..6dd15c4 100644
--- a/src/PVE/Mapping/USB.pm
+++ b/src/PVE/Mapping/USB.pm
@@ -155,15 +155,6 @@ sub write_usb_config {
cfs_write_file($FILENAME, $cfg);
}
-sub find_on_current_node {
- my ($id) = @_;
-
- my $cfg = config();
- my $node = PVE::INotify::nodename();
-
- return get_node_mapping($cfg, $id, $node);
-}
-
sub get_node_mapping {
my ($cfg, $id, $nodename) = @_;
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 01/10] usb: mapping: move implementation of find_on_current_node here
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (3 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 4/4] mapping: remove find_on_current_node Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 02/10] pci: " Dominik Csapak
` (19 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
this was the only user, and it's easy enough to inline
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer/USB.pm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/PVE/QemuServer/USB.pm b/PVE/QemuServer/USB.pm
index 49957444..ecd0361d 100644
--- a/PVE/QemuServer/USB.pm
+++ b/PVE/QemuServer/USB.pm
@@ -5,6 +5,7 @@ use warnings;
use PVE::QemuServer::PCI qw(print_pci_addr);
use PVE::QemuServer::Machine;
use PVE::QemuServer::Helpers qw(min_version windows_version);
+use PVE::INotify;
use PVE::JSONSchema;
use PVE::Mapping::USB;
use base 'Exporter';
@@ -91,7 +92,9 @@ sub parse_usb_device {
$res->{spice} = 1;
}
} elsif (defined($mapping)) {
- my $devices = PVE::Mapping::USB::find_on_current_node($mapping);
+ my $config = PVE::Mapping::USB::config();
+ my $node = PVE::INotify::nodename();
+ my $devices = PVE::Mapping::USB::get_node_mapping($config, $mapping, $node);
die "USB device mapping not found for '$mapping'\n" if !$devices || !scalar($devices->@*);
die "More than one USB mapping per host not supported\n" if scalar($devices->@*) > 1;
eval {
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 02/10] pci: mapping: move implementation of find_on_current_node here
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (4 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 01/10] usb: mapping: move implementation of find_on_current_node here Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 03/10] pci: mapping: check mdev config against hardware Dominik Csapak
` (18 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
this was the only user, and it's easy enough to inline
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer/PCI.pm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index 1673041b..7ff9cad7 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -3,6 +3,7 @@ package PVE::QemuServer::PCI;
use warnings;
use strict;
+use PVE::INotify;
use PVE::JSONSchema;
use PVE::Mapping::PCI;
use PVE::SysFSTools;
@@ -429,7 +430,9 @@ sub parse_hostpci {
if ($mapping) {
# we have no ordinary pci id, must be a mapping
- my $devices = PVE::Mapping::PCI::find_on_current_node($mapping);
+ my $config = PVE::Mapping::PCI::config();
+ my $node = PVE::INotify::nodename();
+ my $devices = PVE::Mapping::PCI::get_node_mapping($config, $mapping, $node);
die "PCI device mapping not found for '$mapping'\n" if !$devices || !scalar($devices->@*);
for my $device ($devices->@*) {
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 03/10] pci: mapping: check mdev config against hardware
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (5 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 02/10] pci: " Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 04/10] stop cleanup: remove unnecessary tpmstate cleanup Dominik Csapak
` (17 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
by giving the mapping config to assert_valid, not only the specific mapping
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer/PCI.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index 7ff9cad7..6ba43ee8 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -436,7 +436,7 @@ sub parse_hostpci {
die "PCI device mapping not found for '$mapping'\n" if !$devices || !scalar($devices->@*);
for my $device ($devices->@*) {
- eval { PVE::Mapping::PCI::assert_valid($mapping, $device) };
+ eval { PVE::Mapping::PCI::assert_valid($mapping, $device, $config->{ids}->{$mapping}) };
die "PCI device mapping invalid (hardware probably changed): $@\n" if $@;
push $alternatives->@*, [split(/;/, $device->{path})];
}
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 04/10] stop cleanup: remove unnecessary tpmstate cleanup
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (6 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 03/10] pci: mapping: check mdev config against hardware Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 12:56 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 05/10] vm_stop_cleanup: add noerr parameter Dominik Csapak
` (16 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
tpmstate0 is already included in `get_vm_volumes`, and our only storage
plugin that has unmap_volume implemented is the RBDPlugin, where we call
unmap in `deactivate_volume`. So it's already unmapped by the
`deactivate_volumes` calls above.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer.pm | 8 --------
1 file changed, 8 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 28e630d3..680c36f2 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6172,14 +6172,6 @@ sub vm_stop_cleanup {
if (!$keepActive) {
my $vollist = get_vm_volumes($conf);
PVE::Storage::deactivate_volumes($storecfg, $vollist);
-
- if (my $tpmdrive = $conf->{tpmstate0}) {
- my $tpm = parse_drive("tpmstate0", $tpmdrive);
- my ($storeid, $volname) = PVE::Storage::parse_volume_id($tpm->{file}, 1);
- if ($storeid) {
- PVE::Storage::unmap_volume($storecfg, $tpm->{file});
- }
- }
}
foreach my $ext (qw(mon qmp pid vnc qga)) {
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 05/10] vm_stop_cleanup: add noerr parameter
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (7 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 04/10] stop cleanup: remove unnecessary tpmstate cleanup Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 06/10] migrate: call vm_stop_cleanup after stopping in phase3_cleanup Dominik Csapak
` (15 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
and set it on all current users
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/CLI/qm.pm | 2 +-
PVE/QemuServer.pm | 13 ++++++++-----
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index b105830f..fbc590f5 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -933,7 +933,7 @@ __PACKAGE__->register_method({
if (!$clean || $guest) {
# vm was shutdown from inside the guest or crashed, doing api cleanup
- PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0);
+ PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
}
PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 680c36f2..5ab52e7a 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6165,7 +6165,7 @@ sub cleanup_pci_devices {
}
sub vm_stop_cleanup {
- my ($storecfg, $vmid, $conf, $keepActive, $apply_pending_changes) = @_;
+ my ($storecfg, $vmid, $conf, $keepActive, $apply_pending_changes, $noerr) = @_;
eval {
@@ -6191,7 +6191,10 @@ sub vm_stop_cleanup {
vmconfig_apply_pending($vmid, $conf, $storecfg) if $apply_pending_changes;
};
- warn $@ if $@; # avoid errors - just warn
+ if (my $err = $@) {
+ die $err if !$noerr;
+ warn $err;
+ }
}
# call only in locked context
@@ -6242,7 +6245,7 @@ sub _do_vm_stop {
die "VM quit/powerdown failed - got timeout\n";
}
} else {
- vm_stop_cleanup($storecfg, $vmid, $conf, $keepActive, 1) if $conf;
+ vm_stop_cleanup($storecfg, $vmid, $conf, $keepActive, 1, 1) if $conf;
return;
}
} else {
@@ -6273,7 +6276,7 @@ sub _do_vm_stop {
sleep 1;
}
- vm_stop_cleanup($storecfg, $vmid, $conf, $keepActive, 1) if $conf;
+ vm_stop_cleanup($storecfg, $vmid, $conf, $keepActive, 1, 1) if $conf;
}
# Note: use $nocheck to skip tests if VM configuration file exists.
@@ -6288,7 +6291,7 @@ sub vm_stop {
my $pid = check_running($vmid, $nocheck, $migratedfrom);
kill 15, $pid if $pid;
my $conf = PVE::QemuConfig->load_config($vmid, $migratedfrom);
- vm_stop_cleanup($storecfg, $vmid, $conf, $keepActive, 0);
+ vm_stop_cleanup($storecfg, $vmid, $conf, $keepActive, 0, 1);
return;
}
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 06/10] migrate: call vm_stop_cleanup after stopping in phase3_cleanup
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (8 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 05/10] vm_stop_cleanup: add noerr parameter Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 12:56 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices Dominik Csapak
` (14 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
we currently only call deactivate_volumes, but we actually want to call
the whole vm_stop_cleanup, since that is not invoked by the vm_stop
above (we cannot parse the config anymore) and might do other cleanups
we also want to do (like mdev cleanup).
For this to work properly we have to clone the original config at the
beginning, since we might modify the volids.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuMigrate.pm | 12 ++++++------
test/MigrationTest/Shared.pm | 3 +++
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 8d9b35ae..381022f5 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -5,6 +5,7 @@ use warnings;
use IO::File;
use IPC::Open2;
+use Storable qw(dclone);
use Time::HiRes qw( usleep );
use PVE::Cluster;
@@ -1455,7 +1456,8 @@ sub phase3_cleanup {
my $tunnel = $self->{tunnel};
- my $sourcevollist = PVE::QemuServer::get_vm_volumes($conf);
+ # we'll need an unmodified copy of the config later for the cleanup
+ my $oldconf = dclone($conf);
if ($self->{volume_map} && !$self->{opts}->{remote}) {
my $target_drives = $self->{target_drive};
@@ -1586,12 +1588,10 @@ sub phase3_cleanup {
$self->{errors} = 1;
}
- # always deactivate volumes - avoid lvm LVs to be active on several nodes
- eval {
- PVE::Storage::deactivate_volumes($self->{storecfg}, $sourcevollist);
- };
+ # stop with nocheck does not do a cleanup, so do it here with the original config
+ eval { PVE::QemuServer::vm_stop_cleanup($self->{storecfg}, $vmid, $oldconf) };
if (my $err = $@) {
- $self->log('err', $err);
+ $self->log('err', "cleanup for vm failed - $err");
$self->{errors} = 1;
}
diff --git a/test/MigrationTest/Shared.pm b/test/MigrationTest/Shared.pm
index aa7203d1..2347e60a 100644
--- a/test/MigrationTest/Shared.pm
+++ b/test/MigrationTest/Shared.pm
@@ -130,6 +130,9 @@ $qemu_server_module->mock(
clear_reboot_request => sub {
return 1;
},
+ vm_stop_cleanup => sub {
+ return 1;
+ },
get_efivars_size => sub {
return 128 * 1024;
},
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (9 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 06/10] migrate: call vm_stop_cleanup after stopping in phase3_cleanup Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 12:56 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 08/10] check_local_resources: add more info per mapped device and return as hash Dominik Csapak
` (13 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
the default is 'auto', but for those which are marked as capable of
live migration, we want to explicitly enable it, so we get an early
error on start if the driver does not support that.
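Illustrative only (host address, id and bus placement below are made up):
for a mapping marked as live-migration-capable, the generated device
argument should then look roughly like

    -device vfio-pci,host=0000:01:00.4,enable-migration=on,id=hostpci0,bus=pci.0,addr=0x10

instead of leaving the property at its 'auto' default.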
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer/PCI.pm | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index 6ba43ee8..df2ea3eb 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -435,8 +435,11 @@ sub parse_hostpci {
my $devices = PVE::Mapping::PCI::get_node_mapping($config, $mapping, $node);
die "PCI device mapping not found for '$mapping'\n" if !$devices || !scalar($devices->@*);
+ my $cfg = $config->{ids}->{$mapping};
+ $res->{'live-migration-capable'} = 1 if $cfg->{'live-migration-capable'};
+
for my $device ($devices->@*) {
- eval { PVE::Mapping::PCI::assert_valid($mapping, $device, $config->{ids}->{$mapping}) };
+ eval { PVE::Mapping::PCI::assert_valid($mapping, $device, $cfg) };
die "PCI device mapping invalid (hardware probably changed): $@\n" if $@;
push $alternatives->@*, [split(/;/, $device->{path})];
}
@@ -635,6 +638,10 @@ sub print_hostpci_devices {
$devicestr .= ",host=$pcidevice->{id}";
}
+ if ($d->{'live-migration-capable'}) {
+ $devicestr .= ",enable-migration=on"
+ }
+
my $mf_addr = $multifunction ? ".$j" : '';
$devicestr .= ",id=${id}${mf_addr}${pciaddr}${mf_addr}";
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 08/10] check_local_resources: add more info per mapped device and return as hash
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (10 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 13:05 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 09/10] api: enable live migration for marked mapped pci devices Dominik Csapak
` (12 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
such as the mapping name and whether it's marked for live migration (PCI only)
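the returned structure then looks roughly like this (the mapping names
below are made up for illustration):

    $mapped_res = {
        hostpci0 => { name => 'some-gpu', 'live-migration' => 1 },
        usb0     => { name => 'some-usb-mapping' },
    };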
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/API2/Qemu.pm | 2 +-
PVE/QemuMigrate.pm | 7 ++++---
PVE/QemuServer.pm | 17 ++++++++++-------
3 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 2a349c8c..f2fa345d 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -4538,7 +4538,7 @@ __PACKAGE__->register_method({
$res->{local_disks} = [ values %$local_disks ];;
$res->{local_resources} = $local_resources;
- $res->{'mapped-resources'} = $mapped_resources;
+ $res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
return $res;
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 381022f5..a46eb2a3 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -233,7 +233,7 @@ sub prepare {
my ($loc_res, $mapped_res, $missing_mappings_by_node) = PVE::QemuServer::check_local_resources($conf, 1);
my $blocking_resources = [];
for my $res ($loc_res->@*) {
- if (!grep($res, $mapped_res->@*)) {
+ if (!defined($mapped_res->{$res})) {
push $blocking_resources->@*, $res;
}
}
@@ -245,10 +245,11 @@ sub prepare {
}
}
- if (scalar($mapped_res->@*)) {
+ if (scalar(keys $mapped_res->%*)) {
my $missing_mappings = $missing_mappings_by_node->{$self->{node}};
+ my $mapped_text = join(", ", keys $mapped_res->%*);
if ($running) {
- die "can't migrate running VM which uses mapped devices: " . join(", ", $mapped_res->@*) . "\n";
+ die "can't migrate running VM which uses mapped devices: $mapped_text\n";
} elsif (scalar($missing_mappings->@*)) {
die "can't migrate to '$self->{node}': missing mapped devices " . join(", ", $missing_mappings->@*) . "\n";
} else {
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 5ab52e7a..dcb3b3be 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2559,7 +2559,7 @@ sub check_local_resources {
my ($conf, $noerr) = @_;
my @loc_res = ();
- my $mapped_res = [];
+ my $mapped_res = {};
my $nodelist = PVE::Cluster::get_nodelist();
my $pci_map = PVE::Mapping::PCI::config();
@@ -2591,16 +2591,19 @@ sub check_local_resources {
if ($k =~ m/^usb/) {
my $entry = parse_property_string('pve-qm-usb', $conf->{$k});
next if $entry->{host} && $entry->{host} =~ m/^spice$/i;
- if ($entry->{mapping}) {
- $add_missing_mapping->('usb', $k, $entry->{mapping});
- push @$mapped_res, $k;
+ if (my $name = $entry->{mapping}) {
+ $add_missing_mapping->('usb', $k, $name);
+ $mapped_res->{$k} = { name => $name };
}
}
if ($k =~ m/^hostpci/) {
my $entry = parse_property_string('pve-qm-hostpci', $conf->{$k});
- if ($entry->{mapping}) {
- $add_missing_mapping->('pci', $k, $entry->{mapping});
- push @$mapped_res, $k;
+ if (my $name = $entry->{mapping}) {
+ $add_missing_mapping->('pci', $k, $name);
+ my $mapped_device = { name => $name };
+ $mapped_device->{'live-migration'} = 1
+ if $pci_map->{ids}->{$name}->{'live-migration-capable'};
+ $mapped_res->{$k} = $mapped_device;
}
}
# sockets are safe: they will recreated be on the target side post-migrate
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 09/10] api: enable live migration for marked mapped pci devices
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (11 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 08/10] check_local_resources: add more info per mapped device and return as hash Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 13:13 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions Dominik Csapak
` (11 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
They have to be marked as 'live-migration-capable' in the mapping
config, and the driver and QEMU must support it.
For the GUI checks, we now return the whole object of the mapped
resources, which includes info like the name and whether it's marked as
live-migration capable. (This deprecates the old 'mapped-resources'
return value, since that returns strictly less information.)
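for a quick look at the new return value, the existing preconditions
endpoint can simply be queried, e.g. (node/vmid are placeholders):

    pvesh get /nodes/<node>/qemu/<vmid>/migrate --output-format json-pretty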
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/API2/Qemu.pm | 7 ++++++-
PVE/QemuMigrate.pm | 13 +++++++++----
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index f2fa345d..f95d8d95 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -4473,7 +4473,11 @@ __PACKAGE__->register_method({
},
'mapped-resources' => {
type => 'array',
- description => "List of mapped resources e.g. pci, usb"
+ description => "List of mapped resources e.g. pci, usb. Deprecated, use 'mapped-resource-info' instead."
+ },
+ 'mapped-resource-info' => {
+ type => 'object',
+ description => "Object of mapped resources with additional information such if they're live migratable.",
},
},
},
@@ -4539,6 +4543,7 @@ __PACKAGE__->register_method({
$res->{local_resources} = $local_resources;
$res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
+ $res->{'mapped-resource-info'} = $mapped_resources;
return $res;
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a46eb2a3..89caefc7 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -247,11 +247,16 @@ sub prepare {
if (scalar(keys $mapped_res->%*)) {
my $missing_mappings = $missing_mappings_by_node->{$self->{node}};
- my $mapped_text = join(", ", keys $mapped_res->%*);
- if ($running) {
- die "can't migrate running VM which uses mapped devices: $mapped_text\n";
- } elsif (scalar($missing_mappings->@*)) {
+ my $missing_live_mappings = [];
+ for my $key (keys $mapped_res->%*) {
+ my $res = $mapped_res->{$key};
+ my $name = "$key:$res->{name}";
+ push $missing_live_mappings->@*, $name if !$res->{'live-migration'};
+ }
+ if (scalar($missing_mappings->@*)) {
die "can't migrate to '$self->{node}': missing mapped devices " . join(", ", $missing_mappings->@*) . "\n";
+ } elsif ($running && scalar($missing_live_mappings->@*)) {
+ die "can't live migrate running VM which uses following mapped devices: " . join(", ", $missing_live_mappings->@*) . "\n";
} else {
$self->log('info', "migrating VM which uses mapped local devices");
}
--
2.39.2
* [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (12 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 09/10] api: enable live migration for marked mapped pci devices Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-31 13:37 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 1/5] mapping: pci: include mdev in config checks Dominik Csapak
` (10 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
so that we can show a proper warning in the migrate dialog and check it
in the bulk migrate precondition check.
The unavailable_storages should be the same as before, but
we now always return allowed_nodes too.
also add a note that we want to redesign the return values here, to:
* make the api call simpler
* return better structured values
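roughly, for a running VM with a mapped device missing on one node, the
relevant part of the result would then look like this (names and entries
are made up; the exact content of 'unavailable-resources' is whatever
check_local_resources reports as missing):

    running             => 1,
    allowed_nodes       => [],
    not_allowed_nodes   => {
        node3 => { 'unavailable-resources' => [ 'hostpci0' ] },
    },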
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/API2/Qemu.pm | 39 ++++++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 17 deletions(-)
diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index f95d8d95..94aa9942 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -4450,18 +4450,20 @@ __PACKAGE__->register_method({
},
},
returns => {
+ # TODO 9.x: rework the api call to return more sensible structures
+ # e.g. a simple list of nodes with their blockers and/or notices to show
type => "object",
properties => {
running => { type => 'boolean' },
allowed_nodes => {
type => 'array',
optional => 1,
- description => "List nodes allowed for offline migration, only passed if VM is offline"
+ description => "List nodes allowed for offline migration.",
},
not_allowed_nodes => {
type => 'object',
optional => 1,
- description => "List not allowed nodes with additional informations, only passed if VM is offline"
+ description => "List not allowed nodes with additional information.",
},
local_disks => {
type => 'array',
@@ -4518,28 +4520,31 @@ __PACKAGE__->register_method({
# if vm is not running, return target nodes where local storage/mapped devices are available
# for offline migration
+ $res->{allowed_nodes} = [];
+ $res->{not_allowed_nodes} = {};
if (!$res->{running}) {
- $res->{allowed_nodes} = [];
- my $checked_nodes = PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
- delete $checked_nodes->{$localnode};
-
- foreach my $node (keys %$checked_nodes) {
- my $missing_mappings = $missing_mappings_by_node->{$node};
- if (scalar($missing_mappings->@*)) {
- $checked_nodes->{$node}->{'unavailable-resources'} = $missing_mappings;
- next;
- }
+ $res->{not_allowed_nodes} = PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
+ delete $res->{not_allowed_nodes}->{$localnode};
+ }
- if (!defined($checked_nodes->{$node}->{unavailable_storages})) {
- push @{$res->{allowed_nodes}}, $node;
- }
+ my $merged = { $res->{not_allowed_nodes}->%*, $missing_mappings_by_node->%* };
+ for my $node (keys $merged->%*) {
+ my $missing_mappings = $missing_mappings_by_node->{$node};
+ if (scalar($missing_mappings->@*)) {
+ $res->{not_allowed_nodes}->{$node}->{'unavailable-resources'} = $missing_mappings;
+ next;
+ }
+
+ if (!$res->{running}) {
+ if (!defined($res->{not_allowed_nodes}->{$node}->{unavailable_storages})) {
+ push $res->{allowed_nodes}->@*, $node;
+ }
}
- $res->{not_allowed_nodes} = $checked_nodes;
}
my $local_disks = &$check_vm_disks_local($storecfg, $vmconf, $vmid);
- $res->{local_disks} = [ values %$local_disks ];;
+ $res->{local_disks} = [ values %$local_disks ];
$res->{local_resources} = $local_resources;
$res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
--
2.39.2
* [pve-devel] [PATCH manager v3 1/5] mapping: pci: include mdev in config checks
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (13 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 2/5] bulk migrate: improve precondition checks Dominik Csapak
` (9 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
by also providing the global config in assert_valid, and by also
adding the mdev config in the 'toCheck' object in the GUI.
For the GUI, we extract the mdev property from the global entry and add
it to the individual mapping entries; that way we can reuse the checking
logic of the other properties.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v2:
* now correctly check the mdev property
expected is from 'device' and configured one is from the record
PVE/API2/Cluster/Mapping/PCI.pm | 2 +-
www/manager6/dc/PCIMapView.js | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/PVE/API2/Cluster/Mapping/PCI.pm b/PVE/API2/Cluster/Mapping/PCI.pm
index 64462d25..f85623bb 100644
--- a/PVE/API2/Cluster/Mapping/PCI.pm
+++ b/PVE/API2/Cluster/Mapping/PCI.pm
@@ -115,7 +115,7 @@ __PACKAGE__->register_method ({
};
}
for my $mapping ($mappings->@*) {
- eval { PVE::Mapping::PCI::assert_valid($id, $mapping) };
+ eval { PVE::Mapping::PCI::assert_valid($id, $mapping, $entry) };
if (my $err = $@) {
push $entry->{checks}->@*, {
severity => 'error',
diff --git a/www/manager6/dc/PCIMapView.js b/www/manager6/dc/PCIMapView.js
index 80fe3c0f..3240590c 100644
--- a/www/manager6/dc/PCIMapView.js
+++ b/www/manager6/dc/PCIMapView.js
@@ -20,10 +20,15 @@ Ext.define('PVE.dc.PCIMapView', {
data.forEach((entry) => {
ids[entry.id] = entry;
});
+ let mdev;
me.getRootNode()?.cascade(function(rec) {
+ if (rec.data.type === 'entry') {
+ mdev = rec.data.mdev;
+ }
if (rec.data.node !== node || rec.data.type !== 'map') {
return;
}
+ rec.data.mdev = mdev;
let id = rec.data.path;
if (!id.match(/\.\d$/)) {
@@ -44,6 +49,7 @@ Ext.define('PVE.dc.PCIMapView', {
let toCheck = {
id: deviceId,
'subsystem-id': subId,
+ mdev: device.mdev,
iommugroup: device.iommugroup !== -1 ? device.iommugroup : undefined,
};
--
2.39.2
* [pve-devel] [PATCH manager v3 2/5] bulk migrate: improve precondition checks
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (14 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 1/5] mapping: pci: include mdev in config checks Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 3/5] bulk migrate: include checks for live-migratable local resources Dominik Csapak
` (8 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
this now takes into account the 'not_allowed_nodes' hash we get from the
api call. With that, we can now limit the 'local_resources' check to
online vms only, since for offline guests the 'unavailable-resources' hash
already includes mapped devices that don't exist on the target node.
This now also includes unavailable storages on target nodes.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/API2/Nodes.pm | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
index fb4fd4d6..882d7301 100644
--- a/PVE/API2/Nodes.pm
+++ b/PVE/API2/Nodes.pm
@@ -2255,11 +2255,23 @@ my $create_migrate_worker = sub {
$invalidConditions .= join(', ', map { $_->{volid} } @{$preconditions->{local_disks}});
}
- if (@{$preconditions->{local_resources}}) {
+ if ($online && scalar($preconditions->{local_resources}->@*)) {
$invalidConditions .= "\n Has local resources: ";
$invalidConditions .= join(', ', @{$preconditions->{local_resources}});
}
+ if (my $not_allowed_nodes = $preconditions->{not_allowed_nodes}) {
+ if (my $unavail_storages = $not_allowed_nodes->{$target}->{unavailable_storages}) {
+ $invalidConditions .= "\n Has unavailable storages: ";
+ $invalidConditions .= join(', ', $unavail_storages->@*);
+ }
+
+ if (my $unavail_resources = $not_allowed_nodes->{$target}->{'unavailable-resources'}) {
+ $invalidConditions .= "\n Has unavailable resources ";
+ $invalidConditions .= join(', ', $unavail_resources->@*);
+ }
+ }
+
if ($invalidConditions && $invalidConditions ne '') {
print STDERR "skip VM $vmid - precondition check failed:";
die "$invalidConditions\n";
--
2.39.2
* [pve-devel] [PATCH manager v3 3/5] bulk migrate: include checks for live-migratable local resources
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (15 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 2/5] bulk migrate: improve precondition checks Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 4/5] ui: adapt migration window to precondition api change Dominik Csapak
` (7 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
those should be able to migrate even for online vms. If the mapping does
not exist on the target node, that will be caught further down anyway.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/API2/Nodes.pm | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
index 882d7301..35e7047c 100644
--- a/PVE/API2/Nodes.pm
+++ b/PVE/API2/Nodes.pm
@@ -2255,9 +2255,18 @@ my $create_migrate_worker = sub {
$invalidConditions .= join(', ', map { $_->{volid} } @{$preconditions->{local_disks}});
}
+ # for a live migration all local_resources must be marked as live-migratable
if ($online && scalar($preconditions->{local_resources}->@*)) {
- $invalidConditions .= "\n Has local resources: ";
- $invalidConditions .= join(', ', @{$preconditions->{local_resources}});
+ my $resource_not_live = [];
+ for my $resource ($preconditions->{local_resources}->@*) {
+ next if $preconditions->{'mapped-resource-info'}->{$resource}->{'live-migration'};
+ push $resource_not_live->@*, $resource;
+ }
+
+ if (scalar($resource_not_live->@*)) {
+ $invalidConditions .= "\n Has local resources not marked as live migratable: ";
+ $invalidConditions .= join(', ', $resource_not_live->@*);
+ }
}
if (my $not_allowed_nodes = $preconditions->{not_allowed_nodes}) {
--
2.39.2
* [pve-devel] [PATCH manager v3 4/5] ui: adapt migration window to precondition api change
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (16 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 3/5] bulk migrate: include checks for live-migratable local resources Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 5/5] fix #5175: ui: allow configuring and live migration of mapped pci resources Dominik Csapak
` (6 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
we now return the 'allowed_nodes'/'not_allowed_nodes' also if the vm is
running, when it has mapped resources. So do those checks independently,
so that the user gets instant feedback on where those resources exist.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
www/manager6/window/Migrate.js | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/www/manager6/window/Migrate.js b/www/manager6/window/Migrate.js
index 5473821b..c553a1ee 100644
--- a/www/manager6/window/Migrate.js
+++ b/www/manager6/window/Migrate.js
@@ -104,7 +104,7 @@ Ext.define('PVE.window.Migrate', {
onTargetChange: function(nodeSelector) {
// Always display the storages of the currently seleceted migration target
this.lookup('pveDiskStorageSelector').setNodename(nodeSelector.value);
- this.checkMigratePreconditions();
+ this.checkMigratePreconditions(true);
},
startMigration: function() {
@@ -214,12 +214,12 @@ Ext.define('PVE.window.Migrate', {
migration.possible = true;
}
migration.preconditions = [];
+ let target = me.lookup('pveNodeSelector').value;
+ let disallowed = migrateStats.not_allowed_nodes?.[target] ?? {};
- if (migrateStats.allowed_nodes) {
+ if (migrateStats.allowed_nodes && !vm.get('running')) {
migration.allowedNodes = migrateStats.allowed_nodes;
- let target = me.lookup('pveNodeSelector').value;
if (target.length && !migrateStats.allowed_nodes.includes(target)) {
- let disallowed = migrateStats.not_allowed_nodes[target] ?? {};
if (disallowed.unavailable_storages !== undefined) {
let missingStorages = disallowed.unavailable_storages.join(', ');
@@ -230,17 +230,17 @@ Ext.define('PVE.window.Migrate', {
severity: 'error',
});
}
+ }
+ }
- if (disallowed['unavailable-resources'] !== undefined) {
- let unavailableResources = disallowed['unavailable-resources'].join(', ');
+ if (disallowed['unavailable-resources'] !== undefined) {
+ let unavailableResources = disallowed['unavailable-resources'].join(', ');
- migration.possible = false;
- migration.preconditions.push({
- text: 'Mapped Resources (' + unavailableResources + ') not available on selected target. ',
- severity: 'error',
- });
- }
- }
+ migration.possible = false;
+ migration.preconditions.push({
+ text: 'Mapped Resources (' + unavailableResources + ') not available on selected target. ',
+ severity: 'error',
+ });
}
let blockingResources = [];
--
2.39.2
* [pve-devel] [PATCH manager v3 5/5] fix #5175: ui: allow configuring and live migration of mapped pci resources
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (17 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 4/5] ui: adapt migration window to precondition api change Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH docs v3 1/2] qm: resource mapping: add description for `mdev` option Dominik Csapak
` (5 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
if the hardware/driver is capable, the admin can now mark a pci device
as 'live-migration-capable', which then tries enabling live migration
for such devices.
mark it as experimental when configuring it and in the migrate window
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
www/manager6/window/Migrate.js | 25 ++++++++++++++++++++-----
www/manager6/window/PCIMapEdit.js | 12 ++++++++++++
2 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/www/manager6/window/Migrate.js b/www/manager6/window/Migrate.js
index c553a1ee..272ab022 100644
--- a/www/manager6/window/Migrate.js
+++ b/www/manager6/window/Migrate.js
@@ -244,10 +244,10 @@ Ext.define('PVE.window.Migrate', {
}
let blockingResources = [];
- let mappedResources = migrateStats['mapped-resources'] ?? [];
+ let mappedResources = migrateStats['mapped-resource-info'] ?? {};
for (const res of migrateStats.local_resources) {
- if (mappedResources.indexOf(res) === -1) {
+ if (!mappedResources[res]) {
blockingResources.push(res);
}
}
@@ -271,14 +271,29 @@ Ext.define('PVE.window.Migrate', {
}
}
- if (mappedResources && mappedResources.length) {
- if (vm.get('running')) {
+ if (mappedResources && vm.get('running')) {
+ let allowed = [];
+ let notAllowed = [];
+ for (const [key, resource] of Object.entries(mappedResources)) {
+ if (resource['live-migration']) {
+ allowed.push(key);
+ } else {
+ notAllowed.push(key);
+ }
+ }
+ if (notAllowed.length > 0) {
migration.possible = false;
migration.preconditions.push({
text: Ext.String.format('Can\'t migrate running VM with mapped resources: {0}',
- mappedResources.join(', ')),
+ notAllowed.join(', ')),
severity: 'error',
});
+ } else if (allowed.length > 0) {
+ migration.preconditions.push({
+ text: Ext.String.format('Live-migrating running VM with mapped resources (Experimental): {0}',
+ allowed.join(', ')),
+ severity: 'warning',
+ });
}
}
diff --git a/www/manager6/window/PCIMapEdit.js b/www/manager6/window/PCIMapEdit.js
index d43f04eb..731269a0 100644
--- a/www/manager6/window/PCIMapEdit.js
+++ b/www/manager6/window/PCIMapEdit.js
@@ -242,6 +242,18 @@ Ext.define('PVE.window.PCIMapEditWindow', {
disabled: '{hideComment}',
},
},
+ {
+ xtype: 'proxmoxcheckbox',
+ fieldLabel: gettext('Live Migration Capable'),
+ labelWidth: 200,
+ boxLabel: `<i class="fa fa-exclamation-triangle warning"></i> ${gettext('Experimental')}`,
+ reference: 'live-migration-capable',
+ name: 'live-migration-capable',
+ cbind: {
+ deleteEmpty: '{!isCreate}',
+ disabled: '{hideComment}',
+ },
+ },
],
columnB: [
--
2.39.2
* [pve-devel] [PATCH docs v3 1/2] qm: resource mapping: add description for `mdev` option
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (18 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 5/5] fix #5175: ui: allow configuring and live migration of mapped pci resources Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH docs v3 2/2] qm: resource mapping: document `live-migration-capable` setting Dominik Csapak
` (4 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
in a new section about additional options
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
qm.adoc | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/qm.adoc b/qm.adoc
index 42c26db..3f4e59a 100644
--- a/qm.adoc
+++ b/qm.adoc
@@ -1802,6 +1802,19 @@ To create mappings `Mapping.Modify` on `/mapping/<type>/<name>` is necessary
To use these mappings, `Mapping.Use` on `/mapping/<type>/<name>` is necessary
(in addition to the normal guest privileges to edit the configuration).
+Additional Options
+^^^^^^^^^^^^^^^^^^
+
+There are additional options when defining a cluster-wide resource mapping.
+Currently there are the following options:
+
+* `mdev` (PCI): This marks the PCI device as being capable of providing
+ `mediated devices`. When this is enabled, you can select a type when
+ configuring it on the guest. If multiple PCI devices are selected for
+  the mapping, the mediated device will be created on the first one where
+ there are any available instances of the selected type.
+
+
Managing Virtual Machines with `qm`
------------------------------------
--
2.39.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* [pve-devel] [PATCH docs v3 2/2] qm: resource mapping: document `live-migration-capable` setting
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (19 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH docs v3 1/2] qm: resource mapping: add description for `mdev` option Dominik Csapak
@ 2024-04-19 12:45 ` Dominik Csapak
2024-05-28 7:25 ` [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (3 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-04-19 12:45 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
qm.adoc | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/qm.adoc b/qm.adoc
index 3f4e59a..3ab5270 100644
--- a/qm.adoc
+++ b/qm.adoc
@@ -1814,6 +1814,12 @@ Currently there are the following options:
the mapping, the mediated device will be created on the first one where
there are any available instances of the selected type.
+* `live-migration-capable` (PCI): This marks the device as being capable of
+ being live migrated between nodes. This requires driver and hardware support.
+  Only NVIDIA GPUs with recent kernels are known to support this. Note that
+  live migrating passed-through devices is an experimental feature and may
+  not work or may cause issues.
+
Managing Virtual Machines with `qm`
------------------------------------
--
2.39.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (20 preceding siblings ...)
2024-04-19 12:45 ` [pve-devel] [PATCH docs v3 2/2] qm: resource mapping: document `live-migration-capable` setting Dominik Csapak
@ 2024-05-28 7:25 ` Dominik Csapak
2024-05-31 8:11 ` Eneko Lacunza via pve-devel
[not found] ` <954884c6-3a53-4b3a-bdc8-93478037b6a6@binovo.es>
2024-05-31 11:14 ` Fiona Ebner
` (2 subsequent siblings)
24 siblings, 2 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-05-28 7:25 UTC (permalink / raw)
To: pve-devel
ping?
i know we cannot really test this atm but there are a few users/customers
that would like to use/test this
since i marked it experimental i think it should be ok to include this
if its ok code wise
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
2024-05-28 7:25 ` [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
@ 2024-05-31 8:11 ` Eneko Lacunza via pve-devel
[not found] ` <954884c6-3a53-4b3a-bdc8-93478037b6a6@binovo.es>
1 sibling, 0 replies; 44+ messages in thread
From: Eneko Lacunza via pve-devel @ 2024-05-31 8:11 UTC (permalink / raw)
To: pve-devel; +Cc: Eneko Lacunza
[-- Attachment #1: Type: message/rfc822, Size: 6203 bytes --]
From: Eneko Lacunza <elacunza@binovo.es>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
Date: Fri, 31 May 2024 10:11:08 +0200
Message-ID: <954884c6-3a53-4b3a-bdc8-93478037b6a6@binovo.es>
Hi Dominik,
On 28/5/24 at 9:25, Dominik Csapak wrote:
> ping?
>
> i know we cannot really test this atm but there are a few users/customers
> that would like to use/test this
>
> since i marked it experimental i think it should be ok to include this
> if its ok code wise
>
Would this patch-series help in this scenario:
- Running VM has an Nvidia mdev assigned in hw.
- We clone that VM. VM can't be started until we change the hw id to
another Nvidia mdev.
If so, I have a potential customer that is looking to migrate a VDI
deployment from Nutanix to Proxmox. They're currently testing Proxmox
with one server with a Nvidia card, and can test this if packages are
prepared (in testing?).
We deployed this Proxmox server yesterday, we used latest NVIDIA host
driver. Latest kernel 6.8 didn't work but latest 6.5 did (will report
exact versions when I get remote access).
Thanks
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
[not found] ` <954884c6-3a53-4b3a-bdc8-93478037b6a6@binovo.es>
@ 2024-05-31 8:41 ` Dominik Csapak
2024-06-03 8:43 ` Eneko Lacunza via pve-devel
1 sibling, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-05-31 8:41 UTC (permalink / raw)
To: Eneko Lacunza, pve-devel
On 5/31/24 10:11, Eneko Lacunza wrote:
>
> Hi Dominik,
>
>
Hi,
> On 28/5/24 at 9:25, Dominik Csapak wrote:
>> ping?
>>
>> i know we cannot really test this atm but there are a few users/customers
>> that would like to use/test this
>>
>> since i marked it experimental i think it should be ok to include this
>> if its ok code wise
>>
>
> Would this patch-series help in this scenario:
>
> - Running VM has an Nvidia mdev assigned in hw.
> - We clone that VM. VM can't be started until we change the hw id to another Nvidia mdev.
>
No, that should already work with the cluster device mappings; there you can configure
a set of cards that will be used for the mdev creation instead of having one fixed card.
see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#resource_mapping
for how to configure that
this series would enable experimental live migration between hosts for cards/drivers that support this.
> If so, I have a potential customer that is looking to migrate a VDI deployment from Nutanix to
> Proxmox. They're currently testing Proxmox with one server with a Nvidia card, and can test this if
> packages are prepared (in testing?).
>
> We deployed this Proxmox server yesterday, we used latest NVIDIA host driver. Latest kernel 6.8
> didn't work but latest 6.5 did (will report exact versions when I get remote access).
yes we know, see the known issues section of the release
notes: https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_8.2
>
> Thanks
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
>
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (21 preceding siblings ...)
2024-05-28 7:25 ` [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
@ 2024-05-31 11:14 ` Fiona Ebner
2024-06-05 8:01 ` Dominik Csapak
2024-05-31 13:40 ` Fiona Ebner
2024-06-06 9:38 ` Dominik Csapak
24 siblings, 1 reply; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 11:14 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> and some useful cleanups
>
> Resending even there was not much feedback, because i worked in some
> minor fixes/changes in the meantime.
>
> A user tested the previous patch series and only found one issue with
> the ui, see the comments on bug #5175
>
> https://bugzilla.proxmox.com/show_bug.cgi?id=5175
>
>
> This is implemented for mapped resources. This requires driver and
> hardware support, but aside from nvidia vgpus there don't seem to be
> many drivers (if any) that do support that.
>
> qemu already supports that for vfio-pci devices, so nothing to be
> done there besides actively enabling it.
>
> Since we currently can't properly test it here and very much depends on
> hardware/driver support, mark it as experimental everywhere (docs/api/gui).
> (though i tested the live-migration part manually here by using
> "exec:cat > /tmp/test" for the migration target, and "exec: cat
> /tmp/test" as the 'incoming' parameter for a new vm start, which worked ;) )
>
> i opted for marking them migratable at the mapping level, but we could
> theoretically also put it in the hostpciX config instead.
> (though imho it fits better in the cluster-wide resource mapping config)
>
> also the naming/texts could probably be improved, but i think
> 'live-migration-capable' is very descriptive and i didn't want to
> use an overly short name for it (which can be confusing, see the
> 'shared' flag for storages)
>
> guest-common 2/4 semi-breaks pve-manager without pve-manager 1/5
> and qemu-server without 3/10
> (would always fail for devices capable of mediated devices)
>
And guest-common 4/4 fully breaks old qemu-server because it removes the
find_on_current_node() functions.
> guest-common 1,2; qemu-server 1-6; pve-manager 1,2
> are preparations/cleanups mostly and could be applied independently
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common v3 1/4] mapping: pci: rework properties check
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 1/4] mapping: pci: rework properties check Dominik Csapak
@ 2024-05-31 11:37 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 11:37 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> rename '$cfg' to '$mapping', 'correct' to 'expected'
> reword the error messages
> also check keys from the configured props not only the expected ones
>
Would've been nicer as multiple commits.
> previously we only checked the keys from the 'correct_props' hash
> but that was unintended. We now check the keys from both, but extract
> the relevant properties first.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v2:
> * don't refactor the properties check out
> * use properties from both configured and expected hashes
> * extract the relevant configured properties from the mapping
> instead of using all (previously we only used the expected ones
> by accident)
> src/PVE/Mapping/PCI.pm | 34 +++++++++++++++++++---------------
> 1 file changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/src/PVE/Mapping/PCI.pm b/src/PVE/Mapping/PCI.pm
> index 725e106..ef1bd8d 100644
> --- a/src/PVE/Mapping/PCI.pm
> +++ b/src/PVE/Mapping/PCI.pm
> @@ -131,9 +131,9 @@ sub options {
>
> # checks if the given config is valid for the current node
> sub assert_valid {
> - my ($name, $cfg) = @_;
> + my ($name, $mapping) = @_;
>
> - my @paths = split(';', $cfg->{path} // '');
> + my @paths = split(';', $mapping->{path} // '');
>
> my $idx = 0;
> for my $path (@paths) {
> @@ -148,32 +148,36 @@ sub assert_valid {
> my $info = PVE::SysFSTools::pci_device_info($path, 1);
> die "pci device '$path' not found\n" if !defined($info);
>
> - my $correct_props = {
My suggestion is the following code changes. See below for rationale[0].
# make sure to initialize all keys that should be checked below!
> + my $expected_props = {
> id => "$info->{vendor}:$info->{device}",
> iommugroup => $info->{iommugroup},
'subsystem-id' => undef,
> };
>
> if (defined($info->{'subsystem_vendor'}) && defined($info->{'subsystem_device'})) {
> - $correct_props->{'subsystem-id'} = "$info->{'subsystem_vendor'}:$info->{'subsystem_device'}";
> + $expected_props->{'subsystem-id'} = "$info->{'subsystem_vendor'}:$info->{'subsystem_device'}";
> }
>
> - for my $prop (sort keys %$correct_props) {
> + my $configured_props = { $mapping->%{qw(id iommugroup subsystem-id)} };
> +
> + my $merged = { %$expected_props, %$configured_props }; # just for the keys
> + for my $prop (sort keys %$merged) {
I'd prefer to just extract the keys directly and avoid the comment:
my @keys = keys { $expected_props->%*, $configured_props->%* }->%*;
[0]: But we could also just initialize $expected_props like mentioned
above and then simply use the keys from there. Then you also don't need
to construct a new hash for $configured_props and introduce a new
hard-coded list ;)
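To make [0] a bit more concrete, a rough sketch of how the check could then look (untested, just to illustrate the structure; names as in the patch):

    my $expected_props = {
        id => "$info->{vendor}:$info->{device}",
        iommugroup => $info->{iommugroup},
        'subsystem-id' => undef, # stays undef if sysfs has no subsystem info
    };
    if (defined($info->{'subsystem_vendor'}) && defined($info->{'subsystem_device'})) {
        $expected_props->{'subsystem-id'} =
            "$info->{'subsystem_vendor'}:$info->{'subsystem_device'}";
    }

    for my $prop (sort keys %$expected_props) {
        next if $prop eq 'iommugroup' && $idx > 0; # iommu only checked on the first device
        my $expected = $expected_props->{$prop};
        my $configured = $mapping->{$prop};
        next if !defined($expected) && !defined($configured);
        die "missing expected property '$prop' for device '$path'\n"
            if defined($expected) && !defined($configured);
        die "unexpected property '$prop' configured for device '$path'\n"
            if !defined($expected) && defined($configured);
        # strip '0x' prefixes and compare the values as before
    }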
> next if $prop eq 'iommugroup' && $idx > 0; # check iommu only on the first device
>
> - next if !defined($correct_props->{$prop}) && !defined($cfg->{$prop});
> - die "no '$prop' for device '$path'\n"
> - if defined($correct_props->{$prop}) && !defined($cfg->{$prop});
> - die "'$prop' configured but should not be\n"
> - if !defined($correct_props->{$prop}) && defined($cfg->{$prop});
> + next if !defined($expected_props->{$prop}) && !defined($configured_props->{$prop});
> + die "missing expected property '$prop' for device '$path'\n"
> + if defined($expected_props->{$prop}) && !defined($configured_props->{$prop});
> + die "unexpected property '$prop' configured for device '$path'\n"
> + if !defined($expected_props->{$prop}) && defined($configured_props->{$prop});
>
> - my $correct_prop = $correct_props->{$prop};
> - $correct_prop =~ s/0x//g;
> - my $configured_prop = $cfg->{$prop};
> + my $expected_prop = $expected_props->{$prop};
> + $expected_prop =~ s/0x//g;
> + my $configured_prop = $configured_props->{$prop};
> $configured_prop =~ s/0x//g;
>
> - die "'$prop' does not match for '$name' ($correct_prop != $configured_prop)\n"
> - if $correct_prop ne $configured_prop;
> + die "'$prop' does not match for '$name' ($expected_prop != $configured_prop)\n"
> + if $expected_prop ne $configured_prop;
> }
> +
> $idx++;
> }
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common v3 2/4] mapping: pci: check the mdev configuration on the device too
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 2/4] mapping: pci: check the mdev configuration on the device too Dominik Csapak
@ 2024-05-31 12:02 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 12:02 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> but that lives in the 'global' part of the mapping config, not in a
> specific mapping. To check that, add it to the $configured_props from
> there.
>
> this requires all call sites to be adapted, otherwise the check will
> always fail for devices that are capable of mediated devices
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v2:
> * adapt to changes of previous patch
> src/PVE/Mapping/PCI.pm | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/PVE/Mapping/PCI.pm b/src/PVE/Mapping/PCI.pm
> index ef1bd8d..b412c4d 100644
> --- a/src/PVE/Mapping/PCI.pm
> +++ b/src/PVE/Mapping/PCI.pm
> @@ -131,7 +131,7 @@ sub options {
>
> # checks if the given config is valid for the current node
> sub assert_valid {
> - my ($name, $mapping) = @_;
> + my ($name, $mapping, $cfg) = @_;
>
> my @paths = split(';', $mapping->{path} // '');
>
> @@ -151,6 +151,7 @@ sub assert_valid {
> my $expected_props = {
> id => "$info->{vendor}:$info->{device}",
> iommugroup => $info->{iommugroup},
> + mdev => $info->{mdev},
The value here returned by SysFSTools is undef when not supported...
> };
>
> if (defined($info->{'subsystem_vendor'}) && defined($info->{'subsystem_device'})) {
> @@ -158,6 +159,8 @@ sub assert_valid {
> }
>
> my $configured_props = { $mapping->%{qw(id iommugroup subsystem-id)} };
> + # check mdev from globabl mapping config
> + $configured_props->{mdev} = $cfg->{mdev};
>
...while the value in the config can be an explicit '0', which will trip
up the check below. (It can also be undef in the config, and will be when configured via the UI.)
Also leads to the error
unexpected property 'mdev' configured for device '0000:00:1b.0'
when SysFSTools returns undef and the property is present in the
config, even though it's not per se unexpected. If we convert to
explicit "0" and "1" values, we would also get a better error message
that points out the value mismatch.
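A minimal sketch of that normalization (untested; $cfg here is the global mapping entry as introduced in this patch):

    # normalize to explicit "0"/"1" so that undef from sysfs and an explicit 0 in
    # the config compare equal, and real mismatches show up as value mismatches
    # instead of "unexpected property"
    $expected_props->{mdev} = $info->{mdev} ? "1" : "0";
    $configured_props->{mdev} = $cfg->{mdev} ? "1" : "0";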
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common v3 4/4] mapping: remove find_on_current_node
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 4/4] mapping: remove find_on_current_node Dominik Csapak
@ 2024-05-31 12:09 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 12:09 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> they only have one user each (where we can inline the implementation).
> It's easy enough to recreate should we need to.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> src/PVE/Mapping/PCI.pm | 10 ----------
> src/PVE/Mapping/USB.pm | 9 ---------
> 2 files changed, 19 deletions(-)
>
> diff --git a/src/PVE/Mapping/PCI.pm b/src/PVE/Mapping/PCI.pm
> index b525c07..9c2b8b2 100644
> --- a/src/PVE/Mapping/PCI.pm
> +++ b/src/PVE/Mapping/PCI.pm
> @@ -214,16 +214,6 @@ sub write_pci_config {
> cfs_write_file($FILENAME, $cfg);
> }
>
> -sub find_on_current_node {
> - my ($id) = @_;
> -
> - my $cfg = PVE::Mapping::PCI::config();
> - my $node = PVE::INotify::nodename();
> -
> - # ignore errors
> - return get_node_mapping($cfg, $id, $node);
> -}
> -
> sub get_node_mapping {
> my ($cfg, $id, $nodename) = @_;
>
> diff --git a/src/PVE/Mapping/USB.pm b/src/PVE/Mapping/USB.pm
> index 34bc3cb..6dd15c4 100644
> --- a/src/PVE/Mapping/USB.pm
> +++ b/src/PVE/Mapping/USB.pm
> @@ -155,15 +155,6 @@ sub write_usb_config {
> cfs_write_file($FILENAME, $cfg);
> }
>
> -sub find_on_current_node {
> - my ($id) = @_;
> -
> - my $cfg = config();
> - my $node = PVE::INotify::nodename();
> -
> - return get_node_mapping($cfg, $id, $node);
> -}
> -
> sub get_node_mapping {
> my ($cfg, $id, $nodename) = @_;
>
Seems like the use PVE::INotify; statements can be dropped too now.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 04/10] stop cleanup: remove unnecessary tpmstate cleanup
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 04/10] stop cleanup: remove unnecessary tpmstate cleanup Dominik Csapak
@ 2024-05-31 12:56 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 12:56 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
Adding the rationale for third-party plugins from last time:
https://lists.proxmox.com/pipermail/pve-devel/2024-March/062354.html
On 19.04.24 at 14:45, Dominik Csapak wrote:
> tpmstate0 is already included in `get_vm_volumes`, and our only storage
> plugin that has unmap_volume implemented is the RBDPlugin, where we call
> unmap in `deactivate_volume`. So it's already ummapped by the
> `deactivate_volumes` calls above.
>
For third-party storage plugins, it's natural to expect that
deactivate_volume() would also remove a mapping for the volume just
like RBDPlugin does.
While there is an explicit map_volume() call in start_swtpm(), a
third-party plugin might expect an explicit unmap_volume() call too.
However, the order of calls right now is
1. activate_volume()
2. map_volume()
3. deactivate_volume()
4. unmap_volume()
Which seems like it could cause problems already for third-party
plugins relying on an explicit unmap call.
All that said, it is unlikely that a third-party plugin breaks. If it
really happens, it can be discussed/adapted to the actual needs still.
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Acked-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> PVE/QemuServer.pm | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 28e630d3..680c36f2 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -6172,14 +6172,6 @@ sub vm_stop_cleanup {
> if (!$keepActive) {
> my $vollist = get_vm_volumes($conf);
> PVE::Storage::deactivate_volumes($storecfg, $vollist);
> -
> - if (my $tpmdrive = $conf->{tpmstate0}) {
> - my $tpm = parse_drive("tpmstate0", $tpmdrive);
> - my ($storeid, $volname) = PVE::Storage::parse_volume_id($tpm->{file}, 1);
> - if ($storeid) {
> - PVE::Storage::unmap_volume($storecfg, $tpm->{file});
> - }
> - }
> }
>
> foreach my $ext (qw(mon qmp pid vnc qga)) {
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 06/10] migrate: call vm_stop_cleanup after stopping in phase3_cleanup
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 06/10] migrate: call vm_stop_cleanup after stopping in phase3_cleanup Dominik Csapak
@ 2024-05-31 12:56 ` Fiona Ebner
2024-06-05 8:49 ` Dominik Csapak
0 siblings, 1 reply; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 12:56 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> we currently only call deactivate_volumes, but we actually want to call
> the whole vm_stop_cleanup, since that is not invoked by the vm_stop
> above (we cannot parse the config anymore) and might do other cleanups
> we also want to do (like mdev cleanup).
>
> For this to work properly we have to clone the original config at the
> beginning, since we might modify the volids.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> PVE/QemuMigrate.pm | 12 ++++++------
> test/MigrationTest/Shared.pm | 3 +++
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index 8d9b35ae..381022f5 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -5,6 +5,7 @@ use warnings;
>
> use IO::File;
> use IPC::Open2;
> +use Storable qw(dclone);
> use Time::HiRes qw( usleep );
>
> use PVE::Cluster;
Needs a rebase (because of added include for PVE::AccessControl)
> @@ -1455,7 +1456,8 @@ sub phase3_cleanup {
>
> my $tunnel = $self->{tunnel};
>
> - my $sourcevollist = PVE::QemuServer::get_vm_volumes($conf);
> + # we'll need an unmodified copy of the config later for the cleanup
> + my $oldconf = dclone($conf);
>
> if ($self->{volume_map} && !$self->{opts}->{remote}) {
> my $target_drives = $self->{target_drive};
> @@ -1586,12 +1588,10 @@ sub phase3_cleanup {
> $self->{errors} = 1;
> }
>
> - # always deactivate volumes - avoid lvm LVs to be active on several nodes
> - eval {
> - PVE::Storage::deactivate_volumes($self->{storecfg}, $sourcevollist);
> - };
> + # stop with nocheck does not do a cleanup, so do it here with the original config
> + eval { PVE::QemuServer::vm_stop_cleanup($self->{storecfg}, $vmid, $oldconf) };
> if (my $err = $@) {
> - $self->log('err', $err);
> + $self->log('err', "cleanup for vm failed - $err");
Nit: "Cleanup after stopping VM failed"
Is it better to only execute this in case vm_stop() did not return an
error? Although I guess attempting cleanup in that case also doesn't hurt.
> $self->{errors} = 1;
> }
>
> diff --git a/test/MigrationTest/Shared.pm b/test/MigrationTest/Shared.pm
> index aa7203d1..2347e60a 100644
> --- a/test/MigrationTest/Shared.pm
> +++ b/test/MigrationTest/Shared.pm
> @@ -130,6 +130,9 @@ $qemu_server_module->mock(
> clear_reboot_request => sub {
> return 1;
> },
> + vm_stop_cleanup => sub {
> + return 1;
Nit: I'd just have it be return; without a value.
> + },
> get_efivars_size => sub {
> return 128 * 1024;
> },
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices Dominik Csapak
@ 2024-05-31 12:56 ` Fiona Ebner
2024-06-05 8:51 ` Dominik Csapak
0 siblings, 1 reply; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 12:56 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> the default is 'auto', but for those which are marked as capable for
> live migration, we want to explicitly enable that, so we get an early
> error on start if the driver does not support that.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> PVE/QemuServer/PCI.pm | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
> index 6ba43ee8..df2ea3eb 100644
> --- a/PVE/QemuServer/PCI.pm
> +++ b/PVE/QemuServer/PCI.pm
> @@ -435,8 +435,11 @@ sub parse_hostpci {
> my $devices = PVE::Mapping::PCI::get_node_mapping($config, $mapping, $node);
> die "PCI device mapping not found for '$mapping'\n" if !$devices || !scalar($devices->@*);
>
> + my $cfg = $config->{ids}->{$mapping};
> + $res->{'live-migration-capable'} = 1 if $cfg->{'live-migration-capable'};
> +
> for my $device ($devices->@*) {
> - eval { PVE::Mapping::PCI::assert_valid($mapping, $device, $config->{ids}->{$mapping}) };
> + eval { PVE::Mapping::PCI::assert_valid($mapping, $device, $cfg) };
This hunk belongs in patch 03/10
> die "PCI device mapping invalid (hardware probably changed): $@\n" if $@;
> push $alternatives->@*, [split(/;/, $device->{path})];
> }
> @@ -635,6 +638,10 @@ sub print_hostpci_devices {
> $devicestr .= ",host=$pcidevice->{id}";
> }
>
> + if ($d->{'live-migration-capable'}) {
> + $devicestr .= ",enable-migration=on"
> + }
> +
> my $mf_addr = $multifunction ? ".$j" : '';
> $devicestr .= ",id=${id}${mf_addr}${pciaddr}${mf_addr}";
>
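Just as a note, with this the generated vfio-pci argument ends up looking something like the following (sketch only, the exact id/bus/addr and any further properties depend on the rest of the configuration):

    -device vfio-pci,host=0000:01:00.0,enable-migration=on,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0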
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 08/10] check_local_resources: add more info per mapped device and return as hash
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 08/10] check_local_resources: add more info per mapped device and return as hash Dominik Csapak
@ 2024-05-31 13:05 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 13:05 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> such as the mapping name and if it's marked for live-migration (pci only)
>
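For context, the per-device info hash then looks roughly like this (illustrative mapping names; key names as used by the later patches):

    my $mapped_resources = {
        hostpci0 => { name => 'some-gpu-mapping', 'live-migration' => 1 },
        usb0 => { name => 'some-usb-mapping' },
    };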
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> PVE/API2/Qemu.pm | 2 +-
> PVE/QemuMigrate.pm | 7 ++++---
> PVE/QemuServer.pm | 17 ++++++++++-------
> 3 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 2a349c8c..f2fa345d 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -4538,7 +4538,7 @@ __PACKAGE__->register_method({
> $res->{local_disks} = [ values %$local_disks ];;
>
> $res->{local_resources} = $local_resources;
> - $res->{'mapped-resources'} = $mapped_resources;
> + $res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
Sorting the keys leads to a nicer API result IMHO.
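E.g. (sketch):

    $res->{'mapped-resources'} = [ sort keys $mapped_resources->%* ];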
>
> return $res;
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index 381022f5..a46eb2a3 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -233,7 +233,7 @@ sub prepare {
> my ($loc_res, $mapped_res, $missing_mappings_by_node) = PVE::QemuServer::check_local_resources($conf, 1);
> my $blocking_resources = [];
> for my $res ($loc_res->@*) {
> - if (!grep($res, $mapped_res->@*)) {
> + if (!defined($mapped_res->{$res})) {
> push $blocking_resources->@*, $res;
> }
> }
> @@ -245,10 +245,11 @@ sub prepare {
> }
> }
>
> - if (scalar($mapped_res->@*)) {
> + if (scalar(keys $mapped_res->%*)) {
> my $missing_mappings = $missing_mappings_by_node->{$self->{node}};
> + my $mapped_text = join(", ", keys $mapped_res->%*);
Nit: Can be moved into the if, keys can be sorted.
> if ($running) {
> - die "can't migrate running VM which uses mapped devices: " . join(", ", $mapped_res->@*) . "\n";
> + die "can't migrate running VM which uses mapped devices: $mapped_text\n";
> } elsif (scalar($missing_mappings->@*)) {
> die "can't migrate to '$self->{node}': missing mapped devices " . join(", ", $missing_mappings->@*) . "\n";
> } else {
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 09/10] api: enable live migration for marked mapped pci devices
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 09/10] api: enable live migration for marked mapped pci devices Dominik Csapak
@ 2024-05-31 13:13 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 13:13 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> They have to be marked as 'live-migration-capable' in the mapping
> config, and the driver and qemu must support it.
>
> For the gui checks, we now return the whole object of the mapped
> resources, which includes info like the name and if it's marked as
> live-migration capable. (while deprecating the old 'mapped-resource'
> return value, since that returns strictly less information)
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> PVE/API2/Qemu.pm | 7 ++++++-
> PVE/QemuMigrate.pm | 13 +++++++++----
> 2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index f2fa345d..f95d8d95 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -4473,7 +4473,11 @@ __PACKAGE__->register_method({
> },
> 'mapped-resources' => {
> type => 'array',
> - description => "List of mapped resources e.g. pci, usb"
> + description => "List of mapped resources e.g. pci, usb. Deprecated, use 'mapped-resource-info' instead."
> + },
Please add a comment to remind us to remove it in PVE 9
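Something like this would do (sketch):

    'mapped-resources' => {
        type => 'array',
        # TODO: remove with PVE 9, superseded by 'mapped-resource-info'
        description => "List of mapped resources e.g. pci, usb. Deprecated, use 'mapped-resource-info' instead."
    },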
> + 'mapped-resource-info' => {
> + type => 'object',
> + description => "Object of mapped resources with additional information such if they're live migratable.",
> },
> },
> },
> @@ -4539,6 +4543,7 @@ __PACKAGE__->register_method({
>
> $res->{local_resources} = $local_resources;
> $res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
> + $res->{'mapped-resource-info'} = $mapped_resources;
>
> return $res;
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index a46eb2a3..89caefc7 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -247,11 +247,16 @@ sub prepare {
>
> if (scalar(keys $mapped_res->%*)) {
> my $missing_mappings = $missing_mappings_by_node->{$self->{node}};
> - my $mapped_text = join(", ", keys $mapped_res->%*);
> - if ($running) {
> - die "can't migrate running VM which uses mapped devices: $mapped_text\n";
> - } elsif (scalar($missing_mappings->@*)) {
> + my $missing_live_mappings = [];
> + for my $key (keys $mapped_res->%*) {
Maybe sort
> + my $res = $mapped_res->{$key};
> + my $name = "$key:$res->{name}";
> + push $missing_live_mappings->@*, $name if !$res->{'live-migration'};
> + }
> + if (scalar($missing_mappings->@*)) {
> die "can't migrate to '$self->{node}': missing mapped devices " . join(", ", $missing_mappings->@*) . "\n";
> + } elsif ($running && scalar($missing_live_mappings->@*)) {
> + die "can't live migrate running VM which uses following mapped devices: " . join(", ", $missing_live_mappings->@*) . "\n";
Style nit: line too long
> } else {
> $self->log('info', "migrating VM which uses mapped local devices");
> }
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions Dominik Csapak
@ 2024-05-31 13:37 ` Fiona Ebner
2024-06-05 9:21 ` Dominik Csapak
0 siblings, 1 reply; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 13:37 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
> so that we can show a proper warning in the migrate dialog and check it
> in the bulk migrate precondition check
>
> the unavailable_storages should be the same as before, but
> we now always return allowed_nodes too.
>
> also add a note that we want to redesign the return values here, to make
> * the api call simpler
> * return better structured values
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> PVE/API2/Qemu.pm | 39 ++++++++++++++++++++++-----------------
> 1 file changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index f95d8d95..94aa9942 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -4450,18 +4450,20 @@ __PACKAGE__->register_method({
> },
> },
> returns => {
> + # TODO 9.x: rework the api call to return more sensible structures
> + # e.g. a simple list of nodes with their blockers and/or notices to show
> type => "object",
> properties => {
> running => { type => 'boolean' },
> allowed_nodes => {
> type => 'array',
> optional => 1,
> - description => "List nodes allowed for offline migration, only passed if VM is offline"
> + description => "List nodes allowed for offline migration.",
It still only returns the actual list of allowed nodes if not running.
My idea was to return the allowed nodes in both cases. If we keep the
parameter specific to offline migration, I'd still keep the return guarded.
> },
> not_allowed_nodes => {
> type => 'object',
> optional => 1,
> - description => "List not allowed nodes with additional informations, only passed if VM is offline"
> + description => "List not allowed nodes with additional information.",
> },
> local_disks => {
> type => 'array',
> @@ -4518,28 +4520,31 @@ __PACKAGE__->register_method({
>
> # if vm is not running, return target nodes where local storage/mapped devices are available
> # for offline migration
> + $res->{allowed_nodes} = [];
> + $res->{not_allowed_nodes} = {};
> if (!$res->{running}) {
> - $res->{allowed_nodes} = [];
> - my $checked_nodes = PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
> - delete $checked_nodes->{$localnode};
> -
> - foreach my $node (keys %$checked_nodes) {
> - my $missing_mappings = $missing_mappings_by_node->{$node};
> - if (scalar($missing_mappings->@*)) {
> - $checked_nodes->{$node}->{'unavailable-resources'} = $missing_mappings;
> - next;
> - }
> + $res->{not_allowed_nodes} = PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
> + delete $res->{not_allowed_nodes}->{$localnode};
> + }
>
> - if (!defined($checked_nodes->{$node}->{unavailable_storages})) {
> - push @{$res->{allowed_nodes}}, $node;
> - }
> + my $merged = { $res->{not_allowed_nodes}->%*, $missing_mappings_by_node->%* };
>
If we'd need this, I'd just get the keys directly:
my @keys = keys { ... }->%*;
But it just reads wrong. Why even consider the keys for the
not_allowed_nodes? Doesn't this just work because
$missing_mappings_by_node already contains all other node keys (and so
we can simply iterate over those keys)?
> + for my $node (keys $merged->%*) {
> + my $missing_mappings = $missing_mappings_by_node->{$node};
> + if (scalar($missing_mappings->@*)) {
> + $res->{not_allowed_nodes}->{$node}->{'unavailable-resources'} = $missing_mappings;
> + next;
> + }
> +
> + if (!$res->{running}) {
> + if (!defined($res->{not_allowed_nodes}->{$node}->{unavailable_storages})) {
> + push $res->{allowed_nodes}->@*, $node;
> + }
> }
> - $res->{not_allowed_nodes} = $checked_nodes;
> }
>
> my $local_disks = &$check_vm_disks_local($storecfg, $vmconf, $vmid);
> - $res->{local_disks} = [ values %$local_disks ];;
> + $res->{local_disks} = [ values %$local_disks ];
>
> $res->{local_resources} = $local_resources;
> $res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (22 preceding siblings ...)
2024-05-31 11:14 ` Fiona Ebner
@ 2024-05-31 13:40 ` Fiona Ebner
2024-06-06 9:38 ` Dominik Csapak
24 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-05-31 13:40 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 19.04.24 at 14:45, Dominik Csapak wrote:
>
> qemu-server:
>
> Dominik Csapak (10):
> usb: mapping: move implementation of find_on_current_node here
> pci: mapping: move implementation of find_on_current_node here
> pci: mapping: check mdev config against hardware
> stop cleanup: remove unnecessary tpmstate cleanup
> vm_stop_cleanup: add noerr parameter
> migrate: call vm_stop_cleanup after stopping in phase3_cleanup
> pci: set 'enable-migration' to on for live-migration marked mapped
> devices
> check_local_resources: add more info per mapped device and return as
> hash
> api: enable live migration for marked mapped pci devices
> api: include not mapped resources for running vms in migrate
> preconditions
>
With the hunk that doesn't belong to 07/10 moved to 03/10, consider the
qemu-server patches up to 09/10:
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
[not found] ` <954884c6-3a53-4b3a-bdc8-93478037b6a6@binovo.es>
2024-05-31 8:41 ` Dominik Csapak
@ 2024-06-03 8:43 ` Eneko Lacunza via pve-devel
1 sibling, 0 replies; 44+ messages in thread
From: Eneko Lacunza via pve-devel @ 2024-06-03 8:43 UTC (permalink / raw)
To: pve-devel; +Cc: Eneko Lacunza
[-- Attachment #1: Type: message/rfc822, Size: 7039 bytes --]
From: Eneko Lacunza <elacunza@binovo.es>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
Date: Mon, 3 Jun 2024 10:43:26 +0200
Message-ID: <f834a0aa-eb52-416e-9ca2-fcead4b8ba7f@binovo.es>
Hi,
Sorry for the noise, the issue I described is already resolved.
For other users' convenience:
At Datacenter -> Resource Mappings, a mapping has to be created for Nvidia
mediated devices.
Then at VM hardware, a PCI device can be added as a mapped device, which is
assigned to the first free mdev on the host.
Thanks
On 31/5/24 at 10:11, Eneko Lacunza wrote:
>
> Hi Dominik,
>
>
> On 28/5/24 at 9:25, Dominik Csapak wrote:
>> ping?
>>
>> i know we cannot really test this atm but there are a few
>> users/customers
>> that would like to use/test this
>>
>> since i marked it experimental i think it should be ok to include this
>> if its ok code wise
>>
>
> Would this patch-series help in this scenario:
>
> - Running VM has an Nvidia mdev assigned in hw.
> - We clone that VM. VM can't be started until we change the hw id to
> another Nvidia mdev.
>
> If so, I have a potential customer that is looking to migrate a VDI
> deployment from Nutanix to Proxmox. They're currently testing Proxmox
> with one server with a Nvidia card, and can test this if packages are
> prepared (in testing?).
>
> We deployed this Proxmox server yesterday, we used latest NVIDIA host
> driver. Latest kernel 6.8 didn't work but latest 6.5 did (will report
> exact versions when I get remote access).
>
> Thanks
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
>
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
2024-05-31 11:14 ` Fiona Ebner
@ 2024-06-05 8:01 ` Dominik Csapak
0 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-06-05 8:01 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 5/31/24 13:14, Fiona Ebner wrote:
> On 19.04.24 at 14:45, Dominik Csapak wrote:
>> and some useful cleanups
>>
>> Resending even there was not much feedback, because i worked in some
>> minor fixes/changes in the meantime.
>>
>> A user tested the previous patch series and only found one issue with
>> the ui, see the comments on bug #5175
>>
>> https://bugzilla.proxmox.com/show_bug.cgi?id=5175
>>
>>
>> This is implemented for mapped resources. This requires driver and
>> hardware support, but aside from nvidia vgpus there don't seem to be
>> many drivers (if any) that do support that.
>>
>> qemu already supports that for vfio-pci devices, so nothing to be
>> done there besides actively enabling it.
>>
>> Since we currently can't properly test it here and very much depends on
>> hardware/driver support, mark it as experimental everywhere (docs/api/gui).
>> (though i tested the live-migration part manually here by using
>> "exec:cat > /tmp/test" for the migration target, and "exec: cat
>> /tmp/test" as the 'incoming' parameter for a new vm start, which worked ;) )
>>
>> i opted for marking them migratable at the mapping level, but we could
>> theoretically also put it in the hostpciX config instead.
>> (though imho it fits better in the cluster-wide resource mapping config)
>>
>> also the naming/texts could probably be improved, but i think
>> 'live-migration-capable' is very descriptive and i didn't want to
>> use an overly short name for it (which can be confusing, see the
>> 'shared' flag for storages)
>>
>> guest-common 2/4 semi-breaks pve-manager without pve-manager 1/5
>> and qemu-server without 3/10
>> (would always fail for devices capable of mediated devices)
>>
>
> And guest-common 4/4 fully breaks old qemu-server because it removes the
> find_on_current_node() functions.
>
true, though we could leave it in, and mark it with a TODO to remove in 9.x ?
but i'll leave that to the packaging people to decide which variant is less
work/hassle ;)
>> guest-common 1,2; qemu-server 1-6; pve-manager 1,2
>> are preparations/cleanups mostly and could be applied independently
>>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 06/10] migrate: call vm_stop_cleanup after stopping in phase3_cleanup
2024-05-31 12:56 ` Fiona Ebner
@ 2024-06-05 8:49 ` Dominik Csapak
0 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-06-05 8:49 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 5/31/24 14:56, Fiona Ebner wrote:
> On 19.04.24 at 14:45, Dominik Csapak wrote:
>> we currently only call deactivate_volumes, but we actually want to call
>> the whole vm_stop_cleanup, since that is not invoked by the vm_stop
>> above (we cannot parse the config anymore) and might do other cleanups
>> we also want to do (like mdev cleanup).
>>
>> For this to work properly we have to clone the original config at the
>> beginning, since we might modify the volids.
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> PVE/QemuMigrate.pm | 12 ++++++------
>> test/MigrationTest/Shared.pm | 3 +++
>> 2 files changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
>> index 8d9b35ae..381022f5 100644
>> --- a/PVE/QemuMigrate.pm
>> +++ b/PVE/QemuMigrate.pm
>> @@ -5,6 +5,7 @@ use warnings;
>>
>> use IO::File;
>> use IPC::Open2;
>> +use Storable qw(dclone);
>> use Time::HiRes qw( usleep );
>>
>> use PVE::Cluster;
>
> Needs a rebase (because of added include for PVE::AccessControl)
>
>> @@ -1455,7 +1456,8 @@ sub phase3_cleanup {
>>
>> my $tunnel = $self->{tunnel};
>>
>> - my $sourcevollist = PVE::QemuServer::get_vm_volumes($conf);
>> + # we'll need an unmodified copy of the config later for the cleanup
>> + my $oldconf = dclone($conf);
>>
>> if ($self->{volume_map} && !$self->{opts}->{remote}) {
>> my $target_drives = $self->{target_drive};
>> @@ -1586,12 +1588,10 @@ sub phase3_cleanup {
>> $self->{errors} = 1;
>> }
>>
>> - # always deactivate volumes - avoid lvm LVs to be active on several nodes
>> - eval {
>> - PVE::Storage::deactivate_volumes($self->{storecfg}, $sourcevollist);
>> - };
>> + # stop with nocheck does not do a cleanup, so do it here with the original config
>> + eval { PVE::QemuServer::vm_stop_cleanup($self->{storecfg}, $vmid, $oldconf) };
>> if (my $err = $@) {
>> - $self->log('err', $err);
>> + $self->log('err', "cleanup for vm failed - $err");
>
> Nit: "Cleanup after stopping VM failed"
>
> Is it better to only execute this in case vm_stop() did not return an
> error? Although I guess attempting cleanup in that case also doesn't hurt.
not sure honestly, we cannot really know at this point when the vm stop failed and
if we should do a cleanup.. my guess is that when the vm is still running
the cleanup will fail anyway at some step
but IMHO doing it and potentially generating more warning/error output
vs. not doing it and missing some cleanup, i'd prefer the former
>
>> $self->{errors} = 1;
>> }
>>
>> diff --git a/test/MigrationTest/Shared.pm b/test/MigrationTest/Shared.pm
>> index aa7203d1..2347e60a 100644
>> --- a/test/MigrationTest/Shared.pm
>> +++ b/test/MigrationTest/Shared.pm
>> @@ -130,6 +130,9 @@ $qemu_server_module->mock(
>> clear_reboot_request => sub {
>> return 1;
>> },
>> + vm_stop_cleanup => sub {
>> + return 1;
>
> Nit: I'd just have it be return; without a value.
>
>> + },
>> get_efivars_size => sub {
>> return 128 * 1024;
>> },
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices
2024-05-31 12:56 ` Fiona Ebner
@ 2024-06-05 8:51 ` Dominik Csapak
2024-06-05 9:31 ` Fiona Ebner
0 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-06-05 8:51 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 5/31/24 14:56, Fiona Ebner wrote:
> On 19.04.24 at 14:45, Dominik Csapak wrote:
>> the default is 'auto', but for those which are marked as capable for
>> live migration, we want to explicitly enable that, so we get an early
>> error on start if the driver does not support that.
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> PVE/QemuServer/PCI.pm | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
>> index 6ba43ee8..df2ea3eb 100644
>> --- a/PVE/QemuServer/PCI.pm
>> +++ b/PVE/QemuServer/PCI.pm
>> @@ -435,8 +435,11 @@ sub parse_hostpci {
>> my $devices = PVE::Mapping::PCI::get_node_mapping($config, $mapping, $node);
>> die "PCI device mapping not found for '$mapping'\n" if !$devices || !scalar($devices->@*);
>>
>> + my $cfg = $config->{ids}->{$mapping};
>> + $res->{'live-migration-capable'} = 1 if $cfg->{'live-migration-capable'};
>> +
>> for my $device ($devices->@*) {
>> - eval { PVE::Mapping::PCI::assert_valid($mapping, $device, $config->{ids}->{$mapping}) };
>> + eval { PVE::Mapping::PCI::assert_valid($mapping, $device, $cfg) };
>
>
> This hunk belongs in patch 03/10)
why though?
in 03/10 i adapt to the new $cfg parameter to check the mdev parameter,
and here i refactor it because we have to access it twice
(for the live-migration-capable flag)
i mean i can pull it out into the $cfg before, but the live-migration
part does IMO not belong in 03/10
>
>> die "PCI device mapping invalid (hardware probably changed): $@\n" if $@;
>> push $alternatives->@*, [split(/;/, $device->{path})];
>> }
>> @@ -635,6 +638,10 @@ sub print_hostpci_devices {
>> $devicestr .= ",host=$pcidevice->{id}";
>> }
>>
>> + if ($d->{'live-migration-capable'}) {
>> + $devicestr .= ",enable-migration=on"
>> + }
>> +
>> my $mf_addr = $multifunction ? ".$j" : '';
>> $devicestr .= ",id=${id}${mf_addr}${pciaddr}${mf_addr}";
>>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions
2024-05-31 13:37 ` Fiona Ebner
@ 2024-06-05 9:21 ` Dominik Csapak
2024-06-05 9:38 ` Fiona Ebner
0 siblings, 1 reply; 44+ messages in thread
From: Dominik Csapak @ 2024-06-05 9:21 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 5/31/24 15:37, Fiona Ebner wrote:
> On 19.04.24 at 14:45, Dominik Csapak wrote:
>> so that we can show a proper warning in the migrate dialog and check it
>> in the bulk migrate precondition check
>>
>> the unavailable_storages should be the same as before, but
>> we now always return allowed_nodes too.
>>
>> also add a note that we want to redesign the return values here, to make
>> * the api call simpler
>> * return better structured values
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> PVE/API2/Qemu.pm | 39 ++++++++++++++++++++++-----------------
>> 1 file changed, 22 insertions(+), 17 deletions(-)
>>
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index f95d8d95..94aa9942 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -4450,18 +4450,20 @@ __PACKAGE__->register_method({
>> },
>> },
>> returns => {
>> + # TODO 9.x: rework the api call to return more sensible structures
>> + # e.g. a simple list of nodes with their blockers and/or notices to show
>> type => "object",
>> properties => {
>> running => { type => 'boolean' },
>> allowed_nodes => {
>> type => 'array',
>> optional => 1,
>> - description => "List nodes allowed for offline migration, only passed if VM is offline"
>> + description => "List nodes allowed for offline migration.",
>
> It still only returns the actual list of allowed nodes if not running.
> My idea was to return the allowed nodes in both cases. If we keep the
> parameter specific to offline migration, I'd still keep the return guarded.
>
mhmm not sure why i did it this way... but yeah
i'd rather not break the api and just not return that in the online case
>> },
>> not_allowed_nodes => {
>> type => 'object',
>> optional => 1,
>> - description => "List not allowed nodes with additional informations, only passed if VM is offline"
>> + description => "List not allowed nodes with additional information.",
>> },
>> local_disks => {
>> type => 'array',
>> @@ -4518,28 +4520,31 @@ __PACKAGE__->register_method({
>>
>> # if vm is not running, return target nodes where local storage/mapped devices are available
>> # for offline migration
>> + $res->{allowed_nodes} = [];
>> + $res->{not_allowed_nodes} = {};
>> if (!$res->{running}) {
>> - $res->{allowed_nodes} = [];
>> - my $checked_nodes = PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
>> - delete $checked_nodes->{$localnode};
>> -
>> - foreach my $node (keys %$checked_nodes) {
>> - my $missing_mappings = $missing_mappings_by_node->{$node};
>> - if (scalar($missing_mappings->@*)) {
>> - $checked_nodes->{$node}->{'unavailable-resources'} = $missing_mappings;
>> - next;
>> - }
>> + $res->{not_allowed_nodes} = PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
>> + delete $res->{not_allowed_nodes}->{$localnode};
>> + }
>>
>> - if (!defined($checked_nodes->{$node}->{unavailable_storages})) {
>> - push @{$res->{allowed_nodes}}, $node;
>> - }
>> + my $merged = { $res->{not_allowed_nodes}->%*, $missing_mappings_by_node->%* };
>>
>
> If we'd need this, I'd just get the keys directly:
> my @keys = keys { ... }->%*;
>
> But it just reads wrong. Why even consider the keys for the
> not_allowed_nodes? Doesn't this just work because
> $missing_mappings_by_node already contains all other node keys (and so
> we can simply iterate over those keys)?
huh? we need to iterate over all nodes in 'not_allowed_nodes' for the
unavailable storage check,
but we also want to include all nodes that are in 'missing_mappings_by_node'.
nonetheless, i currently find the whole code not very readable,
so i'll try to come up with something better...
(i.e. iterate over all nodes via get_nodelist and
fill the allowed/not_allowed based on the missing storage/mapping list)
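For illustration, a minimal sketch of that rework (assumptions: PVE::Cluster::get_nodelist() as the node source and the variable/key names from the hunks above; this is not part of the series):
my $storage_blockers = $res->{running}
    ? {}
    : PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
$res->{allowed_nodes} = [];
$res->{not_allowed_nodes} = {};
for my $node (PVE::Cluster::get_nodelist()) {
    next if $node eq $localnode;
    my $blockers = $storage_blockers->{$node} // {};
    my $missing_mappings = $missing_mappings_by_node->{$node} // [];
    $blockers->{'unavailable-resources'} = $missing_mappings
        if scalar($missing_mappings->@*);
    if (scalar(keys $blockers->%*)) {
        # node is blocked by missing storages and/or mappings
        $res->{not_allowed_nodes}->{$node} = $blockers;
    } elsif (!$res->{running}) {
        # keep allowed_nodes specific to offline migration, as before
        push $res->{allowed_nodes}->@*, $node;
    }
}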
>
>> + for my $node (keys $merged->%*) {
>> + my $missing_mappings = $missing_mappings_by_node->{$node};
>> + if (scalar($missing_mappings->@*)) {
>> + $res->{not_allowed_nodes}->{$node}->{'unavailable-resources'} = $missing_mappings;
>> + next;
>> + }
>> +
>> + if (!$res->{running}) {
>> + if (!defined($res->{not_allowed_nodes}->{$node}->{unavailable_storages})) {
>> + push $res->{allowed_nodes}->@*, $node;
>> + }
>> }
>> - $res->{not_allowed_nodes} = $checked_nodes;
>> }
>>
>> my $local_disks = &$check_vm_disks_local($storecfg, $vmconf, $vmid);
>> - $res->{local_disks} = [ values %$local_disks ];;
>> + $res->{local_disks} = [ values %$local_disks ];
>>
>> $res->{local_resources} = $local_resources;
>> $res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
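For reference, a hypothetical result of the updated call for an offline VM; all node, storage and mapping names below are made up for illustration:
{
    running => 0,
    allowed_nodes => [ 'node2' ],
    not_allowed_nodes => {
        node3 => {
            unavailable_storages => [ 'local-lvm' ],
            'unavailable-resources' => [ 'some-mapping' ],
        },
    },
    local_disks => [],      # entries as returned by $check_vm_disks_local
    local_resources => [],
    'mapped-resources' => [ 'hostpci0' ],
}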
* Re: [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices
2024-06-05 8:51 ` Dominik Csapak
@ 2024-06-05 9:31 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-06-05 9:31 UTC (permalink / raw)
To: Dominik Csapak, Proxmox VE development discussion
Am 05.06.24 um 10:51 schrieb Dominik Csapak:
> On 5/31/24 14:56, Fiona Ebner wrote:
>> Am 19.04.24 um 14:45 schrieb Dominik Csapak:
>>> the default is 'auto', but for those which are marked as capable for
>>> live migration, we want to explicitly enable that, so we get an early
>>> error on start if the driver does not support that.
>>>
>>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>>> ---
>>> PVE/QemuServer/PCI.pm | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
>>> index 6ba43ee8..df2ea3eb 100644
>>> --- a/PVE/QemuServer/PCI.pm
>>> +++ b/PVE/QemuServer/PCI.pm
>>> @@ -435,8 +435,11 @@ sub parse_hostpci {
>>> my $devices = PVE::Mapping::PCI::get_node_mapping($config,
>>> $mapping, $node);
>>> die "PCI device mapping not found for '$mapping'\n" if
>>> !$devices || !scalar($devices->@*);
>>> + my $cfg = $config->{ids}->{$mapping};
>>> + $res->{'live-migration-capable'} = 1 if
>>> $cfg->{'live-migration-capable'};
>>> +
>>> for my $device ($devices->@*) {
>>> - eval { PVE::Mapping::PCI::assert_valid($mapping, $device,
>>> $config->{ids}->{$mapping}) };
>>> + eval { PVE::Mapping::PCI::assert_valid($mapping, $device,
>>> $cfg) };
>>
>>
>> This hunk belongs in patch 03/10)
>
> why though?
>
> in 03/10 i adapt to the new $cfg parameter to check the mdev parameter,
> and here i refactor it because we have to access it twice
> (for the live-migration-capable flag)
>
> i mean i can pull it out into the $cfg before, but the live-migration
> part does IMO not belong in 03/10
>
Oh, sorry, I somehow thought this was a fix-up for the assert_valid()
call. You're right, it's fine as it is.
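For context, the effect of the flag once the -device argument is assembled is essentially the following sketch ($d being the parsed hostpci entry and $devicestr the argument under construction; both names are assumptions here, not taken from this patch):
if ($d->{'live-migration-capable'}) {
    # the QEMU default is 'auto'; requesting it explicitly makes the VM start
    # fail early if the driver cannot do live migration
    $devicestr .= ",enable-migration=on";
}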
* Re: [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions
2024-06-05 9:21 ` Dominik Csapak
@ 2024-06-05 9:38 ` Fiona Ebner
0 siblings, 0 replies; 44+ messages in thread
From: Fiona Ebner @ 2024-06-05 9:38 UTC (permalink / raw)
To: Dominik Csapak, Proxmox VE development discussion
Am 05.06.24 um 11:21 schrieb Dominik Csapak:
> On 5/31/24 15:37, Fiona Ebner wrote:
>> Am 19.04.24 um 14:45 schrieb Dominik Csapak:
>>> @@ -4518,28 +4520,31 @@ __PACKAGE__->register_method({
>>> # if vm is not running, return target nodes where local
>>> storage/mapped devices are available
>>> # for offline migration
>>> + $res->{allowed_nodes} = [];
>>> + $res->{not_allowed_nodes} = {};
>>> if (!$res->{running}) {
>>> - $res->{allowed_nodes} = [];
>>> - my $checked_nodes =
>>> PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
>>> - delete $checked_nodes->{$localnode};
>>> -
>>> - foreach my $node (keys %$checked_nodes) {
>>> - my $missing_mappings = $missing_mappings_by_node->{$node};
>>> - if (scalar($missing_mappings->@*)) {
>>> - $checked_nodes->{$node}->{'unavailable-resources'} =
>>> $missing_mappings;
>>> - next;
>>> - }
>>> + $res->{not_allowed_nodes} =
>>> PVE::QemuServer::check_local_storage_availability($vmconf, $storecfg);
>>> + delete $res->{not_allowed_nodes}->{$localnode};
>>> + }
>>> - if
>>> (!defined($checked_nodes->{$node}->{unavailable_storages})) {
>>> - push @{$res->{allowed_nodes}}, $node;
>>> - }
>>> + my $merged = { $res->{not_allowed_nodes}->%*,
>>> $missing_mappings_by_node->%* };
>>>
>>
>> If we'd need this, I'd just get the keys directly:
>> my @keys = keys { ... }->%*;
>>
>> But it just reads wrong. Why even consider the keys for the
>> not_allowed_nodes? Doesn't this just work because
>> $missing_mappings_by_node already contains all other node keys (and so
>> we can simply iterate over those keys)?
>
> huh? we need to iterate over all nodes in 'not_allowed_nodes' for the
> unavailable storage check
No, you want to iterate over all nodes (except localhost). The loop
below also constructs the list of allowed nodes after all.
>
> but we also want to include all nodes that are in the
> 'missing_mappings_by_node'
Which are all nodes (except localhost), no matter if there are missing
mappings or not, because check_local_resources() has:
my $missing_mappings_by_node = { map { $_ => [] } @$nodelist };
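Spelled out with a hypothetical three-node cluster (node names made up), this is why the merged key set cannot contain anything new:
# $localnode = 'node1'
# check_local_resources() pre-seeds an (empty) entry for every other node:
#   $missing_mappings_by_node = { node2 => [], node3 => [ 'some-mapping' ] }
# so, whatever ends up in $res->{not_allowed_nodes},
#   keys { $res->{not_allowed_nodes}->%*, $missing_mappings_by_node->%* }
# can only ever yield ('node2', 'node3') -- i.e. the full node list minus $localnode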
>
>
> nonetheless, i currently find the whole code also not very readable
> i'll try to come up with something better...
> (i.e. iterate over all nodes via get_nodelist and
> fill the allowed/not_allowed based on the missing storage/mapping list)
>
Yes, that's more straight-forward. All nodes except localhost ;)
>>
>>> + for my $node (keys $merged->%*) {
>>> + my $missing_mappings = $missing_mappings_by_node->{$node};
>>> + if (scalar($missing_mappings->@*)) {
>>> +
>>> $res->{not_allowed_nodes}->{$node}->{'unavailable-resources'} =
>>> $missing_mappings;
>>> + next;
>>> + }
>>> +
>>> + if (!$res->{running}) {
>>> + if
>>> (!defined($res->{not_allowed_nodes}->{$node}->{unavailable_storages})) {
>>> + push $res->{allowed_nodes}->@*, $node;
>>> + }
>>> }
>>> - $res->{not_allowed_nodes} = $checked_nodes;
>>> }
>>> my $local_disks = &$check_vm_disks_local($storecfg, $vmconf,
>>> $vmid);
>>> - $res->{local_disks} = [ values %$local_disks ];;
>>> + $res->{local_disks} = [ values %$local_disks ];
>>> $res->{local_resources} = $local_resources;
>>> $res->{'mapped-resources'} = [ keys $mapped_resources->%* ];
>
>
* Re: [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
` (23 preceding siblings ...)
2024-05-31 13:40 ` Fiona Ebner
@ 2024-06-06 9:38 ` Dominik Csapak
24 siblings, 0 replies; 44+ messages in thread
From: Dominik Csapak @ 2024-06-06 9:38 UTC (permalink / raw)
To: pve-devel
sent a v4 here: https://lists.proxmox.com/pipermail/pve-devel/2024-June/064088.html
Thread overview: 44+ messages
2024-04-19 12:45 [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 1/4] mapping: pci: rework properties check Dominik Csapak
2024-05-31 11:37 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 2/4] mapping: pci: check the mdev configuration on the device too Dominik Csapak
2024-05-31 12:02 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 3/4] mapping: pci: add 'live-migration-capable' flag to mappings Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH guest-common v3 4/4] mapping: remove find_on_current_node Dominik Csapak
2024-05-31 12:09 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 01/10] usb: mapping: move implementation of find_on_current_node here Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 02/10] pci: " Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 03/10] pci: mapping: check mdev config against hardware Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 04/10] stop cleanup: remove unnecessary tpmstate cleanup Dominik Csapak
2024-05-31 12:56 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 05/10] vm_stop_cleanup: add noerr parameter Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 06/10] migrate: call vm_stop_cleanup after stopping in phase3_cleanup Dominik Csapak
2024-05-31 12:56 ` Fiona Ebner
2024-06-05 8:49 ` Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 07/10] pci: set 'enable-migration' to on for live-migration marked mapped devices Dominik Csapak
2024-05-31 12:56 ` Fiona Ebner
2024-06-05 8:51 ` Dominik Csapak
2024-06-05 9:31 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 08/10] check_local_resources: add more info per mapped device and return as hash Dominik Csapak
2024-05-31 13:05 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 09/10] api: enable live migration for marked mapped pci devices Dominik Csapak
2024-05-31 13:13 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH qemu-server v3 10/10] api: include not mapped resources for running vms in migrate preconditions Dominik Csapak
2024-05-31 13:37 ` Fiona Ebner
2024-06-05 9:21 ` Dominik Csapak
2024-06-05 9:38 ` Fiona Ebner
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 1/5] mapping: pci: include mdev in config checks Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 2/5] bulk migrate: improve precondition checks Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 3/5] bulk migrate: include checks for live-migratable local resources Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 4/5] ui: adapt migration window to precondition api change Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH manager v3 5/5] fix #5175: ui: allow configuring and live migration of mapped pci resources Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH docs v3 1/2] qm: resource mapping: add description for `mdev` option Dominik Csapak
2024-04-19 12:45 ` [pve-devel] [PATCH docs v3 2/2] qm: resource mapping: document `live-migration-capable` setting Dominik Csapak
2024-05-28 7:25 ` [pve-devel] [PATCH guest-common/qemu-server/manager/docs v3 0/4] implement experimental vgpu live migration Dominik Csapak
2024-05-31 8:11 ` Eneko Lacunza via pve-devel
[not found] ` <954884c6-3a53-4b3a-bdc8-93478037b6a6@binovo.es>
2024-05-31 8:41 ` Dominik Csapak
2024-06-03 8:43 ` Eneko Lacunza via pve-devel
2024-05-31 11:14 ` Fiona Ebner
2024-06-05 8:01 ` Dominik Csapak
2024-05-31 13:40 ` Fiona Ebner
2024-06-06 9:38 ` Dominik Csapak