public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases
@ 2023-05-12 12:40 Aaron Lauterer
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest Aaron Lauterer
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-12 12:40 UTC (permalink / raw)
  To: pve-devel

v1 was the series: avoid migrating disk images multiple times

We decided to take a different approach to avoid migrating potentially
aliased disk images / volumes in a migration.

Instead of scanning the storages for potential images belonging to the
guest, we only migrate images referenced in the config. This made it
necessary to add steps that used to be covered by the initial storage
scan; in particular, pending changes now need to be handled explicitly.

The patches are split into two parts for each repo: first we remove the
scan of all storages, then we check for aliases and fail.

Due to the hard fail, this series is intended for Proxmox VE 8!

More details in the patches themselves.

qemu-server: Aaron Lauterer (4):
  migration: only migrate disks used by the guest
  tests: add migration test for pending disk
  migration: fail when aliased volume is detected
  tests: add migration alias check

 PVE/QemuMigrate.pm                    |  81 ++++++++--------
 test/MigrationTest/QemuMigrateMock.pm |  10 ++
 test/run_qemu_migrate_tests.pl        | 132 ++++++++++++++++++++++++--
 3 files changed, 173 insertions(+), 50 deletions(-)

container: Aaron Lauterer (2):
  migration: only migrate volumes used by the guest
  migration: fail when aliased volume is detected

 src/PVE/LXC/Migrate.pm | 56 ++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 30 deletions(-)

-- 
2.30.2






* [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest
  2023-05-12 12:40 [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases Aaron Lauterer
@ 2023-05-12 12:40 ` Aaron Lauterer
  2023-05-22 11:59   ` Fiona Ebner
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 2/6] tests: add migration test for pending disk Aaron Lauterer
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-12 12:40 UTC (permalink / raw)
  To: pve-devel

When scanning all configured storages for disk images belonging to the
VM, the migration could easily fail if a storage is not available, but
enabled. That storage might not even be used by the VM at all.

By not doing that and only looking at the disk images referenced in the
VM config, we can avoid that.
Extra handling is needed for disk images currently in the 'pending'
section of the VM config. These disk images used to be detected by
scanning all storages before.
It is also necessary to fetch some information (size, format) about the
disk images explicitly that used to be provided by the initial scan of
all storages.

The big change regarding behavior is that disk images not referenced in
the VM config file will be ignored.  They are already orphans that used
to be migrated as well, but are now left where they are.  The tests have
been adapted to that changed behavior.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
 PVE/QemuMigrate.pm                    | 71 +++++++++++----------------
 test/MigrationTest/QemuMigrateMock.pm | 10 ++++
 test/run_qemu_migrate_tests.pl        | 12 ++---
 3 files changed, 44 insertions(+), 49 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 09cc1d8..1d21250 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -312,49 +312,6 @@ sub scan_local_volumes {
 	    $abort = 1;
 	};
 
-	my @sids = PVE::Storage::storage_ids($storecfg);
-	foreach my $storeid (@sids) {
-	    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
-	    next if $scfg->{shared} && !$self->{opts}->{remote};
-	    next if !PVE::Storage::storage_check_enabled($storecfg, $storeid, undef, 1);
-
-	    # get list from PVE::Storage (for unused volumes)
-	    my $dl = PVE::Storage::vdisk_list($storecfg, $storeid, $vmid, undef, 'images');
-
-	    next if @{$dl->{$storeid}} == 0;
-
-	    my $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $storeid);
-	    if (!$self->{opts}->{remote}) {
-		# check if storage is available on target node
-		my $target_scfg = PVE::Storage::storage_check_enabled(
-		    $storecfg,
-		    $targetsid,
-		    $self->{node},
-		);
-
-		die "content type 'images' is not available on storage '$targetsid'\n"
-		    if !$target_scfg->{content}->{images};
-
-	    }
-
-	    my $bwlimit = $self->get_bwlimit($storeid, $targetsid);
-
-	    PVE::Storage::foreach_volid($dl, sub {
-		my ($volid, $sid, $volinfo) = @_;
-
-		$local_volumes->{$volid}->{ref} = 'storage';
-		$local_volumes->{$volid}->{size} = $volinfo->{size};
-		$local_volumes->{$volid}->{targetsid} = $targetsid;
-		$local_volumes->{$volid}->{bwlimit} = $bwlimit;
-
-		# If with_snapshots is not set for storage migrate, it tries to use
-		# a raw+size stream, but on-the-fly conversion from qcow2 to raw+size
-		# back to qcow2 is currently not possible.
-		$local_volumes->{$volid}->{snapshots} = ($volinfo->{format} =~ /^(?:qcow2|vmdk)$/);
-		$local_volumes->{$volid}->{format} = $volinfo->{format};
-	    });
-	}
-
 	my $replicatable_volumes = !$self->{replication_jobcfg} ? {}
 	    : PVE::QemuConfig->get_replicatable_volumes($storecfg, $vmid, $conf, 0, 1);
 	foreach my $volid (keys %{$replicatable_volumes}) {
@@ -405,8 +362,23 @@ sub scan_local_volumes {
 
 	    $local_volumes->{$volid}->{ref} = $attr->{referenced_in_config} ? 'config' : 'snapshot';
 	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_unused};
+	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_pending};
 	    $local_volumes->{$volid}->{ref} = 'generated' if $attr->{is_tpmstate};
 
+	    my $bwlimit = $self->get_bwlimit($sid, $targetsid);
+	    $local_volumes->{$volid}->{targetsid} = $targetsid;
+	    $local_volumes->{$volid}->{bwlimit} = $bwlimit;
+
+	    my $volume_list = PVE::Storage::volume_list($storecfg, $sid, $vmid, 'images');
+	    # TODO could probably be done better than just iterating
+	    for my $volume (@$volume_list) {
+		if ($volume->{volid} eq $volid) {
+		    $local_volumes->{$volid}->{size} = $volume->{size};
+		    $local_volumes->{$volid}->{format} = $volume->{format};
+		    last;
+		}
+	    }
+
 	    $local_volumes->{$volid}->{is_vmstate} = $attr->{is_vmstate} ? 1 : 0;
 
 	    $local_volumes->{$volid}->{drivename} = $attr->{drivename}
@@ -450,6 +422,19 @@ sub scan_local_volumes {
 		if PVE::Storage::volume_is_base_and_used($storecfg, $volid);
 	};
 
+	# add pending disks first
+	if (defined $conf->{pending} && %{$conf->{pending}}) {
+	    PVE::QemuServer::foreach_volid($conf->{pending}, sub {
+		    my ($volid, $attr) = @_;
+		    $attr->{is_pending} = 1;
+		    eval { $test_volid->($volid, $attr); };
+		    if (my $err = $@) {
+			&$log_error($err, $volid);
+		    }
+		});
+	}
+
+	# add non-pending referenced disks
 	PVE::QemuServer::foreach_volid($conf, sub {
 	    my ($volid, $attr) = @_;
 	    eval { $test_volid->($volid, $attr); };
diff --git a/test/MigrationTest/QemuMigrateMock.pm b/test/MigrationTest/QemuMigrateMock.pm
index 94fe686..46af62a 100644
--- a/test/MigrationTest/QemuMigrateMock.pm
+++ b/test/MigrationTest/QemuMigrateMock.pm
@@ -230,6 +230,16 @@ $MigrationTest::Shared::storage_module->mock(
 	}
 	return $res;
     },
+    volume_list => sub {
+	my ($cfg, $sid, $vmid) = @_;
+
+	my $res = [];
+
+	for my $volume (@{$source_vdisks->{$sid}}) {
+	    push @$res, $volume if $volume->{vmid} eq $vmid;
+	}
+	return $res;
+    },
     vdisk_free => sub {
 	my ($scfg, $volid) = @_;
 
diff --git a/test/run_qemu_migrate_tests.pl b/test/run_qemu_migrate_tests.pl
index 3a3049d..fbe87e6 100755
--- a/test/run_qemu_migrate_tests.pl
+++ b/test/run_qemu_migrate_tests.pl
@@ -707,7 +707,6 @@ my $tests = [
 	},
     },
     {
-	# FIXME: Maybe add orphaned drives as unused?
 	name => '149_running_orphaned_disk_targetstorage_zfs',
 	target => 'pve1',
 	vmid => 149,
@@ -728,10 +727,11 @@ my $tests = [
 	},
 	expected_calls => $default_expected_calls_online,
 	expected => {
-	    source_volids => {},
+	    source_volids => {
+		'local-dir:149/vm-149-disk-0.qcow2' => 1,
+	    },
 	    target_volids => {
 		'local-zfs:vm-149-disk-10' => 1,
-		'local-zfs:vm-149-disk-0' => 1,
 	    },
 	    vm_config => get_patched_config(149, {
 		scsi0 => 'local-zfs:vm-149-disk-10,format=raw,size=4G',
@@ -744,7 +744,6 @@ my $tests = [
 	},
     },
     {
-	# FIXME: Maybe add orphaned drives as unused?
 	name => '149_running_orphaned_disk',
 	target => 'pve1',
 	vmid => 149,
@@ -764,10 +763,11 @@ my $tests = [
 	},
 	expected_calls => $default_expected_calls_online,
 	expected => {
-	    source_volids => {},
+	    source_volids => {
+		'local-dir:149/vm-149-disk-0.qcow2' => 1,
+	    },
 	    target_volids => {
 		'local-lvm:vm-149-disk-10' => 1,
-		'local-dir:149/vm-149-disk-0.qcow2' => 1,
 	    },
 	    vm_config => get_patched_config(149, {
 		scsi0 => 'local-lvm:vm-149-disk-10,format=raw,size=4G',
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v2 2/6] tests: add migration test for pending disk
  2023-05-12 12:40 [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases Aaron Lauterer
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest Aaron Lauterer
@ 2023-05-12 12:40 ` Aaron Lauterer
  2023-05-22 14:02   ` Fiona Ebner
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected Aaron Lauterer
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-12 12:40 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
 test/run_qemu_migrate_tests.pl | 65 ++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/test/run_qemu_migrate_tests.pl b/test/run_qemu_migrate_tests.pl
index fbe87e6..a2c7d0c 100755
--- a/test/run_qemu_migrate_tests.pl
+++ b/test/run_qemu_migrate_tests.pl
@@ -130,6 +130,25 @@ my $vm_configs = {
 	'startup' => 'order=2',
 	'vmgenid' => '4eb1d535-9381-4ddc-a8aa-af50c4d9177b',
     },
+    111 => {
+	'bootdisk' => 'scsi0',
+	'cores' => 1,
+	'ide0' => 'local-lvm:vm-111-disk-0,size=4096M',
+	'ide2' => 'none,media=cdrom',
+	'memory' => 512,
+	'name' => 'pending-test',
+	'net0' => 'virtio=4A:A3:E4:4C:CF:F0,bridge=vmbr0,firewall=1',
+	'numa' => 0,
+	'ostype' => 'l26',
+	'pending' => {
+		'scsi0' => 'local-zfs:vm-111-disk-0,size=103M',
+	},
+	'scsihw' => 'virtio-scsi-pci',
+	'snapshots' => {},
+	'smbios1' => 'uuid=5ad71d4d-8f73-4377-853e-2d22c10c96a5',
+	'sockets' => 1,
+	'vmgenid' => '2c00c030-0b5b-4988-a371-6ab259893f22',
+    },
     149 => {
 	'agent' => '0',
 	'bootdisk' => 'scsi0',
@@ -306,6 +325,13 @@ my $source_vdisks = {
 	    'vmid' => '341',
 	    'volid' => 'local-lvm:vm-341-disk-0',
 	},
+	{
+	    'ctime' => '1589277334',
+	    'format' => 'raw',
+	    'size' =>  4294967296,
+	    'vmid' => '111',
+	    'volid' => 'local-lvm:vm-111-disk-0',
+	},
     ],
     'local-zfs' => [
 	{
@@ -322,6 +348,13 @@ my $source_vdisks = {
 	    'vmid' => '105',
 	    'volid' => 'local-zfs:vm-105-disk-1',
 	},
+	{
+	    'ctime' => '1589277334',
+	    'format' => 'raw',
+	    'size' => 108003328,
+	    'vmid' => '111',
+	    'volid' => 'local-zfs:vm-111-disk-0',
+	},
 	{
 	    'format' => 'raw',
 	    'name' => 'vm-4567-disk-0',
@@ -1527,6 +1560,38 @@ my $tests = [
 	    },
 	},
     },
+    {
+	name => '111_running_pending',
+	target => 'pve1',
+	vmid => 111,
+	vm_status => {
+	    running => 1,
+	    runningmachine => 'pc-q35-5.0+pve0',
+	},
+	opts => {
+	    online => 1,
+	    'with-local-disks' => 1,
+	},
+	expected_calls => $default_expected_calls_online,
+	expected => {
+	    source_volids => {},
+	    target_volids => {
+		'local-zfs:vm-111-disk-0' => 1,
+		'local-lvm:vm-111-disk-10' => 1,
+	    },
+	    vm_config => get_patched_config(111, {
+		ide0 => 'local-lvm:vm-111-disk-10,format=raw,size=4G',
+		pending => {
+		    scsi0 => 'local-zfs:vm-111-disk-0,size=103M',
+		    snapshots => {},
+		},
+	    }),
+	    vm_status => {
+		running => 1,
+		runningmachine => 'pc-q35-5.0+pve0',
+	    },
+	},
+    },
 ];
 
 my $single_test_name = shift;
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected
  2023-05-12 12:40 [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases Aaron Lauterer
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest Aaron Lauterer
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 2/6] tests: add migration test for pending disk Aaron Lauterer
@ 2023-05-12 12:40 ` Aaron Lauterer
  2023-05-22 14:17   ` Fiona Ebner
  2023-05-25  8:15   ` Fiona Ebner
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 4/6] tests: add migration alias check Aaron Lauterer
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-12 12:40 UTC (permalink / raw)
  To: pve-devel

Aliased volumes (referencing the same disk image multiple times) can
lead to unexpected behavior in a migration.

Therefore, stop the migration in such a case.

The check works by comparing the path returned by the storage plugin.
This means that we should be able to catch the common situations where
it can happen:

* by referencing the same volid multiple times
* by having a different volid due to an aliased storage: a different
storage name pointing to the same location.

We decided against checking whether the storages themselves are aliased,
since that cannot be inferred reliably from the storage configuration
options alone.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
 PVE/QemuMigrate.pm | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 1d21250..c0fdbce 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -294,12 +294,12 @@ sub scan_local_volumes {
 
     my $storecfg = $self->{storecfg};
     eval {
-
 	# found local volumes and their origin
 	my $local_volumes = $self->{local_volumes};
 	my $local_volumes_errors = {};
 	my $other_errors = [];
 	my $abort = 0;
+	my $path_to_volid = {};
 
 	my $log_error = sub {
 	    my ($msg, $volid) = @_;
@@ -397,6 +397,8 @@ sub scan_local_volumes {
 	    die "owned by other VM (owner = VM $owner)\n"
 		if !$owner || ($owner != $vmid);
 
+	    $path_to_volid->{$path}->{$volid} = 1;
+
 	    return if $attr->{is_vmstate};
 
 	    if (defined($snaprefs)) {
@@ -443,6 +445,12 @@ sub scan_local_volumes {
 	    }
         });
 
+	for my $path (keys %$path_to_volid) {
+	    my @volids = keys %{$path_to_volid->{$path}};
+	    die "detected not supported aliased volumes: '" . join("', '", @volids) . "'"
+		if (scalar @volids > 1);
+	}
+
 	foreach my $vol (sort keys %$local_volumes) {
 	    my $type = $replicatable_volumes->{$vol} ? 'local, replicated' : 'local';
 	    my $ref = $local_volumes->{$vol}->{ref};
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v2 4/6] tests: add migration alias check
  2023-05-12 12:40 [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases Aaron Lauterer
                   ` (2 preceding siblings ...)
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected Aaron Lauterer
@ 2023-05-12 12:40 ` Aaron Lauterer
  2023-05-22 14:25   ` Fiona Ebner
  2023-05-12 12:40 ` [pve-devel] [PATCH container v2 5/6] migration: only migrate volumes used by the guest Aaron Lauterer
  2023-05-12 12:40 ` [pve-devel] [PATCH container v2 6/6] migration: fail when aliased volume is detected Aaron Lauterer
  5 siblings, 1 reply; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-12 12:40 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
 test/run_qemu_migrate_tests.pl | 55 ++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/test/run_qemu_migrate_tests.pl b/test/run_qemu_migrate_tests.pl
index a2c7d0c..9e2983a 100755
--- a/test/run_qemu_migrate_tests.pl
+++ b/test/run_qemu_migrate_tests.pl
@@ -63,6 +63,15 @@ my $storage_config = {
 	    sparse => 1,
 	    type => "zfspool",
 	},
+	"alias-zfs" => {
+	    content => {
+		images => 1,
+		rootdir => 1,
+	    },
+	    pool => "rpool/data",
+	    sparse => 1,
+	    type => "zfspool",
+	},
 	"rbd-store" => {
 	    monhost => "127.0.0.42,127.0.0.21,::1",
 	    fsid => 'fc4181a6-56eb-4f68-b452-8ba1f381ca2a',
@@ -149,6 +158,24 @@ my $vm_configs = {
 	'sockets' => 1,
 	'vmgenid' => '2c00c030-0b5b-4988-a371-6ab259893f22',
     },
+    123 => {
+	'bootdisk' => 'scsi0',
+	'cores' => 1,
+	'scsi0' => 'local-zfs:vm-123-disk-0,size=4096M',
+	'scsi1' => 'alias-zfs:vm-123-disk-0,size=4096M',
+	'ide2' => 'none,media=cdrom',
+	'memory' => 512,
+	'name' => 'alias-test',
+	'net0' => 'virtio=4A:A3:E4:4C:CF:F0,bridge=vmbr0,firewall=1',
+	'numa' => 0,
+	'ostype' => 'l26',
+	'pending' => {},
+	'scsihw' => 'virtio-scsi-pci',
+	'snapshots' => {},
+	'smbios1' => 'uuid=5ad71d4d-8f73-4377-853e-2d22c10c96a5',
+	'sockets' => 1,
+	'vmgenid' => '2c00c030-0b5b-4988-a371-6ab259893f22',
+    },
     149 => {
 	'agent' => '0',
 	'bootdisk' => 'scsi0',
@@ -355,6 +382,13 @@ my $source_vdisks = {
 	    'vmid' => '111',
 	    'volid' => 'local-zfs:vm-111-disk-0',
 	},
+	{
+	    'ctime' => '1589277334',
+	    'format' => 'raw',
+	    'size' => 4294967296,
+	    'vmid' => '123',
+	    'volid' => 'local-zfs:vm-123-disk-0',
+	},
 	{
 	    'format' => 'raw',
 	    'name' => 'vm-4567-disk-0',
@@ -1592,6 +1626,27 @@ my $tests = [
 	    },
 	},
     },
+    {
+	name => '123_alias_fail',
+	target => 'pve1',
+	vmid => 123,
+	vm_status => {
+	    running => 0,
+	},
+	opts => {
+	    'with-local-disks' => 1,
+	},
+	expected_calls => {},
+	expect_die => "detected not supported aliased volumes",
+	expected => {
+	    source_volids => local_volids_for_vm(123),
+	    target_volids => {},
+	    vm_config => $vm_configs->{123},
+	    vm_status => {
+		running => 0,
+	    },
+	},
+    },
 ];
 
 my $single_test_name = shift;
-- 
2.30.2






* [pve-devel] [PATCH container v2 5/6] migration: only migrate volumes used by the guest
  2023-05-12 12:40 [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases Aaron Lauterer
                   ` (3 preceding siblings ...)
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 4/6] tests: add migration alias check Aaron Lauterer
@ 2023-05-12 12:40 ` Aaron Lauterer
  2023-05-22 15:00   ` Fiona Ebner
  2023-05-12 12:40 ` [pve-devel] [PATCH container v2 6/6] migration: fail when aliased volume is detected Aaron Lauterer
  5 siblings, 1 reply; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-12 12:40 UTC (permalink / raw)
  To: pve-devel

When scanning all configured storages for volumes belonging to the
container, the migration could easily fail if a storage is not
available, but enabled. That storage might not even be used by the
container at all.

By not doing that and only looking at the disk images referenced in the
config, we can avoid that.
We need to add additional steps for pending volumes with checks if they
actually exist. Changing an existing mountpoint to a new volume
will only create the volume on the next start of the container.

The big change regarding behavior is that volumes not referenced in the
container config will be ignored.  They are already orphans that used to
be migrated as well, but are now left where they are.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
 src/PVE/LXC/Migrate.pm | 47 +++++++++++++++---------------------------
 1 file changed, 17 insertions(+), 30 deletions(-)

diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index ccf5157..8e11be7 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -195,6 +195,17 @@ sub phase1 {
 
 	return if !$volid;
 
+	# check if volume exists, might be a pending new one
+	eval {
+	    PVE::Storage::path($self->{storecfg}, $volid);
+	};
+	if ($@ =~ m/^unable to parse/) {
+	    $self->log('info', "volume '$volid' does not exist (pending change?)");
+	    return;
+	} elsif ($@) {
+	    die $@;
+	}
+
 	my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 
 	# check if storage is available on source node
@@ -256,42 +267,18 @@ sub phase1 {
 	&$log_error($@, $volid) if $@;
     };
 
-    # first unused / lost volumes owned by this container
-    my @sids = PVE::Storage::storage_ids($self->{storecfg});
-    foreach my $storeid (@sids) {
-	my $scfg = PVE::Storage::storage_config($self->{storecfg}, $storeid);
-	next if $scfg->{shared} && !$remote;
-	next if !PVE::Storage::storage_check_enabled($self->{storecfg}, $storeid, undef, 1);
-
-	# get list from PVE::Storage (for unreferenced volumes)
-	my $dl = PVE::Storage::vdisk_list($self->{storecfg}, $storeid, $vmid, undef, 'rootdir');
-
-	next if @{$dl->{$storeid}} == 0;
-
-	# check if storage is available on target node
-	my $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $storeid);
-	if (!$remote) {
-	    my $target_scfg = PVE::Storage::storage_check_enabled($self->{storecfg}, $targetsid, $self->{node});
-
-	    die "content type 'rootdir' is not available on storage '$targetsid'\n"
-		if !$target_scfg->{content}->{rootdir};
-	}
-
-	PVE::Storage::foreach_volid($dl, sub {
-	    my ($volid, $sid, $volname) = @_;
-
-	    $volhash->{$volid}->{ref} = 'storage';
-	    $volhash->{$volid}->{targetsid} = $targetsid;
-	});
-    }
-
-    # then all volumes referenced in snapshots
+    # first all volumes referenced in snapshots
     foreach my $snapname (keys %{$conf->{snapshots}}) {
 	&$test_volid($conf->{snapshots}->{$snapname}->{'vmstate'}, 0, undef)
 	    if defined($conf->{snapshots}->{$snapname}->{'vmstate'});
 	PVE::LXC::Config->foreach_volume($conf->{snapshots}->{$snapname}, $test_mp, $snapname);
     }
 
+    # then all pending volumes
+    if (defined $conf->{pending} && %{$conf->{pending}}) {
+	PVE::LXC::Config->foreach_volume($conf->{pending}, $test_mp);
+    }
+
     # finally all current volumes
     PVE::LXC::Config->foreach_volume_full($conf, { include_unused => 1 }, $test_mp);
 
-- 
2.30.2






* [pve-devel] [PATCH container v2 6/6] migration: fail when aliased volume is detected
  2023-05-12 12:40 [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases Aaron Lauterer
                   ` (4 preceding siblings ...)
  2023-05-12 12:40 ` [pve-devel] [PATCH container v2 5/6] migration: only migrate volumes used by the guest Aaron Lauterer
@ 2023-05-12 12:40 ` Aaron Lauterer
  5 siblings, 0 replies; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-12 12:40 UTC (permalink / raw)
  To: pve-devel

Aliased volumes (referencing the same volume multiple times) can lead to
unexpected behavior in a migration.

Therefore, stop the migration in such a case.

The check works by comparing the path returned by the storage plugin.
This means that we should be able to catch the common situations where
it can happen:

* by referencing the same volid multiple times
* by having a different volid due to an aliased storage: a different
storage name pointing to the same location.

We decided against checking whether the storages themselves are aliased,
since that cannot be inferred reliably from the storage configuration
options alone.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
 src/PVE/LXC/Migrate.pm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index 8e11be7..e542ce7 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -182,6 +182,7 @@ sub phase1 {
     my $volhash = {}; # 'config', 'snapshot' or 'storage' for local volumes
     my $volhash_errors = {};
     my $abort = 0;
+    my $path_to_volid = {};
 
     my $log_error = sub {
 	my ($msg, $volid) = @_;
@@ -236,6 +237,8 @@ sub phase1 {
 	die "owned by other guest (owner = $owner)\n"
 	    if !$owner || ($owner != $self->{vmid});
 
+	$path_to_volid->{$path}->{$volid} = 1;
+
 	if (defined($snapname)) {
 	    # we cannot migrate shapshots on local storage
 	    # exceptions: 'zfspool', 'btrfs'
@@ -282,6 +285,12 @@ sub phase1 {
     # finally all current volumes
     PVE::LXC::Config->foreach_volume_full($conf, { include_unused => 1 }, $test_mp);
 
+    for my $path (keys %$path_to_volid) {
+	my @volids = keys %{$path_to_volid->{$path}};
+	die "detected not supported aliased volumes: '" . join("', '", @volids) . "'"
+	    if (scalar @volids > 1);
+    }
+
     # additional checks for local storage
     foreach my $volid (keys %$volhash) {
 	eval {
-- 
2.30.2






* Re: [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest Aaron Lauterer
@ 2023-05-22 11:59   ` Fiona Ebner
  2023-05-24 15:00     ` Aaron Lauterer
  0 siblings, 1 reply; 17+ messages in thread
From: Fiona Ebner @ 2023-05-22 11:59 UTC (permalink / raw)
  To: Proxmox VE development discussion, Aaron Lauterer

Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
> When scanning all configured storages for disk images belonging to the
> VM, the migration could easily fail if a storage is not available, but
> enabled. That storage might not even be used by the VM at all.
> 
> By not doing that and only looking at the disk images referenced in the
> VM config, we can avoid that.
> Extra handling is needed for disk images currently in the 'pending'
> section of the VM config. These disk images used to be detected by
> scanning all storages before.
> It is also necessary to fetch some information (size, format) about the
> disk images explicitly that used to be provided by the initial scan of
> all storages.
> 
> The big change regarding behavior is that disk images not referenced in
> the VM config file will be ignored.  They are already orphans that used
> to be migrated as well, but are now left where they are.  The tests have
> been adapted to that changed behavior.
> 
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
>  PVE/QemuMigrate.pm                    | 71 +++++++++++----------------
>  test/MigrationTest/QemuMigrateMock.pm | 10 ++++
>  test/run_qemu_migrate_tests.pl        | 12 ++---
>  3 files changed, 44 insertions(+), 49 deletions(-)
> 
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index 09cc1d8..1d21250 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -312,49 +312,6 @@ sub scan_local_volumes {
>  	    $abort = 1;
>  	};
>  
> -	my @sids = PVE::Storage::storage_ids($storecfg);
> -	foreach my $storeid (@sids) {
> -	    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
> -	    next if $scfg->{shared} && !$self->{opts}->{remote};
> -	    next if !PVE::Storage::storage_check_enabled($storecfg, $storeid, undef, 1);
> -
> -	    # get list from PVE::Storage (for unused volumes)
> -	    my $dl = PVE::Storage::vdisk_list($storecfg, $storeid, $vmid, undef, 'images');
> -
> -	    next if @{$dl->{$storeid}} == 0;
> -
> -	    my $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $storeid);
> -	    if (!$self->{opts}->{remote}) {
> -		# check if storage is available on target node
> -		my $target_scfg = PVE::Storage::storage_check_enabled(
> -		    $storecfg,
> -		    $targetsid,
> -		    $self->{node},
> -		);
> -
> -		die "content type 'images' is not available on storage '$targetsid'\n"
> -		    if !$target_scfg->{content}->{images};

This one might be worth re-adding after the storage enabled check
further below. The same check is already done in prepare() for the
result of get_vm_volumes(), but that does not (currently ;)) include
pending ones (a bit of foreshadowing here :P)

Or actually, let's use the improved vtype-aware version from prepare():

>             my ($vtype) = PVE::Storage::parse_volname($storecfg, $volid);
> 
>             die "$volid: content type '$vtype' is not available on storage '$targetsid'\n"
>                 if !$target_scfg->{content}->{$vtype};
Might even be worth factoring out the whole block including the
    if (!$self->{opts}->{remote}) {
    ...
    }
into a mini-helper? But it's only used twice, so do as you like :)
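
Untested sketch of what I have in mind, in case it helps (helper name
made up, adapt as you see fit):

    my $check_target_storage = sub {
        my ($targetsid, $volid) = @_;

        return if $self->{opts}->{remote};

        # check if storage is available on target node
        my $target_scfg = PVE::Storage::storage_check_enabled(
            $storecfg,
            $targetsid,
            $self->{node},
        );

        my ($vtype) = PVE::Storage::parse_volname($storecfg, $volid);

        die "$volid: content type '$vtype' is not available on storage '$targetsid'\n"
            if !$target_scfg->{content}->{$vtype};
    };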

(...)

> @@ -405,8 +362,23 @@ sub scan_local_volumes {
>  
>  	    $local_volumes->{$volid}->{ref} = $attr->{referenced_in_config} ? 'config' : 'snapshot';
>  	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_unused};
> +	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_pending};
>  	    $local_volumes->{$volid}->{ref} = 'generated' if $attr->{is_tpmstate};
>  
> +	    my $bwlimit = $self->get_bwlimit($sid, $targetsid);
> +	    $local_volumes->{$volid}->{targetsid} = $targetsid;
> +	    $local_volumes->{$volid}->{bwlimit} = $bwlimit;
> +
> +	    my $volume_list = PVE::Storage::volume_list($storecfg, $sid, $vmid, 'images');
> +	    # TODO could probably be done better than just iterating

volume_size_info() to the rescue :) Would avoid the loop and quite a bit
of overhead from calling volume_list() for each individual volume.
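
Untested, but I'd expect something along these lines to be enough; in
list context it should give you both size and format:

    my ($size, $format) = PVE::Storage::volume_size_info($storecfg, $volid);
    $local_volumes->{$volid}->{size} = $size;
    $local_volumes->{$volid}->{format} = $format;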

> +	    for my $volume (@$volume_list) {
> +		if ($volume->{volid} eq $volid) {
> +		    $local_volumes->{$volid}->{size} = $volume->{size};
> +		    $local_volumes->{$volid}->{format} = $volume->{format};
> +		    last;
> +		}
> +	    }
> +
>  	    $local_volumes->{$volid}->{is_vmstate} = $attr->{is_vmstate} ? 1 : 0;
>  
>  	    $local_volumes->{$volid}->{drivename} = $attr->{drivename}
> @@ -450,6 +422,19 @@ sub scan_local_volumes {
>  		if PVE::Storage::volume_is_base_and_used($storecfg, $volid);
>  	};
>  
> +	# add pending disks first
> +	if (defined $conf->{pending} && %{$conf->{pending}}) {

Style nit: please use parentheses for defined. And $conf->{pending}->%*
is slightly nicer, because it can be read from left to right, rather
than needing to look at the inner bit first and then remember the % on
the outside ;)

> +	    PVE::QemuServer::foreach_volid($conf->{pending}, sub {

Should we rather expand foreach_volid() instead? It handles snapshots
already, so handling pending too doesn't seem fundamentally wrong.

Let's check the existing callers and whether they are fine with the change:
1. this one right here: wants pending
2. API2/Qemu.pm: check_vm_disks_local() for migration precondition:
related to migration, so more consistent with pending
3. QemuConfig.pm: get_replicatable_volumes(): can be fine with pending,
but it's a change, of course!
4. QemuServer.pm: get_vm_volumes(): is used multiple times by:
4a. vm_stop_cleanup() to deactivate/unmap: should also be fine with
including pending
4b. QemuMigrate.pm: in prepare(): part of migration, so more consistent
with pending
4c. QemuMigrate.pm: in phase3_cleanup() for deactivation: part of
migration, so more consistent with pending

So the only issue is with replication, where we can decide between:
1. Also include pending volumes for replication (would be fine by me).
2. Keep behavior as-is. But then we need to adapt. Some possibilities:
2a. Add a new attribute like 'pending-only' in the result of
foreach_volid(), so that get_replicatable_volumes() can filter them out.
2b. Switch to a different function like foreach_volume() and get the two
attributes that are used there (cdrom and replicate) manually.
2c. Add a switch to foreach_volid() whether it should include volumes
that are only in pending.
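
For 2a, the consumer side could then boil down to something like this
(attribute name made up, untested):

    # in the get_replicatable_volumes() callback: skip volumes that only
    # show up in the pending section
    return if $attr->{'is_pending_only'};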

> +		    my ($volid, $attr) = @_;
> +		    $attr->{is_pending} = 1;
> +		    eval { $test_volid->($volid, $attr); };
> +		    if (my $err = $@) {
> +			&$log_error($err, $volid);
> +		    }
> +		});
> +	}
> +
> +	# add non-pending referenced disks
>  	PVE::QemuServer::foreach_volid($conf, sub {
>  	    my ($volid, $attr) = @_;
>  	    eval { $test_volid->($volid, $attr); };





* Re: [pve-devel] [PATCH qemu-server v2 2/6] tests: add migration test for pending disk
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 2/6] tests: add migration test for pending disk Aaron Lauterer
@ 2023-05-22 14:02   ` Fiona Ebner
  0 siblings, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2023-05-22 14:02 UTC (permalink / raw)
  To: Proxmox VE development discussion, Aaron Lauterer

Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
> +	    vm_config => get_patched_config(111, {
> +		ide0 => 'local-lvm:vm-111-disk-10,format=raw,size=4G',
> +		pending => {
> +		    scsi0 => 'local-zfs:vm-111-disk-0,size=103M',
> +		    snapshots => {},
> +		},

I was wondering why you would need this. In the last patch, you call
PVE::QemuServer::foreach_volid($conf->{pending}) and I think that
auto-vivifies $conf->{pending}->{snapshots}. Another reason to extend
the function instead of calling it with a non-full config ;)
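
For anyone curious, a minimal standalone example of the effect (plain
Perl, no PVE code involved):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $conf = { pending => { scsi0 => 'local-zfs:vm-111-disk-0,size=103M' } };

    # mimics a "keys %{$conf->{snapshots}}" style access inside
    # foreach_volid() when it is handed $conf->{pending} as if it were a
    # full config
    my @snaps = keys %{ $conf->{pending}->{snapshots} };

    # the dereference above created pending->{snapshots} = {} as a side effect
    print exists $conf->{pending}->{snapshots} ? "auto-vivified\n" : "untouched\n";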





* Re: [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected Aaron Lauterer
@ 2023-05-22 14:17   ` Fiona Ebner
  2023-05-24 14:40     ` Aaron Lauterer
  2023-05-25  8:15   ` Fiona Ebner
  1 sibling, 1 reply; 17+ messages in thread
From: Fiona Ebner @ 2023-05-22 14:17 UTC (permalink / raw)
  To: Proxmox VE development discussion, Aaron Lauterer

Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
> Aliased volumes (referencing the same disk image multiple times) can
> lead to unexpected behavior in a migration.

Not only migration, but snapshots, storage locking, etc. Should we
actually care here? I still think it is rather something that people
should be made aware of at the storage layer. Maybe a big enough warning
in the documentation is enough?

Since it's not only migration, should we add a warning during VM startup
instead/additionally?

> @@ -443,6 +445,12 @@ sub scan_local_volumes {
>  	    }
>          });
>  
> +	for my $path (keys %$path_to_volid) {
> +	    my @volids = keys %{$path_to_volid->{$path}};
> +	    die "detected not supported aliased volumes: '" . join("', '", @volids) . "'"
> +		if (scalar @volids > 1);

Style nit: please use parentheses for scalar

> +	}
> +
>  	foreach my $vol (sort keys %$local_volumes) {
>  	    my $type = $replicatable_volumes->{$vol} ? 'local, replicated' : 'local';
>  	    my $ref = $local_volumes->{$vol}->{ref};





* Re: [pve-devel] [PATCH qemu-server v2 4/6] tests: add migration alias check
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 4/6] tests: add migration alias check Aaron Lauterer
@ 2023-05-22 14:25   ` Fiona Ebner
  2023-05-24 14:41     ` Aaron Lauterer
  0 siblings, 1 reply; 17+ messages in thread
From: Fiona Ebner @ 2023-05-22 14:25 UTC (permalink / raw)
  To: Proxmox VE development discussion, Aaron Lauterer

Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
> @@ -355,6 +382,13 @@ my $source_vdisks = {
>  	    'vmid' => '111',
>  	    'volid' => 'local-zfs:vm-111-disk-0',
>  	},
> +	{
> +	    'ctime' => '1589277334',
> +	    'format' => 'raw',
> +	    'size' => 4294967296,
> +	    'vmid' => '123',
> +	    'volid' => 'local-zfs:vm-123-disk-0',
> +	},

A disk should also be added to alias-zfs. And actually, let's not make
alias-zfs reference "rpool/data", but add a second alias-zfs-2 or
something. Otherwise, we would need to add all disks currently in
local-zfs to alias-zfs too, to stay consistent with what we model.
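
E.g. something like this in the test's storage config (pool name made
up), so that alias-zfs and alias-zfs-2 alias each other and local-zfs
stays untouched:

    "alias-zfs" => {
        content => {
            images => 1,
            rootdir => 1,
        },
        pool => "aliaspool",
        sparse => 1,
        type => "zfspool",
    },
    "alias-zfs-2" => {
        content => {
            images => 1,
            rootdir => 1,
        },
        pool => "aliaspool",
        sparse => 1,
        type => "zfspool",
    },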

>  	{
>  	    'format' => 'raw',
>  	    'name' => 'vm-4567-disk-0',






* Re: [pve-devel] [PATCH container v2 5/6] migration: only migrate volumes used by the guest
  2023-05-12 12:40 ` [pve-devel] [PATCH container v2 5/6] migration: only migrate volumes used by the guest Aaron Lauterer
@ 2023-05-22 15:00   ` Fiona Ebner
  0 siblings, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2023-05-22 15:00 UTC (permalink / raw)
  To: Proxmox VE development discussion, Aaron Lauterer

Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
> When scanning all configured storages for volumes belonging to the
> container, the migration could easily fail if a storage is not
> available, but enabled. That storage might not even be used by the
> container at all.
> 
> By not doing that and only looking at the disk images referenced in the
> config, we can avoid that.
> We need to add additional steps for pending volumes with checks if they
> actually exist. Changing an existing mountpoint to a new volume
> will only create the volume on the next start of the container.
> 
> The big change regarding behavior is that volumes not referenced in the
> container config will be ignored.  They are already orphans that used to
> be migrated as well, but are now left where they are.
> 
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
>  src/PVE/LXC/Migrate.pm | 47 +++++++++++++++---------------------------
>  1 file changed, 17 insertions(+), 30 deletions(-)
> 
> diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
> index ccf5157..8e11be7 100644
> --- a/src/PVE/LXC/Migrate.pm
> +++ b/src/PVE/LXC/Migrate.pm
> @@ -195,6 +195,17 @@ sub phase1 {
>  
>  	return if !$volid;
>  
> +	# check if volume exists, might be a pending new one
> +	eval {
> +	    PVE::Storage::path($self->{storecfg}, $volid);
> +	};
> +	if ($@ =~ m/^unable to parse/) {

I'd really like to avoid such manual error message matching, especially
across package boundaries. PVE::Storage::path() calls parse_volume_id(),
but that also works for storeid:1 for example, so not triggering an
error for such pending changes. PVE::Storage::path() then calls the
plugin's path(), but the plugin can use whatever error message it likes,
so matching for something specific only covers certain plugins. Can't we
just filter out the yet-to-be-created volumes by matching the NEW_DISK_RE?
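
Rough sketch of what I mean (untested; adjust to wherever NEW_DISK_RE
actually lives and however it is reachable from Migrate.pm):

    # a pending, not-yet-created volume is only 'STORAGE:SIZE' in the
    # config, so there is nothing on the source side to migrate yet
    if ($volid =~ $PVE::LXC::Config::NEW_DISK_RE) {
        $self->log('info', "volume '$volid' does not exist yet (pending change)");
        return;
    }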

> +	    $self->log('info', "volume '$volid' does not exist (pending change?)");
> +	    return;
> +	} elsif ($@) {
> +	    die $@;
> +	}
> +
>  	my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
>  
>  	# check if storage is available on source node

(...)

> +    # first all volumes referenced in snapshots
>      foreach my $snapname (keys %{$conf->{snapshots}}) {
>  	&$test_volid($conf->{snapshots}->{$snapname}->{'vmstate'}, 0, undef)
>  	    if defined($conf->{snapshots}->{$snapname}->{'vmstate'});
>  	PVE::LXC::Config->foreach_volume($conf->{snapshots}->{$snapname}, $test_mp, $snapname);
>      }
>  
> +    # then all pending volumes
> +    if (defined $conf->{pending} && %{$conf->{pending}}) {

Style nit: please use parentheses for defined

> +	PVE::LXC::Config->foreach_volume($conf->{pending}, $test_mp);
> +    }
> +
>      # finally all current volumes
>      PVE::LXC::Config->foreach_volume_full($conf, { include_unused => 1 }, $test_mp);
>  





* Re: [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected
  2023-05-22 14:17   ` Fiona Ebner
@ 2023-05-24 14:40     ` Aaron Lauterer
  2023-05-25  8:14       ` Fiona Ebner
  0 siblings, 1 reply; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-24 14:40 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion



On 5/22/23 16:17, Fiona Ebner wrote:
> Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
>> Aliased volumes (referencing the same disk image multiple times) can
>> lead to unexpected behavior in a migration.
> 
> Not only migration, but snapshots, storage locking, etc. Should we
> actually care here? I still think it is rather something that people
> should be made aware of at the storage layer. Maybe a big enough warning
> in the documentation is enough?
> 
> Since it's not only migration, should we add a warning during VM startup
> instead/additionally?
> 

I guess a warning in the docs would be low-hanging fruit -> added to my TODO.
Snapshots should just fail the second time as we already have one with the same 
name, right?

AFAIU storage migration is a case where an aliased volume can lead to very 
unexpected behavior, like storages running full. So a check and die is probably 
a good idea.

An additional warning during startup could probably be a good idea. In that 
case, the checks should probably be factored out into their own method so we can 
call it from the start as well. Though that would probably mean another round
of iterating through the config instead of inlining it.






* Re: [pve-devel] [PATCH qemu-server v2 4/6] tests: add migration alias check
  2023-05-22 14:25   ` Fiona Ebner
@ 2023-05-24 14:41     ` Aaron Lauterer
  0 siblings, 0 replies; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-24 14:41 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion



On 5/22/23 16:25, Fiona Ebner wrote:
> Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
>> @@ -355,6 +382,13 @@ my $source_vdisks = {
>>   	    'vmid' => '111',
>>   	    'volid' => 'local-zfs:vm-111-disk-0',
>>   	},
>> +	{
>> +	    'ctime' => '1589277334',
>> +	    'format' => 'raw',
>> +	    'size' => 4294967296,
>> +	    'vmid' => '123',
>> +	    'volid' => 'local-zfs:vm-123-disk-0',
>> +	},
> 
> A disk should also be added to alias-zfs. And actually, let's not make
> alias-zfs reference "rpool/data", but add a second alias-zfs-2 or
> something. Otherwise, we would need to add all disks currently in
> local-zfs to alias-zfs too, to stay consistent with what we model.
> 

good point. will do :)
>>   	{
>>   	    'format' => 'raw',
>>   	    'name' => 'vm-4567-disk-0',
> 





* Re: [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest
  2023-05-22 11:59   ` Fiona Ebner
@ 2023-05-24 15:00     ` Aaron Lauterer
  0 siblings, 0 replies; 17+ messages in thread
From: Aaron Lauterer @ 2023-05-24 15:00 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion



On 5/22/23 13:59, Fiona Ebner wrote:
> Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
>> When scanning all configured storages for disk images belonging to the
>> VM, the migration could easily fail if a storage is not available, but
>> enabled. That storage might not even be used by the VM at all.
>>
>> By not doing that and only looking at the disk images referenced in the
>> VM config, we can avoid that.
>> Extra handling is needed for disk images currently in the 'pending'
>> section of the VM config. These disk images used to be detected by
>> scanning all storages before.
>> It is also necessary to fetch some information (size, format) about the
>> disk images explicitly that used to be provided by the initial scan of
>> all storages.
>>
>> The big change regarding behavior is that disk images not referenced in
>> the VM config file will be ignored.  They are already orphans that used
>> to be migrated as well, but are now left where they are.  The tests have
>> been adapted to that changed behavior.
>>
>> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
>> ---
>>   PVE/QemuMigrate.pm                    | 71 +++++++++++----------------
>>   test/MigrationTest/QemuMigrateMock.pm | 10 ++++
>>   test/run_qemu_migrate_tests.pl        | 12 ++---
>>   3 files changed, 44 insertions(+), 49 deletions(-)
>>
>> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
>> index 09cc1d8..1d21250 100644
>> --- a/PVE/QemuMigrate.pm
>> +++ b/PVE/QemuMigrate.pm
>> @@ -312,49 +312,6 @@ sub scan_local_volumes {
>>   	    $abort = 1;
>>   	};
>>   
>> -	my @sids = PVE::Storage::storage_ids($storecfg);
>> -	foreach my $storeid (@sids) {
>> -	    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
>> -	    next if $scfg->{shared} && !$self->{opts}->{remote};
>> -	    next if !PVE::Storage::storage_check_enabled($storecfg, $storeid, undef, 1);
>> -
>> -	    # get list from PVE::Storage (for unused volumes)
>> -	    my $dl = PVE::Storage::vdisk_list($storecfg, $storeid, $vmid, undef, 'images');
>> -
>> -	    next if @{$dl->{$storeid}} == 0;
>> -
>> -	    my $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $storeid);
>> -	    if (!$self->{opts}->{remote}) {
>> -		# check if storage is available on target node
>> -		my $target_scfg = PVE::Storage::storage_check_enabled(
>> -		    $storecfg,
>> -		    $targetsid,
>> -		    $self->{node},
>> -		);
>> -
>> -		die "content type 'images' is not available on storage '$targetsid'\n"
>> -		    if !$target_scfg->{content}->{images};
> 
> This one might be worth re-adding after the storage enabled check
> further below. The same check is already done in prepare() for the
> result of get_vm_volumes(), but that does not (currently ;)) include
> pending ones (a bit of foreshadowing here :P)
> 
> Or actually, let's use the improved vtype-aware version from prepare():
> 
>>              my ($vtype) = PVE::Storage::parse_volname($storecfg, $volid);
>>
>>              die "$volid: content type '$vtype' is not available on storage '$targetsid'\n"
>>                  if !$target_scfg->{content}->{$vtype};
> Might even be worth factoring out the whole block including the
>      if (!$self->{opts}->{remote}) {
>      ...
>      }
> into a mini-helper? But it's only used twice, so do as you like :)
> 
> (...)

okay, will look into it.
> 
>> @@ -405,8 +362,23 @@ sub scan_local_volumes {
>>   
>>   	    $local_volumes->{$volid}->{ref} = $attr->{referenced_in_config} ? 'config' : 'snapshot';
>>   	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_unused};
>> +	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_pending};
>>   	    $local_volumes->{$volid}->{ref} = 'generated' if $attr->{is_tpmstate};
>>   
>> +	    my $bwlimit = $self->get_bwlimit($sid, $targetsid);
>> +	    $local_volumes->{$volid}->{targetsid} = $targetsid;
>> +	    $local_volumes->{$volid}->{bwlimit} = $bwlimit;
>> +
>> +	    my $volume_list = PVE::Storage::volume_list($storecfg, $sid, $vmid, 'images');
>> +	    # TODO could probably be done better than just iterating
> 
> volume_size_info() to the rescue :) Would avoid the loop and quite a bit
> of overhead from calling volume_list() for each individual volume.

ah thanks. looks like I can get both size and format from it :)

> 
>> +	    for my $volume (@$volume_list) {
>> +		if ($volume->{volid} eq $volid) {
>> +		    $local_volumes->{$volid}->{size} = $volume->{size};
>> +		    $local_volumes->{$volid}->{format} = $volume->{format};
>> +		    last;
>> +		}
>> +	    }
>> +
>>   	    $local_volumes->{$volid}->{is_vmstate} = $attr->{is_vmstate} ? 1 : 0;
>>   
>>   	    $local_volumes->{$volid}->{drivename} = $attr->{drivename}
>> @@ -450,6 +422,19 @@ sub scan_local_volumes {
>>   		if PVE::Storage::volume_is_base_and_used($storecfg, $volid);
>>   	};
>>   
>> +	# add pending disks first
>> +	if (defined $conf->{pending} && %{$conf->{pending}}) {
> 
> Style nit: please use parentheses for defined. And $conf->{pending}->%*
> is slightly nicer, because it can be read from left to right, rather
> than needing to look at the inner bit first and then remember the % on
> the outside ;)
> 
>> +	    PVE::QemuServer::foreach_volid($conf->{pending}, sub {
> 
> Should we rather expand foreach_volid() instead? It handles snapshots
> already, so handling pending too doesn't seem fundamentally wrong.
> 
> Let's check the existing callers and whether they are fine with the change:
> 1. this one right here: wants pending
> 2. API2/Qemu.pm: check_vm_disks_local() for migration precondition:
> related to migration, so more consistent with pending
> 3. QemuConfig.pm: get_replicatable_volumes(): can be fine with pending,
> but it's a change of course!
> 4. QemuServer.pm: get_vm_volumes(): is used multiple times by:
> 4a. vm_stop_cleanup() to deactivate/unmap: should also be fine with
> including pending
> 4b. QemuMigrate.pm: in prepare(): part of migration, so more consistent
> with pending
> 4c. QemuMigrate.pm: in phase3_cleanup() for deactivation: part of
> migration, so more consistent with pending
> 
> So the only issue is with replication, where we can decide between:
> 1. Also include pending volumes for replication (would be fine by me).
> 2. Keep behavior as-is. But then we need to adapt. Some possibilities:
> 2a. Add a new attribute like 'pending-only' in the result of
> foreach_volid(), so that get_replicatable_volumes() can filter them out.
> 2b. Switch to a different function like foreach_volume() and get the two
> attributes that are used there (cdrom and replicate) manually.
> 2c. Add a switch to foreach_volid() whether it should include volumes
> that are only in pending.

I'll give 1 a try and see how it behaves with replication. If that doesn't work, I'd 
rather go for 2a or 2c, though I am not yet sure which one would be better in 
the long run. 2c would probably be "nicer" as the caller can decide what they 
want included, instead of having to check for and deal with a return attribute.

> 
>> +		    my ($volid, $attr) = @_;
>> +		    $attr->{is_pending} = 1;
>> +		    eval { $test_volid->($volid, $attr); };
>> +		    if (my $err = $@) {
>> +			&$log_error($err, $volid);
>> +		    }
>> +		});
>> +	}
>> +
>> +	# add non-pending referenced disks
>>   	PVE::QemuServer::foreach_volid($conf, sub {
>>   	    my ($volid, $attr) = @_;
>>   	    eval { $test_volid->($volid, $attr); };





* Re: [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected
  2023-05-24 14:40     ` Aaron Lauterer
@ 2023-05-25  8:14       ` Fiona Ebner
  0 siblings, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2023-05-25  8:14 UTC (permalink / raw)
  To: Aaron Lauterer, Proxmox VE development discussion

Am 24.05.23 um 16:40 schrieb Aaron Lauterer:
> On 5/22/23 16:17, Fiona Ebner wrote:
>> Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
>>> Aliased volumes (referencing the same disk image multiple times) can
>>> lead to unexpected behavior in a migration.
>>
>> Not only migration, but snapshots, storage locking, etc. Should we
>> actually care here? I still think it is rather something that people
>> should be made aware of at the storage layer. Maybe a big enough warning
>> in the documentation is enough?
>>
>> Since it's not only migration, should we add a warning during VM startup
>> instead/additionally?
>>
> 
> I guess a warning in the docs would be low-hanging fruit -> added to
> my TODO.

Great!

> Snapshots should just fail the second time as we already have one with
> the same name, right?

I think so. But it's only when the same aliased disk is attached twice.

> AFAIU storage migration is a case where an aliased volume can lead to
> very unexpected behavior, like storages running full. So a check and die
> is probably a good idea.

That is a good argument for the check during migration :)

> An additional warning during startup could probably be a good idea. In
> that case, the checks should probably be factored out into their own
> method so we can call it from the start as well. Though that would
> probably mean another round of iterating through the config instead of
> inlining it.

See my other reply to the original mail for 3/6 for a suggestion.





* Re: [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected
  2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected Aaron Lauterer
  2023-05-22 14:17   ` Fiona Ebner
@ 2023-05-25  8:15   ` Fiona Ebner
  1 sibling, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2023-05-25  8:15 UTC (permalink / raw)
  To: Proxmox VE development discussion, Aaron Lauterer

Am 12.05.23 um 14:40 schrieb Aaron Lauterer:
> Aliased volumes (referencing the same disk image multiple times) can
> lead to unexpected behavior in a migration.
> 
> Therefore, stop the migration in such a case.
> 
> The check works by comparing the path returned by the storage plugin.
> This means that we should be able to catch the common situations where
> it can happen:
> 
> * by referencing the same volid multiple times

This situation is not actually caught by the patch? If it's the same
volid, it will resolve to the same path, so it will just execute
$path_to_volid->{$path}->{$volid} = 1;
multiple times and there won't be any detection of multi-reference, or
what am I missing? Just incrementing the counter instead also doesn't
work, because e.g. multi-reference is fine when done by snapshots.

Maybe there should be a helper just for the currently attached disks?
E.g. iterate over the currently attached disks with
QemuConfig->foreach_volume() and for each volume, resolve the path and
count. If any path is referenced multiple times, complain. To not lose
the volids while counting, you could e.g. "count" with
push $path_to_volid->{$path}->@*, $volid;

The helper can then be called during migration in addition to your
current patch here.

And the helper can also be re-used during VM start (maybe only warn
there to allow for exotic use-cases?) and I'd expect it to already catch
most problematic situations there.
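
Very rough sketch of such a helper, just to illustrate the idea
(untested, name made up):

    sub check_aliased_attached_disks {
        my ($storecfg, $conf) = @_;

        my $path_to_volids = {};
        PVE::QemuConfig->foreach_volume($conf, sub {
            my ($ds, $drive) = @_;
            return if PVE::QemuServer::Drive::drive_is_cdrom($drive);
            my $volid = $drive->{file};

            # resolve to a path; skip anything the storage layer cannot resolve
            my $path = eval { PVE::Storage::path($storecfg, $volid) };
            return if !$path;

            push $path_to_volids->{$path}->@*, $volid;
        });

        for my $path (sort keys $path_to_volids->%*) {
            my $volids = $path_to_volids->{$path};
            die "aliased volumes detected: '" . join("', '", $volids->@*) . "'\n"
                if scalar($volids->@*) > 1;
        }
    }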

> @@ -397,6 +397,8 @@ sub scan_local_volumes {
>  	    die "owned by other VM (owner = VM $owner)\n"
>  		if !$owner || ($owner != $vmid);
>  
> +	    $path_to_volid->{$path}->{$volid} = 1;
> +
>  	    return if $attr->{is_vmstate};
>  
>  	    if (defined($snaprefs)) {
> @@ -443,6 +445,12 @@ sub scan_local_volumes {
>  	    }
>          });
>  
> +	for my $path (keys %$path_to_volid) {
> +	    my @volids = keys %{$path_to_volid->{$path}};
> +	    die "detected not supported aliased volumes: '" . join("', '", @volids) . "'"
> +		if (scalar @volids > 1);
> +	}
> +
>  	foreach my $vol (sort keys %$local_volumes) {
>  	    my $type = $replicatable_volumes->{$vol} ? 'local, replicated' : 'local';
>  	    my $ref = $local_volumes->{$vol}->{ref};





Thread overview: 17+ messages
2023-05-12 12:40 [pve-devel] [PATCH qemu-server, container v2 0/6] migration: don't scan all storages, fail on aliases Aaron Lauterer
2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 1/6] migration: only migrate disks used by the guest Aaron Lauterer
2023-05-22 11:59   ` Fiona Ebner
2023-05-24 15:00     ` Aaron Lauterer
2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 2/6] tests: add migration test for pending disk Aaron Lauterer
2023-05-22 14:02   ` Fiona Ebner
2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 3/6] migration: fail when aliased volume is detected Aaron Lauterer
2023-05-22 14:17   ` Fiona Ebner
2023-05-24 14:40     ` Aaron Lauterer
2023-05-25  8:14       ` Fiona Ebner
2023-05-25  8:15   ` Fiona Ebner
2023-05-12 12:40 ` [pve-devel] [PATCH qemu-server v2 4/6] tests: add migration alias check Aaron Lauterer
2023-05-22 14:25   ` Fiona Ebner
2023-05-24 14:41     ` Aaron Lauterer
2023-05-12 12:40 ` [pve-devel] [PATCH container v2 5/6] migration: only migrate volumes used by the guest Aaron Lauterer
2023-05-22 15:00   ` Fiona Ebner
2023-05-12 12:40 ` [pve-devel] [PATCH container v2 6/6] migration: fail when aliased volume is detected Aaron Lauterer
