public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD
@ 2025-10-24 12:25 Friedrich Weber
  2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 1/2] plugin: map/activate volume: allow callers to pass hints Friedrich Weber
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Friedrich Weber @ 2025-10-24 12:25 UTC (permalink / raw)
  To: pve-devel

When using Windows VMs with KRBD, it can be necessary to pass the rxbounce
option to `rbd map` to avoid "bad crc/signature" warnings in the journal,
retransmits and degraded performance. A first attempt to allow this [0] added
an rxbounce storage option, but Thomas pointed out that this might nudge users
towards creating two storages pointing to the same Ceph pool (one for Linux VMs
that do not need rxbounce, one for Windows VMs that need rxbounce), which we
should discourage.

The cleanest solution from the user POV would for us to pass rxbounce only when
mapping disks of VMs with Windows OS type, but this is not straightforward to
implement, as pve-storage currently does not have knowledge about the "owner"
of the volumes it manages. Fabian and Thomas suggested an approach to pass
well-defined "hints" from e.g. qemu-server to pve-storage [0], one of which
could be the guest's OS type, and based on which the RBD plugin could decide
whether to pass rxbounce or not. We'd need to take care to avoid too much
coupling between qemu-server and pve-storage, though.

Here I took a first attempt at implementing such "hints" after an off-list
discussion with Thomas. This RFC is intended as a base for further discussion
whether we want hints in general, and if yes, whether we want them in this
form. It should definitely not be merged like this and I tested it only
superficially.

For now, PVE::Storage::activate_volumes and PVE::Storage::map_volumes accept
"hints" an optional hash reference with a JSON schema (which they validate).
These hints are then passed to Plugin::{activate,map}_volume. PVE::Storage also
provides a subroutine to check whether the storage stack supports a particular
hint, callers (such as qemu-server) can use that to determine whether they can
pass a certain hint. Thomas suggested (something like) that so we don't
necessarily always have to bump qemu-server when we introduce a new hint in
pve-storage.

Obstacles I faced so far:

- The biggest obstacle is that we need to update all callers of
  activate_volumes to pass guest hints where possible. This means that optimally
  all callers should be able to generate the guest hints. Right now, there is
  only the 'guest-ostype' hint which is taken from the VM config. Currently, this
  is not always available to the caller of activate_volumes, sometimes
  some extra work/refactoring would be needed to get it (e.g. see
  PVE::QemuServer::QemuImage::convert or PVE::QemuServer::clone_disk),
  so this needs quite some code changes, which I have not done in all cases in
  this RFC.

- There are also some indirect callers of activate_volumes, e.g. via
  PVE::Storage::abs_filesystem_path or PVE::Storage::storage_migrate -- these
  would also need to be extended to accept hints (not done in this RFC)

- Initially, to avoid having to modify all (direct+indirect) callers of
  activate_volumes, I thought I could pass the hints only at the few "relevant"
  call sites (i.e., when starting a VM), but then noticed that volumes may be
  activated by an action unrelated to a VM start (e.g. a clone), then stay
  active, and not be re-activated by a VM start. So if e.g. we do not pass the
  hints on clone, the KRBD volume would be mapped without rxbounce, stay active,
  and when starting the VM, a user could run into the original problem again.
  So we can't get away with only passing hints to the few relevant call sites,
  and actually need to pass them everywhere (where possible).

- But sometimes, it might be fine to not pass hints to activate_volumes because
  we immediately deactivate, e.g. in QemuServer::restore_vma_archive, but only
  because the only current hint 'guest-ostype' is not relevant in such cases -
  might not be the case for future hints?

- 'guest-ostype' might be the only hint we need for now and can be taken from
  the guest config, but can we be sure that future hints are also going to be
  derived from the guest config? Is it OK to have generate_storage_hints only
  take the guest config as well?

- minor, but still: with this approach, rxbounce would also be passed for the
  EFI disk and TPM state of Windows VMs. This is likely not needed, but probably
  OK for a first version?

- should we already pass hints when activating volumes in pve-container too,
  even though they are empty?

- is it even OK to pass the hints only to {activate,map}_volumes? What if an
  external storage plugin needs the hint also when deactivating a volume? Should
  we already add a $hints parameter to other plugin methods as well, to avoid
  having to do that (and another APIVER bump) later?

- With hints being an argument to activate_volumes which takes a list of volumes,
  the hint applies to all volumes in the list. Is this OK API-wise? For the
  guest-ostype hint, this only makes sense if all passed volumes belong to the
  same guest and not several different ones (because they may have different OS
  types), which I think is the case for our call sites, but doesn't have to be
  the case?

- bumping the APIVER and docs are of course also needed, but left out of this
  RFC for now.

All of the above can likely be sorted out, but especially modifying all
direct+indirect call sites of {activate,map}_volumes will need some work and
care to avoid subtly breaking things. Before getting into that, I wanted to ask
for opinions about the general approach -- also because on the one hand I want
to avoid "overfitting" the guest hints to the one usecase we currently have,
but also avoid overengineering it to fit hypothetical usecases we don't even
know about yet.

What do you think?

[0] https://lore.proxmox.com/pve-devel/20241025111304.99680-2-f.weber@proxmox.com/T/
[1] https://lore.proxmox.com/pve-devel/0fdd8f93-8ef6-4c83-92ab-8e86d5386343@proxmox.com/

pve-storage:

Friedrich Weber (2):
  plugin: map/activate volume: allow callers to pass hints
  plugin: rbd: pass rxbounce when mapping Windows VM disks

 src/PVE/Storage.pm           | 34 ++++++++++++++++++++++++++++++----
 src/PVE/Storage/Plugin.pm    | 17 +++++++++++++++--
 src/PVE/Storage/RBDPlugin.pm | 13 +++++++++----
 3 files changed, 54 insertions(+), 10 deletions(-)


qemu-server:

Friedrich Weber (1):
  fix #5997: qemu: pass guest-ostype hint when activating volumes

 src/PVE/API2/Qemu.pm             | 18 ++++++++++---
 src/PVE/QemuConfig.pm            |  1 +
 src/PVE/QemuServer.pm            | 44 ++++++++++++++++++++++++--------
 src/PVE/QemuServer/Cloudinit.pm  |  3 ++-
 src/PVE/QemuServer/ImportDisk.pm |  4 ++-
 src/PVE/QemuServer/OVMF.pm       |  4 ++-
 src/PVE/QemuServer/QemuImage.pm  |  5 ++--
 src/PVE/QemuServer/RunState.pm   |  3 ++-
 src/PVE/VZDump/QemuServer.pm     |  6 +++--
 9 files changed, 66 insertions(+), 22 deletions(-)


Summary over all repositories:
  12 files changed, 120 insertions(+), 32 deletions(-)

-- 
Generated by git-murpp 0.8.1


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [pve-devel] [PATCH pve-storage 1/2] plugin: map/activate volume: allow callers to pass hints
  2025-10-24 12:25 [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
@ 2025-10-24 12:25 ` Friedrich Weber
  2025-10-28 12:17   ` Fiona Ebner
  2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 2/2] plugin: rbd: pass rxbounce when mapping Windows VM disks Friedrich Weber
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Friedrich Weber @ 2025-10-24 12:25 UTC (permalink / raw)
  To: pve-devel

Right now, the only supported hint is guest-ostype, via which
qemu-server provides the guest's OS type from the VM configuration.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
 src/PVE/Storage.pm        | 34 ++++++++++++++++++++++++++++++----
 src/PVE/Storage/Plugin.pm | 17 +++++++++++++++--
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index 1dde2b7..709d9ca 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -771,6 +771,7 @@ sub abs_filesystem_path {
 
     my $path;
     if (parse_volume_id($volid, 1)) {
+        # TODO: in order to pass $hints, abs_filesystem_path needs to accept it too
         activate_volumes($cfg, [$volid]);
         $path = PVE::Storage::path($cfg, $volid);
     } else {
@@ -905,6 +906,7 @@ my $volume_export_prepare = sub {
 
     volume_snapshot($cfg, $volid, $snapshot) if $migration_snapshot;
 
+    # TODO: in order to pass $hints, $volume_export_prepare would need to accept it too
     if (defined($snapshot)) {
         activate_volumes($cfg, [$volid], $snapshot);
     } else {
@@ -1122,7 +1124,9 @@ sub vdisk_create_base {
 }
 
 sub map_volume {
-    my ($cfg, $volid, $snapname) = @_;
+    my ($cfg, $volid, $snapname, $hints) = @_;
+
+    verify_hints($hints);
 
     my ($storeid, $volname) = parse_volume_id($volid);
 
@@ -1130,7 +1134,7 @@ sub map_volume {
 
     my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
 
-    return $plugin->map_volume($storeid, $scfg, $volname, $snapname);
+    return $plugin->map_volume($storeid, $scfg, $volname, $snapname, $hints);
 }
 
 sub unmap_volume {
@@ -1385,11 +1389,33 @@ sub deactivate_storage {
     $plugin->deactivate_storage($storeid, $scfg, $cache);
 }
 
+sub is_hint_supported {
+    my ($hint) = @_;
+
+    return defined($PVE::Storage::Plugin::hints_properties->{$hint});
+}
+
+sub verify_hints {
+    my ($hints, $noerr) = @_;
+
+    return if !defined($hints);
+
+    eval { PVE::JSONSchema::validate($hints, $PVE::Storage::Plugin::hints_format); };
+    my $err = $@;
+
+    return $hints if !$err;
+    return if $noerr;
+
+    die "hints are not valid: $@";
+}
+
 sub activate_volumes {
-    my ($cfg, $vollist, $snapname) = @_;
+    my ($cfg, $vollist, $snapname, $hints) = @_;
 
     return if !($vollist && scalar(@$vollist));
 
+    verify_hints($hints);
+
     my $storagehash = {};
     foreach my $volid (@$vollist) {
         my ($storeid, undef) = parse_volume_id($volid);
@@ -1404,7 +1430,7 @@ sub activate_volumes {
         my ($storeid, $volname) = parse_volume_id($volid);
         my $scfg = storage_config($cfg, $storeid);
         my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
-        $plugin->activate_volume($storeid, $scfg, $volname, $snapname, $cache);
+        $plugin->activate_volume($storeid, $scfg, $volname, $snapname, $cache, $hints);
     }
 }
 
diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
index 8acd214..2c72179 100644
--- a/src/PVE/Storage/Plugin.pm
+++ b/src/PVE/Storage/Plugin.pm
@@ -59,6 +59,19 @@ cfs_register_file(
     sub { __PACKAGE__->write_config(@_); },
 );
 
+our $hints_properties = {
+    'guest-ostype' => {
+        type => 'string',
+        optional => 1,
+    },
+};
+
+our $hints_format = {
+    type => 'object',
+    additionalProperties => 0,
+    properties => $hints_properties,
+};
+
 my %prune_option = (
     optional => 1,
     type => 'integer',
@@ -1871,7 +1884,7 @@ sub deactivate_storage {
 }
 
 sub map_volume {
-    my ($class, $storeid, $scfg, $volname, $snapname) = @_;
+    my ($class, $storeid, $scfg, $volname, $snapname, $hints) = @_;
 
     my ($path) = $class->path($scfg, $volname, $storeid, $snapname);
     return $path;
@@ -1884,7 +1897,7 @@ sub unmap_volume {
 }
 
 sub activate_volume {
-    my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;
+    my ($class, $storeid, $scfg, $volname, $snapname, $cache, $hints) = @_;
 
     my $path = $class->filesystem_path($scfg, $volname, $snapname);
 
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [pve-devel] [PATCH pve-storage 2/2] plugin: rbd: pass rxbounce when mapping Windows VM disks
  2025-10-24 12:25 [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
  2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 1/2] plugin: map/activate volume: allow callers to pass hints Friedrich Weber
@ 2025-10-24 12:25 ` Friedrich Weber
  2025-10-28 12:19   ` Fiona Ebner
  2025-10-24 12:25 ` [pve-devel] [PATCH qemu-server 1/1] fix #5997: qemu: pass guest-ostype hint when activating volumes Friedrich Weber
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Friedrich Weber @ 2025-10-24 12:25 UTC (permalink / raw)
  To: pve-devel

rxbounce can be needed to avoid "bad crc/signature" errors when
running Windows VM disks on KRBD. When mapping a volume, check
the guest-ostype hint, and if it denotes the volume is mapped
as part of a Windows VM, pass rxbounce.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
 src/PVE/Storage/RBDPlugin.pm | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
index cf371c7..54c5a57 100644
--- a/src/PVE/Storage/RBDPlugin.pm
+++ b/src/PVE/Storage/RBDPlugin.pm
@@ -793,7 +793,7 @@ sub deactivate_storage {
 }
 
 sub map_volume {
-    my ($class, $storeid, $scfg, $volname, $snapname) = @_;
+    my ($class, $storeid, $scfg, $volname, $snapname, $hints) = @_;
 
     my ($vtype, $img_name, $vmid) = $class->parse_volname($volname);
 
@@ -807,7 +807,12 @@ sub map_volume {
     # features can only be enabled/disabled for image, not for snapshot!
     $krbd_feature_update->($scfg, $storeid, $img_name);
 
-    my $cmd = $rbd_cmd->($scfg, $storeid, 'map', $name);
+    my @options = ();
+    if (defined($hints->{'guest-ostype'}) && $hints->{'guest-ostype'} =~ /^w/) {
+        @options = ('--options', 'rxbounce');
+    }
+
+    my $cmd = $rbd_cmd->($scfg, $storeid, 'map', $name, @options);
     run_rbd_command($cmd, errmsg => "can't map rbd volume $name");
 
     return $kerneldev;
@@ -830,9 +835,9 @@ sub unmap_volume {
 }
 
 sub activate_volume {
-    my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;
+    my ($class, $storeid, $scfg, $volname, $snapname, $cache, $hints) = @_;
 
-    $class->map_volume($storeid, $scfg, $volname, $snapname) if $scfg->{krbd};
+    $class->map_volume($storeid, $scfg, $volname, $snapname, $hints) if $scfg->{krbd};
 
     return 1;
 }
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [pve-devel] [PATCH qemu-server 1/1] fix #5997: qemu: pass guest-ostype hint when activating volumes
  2025-10-24 12:25 [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
  2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 1/2] plugin: map/activate volume: allow callers to pass hints Friedrich Weber
  2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 2/2] plugin: rbd: pass rxbounce when mapping Windows VM disks Friedrich Weber
@ 2025-10-24 12:25 ` Friedrich Weber
  2025-10-28  9:02 ` [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
  2025-10-28 12:10 ` Fiona Ebner
  4 siblings, 0 replies; 12+ messages in thread
From: Friedrich Weber @ 2025-10-24 12:25 UTC (permalink / raw)
  To: pve-devel

KRBD can use this hint to pass rxbounce only when mapping disks of
Windows VMs, which should avoid "bad crc/signature" errors in the
host journal, retransmits as well as degraded performance.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
 src/PVE/API2/Qemu.pm             | 18 ++++++++++---
 src/PVE/QemuConfig.pm            |  1 +
 src/PVE/QemuServer.pm            | 44 ++++++++++++++++++++++++--------
 src/PVE/QemuServer/Cloudinit.pm  |  3 ++-
 src/PVE/QemuServer/ImportDisk.pm |  4 ++-
 src/PVE/QemuServer/OVMF.pm       |  4 ++-
 src/PVE/QemuServer/QemuImage.pm  |  5 ++--
 src/PVE/QemuServer/RunState.pm   |  3 ++-
 src/PVE/VZDump/QemuServer.pm     |  6 +++--
 9 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/src/PVE/API2/Qemu.pm b/src/PVE/API2/Qemu.pm
index 71bedc1e..296d73a3 100644
--- a/src/PVE/API2/Qemu.pm
+++ b/src/PVE/API2/Qemu.pm
@@ -446,6 +446,8 @@ my sub create_disks : prototype($$$$$$$$$$$) {
         my $volid = $disk->{file};
         my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
 
+        my $storage_hints = PVE::QemuServer::generate_storage_hints($conf);
+
         if (!$volid || $volid eq 'none' || $volid eq 'cdrom') {
             delete $disk->{size};
             $res->{$ds} = PVE::QemuServer::print_drive($disk);
@@ -548,6 +550,7 @@ my sub create_disks : prototype($$$$$$$$$$$) {
                             drivename => $ds,
                             storage => $storeid,
                             format => $disk->{format},
+                            storage_hints => $storage_hints,
                         };
 
                         $dest_info->{efisize} = PVE::QemuServer::get_efivars_size($conf, $disk)
@@ -681,7 +684,8 @@ my sub create_disks : prototype($$$$$$$$$$$) {
                 }
             }
 
-            PVE::Storage::activate_volumes($storecfg, [$volid]) if $storeid;
+            PVE::Storage::activate_volumes($storecfg, [$volid], undef, $storage_hints)
+                if $storeid;
 
             my $size = PVE::Storage::volume_size_info($storecfg, $volid);
             die "volume $volid does not exist\n" if !$size;
@@ -4487,11 +4491,13 @@ __PACKAGE__->register_method({
             my $newvollist = [];
             my $jobs = {};
 
+            my $storage_hints = PVE::QemuServer::generate_storage_hints($newconf);
+
             eval {
                 local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
                     sub { die "interrupted by signal\n"; };
 
-                PVE::Storage::activate_volumes($storecfg, $vollist, $snapname);
+                PVE::Storage::activate_volumes($storecfg, $vollist, $snapname, $storage_hints);
 
                 my $bwlimit = extract_param($param, 'bwlimit');
 
@@ -4522,6 +4528,7 @@ __PACKAGE__->register_method({
                         drivename => $opt,
                         storage => $storage,
                         format => $format,
+                        storage_hints => $storage_hints,
                     };
 
                     $dest_info->{efisize} = PVE::QemuServer::get_efivars_size($oldconf)
@@ -4768,7 +4775,8 @@ __PACKAGE__->register_method({
 
             my $running = PVE::QemuServer::check_running($vmid);
 
-            PVE::Storage::activate_volumes($storecfg, [$drive->{file}]);
+            my $storage_hints = PVE::QemuServer::generate_storage_hints($conf);
+            PVE::Storage::activate_volumes($storecfg, [$drive->{file}], undef, $storage_hints);
 
             my $newvollist = [];
 
@@ -4797,6 +4805,7 @@ __PACKAGE__->register_method({
                     drivename => $disk,
                     storage => $storeid,
                     format => $format,
+                    storage_hints => $storage_hints,
                 };
 
                 $dest_info->{efisize} = PVE::QemuServer::get_efivars_size($conf)
@@ -5847,7 +5856,8 @@ __PACKAGE__->register_method({
 
             $rpcenv->check($authuser, "/storage/$storeid", ['Datastore.AllocateSpace']);
 
-            PVE::Storage::activate_volumes($storecfg, [$volid]);
+            my $storage_hints = PVE::QemuServer::generate_storage_hints($conf);
+            PVE::Storage::activate_volumes($storecfg, [$volid], undef, $storage_hints);
             my $size = PVE::Storage::volume_size_info($storecfg, $volid, 5);
 
             die "Could not determine current size of volume '$volid'\n" if !defined($size);
diff --git a/src/PVE/QemuConfig.pm b/src/PVE/QemuConfig.pm
index ad6ce6b0..ef6c3edd 100644
--- a/src/PVE/QemuConfig.pm
+++ b/src/PVE/QemuConfig.pm
@@ -326,6 +326,7 @@ sub __snapshot_create_vol_snapshots_hook {
         if ($hook eq "before") {
             if ($snap->{vmstate}) {
                 my $path = PVE::Storage::path($storecfg, $snap->{vmstate});
+                # TODO: no guest hints needed for VM state?
                 PVE::Storage::activate_volumes($storecfg, [$snap->{vmstate}]);
                 my $state_storage_id = PVE::Storage::parse_volume_id($snap->{vmstate});
 
diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index cf195ccc..c7b6dc63 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5248,7 +5248,9 @@ sub vmconfig_update_disk {
 
     die "skip\n" if !$hotplug || $opt =~ m/(ide|sata)(\d+)/;
     # hotplug new disks
-    PVE::Storage::activate_volumes($storecfg, [$drive->{file}]) if $drive->{file} !~ m|^/dev/.+|;
+    my $storage_hints = generate_storage_hints($conf);
+    PVE::Storage::activate_volumes($storecfg, [$drive->{file}], undef, $storage_hints)
+        if $drive->{file} !~ m|^/dev/.+|;
     vm_deviceplug($storecfg, $conf, $vmid, $opt, $drive, $arch, $machine_type);
 }
 
@@ -5363,6 +5365,15 @@ my sub remove_left_over_vmstate_opts {
     PVE::QemuConfig->write_config($vmid, $conf) if $found;
 }
 
+sub generate_storage_hints {
+    my ($conf) = @_;
+
+    my $hints = {};
+    $hints->{'guest-ostype'} = $conf->{ostype} if PVE::Storage::is_hint_supported('guest-ostype');
+
+    return $hints;
+}
+
 # see vm_start_nolock for parameters, additionally:
 # migrate_opts:
 #   storagemap = parsed storage map for allocating NBD disks
@@ -5526,11 +5537,13 @@ sub vm_start_nolock {
     my $vollist = get_current_vm_volumes($storecfg, $conf);
     push $vollist->@*, $statefile if $statefile_is_a_volume;
 
+    my $storage_hints = generate_storage_hints($conf);
+
     my ($cmd, $spice_port, $start_timeout);
     my $pci_reserve_list = [];
     eval {
         # With -blockdev, it is necessary to activate the volumes before generating the command line
-        PVE::Storage::activate_volumes($storecfg, $vollist);
+        PVE::Storage::activate_volumes($storecfg, $vollist, undef, $storage_hints);
 
         # Note that for certain cases like templates, the configuration is minimized, so need to ensure
         # the rest of the function here uses the same configuration that was used to build the command
@@ -5901,7 +5914,8 @@ sub vm_commandline {
     # With -blockdev, it is necessary to activate the volumes before generating the command line
     if (!$running) {
         $volumes = get_current_vm_volumes($storecfg, $conf);
-        PVE::Storage::activate_volumes($storecfg, $volumes);
+        my $storage_hints = generate_storage_hints($conf);
+        PVE::Storage::activate_volumes($storecfg, $volumes, undef, $storage_hints);
     }
 
     # There might be concurrent operations on the volumes, so do not deactivate.
@@ -6474,7 +6488,7 @@ my $parse_backup_hints = sub {
 #
 # Returns: { $virtdev => $volid }
 my $restore_allocate_devices = sub {
-    my ($storecfg, $virtdev_hash, $vmid) = @_;
+    my ($storecfg, $virtdev_hash, $vmid, $storage_hints) = @_;
 
     my $map = {};
     foreach my $virtdev (sort keys %$virtdev_hash) {
@@ -6502,7 +6516,7 @@ my $restore_allocate_devices = sub {
         print STDERR "new volume ID is '$volid'\n";
         $d->{volid} = $volid;
 
-        PVE::Storage::activate_volumes($storecfg, [$volid]);
+        PVE::Storage::activate_volumes($storecfg, [$volid], undef, $storage_hints);
 
         $map->{$virtdev} = $volid;
     }
@@ -6867,7 +6881,10 @@ sub restore_proxmox_backup_archive {
         $restore_cleanup_oldconf->($storecfg, $vmid, $oldconf, $virtdev_hash) if $oldconf;
 
         # allocate volumes
-        my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid);
+        my $storage_hints = {};
+        # TODO: we should also generate storage hints here for live-restore, but we do not have
+        # the config yet -- build the config here already?
+        my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid, $storage_hints);
 
         foreach my $virtdev (sort keys %$virtdev_hash) {
             my $d = $virtdev_hash->{$virtdev};
@@ -7037,7 +7054,9 @@ sub restore_external_archive {
         $restore_cleanup_oldconf->($storecfg, $vmid, $oldconf, $virtdev_hash) if $oldconf;
 
         # allocate volumes
-        my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid);
+        my $storage_hints = {};
+        # TODO: in order to generate storage hints, we'd need to build the config already here
+        my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid, $storage_hints);
 
         for my $virtdev (sort keys $virtdev_hash->%*) {
             my $d = $virtdev_hash->{$virtdev};
@@ -7062,6 +7081,7 @@ sub restore_external_archive {
                     'is-zero-initialized' => $sparseinit,
                     'source-path-format' => $source_format,
                 };
+                # no need to pass source-storage-hints for the source
                 PVE::QemuServer::QemuImage::convert(
                     $source_path,
                     $d->{volid},
@@ -7456,7 +7476,9 @@ sub restore_vma_archive {
         }
 
         # allocate volumes
-        my $map = $restore_allocate_devices->($cfg, $virtdev_hash, $vmid);
+        my $storage_hints = {};
+        # TODO: here we do not necessarily have to pass storage hints, as we deactivate when we're done?
+        my $map = $restore_allocate_devices->($cfg, $virtdev_hash, $vmid, $storage_hints);
 
         # print restore information to $fifofh
         foreach my $virtdev (sort keys %$virtdev_hash) {
@@ -7754,7 +7776,7 @@ sub clone_disk {
     my ($src_drivename, $drive, $snapname) = $source->@{qw(drivename drive snapname)};
 
     my ($newvmid, $dst_drivename, $efisize) = $dest->@{qw(vmid drivename efisize)};
-    my ($storage, $format) = $dest->@{qw(storage format)};
+    my ($storage, $format, $storage_hints) = $dest->@{qw(storage format storage_hints)};
 
     my $unused = defined($src_drivename) && $src_drivename =~ /^unused/;
     my $use_drive_mirror = $full && $running && $src_drivename && !$snapname && !$unused;
@@ -7820,7 +7842,7 @@ sub clone_disk {
         );
         push @$newvollist, $newvolid;
 
-        PVE::Storage::activate_volumes($storecfg, [$newvolid]);
+        PVE::Storage::activate_volumes($storecfg, [$newvolid], undef, $storage_hints);
 
         if (drive_is_cloudinit($drive)) {
             # when cloning multiple disks (e.g. during clone_vm) it might be the last disk
@@ -7877,6 +7899,8 @@ sub clone_disk {
                     bwlimit => $bwlimit,
                     'is-zero-initialized' => $sparseinit,
                     snapname => $snapname,
+                    # TODO: double-check it's OK to pass the dest's storage hints here
+                    'source-storage-hints' => $storage_hints,
                 };
                 PVE::QemuServer::QemuImage::convert($drive->{file}, $newvolid, $size, $opts);
             }
diff --git a/src/PVE/QemuServer/Cloudinit.pm b/src/PVE/QemuServer/Cloudinit.pm
index 349cf90b..f42047e0 100644
--- a/src/PVE/QemuServer/Cloudinit.pm
+++ b/src/PVE/QemuServer/Cloudinit.pm
@@ -48,7 +48,8 @@ sub commit_cloudinit_disk {
         $size *= 1024; # vdisk alloc takes KB, qemu-img dd's osize takes byte
     }
     my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
-    $plugin->activate_volume($storeid, $scfg, $volname);
+    my $storage_hints = PVE::QemuServer::generate_storage_hints($conf);
+    $plugin->activate_volume($storeid, $scfg, $volname, $storage_hints);
 
     print "generating cloud-init ISO\n";
     eval {
diff --git a/src/PVE/QemuServer/ImportDisk.pm b/src/PVE/QemuServer/ImportDisk.pm
index 75a54aa2..030cc4a1 100755
--- a/src/PVE/QemuServer/ImportDisk.pm
+++ b/src/PVE/QemuServer/ImportDisk.pm
@@ -84,7 +84,9 @@ sub do_import {
         local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
             local $SIG{PIPE} = sub { die "interrupted by signal $!\n"; };
 
-        PVE::Storage::activate_volumes($storecfg, [$dst_volid]);
+        # TODO: do we need storage hints here if we deactivate the volumes immediately?
+        PVE::Storage::activate_volumes($storecfg, [$dst_volid], undef, undef);
+        # TODO: do we need source-storage-hints here?
         PVE::QemuServer::QemuImage::convert(
             $src_path,
             $dst_volid,
diff --git a/src/PVE/QemuServer/OVMF.pm b/src/PVE/QemuServer/OVMF.pm
index 08134e30..717253b2 100644
--- a/src/PVE/QemuServer/OVMF.pm
+++ b/src/PVE/QemuServer/OVMF.pm
@@ -136,8 +136,10 @@ sub create_efidisk($$$$$$$$) {
     my $vars_size_b = -s $ovmf_vars;
     my $vars_size = PVE::Tools::convert_size($vars_size_b, 'b' => 'kb');
     my $volid = PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, $fmt, undef, $vars_size);
-    PVE::Storage::activate_volumes($storecfg, [$volid]);
+    # TODO: storage hints needed for EFI disks?
+    PVE::Storage::activate_volumes($storecfg, [$volid], undef, undef);
 
+    # no source-storage-hints passed here, because the source is not a volume
     PVE::QemuServer::QemuImage::convert($ovmf_vars, $volid, $vars_size_b);
     my $size = PVE::Storage::volume_size_info($storecfg, $volid, 3);
 
diff --git a/src/PVE/QemuServer/QemuImage.pm b/src/PVE/QemuServer/QemuImage.pm
index 71be3abb..0adfbee1 100644
--- a/src/PVE/QemuServer/QemuImage.pm
+++ b/src/PVE/QemuServer/QemuImage.pm
@@ -97,10 +97,11 @@ my sub qcow2_target_image_opts {
 # snapname - Use this snapshot of the source image.
 # source-path-format - Indicate the format of the source when the source is a path. For PVE-managed
 # volumes, the format from the storage layer is always used.
+# source-storage-hints - storage hints to pass when activating the source volume
 sub convert {
     my ($src_volid, $dst_volid, $size, $opts) = @_;
 
-    my ($bwlimit, $snapname) = $opts->@{qw(bwlimit snapname)};
+    my ($bwlimit, $snapname, $storage_hints) = $opts->@{qw(bwlimit snapname source-storage-hints)};
 
     my $storecfg = PVE::Storage::config();
     my ($src_storeid) = PVE::Storage::parse_volume_id($src_volid, 1);
@@ -114,7 +115,7 @@ sub convert {
     my $src_format;
 
     if ($src_storeid) {
-        PVE::Storage::activate_volumes($storecfg, [$src_volid], $snapname);
+        PVE::Storage::activate_volumes($storecfg, [$src_volid], $snapname, $storage_hints);
         my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
         $src_format = checked_volume_format($storecfg, $src_volid);
         $src_path = PVE::Storage::path($storecfg, $src_volid, $snapname);
diff --git a/src/PVE/QemuServer/RunState.pm b/src/PVE/QemuServer/RunState.pm
index 6a5fdbd7..fd2771c1 100644
--- a/src/PVE/QemuServer/RunState.pm
+++ b/src/PVE/QemuServer/RunState.pm
@@ -69,7 +69,8 @@ sub vm_suspend {
 
     if ($includestate) {
         # save vm state
-        PVE::Storage::activate_volumes($storecfg, [$vmstate]);
+        # TODO: need hints for VM state?
+        PVE::Storage::activate_volumes($storecfg, [$vmstate], undef, undef);
 
         eval {
             PVE::QemuMigrate::Helpers::set_migration_caps($vmid, 1);
diff --git a/src/PVE/VZDump/QemuServer.pm b/src/PVE/VZDump/QemuServer.pm
index dd789652..4f387731 100644
--- a/src/PVE/VZDump/QemuServer.pm
+++ b/src/PVE/VZDump/QemuServer.pm
@@ -118,7 +118,8 @@ sub prepare {
         $drivehash->{$name} = $volume->{volume_config};
     }
 
-    PVE::Storage::activate_volumes($self->{storecfg}, $vollist);
+    my $storage_hints = PVE::QemuServer::generate_storage_hints($conf);
+    PVE::Storage::activate_volumes($self->{storecfg}, $vollist, undef, $storage_hints);
 
     foreach my $ds (sort keys %$drivehash) {
         my $drive = $drivehash->{$ds};
@@ -643,7 +644,8 @@ my sub attach_fleecing_images {
     detach_fleecing_images($disks, $vmid);
 
     my $vollist = [map { $_->{'fleece-volid'} } grep { $_->{'fleece-volid'} } $disks->@*];
-    PVE::Storage::activate_volumes($self->{storecfg}, $vollist);
+    # TODO: storage hints needed for fleecing images?
+    PVE::Storage::activate_volumes($self->{storecfg}, $vollist, undef, undef);
 
     for my $di ($disks->@*) {
         if (my $volid = $di->{'fleece-volid'}) {
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD
  2025-10-24 12:25 [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
                   ` (2 preceding siblings ...)
  2025-10-24 12:25 ` [pve-devel] [PATCH qemu-server 1/1] fix #5997: qemu: pass guest-ostype hint when activating volumes Friedrich Weber
@ 2025-10-28  9:02 ` Friedrich Weber
  2025-10-28 10:50   ` Fiona Ebner
  2025-10-28 12:10 ` Fiona Ebner
  4 siblings, 1 reply; 12+ messages in thread
From: Friedrich Weber @ 2025-10-28  9:02 UTC (permalink / raw)
  To: pve-devel

On 24/10/2025 14:27, Friedrich Weber wrote:
> [...]
> Obstacles I faced so far:
> 
> - The biggest obstacle is that we need to update all callers of
>   activate_volumes to pass guest hints where possible. This means that optimally
>   all callers should be able to generate the guest hints. Right now, there is
>   only the 'guest-ostype' hint which is taken from the VM config. Currently, this
>   is not always available to the caller of activate_volumes, sometimes
>   some extra work/refactoring would be needed to get it (e.g. see
>   PVE::QemuServer::QemuImage::convert or PVE::QemuServer::clone_disk),
>   so this needs quite some code changes, which I have not done in all cases in
>   this RFC.
> 
> - There are also some indirect callers of activate_volumes, e.g. via
>   PVE::Storage::abs_filesystem_path or PVE::Storage::storage_migrate -- these
>   would also need to be extended to accept hints (not done in this RFC)
> 
> - Initially, to avoid having to modify all (direct+indirect) callers of
>   activate_volumes, I thought I could pass the hints only at the few "relevant"
>   call sites (i.e., when starting a VM), but then noticed that volumes may be
>   activated by an action unrelated to a VM start (e.g. a clone), then stay
>   active, and not be re-activated by a VM start. So if e.g. we do not pass the
>   hints on clone, the KRBD volume would be mapped without rxbounce, stay active,
>   and when starting the VM, a user could run into the original problem again.
>   So we can't get away with only passing hints to the few relevant call sites,
>   and actually need to pass them everywhere (where possible).
> 

Thomas and I discussed this point off-list:

- to clarify: if a Windows guest volume was mapped with KRBD without
rxbounce (e.g. by a clone where the activate_volumes caller doesn't pass
$hints) and doesn't get unmapped, and then a VM start activates the
volumes again (passing $hints this time so we'd like to pass rxbounce),
RBDPlugin::map_volume will early-exit because the volume is already mapped:

sub map_volume {
    my ($class, $storeid, $scfg, $volname, $snapname) = @_;

    my ($vtype, $img_name, $vmid) = $class->parse_volname($volname);

    my $name = $img_name;
    $name .= '@' . $snapname if $snapname;

    my $kerneldev = get_rbd_dev_path($scfg, $storeid, $name);

    return $kerneldev if -b $kerneldev; # already mapped

    [...]
}

... which is the VM will just use the guest volume without rxbounce and
the user can run into the issue.

- we discussed whether, to avoid this, we could apply the rxbounce
option "on the fly" to an already-mapped volume. I looked a bit [1] and
didn't see any way to apply rxbounce to an already-mapped volume.
Calling `rbd map` again apparently just maps the volume a second time
which doesn't sound like a good idea, and an `rbd unmap` followed by an
`rbd map` (with rxbounce) is likely not safe either?

[1] https://docs.ceph.com/en/reef/man/8/rbd/


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD
  2025-10-28  9:02 ` [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
@ 2025-10-28 10:50   ` Fiona Ebner
  2025-10-28 11:12     ` Fiona Ebner
  0 siblings, 1 reply; 12+ messages in thread
From: Fiona Ebner @ 2025-10-28 10:50 UTC (permalink / raw)
  To: Proxmox VE development discussion, Friedrich Weber; +Cc: Thomas Lamprecht

Am 28.10.25 um 10:02 AM schrieb Friedrich Weber:
> On 24/10/2025 14:27, Friedrich Weber wrote:
>> [...]
>> Obstacles I faced so far:
>>
>> - The biggest obstacle is that we need to update all callers of
>>   activate_volumes to pass guest hints where possible. This means that optimally
>>   all callers should be able to generate the guest hints. Right now, there is
>>   only the 'guest-ostype' hint which is taken from the VM config. Currently, this
>>   is not always available to the caller of activate_volumes, sometimes
>>   some extra work/refactoring would be needed to get it (e.g. see
>>   PVE::QemuServer::QemuImage::convert or PVE::QemuServer::clone_disk),
>>   so this needs quite some code changes, which I have not done in all cases in
>>   this RFC.
>>
>> - There are also some indirect callers of activate_volumes, e.g. via
>>   PVE::Storage::abs_filesystem_path or PVE::Storage::storage_migrate -- these
>>   would also need to be extended to accept hints (not done in this RFC)
>>
>> - Initially, to avoid having to modify all (direct+indirect) callers of
>>   activate_volumes, I thought I could pass the hints only at the few "relevant"
>>   call sites (i.e., when starting a VM), but then noticed that volumes may be
>>   activated by an action unrelated to a VM start (e.g. a clone), then stay
>>   active, and not be re-activated by a VM start. So if e.g. we do not pass the
>>   hints on clone, the KRBD volume would be mapped without rxbounce, stay active,
>>   and when starting the VM, a user could run into the original problem again.
>>   So we can't get away with only passing hints to the few relevant call sites,
>>   and actually need to pass them everywhere (where possible).
>>
> 
> Thomas and I discussed this point off-list:
> 
> - to clarify: if a Windows guest volume was mapped with KRBD without
> rxbounce (e.g. by a clone where the activate_volumes caller doesn't pass
> $hints) and doesn't get unmapped, and then a VM start activates the
> volumes again (passing $hints this time so we'd like to pass rxbounce),
> RBDPlugin::map_volume will early-exit because the volume is already mapped:
> 
> sub map_volume {
>     my ($class, $storeid, $scfg, $volname, $snapname) = @_;
> 
>     my ($vtype, $img_name, $vmid) = $class->parse_volname($volname);
> 
>     my $name = $img_name;
>     $name .= '@' . $snapname if $snapname;
> 
>     my $kerneldev = get_rbd_dev_path($scfg, $storeid, $name);
> 
>     return $kerneldev if -b $kerneldev; # already mapped
> 
>     [...]
> }
> 
> ... which is the VM will just use the guest volume without rxbounce and
> the user can run into the issue.
> 
> - we discussed whether, to avoid this, we could apply the rxbounce
> option "on the fly" to an already-mapped volume. I looked a bit [1] and
> didn't see any way to apply rxbounce to an already-mapped volume.
> Calling `rbd map` again apparently just maps the volume a second time
> which doesn't sound like a good idea, and an `rbd unmap` followed by an
> `rbd map` (with rxbounce) is likely not safe either?
> 
> [1] https://docs.ceph.com/en/reef/man/8/rbd/

I was also thinking along similar lines when reading that paragraph of
the cover letter.

Using unmap and then map again might be safe in certain situations like
VM start, where there should be no other active users of the volume
(from the top of my head, but would need to be checked in detail of
course). That information is not available to the map_volume()
implementation of course, so it would need to be part of the contract
that either:

1. hints are only used in situations where a deactivate/unmap up front
is safe or
2. that the callers using hints are required to ensure the volume is
deactivated first or there is no guarantee that the hint is used.

Approach 1. would have the advantage that the plugin could check if it
even needs to do the unmap/deactivate and for some plugins/hints it
might be able to be done on the fly.

Question is, do we already have callers that would be unhappy with that
requirement? As you described, the approach from the series has the
requirement of knowing "early enough" about what hints we'll need, which
is not always easy, but I haven't looked at it yet, just wanted to ask
about this already :)


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD
  2025-10-28 10:50   ` Fiona Ebner
@ 2025-10-28 11:12     ` Fiona Ebner
  0 siblings, 0 replies; 12+ messages in thread
From: Fiona Ebner @ 2025-10-28 11:12 UTC (permalink / raw)
  To: Proxmox VE development discussion, Friedrich Weber; +Cc: Thomas Lamprecht

Am 28.10.25 um 11:50 AM schrieb Fiona Ebner:
> Using unmap and then map again might be safe in certain situations like
> VM start, where there should be no other active users of the volume
> (from the top of my head, but would need to be checked in detail of
> course).
I think it must be safe, because we do have an explicit deactivate call
if VM start fails ;)


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD
  2025-10-24 12:25 [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
                   ` (3 preceding siblings ...)
  2025-10-28  9:02 ` [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
@ 2025-10-28 12:10 ` Fiona Ebner
  2025-10-28 16:35   ` Friedrich Weber
  4 siblings, 1 reply; 12+ messages in thread
From: Fiona Ebner @ 2025-10-28 12:10 UTC (permalink / raw)
  To: Proxmox VE development discussion, Friedrich Weber

Am 24.10.25 um 2:27 PM schrieb Friedrich Weber:
> - The biggest obstacle is that we need to update all callers of
>   activate_volumes to pass guest hints where possible. This means that optimally
>   all callers should be able to generate the guest hints. Right now, there is
>   only the 'guest-ostype' hint which is taken from the VM config. Currently, this
>   is not always available to the caller of activate_volumes, sometimes
>   some extra work/refactoring would be needed to get it (e.g. see
>   PVE::QemuServer::QemuImage::convert or PVE::QemuServer::clone_disk),
>   so this needs quite some code changes, which I have not done in all cases in
>   this RFC.

See below.

> - There are also some indirect callers of activate_volumes, e.g. via
>   PVE::Storage::abs_filesystem_path or PVE::Storage::storage_migrate -- these
>   would also need to be extended to accept hints (not done in this RFC)

See below.

> - Initially, to avoid having to modify all (direct+indirect) callers of
>   activate_volumes, I thought I could pass the hints only at the few "relevant"
>   call sites (i.e., when starting a VM), but then noticed that volumes may be
>   activated by an action unrelated to a VM start (e.g. a clone), then stay
>   active, and not be re-activated by a VM start. So if e.g. we do not pass the
>   hints on clone, the KRBD volume would be mapped without rxbounce, stay active,
>   and when starting the VM, a user could run into the original problem again.
>   So we can't get away with only passing hints to the few relevant call sites,
>   and actually need to pass them everywhere (where possible).

See other mail.

> - But sometimes, it might be fine to not pass hints to activate_volumes because
>   we immediately deactivate, e.g. in QemuServer::restore_vma_archive, but only
>   because the only current hint 'guest-ostype' is not relevant in such cases -
>   might not be the case for future hints?

Ideally, if we go with the current approach, we would cover everything
so that we'd only need to adapt the generate_storage_hints() function.

But something completely different which comes to mind here. What about
the volumes that already are active immediately after allocation? For
those, we either need to:
1. deactivate them and reactivate them with hints after allocation
2. or pass along the hints during allocation
3. ensure that they are not used in situations where hints are required
before deactivation.

The alternative approach of "make sure to deactivate before using hints
in situations where actually required" would avoid that complication too
(with 1. being very similar but here it would be immediately after
allocation).

> - 'guest-ostype' might be the only hint we need for now and can be taken from
>   the guest config, but can we be sure that future hints are also going to be
>   derived from the guest config? Is it OK to have generate_storage_hints only
>   take the guest config as well?

This can easily be changed later when the need arises.

> - minor, but still: with this approach, rxbounce would also be passed for the
>   EFI disk and TPM state of Windows VMs. This is likely not needed, but probably
>   OK for a first version?

Can be okay, but see below.

> - should we already pass hints when activating volumes in pve-container too,
>   even though they are empty?

Not strictly required IMHO, but feel free to if you like.

> - is it even OK to pass the hints only to {activate,map}_volumes? What if an
>   external storage plugin needs the hint also when deactivating a volume? Should
>   we already add a $hints parameter to other plugin methods as well, to avoid
>   having to do that (and another APIVER bump) later?

I'd prefer to add it to deactivate too.

> - With hints being an argument to activate_volumes which takes a list of volumes,
>   the hint applies to all volumes in the list. Is this OK API-wise? For the
>   guest-ostype hint, this only makes sense if all passed volumes belong to the
>   same guest and not several different ones (because they may have different OS
>   types), which I think is the case for our call sites, but doesn't have to be
>   the case?

I'd prefer it to be per-volume, so that we can leave out TPM and EFI
disks and things like the $statefile when that is a volume. Can be done
by the current interface by making multiple calls of course, but that is
not ideal. Maybe one of the future hints will be precisely for TPM or
EFI, those seem like likely candidates for requiring special treatment.

We could deprecate activate_volumes() and either:

1. add a new function which takes a hash reference rather than an array
reference. The keys would be volid with values being additional options,
like snapshot name and hints.

2. Or we could have a mini-class PVE::Storage::Activate which would have
a method to activate a single volume, but keeps track of which storages
and volumes it already activated in its internal state and which could
then also have a method for deactivating what it activated (like is
needed after VM start failure).

But please note that this is independent of the plugin API and does not
need to be done at the same time as the changes there (AFAICS).


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [PATCH pve-storage 1/2] plugin: map/activate volume: allow callers to pass hints
  2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 1/2] plugin: map/activate volume: allow callers to pass hints Friedrich Weber
@ 2025-10-28 12:17   ` Fiona Ebner
  0 siblings, 0 replies; 12+ messages in thread
From: Fiona Ebner @ 2025-10-28 12:17 UTC (permalink / raw)
  To: Proxmox VE development discussion, Friedrich Weber

Am 24.10.25 um 2:27 PM schrieb Friedrich Weber:
> +sub verify_hints {
> +    my ($hints, $noerr) = @_;
> +
> +    return if !defined($hints);
> +
> +    eval { PVE::JSONSchema::validate($hints, $PVE::Storage::Plugin::hints_format); };
> +    my $err = $@;
> +
> +    return $hints if !$err;
> +    return if $noerr;
> +
> +    die "hints are not valid: $@";
Nit: I'd use "internal error - storage hints ..." to reduce potential
for confusion.

> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 8acd214..2c72179 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -59,6 +59,19 @@ cfs_register_file(
>      sub { __PACKAGE__->write_config(@_); },
>  );
>  
> +our $hints_properties = {
> +    'guest-ostype' => {
> +        type => 'string',
> +        optional => 1,
> +    },

I'd rather have this be an explicit "guest-is-windows" boolean. The
storage layer should not have any idea of how to parse guest OS types.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [PATCH pve-storage 2/2] plugin: rbd: pass rxbounce when mapping Windows VM disks
  2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 2/2] plugin: rbd: pass rxbounce when mapping Windows VM disks Friedrich Weber
@ 2025-10-28 12:19   ` Fiona Ebner
  2025-10-28 15:35     ` Friedrich Weber
  0 siblings, 1 reply; 12+ messages in thread
From: Fiona Ebner @ 2025-10-28 12:19 UTC (permalink / raw)
  To: Proxmox VE development discussion, Friedrich Weber

Am 24.10.25 um 2:27 PM schrieb Friedrich Weber:
> rxbounce can be needed to avoid "bad crc/signature" errors when
> running Windows VM disks on KRBD. When mapping a volume, check
> the guest-ostype hint, and if it denotes the volume is mapped
> as part of a Windows VM, pass rxbounce.
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
>  src/PVE/Storage/RBDPlugin.pm | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
> index cf371c7..54c5a57 100644
> --- a/src/PVE/Storage/RBDPlugin.pm
> +++ b/src/PVE/Storage/RBDPlugin.pm
> @@ -793,7 +793,7 @@ sub deactivate_storage {
>  }
>  
>  sub map_volume {
> -    my ($class, $storeid, $scfg, $volname, $snapname) = @_;
> +    my ($class, $storeid, $scfg, $volname, $snapname, $hints) = @_;
>  
>      my ($vtype, $img_name, $vmid) = $class->parse_volname($volname);
>  
> @@ -807,7 +807,12 @@ sub map_volume {
>      # features can only be enabled/disabled for image, not for snapshot!
>      $krbd_feature_update->($scfg, $storeid, $img_name);
>  
> -    my $cmd = $rbd_cmd->($scfg, $storeid, 'map', $name);
> +    my @options = ();
> +    if (defined($hints->{'guest-ostype'}) && $hints->{'guest-ostype'} =~ /^w/) {

As already said in the last patch, this is a no-go. The RBD plugin
should not be concerned with knowing how qemu-server defines (Windows)
guest OS types.

> +        @options = ('--options', 'rxbounce');
> +    }
> +
> +    my $cmd = $rbd_cmd->($scfg, $storeid, 'map', $name, @options);
>      run_rbd_command($cmd, errmsg => "can't map rbd volume $name");
>  
>      return $kerneldev;
> @@ -830,9 +835,9 @@ sub unmap_volume {
>  }
>  
>  sub activate_volume {
> -    my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;
> +    my ($class, $storeid, $scfg, $volname, $snapname, $cache, $hints) = @_;
>  
> -    $class->map_volume($storeid, $scfg, $volname, $snapname) if $scfg->{krbd};
> +    $class->map_volume($storeid, $scfg, $volname, $snapname, $hints) if $scfg->{krbd};
>  
>      return 1;
>  }



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [PATCH pve-storage 2/2] plugin: rbd: pass rxbounce when mapping Windows VM disks
  2025-10-28 12:19   ` Fiona Ebner
@ 2025-10-28 15:35     ` Friedrich Weber
  0 siblings, 0 replies; 12+ messages in thread
From: Friedrich Weber @ 2025-10-28 15:35 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion

On 28/10/2025 13:19, Fiona Ebner wrote:
> Am 24.10.25 um 2:27 PM schrieb Friedrich Weber:
>> rxbounce can be needed to avoid "bad crc/signature" errors when
>> running Windows VM disks on KRBD. When mapping a volume, check
>> the guest-ostype hint, and if it denotes the volume is mapped
>> as part of a Windows VM, pass rxbounce.
>>
>> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
>> ---
>>  src/PVE/Storage/RBDPlugin.pm | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
>> index cf371c7..54c5a57 100644
>> --- a/src/PVE/Storage/RBDPlugin.pm
>> +++ b/src/PVE/Storage/RBDPlugin.pm
>> @@ -793,7 +793,7 @@ sub deactivate_storage {
>>  }
>>  
>>  sub map_volume {
>> -    my ($class, $storeid, $scfg, $volname, $snapname) = @_;
>> +    my ($class, $storeid, $scfg, $volname, $snapname, $hints) = @_;
>>  
>>      my ($vtype, $img_name, $vmid) = $class->parse_volname($volname);
>>  
>> @@ -807,7 +807,12 @@ sub map_volume {
>>      # features can only be enabled/disabled for image, not for snapshot!
>>      $krbd_feature_update->($scfg, $storeid, $img_name);
>>  
>> -    my $cmd = $rbd_cmd->($scfg, $storeid, 'map', $name);
>> +    my @options = ();
>> +    if (defined($hints->{'guest-ostype'}) && $hints->{'guest-ostype'} =~ /^w/) {
> 
> As already said in the last patch, this is a no-go. The RBD plugin
> should not be concerned with knowing how qemu-server defines (Windows)
> guest OS types.

Makes sense, will change this!


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD
  2025-10-28 12:10 ` Fiona Ebner
@ 2025-10-28 16:35   ` Friedrich Weber
  0 siblings, 0 replies; 12+ messages in thread
From: Friedrich Weber @ 2025-10-28 16:35 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion

Fiona and I discussed this off-list (thanks again!), to sum up:

- even though there is currently only one hint, we'd like a solution to
that still allows to add more hints later on but doesn't generate too
much churn now

- passing the OS type to enable rxbounce is only relevant at a few
callsites, e.g. when activating on (Windows) VM start, not when e.g.
activating for offline-cloning. So it would be nice to avoid touching
all activate_volumes callsites now

- it's likely that other hypothetical hints we might add in the future
will be relevant only for VM start as well, but if this assumption turns
out to be false, we should be able to adapt without too much hassle.

- rxbounce cannot be applied to already-mapped RBD volumes without
unmapping them first, but hypothetical future flags (that are set based
on guest hints) for other storage plugins might be applicable to
already-mapped volumes without unmapping, so we should be able to allow
this in the future without too much hassle.

- when starting a VM, if a volume is already active without rxbounce,
unmapping and re-mapping it (with rxbounce) should be safe [0], but it
might not be safe for all callsites

- so for now, we can go for a variant of Fiona's suggestion 1 here [1]
where:

* for the base functionality, we pass the hints only at the
{activate,map}_volumes callsites that are relevant for rxbounce
(essentially, on VM start). This avoids having to modify all callsites
now. If a hypothetical future plugin needs hints also at other
callsites, we can still pass the storage hints there

* we define a boolean hint 'plugin-may-deactivate' that, if true, tells
the plugin that it is free to unmap and a re-map volume if necessary to
make use of a hint. If 'plugin-may-deactivate' is 0, the plugin must not
unmap+map to apply a hint -- if it can apply a hint to an already-mapped
volume without unmap+map, it can still do so. On VM start, we pass 1 for
this flag. This way, passing flags at more callsites in the future is
doable (if unmap+map isn't safe, we just pass 0). And we keep the door
open for hints that plugins can apply without unmap+map.

* RBDPlugin::map_volume checks whether an already-mapped volume has
rxbounce set in sysfs, and if it hasn't but should (according to the
hints) and plugin-may-deactivate is 1, it can unmap and map the volume
with rxbounce.

* this can result in unnecessary map (without rxbounce)+unmap+map (with
rxbounce) sequences e.g. in case of live restore, which is not optimal
but also not really problematic. In case of live restore we can probably
fix this by passing the storage hints already on the first activation,
which can be done on top of the base functionality in v1 or (worst case)
as a follow-up.

* if a hypothetical future hint+plugin combination actually runs into
the problem that a hint was not passed on initial activation, is passed
later but cannot be applied anymore then, we still have the option of
adding storage hints to more/all activate_volumes callsites later on.

* Fiona pointed out a problem with volumes that already active after
creation [2]: depending on the hints and the plugin, in such a case the
plugin may be able to apply the hint directly or unmap+map to do so. If
both doesn't work, we can still think about adding hints also to
alloc_image (which would be backwards-compatible).

* the hints applying to all volumes passed to activate_volumes is not
great as Fiona also mentioned [2], and having an alternative API that
allows per-volume hint or even a helper class might be nicer, but since
this is kind of orthogonal I won't cover this in the v1. Having such an
alternative API might still be nice in the future (also because if we
ever need more hints, TPM state and EFI disk are likely candidates, and
then, per-volume hints would make sense). Implementing this wouldn't
change the plugin API.

* we won't add hints to {deactivate,unmap}_volume for now -- they're not
relevant for rxbounce, and possibly not relevant for future hints either
-- and even if they are, the plugin may be better off to track relevant
state on its own, because hypothetical hint X might even change between
volume activation and deactivation. If we still need to add this, it can
be done in a backwards-compatible way.

I'll follow up with a v1.

[0]
https://lore.proxmox.com/pve-devel/59d16447-5f20-4a38-a7b0-fa4890174440@proxmox.com/
[1]
https://lore.proxmox.com/pve-devel/d182c102-204b-43f0-9336-d6faf754fe72@proxmox.com/
[2]
https://lore.proxmox.com/pve-devel/85dab76c-d461-43d1-8f1c-45ea1b31200a@proxmox.com/


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-10-28 16:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-24 12:25 [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 1/2] plugin: map/activate volume: allow callers to pass hints Friedrich Weber
2025-10-28 12:17   ` Fiona Ebner
2025-10-24 12:25 ` [pve-devel] [PATCH pve-storage 2/2] plugin: rbd: pass rxbounce when mapping Windows VM disks Friedrich Weber
2025-10-28 12:19   ` Fiona Ebner
2025-10-28 15:35     ` Friedrich Weber
2025-10-24 12:25 ` [pve-devel] [PATCH qemu-server 1/1] fix #5997: qemu: pass guest-ostype hint when activating volumes Friedrich Weber
2025-10-28  9:02 ` [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD Friedrich Weber
2025-10-28 10:50   ` Fiona Ebner
2025-10-28 11:12     ` Fiona Ebner
2025-10-28 12:10 ` Fiona Ebner
2025-10-28 16:35   ` Friedrich Weber

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal