From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id A03551FF16B for ; Fri, 24 Oct 2025 14:27:32 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 8AE8A2010A; Fri, 24 Oct 2025 14:27:52 +0200 (CEST) From: Friedrich Weber To: pve-devel@lists.proxmox.com Date: Fri, 24 Oct 2025 14:25:52 +0200 Message-ID: <20251024122705.93761-1-f.weber@proxmox.com> X-Mailer: git-send-email 2.47.3 MIME-Version: 1.0 X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1761308826964 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.013 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pve-devel] [RFC qemu-server/storage 0/3] fix #5779: introduce guest hints to pass rxbounce flag to KRBD X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" When using Windows VMs with KRBD, it can be necessary to pass the rxbounce option to `rbd map` to avoid "bad crc/signature" warnings in the journal, retransmits and degraded performance. A first attempt to allow this [0] added an rxbounce storage option, but Thomas pointed out that this might nudge users towards creating two storages pointing to the same Ceph pool (one for Linux VMs that do not need rxbounce, one for Windows VMs that need rxbounce), which we should discourage. The cleanest solution from the user POV would for us to pass rxbounce only when mapping disks of VMs with Windows OS type, but this is not straightforward to implement, as pve-storage currently does not have knowledge about the "owner" of the volumes it manages. Fabian and Thomas suggested an approach to pass well-defined "hints" from e.g. qemu-server to pve-storage [0], one of which could be the guest's OS type, and based on which the RBD plugin could decide whether to pass rxbounce or not. We'd need to take care to avoid too much coupling between qemu-server and pve-storage, though. Here I took a first attempt at implementing such "hints" after an off-list discussion with Thomas. This RFC is intended as a base for further discussion whether we want hints in general, and if yes, whether we want them in this form. It should definitely not be merged like this and I tested it only superficially. For now, PVE::Storage::activate_volumes and PVE::Storage::map_volumes accept "hints" an optional hash reference with a JSON schema (which they validate). These hints are then passed to Plugin::{activate,map}_volume. PVE::Storage also provides a subroutine to check whether the storage stack supports a particular hint, callers (such as qemu-server) can use that to determine whether they can pass a certain hint. Thomas suggested (something like) that so we don't necessarily always have to bump qemu-server when we introduce a new hint in pve-storage. Obstacles I faced so far: - The biggest obstacle is that we need to update all callers of activate_volumes to pass guest hints where possible. This means that optimally all callers should be able to generate the guest hints. Right now, there is only the 'guest-ostype' hint which is taken from the VM config. Currently, this is not always available to the caller of activate_volumes, sometimes some extra work/refactoring would be needed to get it (e.g. see PVE::QemuServer::QemuImage::convert or PVE::QemuServer::clone_disk), so this needs quite some code changes, which I have not done in all cases in this RFC. - There are also some indirect callers of activate_volumes, e.g. via PVE::Storage::abs_filesystem_path or PVE::Storage::storage_migrate -- these would also need to be extended to accept hints (not done in this RFC) - Initially, to avoid having to modify all (direct+indirect) callers of activate_volumes, I thought I could pass the hints only at the few "relevant" call sites (i.e., when starting a VM), but then noticed that volumes may be activated by an action unrelated to a VM start (e.g. a clone), then stay active, and not be re-activated by a VM start. So if e.g. we do not pass the hints on clone, the KRBD volume would be mapped without rxbounce, stay active, and when starting the VM, a user could run into the original problem again. So we can't get away with only passing hints to the few relevant call sites, and actually need to pass them everywhere (where possible). - But sometimes, it might be fine to not pass hints to activate_volumes because we immediately deactivate, e.g. in QemuServer::restore_vma_archive, but only because the only current hint 'guest-ostype' is not relevant in such cases - might not be the case for future hints? - 'guest-ostype' might be the only hint we need for now and can be taken from the guest config, but can we be sure that future hints are also going to be derived from the guest config? Is it OK to have generate_storage_hints only take the guest config as well? - minor, but still: with this approach, rxbounce would also be passed for the EFI disk and TPM state of Windows VMs. This is likely not needed, but probably OK for a first version? - should we already pass hints when activating volumes in pve-container too, even though they are empty? - is it even OK to pass the hints only to {activate,map}_volumes? What if an external storage plugin needs the hint also when deactivating a volume? Should we already add a $hints parameter to other plugin methods as well, to avoid having to do that (and another APIVER bump) later? - With hints being an argument to activate_volumes which takes a list of volumes, the hint applies to all volumes in the list. Is this OK API-wise? For the guest-ostype hint, this only makes sense if all passed volumes belong to the same guest and not several different ones (because they may have different OS types), which I think is the case for our call sites, but doesn't have to be the case? - bumping the APIVER and docs are of course also needed, but left out of this RFC for now. All of the above can likely be sorted out, but especially modifying all direct+indirect call sites of {activate,map}_volumes will need some work and care to avoid subtly breaking things. Before getting into that, I wanted to ask for opinions about the general approach -- also because on the one hand I want to avoid "overfitting" the guest hints to the one usecase we currently have, but also avoid overengineering it to fit hypothetical usecases we don't even know about yet. What do you think? [0] https://lore.proxmox.com/pve-devel/20241025111304.99680-2-f.weber@proxmox.com/T/ [1] https://lore.proxmox.com/pve-devel/0fdd8f93-8ef6-4c83-92ab-8e86d5386343@proxmox.com/ pve-storage: Friedrich Weber (2): plugin: map/activate volume: allow callers to pass hints plugin: rbd: pass rxbounce when mapping Windows VM disks src/PVE/Storage.pm | 34 ++++++++++++++++++++++++++++++---- src/PVE/Storage/Plugin.pm | 17 +++++++++++++++-- src/PVE/Storage/RBDPlugin.pm | 13 +++++++++---- 3 files changed, 54 insertions(+), 10 deletions(-) qemu-server: Friedrich Weber (1): fix #5997: qemu: pass guest-ostype hint when activating volumes src/PVE/API2/Qemu.pm | 18 ++++++++++--- src/PVE/QemuConfig.pm | 1 + src/PVE/QemuServer.pm | 44 ++++++++++++++++++++++++-------- src/PVE/QemuServer/Cloudinit.pm | 3 ++- src/PVE/QemuServer/ImportDisk.pm | 4 ++- src/PVE/QemuServer/OVMF.pm | 4 ++- src/PVE/QemuServer/QemuImage.pm | 5 ++-- src/PVE/QemuServer/RunState.pm | 3 ++- src/PVE/VZDump/QemuServer.pm | 6 +++-- 9 files changed, 66 insertions(+), 22 deletions(-) Summary over all repositories: 12 files changed, 120 insertions(+), 32 deletions(-) -- Generated by git-murpp 0.8.1 _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel