Message-ID: <77306627-4403-b64f-c688-c2573fb555c1@proxmox.com>
Date: Tue, 14 Mar 2023 14:44:28 +0100
From: Dominik Csapak
To: Proxmox VE development discussion, Noel Ullreich
Subject: Re: [pve-devel] [PATCH pve-docs] update the PCI(e) docs
In-Reply-To: <20230314124804.62223-1-n.ullreich@proxmox.com>
References: <20230314124804.62223-1-n.ullreich@proxmox.com>

a few comments inline

On 3/14/23 13:48, Noel Ullreich wrote:
> A little update to the PCI(e) docs with the plan of reworking the PCI
> wiki as well.
>
> Some questions and reasoning to the patch:
> * I would only mention the ACS patch in the PCI examples wiki, since it is a
> last-ditch effort to get IOMMU to work and who knows how long we will support
> the patch.
> * Should I move the blacklising example to the example-wiki and just link to it?
> I don't want people blindly copy-pasting commands. Same goes for the softdep
> example.

first, these comments are not part of the commit message and should go
below the '---' part of the message

yes, i'd only ever mention the acs patch in the wiki, not in the reference docs.

also the blacklisting example can stay here, but i'd make it more generic
(see comment further down)

>
> Signed-off-by: Noel Ullreich
> ---
> qm-pci-passthrough.adoc | 87 +++++++++++++++++++++++++++++++++++------
> 1 file changed, 75 insertions(+), 12 deletions(-)
>
> diff --git a/qm-pci-passthrough.adoc b/qm-pci-passthrough.adoc
> index df6cf21..ed17b9c 100644
> --- a/qm-pci-passthrough.adoc
> +++ b/qm-pci-passthrough.adoc
> @@ -16,16 +16,17 @@ device anymore on the host or in any other VM.
> General Requirements
> ~~~~~~~~~~~~~~~~~~~~
>
> -Since passthrough is a feature which also needs hardware support, there are
> -some requirements to check and preparations to be done to make it work.
> -
> +Since passthrough is preformed on real hardware, the hardware needs to fulfill
> +some requirements. A brief overview of these requirements is given below, for more
> +information on specific devices, see
> +https://pve.proxmox.com/wiki/PCI_Passthrough[PCI Passthrough Examples].

this reads a bit weird: '[...]on real hardware, the hardware[...]'
i'd just go : '[...]on real hardware, it[...]'
should be clear enough

>
> Hardware
> ^^^^^^^^
> Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement
> **U**nit) interrupt remapping, this includes the CPU and the mainboard.
>
> -Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this.
> +Generally, Intel systems with VT-d and AMD systems with AMD-Vi support this.
> But it is not guaranteed that everything will work out of the box, due
> to bad hardware implementation and missing or low quality drivers.
>
> @@ -44,8 +45,8 @@ some configuration to enable PCI(e) passthrough.
>
> .IOMMU
>
> -First, you have to enable IOMMU support in your BIOS/UEFI. Usually the
> -corresponding setting is called `IOMMU` or `VT-d`,but you should find the exact
> +First, you will have to enable IOMMU support in your BIOS/UEFI. Usually the
> +corresponding setting is called `IOMMU` or `VT-d`, but you should find the exact
> option name in the manual of your motherboard.
>
> For Intel CPUs, you may also need to enable the IOMMU on the
> @@ -72,6 +73,9 @@ hardware IOMMU. To enable these options, add:
>
> to the xref:sysboot_edit_kernel_cmdline[kernel commandline].
>
> +For a complete list of kernel commandline options (of kernel 5.15), see
> +https://www.kernel.org/doc/html/v5.15/admin-guide/kernel-parameters.html[kernel.org].
> +

imho this should be in the 'edit kernel cmdline' section and shouldn't be
referencing a specific version but like this:

for a complete list, see https://www.kernel.org/doc/html/vX.Y/admin...
replace X.Y with the major.minor version (e.g. 5.15)

that way we don't have to update the link on every kernel version bump

> .Kernel Modules
>
> You have to make sure the following modules are loaded. This can be achieved by
> @@ -92,6 +96,14 @@ After changing anything modules related, you need to refresh your
> # update-initramfs -u -k all
> ----
>
> +To check if the modules are being loaded, the output of
> +
> +----
> +# lsmod | grep vfio
> +----
> +
> +should include the four modules from above.
> +
> .Finish Configuration
>
> Finally reboot to bring the changes into effect and check that it is indeed
> @@ -105,8 +117,22 @@ should display that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is
> enabled, depending on hardware and kernel the exact message can vary.
>
> It is also important that the device(s) you want to pass through
> -are in a *separate* `IOMMU` group. This can be checked with:
> +are in a *separate* `IOMMU` group. This can be checked either with:
>
> +* a call to the {pve} API:
> ++
> +----
> +# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""
> +----
> +
> +* a bash oneliner:
> ++
> +----
> +# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
> +----
> +
> +* this command, although it gives less information than the other two:
> ++

i'd only give one option (preferably the pvesh one) since the user does not
need three commands to do it.
also mentioning 'this also exists, but is inferior' does not make much sense

> ----
> # find /sys/kernel/iommu_groups/ -type l
> ----
> @@ -148,6 +174,10 @@ desktop software (for example, VNC or RDP) inside the guest.
>
> If you want to use the GPU as a hardware accelerator, for example, for
> programs using OpenCL or CUDA, this is not required.
> +In this case, to use NoVNC or SPICE, you might need to unset the 'primary GPU'
> +flag(see xref:qm_pci_passthrough_vm_config[VM configuration]) and make sure the
> +GPU is not phyiscally connected to a monitor.

that's not completely correct, instead of unsetting 'primary gpu' one can also
set a specific display.

and why shouldn't the user connect the gpu to a monitor? this does not make
a difference for the virtual display 99% of the time

> +
>
> Host Device Passthrough
> ~~~~~~~~~~~~~~~~~~~~~~~
> @@ -159,8 +189,8 @@ PCI(e) card, for example a GPU or a network card.
> Host Configuration
> ^^^^^^^^^^^^^^^^^^
>
> -In this case, the host must not use the card. There are two methods to achieve
> -this:
> +{pve} tries to automatically make the PCI(e) device unavailable for the host.
> +However, if this doesn't work, there are two things that can be done:
>
> * pass the device IDs to the options of the 'vfio-pci' modules by adding
> +
> @@ -175,7 +205,7 @@ the vendor and device IDs obtained by:
> # lspci -nn
> ----
>
> -* blacklist the driver completely on the host, ensuring that it is free to bind
> +* blacklist the driver on the host completely, ensuring that it is free to bind
> for passthrough, with
> +
> ----
> @@ -183,11 +213,46 @@ for passthrough, with
> ----
> +
> in a .conf file in */etc/modprobe.d/*.
> ++
> +To find the drivername, execute
> ++
> +----
> +# lspci -k
> +----
> ++
> +for example:
> ++
> +----
> +# lspci -k | grep -A 3 "VGA"
> +
> +// The output tells us, that the drivers are called `nvidia`
> +01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
> + Subsystem: Micro-Star International Co., Ltd. [MSI] GP108 [GeForce GT 1030]
> + Kernel driver in use: nvidia
> + Kernel modules: nvidia
> +----
> ++
> +Now we can blacklist the drivers by writing them into a .conf file:
> ++
> +----
> +echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
> +----

this could stay here, but i'd replace the 'nvidia' in the example with 'some-module',
maybe i'd even replace the whole lspci output with dummy info where it also says
'some-module'

then even if someone c&p, it should not have a harmful effect

>
> For both methods you need to
> xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and
> reboot after that.
>
> +Should this not work, you might need to set a soft dependency to load the gpu
> +modules before loading 'vfio-pci'. This can be done with the 'softdep' flag, see
> +also the manpages on 'modprobe.d' for more information.
> +
> +For example, if you are using a NVIDIA gpu and using the 'nouveau' drivers:
> +
> +----
> +# echo "softdep nouveau pre: vfio-pci" >> /etc/modprobe.d/nouveau.conf
> +----
> +
> +

same here, just use 'some-module'

> .Verify Configuration
>
> To check if your changes were successful, you can use
> @@ -262,7 +327,6 @@ For example:
> # qm set VMID -hostpci0 02:00,device-id=0x10f6,sub-vendor-id=0x0000
> ----
>
> -
> Other considerations
> ^^^^^^^^^^^^^^^^^^^^
>
> @@ -288,7 +352,6 @@ Currently, the most common use case for this are NICs (**N**etwork
> physical port. This allows using features such as checksum offloading, etc. to
> be used inside a VM, reducing the (host) CPU overhead.
>
> -
> Host Configuration
> ^^^^^^^^^^^^^^^^^^
>
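
to illustrate the 'some-module' idea from the blacklisting comment, here is a rough sketch of how a generic example could look. everything in it is an assumption for illustration: the lspci output is dummy text, 'some-module' is a placeholder and not a real driver, and the snippet only prints the line that would go into a .conf file in /etc/modprobe.d/ instead of writing it.

```shell
#!/bin/sh
# Dummy stand-in for `lspci -k` output; vendor, device and 'some-module'
# are made-up placeholders, not real hardware or a real kernel module.
sample='00:00.0 VGA compatible controller: Some Vendor Some Device (rev 00)
    Subsystem: Some Vendor Some Device
    Kernel driver in use: some-module
    Kernel modules: some-module'

# Extract the driver name from the "Kernel driver in use" line.
driver=$(printf '%s\n' "$sample" | sed -n 's/^[[:space:]]*Kernel driver in use: //p')

# Only print the blacklist line; nothing is written to /etc/modprobe.d/.
printf 'blacklist %s\n' "$driver"
```

since the module name is a dummy, copy-pasting this has no effect on a real system; a reader has to substitute the driver name from their own lspci output on purpose.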