public inbox for pve-user@lists.proxmox.com
From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8
Date: Wed, 4 Sep 2024 11:58:57 +0200
Message-ID: <8f50a6ec-5612-4522-a826-2054e4a7d06e@proxmox.com>
In-Reply-To: <mailman.444.1724858668.302.pve-user@lists.proxmox.com>

Hi,

Am 28.08.24 um 16:56 schrieb Knight, Joshua via pve-user:
> 
> 
> We are seeing an issue on Proxmox 8 hosts where the underlying QEMU process for a guest will crash while starting a DPDK application in the guest.
> 
> 
>   *   Proxmox 8.2.4 with QEMU 9.0.2-2
>   *   Guest running Ubuntu 22.04, application is dpdk 22.11 testpmd
>   *   Using virtio network interfaces that are up/connected
>   *   Binding interfaces with the (legacy) igb_uio driver
> 
> When starting the application, the SSH connection to the VM drops and the VM is shown as powered off in the UI.
> 
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s20
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s21
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s22
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s23
> 
> root@karma06:~/dpdk-22.11# /root/dpdk-22.11/res/usr/local/bin/dpdk-testpmd -- -i --port-topology=chained --rxq=1 --txq=1 --rss-ip
> EAL: Detected CPU lcores: 6
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:12.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:12.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:13.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:13.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:14.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:15.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:16.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:17.0 (socket -1)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
> testpmd: create a new mbuf pool <mb_pool_0>: n=187456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Configuring Port 0 (socket 0)
> 
> client_loop: send disconnect: Broken pipe
> 
> 
> 
> A QEMU assertion failure is seen in the host’s system log. Using GDB, we can see that QEMU aborts with SIGABRT.
> 
> karma QEMU[27334]: kvm: ../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> 
> Thread 10 "CPU 0/KVM" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7d999cc006c0 (LWP 36256)]
> __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> 44      ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1  0x00007d99a10a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007d99a105afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x00007d99a1045472 in __GI_abort () at ./stdlib/abort.c:79
> #4  0x00007d99a1045395 in __assert_fail_base (fmt=0x7d99a11b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>     assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0", file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:92
> #5  0x00007d99a1053eb2 in __GI___assert_fail (assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0",
>     file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:101
> #6  0x00005a9eb566248c in kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1836
> #7  kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1821
> #8  0x00005a9eb540bed2 in virtio_pci_one_vector_unmask (proxy=proxy@entry=0x5a9eb9f5ada0, queue_no=queue_no@entry=4294967295,
>     vector=vector@entry=0, msg=..., n=0x5a9eb9f63368) at ../hw/virtio/virtio-pci.c:991
> #9  0x00005a9eb540c09c in virtio_pci_vector_unmask (dev=0x5a9eb9f5ada0, vector=0, msg=...) at ../hw/virtio/virtio-pci.c:1056
> #10 0x00005a9eb536ff62 in msix_fire_vector_notifier (is_masked=false, vector=0, dev=0x5a9eb9f5ada0) at ../hw/pci/msix.c:120
> #11 msix_handle_mask_update (dev=0x5a9eb9f5ada0, vector=0, was_masked=<optimized out>) at ../hw/pci/msix.c:140
> #12 0x00005a9eb5602260 in memory_region_write_accessor (mr=0x5a9eb9f5b3e0, addr=12, value=<optimized out>, size=4, shift=<optimized out>,
>     mask=<optimized out>, attrs=...) at ../system/memory.c:497
> #13 0x00005a9eb5602f4e in access_with_adjusted_size (addr=addr@entry=12, value=value@entry=0x7d999cbfae58, size=size@entry=4,
>     access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x5a9eb56021e0 <memory_region_write_accessor>,
>     mr=<optimized out>, attrs=...) at ../system/memory.c:573
> #14 0x00005a9eb560403c in memory_region_dispatch_write (mr=mr@entry=0x5a9eb9f5b3e0, addr=addr@entry=12, data=<optimized out>,
>     op=<optimized out>, attrs=attrs@entry=...) at ../system/memory.c:1528
> #15 0x00005a9eb560b95f in flatview_write_continue_step (attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028 "", mr_addr=12,
>     l=l@entry=0x7d999cbfaf80, mr=0x5a9eb9f5b3e0, len=4) at ../system/physmem.c:2713
> #16 0x00005a9eb560bbed in flatview_write_continue (mr=<optimized out>, l=<optimized out>, mr_addr=<optimized out>, len=4, ptr=0xfdf8500c,
>     attrs=..., addr=4260909068, fv=0x7d8d6c0796b0) at ../system/physmem.c:2743
> #17 flatview_write (fv=0x7d8d6c0796b0, addr=addr@entry=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=len@entry=4)
>     at ../system/physmem.c:2774
> #18 0x00005a9eb560f251 in address_space_write (len=4, buf=0x7d99a3433028, attrs=..., addr=4260909068, as=0x5a9eb66f1f20 <address_space_memory>)
>     at ../system/physmem.c:2894
> #19 address_space_rw (as=0x5a9eb66f1f20 <address_space_memory>, addr=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=4,
>     is_write=<optimized out>) at ../system/physmem.c:2904
> #20 0x00005a9eb56660e8 in kvm_cpu_exec (cpu=cpu@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-all.c:2917
> #21 0x00005a9eb56676d5 in kvm_vcpu_thread_fn (arg=arg@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-accel-ops.c:50
> #22 0x00005a9eb581dfe8 in qemu_thread_start (args=0x5a9eb81ee390) at ../util/qemu-thread-posix.c:541
> #23 0x00007d99a10a8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
> #24 0x00007d99a11287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> 
> 
> One interesting thing about this backtrace is that it seems to exactly match an existing QEMU issue that is reported as patched. That patch should be present in QEMU 9.0.2, the version running on this Proxmox host.
> 
> https://gitlab.com/qemu-project/qemu/-/issues/1928
> 
> We’ve found a workaround: switching from the deprecated igb_uio driver to the vfio-pci driver when binding the interfaces for dpdk. In this case the VM does not crash. But I’m wondering if anyone has hit this before or if it’s a known issue. I would certainly not expect any operation in the guest to cause QEMU to crash. It’s also odd that the crash is reported as patched in 9.0.2.
> 
> We’ve been able to reproduce this on Proxmox 8.0, 8.1, and 8.2, on both AMD and Intel processors. The crash does not occur on earlier releases such as Proxmox 6.4, nor with earlier dpdk versions such as 20.08.
> 
> Thanks,
> Josh
> 

We currently carry a revert of that patch, because it caused some
regressions that sounded just as bad as the original issue [0].

A fix for the regressions has landed upstream now [1], and I'll take a
look at pulling it in and dropping the revert.

[0]:
https://git.proxmox.com/?p=pve-qemu.git;a=blob;f=debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch;h=d2de6d11ba1e2a2bd2ea8dccf660ac6e66b047d4;hb=582fd47901356342b8e0bef19d7d8fdc324d2d96
[1]:
https://lore.kernel.org/qemu-devel/a8e63ff289d137197ad7a701a587cc432872d798.1724151593.git.mst@redhat.com/
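
To check whether a host log contains this kind of assertion, the quoted
log line can be matched with a small script. This is only a sketch: the
function name find_qemu_assertions and the regular expression are
assumptions modeled on the single log line quoted above, and other QEMU
builds or syslog setups may format the message differently.

```python
import re

# Pattern modeled on the one log line in the report (assumption):
#   "<host> QEMU[<pid>]: kvm: <file>:<line>: <func>: Assertion `<expr>' failed."
ASSERT_RE = re.compile(
    r"QEMU\[(?P<pid>\d+)\]: kvm: (?P<file>\S+):(?P<line>\d+): "
    r"(?P<func>\w+): Assertion `(?P<expr>[^']+)' failed\."
)


def find_qemu_assertions(lines):
    """Return (pid, function, expression) for each QEMU assertion line found."""
    hits = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match:
            hits.append(
                (int(match.group("pid")), match.group("func"), match.group("expr"))
            )
    return hits
```

For the log line from the report, this yields
(27334, 'kvm_irqchip_commit_routes', 'ret == 0'). Lines that do not
match the pattern are skipped.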

Best Regards,
Fiona


