public inbox for pve-user@lists.proxmox.com
* [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8
@ 2024-08-28 14:56 Knight, Joshua via pve-user
  2024-09-04  9:58 ` Fiona Ebner
  0 siblings, 1 reply; 4+ messages in thread
From: Knight, Joshua via pve-user @ 2024-08-28 14:56 UTC (permalink / raw)
  To: pve-user; +Cc: Knight, Joshua

[-- Attachment #1: Type: message/rfc822, Size: 21197 bytes --]

From: "Knight, Joshua" <Joshua.Knight@netscout.com>
To: "pve-user@lists.proxmox.com" <pve-user@lists.proxmox.com>
Subject: QEMU crash with dpdk 22.11 app on Proxmox 8
Date: Wed, 28 Aug 2024 14:56:48 +0000
Message-ID: <PH7PR01MB84961C594E216ED7DF7944FA87952@PH7PR01MB8496.prod.exchangelabs.com>

We are seeing an issue on Proxmox 8 hosts where the underlying QEMU process for a guest will crash while starting a DPDK application in the guest.


  *   Proxmox 8.2.4 with QEMU 9.0.2-2
  *   Guest running Ubuntu 22.04, application is dpdk 22.11 testpmd
  *   Using virtio network interfaces that are up/connected (an illustrative NIC config excerpt follows this list)
  *   Binding interfaces with the (legacy) igb_uio driver
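
For reference, the virtio NICs are plain Proxmox net devices; an illustrative excerpt of such a guest config is below (MAC addresses and bridge name are placeholders, not taken from the affected VM):

# /etc/pve/qemu-server/<vmid>.conf (excerpt, illustrative only)
net0: virtio=AA:BB:CC:DD:EE:00,bridge=vmbr0
net1: virtio=AA:BB:CC:DD:EE:01,bridge=vmbr0
# ... and so on up to net5 for the six interfaces probed below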

When starting the application, the SSH connection to the VM disconnects and the VM shows as powered off in the UI.

root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s20
root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s21
root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s22
root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s23

root@karma06:~/dpdk-22.11# /root/dpdk-22.11/res/usr/local/bin/dpdk-testpmd -- -i --port-topology=chained --rxq=1 --txq=1 --rss-ip
EAL: Detected CPU lcores: 6
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:12.0 (socket -1)
eth_virtio_pci_init(): Failed to init PCI device
EAL: Requested device 0000:06:12.0 cannot be used
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:13.0 (socket -1)
eth_virtio_pci_init(): Failed to init PCI device
EAL: Requested device 0000:06:13.0 cannot be used
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:14.0 (socket -1)
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:15.0 (socket -1)
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:16.0 (socket -1)
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:17.0 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=187456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)

client_loop: send disconnect: Broken pipe



A QEMU assertion failure appears in the host’s system log, and attaching GDB to the QEMU process shows it aborting (a rough sketch of how the backtrace was captured follows the log line).

karma QEMU[27334]: kvm: ../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
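
(Roughly how the backtrace below was captured; the pidfile path is the usual Proxmox location and the VMID is a placeholder, and with QEMU debug symbols installed the frames resolve to source lines:)

VMID=100                                      # placeholder VMID, adjust to the affected guest
gdb -p "$(cat /run/qemu-server/${VMID}.pid)"  # attach to that guest's QEMU process
# In gdb: 'continue', reproduce the crash in the guest, then 'bt'
# once the SIGABRT from the failed assertion is hit.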

Thread 10 "CPU 0/KVM" received signal SIGABRT, Aborted.
[Switching to Thread 0x7d999cc006c0 (LWP 36256)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007d99a10a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007d99a105afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007d99a1045472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007d99a1045395 in __assert_fail_base (fmt=0x7d99a11b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0", file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
    function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:92
#5  0x00007d99a1053eb2 in __GI___assert_fail (assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0",
    file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
    function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:101
#6  0x00005a9eb566248c in kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1836
#7  kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1821
#8  0x00005a9eb540bed2 in virtio_pci_one_vector_unmask (proxy=proxy@entry=0x5a9eb9f5ada0, queue_no=queue_no@entry=4294967295,
    vector=vector@entry=0, msg=..., n=0x5a9eb9f63368) at ../hw/virtio/virtio-pci.c:991
#9  0x00005a9eb540c09c in virtio_pci_vector_unmask (dev=0x5a9eb9f5ada0, vector=0, msg=...) at ../hw/virtio/virtio-pci.c:1056
#10 0x00005a9eb536ff62 in msix_fire_vector_notifier (is_masked=false, vector=0, dev=0x5a9eb9f5ada0) at ../hw/pci/msix.c:120
#11 msix_handle_mask_update (dev=0x5a9eb9f5ada0, vector=0, was_masked=<optimized out>) at ../hw/pci/msix.c:140
#12 0x00005a9eb5602260 in memory_region_write_accessor (mr=0x5a9eb9f5b3e0, addr=12, value=<optimized out>, size=4, shift=<optimized out>,
    mask=<optimized out>, attrs=...) at ../system/memory.c:497
#13 0x00005a9eb5602f4e in access_with_adjusted_size (addr=addr@entry=12, value=value@entry=0x7d999cbfae58, size=size@entry=4,
    access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x5a9eb56021e0 <memory_region_write_accessor>,
    mr=<optimized out>, attrs=...) at ../system/memory.c:573
#14 0x00005a9eb560403c in memory_region_dispatch_write (mr=mr@entry=0x5a9eb9f5b3e0, addr=addr@entry=12, data=<optimized out>,
    op=<optimized out>, attrs=attrs@entry=...) at ../system/memory.c:1528
#15 0x00005a9eb560b95f in flatview_write_continue_step (attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028 "", mr_addr=12,
    l=l@entry=0x7d999cbfaf80, mr=0x5a9eb9f5b3e0, len=4) at ../system/physmem.c:2713
#16 0x00005a9eb560bbed in flatview_write_continue (mr=<optimized out>, l=<optimized out>, mr_addr=<optimized out>, len=4, ptr=0xfdf8500c,
    attrs=..., addr=4260909068, fv=0x7d8d6c0796b0) at ../system/physmem.c:2743
#17 flatview_write (fv=0x7d8d6c0796b0, addr=addr@entry=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=len@entry=4)
    at ../system/physmem.c:2774
#18 0x00005a9eb560f251 in address_space_write (len=4, buf=0x7d99a3433028, attrs=..., addr=4260909068, as=0x5a9eb66f1f20 <address_space_memory>)
    at ../system/physmem.c:2894
#19 address_space_rw (as=0x5a9eb66f1f20 <address_space_memory>, addr=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=4,
    is_write=<optimized out>) at ../system/physmem.c:2904
#20 0x00005a9eb56660e8 in kvm_cpu_exec (cpu=cpu@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-all.c:2917
#21 0x00005a9eb56676d5 in kvm_vcpu_thread_fn (arg=arg@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-accel-ops.c:50
#22 0x00005a9eb581dfe8 in qemu_thread_start (args=0x5a9eb81ee390) at ../util/qemu-thread-posix.c:541
#23 0x00007d99a10a8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007d99a11287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81


Interestingly, this backtrace seems to exactly match an existing QEMU issue that is reported as fixed, and that fix should already be present in QEMU 9.0.2, the version running on this Proxmox host.

https://gitlab.com/qemu-project/qemu/-/issues/1928

We’ve found a workaround: switching from the deprecated igb_uio driver to the vfio-pci driver when binding the interfaces for DPDK (a rough sketch is below). With vfio-pci the VM does not crash. I’m wondering if anyone has hit this before or if it’s a known issue; I would certainly not expect any operation in the guest to be able to crash QEMU. It’s also odd to hit a crash that is supposedly already fixed in 9.0.2.
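
For reference, the workaround binding looked roughly like the following; the no-IOMMU toggle is only relevant when the guest has no virtual IOMMU, and whether that applies here is an assumption:

modprobe vfio-pci
# Guests without a virtual IOMMU may need vfio's no-IOMMU mode:
echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=vfio-pci enp6s20
python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=vfio-pci enp6s21
python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=vfio-pci enp6s22
python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=vfio-pci enp6s23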

We’ve been able to reproduce this on Proxmox 8.0, 8.1, and 8.2, on both AMD and Intel processors. The crash does not occur on earlier releases such as Proxmox 6.4, nor with earlier DPDK versions such as 20.08.

Thanks,
Josh

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user


* Re: [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8
  2024-08-28 14:56 [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8 Knight, Joshua via pve-user
@ 2024-09-04  9:58 ` Fiona Ebner
  2024-09-04 14:49   ` Knight, Joshua via pve-user
       [not found]   ` <PH7PR01MB84968F27787A0BF6B885EEF9879C2@PH7PR01MB8496.prod.exchangelabs.com>
  0 siblings, 2 replies; 4+ messages in thread
From: Fiona Ebner @ 2024-09-04  9:58 UTC (permalink / raw)
  To: Proxmox VE user list

Hi,

On 28.08.24 at 16:56, Knight, Joshua via pve-user wrote:
> 
> 
> We are seeing an issue on Proxmox 8 hosts where the underlying QEMU process for a guest will crash while starting a DPDK application in the guest.
> 
> 
>   *   Proxmox 8.2.4 with QEMU 9.0.2-2
>   *   Guest running Ubuntu 22.04, application is dpdk 22.11 testpmd
>   *   Using virtio network interfaces that are up/connected
>   *   Binding interfaces with the (legacy) igb_uio driver
> 
> When starting the application, the VM ssh connection will disconnect and the VM will be powered off in the ui.
> 
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s20
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s21
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s22
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s23
> 
> root@karma06:~/dpdk-22.11# /root/dpdk-22.11/res/usr/local/bin/dpdk-testpmd -- -i --port-topology=chained --rxq=1 --txq=1 --rss-ip
> EAL: Detected CPU lcores: 6
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:12.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:12.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:13.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:13.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:14.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:15.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:16.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:17.0 (socket -1)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
> testpmd: create a new mbuf pool <mb_pool_0>: n=187456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Configuring Port 0 (socket 0)
> 
> client_loop: send disconnect: Broken pipe
> 
> 
> 
> A QEMU assertion is seen in the host’s system log. Using GDB we can see that QEMU is aborted.
> 
> karma QEMU[27334]: kvm: ../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> 
> Thread 10 "CPU 0/KVM" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7d999cc006c0 (LWP 36256)]
> __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> 44      ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1  0x00007d99a10a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007d99a105afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x00007d99a1045472 in __GI_abort () at ./stdlib/abort.c:79
> #4  0x00007d99a1045395 in __assert_fail_base (fmt=0x7d99a11b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>     assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0", file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:92
> #5  0x00007d99a1053eb2 in __GI___assert_fail (assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0",
>     file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:101
> #6  0x00005a9eb566248c in kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1836
> #7  kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1821
> #8  0x00005a9eb540bed2 in virtio_pci_one_vector_unmask (proxy=proxy@entry=0x5a9eb9f5ada0, queue_no=queue_no@entry=4294967295,
>     vector=vector@entry=0, msg=..., n=0x5a9eb9f63368) at ../hw/virtio/virtio-pci.c:991
> #9  0x00005a9eb540c09c in virtio_pci_vector_unmask (dev=0x5a9eb9f5ada0, vector=0, msg=...) at ../hw/virtio/virtio-pci.c:1056
> #10 0x00005a9eb536ff62 in msix_fire_vector_notifier (is_masked=false, vector=0, dev=0x5a9eb9f5ada0) at ../hw/pci/msix.c:120
> #11 msix_handle_mask_update (dev=0x5a9eb9f5ada0, vector=0, was_masked=<optimized out>) at ../hw/pci/msix.c:140
> #12 0x00005a9eb5602260 in memory_region_write_accessor (mr=0x5a9eb9f5b3e0, addr=12, value=<optimized out>, size=4, shift=<optimized out>,
>     mask=<optimized out>, attrs=...) at ../system/memory.c:497
> #13 0x00005a9eb5602f4e in access_with_adjusted_size (addr=addr@entry=12, value=value@entry=0x7d999cbfae58, size=size@entry=4,
>     access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x5a9eb56021e0 <memory_region_write_accessor>,
>     mr=<optimized out>, attrs=...) at ../system/memory.c:573
> #14 0x00005a9eb560403c in memory_region_dispatch_write (mr=mr@entry=0x5a9eb9f5b3e0, addr=addr@entry=12, data=<optimized out>,
>     op=<optimized out>, attrs=attrs@entry=...) at ../system/memory.c:1528
> #15 0x00005a9eb560b95f in flatview_write_continue_step (attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028 "", mr_addr=12,
>     l=l@entry=0x7d999cbfaf80, mr=0x5a9eb9f5b3e0, len=4) at ../system/physmem.c:2713
> #16 0x00005a9eb560bbed in flatview_write_continue (mr=<optimized out>, l=<optimized out>, mr_addr=<optimized out>, len=4, ptr=0xfdf8500c,
>     attrs=..., addr=4260909068, fv=0x7d8d6c0796b0) at ../system/physmem.c:2743
> #17 flatview_write (fv=0x7d8d6c0796b0, addr=addr@entry=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=len@entry=4)
>     at ../system/physmem.c:2774
> #18 0x00005a9eb560f251 in address_space_write (len=4, buf=0x7d99a3433028, attrs=..., addr=4260909068, as=0x5a9eb66f1f20 <address_space_memory>)
>     at ../system/physmem.c:2894
> #19 address_space_rw (as=0x5a9eb66f1f20 <address_space_memory>, addr=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=4,
>     is_write=<optimized out>) at ../system/physmem.c:2904
> #20 0x00005a9eb56660e8 in kvm_cpu_exec (cpu=cpu@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-all.c:2917
> #21 0x00005a9eb56676d5 in kvm_vcpu_thread_fn (arg=arg@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-accel-ops.c:50
> #22 0x00005a9eb581dfe8 in qemu_thread_start (args=0x5a9eb81ee390) at ../util/qemu-thread-posix.c:541
> #23 0x00007d99a10a8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
> #24 0x00007d99a11287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> 
> 
> One thing that’s interesting about this backtrace is it seems to exactly match an existing issue in QEMU that claims to be patched, and that patch should be present in QEMU 9.0.2, the version running on this Proxmox host.
> 
> https://gitlab.com/qemu-project/qemu/-/issues/1928
> 
> We’ve found a workaround by switching from the deprecated igb_uio driver to the vfio-pci driver when binding the interfaces for dpdk. In this case the VM does not crash. But I’m wondering if anyone has hit this before or if it’s a known issue.  I would certainly not expect any operation in the guest to cause QEMU to crash. It’s also odd that the crash seen claims to be patched in 9.0.2.
> 
> We’ve been able to reproduce this on Proxmox 8.0, 8.1, 8.2 on both AMD and Intel processors. The crash does not occur on earlier releases such as Proxmox 6.4, and does not occur with earlier dpdk versions such as 20.08.
> 
> Thanks,
> Josh
> 

We currently carry a revert of that patch, because it caused some
regressions that sounded just as bad as the original issue [0].

A fix for the regressions has landed upstream now [1], and I'll take a
look at pulling it in and dropping the revert.

[0]:
https://git.proxmox.com/?p=pve-qemu.git;a=blob;f=debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch;h=d2de6d11ba1e2a2bd2ea8dccf660ac6e66b047d4;hb=582fd47901356342b8e0bef19d7d8fdc324d2d96
[1]:
https://lore.kernel.org/qemu-devel/a8e63ff289d137197ad7a701a587cc432872d798.1724151593.git.mst@redhat.com/
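
(For anyone who wants to check whether a given pve-qemu source tree still carries the revert, roughly the following works; the clone URL assumes the usual Proxmox git layout:)

git clone git://git.proxmox.com/git/pve-qemu.git
ls pve-qemu/debian/patches/extra/ | grep -i 'Revert-virtio-pci'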

Best Regards,
Fiona


_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user


* Re: [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8
  2024-09-04  9:58 ` Fiona Ebner
@ 2024-09-04 14:49   ` Knight, Joshua via pve-user
       [not found]   ` <PH7PR01MB84968F27787A0BF6B885EEF9879C2@PH7PR01MB8496.prod.exchangelabs.com>
  1 sibling, 0 replies; 4+ messages in thread
From: Knight, Joshua via pve-user @ 2024-09-04 14:49 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE user list; +Cc: Knight, Joshua

[-- Attachment #1: Type: message/rfc822, Size: 28566 bytes --]

From: "Knight, Joshua" <Joshua.Knight@netscout.com>
To: Fiona Ebner <f.ebner@proxmox.com>, Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8
Date: Wed, 4 Sep 2024 14:49:50 +0000
Message-ID: <PH7PR01MB84968F27787A0BF6B885EEF9879C2@PH7PR01MB8496.prod.exchangelabs.com>

Thank you for the response and explanation. Would you like me to file a Bugzilla entry for this, or is there an existing bug ID that could be used to track the issue?

Thanks,
Josh

From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wednesday, September 4, 2024 at 5:59 AM
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Cc: Knight, Joshua <Joshua.Knight@netscout.com>
Subject: Re: [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8

Hi,

Am 28.08.24 um 16:56 schrieb Knight, Joshua via pve-user:
>
>
> We are seeing an issue on Proxmox 8 hosts where the underlying QEMU process for a guest will crash while starting a DPDK application in the guest.
>
>
>   *   Proxmox 8.2.4 with QEMU 9.0.2-2
>   *   Guest running Ubuntu 22.04, application is dpdk 22.11 testpmd
>   *   Using virtio network interfaces that are up/connected
>   *   Binding interfaces with the (legacy) igb_uio driver
>
> When starting the application, the VM ssh connection will disconnect and the VM will be powered off in the ui.
>
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s20
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s21
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s22
> root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s23
>
> root@karma06:~/dpdk-22.11# /root/dpdk-22.11/res/usr/local/bin/dpdk-testpmd -- -i --port-topology=chained --rxq=1 --txq=1 --rss-ip
> EAL: Detected CPU lcores: 6
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:12.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:12.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:13.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:13.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:14.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:15.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:16.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:17.0 (socket -1)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
> testpmd: create a new mbuf pool <mb_pool_0>: n=187456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Configuring Port 0 (socket 0)
>
> client_loop: send disconnect: Broken pipe
>
>
>
> A QEMU assertion is seen in the host’s system log. Using GDB we can see that QEMU is aborted.
>
> karma QEMU[27334]: kvm: ../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
>
> Thread 10 "CPU 0/KVM" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7d999cc006c0 (LWP 36256)]
> __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> 44      ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1  0x00007d99a10a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007d99a105afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x00007d99a1045472 in __GI_abort () at ./stdlib/abort.c:79
> #4  0x00007d99a1045395 in __assert_fail_base (fmt=0x7d99a11b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>     assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0", file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:92
> #5  0x00007d99a1053eb2 in __GI___assert_fail (assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0",
>     file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:101
> #6  0x00005a9eb566248c in kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1836
> #7  kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1821
> #8  0x00005a9eb540bed2 in virtio_pci_one_vector_unmask (proxy=proxy@entry=0x5a9eb9f5ada0, queue_no=queue_no@entry=4294967295,
>     vector=vector@entry=0, msg=..., n=0x5a9eb9f63368) at ../hw/virtio/virtio-pci.c:991
> #9  0x00005a9eb540c09c in virtio_pci_vector_unmask (dev=0x5a9eb9f5ada0, vector=0, msg=...) at ../hw/virtio/virtio-pci.c:1056
> #10 0x00005a9eb536ff62 in msix_fire_vector_notifier (is_masked=false, vector=0, dev=0x5a9eb9f5ada0) at ../hw/pci/msix.c:120
> #11 msix_handle_mask_update (dev=0x5a9eb9f5ada0, vector=0, was_masked=<optimized out>) at ../hw/pci/msix.c:140
> #12 0x00005a9eb5602260 in memory_region_write_accessor (mr=0x5a9eb9f5b3e0, addr=12, value=<optimized out>, size=4, shift=<optimized out>,
>     mask=<optimized out>, attrs=...) at ../system/memory.c:497
> #13 0x00005a9eb5602f4e in access_with_adjusted_size (addr=addr@entry=12, value=value@entry=0x7d999cbfae58, size=size@entry=4,
>     access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x5a9eb56021e0 <memory_region_write_accessor>,
>     mr=<optimized out>, attrs=...) at ../system/memory.c:573
> #14 0x00005a9eb560403c in memory_region_dispatch_write (mr=mr@entry=0x5a9eb9f5b3e0, addr=addr@entry=12, data=<optimized out>,
>     op=<optimized out>, attrs=attrs@entry=...) at ../system/memory.c:1528
> #15 0x00005a9eb560b95f in flatview_write_continue_step (attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028 "", mr_addr=12,
>     l=l@entry=0x7d999cbfaf80, mr=0x5a9eb9f5b3e0, len=4) at ../system/physmem.c:2713
> #16 0x00005a9eb560bbed in flatview_write_continue (mr=<optimized out>, l=<optimized out>, mr_addr=<optimized out>, len=4, ptr=0xfdf8500c,
>     attrs=..., addr=4260909068, fv=0x7d8d6c0796b0) at ../system/physmem.c:2743
> #17 flatview_write (fv=0x7d8d6c0796b0, addr=addr@entry=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=len@entry=4)
>     at ../system/physmem.c:2774
> #18 0x00005a9eb560f251 in address_space_write (len=4, buf=0x7d99a3433028, attrs=..., addr=4260909068, as=0x5a9eb66f1f20 <address_space_memory>)
>     at ../system/physmem.c:2894
> #19 address_space_rw (as=0x5a9eb66f1f20 <address_space_memory>, addr=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=4,
>     is_write=<optimized out>) at ../system/physmem.c:2904
> #20 0x00005a9eb56660e8 in kvm_cpu_exec (cpu=cpu@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-all.c:2917
> #21 0x00005a9eb56676d5 in kvm_vcpu_thread_fn (arg=arg@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-accel-ops.c:50
> #22 0x00005a9eb581dfe8 in qemu_thread_start (args=0x5a9eb81ee390) at ../util/qemu-thread-posix.c:541
> #23 0x00007d99a10a8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
> #24 0x00007d99a11287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
>
> One thing that’s interesting about this backtrace is it seems to exactly match an existing issue in QEMU that claims to be patched, and that patch should be present in QEMU 9.0.2, the version running on this Proxmox host.
>
https://gitlab.com/qemu-project/qemu/-/issues/1928
>
> We’ve found a workaround by switching from the deprecated igb_uio driver to the vfio-pci driver when binding the interfaces for dpdk. In this case the VM does not crash. But I’m wondering if anyone has hit this before or if it’s a known issue.  I would certainly not expect any operation in the guest to cause QEMU to crash. It’s also odd that the crash seen claims to be patched in 9.0.2.
>
> We’ve been able to reproduce this on Proxmox 8.0, 8.1, 8.2 on both AMD and Intel processors. The crash does not occur on earlier releases such as Proxmox 6.4, and does not occur with earlier dpdk versions such as 20.08.
>
> Thanks,
> Josh
>

we do have a revert of that patch currently, because it caused some
regressions that sounded just as bad as the original issue [0].

A fix for the regressions has landed upstream now [1], and I'll take a
look at pulling it in and dropping the revert.

[0]:
https://git.proxmox.com/?p=pve-qemu.git;a=blob;f=debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch;h=d2de6d11ba1e2a2bd2ea8dccf660ac6e66b047d4;hb=582fd47901356342b8e0bef19d7d8fdc324d2d96
[1]:
https://lore.kernel.org/qemu-devel/a8e63ff289d137197ad7a701a587cc432872d798.1724151593.git.mst@redhat.com/

Best Regards,
Fiona

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user


* Re: [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8
       [not found]   ` <PH7PR01MB84968F27787A0BF6B885EEF9879C2@PH7PR01MB8496.prod.exchangelabs.com>
@ 2024-09-05  9:53     ` Fiona Ebner
  0 siblings, 0 replies; 4+ messages in thread
From: Fiona Ebner @ 2024-09-05  9:53 UTC (permalink / raw)
  To: Knight, Joshua, Proxmox VE user list

On 04.09.24 at 16:49, Knight, Joshua wrote:
> Thank you for the response and explanation.  Would you like me to file a
> Bugzilla entry for this? Or is there an existing bug ID already that
> could be used to track the issue?

If you want to. I don't think there is a Bugzilla entry yet.

A patch has been sent to the mailing list now:
https://lists.proxmox.com/pipermail/pve-devel/2024-September/065243.html

Best Regards,
Fiona


_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user


end of thread, other threads:[~2024-09-05  9:52 UTC | newest]

Thread overview: 4+ messages
2024-08-28 14:56 [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8 Knight, Joshua via pve-user
2024-09-04  9:58 ` Fiona Ebner
2024-09-04 14:49   ` Knight, Joshua via pve-user
     [not found]   ` <PH7PR01MB84968F27787A0BF6B885EEF9879C2@PH7PR01MB8496.prod.exchangelabs.com>
2024-09-05  9:53     ` Fiona Ebner
