* [PVE-User] Debian 11 hard lock issues as VM
From: Bryan Fields @ 2023-01-17 5:06 UTC
To: pve-user
I am running Proxmox VE 7.3-4 with a VM that is now on Debian 11.
I have ZFS local storage in each server in the cluster, and every 15 minutes the
VM is replicated to the other server(s). I recently upgraded this VM from
Debian 9 to Debian 11, and it started locking up. There was no consistent
amount of time before a lockup, nor a consistent number of replications.
Through some debugging I found that the QEMU guest agent was not unfreezing
(thawing) the guest filesystems after the replication. My understanding is that
this should happen in under 100 ms, and from what I could see it works fine on
all my other VMs running Ubuntu or RHEL.
I compared the agent on the Debian 11 server with the one on the Ubuntu servers:
Debian's was 5.2.0 versus 6.2.0 on Ubuntu. I compiled the agent from the QEMU
7.2.0 sources (statically linked, if anyone wants a copy) and ran it in a screen
session on the Debian 11 VM. It still locked up hard after 2-4 hours.
Debian is using the stock kernel:
> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux
Based on some reports online I thought it might be related to VirtIO, so I
switched the controller to VirtIO SCSI single; it made no difference.
I've reverted to the old kernel and am going to let it run:
> 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux
Complicating this, the box is my Observium install and I don't have another
device watching it, so when it locks up, it takes my monitoring offline :-D
On the working Ubuntu boxes I'm running:
> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Below is the agent log leading up to the lockup; there is no more output after
the last line (verbose logging is enabled):
> 1673846104.535376: debug: received EOF
> 1673846104.635560: debug: received EOF
> 1673846104.735735: debug: received EOF
> 1673846104.835868: debug: received EOF
> 1673846104.936067: debug: read data, count: 104, data: {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
> {"arguments":{},"execute":"guest-ping"}
>
> 1673846104.936136: debug: process_event: called
> 1673846104.936144: debug: processing command
> 1673846104.936216: debug: sending data, count: 23
> 1673846104.936257: debug: process_event: called
> 1673846104.936272: debug: processing command
> 1673846104.936350: debug: sending data, count: 15
> 1673846104.936833: debug: received EOF
> 1673846105.37003: debug: received EOF
> 1673846105.137190: debug: received EOF
> 1673846105.237344: debug: received EOF
> 1673846105.337525: debug: received EOF
> 1673846105.437693: debug: received EOF
> 1673846105.537907: debug: received EOF
> 1673846105.638096: debug: received EOF
> 1673846105.738307: debug: received EOF
> 1673846105.838495: debug: received EOF
> 1673846105.938652: debug: received EOF
> 1673846106.38813: debug: received EOF
> 1673846106.139011: debug: received EOF
> 1673846106.239210: debug: received EOF
> 1673846106.339403: debug: received EOF
> 1673846106.439583: debug: received EOF
> 1673846106.539782: debug: received EOF
> 1673846106.639990: debug: received EOF
> 1673846106.740190: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>
> 1673846106.840450: debug: process_event: called
> 1673846106.840465: debug: processing command
> 1673846106.840497: debug: sending data, count: 23
> 1673846106.840545: debug: process_event: called
> 1673846106.840563: debug: processing command
> 1673846106.841114: debug: disabling command: guest-get-time
> 1673846106.841131: debug: disabling command: guest-set-time
> 1673846106.841138: debug: disabling command: guest-shutdown
> 1673846106.841145: debug: disabling command: guest-file-open
> 1673846106.841151: debug: disabling command: guest-file-close
> 1673846106.841157: debug: disabling command: guest-file-read
> 1673846106.841164: debug: disabling command: guest-file-write
> 1673846106.841171: debug: disabling command: guest-file-seek
> 1673846106.841179: debug: disabling command: guest-file-flush
> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
> 1673846106.841202: debug: disabling command: guest-fstrim
> 1673846106.841209: debug: disabling command: guest-suspend-disk
> 1673846106.841217: debug: disabling command: guest-suspend-ram
> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
> 1673846106.841232: debug: disabling command: guest-network-get-interfaces
> 1673846106.841239: debug: disabling command: guest-get-vcpus
> 1673846106.841245: debug: disabling command: guest-set-vcpus
> 1673846106.841251: debug: disabling command: guest-get-disks
> 1673846106.841257: debug: disabling command: guest-get-fsinfo
> 1673846106.841265: debug: disabling command: guest-set-user-password
> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
> 1673846106.841294: debug: disabling command: guest-exec-status
> 1673846106.841303: debug: disabling command: guest-exec
> 1673846106.841311: debug: disabling command: guest-get-host-name
> 1673846106.841319: debug: disabling command: guest-get-users
> 1673846106.841326: debug: disabling command: guest-get-timezone
> 1673846106.841334: debug: disabling command: guest-get-osinfo
> 1673846106.841343: debug: disabling command: guest-get-devices
> 1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
> 1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
> 1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
> 1673846106.841371: warning: disabling logging due to filesystem freeze
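Two things make this log easy to misread by eye. First, qemu-ga stops logging once the filesystems are frozen (the final `warning` line) and, as far as I can tell from the agent source, turns logging back on at thaw, so silence after that line is expected while frozen; the real question is whether a `guest-fsfreeze-thaw` was ever read. Second, the microsecond field is not zero-padded: `1673846105.37003` sits between `...104.936833` and `...105.137190`, so it means 37003 microseconds, and reading the stamps as plain floats misorders them. A rough Python sketch with helper names of my own (not part of qemu-ga) that handles both:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# "1673846105.37003: debug: received EOF" -> seconds, microseconds, level, message.
# Assumption (from the data above): microseconds are printed without zero padding.
LINE_RE = re.compile(r"^(\d+)\.(\d+): (\w+): (.*)$")


@dataclass
class Entry:
    ts_us: int    # microseconds since the epoch
    level: str    # "debug", "warning", ...
    message: str  # may span several lines ("read data" payloads)


def parse_log(text: str) -> List[Entry]:
    """Parse a verbose qemu-ga log, folding continuation lines into the previous entry."""
    entries: List[Entry] = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()  # drop mail quoting, if present
        m = LINE_RE.match(line)
        if m:
            sec, usec, level, msg = m.groups()
            entries.append(Entry(int(sec) * 1_000_000 + int(usec), level, msg))
        elif line and entries:
            # e.g. the second JSON object of a "read data" payload
            entries[-1].message += "\n" + line
    return entries


def is_read_of(entry: Entry, command: str) -> bool:
    """True if this is a 'read data' entry whose payload names `command` exactly."""
    return entry.message.startswith("read data") and f'"{command}"' in entry.message


def unthawed_freeze(entries: List[Entry]) -> Optional[Entry]:
    """Return the last freeze request that was never followed by a thaw request."""
    pending: Optional[Entry] = None
    for e in entries:
        if is_read_of(e, "guest-fsfreeze-freeze"):
            pending = e
        elif is_read_of(e, "guest-fsfreeze-thaw"):
            pending = None
    return pending


def is_monotonic(entries: List[Entry]) -> bool:
    return all(a.ts_us <= b.ts_us for a, b in zip(entries, entries[1:]))


SAMPLE = """\
> 1673846104.936833: debug: received EOF
> 1673846105.37003: debug: received EOF
> 1673846105.137190: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>
> 1673846106.841371: warning: disabling logging due to filesystem freeze
"""

if __name__ == "__main__":
    entries = parse_log(SAMPLE)
    print(is_monotonic(entries))             # True
    frozen = unthawed_freeze(entries)
    print(frozen.ts_us if frozen else None)  # 1673846106840388
```

Fed the full log above, this should point at the 1673846106.840388 freeze with no thaw after it.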
Is there any known reason this is happening, or any fix other than disabling
the agent? I can't believe Debian 11 is shipping with a broken kernel, and
'qm guest cmd 152 fsfreeze-freeze' followed by 'qm guest cmd 152 fsfreeze-thaw'
works fine from the host. Could this be something with the VirtIO pipe/IPC?
Anyone else seeing this or have any ideas?
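On the VirtIO pipe/IPC question: the byte counts in the log match the guest agent's documented framing, which suggests the freeze request itself arrived intact. Each read in the log holds a `guest-sync-delimited` request plus the real command as two newline-terminated JSON objects (104 bytes for the ping, 115 for the freeze). The agent answers the sync with a 0xFF sentinel byte, `{"return": <id>}` and a newline (23 bytes), then the command's own reply (`{"return": {}}` plus newline, 15 bytes). A small Python sketch of that framing, with hypothetical helper names and no real socket, that reproduces those counts:

```python
import json
from typing import Optional

SENTINEL = b"\xff"    # qemu-ga prefixes its guest-sync-delimited reply with this byte
COMPACT = (",", ":")  # no spaces, matching the payloads seen in the log


def frame_request(sync_id: int, command: str, arguments: Optional[dict] = None) -> bytes:
    """Sync request plus the real command, each as one newline-terminated JSON object."""
    sync = {"execute": "guest-sync-delimited", "arguments": {"id": sync_id}}
    cmd = {"execute": command, "arguments": arguments or {}}
    text = json.dumps(sync, separators=COMPACT) + "\n" + json.dumps(cmd, separators=COMPACT) + "\n"
    return text.encode("utf-8")


def split_sync_reply(buf: bytes, expected_id: int) -> bytes:
    """Drop anything before the last sentinel, check the echoed id, return what follows."""
    start = buf.rfind(SENTINEL)  # 0xFF never occurs in UTF-8 JSON, so this is unambiguous
    if start < 0:
        raise ValueError("no 0xFF sentinel in reply")
    line, newline, rest = buf[start + 1:].partition(b"\n")
    if not newline:
        raise ValueError("sync reply not terminated yet")
    if json.loads(line).get("return") != expected_id:
        raise ValueError("sync id mismatch (stale reply?)")
    return rest


if __name__ == "__main__":
    print(len(frame_request(371290701, "guest-ping")))             # 104
    print(len(frame_request(371290702, "guest-fsfreeze-freeze")))  # 115
    reply = SENTINEL + b'{"return": 371290701}\n' + b'{"return": {}}\n'
    print(split_sync_reply(reply, 371290701))                      # b'{"return": {}}\n'
```

The key order inside each object differs between the two reads in the log, which does not change the byte count.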
--
Bryan Fields
727-409-1194 - Voice
http://bryanfields.net
* Re: [PVE-User] Debian 11 hard lock issues as VM
From: Bryan Fields @ 2023-01-18 0:32 UTC
To: pve-user
On 1/17/23 3:22 AM, Eneko Lacunza via pve-user wrote:
> Hi Bryan,
>
> We started to upgrade our cluster from PVE 7.2 to 7.3 yesterday.
>
> I have enabled the agent in our only VM with Debian 11 running on a
> 7.3-4 node at the moment, and performed 5 full backups in a row; the VM
> continues working (no hang).
Mine is replication rather than backup, but I believe it's the same.
> You haven't provided details about your setup:
>
> - Server (especially CPU model). Debian could be suffering from weird
> BIOS clock issues.
The hosts are HP DL360 Gen7 servers with ZFS RAID2 local storage on 1.6 TB
SAS SSDs; the life-used indicator is now 6% or 7% on most disks. Each server
has 192 GB of RAM (16384 MB 1600 MHz ECC modules) and dual 3.07 GHz 6-core
(12-thread) CPUs. The first entry of /proc/cpuinfo is below.
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
stepping : 2
microcode : 0x1a
cpu MHz : 1910.971
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti tpr_shadow
vnmi flexpriority ept vpid dtherm ida arat
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_1gb flexpriority
tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
itlb_multihit mmio_unknown
bogomips : 6134.18
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
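On the clock question: the host flags above include `constant_tsc` and `nonstop_tsc`, i.e. the TSC runs at a fixed rate and keeps counting in deep C-states. What the Debian 11 guest sees depends on the VM's CPU type, so it may be worth checking inside the guest too. A quick Python sketch (my own helper, not an existing tool) that parses /proc/cpuinfo text, including wrapped lines like the paste above, and reports those flags:

```python
from typing import Dict, List, Optional

# Timekeeping-related flags: tsc (present), constant_tsc (fixed rate regardless of
# CPU frequency), nonstop_tsc (keeps counting in deep C-states), rdtscp.
CLOCK_FLAGS = ("tsc", "constant_tsc", "nonstop_tsc", "rdtscp")


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split /proc/cpuinfo text into one dict per processor block."""
    cpus: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends one processor's block
            if current:
                cpus.append(current)
            current, last_key = {}, None
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
        elif last_key is not None:
            # A wrapped continuation of the previous value (mail clients wrap long lines)
            current[last_key] = (current[last_key] + " " + line.strip()).strip()
    if current:
        cpus.append(current)
    return cpus


def clock_flags(cpu: Dict[str, str]) -> Dict[str, bool]:
    """Report which of CLOCK_FLAGS appear in this processor's 'flags' field."""
    flags = set(cpu.get("flags", "").split())
    return {name: name in flags for name in CLOCK_FLAGS}
```

Inside the guest, `clock_flags(parse_cpuinfo(open("/proc/cpuinfo").read())[0])` gives a quick answer for the first vCPU.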
The Proxmox config for the VM is:
agent: 1,fstrim_cloned_disks=1
bootdisk: scsi0
cores: 2
cpuunits: 2048
ide2: none,media=cdrom
memory: 8192
name: eyes.tampacoop.net
net0: virtio=86:49:26:AA:86:E7,bridge=vmbr199,firewall=1
net1: virtio=A2:C5:47:85:3E:3B,bridge=vmbr8
numa: 0
onboot: 1
ostype: l26
parent: before_extend
scsi0: local-zfs:vm-102-disk-0,discard=on,format=raw,iothread=1,size=48G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=11ed5a86-3395-49f2-ac80-16804b237a0d
sockets: 1
startup: order=1
vmgenid: 6238f0f2-ac90-43e0-b56c-05e1ed1c2431
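The `agent: 1,fstrim_cloned_disks=1` line is what enables the guest-agent freeze/thaw seen in the log. Proxmox stores options like this as comma-separated property strings, where a bare leading value belongs to the property's default key (`enabled` for `agent`, per the `qm` documentation). A small Python sketch (hypothetical helpers, not the Proxmox parser) that reads a config like the one above, normally found in /etc/pve/qemu-server/<vmid>.conf, and decodes that line:

```python
from typing import Dict

# Default key for a bare leading value, per property. Only "agent" is listed here;
# other properties (disks, net) would need their own entries.
DEFAULT_KEYS = {"agent": "enabled"}


def parse_vm_config(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of the current config, skipping comments."""
    cfg: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            break  # snapshot sections such as [before_extend] follow; ignore them
        key, sep, value = line.partition(":")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def parse_property_string(prop: str, value: str) -> Dict[str, str]:
    """Split 'a,b=c,d=e' into a dict, assigning a bare leading item to the default key."""
    out: Dict[str, str] = {}
    for i, item in enumerate(value.split(",")):
        k, sep, v = item.partition("=")
        if sep:
            out[k] = v
        elif i == 0 and prop in DEFAULT_KEYS:
            out[DEFAULT_KEYS[prop]] = k
        else:
            raise ValueError(f"unexpected bare item {item!r} in {prop}")
    return out
```

For the config above, `parse_property_string("agent", parse_vm_config(text)["agent"])` returns `{'enabled': '1', 'fstrim_cloned_disks': '1'}`.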
> - Running kernel on PVE 7.3-4. Kernel 5.15.x has been quite bad for us,
> have you tried kernel 5.13 or 5.19?
I reverted the guest OS to the 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2
(2022-06-30) x86_64 GNU/Linux kernel, and it hasn't locked up once since.
This is with either the 5.2.0 or the 7.2.0 agent.
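To compare the 4.9 and 5.10 guest kernels without waiting on the 15-minute replication schedule, something like the following might shorten the loop. It is a hypothetical Python helper, run on the PVE host, that drives the same `qm guest cmd <vmid> fsfreeze-freeze` / `fsfreeze-thaw` commands back to back and checks `fsfreeze-status` afterwards. It assumes `qm` is on the PATH and that the status output contains the word `thawed` once the guest is thawed. Unlike replication, no ZFS snapshot is taken between freeze and thaw, and manual freeze/thaw already worked here, so it may well not reproduce the hang.

```python
import subprocess
import time
from typing import Callable, List

# A runner takes an argv list and returns stdout, raising on failure.
Runner = Callable[[List[str]], str]


def run_cmd(argv: List[str], timeout: float = 60.0) -> str:
    """Run a host command and return stdout; raises on non-zero exit or timeout."""
    done = subprocess.run(argv, check=True, capture_output=True, text=True, timeout=timeout)
    return done.stdout


def soak(vmid: int, cycles: int, pause_s: float = 5.0,
         run: Runner = run_cmd, sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Run freeze/thaw cycles via 'qm guest cmd'; return seconds per cycle.

    Stops at the first failing command or if the guest does not report 'thawed'.
    If it stops mid-cycle the guest may still be frozen, so check it by hand.
    """
    base = ["qm", "guest", "cmd", str(vmid)]
    durations: List[float] = []
    for i in range(cycles):
        t0 = time.monotonic()
        run(base + ["fsfreeze-freeze"])
        run(base + ["fsfreeze-thaw"])
        durations.append(time.monotonic() - t0)
        status = run(base + ["fsfreeze-status"])
        if "thawed" not in status:
            raise RuntimeError(f"cycle {i}: fsfreeze-status returned {status.strip()!r}")
        sleep(pause_s)
    return durations
```

On the host, `soak(152, cycles=200)` would run 200 cycles against VM 152 (the ID used earlier in the thread). The measured durations include `qm` process startup, so they overstate the actual freeze time.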
I've moved the VMs across hosts and they have the same problem.
FingerlessGloves mentioned this could be a MariaDB issue, and I can confirm we
have the official MariaDB packages installed on this server; we're running
10.10.2-MariaDB-1:10.10.2+maria~deb11.
Could this be some interaction between the new kernel and the new MariaDB?