From: Bryan Fields <Bryan@bryanfields.net>
To: pve-user@lists.proxmox.com
Subject: [PVE-User] Debian 11 hard lock issues as VM
Date: Tue, 17 Jan 2023 00:06:52 -0500
Message-ID: <2635f65d-33fb-5447-a3c1-d5cbab9e04e1@bryanfields.net>
I am running Proxmox 7.3-4 with a VM that is now on Debian 11.
I have ZFS local storage in each server in the cluster. Every 15 minutes the
VM is replicated to the other server(s). I recently upgraded a server from
Debian 9 to Debian 11 and it started locking up. The lockups don't follow any
pattern: they don't happen after a consistent amount of time or a consistent
number of replications.
Through some debugging I found that the qemu-agent was not unfreezing the OS
after the replication. My understanding is that this should take under 100 ms,
and from what I could see it works fine on all my other VMs running Ubuntu or
RHEL.
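The freeze, snapshot, thaw sequence described above can be sketched as follows. This only illustrates the general pattern and is not Proxmox's actual replication code; the function name and the three callables are hypothetical placeholders.

```python
import time
from typing import Callable


def snapshot_with_freeze(
    freeze: Callable[[], None],
    snapshot: Callable[[], None],
    thaw: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Run snapshot() between freeze() and thaw(); return seconds spent frozen.

    thaw() sits in a finally block so it is still attempted if snapshot() raises.
    """
    freeze()
    start = clock()
    try:
        snapshot()
    finally:
        thaw()
    return clock() - start


# Example with stand-in callables that only record the call order.
calls = []
elapsed = snapshot_with_freeze(
    lambda: calls.append("freeze"),
    lambda: calls.append("snapshot"),
    lambda: calls.append("thaw"),
)
print(calls, f"frozen for {elapsed * 1000:.3f} ms")
```

Comparing the returned duration against the expected sub-100 ms window is one way to notice a thaw that is slow rather than missing entirely.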
I compared the agent on the Debian 11 server with the one on the Ubuntu
servers: Debian had 5.2.0 and Ubuntu had 6.2.0. I compiled the agent from the
7.2.0 qemu sources (statically too, if anyone wants a copy) and ran it inside
screen in a terminal on the Debian 11 VM. It still locked up hard after 2-4 hours.
Debian is using the stock kernel:
> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux
I read some things online and thought it might be related to VirtIO, and
changed that to VirtIO single with no difference.
I've reverted to the old kernel and am going to let this run:
> 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux
Complicating this, the box is my Observium install and I don't have another
device watching it, so when it locks up, it takes my monitoring offline :-D
On the working Ubuntu boxes I'm running:
> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Below is the agent log from when it locks up. There is no more output after
the last line (I have verbose enabled).
> 1673846104.535376: debug: received EOF
> 1673846104.635560: debug: received EOF
> 1673846104.735735: debug: received EOF
> 1673846104.835868: debug: received EOF
> 1673846104.936067: debug: read data, count: 104, data: {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
> {"arguments":{},"execute":"guest-ping"}
>
> 1673846104.936136: debug: process_event: called
> 1673846104.936144: debug: processing command
> 1673846104.936216: debug: sending data, count: 23
> 1673846104.936257: debug: process_event: called
> 1673846104.936272: debug: processing command
> 1673846104.936350: debug: sending data, count: 15
> 1673846104.936833: debug: received EOF
> 1673846105.37003: debug: received EOF
> 1673846105.137190: debug: received EOF
> 1673846105.237344: debug: received EOF
> 1673846105.337525: debug: received EOF
> 1673846105.437693: debug: received EOF
> 1673846105.537907: debug: received EOF
> 1673846105.638096: debug: received EOF
> 1673846105.738307: debug: received EOF
> 1673846105.838495: debug: received EOF
> 1673846105.938652: debug: received EOF
> 1673846106.38813: debug: received EOF
> 1673846106.139011: debug: received EOF
> 1673846106.239210: debug: received EOF
> 1673846106.339403: debug: received EOF
> 1673846106.439583: debug: received EOF
> 1673846106.539782: debug: received EOF
> 1673846106.639990: debug: received EOF
> 1673846106.740190: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>
> 1673846106.840450: debug: process_event: called
> 1673846106.840465: debug: processing command
> 1673846106.840497: debug: sending data, count: 23
> 1673846106.840545: debug: process_event: called
> 1673846106.840563: debug: processing command
> 1673846106.841114: debug: disabling command: guest-get-time
> 1673846106.841131: debug: disabling command: guest-set-time
> 1673846106.841138: debug: disabling command: guest-shutdown
> 1673846106.841145: debug: disabling command: guest-file-open
> 1673846106.841151: debug: disabling command: guest-file-close
> 1673846106.841157: debug: disabling command: guest-file-read
> 1673846106.841164: debug: disabling command: guest-file-write
> 1673846106.841171: debug: disabling command: guest-file-seek
> 1673846106.841179: debug: disabling command: guest-file-flush
> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
> 1673846106.841202: debug: disabling command: guest-fstrim
> 1673846106.841209: debug: disabling command: guest-suspend-disk
> 1673846106.841217: debug: disabling command: guest-suspend-ram
> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
> 1673846106.841232: debug: disabling command: guest-network-get-interfaces
> 1673846106.841239: debug: disabling command: guest-get-vcpus
> 1673846106.841245: debug: disabling command: guest-set-vcpus
> 1673846106.841251: debug: disabling command: guest-get-disks
> 1673846106.841257: debug: disabling command: guest-get-fsinfo
> 1673846106.841265: debug: disabling command: guest-set-user-password
> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
> 1673846106.841294: debug: disabling command: guest-exec-status
> 1673846106.841303: debug: disabling command: guest-exec
> 1673846106.841311: debug: disabling command: guest-get-host-name
> 1673846106.841319: debug: disabling command: guest-get-users
> 1673846106.841326: debug: disabling command: guest-get-timezone
> 1673846106.841334: debug: disabling command: guest-get-osinfo
> 1673846106.841343: debug: disabling command: guest-get-devices
> 1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
> 1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
> 1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
> 1673846106.841371: warning: disabling logging due to filesystem freeze
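To pull the last commands the agent received out of a long verbose log like the one above, a small parser can help. This is a minimal sketch: the function names are hypothetical, and it assumes the line layout seen in this log (`seconds.microseconds: level: message`, optionally prefixed with `> `). The microsecond field appears to be printed without zero padding (`1673846105.37003` sits between `...104.936833` and `...105.137190`, so it presumably means `.037003`), so the sketch combines the two fields as integers instead of parsing the value as a decimal.

```python
import json
import re
from typing import List, Optional, Tuple

# Matches "<sec>.<usec>: <level>: <message>", with an optional "> " quote prefix.
LINE_RE = re.compile(r"^(?:>\s?)?(\d+)\.(\d+): (\w+): (.*)$")


def parse_timestamp(sec: str, usec: str) -> int:
    """Combine seconds and an unpadded microsecond field into total microseconds."""
    return int(sec) * 1_000_000 + int(usec)


def executed_commands(lines: List[str]) -> List[Tuple[Optional[int], str]]:
    """Return (timestamp_in_microseconds, command) for each JSON request found."""
    found = []
    last_ts = None
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            last_ts = parse_timestamp(match.group(1), match.group(2))
            text = match.group(4)
        else:
            # Continuation line: a second JSON request without its own timestamp.
            text = re.sub(r"^>\s?", "", line)
        start = text.find("{")
        if start == -1:
            continue
        try:
            request = json.loads(text[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(request, dict) and "execute" in request:
            found.append((last_ts, request["execute"]))
    return found


sample = [
    '> 1673846105.938652: debug: received EOF',
    '> 1673846106.38813: debug: received EOF',
    '> 1673846106.840388: debug: read data, count: 115, data: '
    '{"arguments":{"id":371290702},"execute":"guest-sync-delimited"}',
    '> {"execute":"guest-fsfreeze-freeze","arguments":{}}',
    '> 1673846106.841371: warning: disabling logging due to filesystem freeze',
]
for ts, command in executed_commands(sample):
    print(ts, command)
```

Requests on continuation lines inherit the timestamp of the preceding `read data` line, since the agent logs them as part of the same read.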
Is there any known reason this is happening, and any fix other than disabling
the agent? I can't imagine that Debian 11 is shipping with a broken kernel, and
running 'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152 fsfreeze-thaw'
by hand from the host works fine. Could this be something with the VirtIO
pipe/IPC?
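As a possible stopgap while debugging, a host-side check could ask the agent for its freeze state and issue a thaw if the guest is still frozen. The sketch below is untested against the hang described here and may not help if the guest is locked up hard. The helper names are hypothetical, and the exact output format of `qm guest cmd <vmid> fsfreeze-status` is an assumption (the code only assumes it contains the word `frozen` or `thawed`, possibly quoted). The log above suggests the status and thaw commands stay enabled during a freeze, since they are not in the list of disabled commands.

```python
import subprocess
from typing import Callable, List


def run_qm(args: List[str], timeout: float = 10.0) -> str:
    """Run `qm` with the given arguments on the Proxmox host and return stdout."""
    result = subprocess.run(
        ["qm", *args], capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout


def is_frozen(status_output: str) -> bool:
    """Interpret fsfreeze-status output (assumed to be 'frozen' or 'thawed')."""
    return status_output.strip().strip('"').lower() == "frozen"


def thaw_if_frozen(vmid: int, runner: Callable[[List[str]], str] = run_qm) -> bool:
    """Issue fsfreeze-thaw if the guest reports frozen. Returns True if it did."""
    status = runner(["guest", "cmd", str(vmid), "fsfreeze-status"])
    if not is_frozen(status):
        return False
    runner(["guest", "cmd", str(vmid), "fsfreeze-thaw"])
    return True
```

Calling `thaw_if_frozen(152)` from a timer on the host, shortly after each replication run, would be one way to use it. The `runner` parameter exists so the logic can be exercised without a real VM.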
Anyone else seeing this or have any ideas?
--
Bryan Fields
727-409-1194 - Voice
http://bryanfields.net