From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <Bryan@bryanfields.net>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id E14DC94F3A
 for <pve-user@lists.proxmox.com>; Tue, 17 Jan 2023 06:14:23 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id C987D732F
 for <pve-user@lists.proxmox.com>; Tue, 17 Jan 2023 06:14:23 +0100 (CET)
Received: from morty.keekles.org (Morty.keekles.org [199.47.174.151])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-user@lists.proxmox.com>; Tue, 17 Jan 2023 06:14:20 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by morty.keekles.org (Postfix) with ESMTP id 6D0AF19E0A3A
 for <pve-user@lists.proxmox.com>; Tue, 17 Jan 2023 05:06:57 +0000 (UTC)
Received: from morty.keekles.org ([127.0.0.1])
 by localhost (morty.keekles.org [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id FFK7weHF_uja for <pve-user@lists.proxmox.com>;
 Tue, 17 Jan 2023 05:06:53 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by morty.keekles.org (Postfix) with ESMTP id 1382A19E0C66
 for <pve-user@lists.proxmox.com>; Tue, 17 Jan 2023 05:06:53 +0000 (UTC)
DKIM-Filter: OpenDKIM Filter v2.10.3 morty.keekles.org 1382A19E0C66
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bryanfields.net;
 s=909DCF92-EFE7-11EB-9235-648EB8AF1B81; t=1673932013;
 bh=1+Zf8rL+ZP+mwOXYmJbYKaNgXIg0HrfUXC2TXNOpgmM=;
 h=Message-ID:Date:MIME-Version:To:From;
 b=t3x+/nMb7CX7EUFZOaeSl/pNJW0N1DcLpLDy/woBOTKvzL1T/eI+OCba3vGRVeQAb
 pcksopWCjDI47ExoI+K9CHgUe6dYNkiWx8rM/gewPZY7ppXMPuNfMqPlX9qtlw79+s
 Oe+T3d496PnjmWahvIThJVf/mkGC9ypqlJGjXDoLLjjxfo/wsXKaudMkXDtpTeHWtS
 3C+gY7YNBHpMZAvmGm5Q3aXWuQbzOO5MvX7E59l7jH7QgxIc0PpLFmi8RK/UJ87iZV
 YT3hkZ27pdsUM69OrFbL7iykXsemVJ/zcnwc+OUym6ns4ax0qxrHThhM7J3YU6TVYs
 +kJ6gl/8qRWQQ==
X-Virus-Scanned: amavisd-new at morty.keekles.org
Received: from morty.keekles.org ([127.0.0.1])
 by localhost (morty.keekles.org [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id eX_LInJmhWcj for <pve-user@lists.proxmox.com>;
 Tue, 17 Jan 2023 05:06:52 +0000 (UTC)
Received: from [192.168.128.105]
 (static-47-206-239-202.tamp.fl.frontiernet.net [47.206.239.202])
 by morty.keekles.org (Postfix) with ESMTPSA id D450319E0A3A
 for <pve-user@lists.proxmox.com>; Tue, 17 Jan 2023 05:06:52 +0000 (UTC)
Message-ID: <2635f65d-33fb-5447-a3c1-d5cbab9e04e1@bryanfields.net>
Date: Tue, 17 Jan 2023 00:06:52 -0500
MIME-Version: 1.0
User-Agent: Mutt/1.12.0 (2019-05-25)
To: pve-user@lists.proxmox.com
Content-Language: en-US
From: Bryan Fields <Bryan@bryanfields.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DKIM_SIGNED               0.1 Message has a DKIM or DK signature,
 not necessarily valid
 DKIM_VALID -0.1 Message has at least one valid DKIM or DK signature
 DKIM_VALID_AU -0.1 Message has a valid DKIM or DK signature from author's
 domain
 DKIM_VALID_EF -0.1 Message has a valid DKIM or DK signature from envelope-from
 domain
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [bryanfields.net]
Subject: [PVE-User] Debian 11 hard lock issues as VM
X-BeenThere: pve-user@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE user list <pve-user.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-user/>
List-Post: <mailto:pve-user@lists.proxmox.com>
List-Help: <mailto:pve-user-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 17 Jan 2023 05:14:23 -0000

I am running Proxmox 7.3-4 with a VM that is now on Debian 11.

I have ZFS local storage in each server in the cluster.  Every 15 minutes the 
VM is replicated to the other server(s).  Recently I upgraded a server from 
Debian 9 to Debian 11 and it started locking up.  The lockups don't follow a 
consistent interval or a fixed number of replications.

Through some debugging I found this was the qemu-agent not unfreezing the OS 
after the replication.  My understanding is that this should happen in under 
100 ms, and from what I could see it works fine on all my other VMs running 
Ubuntu or RHEL.

I compared the agent on the Debian 11 server and the Ubuntu servers: Debian 
has 5.2.0 vs 6.2.0 on Ubuntu.  I compiled the agent from the 7.2.0 qemu 
sources (statically too if anyone wants a copy) and ran it from screen on a 
terminal on the Debian 11 VM.  This still locked up hard after 2-4 hours.

Debian is using the stock kernel:
> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux  

I read some things online and thought it might be related to VirtIO, and 
changed that to VirtIO single with no difference.

I've reverted to the old kernel and am going to let this run:
> 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux

Complicating this, the box is my Observium install and I don't have another 
device watching it, so when it locks up, it takes my monitoring offline :-D

On the working Ubuntu boxes I'm running:
> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Below is the agent log from the point where it locks up; there is no more 
output after the last line (I have verbose logging enabled):

> 1673846104.535376: debug: received EOF
> 1673846104.635560: debug: received EOF
> 1673846104.735735: debug: received EOF
> 1673846104.835868: debug: received EOF
> 1673846104.936067: debug: read data, count: 104, data: {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
> {"arguments":{},"execute":"guest-ping"}
> 
> 1673846104.936136: debug: process_event: called
> 1673846104.936144: debug: processing command
> 1673846104.936216: debug: sending data, count: 23
> 1673846104.936257: debug: process_event: called
> 1673846104.936272: debug: processing command
> 1673846104.936350: debug: sending data, count: 15
> 1673846104.936833: debug: received EOF
> 1673846105.37003: debug: received EOF
> 1673846105.137190: debug: received EOF
> 1673846105.237344: debug: received EOF
> 1673846105.337525: debug: received EOF
> 1673846105.437693: debug: received EOF
> 1673846105.537907: debug: received EOF
> 1673846105.638096: debug: received EOF
> 1673846105.738307: debug: received EOF
> 1673846105.838495: debug: received EOF
> 1673846105.938652: debug: received EOF
> 1673846106.38813: debug: received EOF
> 1673846106.139011: debug: received EOF
> 1673846106.239210: debug: received EOF
> 1673846106.339403: debug: received EOF
> 1673846106.439583: debug: received EOF
> 1673846106.539782: debug: received EOF
> 1673846106.639990: debug: received EOF
> 1673846106.740190: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
> 
> 1673846106.840450: debug: process_event: called
> 1673846106.840465: debug: processing command
> 1673846106.840497: debug: sending data, count: 23
> 1673846106.840545: debug: process_event: called
> 1673846106.840563: debug: processing command
> 1673846106.841114: debug: disabling command: guest-get-time
> 1673846106.841131: debug: disabling command: guest-set-time
> 1673846106.841138: debug: disabling command: guest-shutdown
> 1673846106.841145: debug: disabling command: guest-file-open
> 1673846106.841151: debug: disabling command: guest-file-close
> 1673846106.841157: debug: disabling command: guest-file-read
> 1673846106.841164: debug: disabling command: guest-file-write
> 1673846106.841171: debug: disabling command: guest-file-seek
> 1673846106.841179: debug: disabling command: guest-file-flush
> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
> 1673846106.841202: debug: disabling command: guest-fstrim
> 1673846106.841209: debug: disabling command: guest-suspend-disk
> 1673846106.841217: debug: disabling command: guest-suspend-ram
> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
> 1673846106.841232: debug: disabling command: guest-network-get-interfaces
> 1673846106.841239: debug: disabling command: guest-get-vcpus
> 1673846106.841245: debug: disabling command: guest-set-vcpus
> 1673846106.841251: debug: disabling command: guest-get-disks
> 1673846106.841257: debug: disabling command: guest-get-fsinfo
> 1673846106.841265: debug: disabling command: guest-set-user-password
> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
> 1673846106.841294: debug: disabling command: guest-exec-status
> 1673846106.841303: debug: disabling command: guest-exec
> 1673846106.841311: debug: disabling command: guest-get-host-name
> 1673846106.841319: debug: disabling command: guest-get-users
> 1673846106.841326: debug: disabling command: guest-get-timezone
> 1673846106.841334: debug: disabling command: guest-get-osinfo
> 1673846106.841343: debug: disabling command: guest-get-devices
> 1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
> 1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
> 1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
> 1673846106.841371: warning: disabling logging due to filesystem freeze
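
A note on reading the timestamps above: the fractional part looks like an 
unpadded microsecond count, so 1673846105.37003 sits between ...104.936833 and 
...105.137190 and is presumably 105.037003.  Below is a minimal Python sketch, 
assuming that line format, for turning the log into intervals between entries. 
parse_line and intervals_us are hypothetical helper names for illustration, 
not part of qemu-ga.

```python
import re

# Assumed line shape: "<sec>.<usec>: <level>: <message>", optionally prefixed
# with the "> " quote marker. The usec field is assumed NOT to be zero-padded.
LINE_RE = re.compile(r"^(?:>\s*)?(\d+)\.(\d{1,6}): (\w+): (.*)$")


def parse_line(line):
    """Return (timestamp_in_microseconds, level, message), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    sec, usec, level, msg = m.groups()
    # Integer arithmetic avoids float rounding on large epoch values.
    return int(sec) * 1_000_000 + int(usec), level, msg


def intervals_us(lines):
    """Return the gaps, in microseconds, between consecutive timestamped lines."""
    stamps = [p[0] for p in map(parse_line, lines) if p is not None]
    return [b - a for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "> 1673846104.936833: debug: received EOF",
        "> 1673846105.37003: debug: received EOF",
        "> 1673846105.137190: debug: received EOF",
    ]
    print(intervals_us(sample))
```

Read this way, the EOF polls are about 100 ms apart throughout, and the agent 
stops logging at the freeze by design, so the log alone cannot show whether a 
thaw request ever reached the agent.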


Is there any reason this is happening, and any fix other than disabling the 
agent?  I can't think that Debian 11 is shipping with a broken kernel, and 
'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152 fsfreeze-thaw' work 
fine from the host.  Could this be something with the VirtIO pipe/IPC?
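
For anyone who wants to repeat the manual freeze/thaw test, here is a rough 
Python sketch that runs the same two 'qm guest cmd' calls from the host and 
times the pair.  The function names and the injectable run parameter are 
assumptions for illustration only; the sketch sets no timeout, so a hung agent 
would block it.

```python
import subprocess
import time


def qm_guest_cmd(vmid, command, run=subprocess.run):
    """Run 'qm guest cmd <vmid> <command>' on the host and return its stdout."""
    result = run(
        ["qm", "guest", "cmd", str(vmid), command],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()


def freeze_thaw_cycle(vmid, run=subprocess.run):
    """Freeze the guest, always attempt a thaw, and return elapsed seconds."""
    start = time.monotonic()
    try:
        qm_guest_cmd(vmid, "fsfreeze-freeze", run)
    finally:
        # Attempt the thaw even if the freeze call failed, so the guest is
        # not left frozen by this script.
        qm_guest_cmd(vmid, "fsfreeze-thaw", run)
    return time.monotonic() - start


if __name__ == "__main__":
    # 152 is the VM ID used in the commands above.
    print(f"freeze/thaw took {freeze_thaw_cycle(152):.3f} s")
```

Running it in a loop alongside the scheduled replication might show whether 
the manual path ever hangs the same way.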

Anyone else seeing this or have any ideas?

-- 
Bryan Fields

727-409-1194 - Voice
http://bryanfields.net