From: Bryan Fields
Date: Tue, 17 Jan 2023 00:06:52 -0500
To: pve-user@lists.proxmox.com
Subject: [PVE-User] Debian 11 hard lock issues as VM
Message-ID: <2635f65d-33fb-5447-a3c1-d5cbab9e04e1@bryanfields.net>

I am running Proxmox 7.3-4 with a now Debian 11 VM. I have ZFS local storage in each server in the cluster. Every 15 minutes the VM is replicated to the other server(s).

Recently I upgraded a server from Debian 9 to Debian 11, and it started locking up. There was no pattern to it: no consistent amount of time before a lockup, and no consistent number of replications.

Through some debugging I found that the qemu-agent was not unfreezing the OS after the replication. This should happen in under 100 ms, as I understand it, and from what I could see it works fine on all my other VMs running Ubuntu or RHEL.

I compared the agent on the Debian 11 server with the one on the Ubuntu servers: Debian had 5.2.0 versus 6.2.0 on Ubuntu. I compiled the agent from the 7.2.0 qemu sources (statically too, if anyone wants a copy) and ran it from screen on a terminal on the Debian 11 VM. It still locked up hard after 2-4 hours.

Debian is using the stock kernel:

> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux

I read some things online and thought it might be related to VirtIO, so I changed that to VirtIO single, with no difference. I've reverted to the old kernel and am going to let this run:

4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux

Complicating this, the box is my Observium install and I don't have another device watching it, so when it locks up, it takes my monitoring offline :-D

On the working Ubuntu boxes I'm running:

> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Below is the log from the point where this locks up. There is no more output after the last line (I have verbose enabled):

> 1673846104.535376: debug: received EOF
> 1673846104.635560: debug: received EOF
> 1673846104.735735: debug: received EOF
> 1673846104.835868: debug: received EOF
> 1673846104.936067: debug: read data, count: 104, data: {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
> {"arguments":{},"execute":"guest-ping"}
>
> 1673846104.936136: debug: process_event: called
> 1673846104.936144: debug: processing command
> 1673846104.936216: debug: sending data, count: 23
> 1673846104.936257: debug: process_event: called
> 1673846104.936272: debug: processing command
> 1673846104.936350: debug: sending data, count: 15
> 1673846104.936833: debug: received EOF
> 1673846105.37003: debug: received EOF
> 1673846105.137190: debug: received EOF
> 1673846105.237344: debug: received EOF
> 1673846105.337525: debug: received EOF
> 1673846105.437693: debug: received EOF
> 1673846105.537907: debug: received EOF
> 1673846105.638096: debug: received EOF
> 1673846105.738307: debug: received EOF
> 1673846105.838495: debug: received EOF
> 1673846105.938652: debug: received EOF
> 1673846106.38813: debug: received EOF
> 1673846106.139011: debug: received EOF
> 1673846106.239210: debug: received EOF
> 1673846106.339403: debug: received EOF
> 1673846106.439583: debug: received EOF
> 1673846106.539782: debug: received EOF
> 1673846106.639990: debug: received EOF
> 1673846106.740190: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>
> 1673846106.840450: debug: process_event: called
> 1673846106.840465: debug: processing command
> 1673846106.840497: debug: sending data, count: 23
> 1673846106.840545: debug: process_event: called
> 1673846106.840563: debug: processing command
> 1673846106.841114: debug: disabling command: guest-get-time
> 1673846106.841131: debug: disabling command: guest-set-time
> 1673846106.841138: debug: disabling command: guest-shutdown
> 1673846106.841145: debug: disabling command: guest-file-open
> 1673846106.841151: debug: disabling command: guest-file-close
> 1673846106.841157: debug: disabling command: guest-file-read
> 1673846106.841164: debug: disabling command: guest-file-write
> 1673846106.841171: debug: disabling command: guest-file-seek
> 1673846106.841179: debug: disabling command: guest-file-flush
> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
> 1673846106.841202: debug: disabling command: guest-fstrim
> 1673846106.841209: debug: disabling command: guest-suspend-disk
> 1673846106.841217: debug: disabling command: guest-suspend-ram
> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
> 1673846106.841232: debug: disabling command: guest-network-get-interfaces
> 1673846106.841239: debug: disabling command: guest-get-vcpus
> 1673846106.841245: debug: disabling command: guest-set-vcpus
> 1673846106.841251: debug: disabling command: guest-get-disks
> 1673846106.841257: debug: disabling command: guest-get-fsinfo
> 1673846106.841265: debug: disabling command: guest-set-user-password
> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
> 1673846106.841294: debug: disabling command: guest-exec-status
> 1673846106.841303: debug: disabling command: guest-exec
> 1673846106.841311: debug: disabling command: guest-get-host-name
> 1673846106.841319: debug: disabling command: guest-get-users
> 1673846106.841326: debug: disabling command: guest-get-timezone
> 1673846106.841334: debug: disabling command: guest-get-osinfo
> 1673846106.841343: debug: disabling command: guest-get-devices
> 1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
> 1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
> 1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
> 1673846106.841371: warning: disabling logging due to filesystem freeze

Other than disabling the agent, is there any reason this is happening? I can't imagine that Debian 11 is shipping with a broken kernel, and both 'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152 fsfreeze-thaw' work fine from the host.

Could this be something with the VirtIO pipe/IPC?

Anyone else seeing this or have any ideas?

-- 
Bryan Fields

727-409-1194 - Voice
http://bryanfields.net
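
P.S. For anyone comparing agent logs from a working VM against the hanging one, here is a rough Python sketch (not part of any qemu or Proxmox tooling) that scans a verbose agent log for the "disabling logging due to filesystem freeze" marker and reports how long it took until the next timestamped line, or flags the freeze as never followed by any output. It makes two assumptions: that the part after the dot in the timestamps is an unpadded microsecond count (which is how "1673846105.37003" fits between .936833 and .137190 above), and that the agent resumes logging once the filesystem is thawed. The function names are my own.

```python
import re

# Matches "<seconds>.<micros>: <level>: <message>", with an optional
# mail-style "> " quote prefix as in the log excerpt above.
LINE_RE = re.compile(r"^(?:>\s*)?(\d+)\.(\d+): (\w+): (.*)$")
FREEZE_MARKER = "disabling logging due to filesystem freeze"


def to_micros(seconds, micros):
    # Assumption: the fraction is an unpadded microsecond count, so
    # "1673846105.37003" means 1673846105 s + 37003 us, not 0.37003 s.
    return int(seconds) * 1_000_000 + int(micros)


def freeze_gaps(lines):
    """Return a list of (freeze_time_us, gap_us) tuples.

    gap_us is the time until the next timestamped log line, or None if
    the log ends right after the freeze marker (the hang case).
    """
    results = []
    pending = None  # timestamp of a freeze marker still waiting for a follow-up
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # JSON payload continuation lines carry no timestamp
        ts = to_micros(match.group(1), match.group(2))
        if pending is not None:
            results.append((pending, ts - pending))
            pending = None
        if FREEZE_MARKER in match.group(4):
            pending = ts
    if pending is not None:
        results.append((pending, None))
    return results


def describe(results):
    # Turn the tuples into short human-readable lines.
    out = []
    for freeze_ts, gap in results:
        if gap is None:
            out.append(f"{freeze_ts}: freeze with no later log output")
        else:
            out.append(f"{freeze_ts}: logging resumed after {gap / 1000:.1f} ms")
    return out
```

Feeding it the lines of a saved agent log (for example `describe(freeze_gaps(open(path)))`, where `path` is wherever the verbose output was captured) should make it easy to see whether the good VMs really come back within roughly 100 ms and where the bad one stops.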