From: Adam Weremczuk
Organization: Matrix Science Ltd
To: Roland, Proxmox VE user list
Date: Tue, 17 Jan 2023 18:59:10 +0000
Message-ID: <65cc3f98-4eab-0c22-1571-0f72eacc6449@matrixscience.com>
Subject: Re: [PVE-User] Proxmox VM hard resets

I have dozens of Debian VMs (including Debian 11) on VMware 7.0.2 that are
backed up daily with Altaro, and I have never seen this anywhere else. Even
with Proxmox it happens only once in a while (3 times so far).

I'm now down to 4 containers with low/moderate load and no extreme spikes.

What doesn't feel right is that they take up to 5 minutes to reboot to a
login prompt and start responding to pings. The consoles remain pitch black
and unresponsive until then. It was the same when I ran Proxmox on bare
metal. Maybe these 2 issues are related?

On 17/01/2023 17:45, Roland wrote:
> Can you reproduce this with a Debian 11 or Ubuntu 22 VM (create some load
> there)? I think this is not a Proxmox problem that can be solved at the
> Proxmox/VM-guest level.
>
> See
> https://www.theregister.com/2017/11/28/stunning_antistun_vm_stun_problem_fix/
> for example.
>
> roland
>
> On 17.01.23 at 16:04, Adam Weremczuk wrote:
>> Hi all,
>>
>> My environment is quite unusual, as I run PVE 7.2-11 as a VM on VMware
>> 7.0.2. It runs several LXC containers and generally things are working
>> fine.
>>
>> Recently the Proxmox VM (called "jaguar") started resetting itself
>> (and all containers) shortly after Altaro VM Backup kicked off a
>> scheduled VM backup over the network.
>> Each time, the hard reset was requested by the OS itself (the Proxmox
>> hypervisor).
>>
>> The time of the "stun/unstun" operation seems to be causing the issue
>> here, i.e.
>> usually the stun/unstun operation should take a very short amount of
>> time; however, in my case, depending on the load on both the hypervisor
>> and the guest VM (nested hypervisor), that time can vary and take a bit
>> longer. Here is a snippet from various stun/unstun operations:
>>
>> 2023-01-12T23:00:55.407Z| vcpu-0| | I005: CPT: vm was stunned for 32142467 us
>> 2023-01-12T23:01:12.848Z| vcpu-0| | I005: CPT: vm was stunned for 14942070 us
>> 2023-01-12T23:11:35.984Z| vcpu-0| opID=1487b0d5| I005: CPT: vm was stunned for 277986 us
>> 2023-01-12T23:11:39.431Z| vcpu-0| | I005: CPT: vm was stunned for 122089 us
>>
>> As you can see, the stun time varies from one disk to the next. What I
>> think is happening here is that, depending on the stun/unstun time of
>> the VM (virtualized hypervisor), the virtualized hypervisor's watchdog
>> notices that the OS has been frozen for some amount of time and issues
>> a hard reset. My guess is that when the stun time exceeds 30 seconds
>> (the longest stun above, 32142467 us, is about 32 s), the guest OS
>> issues a hard reset:
>>
>> 2023-01-12T23:00:55.407Z| vcpu-0| | I005: CPT: vm was stunned for 32142467 us
>> 2023-01-12T23:00:55.407Z| vcpu-0| | I005: SnapshotVMXTakeSnapshotWork: Transition to mode 1.
>> 2023-01-12T23:00:55.407Z| vcpu-0| | I005: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'ALTAROTEMPSNAPSHOTDONOTDELETE463b73a7-f363-4daf-acf3-b0322fe84429': 95
>> 2023-01-12T23:00:55.407Z| vcpu-0| | I005: VigorTransport_ServerSendResponse opID=1487b008 seq=887616: Completed Snapshot request.
>> 2023-01-12T23:00:55.409Z| vcpu-8| | I005: HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/61364720-e494cfe4-6cff-b083fed97d91/jaguar/jaguar-000001.vmdk'
>> 2023-01-12T23:00:55.409Z| vcpu-8| | I005: DDB: "longContentID" = "08bf301ae8e75c151d2f273571a4ea9f" (was "2a6fd4c33a60f8d724ccc100a666f0d7")
>> 2023-01-12T23:00:57.906Z| vcpu-8| | I005: DISKLIB-CHAIN : DiskChainUpdateContentID: old=0xa666f0d7, new=0x71a4ea9f (08bf301ae8e75c151d2f273571a4ea9f)
>> 2023-01-12T23:00:57.906Z| vcpu-9| | I005: Chipset: The guest has requested that the virtual machine be hard reset.
>>
>> I'm struggling to establish how the watchdog timer (or equivalent) is
>> configured :( Maybe increasing its trigger time would solve the issue?
>>
>> Any other ideas / similar experiences?
>>
>> Regards,
>> Adam
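Assuming the stun and reset lines appear in vmware.log exactly as in the excerpts quoted above, a short script can check whether each long stun is followed shortly by a guest-requested hard reset. This is a minimal Python sketch: the regular expressions are inferred from those excerpts only, and `long_stuns` with its 30-second default (taken from the guess above, not from any documented VMware or watchdog setting) is hypothetical. It only correlates timestamps; it cannot show which watchdog, if any, requested the reset.

```python
import re
import sys
from datetime import datetime

# Patterns inferred from the quoted vmware.log excerpts (assumption: every
# line starts with an ISO-8601 timestamp immediately followed by "|").
STUN_RE = re.compile(r"([^|\s]+)\|.*CPT: vm was stunned for (\d+) us")
RESET_RE = re.compile(
    r"([^|\s]+)\|.*guest has requested that the virtual machine be hard reset"
)


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp such as '2023-01-12T23:00:55.407Z' as a naive UTC datetime."""
    return datetime.strptime(ts.rstrip("Z"), "%Y-%m-%dT%H:%M:%S.%f")


def long_stuns(lines, threshold_s=30.0):
    """Return (time, stun_seconds, seconds_to_next_reset) for each stun >= threshold_s.

    seconds_to_next_reset is None when no hard-reset line is logged at or
    after the stun. threshold_s is a hypothetical default, not a VMware setting.
    """
    stuns, resets = [], []
    for line in lines:
        if m := STUN_RE.match(line):
            # The log reports stun duration in microseconds.
            stuns.append((parse_ts(m.group(1)), int(m.group(2)) / 1_000_000))
        elif m := RESET_RE.match(line):
            resets.append(parse_ts(m.group(1)))

    report = []
    for when, secs in stuns:
        if secs < threshold_s:
            continue
        later = [r for r in resets if r >= when]
        gap = (min(later) - when).total_seconds() if later else None
        report.append((when, secs, gap))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python stun_check.py /path/to/vmware.log [threshold_seconds]
    threshold = float(sys.argv[2]) if len(sys.argv) > 2 else 30.0
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for when, secs, gap in long_stuns(fh, threshold):
            follow = f"hard reset {gap:.3f} s later" if gap is not None else "no reset logged after it"
            print(f"{when.isoformat()}  stunned {secs:.3f} s  -> {follow}")
```

If the full log matches the excerpt, the default threshold should flag only the 23:00:55 stun (about 32.1 s), with the hard reset logged about 2.5 s after it. Lowering the threshold (for example to 10) would also flag the 23:01:12 stun, which has no reset after it in the excerpt.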