From: Fabian Ebner
To: Mark Schouten
Cc: Proxmox VE development discussion
Date: Thu, 10 Mar 2022 08:36:01 +0100
Subject: Re: [pve-devel] [PATCH kernel] Backport two io-wq fixes relevant for io_uring
Message-ID: <84a8516f-4aa2-f111-339c-b8bc2840d15f@proxmox.com>
In-Reply-To: <9C5B831F-C706-4834-B38B-D5BEEE5B32DA@tuxis.nl>
References: <20211123115949.2462727-1-f.ebner@proxmox.com> <5CC63593-424B-4439-93FB-BFFD6B087952@tuxis.nl> <9C5B831F-C706-4834-B38B-D5BEEE5B32DA@tuxis.nl>

On 09.03.22 at 14:38, Mark Schouten wrote:
> Hi,
>
> Ok. Funny enough, I suffered from this issue last night during maintenance… But since I have not had this issue before, I had no snapshot…
>
> I guess the snapshot would need to have a state as well, right? So the machine is ‘booted’ as it was.
>

Not necessarily. It'd be enough if the issue can be reproduced by doing a
cold boot, maybe some specific operation afterwards, then reboot. From
there, it'd be very easy to create such a snapshot for further testing.

> --
> Mark Schouten, CTO
> Tuxis B.V.
> mark@tuxis.nl
>
>
>
>> On 9 Mar 2022, at 08:31, Fabian Ebner wrote:
>>
>> On 08.03.22 at 17:19, Mark Schouten wrote:
>>> Hi,
>>>
>>> So should I try and find someone who is able to reproduce this with a test-machine and is able to give you remote access to debug? Would that help?
>>>
>>
>> It would certainly increase the likelihood of finding the issue. Since
>> it only happens on 7.x, it's likely a regression. Ideally, there needs
>> to be a snapshot of a problematic VM before the reboot, so that it can
>> be quickly tested against with e.g. different builds of QEMU/kernel.
>> Providing such a VM with snapshot state would of course be an
>> alternative to remote access.
>>
>>> --
>>> Mark Schouten, CTO
>>> Tuxis B.V.
>>> mark@tuxis.nl
>>>
>>>
>>>
>>>> On 8 Mar 2022, at 10:12, Fabian Ebner wrote:
>>>>
>>>> On 07.03.22 at 15:51, Mark Schouten wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for getting back on this thread after a few months, but is the Windows-case mentioned here the case that is discussed in this forum-thread:
>>>>> https://forum.proxmox.com/threads/windows-vms-stuck-on-boot-after-proxmox-upgrade-to-7-0.100744/page-3
>>>>>
>>>>> ?
>>>>
>>>> Hi,
>>>> the symptoms there sound rather different. The issue addressed by this
>>>> patch was about a QEMU process getting completely stuck on I/O while the
>>>> VM was live already. "completely" meant that e.g. connecting for the
>>>> display also would fail and there would be messages like
>>>>
>>>> VM 182 qmp command failed - VM 182 qmp command 'query-proxmox-support'
>>>> failed - unable to connect to VM 182 qmp socket - timeout after 31 retries
>>>>
>>>> in the syslog. The issue described in the forum thread reads like it
>>>> happens only upon reboot from inside the guest and nobody mentioned
>>>> messages like the above.
>>>>
>>>>>
>>>>> If so, should this be investigated further or are there other issues? I have personally not had the issue mentioned in the forum, but quite a few people seem to be suffering from issues with Windows VMs, which is currently holding us back from upgrading from 6.x to 7.x on a whole bunch of customer clusters.
>>>>
>>>> I also haven't seen the issue myself yet and haven't heard from any
>>>> colleagues either. Without a reproducer, it's very difficult to debug.
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Mark Schouten, CTO
>>>>> Tuxis B.V.
>>>>> mark@tuxis.nl
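
The quoted reply separates the two problems by one symptom: the io-wq
issue leaves "qmp command failed ... unable to connect to VM <id> qmp
socket - timeout" messages in the syslog, while the forum reports mention
none. Someone affected could check which case applies with something
like the following. This is only a minimal sketch: it assumes each
message sits on a single log line in exactly the wording quoted in the
thread, and the function name, the sample lines and their timestamp/host
prefix are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the message quoted in the thread. The syslog prefix
# (timestamp, host, daemon) is an assumption and is simply not matched.
QMP_TIMEOUT = re.compile(
    r"VM (?P<vmid>\d+) qmp command failed - .*"
    r"unable to connect to VM (?P=vmid) qmp socket - timeout after \d+ retries"
)


def count_qmp_timeouts(lines):
    """Return a Counter mapping VM ID to the number of QMP socket timeouts."""
    hits = Counter()
    for line in lines:
        match = QMP_TIMEOUT.search(line)
        if match:
            hits[int(match.group("vmid"))] += 1
    return hits


# Hypothetical sample input; only the message text comes from the thread.
sample = [
    "Mar 10 08:00:01 pve1 pvedaemon[1234]: VM 182 qmp command failed - "
    "VM 182 qmp command 'query-proxmox-support' failed - "
    "unable to connect to VM 182 qmp socket - timeout after 31 retries",
    "Mar 10 08:00:05 pve1 pvedaemon[1234]: some unrelated message",
]

print(count_qmp_timeouts(sample))
```

An empty result for a VM that hangs only on reboot from inside the guest
would fit the forum-thread symptoms rather than the stuck-on-I/O case
addressed by the patch. To scan a real log, pass an open file object
instead of the sample list; where the log lives depends on the host setup.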