From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 13A0D1FF16E
	for <inbox@lore.proxmox.com>; Mon, 31 Mar 2025 16:55:46 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id A8D3095BA;
	Mon, 31 Mar 2025 16:55:19 +0200 (CEST)
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Mon, 31 Mar 2025 16:55:01 +0200
Message-Id: <20250331145507.196208-1-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.5
MIME-Version: 1.0
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.038 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: [pve-devel] [PATCH-SERIES qemu 0/6] async snapshot improvements
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

Most importantly, start using a dedicated IO thread for the state
file when doing a live snapshot.

Having the state file be in the iohandler context means that a
blk_drain_all() call in the main thread or vCPU thread that happens
while the snapshot is running will result in a deadlock.

This change should also help in general to reduce load on the main
thread and for it to get stuck on IO, i.e. same benefits as using a
dedicated IO thread for regular drives. This is particularly
interesting when the VM state storage is a network storage like NFS.

With some luck, it could also help with bug #6262 [0]. The failure
there happens while issuing/right after the savevm-start QMP command,
so the most likely coroutine is the process_savevm_co() that was
previously scheduled to the iohandler context. Likely someone polls
the iohandler context and wants to enter the already scheduled
coroutine leading to the abort():
> qemu_aio_coroutine_enter: Co-routine was already scheduled in 'aio_co_schedule'
With a dedicated iothread, there hopefully is no such race.


Additionally, fix up some edge cases in error handling and setting the
state of the snapshot operation.


[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=6262

Fiona Ebner (6):
  savevm-async: improve setting state of snapshot operation in
    savevm-end handler
  savevm-async: rename saved_vm_running to vm_needs_start
  savevm-async: improve runstate preservation
  savevm-async: cleanup error handling in savevm_start
  savevm-async: use dedicated iothread for state file
  savevm-async: treat failure to set iothread context as a hard failure

 migration/savevm-async.c | 119 +++++++++++++++++++++++----------------
 1 file changed, 69 insertions(+), 50 deletions(-)

-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel