From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
References: <20200903085851.5073-1-s.reiter@proxmox.com>
From: Dominik Csapak <d.csapak@proxmox.com>
Message-ID: <b7c67146-4daf-8979-78a3-dc54d090f0f8@proxmox.com>
Date: Fri, 18 Sep 2020 16:20:45 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:81.0) Gecko/20100101
 Thunderbird/81.0
MIME-Version: 1.0
In-Reply-To: <20200903085851.5073-1-s.reiter@proxmox.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [pve-devel] [PATCH 0/7] Handle guest shutdown during backups
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>

Overall the series looks OK to me (and tested OK), though a few points:

* I'd really like someone else to look over this too,
   maybe someone who is really good with C (Wolfgang, when you're back
   from holidays? ;) )
* Regarding the killing, I'd prefer what we already discussed off-list:
   maybe using either the 'quit' command via QMP, or using a pidfd
   to avoid races
* At this point, I think a rewrite in Rust could be good,
   before we tack even more features onto this?
   (we have all that we need to handle this: mio, serde_json, etc.)
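
To make the 'quit' via QMP idea a bit more concrete, here is a minimal
sketch in Python (not the qmeventd code, and not what the series does).
It assumes an already connected QMP socket; the helper names qmp_command
and qmp_quit are made up for illustration. It only uses the standard QMP
handshake: read the greeting, send 'qmp_capabilities', then send 'quit'.

```python
import json
import socket


def qmp_command(sock_file, name):
    """Send one QMP command and return the first non-event reply."""
    sock_file.write(json.dumps({"execute": name}).encode() + b"\r\n")
    sock_file.flush()
    while True:
        line = sock_file.readline()
        if not line:
            raise ConnectionError("QMP peer closed the connection")
        msg = json.loads(line)
        if "event" in msg:
            # Asynchronous events may interleave with replies; skip them.
            continue
        if "error" in msg:
            raise RuntimeError(msg["error"])
        return msg["return"]


def qmp_quit(sock):
    """Negotiate capabilities on a connected QMP socket, then ask QEMU to quit."""
    f = sock.makefile("rwb")
    greeting = json.loads(f.readline())
    if "QMP" not in greeting:
        raise RuntimeError("unexpected QMP greeting")
    # Commands are only accepted after capabilities negotiation.
    qmp_command(f, "qmp_capabilities")
    return qmp_command(f, "quit")
```

A caller would connect a socket.AF_UNIX stream socket to the VM's QMP
socket (the path depends on the setup) and pass it to qmp_quit(). Since
the request goes over a connection to that specific QEMU instance, it
cannot hit an unrelated process that happens to reuse the PID, which is
the race a plain kill(pid, SIGTERM) is exposed to.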


On 9/3/20 10:58 AM, Stefan Reiter wrote:
> Use QEMU's -no-shutdown argument so the QEMU instance stays alive even if the
> guest shuts down. This allows running backups to continue.
> 
> To handle cleanup of QEMU processes, this series extends the qmeventd to handle
> SHUTDOWN events not just for detecting guest triggered shutdowns, but also to
> clean the QEMU process via SIGTERM (which quits it even with -no-shutdown
> enabled).
> 
> A VZDump instance can then signal qmeventd (via the /var/run/qmeventd.sock) to
> keep alive certain VM processes if they're backing up, and once the backup is
> done, they close their connection to the socket, and qmeventd knows that it can
> now safely kill the VM (as long as the guest hasn't booted again, which is
> possible with some changes to the vm_start code also done in this series).
> 
> This series requires a lot of testing, since there can be quite a few edge cases
> lounging around. So far it's been doing well for me, aside from the VNC GUI
> looking a bit confused when you do the 'shutdown during backup' motion (i.e. the
> last image from the framebuffer stays in the VNC window, looks more like the
> guest has crashed than shut down) - but I haven't found a solution for that.
> 
> 
> qemu-server: Stefan Reiter (6):
>    qmeventd: add handling for -no-shutdown QEMU instances
>    qmeventd: add last-ditch effort SIGKILL cleanup
>    vzdump: connect to qmeventd for duration of backup
>    vzdump: use dirty bitmap for not running VMs too
>    config_to_command: use -no-shutdown option
>    fix vm_resume and allow vm_start with QMP status 'shutdown'
> 
>   PVE/QemuServer.pm                             |  25 +-
>   PVE/VZDump/QemuServer.pm                      |  40 ++-
>   debian/control                                |   1 +
>   qmeventd/Makefile                             |   4 +-
>   qmeventd/qmeventd.c                           | 331 ++++++++++++++++--
>   qmeventd/qmeventd.h                           |  41 ++-
>   .../custom-cpu-model-defaults.conf.cmd        |   1 +
>   .../custom-cpu-model-host-phys-bits.conf.cmd  |   1 +
>   test/cfg2cmd/custom-cpu-model.conf.cmd        |   1 +
>   test/cfg2cmd/efi-raw-old.conf.cmd             |   1 +
>   test/cfg2cmd/efi-raw.conf.cmd                 |   1 +
>   test/cfg2cmd/i440fx-win10-hostpci.conf.cmd    |   1 +
>   test/cfg2cmd/minimal-defaults.conf.cmd        |   1 +
>   test/cfg2cmd/netdev.conf.cmd                  |   1 +
>   test/cfg2cmd/pinned-version.conf.cmd          |   1 +
>   .../q35-linux-hostpci-multifunction.conf.cmd  |   1 +
>   test/cfg2cmd/q35-linux-hostpci.conf.cmd       |   1 +
>   test/cfg2cmd/q35-win10-hostpci.conf.cmd       |   1 +
>   test/cfg2cmd/simple-virtio-blk.conf.cmd       |   1 +
>   test/cfg2cmd/simple1.conf.cmd                 |   1 +
>   test/cfg2cmd/spice-enhancments.conf.cmd       |   1 +
>   test/cfg2cmd/spice-linux-4.1.conf.cmd         |   1 +
>   test/cfg2cmd/spice-usb3.conf.cmd              |   1 +
>   test/cfg2cmd/spice-win.conf.cmd               |   1 +
>   24 files changed, 410 insertions(+), 50 deletions(-)
> 
> manager: Stefan Reiter (1):
>    ui: qemu: set correct disabled state for start button
> 
>   www/manager6/qemu/Config.js | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>