From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <d.csapak@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with UTF8SMTPS id 6906869F80
 for <pve-devel@lists.proxmox.com>; Thu, 21 Jan 2021 10:26:56 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with UTF8SMTP id 56E0DF5BF
 for <pve-devel@lists.proxmox.com>; Thu, 21 Jan 2021 10:26:26 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with UTF8SMTPS id E9FD2F5B2
 for <pve-devel@lists.proxmox.com>; Thu, 21 Jan 2021 10:26:24 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with UTF8SMTP id BB36C460A0;
 Thu, 21 Jan 2021 10:26:24 +0100 (CET)
To: Fabian Ebner <f.ebner@proxmox.com>, pve-devel@lists.proxmox.com
References: <20210120123204.11176-1-d.csapak@proxmox.com>
 <b96b058f-00cc-0574-98e9-ccd6773e6bed@proxmox.com>
From: Dominik Csapak <d.csapak@proxmox.com>
Message-ID: <7b32d46c-dc79-07d0-289a-107fbf26a4f0@proxmox.com>
Date: Thu, 21 Jan 2021 10:26:23 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:85.0) Gecko/20100101
 Thunderbird/85.0
MIME-Version: 1.0
In-Reply-To: <b96b058f-00cc-0574-98e9-ccd6773e6bed@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.313 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.094 Looks like a legit reply (A)
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [qemu.pm, qemuserver.pm]
Subject: Re: [pve-devel] [PATCH qemu-server v3] fix #2788: do not resume vms
 after backup if they were paused before
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 21 Jan 2021 09:26:56 -0000

On 1/20/21 3:15 PM, Fabian Ebner wrote:
> One minor thing inline, and that likely needs a fix elsewhere if we even 
> want to bother. Other than that:
> 
> Tested-by: Fabian Ebner <f.ebner@proxmox.com>
> 
> On 20.01.21 13:32, Dominik Csapak wrote:
>> by checking if the vm is paused at the beginning and skipping the resume
>> now we also skip the qga freeze/thaw (which cannot work if the vm is
>> paused)
>>
>> moved the 'vm_is_paused' sub from the api to PVE/QemuServer.pm so it is
>> available everywhere we need it.
>>
>> since a suspend backup would pause the vm anyway, we can skip that step
>> also
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> changes to v2:
>> * actually include code changes (git add is not optional), sry for the 
>> noise
>>
>>   PVE/API2/Qemu.pm         | 14 ++------------
>>   PVE/QemuServer.pm        | 10 ++++++++++
>>   PVE/VZDump/QemuServer.pm | 11 +++++++++--
>>   3 files changed, 21 insertions(+), 14 deletions(-)
>>
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index e2d2d67..623c998 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -2402,16 +2402,6 @@ __PACKAGE__->register_method({
>>       return $rpcenv->fork_worker('qmreset', $vmid, $authuser, $realcmd);
>>       }});
>> -my sub vm_is_paused {
>> -    my ($vmid) = @_;
>> -    my $qmpstatus = eval {
>> -    PVE::QemuConfig::assert_config_exists_on_node($vmid);
>> -    mon_cmd($vmid, "query-status");
>> -    };
>> -    warn "$@\n" if $@;
>> -    return $qmpstatus && $qmpstatus->{status} eq "paused";
>> -}
>> -
>>   __PACKAGE__->register_method({
>>       name => 'vm_shutdown',
>>       path => '{vmid}/status/shutdown',
>> @@ -2480,7 +2470,7 @@ __PACKAGE__->register_method({
>>       #
>>       # checking the qmp status here to get feedback to the gui/cli/api
>>       # and the status query should not take too long
>> -    if (vm_is_paused($vmid)) {
>> +    if (PVE::QemuServer::vm_is_paused($vmid)) {
>>           if ($param->{forceStop}) {
>>           warn "VM is paused - stop instead of shutdown\n";
>>           $shutdown = 0;
>> @@ -2556,7 +2546,7 @@ __PACKAGE__->register_method({
>>       my $node = extract_param($param, 'node');
>>       my $vmid = extract_param($param, 'vmid');
>> -    die "VM is paused - cannot shutdown\n" if vm_is_paused($vmid);
>> +    die "VM is paused - cannot shutdown\n" if 
>> PVE::QemuServer::vm_is_paused($vmid);
>>       die "VM $vmid not running\n" if 
>> !PVE::QemuServer::check_running($vmid);
>> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
>> index 54278e5..d7ee05f 100644
>> --- a/PVE/QemuServer.pm
>> +++ b/PVE/QemuServer.pm
>> @@ -7382,4 +7382,14 @@ sub complete_migration_storage {
>>       return $res;
>>   }
>> +sub vm_is_paused {
>> +    my ($vmid) = @_;
>> +    my $qmpstatus = eval {
>> +    PVE::QemuConfig::assert_config_exists_on_node($vmid);
>> +    mon_cmd($vmid, "query-status");
>> +    };
>> +    warn "$@\n" if $@;
>> +    return $qmpstatus && $qmpstatus->{status} eq "paused";
>> +}
>> +
>>   1;
>> diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
>> index b322701..b5e74d3 100644
>> --- a/PVE/VZDump/QemuServer.pm
>> +++ b/PVE/VZDump/QemuServer.pm
>> @@ -62,8 +62,11 @@ sub prepare {
>>       if defined($conf->{name});
>>       $self->{vm_was_running} = 1;
>> +    $self->{vm_was_paused} = 0;
>>       if (!PVE::QemuServer::check_running($vmid)) {
>>       $self->{vm_was_running} = 0;
>> +    } elsif (PVE::QemuServer::vm_is_paused($vmid)) {
>> +    $self->{vm_was_paused} = 1;
>>       }
>>       $task->{hostname} = $conf->{name};
>> @@ -176,12 +179,16 @@ sub start_vm {
>>   sub suspend_vm {
>>       my ($self, $task, $vmid) = @_;
>> +    return if $self->{vm_was_paused};
>> +
>>       $self->cmd ("qm suspend $vmid --skiplock");
>>   }
>>   sub resume_vm {
>>       my ($self, $task, $vmid) = @_;
>> +    return if $self->{vm_was_paused};
>> +
>>       $self->cmd ("qm resume $vmid --skiplock");
>>   }
>> @@ -794,7 +801,7 @@ sub _get_task_devlist {
>>   sub qga_fs_freeze {
>>       my ($self, $task, $vmid) = @_;
>> -    return if !$self->{vmlist}->{$vmid}->{agent} || $task->{mode} eq 
>> 'stop' || !$self->{vm_was_running};
>> +    return if !$self->{vmlist}->{$vmid}->{agent} || $task->{mode} eq 
>> 'stop' || !$self->{vm_was_running} || $self->{vm_was_paused};
> 
> It's possible to suspend and resume VMs during backup. E.g. somebody 
> else can resume the VM (even without using --skiplock) right after the 
> backup task called suspend_vm. Relying on vm_was_paused here leads to 
> skipping the fs-freeze for such an (admittedly unlikely) edge case.

i mean i get why it is possible to resume / suspend during backup,
but really this should not be possible until the backup started

so maybe we have to have some kind of intermediate locking?

alternatively we can replace that check here
with an actual qmp status check (that is also racy though...)

also if i suspend between the paused check and the resume part
it will resume it again..

> 
>>       if (!PVE::QemuServer::qga_check_running($vmid, 1)) {
>>       $self->loginfo("skipping guest-agent 'fs-freeze', agent 
>> configured but not running?");
>> @@ -872,7 +879,7 @@ sub register_qmeventd_handle {
>>   sub resume_vm_after_job_start {
>>       my ($self, $task, $vmid) = @_;
>> -    return if !$self->{vm_was_running};
>> +    return if !$self->{vm_was_running} || $self->{vm_was_paused};
>>       if (my $stoptime = $task->{vmstoptime}) {
>>       my $delay = time() - $task->{vmstoptime};
>>