From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <d.tschlatscher@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 5F8FC709E6
 for <pve-devel@lists.proxmox.com>; Wed, 22 Jun 2022 14:58:02 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 57CF619C22
 for <pve-devel@lists.proxmox.com>; Wed, 22 Jun 2022 14:58:02 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 8D38619C17
 for <pve-devel@lists.proxmox.com>; Wed, 22 Jun 2022 14:58:01 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 6405843C51
 for <pve-devel@lists.proxmox.com>; Wed, 22 Jun 2022 14:58:01 +0200 (CEST)
Message-ID: <5f8414ab-1ed3-acac-8ae7-3a5b8433e7d6@proxmox.com>
Date: Wed, 22 Jun 2022 14:57:59 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.10.0
Content-Language: en-US
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
 Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <20220621152026.496514-1-d.tschlatscher@proxmox.com>
 <9ddd6d92-5442-7d4b-f21a-6605f8a6ff95@proxmox.com>
From: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
In-Reply-To: <9ddd6d92-5442-7d4b-f21a-6605f8a6ff95@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.107 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
Subject: Re: [pve-devel] [PATCH qemu-server 1/2] fix #3502: VM start timeout
 config parameter
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Wed, 22 Jun 2022 12:58:02 -0000



On 6/21/22 17:52, Thomas Lamprecht wrote:
> fix #3502: add start timeout config parameter
> 
> Am 21/06/2022 um 17:20 schrieb Daniel Tschlatscher:
>> It was already possible to set the timeout parameter for the VM config
>> via the API. However, the value was not considered when the function
>> config_aware_timeout() was called.
> 
> but you only add the timeout param now?! IOW. it was never possible to set
> a timeout config, but one could set the parameter for a single start.
> Please reword it, as of is its confusing.
> 
>> Now, if the timeout parameter is set, it will override the heuristic
>> calculation of the VM start timeout.
> 
> you mean if a timeout config property/option is set, as the API parameter
> was already honored in vm_start_nolock:
> 

Yes, I misunderstood part of the code handling the timeout parameter
when passed via the API call (so, when it is not set in the config)
I took another look and AFAICT, the code for setting the timeout in the
config should work as intended though (i.e. the code in this patch)

Still, I will definitely reword it to make it clear what I actually meant!

> my $start_timeout = $params->{timeout} // config_aware_timeout($conf, $resume);
> 
>>
>> During testing I found a problem where really big values (10^20)+
>> would be converted to scientific notation, which means they no longer
>> pass the integer type check. To get around this, I set the maximum
>> value for the timeout to 2,680,000 seconds, which is around 31 days.
>> This I'd wager, is an upper limit in which nobody should realistically
>> run into.
> 
> 24h is already really big enough, so please use 86000, better to go lower
> (but still reasonable) first, we can always easily relax it, but not really
> make it stricter without breaking existing setups.
> 

Alright!

> Also, did you test HA for the case when a really long timeout is set and
> triggers (by faking it with some sleep added somewhere)?
> 

No, I was not aware of this potential issue. I will look into it

>>
>> Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
>> ---
>>  PVE/API2/Qemu.pm          | 1 +
>>  PVE/QemuServer.pm         | 7 +++++++
>>  PVE/QemuServer/Helpers.pm | 3 +++
>>  3 files changed, 11 insertions(+)
>>
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index a824657..d0a4eaa 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -2494,6 +2494,7 @@ __PACKAGE__->register_method({
>>  		description => "Wait maximal timeout seconds.",
>>  		type => 'integer',
>>  		minimum => 0,
>> +		maximum => 2680000,
>>  		default => 'max(30, vm memory in GiB)',
>>  		optional => 1,
>>  	    },
>> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
>> index e9aa248..81a7f6d 100644
>> --- a/PVE/QemuServer.pm
>> +++ b/PVE/QemuServer.pm
>> @@ -713,6 +713,13 @@ EODESCR
>>  	description => "Some (read-only) meta-information about this guest.",
>>  	optional => 1,
>>      },
>> +    timeout => {
> 
> "timeout what"? in the API it's clear as there it's for an specific actions, but
> as config options one cannot have any idea about what this timeout actually does
> with such an overly generic name.
> 
> I'd maybe put it behind a 'start-options' format string, then it could keep the
> simple name and would be still telling, it would also allow to put any future
> startup related options in there, so not bloating config line amount up too much.
> 
> start-options: timeout=12345
> 
>> +	optional => 1,
>> +	type => 'integer',
>> +	description => 'The maximum timeout to wait for a VM to start',
> 
> Please describe what is done when this isn't passed. FWIW, there's the
> "verbose_description" schema property for very long ones, but I think
> a few sentences describing config_aware_timeout could still fit in the
> normal "description" one.
> 
>> +	minimum => 0,
>> +	maximum => 2680000,
>> +    }
>>  };
>>  
>>  my $cicustom_fmt = {
>> diff --git a/PVE/QemuServer/Helpers.pm b/PVE/QemuServer/Helpers.pm
>> index c10d842..c26d0dc 100644
>> --- a/PVE/QemuServer/Helpers.pm
>> +++ b/PVE/QemuServer/Helpers.pm
>> @@ -142,6 +142,9 @@ sub version_cmp {
>>  
>>  sub config_aware_timeout {
>>      my ($config, $is_suspended) = @_;
>> +
>> +    return $config->{timeout} if defined($config->{timeout});
>> +
>>      my $memory = $config->{memory};
>>      my $timeout = 30;
>>  
> 

Thanks for taking a look. I will look into it and especially, reword the
commits and increase the description verbosity