From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 752AF1FF164
	for <inbox@lore.proxmox.com>; Fri, 11 Apr 2025 18:05:32 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 7A5741E197;
	Fri, 11 Apr 2025 18:05:25 +0200 (CEST)
Mime-Version: 1.0
Date: Fri, 11 Apr 2025 18:04:52 +0200
Message-Id: <D93XRB2F7AS8.365FZ2C4M427F@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
From: "Max Carrara" <m.carrara@proxmox.com>
X-Mailer: aerc 0.18.2-0-ge037c095a049
References: <20250411150831.255017-1-d.kral@proxmox.com>
 <20250411150831.255017-2-d.kral@proxmox.com>
In-Reply-To: <20250411150831.255017-2-d.kral@proxmox.com>
Subject: Re: [pve-devel] [PATCH storage 2/2] fix #6224: disks: get: set
 timeout for retrieval of SMART stat data
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

On Fri Apr 11, 2025 at 5:08 PM CEST, Daniel Kral wrote:
> In rare scenarios, `smartctl` takes up to 60 seconds to timeout for SCSI
> commands to be completed, as reported in our user forum [0] and bugzilla
> [1]. It seems that USB drives handled by the USB Attached SCSI (UAS)
> kernel module are more likely to be affected by this [2], but is more of
> a case-by-case situation.
>
> Therefore, set a more reasonable timeout of 10 seconds, so that callers
> don't have to wait too long or seem unresponsive (e.g. Node Disks view
> in the WebGUI).
>
> [0] https://forum.proxmox.com/threads/164799/
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6224
> [2] https://www.smartmontools.org/wiki/SAT-with-UAS-Linux
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> As mentioned in the Bugzilla and indicated above, I haven't found any
> clear indicator for this happening besides that the most affected
> devices seem to be USB devices, which use the mentioned UAS kernel
> module.

Have you perhaps found any way to test this? I could then try to
replicate this behaviour. Otherwise no hard feelings; I think setting a
shorter timeout for (usually) smaller commands is something we should do
in general.

(That being said, looking through the code of PVE::Tools::run_command,
I'm surprised we don't set a default timeout there at all. I think
introducing one there could perhaps break something unexpected, though,
so I'd rather not touch it.)
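
For illustration, the general pattern behind such a timeout can be
sketched in plain Perl as below. This is NOT the actual
PVE::Tools::run_command implementation; `run_with_timeout` is a
hypothetical helper made up for this example, and `sleep 60` merely
stands in for a `smartctl` call that hangs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not part of PVE::Tools): run @$cmd in a child
# process and give up after $timeout seconds.
sub run_with_timeout {
    my ($cmd, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!\n" if !defined($pid);

    if ($pid == 0) {
        # Child: replace ourselves with the command, bypassing the shell.
        exec { $cmd->[0] } @$cmd or POSIX::_exit(127);
    }

    my $exitcode = eval {
        # SIGALRM interrupts the blocking waitpid() below; the handler
        # then dies, which aborts the eval block.
        local $SIG{ALRM} = sub { die "got timeout\n" };
        alarm($timeout);
        waitpid($pid, 0);
        alarm(0);
        $? >> 8; # exit code only; death by signal is ignored in this sketch
    };
    if (my $err = $@) {
        alarm(0);
        # Do not leave the stuck child behind.
        kill('KILL', $pid);
        waitpid($pid, 0);
        die $err;
    }

    return $exitcode;
}

# Stand-in for a hanging command: `sleep 60` with a 1 second timeout.
my $rc = eval { run_with_timeout(['sleep', '60'], 1) };
if ($@) {
    print "command aborted: $@";
} else {
    print "exit code: $rc\n";
}
```

A stub like this might also be one way to exercise the timeout path
without affected hardware, though that is only an assumption on my
part and it says nothing about how a real UAS device behaves.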

>
> I'm fine lowering the timeout further, but 10 seconds seemed reasonable
> if only one disk is affected for now, so that loading takes some time
> and not seemingly forever.

Given that I've never had a single device take longer than a split
second, I think this is quite reasonable too.

>
> I was also thinking about just caching which disks have had that
> behavior and just not running the command for them, but I thought this
> would add more complexity than needed here.

I agree that this would be a little too much; you'd have to invalidate
cache entries after a certain time or once a certain condition is met,
and you'd also have to handle the case where the disk magically starts
responding to `smartctl` again. Better to just keep the timeout here
as-is.


Either way, nice work! For both patches, consider:

Reviewed-by: Max Carrara <m.carrara@proxmox.com>

(Though I'd still like to test this somehow, if you've found a way to do so.)

>
>  src/PVE/Diskmanage.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/PVE/Diskmanage.pm b/src/PVE/Diskmanage.pm
> index 059d645..6aa1338 100644
> --- a/src/PVE/Diskmanage.pm
> +++ b/src/PVE/Diskmanage.pm
> @@ -98,7 +98,7 @@ sub get_smart_data {
>      push @$cmd, $disk;
>  
>      my $returncode = eval {
> -	run_command($cmd, noerr => 1, outfunc => sub {
> +	run_command($cmd, noerr => 1, timeout => 10, outfunc => sub {
>  	    my ($line) = @_;
>  
>  # ATA SMART attributes, e.g.:


