public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Max Carrara" <m.carrara@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH storage 2/2] fix #6224: disks: get: set timeout for retrieval of SMART stat data
Date: Fri, 11 Apr 2025 18:04:52 +0200	[thread overview]
Message-ID: <D93XRB2F7AS8.365FZ2C4M427F@proxmox.com> (raw)
In-Reply-To: <20250411150831.255017-2-d.kral@proxmox.com>

On Fri Apr 11, 2025 at 5:08 PM CEST, Daniel Kral wrote:
> In rare scenarios, `smartctl` takes up to 60 seconds to timeout for SCSI
> commands to be completed, as reported in our user forum [0] and bugzilla
> [1]. It seems that USB drives handled by the USB Attached SCSI (UAS)
> kernel module are more likely to be affected by this [2], but is more of
> a case-by-case situation.
>
> Therefore, set a more reasonable timeout of 10 seconds, so that callers
> don't have to wait too long or seem unresponsive (e.g. Node Disks view
> in the WebGUI).
>
> [0] https://forum.proxmox.com/threads/164799/
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6224
> [2] https://www.smartmontools.org/wiki/SAT-with-UAS-Linux
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> As mentioned in the Bugzilla and indicated above, I haven't found any
> clear indicator for this happening besides that the most affected
> devices seem to be USB devices, which use the mentioned UAS kernel
> module.

Have you perhaps found any way to test this? I could then try to
replicate this behaviour. Otherwise no hard feelings; I think setting a
shorter timeout for (usually) smaller commands is something we should do
in general.

(That being said, looking through the code of PVE::Tools::run_command---
I'm surprised we don't set a default timeout there at all. I think
introducing one there could perhaps break something unexpected, though,
so I'd rather not touch it.)

>
> I'm fine lowering the timeout further, but 10 seconds seemed reasonable
> if only one disk is affected for now, so that loading takes some time
> and not seemingly forever.

Given that I've never had a single device take longer than a split
second, I think this is quite reasonable too.

>
> I was also thinking about just caching which disks have had that
> behavior and just not running the command for them, but I thought this
> would add more complexity than needed here.

I agree that this would be a little too much; you'd also have to
invalidate cache entries after a certain time / a certain condition etc.
You'd also have to handle the case where the disk starts to magically
respond to `smartctl` again. Better to just keep the timeout here as-is.


Either way, nice work! For both patches, consider:

Reviewed-by: Max Carrara <m.carrara@proxmox.com>

(Though, I'd still like to test this somehow, if you found a way to do so)

>
>  src/PVE/Diskmanage.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/PVE/Diskmanage.pm b/src/PVE/Diskmanage.pm
> index 059d645..6aa1338 100644
> --- a/src/PVE/Diskmanage.pm
> +++ b/src/PVE/Diskmanage.pm
> @@ -98,7 +98,7 @@ sub get_smart_data {
>      push @$cmd, $disk;
>  
>      my $returncode = eval {
> -	run_command($cmd, noerr => 1, outfunc => sub {
> +	run_command($cmd, noerr => 1, timeout => 10, outfunc => sub {
>  	    my ($line) = @_;
>  
>  # ATA SMART attributes, e.g.:



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


  reply	other threads:[~2025-04-11 16:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-11 15:08 [pve-devel] [PATCH storage 1/2] disks: get: separate error path for retrieving SMART data Daniel Kral
2025-04-11 15:08 ` [pve-devel] [PATCH storage 2/2] fix #6224: disks: get: set timeout for retrieval of SMART stat data Daniel Kral
2025-04-11 16:04   ` Max Carrara [this message]
2025-04-15  6:42     ` Daniel Kral
2025-04-11 15:52 ` [pve-devel] [PATCH storage 1/2] disks: get: separate error path for retrieving SMART data Max Carrara
2025-04-15  7:15   ` Daniel Kral

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D93XRB2F7AS8.365FZ2C4M427F@proxmox.com \
    --to=m.carrara@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal