From: "Max Carrara" <m.carrara@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Date: Fri, 11 Apr 2025 18:04:52 +0200
Message-Id: <D93XRB2F7AS8.365FZ2C4M427F@proxmox.com>
In-Reply-To: <20250411150831.255017-2-d.kral@proxmox.com>
References: <20250411150831.255017-1-d.kral@proxmox.com>
 <20250411150831.255017-2-d.kral@proxmox.com>
Subject: Re: [pve-devel] [PATCH storage 2/2] fix #6224: disks: get: set timeout for retrieval of SMART stat data

On Fri Apr 11, 2025 at 5:08 PM CEST, Daniel Kral wrote:
> In rare scenarios, `smartctl` takes up to 60 seconds to time out while
> waiting for SCSI commands to complete, as reported in our user forum
> [0] and Bugzilla [1]. USB drives handled by the USB Attached SCSI (UAS)
> kernel module seem more likely to be affected by this [2], but it is
> more of a case-by-case situation.
>
> Therefore, set a more reasonable timeout of 10 seconds, so that callers
> don't have to wait too long or seem unresponsive (e.g. the Node Disks
> view in the WebGUI).
>
> [0] https://forum.proxmox.com/threads/164799/
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6224
> [2] https://www.smartmontools.org/wiki/SAT-with-UAS-Linux
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> As mentioned in the Bugzilla and indicated above, I haven't found any
> clear indicator for this happening besides that the most affected
> devices seem to be USB devices, which use the mentioned UAS kernel
> module.

Have you perhaps found any way to test this? I could then try to
replicate this behaviour. Otherwise no hard feelings; I think setting a
shorter timeout for commands that (usually) finish quickly is something
we should do in general.

(That being said, looking through the code of PVE::Tools::run_command,
I'm surprised we don't set a default timeout there at all. I think
introducing one there could perhaps break something unexpected, though,
so I'd rather not touch it.)

>
> I'm fine with lowering the timeout further, but 10 seconds seemed
> reasonable if only one disk is affected for now, so that loading takes
> some time and not seemingly forever.

Given that I've never had a single device take longer than a split
second, I think this is quite reasonable too.

>
> I was also thinking about just caching which disks have had that
> behavior and just not running the command for them, but I thought this
> would add more complexity than needed here.

I agree that this would be a little too much; you'd have to invalidate
cache entries after a certain time or under a certain condition, and
you'd also have to handle the case where the disk magically starts
responding to `smartctl` again. Better to just keep the timeout here
as-is.

Either way, nice work!

For both patches, consider:

Reviewed-by: Max Carrara <m.carrara@proxmox.com>

(Though, I'd still like to test this somehow, if you found a way to do
so)

>
>  src/PVE/Diskmanage.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/PVE/Diskmanage.pm b/src/PVE/Diskmanage.pm
> index 059d645..6aa1338 100644
> --- a/src/PVE/Diskmanage.pm
> +++ b/src/PVE/Diskmanage.pm
> @@ -98,7 +98,7 @@ sub get_smart_data {
>      push @$cmd, $disk;
>
>      my $returncode = eval {
> -        run_command($cmd, noerr => 1, outfunc => sub {
> +        run_command($cmd, noerr => 1, timeout => 10, outfunc => sub {
>              my ($line) = @_;
>
>              # ATA SMART attributes, e.g.:
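
To illustrate what the `timeout => 10` argument in the diff above buys
the caller, here is a minimal, self-contained sketch. It does not use
PVE::Tools::run_command and makes no claim about how that function is
implemented; `run_with_timeout` is a hypothetical helper built only on
core Perl (`fork`, `alarm`, `waitpid`), and `sleep 5` merely stands in
for a `smartctl` call that hangs.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper for illustration only; NOT PVE::Tools::run_command.
# Runs @$cmd in a child process and kills it if it takes longer than
# $timeout seconds. Returns (1, undef) on timeout, else (0, $exit_code).
sub run_with_timeout {
    my ($cmd, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!\n" if !defined($pid);

    if ($pid == 0) {
        # Child: replace this process with the command (no shell involved).
        exec { $cmd->[0] } @$cmd or POSIX::_exit(127);
    }

    # Parent: wait for the child, but give up once the alarm fires.
    my $status;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        waitpid($pid, 0);
        $status = $?;
        alarm(0);
    };
    if (my $err = $@) {
        alarm(0);
        die $err if $err ne "timeout\n";    # propagate unrelated errors
        kill('KILL', $pid);
        waitpid($pid, 0);                   # reap the killed child
        return (1, undef);
    }

    return (0, $status >> 8);
}

# Stand-in for a hanging `smartctl`: `sleep 5` with a 1-second limit.
my ($timed_out, $code) = run_with_timeout(['sleep', '5'], 1);
print $timed_out ? "timed out\n" : "exit code $code\n";
```

The point is only that the caller regains control after a bounded wait
instead of blocking for as long as the command does, which is what
keeps a view like Node Disks responsive when a single disk misbehaves.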