From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id E2C05E858
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 15:42:40 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id C96131E47
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 15:42:10 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 15:42:09 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 8C30F48E0E
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 15:42:09 +0200 (CEST)
Message-ID: <7b0ac740-096b-4850-a81d-5fa6bc9c347a@proxmox.com>
Date: Tue, 26 Sep 2023 15:42:08 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Content-Language: de-AT, en-US
To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>, Dominik Csapak <d.csapak@proxmox.com>,
 =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>,
 Gabriel Goller <g.goller@proxmox.com>
References: <20230921082157.33718-1-g.goller@proxmox.com>
 <1695714131.wbzdf9yjca.astroid@yuna.none>
 <f31588c3-5dba-47f8-a140-d015d0a0456c@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Autocrypt: addr=t.lamprecht@proxmox.com; keydata=
 xsFNBFsLjcYBEACsaQP6uTtw/xHTUCKF4VD4/Wfg7gGn47+OfCKJQAD+Oyb3HSBkjclopC5J
 uXsB1vVOfqVYE6PO8FlD2L5nxgT3SWkc6Ka634G/yGDU3ZC3C/7NcDVKhSBI5E0ww4Qj8s9w
 OQRloemb5LOBkJNEUshkWRTHHOmk6QqFB/qBPW2COpAx6oyxVUvBCgm/1S0dAZ9gfkvpqFSD
 90B5j3bL6i9FIv3YGUCgz6Ue3f7u+HsEAew6TMtlt90XV3vT4M2IOuECG/pXwTy7NtmHaBQ7
 UJBcwSOpDEweNob50+9B4KbnVn1ydx+K6UnEcGDvUWBkREccvuExvupYYYQ5dIhRFf3fkS4+
 wMlyAFh8PQUgauod+vqs45FJaSgTqIALSBsEHKEs6IoTXtnnpbhu3p6XBin4hunwoBFiyYt6
 YHLAM1yLfCyX510DFzX/Ze2hLqatqzY5Wa7NIXqYYelz7tXiuCLHP84+sV6JtEkeSUCuOiUY
 virj6nT/nJK8m0BzdR6FgGtNxp7RVXFRz/+mwijJVLpFsyG1i0Hmv2zTn3h2nyGK/I6yhFNt
 dX69y5hbo6LAsRjLUvZeHXpTU4TrpN/WiCjJblbj5um5eEr4yhcwhVmG102puTtuCECsDucZ
 jpKpUqzXlpLbzG/dp9dXFH3MivvfuaHrg3MtjXY1i+/Oxyp5iwARAQABzTNUaG9tYXMgTGFt
 cHJlY2h0IChBdXRoLTQpIDx0LmxhbXByZWNodEBwcm94bW94LmNvbT7CwY4EEwEIADgWIQQO
 R4qbEl/pah9K6VrTZCM6gDZWBgUCWwuNxgIbAwULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAK
 CRDTZCM6gDZWBm/jD/4+6JB2s67eaqoP6x9VGaXNGJPCscwzLuxDTCG90G9FYu29VcXtubH/
 bPwsyBbNUQpqTm/s4XboU2qpS5ykCuTjqavrcP33tdkYfGcItj2xMipJ1i3TWvpikQVsX42R
 G64wovLs/dvpTYphRZkg5DwhgTmy3mRkmofFCTa+//MOcNOORltemp984tWjpR3bUJETNWpF
 sKGZHa3N4kCNxb7A+VMsJZ/1gN3jbQbQG7GkJtnHlWkw9rKCYqBtWrnrHa4UAvSa9M/XCIAB
 FThFGqZI1ojdVlv5gd6b/nWxfOPrLlSxbUo5FZ1i/ycj7/24nznW1V4ykG9iUld4uYUY86bB
 UGSjew1KYp9FmvKiwEoB+zxNnuEQfS7/Bj1X9nxizgweiHIyFsRqgogTvLh403QMSGNSoArk
 tqkorf1U+VhEncIn4H3KksJF0njZKfilrieOO7Vuot1xKr9QnYrZzJ7m7ZxJ/JfKGaRHXkE1
 feMmrvZD1AtdUATZkoeQtTOpMu4r6IQRfSdwm/CkppZXfDe50DJxAMDWwfK2rr2bVkNg/yZI
 tKLBS0YgRTIynkvv0h8d9dIjiicw3RMeYXyqOnSWVva2r+tl+JBaenr8YTQw0zARrhC0mttu
 cIZGnVEvQuDwib57QLqMjQaC1gazKHvhA15H5MNxUhwm229UmdH3KM7BTQRbC43GARAAyTkR
 D6KRJ9Xa2fVMh+6f186q0M3ni+5tsaVhUiykxjsPgkuWXWW9MbLpYXkzX6h/RIEKlo2BGA95
 QwG5+Ya2Bo3g7FGJHAkXY6loq7DgMp5/TVQ8phsSv3WxPTJLCBq6vNBamp5hda4cfXFUymsy
 HsJy4dtgkrPQ/bnsdFDCRUuhJHopnAzKHN8APXpKU6xV5e3GE4LwFsDhNHfH/m9+2yO/trcD
 txSFpyftbK2gaMERHgA8SKkzRhiwRTt9w5idOfpJVkYRsgvuSGZ0pcD4kLCOIFrer5xXudk6
 NgJc36XkFRMnwqrL/bB4k6Pi2u5leyqcXSLyBgeHsZJxg6Lcr2LZ35+8RQGPOw9C0ItmRjtY
 ZpGKPlSxjxA1WHT2YlF9CEt3nx7c4C3thHHtqBra6BGPyW8rvtq4zRqZRLPmZ0kt/kiMPhTM
 8wZAlObbATVrUMcZ/uNjRv2vU9O5aTAD9E5r1B0dlqKgxyoImUWB0JgpILADaT3VybDd3C8X
 s6Jt8MytUP+1cEWt9VKo4vY4Jh5vwrJUDLJvzpN+TsYCZPNVj18+jf9uGRaoK6W++DdMAr5l
 gQiwsNgf9372dbMI7pt2gnT5/YdG+ZHnIIlXC6OUonA1Ro/Itg90Q7iQySnKKkqqnWVc+qO9
 GJbzcGykxD6EQtCSlurt3/5IXTA7t6sAEQEAAcLBdgQYAQgAIBYhBA5HipsSX+lqH0rpWtNk
 IzqANlYGBQJbC43GAhsMAAoJENNkIzqANlYGD1sP/ikKgHgcspEKqDED9gQrTBvipH85si0j
 /Jwu/tBtnYjLgKLh2cjv1JkgYYjb3DyZa1pLsIv6rGnPX9bH9IN03nqirC/Q1Y1lnbNTynPk
 IflgvsJjoTNZjgu1wUdQlBgL/JhUp1sIYID11jZphgzfDgp/E6ve/8xE2HMAnf4zAfJaKgD0
 F+fL1DlcdYUditAiYEuN40Ns/abKs8I1MYx7Yglu3RzJfBzV4t86DAR+OvuF9v188WrFwXCS
 RSf4DmJ8tntyNej+DVGUnmKHupLQJO7uqCKB/1HLlMKc5G3GLoGqJliHjUHUAXNzinlpE2Vj
 C78pxpwxRNg2ilE3AhPoAXrY5qED5PLE9sLnmQ9AzRcMMJUXjTNEDxEYbF55SdGBHHOAcZtA
 kEQKub86e+GHA+Z8oXQSGeSGOkqHi7zfgW1UexddTvaRwE6AyZ6FxTApm8wq8NT2cryWPWTF
 BDSGB3ujWHMM8ERRYJPcBSjTvt0GcEqnd+OSGgxTkGOdufn51oz82zfpVo1t+J/FNz6MRMcg
 8nEC+uKvgzH1nujxJ5pRCBOquFZaGn/p71Yr0oVitkttLKblFsqwa+10Lt6HBxm+2+VLp4Ja
 0WZNncZciz3V3cuArpan/ZhhyiWYV5FD0pOXPCJIx7WS9PTtxiv0AOS4ScWEUmBxyhFeOpYa DrEx
In-Reply-To: <f31588c3-5dba-47f8-a140-d015d0a0456c@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.077 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [PATCH widget-toolkit] fix #4961: disks: increase
 timeout to 1min
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 26 Sep 2023 13:42:40 -0000

Am 26/09/2023 um 10:15 schrieb Dominik Csapak:
> On 9/26/23 09:55, Fabian Gr=C3=BCnbichler wrote:
>> On September 21, 2023 10:21 am, Gabriel Goller wrote:
>>> Increase the timeout of the api call to `../nodes/localhost/disks/lis=
t`
>>> to 1min. When having a lot of disks the default timeout of 30sec is
>>> not enough.
>>>
>>
>> that alone won't be enough, since we also have a 30s timeout for
>> proxying requests in a PVE cluster (and this component is used there).=

>=20
> it would actually be enough to fix that exact bug, which is for pbs
> where we don't have the 30s proxy limit, but you're right it's not
> a generic fix across both products
>=20
> imho we could still apply this here (maybe with a short test to
> see if it doesn't make the situation worse for pve) so it's fixed
> for pbs at least and can properly fix it for pve later still

I'd rather like to avoid that, 30s is already long and might make some
users just close the dialogue or whatever.

And general "lightweight-task" mechanism that Fabian proposed could be
nice for some things but IMO the wrong approach here too.

Here, with the end result being to show a simple disk list, I just
refuse to believe that this has to take that long; here the API call has
some serious architectural problems, among other things that can mean a
mix of:

1. too much stuff read that is not relevant most of the time, we could
   avoid querying the data that has the worst expensive/useful ratio
   and make that only available on a per-disk info dialogue.

2. getting some info in a convoluted way (e.g, we have already lsblk
   info, which knows if a disk is a "zfs_member", but still do a zpool
   list, and all of those do separate stats on the device nodes to get
   the rdevs, multiple times...)

3. missing a cache for the more expensive stuff, e.g., one that the rrd
   stat querying code could probably fill every X'd run or so, as it
   checks for IO metrics anyway

I'd start with the 2. point above, as I saw quite a bit optimization
potential with only a quick skimming, some of that should hopefully
produce some actual improvements.

Then I'd check where most of the time is spent, either by asking the
reporter nicely for timings of all relevant commands and potentially
also by just checking ourselves, we have slow HW here too, and while
not 90 disks, even a tenth should show slowish enough behavior to
be able to find the major culprits with some tracing or even just
printf-timestamp-debugging


The _last_ thing I'd do here for this specific endpoint is extending
timeouts or inventing some new fancy light-weight task mechanisms.
As optimizations, omitting expensive stuff, or moving it out from
"for-all-disk" to a "per-single-disk" view, and caching for the
"expensive but just really interesting to always see" stuff goes a
long way..

btw. PBS still misses the Wipe-Disk feature from PVE..