* [pbs-devel] [PATCH widget-toolkit] fix #4961: disks: increase timeout to 1min

From: Gabriel Goller
Date: 2023-09-21  8:21 UTC
To: pbs-devel

Increase the timeout of the api call to `../nodes/localhost/disks/list`
to 1min. With a lot of disks, the default timeout of 30s is not enough.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 src/panel/DiskList.js | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/panel/DiskList.js b/src/panel/DiskList.js
index c5a0050..5785297 100644
--- a/src/panel/DiskList.js
+++ b/src/panel/DiskList.js
@@ -70,6 +70,7 @@ Ext.define('Proxmox.DiskList', {
 	    type: 'proxmox',
 	    extraParams: extraParams,
 	    url: url,
+	    timeout: 60*1000,
 	});
 	me.store.load();
     },
--
2.39.2
* Re: [pbs-devel] [PATCH widget-toolkit] fix #4961: disks: increase timeout to 1min

From: Fabian Grünbichler
Date: 2023-09-26  7:55 UTC
To: Proxmox Backup Server development discussion

On September 21, 2023 10:21 am, Gabriel Goller wrote:
> Increase the timeout of the api call to `../nodes/localhost/disks/list`
> to 1min. With a lot of disks, the default timeout of 30s is not enough.
>
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
>  src/panel/DiskList.js | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/panel/DiskList.js b/src/panel/DiskList.js
> index c5a0050..5785297 100644
> --- a/src/panel/DiskList.js
> +++ b/src/panel/DiskList.js
> @@ -70,6 +70,7 @@ Ext.define('Proxmox.DiskList', {
> 	    type: 'proxmox',
> 	    extraParams: extraParams,
> 	    url: url,
> +	    timeout: 60*1000,

That alone won't be enough, since we also have a 30s timeout for proxying
requests in a PVE cluster (and this component is used there).

See https://bugzilla.proxmox.com/show_bug.cgi?id=3045 and
https://bugzilla.proxmox.com/show_bug.cgi?id=4447 for some previous
discussion (with similar API endpoints).

IMHO the most complete solution would be some sort of light-weight/ephemeral
task mechanism:
- runs in a worker, like a regular task (possibly with a bounded queue of
  allowed in-flight requests?)
- not visible in the regular task list (or only via a special parameter?)
- status is pollable, but once completed, results are returned once and
  discarded, and not stored with the regular task logs/results
- result is discarded after $timeout if not polled in time

This would allow us to make *every* currently sync API handler (optionally!)
async by just spawning such a task, making it the client's responsibility to
request async handling and to take care of polling to retrieve the result.
We could then selectively switch over certain parts of our GUI, add
client-side caching and an explicit refresh button. If the server responds
quickly, not much changes except an additional fork and some state
housekeeping.

There are other solutions as well, of course (e.g., some web UIs use a
long-lived WebSocket connection and pipeline requests/responses
asynchronously over it).

> 	});
> 	me.store.load();
>     },
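To make the proposed flow concrete, here is one possible shape of such an
ephemeral-task result store, as a minimal sketch in plain Rust (std only). A
thread stands in for a real worker, results are plain strings, and all names
are hypothetical rather than existing proxmox worker-task APIs; permissions,
error handling, and proper result serialization are left out.

// Minimal sketch of an ephemeral-task result store (std only). All names are
// hypothetical; this is NOT the existing proxmox worker-task code.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

struct Pending {
    // Some((finished_at, serialized_result)) once the worker is done.
    done: Option<(Instant, String)>,
}

#[derive(Clone, Default)]
struct EphemeralTasks {
    inner: Arc<Mutex<HashMap<u64, Pending>>>,
}

impl EphemeralTasks {
    /// Run `work` in the background and return immediately; the client keeps
    /// the `id` and polls for the result.
    fn spawn<F>(&self, id: u64, work: F)
    where
        F: FnOnce() -> String + Send + 'static,
    {
        self.inner.lock().unwrap().insert(id, Pending { done: None });
        let inner = Arc::clone(&self.inner);
        thread::spawn(move || {
            let result = work();
            if let Some(entry) = inner.lock().unwrap().get_mut(&id) {
                entry.done = Some((Instant::now(), result));
            }
        });
    }

    /// Poll a task: `None` while it is still running, `Some(result)` exactly
    /// once when finished -- the result is discarded after being returned.
    fn poll(&self, id: u64) -> Option<String> {
        let mut map = self.inner.lock().unwrap();
        let finished = matches!(map.get(&id), Some(p) if p.done.is_some());
        if finished {
            map.remove(&id).and_then(|p| p.done).map(|(_, result)| result)
        } else {
            None
        }
    }

    /// Drop finished results that were not picked up within `timeout`.
    fn expire(&self, timeout: Duration) {
        self.inner.lock().unwrap().retain(|_, p| match &p.done {
            Some((finished_at, _)) => finished_at.elapsed() < timeout,
            None => true, // still running
        });
    }
}

A handler opting into async behaviour would call spawn() and return the id
right away; the client polls until poll() hands back the result exactly once,
while a periodic expire() pass drops results that were never picked up in
time.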
* Re: [pbs-devel] [PATCH widget-toolkit] fix #4961: disks: increase timeout to 1min

From: Dominik Csapak
Date: 2023-09-26  8:15 UTC
To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 9/26/23 09:55, Fabian Grünbichler wrote:
> On September 21, 2023 10:21 am, Gabriel Goller wrote:
>> Increase the timeout of the api call to `../nodes/localhost/disks/list`
>> to 1min. With a lot of disks, the default timeout of 30s is not enough.
>>
>> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
>> ---
>>  src/panel/DiskList.js | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/panel/DiskList.js b/src/panel/DiskList.js
>> index c5a0050..5785297 100644
>> --- a/src/panel/DiskList.js
>> +++ b/src/panel/DiskList.js
>> @@ -70,6 +70,7 @@ Ext.define('Proxmox.DiskList', {
>> 	    type: 'proxmox',
>> 	    extraParams: extraParams,
>> 	    url: url,
>> +	    timeout: 60*1000,
>
> That alone won't be enough, since we also have a 30s timeout for proxying
> requests in a PVE cluster (and this component is used there).

It would actually be enough to fix that exact bug, which is for PBS, where we
don't have the 30s proxy limit, but you're right that it's not a generic fix
across both products.

IMHO we could still apply it here (maybe with a short test to make sure it
doesn't make the situation worse for PVE), so it's at least fixed for PBS,
and we can properly fix it for PVE later.
* Re: [pbs-devel] [PATCH widget-toolkit] fix #4961: disks: increase timeout to 1min

From: Thomas Lamprecht
Date: 2023-09-26 13:42 UTC
To: Proxmox Backup Server development discussion, Dominik Csapak, Fabian Grünbichler, Gabriel Goller

On 26/09/2023 at 10:15, Dominik Csapak wrote:
> On 9/26/23 09:55, Fabian Grünbichler wrote:
>> On September 21, 2023 10:21 am, Gabriel Goller wrote:
>>> Increase the timeout of the api call to `../nodes/localhost/disks/list`
>>> to 1min. With a lot of disks, the default timeout of 30s is not enough.
>>>
>>
>> That alone won't be enough, since we also have a 30s timeout for proxying
>> requests in a PVE cluster (and this component is used there).
>
> It would actually be enough to fix that exact bug, which is for PBS, where
> we don't have the 30s proxy limit, but you're right that it's not a generic
> fix across both products.
>
> IMHO we could still apply it here (maybe with a short test to make sure it
> doesn't make the situation worse for PVE), so it's at least fixed for PBS,
> and we can properly fix it for PVE later.

I'd rather avoid that: 30s is already long and might make some users just
close the dialogue.

And the general "lightweight-task" mechanism that Fabian proposed could be
nice for some things, but IMO it is the wrong approach here too.

Here, where the end result is showing a simple disk list, I just refuse to
believe that this has to take that long; the API call has some serious
architectural problems, among them a mix of:

1. too much data is read that is not relevant most of the time; we could
   avoid querying the data with the worst expensive/useful ratio and make
   it available only in a per-disk info dialogue.

2. some info is gathered in a convoluted way (e.g., we already have the
   lsblk info, which knows whether a disk is a "zfs_member", but we still
   do a zpool list, and all of those stat the device nodes separately to
   get the rdevs, multiple times...)

3. there is no cache for the more expensive stuff, e.g., one that the rrd
   stat querying code could probably fill every Xth run or so, as it checks
   IO metrics anyway.

I'd start with point 2 above, as I saw quite a bit of optimization potential
with only a quick skim; some of that should hopefully produce actual
improvements.

Then I'd check where most of the time is spent, either by asking the reporter
nicely for timings of all relevant commands, or by just checking ourselves
[..]: we have slow HW here too, and while not 90 disks, even a tenth of that
should be slow enough to find the major culprits with some tracing or even
just printf-timestamp debugging.

The _last_ thing I'd do for this specific endpoint is extend timeouts or
invent some new fancy light-weight task mechanism. Optimizations, omitting
expensive stuff or moving it from the "for-all-disks" list to a
"per-single-disk" view, and caching the "expensive but still interesting to
always see" stuff go a long way.

Btw., PBS is still missing the Wipe-Disk feature from PVE.
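As an illustration of point 2 above: the "zfs_member" classification is
already part of the lsblk output, so a single invocation can tell which
devices belong to a zpool without an additional zpool list run or repeated
stats on the device nodes. A std-only Rust sketch, not the actual
proxmox-backup disk-management code (which structures this differently):

// Illustrative only: classify zpool members from one `lsblk` call instead of
// additionally running `zpool list` and re-stat'ing device nodes.
use std::collections::HashSet;
use std::process::Command;

fn zfs_member_devices() -> std::io::Result<HashSet<String>> {
    // One lsblk invocation already reports which block devices carry a ZFS
    // member signature (FSTYPE == "zfs_member").
    let output = Command::new("lsblk")
        .args(["--noheadings", "--raw", "-o", "NAME,FSTYPE"])
        .output()?;

    let mut members = HashSet::new();
    for line in String::from_utf8_lossy(&output.stdout).lines() {
        let mut cols = line.split_whitespace();
        if let (Some(name), Some(fstype)) = (cols.next(), cols.next()) {
            if fstype == "zfs_member" {
                members.insert(name.to_string());
            }
        }
    }
    Ok(members)
}

fn main() -> std::io::Result<()> {
    for dev in zfs_member_devices()? {
        println!("{dev} is part of a zpool");
    }
    Ok(())
}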
* Re: [pbs-devel] [PATCH widget-toolkit] fix #4961: disks: increase timeout to 1min

From: Gabriel Goller
Date: 2023-09-28 12:57 UTC
To: Thomas Lamprecht, Proxmox Backup Server development discussion, Dominik Csapak, Fabian Grünbichler

On 9/26/23 15:42, Thomas Lamprecht wrote:
> [..]
> I'd start with point 2 above, as I saw quite a bit of optimization
> potential with only a quick skim; some of that should hopefully produce
> actual improvements.
>
> Then I'd check where most of the time is spent, either by asking the
> reporter nicely for timings of all relevant commands, or by just checking
> ourselves [..]
>
> The _last_ thing I'd do for this specific endpoint is extend timeouts or
> invent some new fancy light-weight task mechanism.
> [..]

I looked into optimizing a few things and noticed that (as you wrote in the
issue) the slowest part is the smartctl call. Getting the smartctl data takes
about 60-70% of the time spent on the request. I only have 4 disks with 6
partitions, but the api call still takes about 232.22ms on average. With the
smartctl check disabled we get down to 157.22ms. I did some other
optimizations and got down to 121.27ms, which is still not great but an
improvement. I will try a few more things and check whether they help...

In the meantime, I thought about a way of disabling the smartctl checks when
there are a lot of disks: how about we add a "get smartctl data" button on
the frontend, which manually updates the smartctl data in the table? We could
then check server-side whether there are more than x disks, or whether
"lsblk" takes more than x seconds, and if so skip the smartctl check and
return an unknown status. The user can then click the "get smartctl data"
button manually, which runs the smartctl check in a separate api request.

So when a user doesn't have a lot of disks, we still get them all, including
the smartctl data, in the same request. But if it would take too long, we
skip the smartctl check and let the user decide whether they need the data
or not.
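A rough sketch of what the server side of this proposal could look like: skip
the per-disk smartctl call when the client did not ask for it or when there
are too many disks, and report an unknown status instead. The threshold, the
parameter, and the helper names are invented for illustration, and the
smartctl exit-code handling is simplified; none of this is existing
proxmox-backup code.

// Sketch of the proposed behaviour: with many disks (or when the client did
// not ask for it), skip the per-disk smartctl call and report Unknown.
use std::process::Command;

const SMART_DISK_LIMIT: usize = 16; // hypothetical cut-off

#[derive(Debug, Clone, Copy, PartialEq)]
enum SmartStatus {
    Passed,
    Failed,
    Unknown, // check skipped or smartctl failed
}

fn query_smart_status(dev: &str) -> SmartStatus {
    // `smartctl -H <dev>` reports the overall health self-assessment; exit
    // status bit 3 means "disk failing". This call is the expensive part.
    match Command::new("smartctl").args(["-H", dev]).output() {
        Ok(out) => {
            if out.status.code().unwrap_or(0) & (1 << 3) != 0 {
                SmartStatus::Failed
            } else {
                SmartStatus::Passed
            }
        }
        Err(_) => SmartStatus::Unknown,
    }
}

fn list_disks(devices: &[String], include_smart: bool) -> Vec<(String, SmartStatus)> {
    // Only run smartctl when explicitly requested *and* the disk count is
    // small enough; otherwise the GUI can fetch it later on demand.
    let run_smart = include_smart && devices.len() <= SMART_DISK_LIMIT;
    devices
        .iter()
        .map(|dev| {
            let smart = if run_smart {
                query_smart_status(dev)
            } else {
                SmartStatus::Unknown
            };
            (dev.clone(), smart)
        })
        .collect()
}

The "get smartctl data" button would then hit a separate endpoint that calls
something like query_smart_status() for the selected disks only.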
Thread overview: 5 messages
2023-09-21  8:21 [pbs-devel] [PATCH widget-toolkit] fix #4961: disks: increase timeout to 1min  Gabriel Goller
2023-09-26  7:55 ` Fabian Grünbichler
2023-09-26  8:15   ` Dominik Csapak
2023-09-26 13:42     ` Thomas Lamprecht
2023-09-28 12:57       ` Gabriel Goller