From: Roland <devzero@web.de>
To: Mark Schouten <mark@tuxis.nl>,
Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pbs-devel] Slow overview of existing backups
Date: Fri, 10 Mar 2023 11:16:05 +0100
Message-ID: <205dd8f8-a0f6-8146-27c1-f53d9b98b838@web.de>
In-Reply-To: <emb66595c1-d8df-4996-903f-e39ebd7039c2@eb9993ff.com>
Hi,
>> Requesting the available backups from a PBS takes quite a long time.
>> Are there any plans to start implementing caching or an overall
>> index file for a datastore?
> There's already the host system's page cache that helps a lot, as long
> as there's enough memory to avoid displacing its content frequently.
> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMes as special devices
Ah ok, I see, you should have fast metadata access because of the special devices.
What about freshly booting your backup server and issuing
zpool iostat -rv
after listing the backups and observing the slowness?
With this we can get more insight into where the time is spent, whether it's
really all about metadata access, and whether things are working well from a
filesystem/metadata performance point of view.
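Roughly like this (just a sketch; the pool name, the repository string and the
exact client subcommand are assumptions, adjust them to your setup):

# after a fresh boot (cold page cache and ARC), trigger the slow listing,
# e.g. by opening the datastore content view in the GUI, or from a client:
proxmox-backup-client snapshot list --repository <user>@pbs@localhost:<datastore>

# then dump the per-vdev request statistics accumulated since import/boot:
zpool iostat -rv <poolname>

# or watch the pool live while the listing runs, one report every 5 seconds:
zpool iostat -v <poolname> 5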
I don't expect issues there anymore, as special vdevs have matured in the
meantime, but you never know; remember
https://github.com/openzfs/zfs/issues/8130 for example...
If that looks sane from a performance perspective, taking a closer look
at the PBS/indexer level would be good.
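If you want to time the listing itself, independent of the GUI, something
along these lines should work (a sketch; the token, datastore name and
placeholders are mine, and I'm assuming the usual admin/datastore snapshots
endpoint on port 8007):

time curl -ks \
  -H "Authorization: PBSAPIToken=<user>@pbs!<tokenid>:<secret>" \
  "https://localhost:8007/api2/json/admin/datastore/<datastore>/snapshots" >/dev/null

Comparing that wall-clock time cold (right after boot) and warm (second run)
should show how much of it is metadata I/O versus work in the listing code itself.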
regards
roland
On 10.03.23 at 10:09, Mark Schouten wrote:
> Hi all,
>
> any thought on this?
>
> —
> Mark Schouten, CTO
> Tuxis B.V.
> mark@tuxis.nl / +31 318 200208
>
>
> ------ Original Message ------
> From "Mark Schouten" <mark@tuxis.nl>
> To "Thomas Lamprecht" <t.lamprecht@proxmox.com>; "Proxmox Backup
> Server development discussion" <pbs-devel@lists.proxmox.com>
> Date 1/26/2023 9:03:24 AM
> Subject Re[2]: [pbs-devel] Slow overview of existing backups
>
>> Hi,
>>
>>>> PBS knows when something changed in terms of backups, and thus
>>>> when it’s time to update that index.
>>>>
>>>
>>> PBS is built such that the file system is the source of truth; one can,
>>> e.g., remove stuff there or use the manager CLI, and multiple PBS instances
>>> can also run in parallel, e.g., during an upgrade.
>>>
>>> So having a guaranteed in-sync cache is not as trivial as it might
>>> sound.
>>>
>>
>> You can also remove stuff from /var/lib/mysql/, but then you break
>> it. There is nothing wrong with demanding that your users don't touch
>> any files except via the tooling you provide. And the tooling you
>> provide can hint the service to rebuild the index. The same goes for
>> upgrades; you are in charge of them.
>>
>> We also need to regularly run garbage collection, which is a nice
>> moment to update my desired index and check whether it's actually correct.
>> On every backup run, delete, or verify, you can update and check the
>> index. Those are all moments when no user is actively waiting for it,
>> getting timeouts, refreshing screens, or dealing with other annoyances.
>>
>>>
>>>> I have the feeling that when you request an overview now, all
>>>> individual backups are checked, which seems suboptimal.
>>>
>>> We mostly walk the directory structure and read the (quite small)
>>> manifest
>>> files for some info like last verification, but we do not check the
>>> backup
>>> (data) itself.
>>>
>>> Note that using namespaces to separate many backups into multiple
>>> folders can help, as a listing then only needs to check the indices
>>> from that namespace.
>>>
>>> But what data amounts and backup counts/sizes are we talking about here?
>>
>> Server:
>> 2x Intel Silver 4114 (10 cores, 20 threads each)
>> 256GB RAM
>> A zpool consisting of:
>> - 17 three-way mirrors of 18TB Western Digital HC550s, SAS
>> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMes as special devices
>>
>> Datastores:
>> - 73 datastores
>> - Total of 240T Allocated data
>>
>> Datastore that triggered my question:
>> - 263 Groups
>> - 2325 Snapshots
>> - 60TB In use
>> - Dedup factor of 19.3
>>
>>> How many groups, how many snapshots (per group), how many disks in the backups?
>>>
>>> And what hardware is hosting that data (CPU, disk, memory)?
>>>
>>> How's the PSI looking during the listing? head /proc/pressure/*
>>
>> root@pbs003:/proc/pressure# head *
>> ==> cpu <==
>> some avg10=0.74 avg60=0.58 avg300=0.21 total=8570917611
>> full avg10=0.00 avg60=0.00 avg300=0.00 total=0
>>
>> ==> io <==
>> some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
>> full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422
>>
>> ==> memory <==
>> some avg10=0.00 avg60=0.00 avg300=0.00 total=67894436
>> full avg10=0.00 avg60=0.00 avg300=0.00 total=66761631
>>
>> Currently running 9 tasks:
>> - 3 Verifies
>> - 1 Backup
>> - 2 Sync jobs
>> - 2 GC runs
>> - 1 Reader
>>
>> —
>> Mark Schouten, CTO
>> Tuxis B.V.
>> mark@tuxis.nl / +31 318 200208
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel