From: Roland <devzero@web.de>
To: Mark Schouten <mark@tuxis.nl>,
Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pbs-devel] Slow overview of existing backups
Date: Fri, 10 Mar 2023 11:16:05 +0100
Message-ID: <205dd8f8-a0f6-8146-27c1-f53d9b98b838@web.de>
In-Reply-To: <emb66595c1-d8df-4996-903f-e39ebd7039c2@eb9993ff.com>
Hi,
>> Requesting the available backups from a PBS takes quite a long time.
>> Are there any plans to start implementing caching or an overall
>> index file for a datastore?
> There's already the host system's page cache that helps a lot, as long
> as there's enough memory to avoid displacing its content frequently.
> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMes as special devices
Ah ok, I see, you should have fast metadata access because of the special devices.
What about freshly booting your backup server and issuing
zpool iostat -rv
after listing the backups and observing the slowness?
With this we can get more insight into where the time is spent, whether it's
really all about metadata access, and whether things are working well from a
filesystem/metadata performance point of view.
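Roughly like this (just a sketch; the pool name, the repository string and the
exact client subcommand are assumptions, adjust them to your setup):

# after a fresh boot (cold page cache and ARC), trigger the slow listing,
# e.g. by opening the datastore content view in the GUI, or from a client:
proxmox-backup-client snapshot list --repository <user>@pbs@localhost:<datastore>

# then dump the per-vdev request statistics accumulated since import/boot:
zpool iostat -rv <poolname>

# or watch the pool live while the listing runs, one report every 5 seconds:
zpool iostat -v <poolname> 5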
I don't expect issues there anymore, as special vdevs have matured in the
meantime, but you never know; remember
https://github.com/openzfs/zfs/issues/8130 for example...
If that looks sane from a performance perspective, taking a closer look
at the PBS/indexer level would be good.
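If you want to time the listing itself, independent of the GUI, something
along these lines should work (a sketch; the token, datastore name and
placeholders are mine, and I'm assuming the usual admin/datastore snapshots
endpoint on port 8007):

time curl -ks \
  -H "Authorization: PBSAPIToken=<user>@pbs!<tokenid>:<secret>" \
  "https://localhost:8007/api2/json/admin/datastore/<datastore>/snapshots" >/dev/null

Comparing that wall-clock time cold (right after boot) and warm (second run)
should show how much of it is metadata I/O versus work in the listing code itself.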
regards
roland
On 10.03.23 at 10:09, Mark Schouten wrote:
> Hi all,
>
> any thought on this?
>
> —
> Mark Schouten, CTO
> Tuxis B.V.
> mark@tuxis.nl / +31 318 200208
>
>
> ------ Original Message ------
> From "Mark Schouten" <mark@tuxis.nl>
> To "Thomas Lamprecht" <t.lamprecht@proxmox.com>; "Proxmox Backup
> Server development discussion" <pbs-devel@lists.proxmox.com>
> Date 1/26/2023 9:03:24 AM
> Subject Re[2]: [pbs-devel] Slow overview of existing backups
>
>> Hi,
>>
>>>> PBS knows when something changed in terms of backups, and thus
>>>> when it’s time to update that index.
>>>>
>>>
>>> PBS is built such that the file system is the source of truth; one can,
>>> e.g., remove stuff there or use the manager CLI, and multiple PBS instances
>>> can also run in parallel, e.g., during an upgrade.
>>>
>>> So having a guaranteed in-sync cache is not as trivial as it might
>>> sound.
>>>
>>
>> You can also remove stuff from /var/lib/mysql/, but then you break
>> it. There is nothing wrong with demanding that your users don't touch
>> any files except via the tooling you provide. And the tooling you
>> provide can hint the service to rebuild the index. The same goes for
>> upgrades; you are in charge of them.
>>
>> We also need to regularly run garbage collection, which is a nice
>> moment to update my desired index and check whether it's actually correct.
>> On every backup run, delete, or verify, you can update and check the
>> index. Those are all moments when no user is actively waiting for it,
>> getting timeouts, refreshing screens, or dealing with other annoyances.
>>
>>>
>>>> I have the feeling that when you request an overview now, all
>>>> individual backups are checked, which seems suboptimal.
>>>
>>> We mostly walk the directory structure and read the (quite small)
>>> manifest
>>> files for some info like last verification, but we do not check the
>>> backup
>>> (data) itself.
>>>
>>> Note that using namespaces to separate many backups into multiple
>>> folders can help, as a listing then only needs to check the indices
>>> from that namespace.
>>>
>>> But what data amounts and backup counts/sizes are we talking about here?
>>
>> Server:
>> 2x Intel Silver 4114 (10 cores, 20 threads each)
>> 256GB RAM
>> A zpool consisting of:
>> - 17 three-way mirrors of 18TB Western Digital HC550s, SAS
>> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMes as special devices
>>
>> Datastores:
>> - 73 datastores
>> - Total of 240T Allocated data
>>
>> Datastore that triggered my question:
>> - 263 Groups
>> - 2325 Snapshots
>> - 60TB In use
>> - Dedup factor of 19.3
>>
>>> How many groups, how many snapshots (per group), how many disks in the backups?
>>>
>>> And what hardware is hosting that data (CPU, disk, memory)?
>>>
>>> How's the PSI looking during the listing? head /proc/pressure/*
>>
>> root@pbs003:/proc/pressure# head *
>> ==> cpu <==
>> some avg10=0.74 avg60=0.58 avg300=0.21 total=8570917611
>> full avg10=0.00 avg60=0.00 avg300=0.00 total=0
>>
>> ==> io <==
>> some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
>> full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422
>>
>> ==> memory <==
>> some avg10=0.00 avg60=0.00 avg300=0.00 total=67894436
>> full avg10=0.00 avg60=0.00 avg300=0.00 total=66761631
>>
>> Currently running 9 tasks:
>> - 3 Verifies
>> - 1 Backup
>> - 2 Sync jobs
>> - 2 GC runs
>> - 1 Reader
>>
>> —
>> Mark Schouten, CTO
>> Tuxis B.V.
>> mark@tuxis.nl / +31 318 200208
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel