public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] Slow overview of existing backups
@ 2023-01-25 10:26 Mark Schouten
  2023-01-25 16:08 ` Thomas Lamprecht
  2023-03-10 10:02 ` Roland
  0 siblings, 2 replies; 8+ messages in thread
From: Mark Schouten @ 2023-01-25 10:26 UTC (permalink / raw)
  To: pbs-devel

Hi,

Requesting the available backups from a PBS takes quite a long time. Are
there any plans to start implementing caching or an overall index file
for a datastore? PBS knows when something changed in terms of backups,
and thus when it’s time to update that index.

I have the feeling that when you request an overview now, all individual
backups are checked, which seems suboptimal.

Thanks,

—
Mark Schouten, CTO
Tuxis B.V.
mark@tuxis.nl / +31 318 200208






* Re: [pbs-devel] Slow overview of existing backups
  2023-01-25 10:26 [pbs-devel] Slow overview of existing backups Mark Schouten
@ 2023-01-25 16:08 ` Thomas Lamprecht
  2023-01-26  8:03   ` Mark Schouten
  2023-03-10 10:02 ` Roland
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Lamprecht @ 2023-01-25 16:08 UTC (permalink / raw)
  To: Mark Schouten, Proxmox Backup Server development discussion

Hi,

On 25/01/2023 at 11:26, Mark Schouten wrote:
> Requesting the available backups from a PBS takes quite a long time.
> Are there any plans to start implementing caching or an overall index file for a datastore?

There's already the host system's page cache, which helps a lot as long
as there's enough memory to avoid displacing its contents frequently.

> PBS knows when something changed in terms of backups, and thus when it’s time to update that index.
> 

PBS is built such that the file system is the source of truth: one can,
e.g., remove data there directly or use the manager CLI, and multiple PBS
instances can also run in parallel, e.g., during an upgrade.

So having a guaranteed in-sync cache is not as trivial as it might sound.

> I have the feeling that when you request an overview now, all individual backups are checked, which seems suboptimal.

We mostly walk the directory structure and read the (quite small) manifest
files for some info, like the last verification, but we do not check the
backup (data) itself.
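
For illustration, a rough sketch in Rust of what such a listing then
amounts to. The directory layout and the manifest file name ("index.json")
are assumptions for the example, not the actual pbs-datastore
implementation:

use std::fs;
use std::io;
use std::path::Path;

// Walk <datastore>/<backup-type>/<backup-id>/<timestamp>/ and read each
// snapshot's manifest. Layout and file name are assumed for illustration.
fn list_snapshots(datastore: &Path) -> io::Result<Vec<(String, usize)>> {
    let mut out = Vec::new();
    for ty in fs::read_dir(datastore)? {
        let ty = ty?.path();
        if !ty.is_dir() { continue; }
        for group in fs::read_dir(&ty)? {
            let group = group?.path();
            if !group.is_dir() { continue; }
            for snap in fs::read_dir(&group)? {
                let snap = snap?.path();
                // One small read per snapshot: cheap individually, but
                // thousands of random metadata lookups add up when the
                // cache is cold.
                if let Ok(data) = fs::read(snap.join("index.json")) {
                    out.push((snap.display().to_string(), data.len()));
                }
            }
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    for (snap, len) in list_snapshots(Path::new("/path/to/datastore"))? {
        println!("{snap}: manifest is {len} bytes");
    }
    Ok(())
}

Even though each read is tiny, a cold cache turns a listing of thousands
of snapshots into thousands of random I/Os.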

Note that using namespaces to separate many backups into multiple folders
can help, as a listing then only needs to check the indices from that
namespace.

But what data and backup amounts are we talking about here?
How many groups, how many snapshots (per group), how many disks per backup?

And what hardware is hosting that data (CPU, disk, memory)?

How's PSI looking during a listing? head /proc/pressure/*
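
As an aside, the PSI files are easy to read programmatically as well.
A minimal sketch relying only on the documented /proc/pressure format,
where avg10/avg60/avg300 give the percentage of wall-clock time stalled
over the trailing 10/60/300 seconds:

use std::fs;

// Print the "some" and "full" I/O stall percentages from PSI.
// "some": at least one task was stalled on I/O.
// "full": all non-idle tasks were stalled at once.
fn main() -> std::io::Result<()> {
    let psi = fs::read_to_string("/proc/pressure/io")?;
    for line in psi.lines() {
        let mut fields = line.split_whitespace();
        let kind = fields.next().unwrap_or("?");
        if let Some(avg10) = fields.find_map(|f| f.strip_prefix("avg10=")) {
            println!("io {kind}: stalled {avg10}% of the time (10s avg)");
        }
    }
    Ok(())
}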





* Re: [pbs-devel] Slow overview of existing backups
  2023-01-25 16:08 ` Thomas Lamprecht
@ 2023-01-26  8:03   ` Mark Schouten
  2023-03-10  9:09     ` Mark Schouten
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Schouten @ 2023-01-26  8:03 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

Hi,

>> PBS knows when something changed in terms of backups, and thus when it’s time to update that index.
>
> PBS is built such that the file system is the source of truth: one can,
> e.g., remove data there directly or use the manager CLI, and multiple PBS
> instances can also run in parallel, e.g., during an upgrade.
>
> So having a guaranteed in-sync cache is not as trivial as it might sound.

You can also remove stuff from /var/lib/mysql/, but then you break it.
There is nothing wrong with requiring your users not to touch any files
except via the tooling you provide. And that tooling can hint the service
to rebuild the index. The same goes for upgrades: you are in charge of
them.

We also need to run garbage collection regularly, which is a nice moment
to update my desired index and check that it’s actually correct. On every
backup run, delete, and verify you can update and check the index. Those
are all moments when no user is actively waiting on the result, unlike a
listing, where slowness means timeouts, refreshed screens, and other
annoyances. See the sketch below.
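
To sketch the idea (hypothetical types and names, not how PBS is
structured today): a listing cache that every mutating code path keeps
current, so a read becomes a map lookup instead of a directory walk:

use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical snapshot summary, as it would appear in a listing.
#[derive(Clone)]
struct SnapshotInfo {
    group: String,
    time: i64,
    verified: bool,
}

// Cached per-datastore listings, kept current from the write paths
// (backup, prune, verify, GC), where the user is already waiting anyway.
struct ListingCache {
    entries: Mutex<HashMap<String, Vec<SnapshotInfo>>>,
}

impl ListingCache {
    fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    // Hook this into every operation that creates a snapshot.
    fn record_backup(&self, datastore: &str, snap: SnapshotInfo) {
        self.entries
            .lock()
            .unwrap()
            .entry(datastore.to_string())
            .or_default()
            .push(snap);
    }

    // The read path: a lookup instead of a full directory walk.
    fn list(&self, datastore: &str) -> Vec<SnapshotInfo> {
        self.entries
            .lock()
            .unwrap()
            .get(datastore)
            .cloned()
            .unwrap_or_default()
    }
}

fn main() {
    let cache = ListingCache::new();
    cache.record_backup("store1", SnapshotInfo {
        group: "vm/100".to_string(),
        time: 1_674_642_360,
        verified: false,
    });
    for s in cache.list("store1") {
        println!("{} @ {} (verified: {})", s.group, s.time, s.verified);
    }
}

The trade-off is, as you say, that anything modifying the datastore
outside these code paths would silently desync such a cache.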

>> I have the feeling that when you request an overview now, all individual backups are checked, which seems suboptimal.
>
> We mostly walk the directory structure and read the (quite small) manifest
> files for some info, like the last verification, but we do not check the
> backup (data) itself.
>
> Note that using namespaces to separate many backups into multiple folders
> can help, as a listing then only needs to check the indices from that
> namespace.
>
> But what data and backup amounts are we talking about here?

Server:
2x Intel Xeon Silver 4114 (10 cores / 20 threads each)
256GB RAM
A zpool consisting of:
- 17 three-way mirrors of 18TB Western Digital HC550s (SAS)
- 2 three-way mirrors of 960GB Samsung PM9A3 NVMe drives as special devices

Datastores:
- 73 datastores
- 240TB of allocated data in total

Datastore that triggered my question:
- 263 groups
- 2325 snapshots
- 60TB in use
- Dedup factor of 19.3

> How many groups, how many snapshots (per group), how many disks per backup?
>
> And what hardware is hosting that data (CPU, disk, memory)?
>
> How's PSI looking during a listing? head /proc/pressure/*

root@pbs003:/proc/pressure# head *
==> cpu <==
some avg10=0.74 avg60=0.58 avg300=0.21 total=8570917611
full avg10=0.00 avg60=0.00 avg300=0.00 total=0

==> io <==
some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422

==> memory <==
some avg10=0.00 avg60=0.00 avg300=0.00 total=67894436
full avg10=0.00 avg60=0.00 avg300=0.00 total=66761631

Currently running 9 tasks:
- 3 Verifies
- 1 Backup
- 2 Sync jobs
- 2 GC runs
- 1 Reader

—
Mark Schouten, CTO
Tuxis B.V.
mark@tuxis.nl / +31 318 200208





* Re: [pbs-devel] Slow overview of existing backups
  2023-01-26  8:03   ` Mark Schouten
@ 2023-03-10  9:09     ` Mark Schouten
  2023-03-10 10:16       ` Roland
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Schouten @ 2023-03-10  9:09 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

Hi all,

Any thoughts on this?

—
Mark Schouten, CTO
Tuxis B.V.
mark@tuxis.nl / +31 318 200208



* Re: [pbs-devel] Slow overview of existing backups
  2023-01-25 10:26 [pbs-devel] Slow overview of existing backups Mark Schouten
  2023-01-25 16:08 ` Thomas Lamprecht
@ 2023-03-10 10:02 ` Roland
  1 sibling, 0 replies; 8+ messages in thread
From: Roland @ 2023-03-10 10:02 UTC (permalink / raw)
  To: Mark Schouten, Proxmox Backup Server development discussion

Hello,

Yes, this is really slow. I have even observed timeouts (the backup list
simply won't appear, without any notice) when querying PBS while another
node is running a backup.

> I have the feeling that when you request an overview now, all
> individual backups are checked, which seems suboptimal.

Yes, but did you know that ZFS metadata caching sucks? That may also be
one reason for the slowness.

See https://github.com/openzfs/zfs/issues/12028

There is improvement on the way:

https://github.com/openzfs/zfs/pull/14359

Regards,
Roland


* Re: [pbs-devel] Slow overview of existing backups
  2023-03-10  9:09     ` Mark Schouten
@ 2023-03-10 10:16       ` Roland
  2023-03-10 10:52         ` Mark Schouten
  0 siblings, 1 reply; 8+ messages in thread
From: Roland @ 2023-03-10 10:16 UTC (permalink / raw)
  To: Mark Schouten, Proxmox Backup Server development discussion,
	Thomas Lamprecht

Hi,

>> Requesting the available backups from a PBS takes quite a long time.
>> Are there any plans to start implementing caching or an overall
>> index file for a datastore?

> There's already the host system's page cache, which helps a lot as long
> as there's enough memory to avoid displacing its contents frequently.

> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMe drives as special devices

Ah OK, I see, you should have fast metadata access because of the special
devices.

What about freshly booting your backup server and issuing

zpool iostat -rv

after listing backups and observing the slowness?

With this we can get more insight into where the time is spent, whether
it's really all about metadata access, and whether things are working well
from a filesystem/metadata performance point of view.

I don't expect issues there anymore, as the special vdev feature has
matured in the meantime, but you never know. Remember
https://github.com/openzfs/zfs/issues/8130, for example...

If that looks sane from a performance perspective, taking a closer look
at the PBS/indexer level would be good.

Regards,
Roland


* Re: [pbs-devel] Slow overview of existing backups
  2023-03-10 10:16       ` Roland
@ 2023-03-10 10:52         ` Mark Schouten
  2023-03-13 12:48           ` Mark Schouten
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Schouten @ 2023-03-10 10:52 UTC (permalink / raw)
  To: Roland, Proxmox Backup Server development discussion, Thomas Lamprecht

Hi,

>> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMe drives as special devices
>
> Ah OK, I see, you should have fast metadata access because of the special
> devices.

That’s what I would expect, yes. But maybe I can try setting primarycache
to metadata; I’m not sure whether that would have a negative impact on
prefetching, though.

> What about freshly booting your backup server and issuing
>
> zpool iostat -rv
>
> after listing backups and observing the slowness?

There are too many clusters connected to this machine to make it
’silent’ for a while in order to debug this.

—
Mark Schouten, CTO
Tuxis B.V.
mark@tuxis.nl / +31 318 200208





* Re: [pbs-devel] Slow overview of existing backups
  2023-03-10 10:52         ` Mark Schouten
@ 2023-03-13 12:48           ` Mark Schouten
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Schouten @ 2023-03-13 12:48 UTC (permalink / raw)
  To: Roland, Proxmox Backup Server development discussion, Thomas Lamprecht

So, setting primarycache to metadata does not help; it may even make the
overview worse.

Requesting one specific customer takes 55 seconds. This is a datastore
with 334 groups and 2991 snapshots (about 53TB of total data in this
datastore), so the listing works out to roughly 18ms per snapshot.

—
Mark Schouten, CTO
Tuxis B.V.
mark@tuxis.nl / +31 318 200208



end of thread

Thread overview: 8+ messages
2023-01-25 10:26 [pbs-devel] Slow overview of existing backups Mark Schouten
2023-01-25 16:08 ` Thomas Lamprecht
2023-01-26  8:03   ` Mark Schouten
2023-03-10  9:09     ` Mark Schouten
2023-03-10 10:16       ` Roland
2023-03-10 10:52         ` Mark Schouten
2023-03-13 12:48           ` Mark Schouten
2023-03-10 10:02 ` Roland
