public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v2 proxmox-backup 0/4] GC: avoid multiple atime updates
Date: Mon, 10 Mar 2025 12:16:30 +0100
Message-ID: <20250310111634.162156-1-c.ebner@proxmox.com>

These patches greatly improve the performance of phase 1 garbage
collection by avoiding multiple atime updates on the same chunk.

Currently, phase 1 GC iterates over all folders in the datastore,
looking for and collecting all image index files without making any
assumptions about the logical layout (e.g. namespaces, groups,
snapshots, ...). This is to avoid accidentally missing image index
files located in unexpected paths and therefore not marking their
chunks as in use, which could lead to data loss.

These patches improve phase 1 by iterating over the index files using the
datastore's iterators and keeping track of the chunks already touched for
consecutive backup snapshots, following the same principle as incremental
backup snapshots. Paths outside of the iterator logic are still taken into
account: a list of all index files found is generated first, index files
encountered while iterating are removed from it, and what remains is a
list of indexes with unexpected paths, which are then processed as well.
These unexpected paths are now also logged, so the user can take action
if needed.
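
The bookkeeping for unexpected paths can be sketched roughly as follows.
This is only an illustration of the "list everything, remove what the
iterators reach" idea, not the code from the patches; the function name
`unexpected_indexes` and its parameters are made up for this example.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical helper: returns the index files that were found on disk
/// but were not reached through the logical (iterator-based) traversal.
fn unexpected_indexes(
    all_found: Vec<PathBuf>,
    reachable: impl IntoIterator<Item = PathBuf>,
) -> Vec<PathBuf> {
    // Start with every index file found by the plain directory walk.
    let mut remaining: HashSet<PathBuf> = all_found.into_iter().collect();

    for path in reachable {
        // The real code would mark the chunks of this index here.
        // For the sketch, only drop the path from the set.
        remaining.remove(&path);
    }

    // Whatever is left lives in an unexpected location. Sort the result
    // so that the log output is deterministic.
    let mut left: Vec<PathBuf> = remaining.into_iter().collect();
    left.sort();
    left
}
```

The returned paths would then still be processed and logged, so no index
file is skipped just because it sits outside the expected hierarchy.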

By keeping track of the chunks whose atime has already been updated, the
atime is no longer updated over and over again on the chunks shared by
consecutive backup snapshots.
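
A minimal sketch of that idea, under simplifying assumptions: a single
set of digests is kept for the whole run, whereas the patches may scope
the remembered chunks differently (e.g. per group of consecutive
snapshots). `mark_chunks` and the `touch` callback are hypothetical names
standing in for the actual atime update.

```rust
use std::collections::HashSet;

/// Chunk digests are SHA-256 hashes, i.e. 32 bytes.
type Digest = [u8; 32];

/// Hypothetical helper: walks the digests of all given indexes and calls
/// `touch` at most once per distinct digest. Returns the number of calls.
fn mark_chunks<F: FnMut(&Digest)>(indexes: &[Vec<Digest>], mut touch: F) -> usize {
    let mut touched: HashSet<Digest> = HashSet::new();
    let mut count = 0;

    for index in indexes {
        for digest in index {
            // `insert` returns false if the digest was already present,
            // in which case the atime update is skipped.
            if touched.insert(*digest) {
                touch(digest);
                count += 1;
            }
        }
    }

    count
}
```

With three indexes that share most of their chunks, `touch` is invoked
only once per distinct digest, which is where the reduction in utimensat
calls comes from.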

Most notable changes since version 1 (thanks Fabian for feedback):
- Logically iterate using pre-existing iterators instead of constructing
  a data structure for iteration when listing images.
- Tested that double listing does not affect runtime.
- Chunks are now remembered for all archives per snapshot, not just a
  single archive per snapshot as previously. This mimics the backup
  behaviour more closely and gives some additional gains in some cases.

Statistics generated by averaging 3 GC runtimes, measured after an initial
run each to warm up caches. Datastores A and B (192 index files) are unrelated,
containing "real" backups. The syscall counts were generated using
`strace -f -e utimensat -p $(pidof proxmox-backup-proxy)` and (after small
cleanup) `wc -l`.

datastore A on spinning disk:
unpatched: 117 ± 4 s,    utimensat calls: 12059913
version 1: 27.6 ± 0.5 s, utimensat calls:  1178913
version 2: 24.3 ± 0.5 s, utimensat calls:  1120317

datastore B on SSD:
unpatched: 27 ± 1 s,     utimensat calls: 2032380
version 1: 14.3 ± 0.5 s, utimensat calls:  565417
version 2: 15.1 ± 0.2 s, utimensat calls:  564617

datastore B via NFS export:
unpatched: aborted after 10 min - no progress (while other versions did)
version 1: 34 min 3 s
version 2: 32 min 19 s
The above results are from a single run each, since GC takes a long time
on the NFS-shared datastore.

Christian Ebner (4):
  datastore: restrict datastores list_images method scope to module
  datastore: add helper method to open index reader from path
  garbage collection: allow to keep track of already touched chunks
  fix #5331: garbage collection: avoid multiple chunk atime updates

 pbs-datastore/src/datastore.rs | 206 ++++++++++++++++++++++++---------
 1 file changed, 154 insertions(+), 52 deletions(-)

-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
