public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates
Date: Wed, 26 Mar 2025 11:03:28 +0100
Message-ID: <20250326100333.116722-1-c.ebner@proxmox.com>

These patches implement the logic to greatly improve the performance
of phase 1 garbage collection by avoiding multiple atime updates on
the same chunk.

Currently, phase 1 GC iterates over all folders in the datastore,
looking for and collecting all image index files without making any
logical assumptions (e.g. namespaces, groups, snapshots, ...). This
is to avoid accidentally missing image index files located in
unexpected paths and therefore not marking their chunks as in use,
which could lead to data loss.

These patches improve phase 1 by:
- Iterating over index files using the datastore's iterators to detect
  regular index files. Paths outside of the iterator logic are still taken
  into account and processed: a list of all found index files is generated
  first, each index file encountered while iterating is removed from it,
  and what finally remains is a list of indexes with unexpected paths.
  These unexpected paths are now also logged, so the user can take action.
- Keeping track of recently touched chunks by storing their digests in an
  LRU cache, skipping over expensive atime updates for chunks already
  present in the cache.
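
To illustrate the two points above, here is a minimal, std-only sketch. It
is not the code from this series: `RecentChunks`, `mark_chunks` and
`unexpected_index_files` are hypothetical names, the `touch` closure stands
in for the real atime update, and the cache is a simplified stand-in for the
LRU cache in pbs-tools. Like patch 1/5, its `insert` tells whether the entry
was already present.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};
use std::path::PathBuf;

/// Hypothetical bounded set of recently touched chunk digests (LRU eviction).
pub struct RecentChunks {
    capacity: usize,
    tick: u64,
    by_digest: HashMap<[u8; 32], u64>, // digest -> tick of last use
    by_age: BTreeMap<u64, [u8; 32]>,   // tick of last use -> digest
}

impl RecentChunks {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, tick: 0, by_digest: HashMap::new(), by_age: BTreeMap::new() }
    }

    /// Marks `digest` as most recently used.
    /// Returns true if it was already cached, false if newly inserted.
    pub fn insert(&mut self, digest: [u8; 32]) -> bool {
        self.tick += 1;
        let was_present = match self.by_digest.insert(digest, self.tick) {
            Some(old_tick) => {
                self.by_age.remove(&old_tick);
                true
            }
            None => false,
        };
        self.by_age.insert(self.tick, digest);
        // Evict the least recently used digest once over capacity.
        if self.by_digest.len() > self.capacity {
            if let Some((_, oldest)) = self.by_age.pop_first() {
                self.by_digest.remove(&oldest);
            }
        }
        was_present
    }
}

/// Calls `touch` only for digests not found in the cache and
/// returns how many chunks were actually touched.
pub fn mark_chunks(
    digests: &[[u8; 32]],
    cache: &mut RecentChunks,
    mut touch: impl FnMut(&[u8; 32]),
) -> usize {
    let mut touched = 0;
    for digest in digests {
        if cache.insert(*digest) {
            continue; // recently touched already, skip the atime update
        }
        touch(digest);
        touched += 1;
    }
    touched
}

/// Removes every index file seen via the regular iterators from the full
/// listing; what remains are index files located in unexpected paths.
pub fn unexpected_index_files(
    all_found: HashSet<PathBuf>,
    seen_by_iterators: &[PathBuf],
) -> HashSet<PathBuf> {
    let mut remaining = all_found;
    for path in seen_by_iterators {
        remaining.remove(path);
    }
    remaining
}
```

In this sketch the cache is bounded, so a chunk evicted from it is simply
touched again the next time an index references it: the cache only saves
work and never causes an atime update to be skipped for a chunk that has
not been touched during the run.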

Most notable changes since version 4 (thanks Thomas for feedback):
- Added basic benchmark results to the respective commit messages
- Extended reasoning in commit messages
- Adapted variable name, fixed formatting issue

Most notable changes since version 3 (thanks Wolfgang for feedback):
- Use `with_context` over `context` to avoid possibly unnecessary allocation
- Align terminology with docs and rest of the codebase by using index
  file instead of image in method and variable names.

Most notable changes since version 2 (thanks Fabian for feedback):
- Use LRU cache instead of keeping track of chunks from the previous
  snapshot in the group.
- Split patches to logically separate iteration from caching logic
- Adapt for better anyhow context error propagation and formatting

Most notable changes since version 1 (thanks Fabian for feedback):
- Logically iterate using pre-existing iterators instead of constructing
  data structure for iteration when listing images.
- Tested that double listing does not affect runtime.
- Chunks are now remembered for all archives per snapshot, not just a
  single archive per snapshot as previously. This mimics the backup
  behaviour more closely and gives additional gains in some cases.

Christian Ebner (5):
  tools: lru cache: tell if node was already present or newly inserted
  garbage collection: format error including anyhow error context
  datastore: add helper method to open index reader from path
  garbage collection: generate index file list via datastore iterators
  fix #5331: garbage collection: avoid multiple chunk atime updates

 pbs-datastore/src/datastore.rs  | 179 ++++++++++++++++++++++----------
 pbs-tools/src/lru_cache.rs      |   4 +-
 src/api2/admin/datastore.rs     |   6 +-
 src/bin/proxmox-backup-proxy.rs |   2 +-
 4 files changed, 131 insertions(+), 60 deletions(-)

-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



Thread overview: 16+ messages
2025-03-26 10:03 Christian Ebner [this message]
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 1/5] tools: lru cache: tell if node was already present or newly inserted Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 2/5] garbage collection: format error including anyhow error context Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 3/5] datastore: add helper method to open index reader from path Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators Christian Ebner
2025-04-02 17:26   ` Thomas Lamprecht
2025-04-02 19:39     ` Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
2025-04-02 15:57   ` Thomas Lamprecht
2025-04-02 19:50     ` Christian Ebner
2025-04-02 19:54       ` Thomas Lamprecht
2025-04-02 17:45 ` [pbs-devel] applied-series: [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple " Thomas Lamprecht
2025-04-03  9:55   ` Christian Ebner
2025-04-03 10:17     ` Thomas Lamprecht
2025-04-03 10:24       ` Christian Ebner
2025-04-03 10:30         ` Thomas Lamprecht
