From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH v8 proxmox-backup 48/69] pxar: caching: add look-ahead cache
Date: Tue, 04 Jun 2024 11:35:15 +0200
Message-ID: <1717493343.bps2geb0tc.astroid@yuna.none>
In-Reply-To: <20240528094303.309806-49-c.ebner@proxmox.com>

On May 28, 2024 11:42 am, Christian Ebner wrote:
> Add a lookahead cache and the necessary types to store the required
> data and keep track of directory boundaries while traversing the
> filesystem tree, in order to postpone the decision whether to reuse or
> re-encode a given regular file with unchanged metadata.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 7:
> - no changes
> 
> changes since version 6:
> - add PxarLookaheadCache and refactor some of the logic to be contained
>   within this patch
> 
>  pbs-client/src/pxar/create.rs           |   2 +-
>  pbs-client/src/pxar/look_ahead_cache.rs | 165 ++++++++++++++++++++++++
>  pbs-client/src/pxar/mod.rs              |   1 +
>  3 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
> 
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index ac8827bb2..6127aa88f 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -131,7 +131,7 @@ impl fmt::Display for ArchiveError {
>  }
>  
>  #[derive(Eq, PartialEq, Hash)]
> -struct HardLinkInfo {
> +pub(crate) struct HardLinkInfo {
>      st_dev: u64,
>      st_ino: u64,
>  }
> diff --git a/pbs-client/src/pxar/look_ahead_cache.rs b/pbs-client/src/pxar/look_ahead_cache.rs
> new file mode 100644
> index 000000000..539586271
> --- /dev/null
> +++ b/pbs-client/src/pxar/look_ahead_cache.rs
> @@ -0,0 +1,165 @@
> +use std::collections::HashSet;
> +use std::ffi::CString;
> +use std::ops::Range;
> +use std::os::unix::io::OwnedFd;
> +use std::path::PathBuf;
> +
> +use nix::sys::stat::FileStat;
> +
> +use pxar::encoder::PayloadOffset;
> +use pxar::Metadata;
> +
> +use super::create::*;
> +
> +const DEFAULT_CACHE_SIZE: usize = 512;
> +
> +pub(crate) struct CacheEntryData {
> +    pub(crate) fd: OwnedFd,
> +    pub(crate) c_file_name: CString,
> +    pub(crate) stat: FileStat,
> +    pub(crate) metadata: Metadata,
> +    pub(crate) payload_offset: PayloadOffset,
> +}
> +
> +pub(crate) enum CacheEntry {
> +    RegEntry(CacheEntryData),
> +    DirEntry(CacheEntryData),
> +    DirEnd,
> +}
> +
> +pub(crate) struct PxarLookaheadCache {
> +    // Current state of the cache
> +    enabled: bool,
> +    // Cached entries
> +    entries: Vec<CacheEntry>,
> +    // Entries encountered having more than one link given by stat
> +    hardlinks: HashSet<HardLinkInfo>,
> +    // Payload range covered by the currently cached entries
> +    range: Range<u64>,
> +    // Possibly held-back last chunk from the last flush, used for possible chunk continuation
> +    last_chunk: Option<ReusableDynamicEntry>,
> +    // Path when started caching
> +    start_path: PathBuf,
> +    // Number of entries with file descriptors
> +    fd_entries: usize,
> +    // Max number of entries with file descriptors
> +    cache_size: usize,
> +}
> +
> +impl PxarLookaheadCache {
> +    pub(crate) fn new(size: Option<usize>) -> Self {
> +        Self {
> +            enabled: false,
> +            entries: Vec::new(),
> +            hardlinks: HashSet::new(),
> +            range: 0..0,
> +            last_chunk: None,
> +            start_path: PathBuf::new(),
> +            fd_entries: 0,
> +            cache_size: size.unwrap_or(DEFAULT_CACHE_SIZE),
> +        }
> +    }
> +
> +    pub(crate) fn is_full(&self) -> bool {
> +        self.fd_entries >= self.cache_size
> +    }
> +
> +    pub(crate) fn caching_enabled(&self) -> bool {
> +        self.enabled
> +    }
> +
> +    pub(crate) fn insert(

2 out of 3 calls to this are preceded by the same call to
update_start_path. We could just add the path as a parameter here,
inline that call, and drop update_start_path altogether AFAICT?
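
A rough sketch of that suggestion, using simplified stand-in types: each
entry carries only a file name String instead of the real CacheEntryData
(OwnedFd, FileStat, pxar Metadata, PayloadOffset), and LookaheadCache is a
hypothetical name. It also assumes the start path should only be recorded
when caching begins, which is a guess about what the third call site needs:

```rust
use std::path::PathBuf;

// Stand-in for CacheEntry; the real variants wrap CacheEntryData.
#[derive(Debug, PartialEq)]
enum CacheEntry {
    RegEntry(String),
    DirEntry(String),
    DirEnd,
}

#[derive(Default)]
struct LookaheadCache {
    enabled: bool,
    entries: Vec<CacheEntry>,
    start_path: PathBuf,
    fd_entries: usize,
}

impl LookaheadCache {
    // The path is passed in directly instead of via a separate
    // update_start_path() call before insert().
    // Assumption: only the first insert after a reset sets the start path,
    // later inserts keep it unchanged.
    fn insert(&mut self, file_name: &str, is_dir: bool, path: PathBuf) {
        if !self.enabled {
            self.start_path = path;
            self.enabled = true;
        }
        self.fd_entries += 1;
        let name = file_name.to_string();
        self.entries.push(if is_dir {
            CacheEntry::DirEntry(name)
        } else {
            CacheEntry::RegEntry(name)
        });
    }
}
```

If the third call site must never touch the start path, an
Option<PathBuf> parameter would be the alternative.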

> +        &mut self,
> +        fd: OwnedFd,
> +        c_file_name: CString,
> +        stat: FileStat,
> +        metadata: Metadata,
> +        payload_offset: PayloadOffset,
> +    ) {
> +        self.enabled = true;
> +        self.fd_entries += 1;
> +        if metadata.is_dir() {
> +            self.entries.push(CacheEntry::DirEntry(CacheEntryData {
> +                fd,
> +                c_file_name,
> +                stat,
> +                metadata,
> +                payload_offset,
> +            }))
> +        } else {
> +            self.entries.push(CacheEntry::RegEntry(CacheEntryData {
> +                fd,
> +                c_file_name,
> +                stat,
> +                metadata,
> +                payload_offset,
> +            }))
> +        }
> +    }
> +
> +    pub(crate) fn insert_dir_end(&mut self) {
> +        self.entries.push(CacheEntry::DirEnd);
> +    }
> +
> +    pub(crate) fn take_and_reset(&mut self) -> Vec<CacheEntry> {
> +        self.fd_entries = 0;
> +        self.enabled = false;
> +        self.start_path.clear();

start_path is cleared here, and take_and_reset is called

> +        self.clear_range();
> +        std::mem::take(&mut self.entries)
> +    }
> +
> +    pub(crate) fn update_start_path(&mut self, path: PathBuf) {
> +        self.start_path = path;
> +    }
> +
> +    pub(crate) fn start_path(&self) -> &PathBuf {
> +        &self.start_path

right after the only call to this.

So take_and_reset could just take the path as well and return it
together with the entries, and we can drop this getter here?
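
Roughly, with the same kind of simplified stand-in types (a String per
entry instead of CacheEntryData, hypothetical LookaheadCache name, range
handling left out), std::mem::take can move the path out and leave an
empty PathBuf behind, which also replaces the explicit start_path.clear():

```rust
use std::path::PathBuf;

// Stand-in for CacheEntry; the real variants wrap CacheEntryData.
#[derive(Debug, PartialEq)]
enum CacheEntry {
    RegEntry(String),
    DirEntry(String),
    DirEnd,
}

#[derive(Default)]
struct LookaheadCache {
    enabled: bool,
    entries: Vec<CacheEntry>,
    start_path: PathBuf,
    fd_entries: usize,
}

impl LookaheadCache {
    // Hands out the cached entries together with the path caching started
    // at, so no separate start_path() getter is needed. mem::take leaves
    // empty defaults (Vec::new(), PathBuf::new()) in their place.
    fn take_and_reset(&mut self) -> (Vec<CacheEntry>, PathBuf) {
        self.fd_entries = 0;
        self.enabled = false;
        (
            std::mem::take(&mut self.entries),
            std::mem::take(&mut self.start_path),
        )
    }
}
```

A call site would then read `let (entries, start_path) =
cache.take_and_reset();` instead of fetching the path beforehand.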

> +    }
> +
> +    pub(crate) fn contains_hardlink(&self, info: &HardLinkInfo) -> bool {
> +        self.hardlinks.contains(info)
> +    }
> +
> +    pub(crate) fn insert_hardlink(&mut self, info: HardLinkInfo) -> bool {
> +        self.hardlinks.insert(info)
> +    }
> +
> +    pub(crate) fn range(&self) -> &Range<u64> {
> +        &self.range
> +    }
> +
> +    pub(crate) fn update_range(&mut self, range: Range<u64>) {
> +        self.range = range;
> +    }
> +
> +    pub(crate) fn clear_range(&mut self) {
> +        // keep end for possible continuation if cache has been cleared because
> +        // it was full, but further caching would be fine
> +        self.range = self.range.end..self.range.end
> +    }

Dangerous name: to me, "clear" always implies removing everything,
especially since there is no doc comment on it that conveys such
important information at the call site.

But thankfully this is only called once, and that call is a few
lines above in take_and_reset, so maybe we can just inline it for now
and not expose this to accidents?
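
Inlined, with the rationale kept as a comment at the only place it
matters, this could look roughly like the sketch below. Only the range
bookkeeping is modelled (entries, hardlinks and the start path are left
out, and LookaheadCache is a hypothetical name); try_extend_range is
copied from the patch so the continuation behaviour can be checked:

```rust
use std::ops::Range;

#[derive(Default)]
struct LookaheadCache {
    enabled: bool,
    fd_entries: usize,
    range: Range<u64>,
}

impl LookaheadCache {
    fn take_and_reset(&mut self) {
        self.fd_entries = 0;
        self.enabled = false;
        // Formerly clear_range(): collapse the range to an empty one at the
        // previous end instead of resetting it to 0..0. A following
        // try_extend_range() then only succeeds if the next payload range
        // really continues where the flushed entries stopped (e.g. when the
        // cache was flushed just because it was full).
        self.range = self.range.end..self.range.end;
    }

    // Unchanged from the patch.
    fn try_extend_range(&mut self, range: Range<u64>) -> bool {
        if self.range.end == 0 {
            // initialize first range to start and end with start of new range
            self.range.start = range.start;
            self.range.end = range.start;
        }
        // range continued, update end
        if self.range.end == range.start {
            self.range.end = range.end;
            return true;
        }
        false
    }
}
```

After flushing 100..300 the range is 300..300, so a non-contiguous
500..600 is rejected while 300..400 extends it; after a full reset to
0..0 any range would instead be accepted as a fresh start.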

> +
> +    pub(crate) fn try_extend_range(&mut self, range: Range<u64>) -> bool {
> +        if self.range.end == 0 {
> +            // initialize first range to start and end with start of new range
> +            self.range.start = range.start;
> +            self.range.end = range.start;
> +        }
> +
> +        // range continued, update end
> +        if self.range.end == range.start {
> +            self.range.end = range.end;
> +            return true;
> +        }
> +
> +        false
> +    }
> +
> +    pub(crate) fn take_last_chunk(&mut self) -> Option<ReusableDynamicEntry> {
> +        self.last_chunk.take()
> +    }
> +
> +    pub(crate) fn update_last_chunk(&mut self, chunk: Option<ReusableDynamicEntry>) {
> +        self.last_chunk = chunk;
> +    }
> +}
> diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
> index 5248a1956..334759df6 100644
> --- a/pbs-client/src/pxar/mod.rs
> +++ b/pbs-client/src/pxar/mod.rs
> @@ -50,6 +50,7 @@
>  pub(crate) mod create;
>  pub(crate) mod dir_stack;
>  pub(crate) mod extract;
> +pub(crate) mod look_ahead_cache;
>  pub(crate) mod metadata;
>  pub(crate) mod tools;
>  
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


Thread overview: 101+ messages
2024-05-28  9:41 [pbs-devel] [PATCH v8 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
2024-05-28  9:41 ` [pbs-devel] [PATCH v8 pxar 01/69] decoder: factor out skip part from skip_entry Christian Ebner
2024-05-28  9:41 ` [pbs-devel] [PATCH v8 pxar 02/69] lib: add type for input/output variant differentiation Christian Ebner
2024-05-28  9:41 ` [pbs-devel] [PATCH v8 pxar 03/69] encoder: move to stack based state tracking Christian Ebner
2024-05-28  9:41 ` [pbs-devel] [PATCH v8 pxar 04/69] format/examples: add header type `PXAR_PAYLOAD_REF` Christian Ebner
2024-05-28  9:41 ` [pbs-devel] [PATCH v8 pxar 05/69] decoder: add method to read payload references Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 06/69] encoder: allow split output writer for archive creation Christian Ebner
2024-05-29 11:54   ` Dominik Csapak
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 07/69] decoder/accessor: allow for split input stream variant Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 08/69] decoder: set payload input range when decoding via accessor Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 09/69] encoder: add payload reference capability Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 10/69] encoder: add payload position capability Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 11/69] encoder: add payload advance capability Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 12/69] encoder/format: finish payload stream with marker Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 13/69] format: add payload stream start marker Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 14/69] format/encoder/decoder: new pxar entry type `Version` Christian Ebner
2024-06-03 11:25   ` Fabian Grünbichler
2024-06-03 11:54     ` Christian Ebner
2024-06-03 12:10       ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 pxar 15/69] format/encoder/decoder: new pxar entry type `Prelude` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 16/69] client: backup: factor out extension from backup target Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 17/69] api: datastore: refactor getting local chunk reader Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 18/69] client: pxar: switch to stack based encoder state Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 19/69] client: pxar: combine writers into struct Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 20/69] client: pxar: optionally split metadata and payload streams Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 21/69] client: helper: add helpers for creating reader instances Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 22/69] client: helper: add method for split archive name mapping Christian Ebner
2024-06-04  8:17   ` Fabian Grünbichler
2024-06-04  8:30     ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 23/69] client: tools: helper to check pxar filename extensions Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 24/69] client: restore: read payload from dedicated index Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 25/69] tools: cover extension for split pxar archives Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 26/69] restore: " Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 27/69] client: mount: make split pxar archives mountable Christian Ebner
2024-06-04  8:24   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 28/69] api: datastore: attach split archive payload chunk reader Christian Ebner
2024-06-04  8:26   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 29/69] catalog: shell: make split pxar archives accessible Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 30/69] www: cover metadata extension for pxar archives Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 31/69] file restore: factor out getting pxar reader Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 32/69] file restore: cover split metadata and payload archives Christian Ebner
2024-06-04  8:28   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 33/69] file restore: show more error context when extraction fails Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 34/69] pxar: add optional payload input for archive restore Christian Ebner
2024-06-03 13:23   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 35/69] pxar: cover listing for split archives Christian Ebner
2024-06-03 13:27   ` Fabian Grünbichler
2024-06-03 13:36     ` Christian Ebner
2024-06-03 14:54       ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 36/69] pxar: add more context to extraction error Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 37/69] client: pxar: include payload offset in entry listing Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 38/69] pxar: show padding in debug output on archive list Christian Ebner
2024-06-04  8:34   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 39/69] datastore: dynamic index: add method to get digest Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 40/69] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 41/69] upload stream: implement reused chunk injector Christian Ebner
2024-06-04  8:50   ` Fabian Grünbichler
2024-06-04  8:58     ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 42/69] client: chunk stream: add struct to hold injection state Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 43/69] chunker: add method to reset chunker state Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 44/69] client: streams: add channels for dynamic entry injection Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 45/69] specs: add backup detection mode specification Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 46/69] client: implement prepare reference method Christian Ebner
2024-06-04  9:24   ` Fabian Grünbichler
2024-06-04 12:45     ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 47/69] client: pxar: add method for metadata comparison Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 48/69] pxar: caching: add look-ahead cache Christian Ebner
2024-06-04  9:35   ` Fabian Grünbichler [this message]
2024-06-04 13:58     ` Christian Ebner
2024-06-05 10:56   ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 49/69] client: pxar: refactor catalog encoding for directories Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 50/69] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
2024-06-04 11:50   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 51/69] client: backup writer: add injected chunk count to stats Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 52/69] pxar: create: keep track of reused chunks and files Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 53/69] pxar: create: show chunk injection stats debug output Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 54/69] client: pxar: add helper to handle optional preludes Christian Ebner
2024-06-04 11:55   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 55/69] client: pxar: opt encode cli exclude patterns as Prelude Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 56/69] pxar: ignore version and prelude entries in listing Christian Ebner
2024-06-04  8:39   ` Fabian Grünbichler
2024-06-04  8:48     ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 57/69] docs: file formats: describe split pxar archive file layout Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 58/69] docs: add section describing change detection mode Christian Ebner
2024-06-04 12:07   ` Fabian Grünbichler
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 59/69] test-suite: add detection mode change benchmark Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 60/69] test-suite: Makefile: add debian package and related files Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 61/69] datastore: chunker: add Chunker trait Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 62/69] datastore: chunker: implement chunker for payload stream Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 63/69] client: chunk stream: switch payload stream chunker Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 64/69] client: pxar: allow to restore prelude to optional path Christian Ebner
2024-06-03 13:57   ` Fabian Grünbichler
2024-06-03 15:02     ` Christian Ebner
2024-05-28  9:42 ` [pbs-devel] [PATCH v8 proxmox-backup 65/69] client: pxar: add archive creation with reference test Christian Ebner
2024-05-28  9:43 ` [pbs-devel] [PATCH v8 proxmox-backup 66/69] client: tools: add helper to raise nofile rlimit Christian Ebner
2024-05-28  9:43 ` [pbs-devel] [PATCH v8 proxmox-backup 67/69] client: pxar: set cache limit based on " Christian Ebner
2024-05-28  9:43 ` [pbs-devel] [PATCH v8 proxmox-backup 68/69] chunker: tests: add regression tests for payload chunker Christian Ebner
2024-05-28  9:43 ` [pbs-devel] [PATCH v8 proxmox-backup 69/69] chunk stream: " Christian Ebner
2024-05-31 10:40 ` [pbs-devel] [PATCH v8 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Dominik Csapak
2024-05-31 11:19   ` Christian Ebner
2024-06-05  8:51 ` [pbs-devel] partially-applied: " Fabian Grünbichler
