public inbox for pbs-devel@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [RFC v2 proxmox-backup 34/36] fix #3174: client: pxar: enable caching and meta comparison
Date: Wed, 13 Mar 2024 12:12:00 +0100
Message-ID: <1710325718.hwp7rvfzh8.astroid@yuna.none>
In-Reply-To: <20240305092703.126906-35-c.ebner@proxmox.com>

On March 5, 2024 10:27 am, Christian Ebner wrote:
> Add the final glue logic to enable the look-ahead caching and
> metadata comparison introduced in the preparatory patches.

I have to say the call stacks here are not getting easier to follow with
all the intermingled caching_enabled logic...

create_archive
-> archive_dir_contents
--> loop over files -> add_entry
---> add_entry_to_archive, or flush cache and cache, or
     add_entry_to_archive
-> flush_cached_to_archive
-> encoder.finish

maybe it would get a bit more disentangled and easier to follow if
add_entry/flush_entry_to_archive were merged?

> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - fix pxar exclude cli entry caching
> 
>  pbs-client/src/pxar/create.rs | 121 +++++++++++++++++++++++++++++++---
>  1 file changed, 113 insertions(+), 8 deletions(-)
> 
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index b2ce898f..bb4597bc 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -32,10 +32,14 @@ use pbs_datastore::dynamic_index::{
>  };
>  
>  use crate::inject_reused_chunks::InjectChunks;
> +use crate::pxar::look_ahead_cache::{CacheEntry, CacheEntryData};
>  use crate::pxar::metadata::errno_is_unsupported;
>  use crate::pxar::tools::assert_single_path_component;
>  use crate::pxar::Flags;
>  
> +const MAX_CACHE_SIZE: usize = 512;
> +const CACHED_PAYLOAD_THRESHOLD: u64 = 2 * 1024 * 1024;
> +
>  #[derive(Default)]
>  struct ReusedChunks {
>      start_boundary: PayloadOffset,
> @@ -253,6 +257,9 @@ struct Archiver {
>      reused_chunks: ReusedChunks,
>      previous_payload_index: Option<DynamicIndexReader>,
>      forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
> +    cached_entries: Vec<CacheEntry>,
> +    caching_enabled: bool,
> +    cached_payload_size: u64,

you can probably already guess ;) this should be combined/refactored
into some common "re-use enabled" struct.
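
a rough sketch of what I mean, not a proposal for the final layout: the
three new fields move into one struct that owns the cache bookkeeping.
`ReuseState`, its methods and the simplified `CacheEntry` are made-up
names for illustration only, the real type would likely also hold the
other re-use related members.

```rust
// Hypothetical, simplified stand-in for the patch's `CacheEntry`.
#[derive(Debug)]
enum CacheEntry {
    RegEntry(String),
    DirEnd,
}

/// Hypothetical grouping of the cache-related `Archiver` fields.
#[derive(Debug, Default)]
struct ReuseState {
    cached_entries: Vec<CacheEntry>,
    caching_enabled: bool,
    cached_payload_size: u64,
}

impl ReuseState {
    /// Cache an entry and account for its payload size.
    fn push(&mut self, entry: CacheEntry, payload_size: u64) {
        self.cached_entries.push(entry);
        self.cached_payload_size += payload_size;
    }

    /// Hand out all cached entries and reset the size bookkeeping.
    fn take_entries(&mut self) -> Vec<CacheEntry> {
        self.cached_payload_size = 0;
        std::mem::take(&mut self.cached_entries)
    }
}
```

with something like this, the entry list and the payload size can no
longer get out of sync at the individual call sites.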

>  }
>  
>  type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
> @@ -335,16 +342,32 @@ where
>          reused_chunks: ReusedChunks::new(),
>          previous_payload_index,
>          forced_boundaries,
> +        cached_entries: Vec::new(),
> +        caching_enabled: false,
> +        cached_payload_size: 0,
>      };
>  
>      archiver
>          .archive_dir_contents(&mut encoder, accessor, source_dir, true)
>          .await?;
> +
> +    if let Some(last) = archiver.cached_entries.pop() {
> +        match last {
> +            // do not close final directory, this is done by the caller
> +            CacheEntry::DirEnd => {}
> +            _ => archiver.cached_entries.push(last),
> +        }
> +    }

    // do not close final directory, this is done by the caller
    if let Some(CacheEntry::DirEnd) = archiver.cached_entries.last() {
        archiver.cached_entries.pop();
    }
 
should do the same but a bit cheaper / easier to read.
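
to back up the "should do the same" claim, here is a small standalone
comparison of both variants. `CacheEntry` is reduced to two variants and
both function names are invented for this example.

```rust
// Hypothetical, reduced version of the patch's `CacheEntry`.
#[derive(Debug, PartialEq)]
enum CacheEntry {
    RegEntry(u32),
    DirEnd,
}

/// Variant from the patch: pop, then push back if it was not a `DirEnd`.
fn drop_trailing_dir_end_pop_push(entries: &mut Vec<CacheEntry>) {
    if let Some(last) = entries.pop() {
        match last {
            CacheEntry::DirEnd => {}
            _ => entries.push(last),
        }
    }
}

/// Suggested variant: peek at the last element, pop only on `DirEnd`.
fn drop_trailing_dir_end_peek(entries: &mut Vec<CacheEntry>) {
    if let Some(CacheEntry::DirEnd) = entries.last() {
        entries.pop();
    }
}
```

both leave an empty list and a list ending in a non-DirEnd entry
untouched, and both remove exactly one trailing DirEnd.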

but - "caller" is kind of misleading here, right? because it's not the
caller of `create_archive` that handles the top-level DirEnd, it's the
call to `encoder.finish()` right below.

it kinda seems like it would be an error to end up here with some other
last cached element? if so, then maybe it would make sense to make this
an invariant instead:

match archiver.cached_entries.pop() {
    Some(CacheEntry::DirEnd) | None => {} // OK
    Some(entry) => bail!(
        "Finished creating archive with cache, but last cache element is \
         {entry:?} instead of top-level directory end marker."
    ),
}

OTOH, it's archive_dir_contents itself that adds that entry, and it
knows whether it is called for the top-level dir or not, so it could
just skip adding it in the first place in the root case?
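
a minimal sketch of that last option, assuming the only difference for
the root case is the missing end marker. `cache_dir_contents` is a
made-up, heavily simplified stand-in for `archive_dir_contents`, and
`CacheEntry` is reduced to what the example needs.

```rust
// Hypothetical, reduced version of the patch's `CacheEntry`.
#[derive(Debug, PartialEq)]
enum CacheEntry {
    Entry(String),
    DirEnd,
}

/// Cache all entries of one directory. Only nested directories get an
/// end marker, the root directory is closed by `encoder.finish()`.
fn cache_dir_contents(cache: &mut Vec<CacheEntry>, names: &[&str], is_root: bool) {
    for name in names {
        cache.push(CacheEntry::Entry(name.to_string()));
    }
    if !is_root {
        cache.push(CacheEntry::DirEnd);
    }
}
```

with this, `create_archive` would not need to inspect or strip the last
cached element at all before flushing.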

> +
> +    archiver
> +        .flush_cached_to_archive(&mut encoder, true, false)
> +        .await?;
> +
>      encoder.finish().await?;
>      Ok(())
>  }
>  
> -struct FileListEntry {
> +pub(crate) struct FileListEntry {
>      name: CString,
>      path: PathBuf,
>      stat: FileStat,
> @@ -396,8 +419,15 @@ impl Archiver {
>                  let file_name = file_entry.name.to_bytes();
>  
>                  if is_root && file_name == b".pxarexclude-cli" {
> -                    self.encode_pxarexclude_cli(encoder, &file_entry.name, old_patterns_count)
> -                        .await?;
> +                    if self.caching_enabled {
> +                        self.cached_entries.push(CacheEntry::PxarExcludeCliEntry(
> +                            file_entry,
> +                            old_patterns_count,
> +                        ));
> +                    } else {
> +                        self.encode_pxarexclude_cli(encoder, &file_entry.name, old_patterns_count)
> +                            .await?;
> +                    }
>                      continue;
>                  }
>  
> @@ -413,6 +443,11 @@ impl Archiver {
>                  .await
>                  .map_err(|err| self.wrap_err(err))?;
>              }
> +
> +            if self.caching_enabled {
> +                self.cached_entries.push(CacheEntry::DirEnd);
> +            }
> +
>              self.path = old_path;
>              self.entry_counter = entry_counter;
>              self.patterns.truncate(old_patterns_count);
> @@ -693,8 +728,6 @@ impl Archiver {
>          c_file_name: &CStr,
>          stat: &FileStat,
>      ) -> Result<(), Error> {
> -        use pxar::format::mode;
> -
>          let file_mode = stat.st_mode & libc::S_IFMT;
>          let open_mode = if file_mode == libc::S_IFREG || file_mode == libc::S_IFDIR {
>              OFlag::empty()
> @@ -732,6 +765,71 @@ impl Archiver {
>              self.skip_e2big_xattr,
>          )?;
>  
> +        if self.previous_payload_index.is_none() {
> +            return self
> +                .add_entry_to_archive(encoder, accessor, c_file_name, stat, fd, &metadata)
> +                .await;
> +        }
> +
> +        // Avoid having to many open file handles in cached entries
> +        if self.cached_entries.len() > MAX_CACHE_SIZE {
> +            self.flush_cached_to_archive(encoder, false, true).await?;
> +        }
> +
> +        if metadata.is_regular_file() {
> +            self.cache_or_flush_entries(encoder, accessor, c_file_name, stat, fd, &metadata)
> +                .await
> +        } else {
> +            if self.caching_enabled {
> +                if stat.st_mode & libc::S_IFMT == libc::S_IFDIR {
> +                    let fd_clone = fd.try_clone()?;
> +                    let cache_entry = CacheEntry::DirEntry(CacheEntryData::new(
> +                        fd,
> +                        c_file_name.into(),
> +                        stat.clone(),
> +                        metadata.clone(),
> +                        PayloadOffset::default(),
> +                    ));
> +                    self.cached_entries.push(cache_entry);
> +
> +                    let dir = Dir::from_fd(fd_clone.into_raw_fd())?;
> +                    self.add_directory(encoder, accessor, dir, c_file_name, &metadata, stat)
> +                        .await?;
> +
> +                    if let Some(ref catalog) = self.catalog {
> +                        if !self.caching_enabled {
> +                            catalog.lock().unwrap().end_directory()?;
> +                        }
> +                    }
> +                } else {
> +                    let cache_entry = CacheEntry::RegEntry(CacheEntryData::new(
> +                        fd,
> +                        c_file_name.into(),
> +                        stat.clone(),
> +                        metadata,
> +                        PayloadOffset::default(),
> +                    ));
> +                    self.cached_entries.push(cache_entry);
> +                }
> +                Ok(())
> +            } else {
> +                self.add_entry_to_archive(encoder, accessor, c_file_name, stat, fd, &metadata)
> +                    .await
> +            }
> +        }
> +    }
> +
> +    async fn add_entry_to_archive<T: SeqWrite + Send>(
> +        &mut self,
> +        encoder: &mut Encoder<'_, T>,
> +        accessor: &mut Option<Directory<LocalDynamicReadAt<RemoteChunkReader>>>,
> +        c_file_name: &CStr,
> +        stat: &FileStat,
> +        fd: OwnedFd,
> +        metadata: &Metadata,
> +    ) -> Result<(), Error> {
> +        use pxar::format::mode;
> +
>          let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
>          match metadata.file_type() {
>              mode::IFREG => {
> @@ -781,7 +879,9 @@ impl Archiver {
>                      .add_directory(encoder, accessor, dir, c_file_name, &metadata, stat)
>                      .await;
>                  if let Some(ref catalog) = self.catalog {
> -                    catalog.lock().unwrap().end_directory()?;
> +                    if !self.caching_enabled {
> +                        catalog.lock().unwrap().end_directory()?;
> +                    }
>                  }
>                  result
>              }
> @@ -1132,7 +1232,9 @@ impl Archiver {
>      ) -> Result<(), Error> {
>          let dir_name = OsStr::from_bytes(dir_name.to_bytes());
>  
> -        encoder.create_directory(dir_name, metadata).await?;
> +        if !self.caching_enabled {
> +            encoder.create_directory(dir_name, metadata).await?;
> +        }
>  
>          let old_fs_magic = self.fs_magic;
>          let old_fs_feature_flags = self.fs_feature_flags;
> @@ -1172,7 +1274,10 @@ impl Archiver {
>          self.fs_feature_flags = old_fs_feature_flags;
>          self.current_st_dev = old_st_dev;
>  
> -        encoder.finish().await?;
> +        if !self.caching_enabled {
> +            encoder.finish().await?;
> +        }
> +
>          result
>      }
>  
> -- 
> 2.39.2