From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [RFC v2 proxmox-backup 28/36] client: pxar: implement store to insert chunks on caching
Date: Tue, 5 Mar 2024 10:26:55 +0100
Message-ID: <20240305092703.126906-29-c.ebner@proxmox.com>
In-Reply-To: <20240305092703.126906-1-c.ebner@proxmox.com>
This prepares for the look-ahead caching, which temporarily stores
entries before encoding them in the pxar archive, so that the archiver
can decide whether to re-use or re-encode regular file entries.

Allow inserting and storing reused chunks in the archiver,
deduplicating chunks upon insert when possible.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- s/Appendable/Reusable/ incorrect naming leftover from previous
approach
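The three cases handled on insert (sequence already stored, first new
chunk equals the last stored chunk, or plain append) can be sketched in
a self-contained form. This is only an illustration under assumptions,
not the code from the patch: `Chunk` and `chunk()` are hypothetical
stand-ins for `ReusableDynamicEntry`, offsets are plain `u64` values
instead of `PayloadOffset`, and `find_sequence` uses `slice::windows`,
which keeps scanning after a failed candidate rather than returning
`None` on the first mismatch.

```rust
// Hypothetical stand-in for ReusableDynamicEntry: only size and digest.
#[derive(Clone, Debug)]
struct Chunk {
    size: u64,
    digest: [u8; 32],
}

// Helper for the example only: build a chunk whose digest is `id` repeated.
fn chunk(id: u8, size: u64) -> Chunk {
    Chunk { size, digest: [id; 32] }
}

#[derive(Default)]
struct ReusedChunks {
    start_boundary: u64,
    total: u64,
    chunks: Vec<Chunk>,
}

impl ReusedChunks {
    /// Store `indices` (deduplicated where possible) and return the payload
    /// offset at which the caller's data starts.
    fn insert(&mut self, indices: Vec<Chunk>, boundary: u64, start_padding: u64) -> u64 {
        if self.chunks.is_empty() {
            self.start_boundary = boundary;
        }
        let offset = if let Some(offset) = self.find_sequence(&indices) {
            // Case 1: the whole sequence is already stored, nothing to add.
            offset
        } else if let Some(offset) = self.last_matches_first(&indices) {
            // Case 2: the first new chunk equals the last stored one, skip it.
            self.append(indices.into_iter().skip(1));
            offset
        } else {
            // Case 3: no overlap, append everything after the current total.
            let offset = self.total;
            self.append(indices.into_iter());
            offset
        };
        self.start_boundary + offset + start_padding
    }

    fn append(&mut self, new: impl Iterator<Item = Chunk>) {
        for chunk in new {
            self.total += chunk.size;
            self.chunks.push(chunk);
        }
    }

    /// Byte offset of the first stored run whose digests equal `indices`.
    fn find_sequence(&self, indices: &[Chunk]) -> Option<u64> {
        if indices.is_empty() {
            return None; // windows(0) would panic
        }
        let position = self.chunks.windows(indices.len()).position(|window| {
            window.iter().zip(indices).all(|(a, b)| a.digest == b.digest)
        })?;
        let offset: u64 = self.chunks[..position].iter().map(|c| c.size).sum();
        Some(offset)
    }

    /// Byte offset of the last stored chunk if it equals the first new one.
    fn last_matches_first(&self, indices: &[Chunk]) -> Option<u64> {
        let first = indices.first()?;
        let last = self.chunks.last()?;
        (last.digest == first.digest).then(|| self.total - last.size)
    }
}

fn main() {
    let mut store = ReusedChunks::default();
    // Empty store: both chunks are appended, data starts at the boundary.
    println!("{}", store.insert(vec![chunk(1, 10), chunk(2, 20)], 100, 0)); // 100
    // Chunk 2 is the last stored chunk: only chunk 3 is appended.
    println!("{}", store.insert(vec![chunk(2, 20), chunk(3, 30)], 0, 5)); // 115
    // Full sequence already stored: nothing is appended.
    println!("{}", store.insert(vec![chunk(1, 10), chunk(2, 20), chunk(3, 30)], 0, 0)); // 100
    println!("{}", store.chunks.len()); // 3
}
```

In this sketch the returned value is always `start_boundary + offset +
start_padding`, so a caller can reference file payload inside an already
stored chunk without storing that chunk a second time.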
pbs-client/src/pxar/create.rs | 109 +++++++++++++++++++++++++++++++++-
1 file changed, 107 insertions(+), 2 deletions(-)
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index cb0af29e..66bdbce8 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -18,7 +18,7 @@ use nix::sys::stat::{FileStat, Mode};
use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
use proxmox_sys::error::SysError;
use pxar::accessor::aio::Accessor;
-use pxar::encoder::{LinkOffset, SeqWrite};
+use pxar::encoder::{LinkOffset, PayloadOffset, SeqWrite};
use pxar::Metadata;
use proxmox_io::vec;
@@ -27,13 +27,116 @@ use proxmox_sys::fs::{self, acl, xattr};
use crate::RemoteChunkReader;
use pbs_datastore::catalog::BackupCatalogWriter;
-use pbs_datastore::dynamic_index::{DynamicIndexReader, LocalDynamicReadAt};
+use pbs_datastore::dynamic_index::{
+ ReusableDynamicEntry, DynamicIndexReader, LocalDynamicReadAt,
+};
use crate::inject_reused_chunks::InjectChunks;
use crate::pxar::metadata::errno_is_unsupported;
use crate::pxar::tools::assert_single_path_component;
use crate::pxar::Flags;
+#[derive(Default)]
+struct ReusedChunks {
+ start_boundary: PayloadOffset,
+ total: PayloadOffset,
+ chunks: Vec<ReusableDynamicEntry>,
+ must_flush_first: bool,
+}
+
+impl ReusedChunks {
+ fn new() -> Self {
+ Self {
+ start_boundary: PayloadOffset::default(),
+ total: PayloadOffset::default(),
+ chunks: Vec::new(),
+ must_flush_first: false,
+ }
+ }
+
+ fn start_boundary(&self) -> PayloadOffset {
+ self.start_boundary
+ }
+
+ fn is_empty(&self) -> bool {
+ self.chunks.is_empty()
+ }
+
+ fn insert(
+ &mut self,
+ indices: Vec<ReusableDynamicEntry>,
+ boundary: PayloadOffset,
+ start_padding: u64,
+ ) -> PayloadOffset {
+ if self.is_empty() {
+ self.start_boundary = boundary;
+ }
+
+ if let Some(offset) = self.digest_sequence_contained(&indices) {
+ self.start_boundary.add(offset + start_padding)
+ } else if let Some(offset) = self.last_digest_matched(&indices) {
+ for chunk in indices.into_iter().skip(1) {
+ self.total = self.total.add(chunk.size());
+ self.chunks.push(chunk);
+ }
+ self.start_boundary.add(offset + start_padding)
+ } else {
+ let offset = self.total.raw();
+ for chunk in indices.into_iter() {
+ self.total = self.total.add(chunk.size());
+ self.chunks.push(chunk);
+ }
+ self.start_boundary.add(offset + start_padding)
+ }
+ }
+
+ fn digest_sequence_contained(&self, indices: &[ReusableDynamicEntry]) -> Option<u64> {
+ let digest = if let Some(first) = indices.first() {
+ first.digest()
+ } else {
+ return None;
+ };
+
+ let mut offset = 0;
+ let mut iter = self.chunks.iter();
+ while let Some(position) = iter.position(|e| {
+ offset += e.size();
+ e.digest() == digest
+ }) {
+ if indices.len() + position > self.chunks.len() {
+ return None;
+ }
+
+ for (ind, chunk) in indices.iter().skip(1).enumerate() {
+ if chunk.digest() != self.chunks[ind + position + 1].digest() {
+ return None;
+ }
+ }
+
+ offset -= self.chunks[position].size();
+ return Some(offset);
+ }
+
+ None
+ }
+
+ fn last_digest_matched(&self, indices: &[ReusableDynamicEntry]) -> Option<u64> {
+ let digest = if let Some(first) = indices.first() {
+ first.digest()
+ } else {
+ return None;
+ };
+
+ if let Some(last) = self.chunks.last() {
+ if last.digest() == digest {
+ return Some(self.total.raw() - last.size());
+ }
+ }
+
+ None
+ }
+}
+
/// Pxar options for creating a pxar archive/stream
#[derive(Default, Clone)]
pub struct PxarCreateOptions {
@@ -145,6 +248,7 @@ struct Archiver {
hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
file_copy_buffer: Vec<u8>,
skip_e2big_xattr: bool,
+ reused_chunks: ReusedChunks,
forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
}
@@ -217,6 +321,7 @@ where
hardlinks: HashMap::new(),
file_copy_buffer: vec::undefined(4 * 1024 * 1024),
skip_e2big_xattr: options.skip_e2big_xattr,
+ reused_chunks: ReusedChunks::new(),
forced_boundaries,
};
--
2.39.2
Thread overview: 94+ messages
2024-03-05 9:26 [pbs-devel] [RFC pxar proxmox-backup 00/36] fix #3174: improve file-level backup Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 01/36] format/examples: add PXAR_PAYLOAD_REF entry header Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 02/36] encoder: add optional output writer for file payloads Christian Ebner
2024-03-11 13:21 ` Fabian Grünbichler
2024-03-11 13:50 ` Christian Ebner
2024-03-11 15:41 ` Fabian Grünbichler
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 03/36] format/decoder: add method to read payload references Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 04/36] decoder: add optional payload input stream Christian Ebner
2024-03-11 13:21 ` Fabian Grünbichler
2024-03-11 14:05 ` Christian Ebner
2024-03-11 15:27 ` Fabian Grünbichler
2024-03-11 15:51 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 05/36] accessor: " Christian Ebner
2024-03-11 13:21 ` Fabian Grünbichler
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 06/36] encoder: move to stack based state tracking Christian Ebner
2024-03-11 13:21 ` Fabian Grünbichler
2024-03-11 14:12 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 07/36] encoder: add payload reference capability Christian Ebner
2024-03-11 13:21 ` Fabian Grünbichler
2024-03-11 14:15 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 08/36] encoder: add payload position capability Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 09/36] encoder: add payload advance capability Christian Ebner
2024-03-11 13:22 ` Fabian Grünbichler
2024-03-11 14:22 ` Christian Ebner
2024-03-11 15:27 ` Fabian Grünbichler
2024-03-11 15:41 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 pxar 10/36] encoder/format: finish payload stream with marker Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 11/36] client: pxar: switch to stack based encoder state Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 12/36] client: backup: factor out extension from backup target Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 13/36] client: backup: early check for fixed index type Christian Ebner
2024-03-11 14:57 ` Fabian Grünbichler
2024-03-11 15:12 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 14/36] client: backup: split payload to dedicated stream Christian Ebner
2024-03-11 14:57 ` Fabian Grünbichler
2024-03-11 15:22 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 15/36] client: restore: read payload from dedicated index Christian Ebner
2024-03-11 14:58 ` Fabian Grünbichler
2024-03-11 15:26 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 16/36] tools: cover meta extension for pxar archives Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 17/36] restore: " Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 18/36] client: mount: make split pxar archives mountable Christian Ebner
2024-03-11 14:58 ` Fabian Grünbichler
2024-03-11 15:29 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 19/36] api: datastore: refactor getting local chunk reader Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 20/36] api: datastore: attach optional payload " Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 21/36] catalog: shell: factor out pxar fuse reader instantiation Christian Ebner
2024-03-11 14:58 ` Fabian Grünbichler
2024-03-11 15:31 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 22/36] catalog: shell: redirect payload reader for split streams Christian Ebner
2024-03-11 14:58 ` Fabian Grünbichler
2024-03-11 15:24 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 23/36] www: cover meta extension for pxar archives Christian Ebner
2024-03-11 14:58 ` Fabian Grünbichler
2024-03-11 15:31 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 24/36] index: fetch chunk form index by start/end-offset Christian Ebner
2024-03-12 8:50 ` Fabian Grünbichler
2024-03-14 8:23 ` Christian Ebner
2024-03-12 12:47 ` Dietmar Maurer
2024-03-12 12:51 ` Christian Ebner
2024-03-12 13:03 ` Dietmar Maurer
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 25/36] upload stream: impl reused chunk injector Christian Ebner
2024-03-13 9:43 ` Dietmar Maurer
2024-03-14 14:03 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 26/36] client: chunk stream: add chunk injection queues Christian Ebner
2024-03-12 9:46 ` Fabian Grünbichler
2024-03-19 10:52 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 27/36] client: implement prepare reference method Christian Ebner
2024-03-12 10:07 ` Fabian Grünbichler
2024-03-19 11:51 ` Christian Ebner
2024-03-19 12:49 ` Fabian Grünbichler
2024-03-20 8:37 ` Christian Ebner
2024-03-05 9:26 ` Christian Ebner [this message]
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 29/36] client: pxar: add previous reference to archiver Christian Ebner
2024-03-12 12:12 ` Fabian Grünbichler
2024-03-12 12:25 ` Christian Ebner
2024-03-19 12:59 ` Christian Ebner
2024-03-19 13:04 ` Fabian Grünbichler
2024-03-20 8:52 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 30/36] client: pxar: add method for metadata comparison Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 31/36] specs: add backup detection mode specification Christian Ebner
2024-03-12 12:17 ` Fabian Grünbichler
2024-03-12 12:31 ` Christian Ebner
2024-03-20 9:28 ` Christian Ebner
2024-03-05 9:26 ` [pbs-devel] [RFC v2 proxmox-backup 32/36] pxar: caching: add look-ahead cache types Christian Ebner
2024-03-05 9:27 ` [pbs-devel] [RFC v2 proxmox-backup 33/36] client: pxar: add look-ahead caching Christian Ebner
2024-03-12 14:08 ` Fabian Grünbichler
2024-03-20 10:28 ` Christian Ebner
2024-03-05 9:27 ` [pbs-devel] [RFC v2 proxmox-backup 34/36] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
2024-03-13 11:12 ` Fabian Grünbichler
2024-03-05 9:27 ` [pbs-devel] [RFC v2 proxmox-backup 35/36] test-suite: add detection mode change benchmark Christian Ebner
2024-03-13 11:48 ` Fabian Grünbichler
2024-03-05 9:27 ` [pbs-devel] [RFC v2 proxmox-backup 36/36] test-suite: Add bin to deb, add shell completions Christian Ebner
2024-03-13 11:18 ` Fabian Grünbichler
2024-03-13 11:44 ` [pbs-devel] [RFC pxar proxmox-backup 00/36] fix #3174: improve file-level backup Fabian Grünbichler