From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v3 pxar 4/6] decoder: move payload header check for split input
Date: Wed, 12 Jun 2024 10:23:58 +0200
Message-ID: <20240612082400.110789-5-c.ebner@proxmox.com>
In-Reply-To: <20240612082400.110789-1-c.ebner@proxmox.com>

The payload entries in the payload output of split pxar archives are
separated by payload headers, which allow consistency checks on the
payload references encoded in the metadata archive.

Currently, this consistency check is performed right after reading the
entry in the metadata archive. The downside is that the payload has to
be fetched and decoded just for this check. This greatly impacts
performance when accessing a metadata archive with an attached payload
input reader, e.g. in the fuse implementation used to mount pxar
archives. The impact is especially severe when the payload is accessed
over the network via a remote chunk reader, as is the case with the
Proxmox Backup Server.

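Therefore, move this check to the contents reader instantiation
instead and add a `header_checked` flag to the decoder's `InPayload`
state. If an entry is skipped without its header having been read, the
header size is still added to the consumed payload.

Getting the contents reader now needs to be async (the sync wrapper
polls it once), and the method must return an error when the check
fails.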

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 2:
- no changes
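
For reviewers, the deferred check described in the commit message can be
sketched in isolation roughly as follows. This is only an illustration
under stated assumptions, not the pxar implementation: `LazyDecoder`,
`enter_entry`, `PAYLOAD_TAG` and the 16-byte toy header layout (type tag
plus content size) are made up for the example, and the sketch is
synchronous to stay self-contained.

```rust
use std::io::{self, Read};

// Hypothetical toy values, not the real pxar format constants.
const PAYLOAD_TAG: u64 = 0x7061_796c_6f61_6421; // arbitrary tag
const HEADER_SIZE: u64 = 16; // toy header: u64 tag + u64 content size

/// Mirrors the idea of `State::InPayload { size, header_checked, .. }`.
struct InPayload {
    size: u64,
    header_checked: bool,
}

/// Hypothetical stand-in for a decoder with a separate payload input.
struct LazyDecoder<R: Read> {
    payload: R,
    state: Option<InPayload>,
    payload_consumed: u64,
}

fn read_u64_le<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut bytes = [0u8; 8];
    reader.read_exact(&mut bytes)?;
    Ok(u64::from_le_bytes(bytes))
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

impl<R: Read> LazyDecoder<R> {
    fn new(payload: R) -> Self {
        Self { payload, state: None, payload_consumed: 0 }
    }

    /// Called when a file entry is read from the metadata: no payload I/O.
    fn enter_entry(&mut self, size: u64) {
        self.state = Some(InPayload { size, header_checked: false });
    }

    /// The header is read and verified only on first access to the contents.
    fn content_reader(&mut self) -> io::Result<Option<io::Take<&mut R>>> {
        let state = match self.state.as_mut() {
            Some(state) => state,
            None => return Ok(None),
        };
        if !state.header_checked {
            let tag = read_u64_le(&mut self.payload)?;
            let content_size = read_u64_le(&mut self.payload)?;
            self.payload_consumed += HEADER_SIZE;
            if tag != PAYLOAD_TAG {
                return Err(invalid("unexpected header in payload input"));
            }
            if content_size != state.size {
                return Err(invalid("payload size mismatch"));
            }
            state.header_checked = true;
        }
        let size = state.size;
        Ok(Some(Read::take(&mut self.payload, size)))
    }
}
```

With this shape, iterating over entries via `enter_entry` alone never
touches the payload input. Only the first `content_reader` call per entry
pays for reading the header, and it reports a mismatch as an `io::Error`.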

 src/decoder/aio.rs  |  4 ++--
 src/decoder/mod.rs  | 56 +++++++++++++++++++++++++--------------------
 src/decoder/sync.rs |  5 ++--
 3 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/decoder/aio.rs b/src/decoder/aio.rs
index 3f9881d..19e7023 100644
--- a/src/decoder/aio.rs
+++ b/src/decoder/aio.rs
@@ -60,8 +60,8 @@ impl<T: SeqRead> Decoder<T> {
     }
 
     /// Get a reader for the contents of the current entry, if the entry has contents.
-    pub fn contents(&mut self) -> Option<Contents<T>> {
-        self.inner.content_reader()
+    pub async fn contents(&mut self) -> io::Result<Option<Contents<T>>> {
+        self.inner.content_reader().await
     }
 
     /// Get the size of the current contents, if the entry has contents.
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 46a21b8..6191627 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -182,6 +182,7 @@ enum State {
     InPayload {
         offset: u64,
         size: u64,
+        header_checked: bool,
     },
 
     /// file entries with no data (fifo, socket)
@@ -296,8 +297,16 @@ impl<I: SeqRead> DecoderImpl<I> {
                     // hierarchy and parse the next PXAR_FILENAME or the PXAR_GOODBYE:
                     self.read_next_item().await?;
                 }
-                State::InPayload { offset, .. } => {
+                State::InPayload {
+                    offset,
+                    header_checked,
+                    ..
+                } => {
                     if self.input.payload().is_some() {
+                        if !header_checked {
+                            // header is only checked if payload has been accessed
+                            self.payload_consumed += size_of::<Header>() as u64;
+                        }
                         // Update consumed payload as given by the offset referenced by the content reader
                         self.payload_consumed += offset;
                     } else {
@@ -370,19 +379,31 @@ impl<I: SeqRead> DecoderImpl<I> {
         }
     }
 
-    pub fn content_reader(&mut self) -> Option<Contents<I>> {
-        if let State::InPayload { offset, size } = &mut self.state {
-            if self.input.payload().is_some() {
-                Some(Contents::new(
+    pub async fn content_reader(&mut self) -> Result<Option<Contents<I>>, io::Error> {
+        if let State::InPayload {
+            offset,
+            size,
+            header_checked,
+        } = &mut self.state
+        {
+            if let Some(payload_input) = self.input.payload_mut() {
+                if !*header_checked {
+                    let header: Header = seq_read_entry(payload_input).await?;
+                    self.payload_consumed += size_of::<Header>() as u64;
+                    format::check_payload_header_and_size(&header, *size)?;
+                    *header_checked = true;
+                }
+
+                Ok(Some(Contents::new(
                     self.input.payload_mut().unwrap(),
                     offset,
                     *size,
-                ))
+                )))
             } else {
-                Some(Contents::new(self.input.archive_mut(), offset, *size))
+                Ok(Some(Contents::new(self.input.archive_mut(), offset, *size)))
             }
         } else {
-            None
+            Ok(None)
         }
     }
 
@@ -621,6 +642,7 @@ impl<I: SeqRead> DecoderImpl<I> {
                 };
                 self.state = State::InPayload {
                     offset: 0,
+                    header_checked: false,
                     size: self.current_header.content_size(),
                 };
                 return Ok(ItemResult::Entry);
@@ -652,23 +674,6 @@ impl<I: SeqRead> DecoderImpl<I> {
                         let end = start + payload_ref.size + size_of::<Header>() as u64;
                         payload_input.update_range(start..end);
                     }
-
-                    let header: Header = seq_read_entry(payload_input).await?;
-                    if header.htype != format::PXAR_PAYLOAD {
-                        io_bail!(
-                            "unexpected header in payload input: expected {} , got {header}",
-                            format::PXAR_PAYLOAD,
-                        );
-                    }
-                    self.payload_consumed += size_of::<Header>() as u64;
-
-                    if header.content_size() != payload_ref.size {
-                        io_bail!(
-                            "encountered payload size mismatch: got {}, expected {}",
-                            payload_ref.size,
-                            header.content_size(),
-                        );
-                    }
                 }
 
                 self.entry.kind = EntryKind::File {
@@ -678,6 +683,7 @@ impl<I: SeqRead> DecoderImpl<I> {
                 };
                 self.state = State::InPayload {
                     offset: 0,
+                    header_checked: false,
                     size: payload_ref.size,
                 };
                 return Ok(ItemResult::Entry);
diff --git a/src/decoder/sync.rs b/src/decoder/sync.rs
index 8779f87..1116fe8 100644
--- a/src/decoder/sync.rs
+++ b/src/decoder/sync.rs
@@ -77,8 +77,9 @@ impl<T: SeqRead> Decoder<T> {
     }
 
     /// Get a reader for the contents of the current entry, if the entry has contents.
-    pub fn contents(&mut self) -> Option<Contents<T>> {
-        self.inner.content_reader().map(|inner| Contents { inner })
+    pub fn contents(&mut self) -> io::Result<Option<Contents<T>>> {
+        let content_reader = poll_result_once(self.inner.content_reader())?;
+        Ok(content_reader.map(|inner| Contents { inner }))
     }
 
     /// Get the size of the current contents, if the entry has contents.
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

