public inbox for pbs-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [PATCH 2/2] client: pxar: re-encode files with non-monotonic previous offsets
Date: Tue, 28 Apr 2026 03:28:10 +0200	[thread overview]
Message-ID: <20260428014635.2655155-3-t.lamprecht@proxmox.com> (raw)
In-Reply-To: <20260428014635.2655155-1-t.lamprecht@proxmox.com>

The strict payload-offset check added in pxar 1.0.1 rejects any
payload_offset not strictly greater than the encoder's previously
recorded one. Older client versions could write metadata archives
with duplicate or non-monotonic PXAR_PAYLOAD_REF offsets via the
producer-side bug fixed in the previous commit. Without the encode-
time guard such an archive looked fine on disk, but every later
metadata-mode incremental backup now aborts on the strict check until
the chain is cycled out via --change-detection-mode=data.

Track the highest previously-recorded payload offset globally on the
Archiver and treat any prev-archive entry with a smaller-or-equal
offset as non-reusable, so it is re-encoded instead. The chain
self-heals on the next run, and a single warning per affected file
is emitted to make the fall-back visible. Tracking is intentionally
global, not per cache range, so inversions that span ranges (because
the next backup's cache flushes happen between the two inverted
offsets) are caught too.

Healthy archives written on or after pxar 1.0.1 cannot trigger this
fall-back since the encoder already enforces strict monotonicity at
write time.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
 pbs-client/src/pxar/create.rs | 86 +++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 15e149a23..70b858092 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -188,6 +188,11 @@ struct Archiver {
     suggested_boundaries: Option<mpsc::Sender<u64>>,
     previous_payload_index: Option<DynamicIndexReader>,
     cache: PxarLookaheadCache,
+    // Highest payload offset accepted as reusable so far. Mirrors the encoder's accumulated
+    // strict-monotonic-offset invariant on the read side, so that a previous archive written by
+    // an older client without the encode-time check is detected here and the affected file is
+    // re-encoded instead of reused.
+    last_reusable_offset: Option<u64>,
     reuse_stats: ReuseStats,
     split_archive: bool,
 }
@@ -298,6 +303,7 @@ where
         suggested_boundaries,
         previous_payload_index,
         cache: PxarLookaheadCache::new(options.max_cache_size),
+        last_reusable_offset: None,
         reuse_stats: ReuseStats::default(),
         split_archive,
     };
@@ -418,6 +424,16 @@ impl Archiver {
         .boxed()
     }
 
+    /// Record an accepted reusable payload offset from the previous metadata archive.
+    ///
+    /// Returns `false` if `offset` is not strictly greater than the last accepted offset, which
+    /// signals a duplicate or non-monotonic previous archive. The caller must then treat the file
+    /// as non-reusable so the encoder's strict offset check cannot reject the eventual
+    /// `add_payload_ref`. Tracking is global to catch inversions that span cache ranges.
+    fn try_record_reusable_offset(&mut self, offset: u64) -> bool {
+        try_record_strictly_greater(&mut self.last_reusable_offset, offset)
+    }
+
     async fn is_reusable_entry(
         &mut self,
         previous_metadata_accessor: &Option<Directory<MetadataArchiveReader>>,
@@ -437,6 +453,19 @@ impl Archiver {
                         if file_size != *size {
                             return Ok(None);
                         }
+                        // a previous archive written by an older client version may contain
+                        // duplicate or non-monotonic PXAR_PAYLOAD_REF offsets for distinct files.
+                        // Re-encoding the affected file lets the chain self-heal and avoids the
+                        // pxar strict offset check rejecting the whole backup.
+                        if !self.try_record_reusable_offset(*offset) {
+                            let last = self.last_reusable_offset.unwrap_or(0);
+                            warn!(
+                                "re-encode: {file_name:?} previous archive payload offset \
+                                 {offset} not strictly greater than last seen {last}, \
+                                 treating as non-reusable"
+                            );
+                            return Ok(None);
+                        }
                         let range =
                             *offset..*offset + size + size_of::<pxar::format::Header>() as u64;
                         debug!(
@@ -1378,6 +1407,19 @@ impl ReusableDynamicEntry {
     }
 }
 
+/// Updates `last` to `offset` and returns `true` if `offset` is strictly greater than the current
+/// value (or `last` is `None`). Returns `false` without modifying `last` for duplicate or
+/// backwards offsets.
+fn try_record_strictly_greater(last: &mut Option<u64>, offset: u64) -> bool {
+    if let Some(prev) = *last {
+        if offset <= prev {
+            return false;
+        }
+    }
+    *last = Some(offset);
+    true
+}
+
 /// List of dynamic entries containing the data given by an offset range
 fn lookup_dynamic_entries(
     index: &DynamicIndexReader,
@@ -2009,6 +2051,7 @@ mod tests {
                 previous_payload_index,
                 suggested_boundaries: Some(suggested_boundaries),
                 cache: PxarLookaheadCache::new(None),
+                last_reusable_offset: None,
                 reuse_stats: ReuseStats::default(),
                 split_archive: true,
             };
@@ -2071,4 +2114,47 @@ mod tests {
             "take_last_chunk must consume the value"
         );
     }
+
+    /// Strictly-greater bookkeeping for [`Archiver::try_record_reusable_offset`]; mirrors the
+    /// encoder's accumulated strict-monotonic-offset invariant on the read side.
+    #[test]
+    fn try_record_strictly_greater_accepts_increasing() {
+        let mut last = None;
+        assert!(super::try_record_strictly_greater(&mut last, 100));
+        assert_eq!(last, Some(100));
+        assert!(super::try_record_strictly_greater(&mut last, 200));
+        assert_eq!(last, Some(200));
+    }
+
+    #[test]
+    fn try_record_strictly_greater_rejects_equal_and_backwards() {
+        let mut last = Some(200);
+        // duplicate offset: previous archive recorded two distinct files at the same offset
+        assert!(!super::try_record_strictly_greater(&mut last, 200));
+        assert_eq!(last, Some(200), "rejected offset must not update state");
+        // backwards offset: previous archive recorded the next file at a lower offset
+        assert!(!super::try_record_strictly_greater(&mut last, 150));
+        assert_eq!(last, Some(200));
+    }
+
+    #[test]
+    fn try_record_strictly_greater_first_call_accepts_any_value() {
+        let mut last = None;
+        assert!(super::try_record_strictly_greater(&mut last, 0));
+        assert_eq!(last, Some(0));
+    }
+
+    /// Inversions that span cache ranges must be caught by the global tracking; a per-range
+    /// implementation would have reset on flush and accepted the second offset, after which
+    /// the encoder strict check would still abort the backup.
+    #[test]
+    fn try_record_strictly_greater_persists_across_resets() {
+        let mut last = None;
+        assert!(super::try_record_strictly_greater(&mut last, 1000));
+        // a per-range implementation would have cleared `last` here on cache flush
+        assert!(!super::try_record_strictly_greater(&mut last, 500));
+        assert_eq!(last, Some(1000));
+        assert!(super::try_record_strictly_greater(&mut last, 1500));
+        assert_eq!(last, Some(1500));
+    }
 }
-- 
2.47.3


Thread overview: 3+ messages
2026-04-28  1:28 applied: [PATCH 0/2]: client: pxar: make payload-offset handling more robust Thomas Lamprecht
2026-04-28  1:28 ` [PATCH 1/2] client: pxar: inject held chunk on cache-range discontinuity Thomas Lamprecht
2026-04-28  1:28 ` Thomas Lamprecht [this message]
