From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id A58F41FF14F for ; Tue, 28 Apr 2026 03:47:15 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 7E2D31CE7; Tue, 28 Apr 2026 03:47:15 +0200 (CEST) From: Thomas Lamprecht To: pbs-devel@lists.proxmox.com Subject: [PATCH 1/2] client: pxar: inject held chunk on cache-range discontinuity Date: Tue, 28 Apr 2026 03:28:09 +0200 Message-ID: <20260428014635.2655155-2-t.lamprecht@proxmox.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260428014635.2655155-1-t.lamprecht@proxmox.com> References: <20260428014635.2655155-1-t.lamprecht@proxmox.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1777340705300 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.003 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [create.rs] Message-ID-Hash: JMINEZ5BDRP7LTBWCIEVKG2W3PGS7NHC X-Message-ID-Hash: JMINEZ5BDRP7LTBWCIEVKG2W3PGS7NHC X-MailFrom: t.lamprecht@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox Backup Server development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: When the cached payload range cannot be extended because a non-reusable file or directory end opens a hole, the previous flush may have held back its last chunk for continuation. If the new range's first chunk shares a digest with the held chunk via dedup, but the two correspond to different physical positions in the previous archive, the next flush's digest-match fast path collapses both into one without ever injecting the held chunk's bytes. The encoder's payload_write_position then trails the previous_payload_offset recorded during this flush, and the next add_payload_ref produces a backwards PXAR_PAYLOAD_REF offset, which the strict offset check introduced in pxar 1.0.1 rejects. Inject any held chunk explicitly when a cache-range discontinuity is detected, before update_range, so no held chunk can survive across a hole. Healthy archives are unaffected: the held chunk only exists when keep_last_chunk = true, and the inject path is the same one used by every other branch that already advances the payload position. The added test pins the cache contract this fix relies on: update_range alone must not drop last_chunk, so the explicit take_last_chunk + inject pair has to stay on the hole path. Signed-off-by: Thomas Lamprecht --- pbs-client/src/pxar/create.rs | 37 +++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs index e42bcc87f..15e149a23 100644 --- a/pbs-client/src/pxar/create.rs +++ b/pbs-client/src/pxar/create.rs @@ -822,6 +822,14 @@ impl Archiver { debug!("Cache range has hole, new range: {payload_range:?}"); self.flush_cached_reusing_if_below_threshold(encoder, true) .await?; + // Inject any held-back chunk now: the new range is non-contiguous, so it + // cannot legitimately merge with the next range's first chunk. Without this, + // a digest collision via dedup would let the next flush skip the injection, + // leaving the encoder's payload position behind the recorded offset and + // writing a backwards PXAR_PAYLOAD_REF. + if let Some(held) = self.cache.take_last_chunk() { + self.inject_chunks_at_current_payload_position(encoder, vec![held])?; + } // range has to be set after flushing of cached entries, which resets the range self.cache.update_range(payload_range.clone()); } @@ -2034,4 +2042,33 @@ mod tests { Ok::<(), Error>(()) }) } + + /// Pins the contract the cache-hole path in [`Archiver`] depends on: `update_range` must not + /// clear `last_chunk`. The hole-path must take and inject the held chunk before + /// `update_range`, otherwise it would carry across the discontinuity and the next flush's + /// digest-match fast-path could collapse two physically separate but content-identical + /// (via dedup) chunks, leaving the encoder's payload position behind the recorded offset. + #[test] + fn cache_update_range_preserves_held_chunk() { + let mut cache = super::PxarLookaheadCache::new(None); + let chunk = super::ReusableDynamicEntry { + size: 1024, + padding: 0, + digest: [0u8; 32], + }; + + cache.update_last_chunk(Some(chunk)); + cache.update_range(100..200); + cache.update_range(500..600); + + assert!( + cache.take_last_chunk().is_some(), + "update_range alone must preserve last_chunk -- if this fails, the explicit \ + take_last_chunk in the cache-hole path of add_entry_to_archive_impl can be removed" + ); + assert!( + cache.take_last_chunk().is_none(), + "take_last_chunk must consume the value" + ); + } } -- 2.47.3