From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <c.ebner@proxmox.com>
Message-ID: <abecd9e2-2f64-4190-8c39-1ce309278cdb@proxmox.com>
Date: Tue, 9 Apr 2024 14:52:02 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Christian Ebner <c.ebner@proxmox.com>
To: =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>,
 pbs-devel@lists.proxmox.com
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
References: <20240328123707.336951-1-c.ebner@proxmox.com>
 <20240328123707.336951-46-c.ebner@proxmox.com>
 <171230450235.1926770.8602698179855647404@yuna.proxmox.com>
 <b48ea31b-2b76-417a-b8b6-7882a75f366f@proxmox.com>
Content-Language: en-US, de-DE
In-Reply-To: <b48ea31b-2b76-417a-b8b6-7882a75f366f@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [pbs-devel] [PATCH v3 proxmox-backup 45/58] client: pxar: add
 method for metadata comparison

On 4/5/24 10:14, Christian Ebner wrote:
> On 4/5/24 10:08, Fabian Grünbichler wrote:
>> Quoting Christian Ebner (2024-03-28 13:36:54)
>>> Adds a method to compare the metadata of the current file entry
>>> against the metadata of the entry looked up in the previous backup
>>> snapshot.
>>>
>>> If the metadata matched, the start offset for the payload stream is
>>> returned.
>>>
>>> This is in preparation for reusing payload chunks for unchanged files.
>>>
>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>> ---
>>> changes since version 2:
>>> - refactored to new padding based threshold
>>>
>>>   pbs-client/src/pxar/create.rs | 31 ++++++++++++++++++++++++++++++-
>>>   1 file changed, 30 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>>> index 79925bba2..c64084a74 100644
>>> --- a/pbs-client/src/pxar/create.rs
>>> +++ b/pbs-client/src/pxar/create.rs
>>> @@ -21,7 +21,7 @@ use pbs_datastore::index::IndexFile;
>>>   use proxmox_sys::error::SysError;
>>>   use pxar::accessor::aio::{Accessor, Directory};
>>>   use pxar::encoder::{LinkOffset, PayloadOffset, SeqWrite};
>>> -use pxar::Metadata;
>>> +use pxar::{EntryKind, Metadata};
>>>   use proxmox_io::vec;
>>>   use proxmox_lang::c_str;
>>> @@ -466,6 +466,35 @@ impl Archiver {
>>>           .boxed()
>>>       }
>>> +    async fn is_reusable_entry(
>>> +        &mut self,
>>> +        previous_metadata_accessor: &mut Directory<LocalDynamicReadAt<RemoteChunkReader>>,
>>> +        file_name: &Path,
>>> +        stat: &FileStat,
>>> +        metadata: &Metadata,
>>> +    ) -> Result<Option<u64>, Error> {
>>> +        if stat.st_nlink > 1 {
>>> +            log::debug!("re-encode: {file_name:?} has hardlinks.");
>>> +            return Ok(None);
>>> +        }
>>
>> it would be nice if we had a way to handle those as well.. what's the
>> current blocker? shouldn't we be able to use the same scheme as for
>> regular archives?
>>
>> first encounter adds (possibly re-uses) the payload and remembers the
>> offset, subsequent ones just add another reference/meta entry?
> 
> True, this is a leftover from the initial approach with the appendix 
> section instead of the split archive where it caused issues.
> 

Hardlinks will be encoded as such in the upcoming version of the
patches. This, however, required additional changes to the pxar encoder,
so that it also returns the LinkOffset for `add_payload_ref` calls. It
also requires an additional HashSet on the Archiver to remember cached
regular files for which the payload chunks have already been looked up,
so that encountering a hard-linked file again does not lead to its
chunks being re-injected.
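
To illustrate the bookkeeping described above, here is a rough,
self-contained sketch. It is not the actual patch: `FileId`,
`HardlinkTracker`, `Encoded`, `cache` and `encode` are hypothetical
names, `LinkOffset` is only a stand-in for the encoder type, and the
map from file to first offset is an assumption about how the
"remember the offset" part could look. Only entries with
`st_nlink > 1` would go through this path.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the pxar encoder's `LinkOffset`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LinkOffset(pub u64);

/// Hypothetical key identifying a file independent of its path.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FileId {
    pub dev: u64,
    pub ino: u64,
}

/// What gets written for an entry in this simplified model.
#[derive(Debug, PartialEq, Eq)]
pub enum Encoded {
    /// First encounter: a payload reference at the returned offset.
    PayloadRef(LinkOffset),
    /// Later encounter: only a hardlink entry pointing at the first one.
    Hardlink(LinkOffset),
}

#[derive(Default)]
pub struct HardlinkTracker {
    /// Files whose payload chunks have already been looked up.
    looked_up: HashSet<FileId>,
    /// Offset returned when the file was first encoded (assumed map).
    first_seen: HashMap<FileId, LinkOffset>,
    /// Fake offset counter, standing in for the real encoder position.
    next_offset: u64,
    /// Number of chunk lookups performed, for demonstration only.
    pub chunk_lookups: usize,
}

impl HardlinkTracker {
    /// Called when an entry is cached. Returns `true` if the chunks
    /// for this file still have to be looked up and injected.
    pub fn cache(&mut self, id: FileId) -> bool {
        // `HashSet::insert` returns `false` if the value was present.
        let first_time = self.looked_up.insert(id);
        if first_time {
            self.chunk_lookups += 1;
        }
        first_time
    }

    /// Called when an entry is encoded into the archive.
    pub fn encode(&mut self, id: FileId) -> Encoded {
        if let Some(&offset) = self.first_seen.get(&id) {
            return Encoded::Hardlink(offset);
        }
        let offset = LinkOffset(self.next_offset);
        self.next_offset += 1;
        self.first_seen.insert(id, offset);
        Encoded::PayloadRef(offset)
    }
}
```

With this, two cached paths pointing at the same inode trigger a
single chunk lookup, and the second path is encoded as a hardlink to
the offset returned for the first one.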