From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <c.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id B187F920DA
 for <pbs-devel@lists.proxmox.com>; Fri,  5 Apr 2024 10:15:24 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 8D795F4DA
 for <pbs-devel@lists.proxmox.com>; Fri,  5 Apr 2024 10:14:54 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pbs-devel@lists.proxmox.com>; Fri,  5 Apr 2024 10:14:53 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id AED3245870
 for <pbs-devel@lists.proxmox.com>; Fri,  5 Apr 2024 10:14:53 +0200 (CEST)
Message-ID: <b48ea31b-2b76-417a-b8b6-7882a75f366f@proxmox.com>
Date: Fri, 5 Apr 2024 10:14:52 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>,
 pbs-devel@lists.proxmox.com
References: <20240328123707.336951-1-c.ebner@proxmox.com>
 <20240328123707.336951-46-c.ebner@proxmox.com>
 <171230450235.1926770.8602698179855647404@yuna.proxmox.com>
Content-Language: en-US, de-DE
From: Christian Ebner <c.ebner@proxmox.com>
In-Reply-To: <171230450235.1926770.8602698179855647404@yuna.proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.031 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [create.rs]
Subject: Re: [pbs-devel] [PATCH v3 proxmox-backup 45/58] client: pxar: add
 method for metadata comparison
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Fri, 05 Apr 2024 08:15:24 -0000

On 4/5/24 10:08, Fabian Grünbichler wrote:
> Quoting Christian Ebner (2024-03-28 13:36:54)
>> Adds a method to compare the metadata of the current file entry
>> against the metadata of the entry looked up in the previous backup
>> snapshot.
>>
>> If the metadata matched, the start offset for the payload stream is
>> returned.
>>
>> This is in preparation for reusing payload chunks for unchanged files.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 2:
>> - refactored to new padding based threshold
>>
>>   pbs-client/src/pxar/create.rs | 31 ++++++++++++++++++++++++++++++-
>>   1 file changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>> index 79925bba2..c64084a74 100644
>> --- a/pbs-client/src/pxar/create.rs
>> +++ b/pbs-client/src/pxar/create.rs
>> @@ -21,7 +21,7 @@ use pbs_datastore::index::IndexFile;
>>   use proxmox_sys::error::SysError;
>>   use pxar::accessor::aio::{Accessor, Directory};
>>   use pxar::encoder::{LinkOffset, PayloadOffset, SeqWrite};
>> -use pxar::Metadata;
>> +use pxar::{EntryKind, Metadata};
>>   
>>   use proxmox_io::vec;
>>   use proxmox_lang::c_str;
>> @@ -466,6 +466,35 @@ impl Archiver {
>>           .boxed()
>>       }
>>   
>> +    async fn is_reusable_entry(
>> +        &mut self,
>> +        previous_metadata_accessor: &mut Directory<LocalDynamicReadAt<RemoteChunkReader>>,
>> +        file_name: &Path,
>> +        stat: &FileStat,
>> +        metadata: &Metadata,
>> +    ) -> Result<Option<u64>, Error> {
>> +        if stat.st_nlink > 1 {
>> +            log::debug!("re-encode: {file_name:?} has hardlinks.");
>> +            return Ok(None);
>> +        }
> 
> it would be nice if we had a way to handle those as well.. what's the current
> blocker? shouldn't we be able to use the same scheme as for regular archives?
> 
> first encounter adds (possibly re-uses) the payload and remembers the offset,
> subsequent ones just add another reference/meta entry?

True, this is a leftover from the initial approach, which used an appendix
section instead of the split archive; hardlinks caused issues there.
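
The scheme suggested above (first encounter adds or re-uses the payload
and remembers the offset, subsequent ones only add a reference) could be
sketched roughly as follows. This is only an illustration under
assumptions: `HardlinkTracker`, `HardlinkAction` and the `(st_dev, st_ino)`
key are hypothetical names, not existing pxar or pbs-client API, and the
offset is a plain u64 rather than a `PayloadOffset`.

```rust
use std::collections::HashMap;

/// Hypothetical key identifying a hardlinked inode: (st_dev, st_ino).
type InodeKey = (u64, u64);

/// What to do with a file entry that has st_nlink > 1.
#[derive(Debug, PartialEq)]
enum HardlinkAction {
    /// First encounter: encode (or re-use) the payload at this offset.
    EncodePayload(u64),
    /// Later encounter: only add a metadata entry referencing this offset.
    Reference(u64),
}

/// Remembers the payload offset recorded for each hardlinked inode.
#[derive(Default)]
struct HardlinkTracker {
    seen: HashMap<InodeKey, u64>,
}

impl HardlinkTracker {
    /// `next_payload_offset` is where the payload would start if this
    /// turns out to be the first encounter of the inode.
    fn visit(&mut self, key: InodeKey, next_payload_offset: u64) -> HardlinkAction {
        match self.seen.get(&key) {
            // Inode already encoded: point at the remembered offset.
            Some(&offset) => HardlinkAction::Reference(offset),
            // First encounter: remember where the payload starts.
            None => {
                self.seen.insert(key, next_payload_offset);
                HardlinkAction::EncodePayload(next_payload_offset)
            }
        }
    }
}
```

With something like this, `is_reusable_entry` would not need to bail out
early on `stat.st_nlink > 1`; whether the first encounter can re-use the
previous payload chunks would still depend on the metadata comparison.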