From: Fabian Grünbichler
To: Christian Ebner, Proxmox Backup Server development discussion
Date: Tue, 09 Apr 2024 09:19:40 +0200
Subject: Re: [pbs-devel] [PATCH v3 proxmox-backup 40/58] client: chunk stream: add dynamic entries injection queues
Message-Id: <1712646920.o9k1jsiy8t.astroid@yuna.none>
References: <20240328123707.336951-1-c.ebner@proxmox.com> <20240328123707.336951-41-c.ebner@proxmox.com> <1712241225.maig1bup9p.astroid@yuna.none>

On April 8, 2024 3:54 pm, Christian Ebner wrote:
> On 4/4/24 16:52, Fabian Grünbichler wrote:
>> once more I am wondering here whether for the payload stream, a vastly
>> simplified chunker that just picks the boundaries based on re-use and
>> payload size(s) (to avoid the one file == one chunk pathological case
>> for lots of small files) wouldn't improve performance :)
>
> Do you suggest having two chunker implementations, where the payload
> stream gets its chunk boundaries provided via some interface instead of
> computing them with the statistical sliding window approach? As you
> mentioned in response to Dietmar on patch 49 of this version of the
> patch series?

yes - I think it would be interesting to evaluate, but only if such an
experiment is not a week-long effort :) the two main questions would be:

- is a metadata-informed chunker faster than the sliding window (and if
  so, by how much)
- how does the dedup rate compare for some common scenarios

so maybe it would make sense to have a "change based" test corpus first
(which we IMHO want anyway), and then compare the two.
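
to make the "metadata-informed chunker" idea a bit more concrete, here is a minimal sketch. it only covers the payload-size half (grouping small files, splitting big ones) and ignores re-use entirely. `payload_boundaries`, `min_chunk` and `max_chunk` are hypothetical names for illustration, nothing like this exists in proxmox-backup as far as this thread shows, and it assumes the per-file payload sizes are known from metadata before the data is read:

```rust
/// Hypothetical helper (an assumption, not proxmox-backup API): pick chunk
/// end offsets for a payload stream from known per-file payload sizes.
///
/// Consecutive files are grouped until at least `min_chunk` bytes are
/// collected (avoids "one file == one chunk" for lots of small files),
/// and long runs are split so that no chunk exceeds `max_chunk` bytes.
fn payload_boundaries(file_sizes: &[u64], min_chunk: u64, max_chunk: u64) -> Vec<u64> {
    assert!(min_chunk > 0 && min_chunk <= max_chunk);

    let mut boundaries = Vec::new();
    let mut chunk_start = 0u64; // offset where the current chunk begins
    let mut offset = 0u64; // end offset of all files seen so far

    for &size in file_sizes {
        offset += size;

        // Split oversized runs at fixed `max_chunk` steps.
        while offset - chunk_start > max_chunk {
            chunk_start += max_chunk;
            boundaries.push(chunk_start);
        }

        // Close the chunk at a file boundary once it is large enough.
        if offset - chunk_start >= min_chunk {
            boundaries.push(offset);
            chunk_start = offset;
        }
    }

    // Flush whatever is left as a final (possibly small) chunk.
    if offset > chunk_start {
        boundaries.push(offset);
    }

    boundaries
}
```

with file sizes 10, 20, 30, 500 and 5 bytes, `min_chunk = 50` and `max_chunk = 200`, the three small files end up in one chunk, the 500 byte file is split, and the trailing 5 bytes form a last small chunk. no rolling hash is computed at all, which is where any speedup would have to come from. how the dedup rate compares is exactly the open question above.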