From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id EE87D1FF2CA
	for <inbox@lore.proxmox.com>; Tue, 23 Jul 2024 12:10:35 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 7342979E;
	Tue, 23 Jul 2024 12:11:10 +0200 (CEST)
From: Dominik Csapak <d.csapak@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Tue, 23 Jul 2024 12:10:36 +0200
Message-Id: <20240723101037.1596714-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Subject: [pbs-devel] [PATCH proxmox-backup 1/2] datastore: data blob:
 increase compression throughput
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

by not using `zstd::stream::copy_encode`, because that has an allocation
pattern that reduces throughput if the target/source storage and the
network are faster than the chunk creation.

Instead, use `zstd::bulk::compress_to_buffer`, which shouldn't do any
big allocations, since we provide the target buffer.

To handle the case that the target buffer is too small, we now ignore
all zstd errors and continue with the uncompressed data, logging the
error unless it is the 'target buffer too small' one.

For now, we have to parse the error string for that, as `zstd` maps all
errors to `io::ErrorKind::Other`. Until that gets changed, there is no
other way to differentiate between different kinds of errors.
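
As a rough illustration of the string matching described above, here is
a std-only sketch. The `CompressFailure` enum and the `classify` helper
are hypothetical names that are not part of this patch; only the
matched message text is taken from the check in the diff.

```rust
use std::io;

/// Hypothetical classification of a failed compression attempt.
#[derive(Debug, PartialEq)]
enum CompressFailure {
    /// Expected for incompressible data: the output did not fit.
    DstTooSmall,
    /// Anything else, with the original message kept for logging.
    Other(String),
}

/// Message text matched by the patch; assumed to stay stable in zstd.
const DST_TOO_SMALL_MSG: &str = "Destination buffer is too small";

/// Since every error arrives as `io::ErrorKind::Other`, the kind carries
/// no information and only the rendered message can be inspected.
fn classify(err: &io::Error) -> CompressFailure {
    let msg = err.to_string();
    if msg.contains(DST_TOO_SMALL_MSG) {
        CompressFailure::DstTooSmall
    } else {
        CompressFailure::Other(msg)
    }
}
```

A caller would log only the `Other` variant. The obvious weakness is
that a reworded message upstream would silently turn the expected case
into a logged error.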

In my local benchmarks from tmpfs to tmpfs on localhost, where I
previously maxed out at ~450MiB/s, I now get ~625MiB/s throughput.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---

Note: if we want a different behavior for the errors, that's also ok
with me, but zstd errors should be rare, I guess (except the target
buffer one), and in that case I find it better to continue with
uncompressed data. For the case that it was a transient error,
the next upload of the chunk will replace the uncompressed one
if it's smaller anyway.
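
To make the fallback behavior easier to discuss, here is a minimal
std-only sketch of the pattern. Everything in it is an assumption for
illustration: `HEADER_LEN`, the one-byte "compressed" flag and
`encode_with_fallback` are invented, and the `compress` closure only
stands in for a function shaped like `zstd::bulk::compress_to_buffer`.
It is not the DataBlob format or the code from the diff.

```rust
use std::io;

/// Hypothetical header size for this sketch; the patch uses
/// `size_of::<DataBlobHeader>()` instead.
const HEADER_LEN: usize = 8;

/// Returns `header || payload`. Byte 0 of the header is 1 if the payload
/// is compressed and 0 otherwise (an invented layout for this sketch).
///
/// `compress` writes into the given slice and returns the number of
/// bytes written, or an error if the slice is too small.
fn encode_with_fallback<F>(data: &[u8], compress: F) -> Vec<u8>
where
    F: FnOnce(&[u8], &mut [u8]) -> io::Result<usize>,
{
    // One allocation up front: header plus at most `data.len()` bytes.
    let mut buf = vec![0u8; HEADER_LEN + data.len()];
    buf[0] = 1;

    match compress(data, &mut buf[HEADER_LEN..]) {
        Ok(size) if size <= data.len() => {
            // Shrink to the bytes actually written and return early.
            buf.truncate(HEADER_LEN + size);
            return buf;
        }
        Ok(_) => {}
        Err(err) => {
            // The "too small" error is expected for incompressible data.
            if !err.to_string().contains("Destination buffer is too small") {
                eprintln!("compression error: {err}");
            }
        }
    }

    // Fallback: reuse the same buffer for the uncompressed payload.
    buf[0] = 0;
    buf[HEADER_LEN..].copy_from_slice(data);
    buf
}
```

With a closure that always fails with the "too small" message, the
result is the flag byte 0 followed by the input unchanged, and nothing
is logged.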

 pbs-datastore/src/data_blob.rs | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index a7a55fb7..92242076 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -136,7 +136,8 @@ impl DataBlob {
 
             DataBlob { raw_data }
         } else {
-            let max_data_len = data.len() + std::mem::size_of::<DataBlobHeader>();
+            let header_len = std::mem::size_of::<DataBlobHeader>();
+            let max_data_len = data.len() + header_len;
             if compress {
                 let mut comp_data = Vec::with_capacity(max_data_len);
 
@@ -147,15 +148,25 @@ impl DataBlob {
                 unsafe {
                     comp_data.write_le_value(head)?;
                 }
-
-                zstd::stream::copy_encode(data, &mut comp_data, 1)?;
-
-                if comp_data.len() < max_data_len {
-                    let mut blob = DataBlob {
-                        raw_data: comp_data,
-                    };
-                    blob.set_crc(blob.compute_crc());
-                    return Ok(blob);
+                comp_data.resize(max_data_len, 0u8);
+
+                match zstd::bulk::compress_to_buffer(data, &mut comp_data[header_len..], 1) {
+                    Ok(size) if size <= data.len() => {
+                        comp_data.resize(header_len + size, 0u8);
+                        let mut blob = DataBlob {
+                            raw_data: comp_data,
+                        };
+                        blob.set_crc(blob.compute_crc());
+                        return Ok(blob);
+                    }
+                    // if size is bigger than the data, or any error is returned, continue with the
+                    // uncompressed data, but log all errors besides "buffer too small"
+                    Ok(_) => {}
+                    Err(err) => {
+                        if !err.to_string().contains("Destination buffer is too small") {
+                            log::error!("zstd compression error: {err}");
+                        }
+                    }
                 }
             }
 
-- 
2.39.2


