public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput
@ 2024-08-05  9:24 Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
  To: pbs-devel

In my tests (against current master), this series improved throughput
when the source/target storage is fast enough (tmpfs -> tmpfs):

Type                master (MiB/s)   with my patches (MiB/s)
.img file           ~614             ~767
pxar one big file   ~657             ~807
pxar small files    ~576             ~627

(these results are also in the relevant commit message)

It would be great if someone else could cross-check my results here.
Note: the pxar code being faster than the .img code seems to stem
from better multithreaded pipelining in that code or in tokio (the pxar
codepath scales more directly with more cores than the .img codepath).

changes from v2:
* use zstd_safe instead of zstd so we have access to the underlying
  error code
* add test for the error code handling since that's not part of the
  public zstd api, only an implementation detail (albeit one that's
  not likely to change soon)
* separated the tests for the decode(encode()) roundtrip so a failure
  can more easily be assigned to a specific codepath

changes from v1:
* reorder patches so that the data blob writer removal is the first one
* add tests for DataBlob that we can decode what we encoded
  (to see that my patches don't mess up the chunk generation)
* add new patch to cleanup the `encode` function a bit

Dominik Csapak (5):
  remove data blob writer
  datastore: test DataBlob encode/decode roundtrip
  datastore: data blob: add helper and test for checking zstd_safe error
    code
  datastore: data blob: increase compression throughput
  datastore: DataBlob encode: simplify code

 Cargo.toml                            |   1 +
 pbs-datastore/Cargo.toml              |   1 +
 pbs-datastore/src/data_blob.rs        | 193 ++++++++++++++++-------
 pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
 pbs-datastore/src/lib.rs              |   2 -
 tests/blob_writer.rs                  | 105 -------------
 6 files changed, 144 insertions(+), 370 deletions(-)
 delete mode 100644 pbs-datastore/src/data_blob_writer.rs
 delete mode 100644 tests/blob_writer.rs

-- 
2.39.2

* [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
@ 2024-08-05  9:24 ` Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
  To: pbs-devel

this is leftover code that is not currently used outside of tests.
Should we need it again, we can just revert this commit.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
 pbs-datastore/src/lib.rs              |   2 -
 tests/blob_writer.rs                  | 105 -------------
 3 files changed, 319 deletions(-)
 delete mode 100644 pbs-datastore/src/data_blob_writer.rs
 delete mode 100644 tests/blob_writer.rs

diff --git a/pbs-datastore/src/data_blob_writer.rs b/pbs-datastore/src/data_blob_writer.rs
deleted file mode 100644
index 30d9645f..00000000
--- a/pbs-datastore/src/data_blob_writer.rs
+++ /dev/null
@@ -1,212 +0,0 @@
-use std::io::{Seek, SeekFrom, Write};
-use std::sync::Arc;
-
-use anyhow::Error;
-
-use proxmox_io::WriteExt;
-
-use pbs_tools::crypt_config::CryptConfig;
-
-use crate::checksum_writer::ChecksumWriter;
-use crate::crypt_writer::CryptWriter;
-use crate::file_formats::{self, DataBlobHeader, EncryptedDataBlobHeader};
-
-enum BlobWriterState<'writer, W: Write> {
-    Uncompressed {
-        csum_writer: ChecksumWriter<W>,
-    },
-    Compressed {
-        compr: zstd::stream::write::Encoder<'writer, ChecksumWriter<W>>,
-    },
-    Encrypted {
-        crypt_writer: CryptWriter<ChecksumWriter<W>>,
-    },
-    EncryptedCompressed {
-        compr: zstd::stream::write::Encoder<'writer, CryptWriter<ChecksumWriter<W>>>,
-    },
-}
-
-/// Data blob writer
-pub struct DataBlobWriter<'writer, W: Write> {
-    state: BlobWriterState<'writer, W>,
-}
-
-impl<W: Write + Seek> DataBlobWriter<'_, W> {
-    pub fn new_uncompressed(mut writer: W) -> Result<Self, Error> {
-        writer.seek(SeekFrom::Start(0))?;
-        let head = DataBlobHeader {
-            magic: file_formats::UNCOMPRESSED_BLOB_MAGIC_1_0,
-            crc: [0; 4],
-        };
-        unsafe {
-            writer.write_le_value(head)?;
-        }
-        let csum_writer = ChecksumWriter::new(writer, None);
-        Ok(Self {
-            state: BlobWriterState::Uncompressed { csum_writer },
-        })
-    }
-
-    pub fn new_compressed(mut writer: W) -> Result<Self, Error> {
-        writer.seek(SeekFrom::Start(0))?;
-        let head = DataBlobHeader {
-            magic: file_formats::COMPRESSED_BLOB_MAGIC_1_0,
-            crc: [0; 4],
-        };
-        unsafe {
-            writer.write_le_value(head)?;
-        }
-        let csum_writer = ChecksumWriter::new(writer, None);
-        let compr = zstd::stream::write::Encoder::new(csum_writer, 1)?;
-        Ok(Self {
-            state: BlobWriterState::Compressed { compr },
-        })
-    }
-
-    pub fn new_encrypted(mut writer: W, config: Arc<CryptConfig>) -> Result<Self, Error> {
-        writer.seek(SeekFrom::Start(0))?;
-        let head = EncryptedDataBlobHeader {
-            head: DataBlobHeader {
-                magic: file_formats::ENCRYPTED_BLOB_MAGIC_1_0,
-                crc: [0; 4],
-            },
-            iv: [0u8; 16],
-            tag: [0u8; 16],
-        };
-        unsafe {
-            writer.write_le_value(head)?;
-        }
-
-        let csum_writer = ChecksumWriter::new(writer, None);
-        let crypt_writer = CryptWriter::new(csum_writer, config)?;
-        Ok(Self {
-            state: BlobWriterState::Encrypted { crypt_writer },
-        })
-    }
-
-    pub fn new_encrypted_compressed(
-        mut writer: W,
-        config: Arc<CryptConfig>,
-    ) -> Result<Self, Error> {
-        writer.seek(SeekFrom::Start(0))?;
-        let head = EncryptedDataBlobHeader {
-            head: DataBlobHeader {
-                magic: file_formats::ENCR_COMPR_BLOB_MAGIC_1_0,
-                crc: [0; 4],
-            },
-            iv: [0u8; 16],
-            tag: [0u8; 16],
-        };
-        unsafe {
-            writer.write_le_value(head)?;
-        }
-
-        let csum_writer = ChecksumWriter::new(writer, None);
-        let crypt_writer = CryptWriter::new(csum_writer, config)?;
-        let compr = zstd::stream::write::Encoder::new(crypt_writer, 1)?;
-        Ok(Self {
-            state: BlobWriterState::EncryptedCompressed { compr },
-        })
-    }
-
-    pub fn finish(self) -> Result<W, Error> {
-        match self.state {
-            BlobWriterState::Uncompressed { csum_writer } => {
-                // write CRC
-                let (mut writer, crc, _) = csum_writer.finish()?;
-                let head = DataBlobHeader {
-                    magic: file_formats::UNCOMPRESSED_BLOB_MAGIC_1_0,
-                    crc: crc.to_le_bytes(),
-                };
-
-                writer.seek(SeekFrom::Start(0))?;
-                unsafe {
-                    writer.write_le_value(head)?;
-                }
-
-                Ok(writer)
-            }
-            BlobWriterState::Compressed { compr } => {
-                let csum_writer = compr.finish()?;
-                let (mut writer, crc, _) = csum_writer.finish()?;
-
-                let head = DataBlobHeader {
-                    magic: file_formats::COMPRESSED_BLOB_MAGIC_1_0,
-                    crc: crc.to_le_bytes(),
-                };
-
-                writer.seek(SeekFrom::Start(0))?;
-                unsafe {
-                    writer.write_le_value(head)?;
-                }
-
-                Ok(writer)
-            }
-            BlobWriterState::Encrypted { crypt_writer } => {
-                let (csum_writer, iv, tag) = crypt_writer.finish()?;
-                let (mut writer, crc, _) = csum_writer.finish()?;
-
-                let head = EncryptedDataBlobHeader {
-                    head: DataBlobHeader {
-                        magic: file_formats::ENCRYPTED_BLOB_MAGIC_1_0,
-                        crc: crc.to_le_bytes(),
-                    },
-                    iv,
-                    tag,
-                };
-                writer.seek(SeekFrom::Start(0))?;
-                unsafe {
-                    writer.write_le_value(head)?;
-                }
-                Ok(writer)
-            }
-            BlobWriterState::EncryptedCompressed { compr } => {
-                let crypt_writer = compr.finish()?;
-                let (csum_writer, iv, tag) = crypt_writer.finish()?;
-                let (mut writer, crc, _) = csum_writer.finish()?;
-
-                let head = EncryptedDataBlobHeader {
-                    head: DataBlobHeader {
-                        magic: file_formats::ENCR_COMPR_BLOB_MAGIC_1_0,
-                        crc: crc.to_le_bytes(),
-                    },
-                    iv,
-                    tag,
-                };
-                writer.seek(SeekFrom::Start(0))?;
-                unsafe {
-                    writer.write_le_value(head)?;
-                }
-                Ok(writer)
-            }
-        }
-    }
-}
-
-impl<W: Write + Seek> Write for DataBlobWriter<'_, W> {
-    fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> {
-        match self.state {
-            BlobWriterState::Uncompressed {
-                ref mut csum_writer,
-            } => csum_writer.write(buf),
-            BlobWriterState::Compressed { ref mut compr } => compr.write(buf),
-            BlobWriterState::Encrypted {
-                ref mut crypt_writer,
-            } => crypt_writer.write(buf),
-            BlobWriterState::EncryptedCompressed { ref mut compr } => compr.write(buf),
-        }
-    }
-
-    fn flush(&mut self) -> Result<(), std::io::Error> {
-        match self.state {
-            BlobWriterState::Uncompressed {
-                ref mut csum_writer,
-            } => csum_writer.flush(),
-            BlobWriterState::Compressed { ref mut compr } => compr.flush(),
-            BlobWriterState::Encrypted {
-                ref mut crypt_writer,
-            } => crypt_writer.flush(),
-            BlobWriterState::EncryptedCompressed { ref mut compr } => compr.flush(),
-        }
-    }
-}
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 3e4aa34c..202b0955 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -179,7 +179,6 @@ pub mod crypt_reader;
 pub mod crypt_writer;
 pub mod data_blob;
 pub mod data_blob_reader;
-pub mod data_blob_writer;
 pub mod file_formats;
 pub mod index;
 pub mod manifest;
@@ -201,7 +200,6 @@ pub use crypt_reader::CryptReader;
 pub use crypt_writer::CryptWriter;
 pub use data_blob::DataBlob;
 pub use data_blob_reader::DataBlobReader;
-pub use data_blob_writer::DataBlobWriter;
 pub use manifest::BackupManifest;
 pub use store_progress::StoreProgress;
 
diff --git a/tests/blob_writer.rs b/tests/blob_writer.rs
deleted file mode 100644
index 23a3283d..00000000
--- a/tests/blob_writer.rs
+++ /dev/null
@@ -1,105 +0,0 @@
-use std::io::Cursor;
-use std::io::{Read, Seek, SeekFrom, Write};
-use std::sync::Arc;
-
-use anyhow::{bail, Error};
-use lazy_static::lazy_static;
-
-use pbs_datastore::{DataBlob, DataBlobReader, DataBlobWriter};
-use pbs_tools::crypt_config::CryptConfig;
-
-lazy_static! {
-    static ref TEST_DATA: Vec<u8> = {
-        let mut data = Vec::new();
-
-        for i in 0..100_000 {
-            data.push((i % 255) as u8);
-        }
-
-        data
-    };
-    static ref CRYPT_CONFIG: Arc<CryptConfig> = {
-        let key = [1u8; 32];
-        Arc::new(CryptConfig::new(key).unwrap())
-    };
-    static ref TEST_DIGEST_PLAIN: [u8; 32] = [
-        83, 154, 96, 195, 167, 204, 38, 142, 204, 224, 130, 201, 24, 71, 2, 188, 130, 155, 177, 6,
-        162, 100, 61, 238, 38, 219, 63, 240, 191, 132, 87, 238
-    ];
-    static ref TEST_DIGEST_ENC: [u8; 32] = [
-        50, 162, 191, 93, 255, 132, 9, 14, 127, 23, 92, 39, 246, 102, 245, 204, 130, 104, 4, 106,
-        182, 239, 218, 14, 80, 17, 150, 188, 239, 253, 198, 117
-    ];
-}
-
-fn verify_test_blob(mut cursor: Cursor<Vec<u8>>, digest: &[u8; 32]) -> Result<(), Error> {
-    // run read tests with different buffer sizes
-    for size in [1, 3, 64 * 1024].iter() {
-        println!("Starting DataBlobReader test (size = {})", size);
-
-        cursor.seek(SeekFrom::Start(0))?;
-        let mut reader = DataBlobReader::new(&mut cursor, Some(CRYPT_CONFIG.clone()))?;
-        let mut buffer = Vec::<u8>::new();
-        // read the whole file
-        //reader.read_to_end(&mut buffer)?;
-        let mut buf = vec![0u8; *size];
-        loop {
-            let count = reader.read(&mut buf)?;
-            if count == 0 {
-                break;
-            }
-            buffer.extend(&buf[..count]);
-        }
-
-        reader.finish()?;
-        if buffer != *TEST_DATA {
-            bail!("blob data is wrong (read buffer size {})", size);
-        }
-    }
-
-    let raw_data = cursor.into_inner();
-
-    let blob = DataBlob::load_from_reader(&mut &raw_data[..])?;
-
-    let data = blob.decode(Some(&CRYPT_CONFIG), Some(digest))?;
-    if data != *TEST_DATA {
-        bail!("blob data is wrong (decode)");
-    }
-    Ok(())
-}
-
-#[test]
-fn test_uncompressed_blob_writer() -> Result<(), Error> {
-    let tmp = Cursor::new(Vec::<u8>::new());
-    let mut blob_writer = DataBlobWriter::new_uncompressed(tmp)?;
-    blob_writer.write_all(&TEST_DATA)?;
-
-    verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_PLAIN)
-}
-
-#[test]
-fn test_compressed_blob_writer() -> Result<(), Error> {
-    let tmp = Cursor::new(Vec::<u8>::new());
-    let mut blob_writer = DataBlobWriter::new_compressed(tmp)?;
-    blob_writer.write_all(&TEST_DATA)?;
-
-    verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_PLAIN)
-}
-
-#[test]
-fn test_encrypted_blob_writer() -> Result<(), Error> {
-    let tmp = Cursor::new(Vec::<u8>::new());
-    let mut blob_writer = DataBlobWriter::new_encrypted(tmp, CRYPT_CONFIG.clone())?;
-    blob_writer.write_all(&TEST_DATA)?;
-
-    verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_ENC)
-}
-
-#[test]
-fn test_encrypted_compressed_blob_writer() -> Result<(), Error> {
-    let tmp = Cursor::new(Vec::<u8>::new());
-    let mut blob_writer = DataBlobWriter::new_encrypted_compressed(tmp, CRYPT_CONFIG.clone())?;
-    blob_writer.write_all(&TEST_DATA)?;
-
-    verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_ENC)
-}
-- 
2.39.2

* [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
@ 2024-08-05  9:24 ` Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
  To: pbs-devel

so that we can be sure we can decode an encoded blob again

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v2:
* separate into individual tests

 pbs-datastore/src/data_blob.rs | 79 ++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index a7a55fb7..a3a41c5e 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -562,3 +562,82 @@ impl<'a, 'b> DataChunkBuilder<'a, 'b> {
         chunk_builder.build()
     }
 }
+
+#[cfg(test)]
+mod test {
+    use pbs_tools::crypt_config::CryptConfig;
+
+    use super::DataChunkBuilder;
+
+    const TEST_DATA_LEN: usize = 50;
+
+    fn build_test_data() -> Vec<u8> {
+        let mut data = Vec::with_capacity(TEST_DATA_LEN);
+        for i in 0..TEST_DATA_LEN / 10 {
+            for _ in 0..10 {
+                data.push(i as u8);
+            }
+        }
+        data
+    }
+
+    #[test]
+    fn unencrypted_uncompressed() {
+        let data = build_test_data();
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(false)
+            .build()
+            .expect("could not create unencrypted, uncompressed chunk");
+
+        let data_decoded = chunk
+            .decode(None, Some(&digest))
+            .expect("cannot decode unencrypted, uncompressed chunk");
+        assert_eq!(data, data_decoded);
+    }
+
+    #[test]
+    fn unencrypted_compressed() {
+        let data = build_test_data();
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(true)
+            .build()
+            .expect("could not create unencrypted, compressed chunk");
+
+        let data_decoded = chunk
+            .decode(None, Some(&digest))
+            .expect("cannot decode unencrypted, compressed chunk");
+        assert_eq!(data, data_decoded);
+    }
+
+    #[test]
+    fn encrypted_uncompressed() {
+        let data = build_test_data();
+        let crypt_config = CryptConfig::new([9; 32]).expect("could not create crypt config");
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(false)
+            .crypt_config(&crypt_config)
+            .build()
+            .expect("could not create encrypted, uncompressed chunk");
+
+        let data_decoded = chunk
+            .decode(Some(&crypt_config), Some(&digest))
+            .expect("cannot decode encrypted, uncompressed chunk");
+        assert_eq!(data, data_decoded);
+    }
+
+    #[test]
+    fn encrypted_compressed() {
+        let data = build_test_data();
+        let crypt_config = CryptConfig::new([9; 32]).expect("could not create crypt config");
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(true)
+            .crypt_config(&crypt_config)
+            .build()
+            .expect("could not create encrypted, compressed chunk");
+
+        let data_decoded = chunk
+            .decode(Some(&crypt_config), Some(&digest))
+            .expect("cannot decode encrypted, compressed chunk");
+        assert_eq!(data, data_decoded);
+    }
+}
-- 
2.39.2

* [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
@ 2024-08-05  9:24 ` Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
  To: pbs-devel

We want to check whether a zstd error code means 'Destination buffer is
too small', but there is no practical API for that at the moment. So we
introduce a helper that uses zstd's internal error-code encoding to
determine this.

Since this is not guaranteed to be a stable API, add a test for it so we
catch a change early, at build time. This should be fine, as long as this
zstd behavior only changes with e.g. major Debian upgrades (which means
a new zstd version).

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
new in v3

note that this probably triggers an unused-function warning until the
next patch. it would also be possible to merge this patch into the next
one if that's wanted
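
For reference, here is the helper together with the forced-failure
check, condensed into one self-contained sketch (assuming the zstd-safe
crate added in the diff below; with a `Vec` as destination,
`zstd_safe::compress` only uses the vector's spare capacity, so an
empty `Vec` reliably forces the 'destination buffer too small' error):

    // zstd encodes errors as usize values close to usize::MAX;
    // subtracting the code from 0 (with wrapping) recovers the internal
    // error enum, where ZSTD_error_dstSize_tooSmall == 70. this mirrors
    // zstd's private error_private.h logic and is not a stable API.
    fn zstd_error_is_target_too_small(err: usize) -> bool {
        let (real_code, _) = 0usize.overflowing_sub(err);
        real_code == 70
    }

    fn main() {
        // an empty Vec has no spare capacity, so compression must fail
        // with 'Destination buffer is too small'
        let mut target: Vec<u8> = Vec::new();
        match zstd_safe::compress(&mut target, b"some data", 1) {
            Ok(_) => unreachable!("compress into empty buffer succeeded"),
            Err(err) => assert!(zstd_error_is_target_too_small(err)),
        }
    }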

 Cargo.toml                     |  1 +
 pbs-datastore/Cargo.toml       |  1 +
 pbs-datastore/src/data_blob.rs | 28 +++++++++++++++++++++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Cargo.toml b/Cargo.toml
index 2b51ec82..275e3c95 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -158,6 +158,7 @@ url = "2.1"
 walkdir = "2"
 xdg = "2.2"
 zstd = { version = "0.12", features = [ "bindgen" ] }
+zstd-safe = "6.0"
 
 [dependencies]
 anyhow.workspace = true
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index d8997e1d..494c231b 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -23,6 +23,7 @@ tokio = { workspace = true, features = [] }
 tracing.workspace = true
 walkdir.workspace = true
 zstd.workspace = true
+zstd-safe.workspace = true
 
 pathpatterns.workspace = true
 pxar.workspace = true
diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index a3a41c5e..adf5a932 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -12,6 +12,17 @@ use super::file_formats::*;
 
 const MAX_BLOB_SIZE: usize = 128 * 1024 * 1024;
 
+// tests if the error code was 'Destination buffer is too small'
+// by subtracting the error code from 0 (with underflow)
+// see https://github.com/facebook/zstd/blob/dev/lib/common/error_private.h
+// we test for this below so we catch errors if the interface changes
+fn zstd_error_is_target_too_small(err: usize) -> bool {
+    let (real_code, _) = 0usize.overflowing_sub(err);
+    // ZSTD_error_dstSize_tooSmall = 70,
+    // see https://github.com/facebook/zstd/blob/dev/lib/zstd_errors.h
+    real_code == 70
+}
+
 /// Encoded data chunk with digest and positional information
 pub struct ChunkInfo {
     pub chunk: DataBlob,
@@ -567,7 +578,7 @@ impl<'a, 'b> DataChunkBuilder<'a, 'b> {
 mod test {
     use pbs_tools::crypt_config::CryptConfig;
 
-    use super::DataChunkBuilder;
+    use super::{zstd_error_is_target_too_small, DataChunkBuilder};
 
     const TEST_DATA_LEN: usize = 50;
 
@@ -640,4 +651,19 @@ mod test {
             .expect("cannot decode encrypted, compressed chunk");
         assert_eq!(data, data_decoded);
     }
+
+    #[test]
+    fn zstd_error_code_test() {
+        // test for the error code internal logic of zstd so we catch interface changes here
+        let data = &build_test_data();
+        let mut target = Vec::new();
+        match zstd_safe::compress(&mut target, data, 1) {
+            Ok(_) => panic!("unexpected success with zero-sized buffer"),
+            Err(err) => {
+                if !zstd_error_is_target_too_small(err) {
+                    panic!("unexpected error code");
+                }
+            }
+        }
+    }
 }
-- 
2.39.2

* [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
                   ` (2 preceding siblings ...)
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak
@ 2024-08-05  9:24 ` Dominik Csapak
  2024-08-05  9:32   ` Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak
  2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht
  5 siblings, 1 reply; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
  To: pbs-devel

by not using `zstd::stream::copy_encode`, because that has an allocation
pattern that reduces throughput if the target/source storage and the
network are faster than the chunk creation.

instead, use `zstd_safe::compress`, which shouldn't do any big
allocations, since we provide the target buffer.

To handle the case that the target buffer is too small, we now ignore
all zstd errors and continue with the uncompressed data, logging the
error unless it was 'target buffer too small' (which is expected when
the data is incompressible, since we size the buffer for the
uncompressed length).
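
A minimal sketch of the resulting call pattern (the function name and
signature are illustrative only; the real change is in the diff below):

    use std::io::Write;

    // compress directly into a pre-allocated buffer already sized for
    // the worst case (header + uncompressed payload); on any failure,
    // fall back to storing the payload uncompressed
    fn encode_payload(data: &[u8], header_len: usize) -> std::io::Result<Vec<u8>> {
        let mut raw_data = vec![0u8; header_len + data.len()];
        match zstd_safe::compress(&mut raw_data[header_len..], data, 1) {
            // keep the compressed form only if it actually got smaller
            Ok(size) if size <= data.len() => raw_data.truncate(header_len + size),
            // 'destination too small' is expected for incompressible
            // data; other errors would be logged in the real code
            _ => (&mut raw_data[header_len..]).write_all(data)?,
        }
        Ok(raw_data)
    }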

Some benchmarks on my machine from tmpfs to a datastore on tmpfs:

Type                without patches (MiB/s)  with patches (MiB/s)
.img file           ~614                     ~767
pxar one big file   ~657                     ~807
pxar small files    ~576                     ~627

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v2:
* use zstd_safe instead of zstd
 pbs-datastore/src/data_blob.rs | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index adf5a932..4e689364 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -147,39 +147,40 @@ impl DataBlob {
 
             DataBlob { raw_data }
         } else {
-            let max_data_len = data.len() + std::mem::size_of::<DataBlobHeader>();
+            let header_len = std::mem::size_of::<DataBlobHeader>();
+            let max_data_len = data.len() + header_len;
+            let mut raw_data = vec![0; max_data_len];
             if compress {
-                let mut comp_data = Vec::with_capacity(max_data_len);
-
                 let head = DataBlobHeader {
                     magic: COMPRESSED_BLOB_MAGIC_1_0,
                     crc: [0; 4],
                 };
                 unsafe {
-                    comp_data.write_le_value(head)?;
+                    (&mut raw_data[0..header_len]).write_le_value(head)?;
                 }
 
-                zstd::stream::copy_encode(data, &mut comp_data, 1)?;
-
-                if comp_data.len() < max_data_len {
-                    let mut blob = DataBlob {
-                        raw_data: comp_data,
-                    };
-                    blob.set_crc(blob.compute_crc());
-                    return Ok(blob);
+                match zstd_safe::compress(&mut raw_data[header_len..], data, 1) {
+                    Ok(size) if size <= data.len() => {
+                        raw_data.truncate(header_len + size);
+                        let mut blob = DataBlob { raw_data };
+                        blob.set_crc(blob.compute_crc());
+                        return Ok(blob);
+                    }
+                    Err(err) if !zstd_error_is_target_too_small(err) => {
+                        log::warn!("zstd compression error: {err}");
+                    }
+                    _ => {}
                 }
             }
 
-            let mut raw_data = Vec::with_capacity(max_data_len);
-
             let head = DataBlobHeader {
                 magic: UNCOMPRESSED_BLOB_MAGIC_1_0,
                 crc: [0; 4],
             };
             unsafe {
-                raw_data.write_le_value(head)?;
+                (&mut raw_data[0..header_len]).write_le_value(head)?;
             }
-            raw_data.extend_from_slice(data);
+            (&mut raw_data[header_len..]).write_all(data)?;
 
             DataBlob { raw_data }
         };
-- 
2.39.2

* [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
                   ` (3 preceding siblings ...)
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
@ 2024-08-05  9:24 ` Dominik Csapak
  2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht
  5 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
  To: pbs-devel

by combining the compression call from both encrypted and unencrypted
paths and deciding on the header magic at one site.

No functional changes intended, besides reusing the same buffer for
compression.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v2:
* adapt to changes in previous patches
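
Condensed, the new single decision site looks roughly like this
(self-contained sketch; the magic byte values are placeholders, not the
real constants from file_formats.rs):

    type Magic = [u8; 8];

    // placeholder values -- the real constants live in file_formats.rs
    const UNCOMPRESSED_BLOB_MAGIC_1_0: Magic = *b"UNCOMPR0";
    const COMPRESSED_BLOB_MAGIC_1_0: Magic = *b"COMPRES0";
    const ENCRYPTED_BLOB_MAGIC_1_0: Magic = *b"ENCRYPT0";
    const ENCR_COMPR_BLOB_MAGIC_1_0: Magic = *b"ENCCOMP0";

    // the header magic follows from two independent flags, decided in
    // one place instead of separately per code path
    fn select_magic(compressed: bool, encrypted: bool) -> Magic {
        match (compressed, encrypted) {
            (true, true) => ENCR_COMPR_BLOB_MAGIC_1_0,
            (true, false) => COMPRESSED_BLOB_MAGIC_1_0,
            (false, true) => ENCRYPTED_BLOB_MAGIC_1_0,
            (false, false) => UNCOMPRESSED_BLOB_MAGIC_1_0,
        }
    }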

 pbs-datastore/src/data_blob.rs | 89 ++++++++++++++--------------------
 1 file changed, 37 insertions(+), 52 deletions(-)

diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index 4e689364..ee1bb664 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -104,28 +104,42 @@ impl DataBlob {
             bail!("data blob too large ({} bytes).", data.len());
         }
 
-        let mut blob = if let Some(config) = config {
-            let compr_data;
-            let (_compress, data, magic) = if compress {
-                compr_data = zstd::bulk::compress(data, 1)?;
-                // Note: We only use compression if result is shorter
-                if compr_data.len() < data.len() {
-                    (true, &compr_data[..], ENCR_COMPR_BLOB_MAGIC_1_0)
-                } else {
-                    (false, data, ENCRYPTED_BLOB_MAGIC_1_0)
+        let header_len = if config.is_some() {
+            std::mem::size_of::<EncryptedDataBlobHeader>()
+        } else {
+            std::mem::size_of::<DataBlobHeader>()
+        };
+
+        let mut compressed = false;
+        let mut data_compressed = vec![0u8; header_len + data.len()];
+        if compress {
+            match zstd_safe::compress(&mut data_compressed[header_len..], data, 1) {
+                Ok(size) if size <= data.len() => {
+                    data_compressed.truncate(header_len + size);
+                    compressed = true;
                 }
-            } else {
-                (false, data, ENCRYPTED_BLOB_MAGIC_1_0)
-            };
+                Err(err) if !zstd_error_is_target_too_small(err) => {
+                    log::warn!("zstd compression error: {err}");
+                }
+                _ => {}
+            }
+        }
 
-            let header_len = std::mem::size_of::<EncryptedDataBlobHeader>();
+        let (magic, encryption_source) = match (compressed, config.is_some()) {
+            (true, true) => (ENCR_COMPR_BLOB_MAGIC_1_0, &data_compressed[header_len..]),
+            (true, false) => (COMPRESSED_BLOB_MAGIC_1_0, &data_compressed[header_len..]),
+            (false, true) => (ENCRYPTED_BLOB_MAGIC_1_0, data),
+            (false, false) => {
+                (&mut data_compressed[header_len..]).write_all(data)?;
+                (UNCOMPRESSED_BLOB_MAGIC_1_0, data)
+            }
+        };
+
+        let raw_data = if let Some(config) = config {
             let mut raw_data = Vec::with_capacity(data.len() + header_len);
 
             let dummy_head = EncryptedDataBlobHeader {
-                head: DataBlobHeader {
-                    magic: [0u8; 8],
-                    crc: [0; 4],
-                },
+                head: DataBlobHeader { magic, crc: [0; 4] },
                 iv: [0u8; 16],
                 tag: [0u8; 16],
             };
@@ -133,7 +147,7 @@ impl DataBlob {
                 raw_data.write_le_value(dummy_head)?;
             }
 
-            let (iv, tag) = Self::encrypt_to(config, data, &mut raw_data)?;
+            let (iv, tag) = Self::encrypt_to(config, encryption_source, &mut raw_data)?;
 
             let head = EncryptedDataBlobHeader {
                 head: DataBlobHeader { magic, crc: [0; 4] },
@@ -145,46 +159,17 @@ impl DataBlob {
                 (&mut raw_data[0..header_len]).write_le_value(head)?;
             }
 
-            DataBlob { raw_data }
+            raw_data
         } else {
-            let header_len = std::mem::size_of::<DataBlobHeader>();
-            let max_data_len = data.len() + header_len;
-            let mut raw_data = vec![0; max_data_len];
-            if compress {
-                let head = DataBlobHeader {
-                    magic: COMPRESSED_BLOB_MAGIC_1_0,
-                    crc: [0; 4],
-                };
-                unsafe {
-                    (&mut raw_data[0..header_len]).write_le_value(head)?;
-                }
-
-                match zstd_safe::compress(&mut raw_data[header_len..], data, 1) {
-                    Ok(size) if size <= data.len() => {
-                        raw_data.truncate(header_len + size);
-                        let mut blob = DataBlob { raw_data };
-                        blob.set_crc(blob.compute_crc());
-                        return Ok(blob);
-                    }
-                    Err(err) if !zstd_error_is_target_too_small(err) => {
-                        log::warn!("zstd compression error: {err}");
-                    }
-                    _ => {}
-                }
-            }
-
-            let head = DataBlobHeader {
-                magic: UNCOMPRESSED_BLOB_MAGIC_1_0,
-                crc: [0; 4],
-            };
+            let head = DataBlobHeader { magic, crc: [0; 4] };
             unsafe {
-                (&mut raw_data[0..header_len]).write_le_value(head)?;
+                (&mut data_compressed[0..header_len]).write_le_value(head)?;
             }
-            (&mut raw_data[header_len..]).write_all(data)?;
 
-            DataBlob { raw_data }
+            data_compressed
         };
 
+        let mut blob = DataBlob { raw_data };
         blob.set_crc(blob.compute_crc());
 
         Ok(blob)
-- 
2.39.2

* Re: [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
@ 2024-08-05  9:32   ` Dominik Csapak
  0 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:32 UTC (permalink / raw)
  To: pbs-devel; +Cc: Thomas Lamprecht

On 8/5/24 11:24, Dominik Csapak wrote:
> by not using `zstd::stream::copy_encode`, because that has an allocation
> pattern that reduces throughput if the target/source storage and the
> network are faster than the chunk creation.
> 
> instead, use `zstd_safe::compress`, which shouldn't do any big
> allocations, since we provide the target buffer.
> 
> To handle the case that the target buffer is too small, we now ignore
> all zstd errors and continue with the uncompressed data, logging the
> error unless it was 'target buffer too small' (which is expected when
> the data is incompressible, since we size the buffer for the
> uncompressed length).
> 
> Some benchmarks on my machine from tmpfs to a datastore on tmpfs:
> 
> Type                without patches (MiB/s)  with patches (MiB/s)
> .img file           ~614                     ~767
> pxar one big file   ~657                     ~807
> pxar small files    ~576                     ~627
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>

@thomas

sorry, I forgot to adapt the commit message to your (or a similar)
suggestion. Feel free to either adapt it to your liking or tell me I
should send a v4 for this ;)
(which I'll happily do if you want)



* [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
                   ` (4 preceding siblings ...)
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak
@ 2024-08-07 17:06 ` Thomas Lamprecht
  2024-08-08  6:53   ` Dominik Csapak
  5 siblings, 1 reply; 9+ messages in thread
From: Thomas Lamprecht @ 2024-08-07 17:06 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak

On 05/08/2024 11:24, Dominik Csapak wrote:
> In my tests (against current master), this series improved throughput
> when the source/target storage is fast enough (tmpfs -> tmpfs):
> 
> Type                master (MiB/s)   with my patches (MiB/s)
> .img file           ~614             ~767
> pxar one big file   ~657             ~807
> pxar small files    ~576             ~627
> 
> (these results are also in the relevant commit message)
> 
> It would be great if someone else could cross-check my results here.
> Note: the pxar code being faster than the .img code seems to stem
> from better multithreaded pipelining in that code or in tokio (the pxar
> codepath scales more directly with more cores than the .img codepath).
> 
> changes from v2:
> * use zstd_safe instead of zstd so we have access to the underlying
>   error code
> * add test for the error code handling since that's not part of the
>   public zstd api, only an implementation detail (albeit one that's
>   not likely to change soon)
> * separated the tests for the decode(encode()) roundtrip so a failure
>   can more easily be assigned to a specific codepath
> 
> changes from v1:
> * reorder patches so that the data blob writer removal is the first one
> * add tests for DataBlob that we can decode what we encoded
>   (to see that my patches don't mess up the chunk generation)
> * add new patch to cleanup the `encode` function a bit
> 
> Dominik Csapak (5):
>   remove data blob writer
>   datastore: test DataBlob encode/decode roundtrip
>   datastore: data blob: add helper and test for checking zstd_safe error
>     code
>   datastore: data blob: increase compression throughput
>   datastore: DataBlob encode: simplify code
> 
>  Cargo.toml                            |   1 +
>  pbs-datastore/Cargo.toml              |   1 +
>  pbs-datastore/src/data_blob.rs        | 193 ++++++++++++++++-------
>  pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
>  pbs-datastore/src/lib.rs              |   2 -
>  tests/blob_writer.rs                  | 105 -------------
>  6 files changed, 144 insertions(+), 370 deletions(-)
>  delete mode 100644 pbs-datastore/src/data_blob_writer.rs
>  delete mode 100644 tests/blob_writer.rs
> 

Applied, with some rewording of the commit message and some slight
adaptation to the test commit.

PS: it seems the zstd crate authors aren't so sure why they use the
32 KB buffer either, which FWICT is the underlying issue here:

https://docs.rs/zstd/latest/src/zstd/stream/zio/writer.rs.html#41-42

But it's a bit hard to follow. To me this looks less like an allocation
pattern issue (on its own) and more like increased overhead due to
processing in 32 KiB chunks; the extra copying itself naturally doesn't
help, but that's not a bad allocation pattern, rather a single (FWICT)
avoidable allocation for the small buffer. As said, I'm not 100% sure,
as the code is rather over-engineered... Anyhow, I tried to add these
findings, including the uncertainty they carry, to the commit message
to have some better background.
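
For contrast, the two call patterns side by side (a rough sketch,
illustrative only, not the crate's internal code):

    // old path: stream through zstd's writer, which funnels everything
    // through an internal ~32 KiB buffer and copies each produced
    // chunk out to the destination writer
    fn old_path(data: &[u8], out: &mut Vec<u8>) -> std::io::Result<()> {
        zstd::stream::copy_encode(data, out, 1)
    }

    // new path: a single compress call straight into a destination
    // buffer we size and allocate ourselves -- no intermediate buffer,
    // no per-chunk copies
    fn new_path(data: &[u8], out: &mut [u8]) -> Result<usize, usize> {
        zstd_safe::compress(out, data, 1)
    }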

I know you could have done this in a v4, but it felt faster to just
amend the changes, especially since I have a few days off and would
have to recreate the mental context anyway.



* Re: [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput
  2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht
@ 2024-08-08  6:53   ` Dominik Csapak
  0 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-08  6:53 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 8/7/24 19:06, Thomas Lamprecht wrote:
> On 05/08/2024 11:24, Dominik Csapak wrote:
>> In my tests (against current master), this series improved throughput
>> when the source/target storage is fast enough (tmpfs -> tmpfs):
>>
>> Type                master (MiB/s)   with my patches (MiB/s)
>> .img file           ~614             ~767
>> pxar one big file   ~657             ~807
>> pxar small files    ~576             ~627
>>
>> (these results are also in the relevant commit message)
>>
>> It would be great, if someone else can cross check my results here.
>> Note: the the pxar code being faster than the img code seems to stem
>> from better multithreading pipelining in that code or in tokio (pxar
>> codepath scales more directly with more cores than the .img codepath)
>>
>> changes from v2:
>> * use zstd_safe instead of zstd so we have access to the underlying
>>    error code
>> * add test for the error code handling since that's not part of the
>>    public zstd api, only an implementation detail (albeit one that's
>>    not likely to change soon)
>> * separated the tests for the decode(encode()) roundtrip so a failure
>>    can more easily be assigned to a specific codepath
>>
>> changes from v1:
>> * reorder patches so that the data blob writer removal is the first one
>> * add tests for DataBlob that we can decode what we encoded
>>    (to see that my patches don't mess up the chunk generation)
>> * add new patch to cleanup the `encode` function a bit
>>
>> Dominik Csapak (5):
>>    remove data blob writer
>>    datastore: test DataBlob encode/decode roundtrip
>>    datastore: data blob: add helper and test for checking zstd_safe error
>>      code
>>    datastore: data blob: increase compression throughput
>>    datastore: DataBlob encode: simplify code
>>
>>   Cargo.toml                            |   1 +
>>   pbs-datastore/Cargo.toml              |   1 +
>>   pbs-datastore/src/data_blob.rs        | 193 ++++++++++++++++-------
>>   pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
>>   pbs-datastore/src/lib.rs              |   2 -
>>   tests/blob_writer.rs                  | 105 -------------
>>   6 files changed, 144 insertions(+), 370 deletions(-)
>>   delete mode 100644 pbs-datastore/src/data_blob_writer.rs
>>   delete mode 100644 tests/blob_writer.rs
>>
> 
> Applied, with some rewording of the commit message and some slight
> adaptation to the test commit.
> 
> PS: it seems the zstd crate authors aren't so sure why they use the
> 32 KB buffer either, which FWICT is the underlying issue here:
> 
> https://docs.rs/zstd/latest/src/zstd/stream/zio/writer.rs.html#41-42
> 
> But it's a bit hard to follow. To me this looks less like an allocation
> pattern issue (on its own) and more like increased overhead due to
> processing in 32 KiB chunks; the extra copying itself naturally doesn't
> help, but that's not a bad allocation pattern, rather a single (FWICT)
> avoidable allocation for the small buffer. As said, I'm not 100% sure,
> as the code is rather over-engineered... Anyhow, I tried to add these
> findings, including the uncertainty they carry, to the commit message
> to have some better background.
> 
> I know you could have done this in a v4, but it felt faster to just
> amend the changes, especially since I have a few days off and would
> have to recreate the mental context anyway.


Ah ok, thanks for investigating (seemingly I was not patient enough for
that..)

also thanks for amending the commit messages :)



Thread overview: 9+ messages
2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
2024-08-05  9:32   ` Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak
2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht
2024-08-08  6:53   ` Dominik Csapak
