* [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput
@ 2024-08-05  9:24 Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
  ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
To: pbs-devel

in my tests (against current master) it improved the throughput if
the source/target storage is fast enough (tmpfs -> tmpfs):

Type               master (MiB/s)   with my patches (MiB/s)
.img file          ~614             ~767
pxar one big file  ~657             ~807
pxar small files   ~576             ~627

(these results are also in the relevant commit message)

It would be great if someone else could cross-check my results here.

Note: the pxar code being faster than the img code seems to stem
from better multithreading pipelining in that code or in tokio (the
pxar codepath scales more directly with more cores than the .img
codepath)

changes from v2:
* use zstd_safe instead of zstd so we have access to the underlying
  error code
* add a test for the error code handling, since that's not part of the
  public zstd api, only an implementation detail (albeit one that's
  not likely to change soon)
* separated the tests for the decode(encode()) roundtrip so a failure
  can more easily be assigned to a specific codepath

changes from v1:
* reorder patches so that the data blob writer removal is the first one
* add tests for DataBlob so that we can decode what we encoded
  (to see that my patches don't mess up the chunk generation)
* add a new patch to clean up the `encode` function a bit

Dominik Csapak (5):
  remove data blob writer
  datastore: test DataBlob encode/decode roundtrip
  datastore: data blob: add helper and test for checking zstd_safe error code
  datastore: data blob: increase compression throughput
  datastore: DataBlob encode: simplify code

 Cargo.toml                            |   1 +
 pbs-datastore/Cargo.toml              |   1 +
 pbs-datastore/src/data_blob.rs        | 193 ++++++++++++++++-------
 pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
 pbs-datastore/src/lib.rs              |   2 -
 tests/blob_writer.rs                  | 105 -------------
 6 files changed, 144 insertions(+), 370 deletions(-)
 delete mode 100644 pbs-datastore/src/data_blob_writer.rs
 delete mode 100644 tests/blob_writer.rs

-- 
2.39.2

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer 2024-08-05 9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak @ 2024-08-05 9:24 ` Dominik Csapak 2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak ` (4 subsequent siblings) 5 siblings, 0 replies; 9+ messages in thread From: Dominik Csapak @ 2024-08-05 9:24 UTC (permalink / raw) To: pbs-devel this is leftover code that is not currently used outside of tests. Should we need it again, we can just revert this commit. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- pbs-datastore/src/data_blob_writer.rs | 212 -------------------------- pbs-datastore/src/lib.rs | 2 - tests/blob_writer.rs | 105 ------------- 3 files changed, 319 deletions(-) delete mode 100644 pbs-datastore/src/data_blob_writer.rs delete mode 100644 tests/blob_writer.rs diff --git a/pbs-datastore/src/data_blob_writer.rs b/pbs-datastore/src/data_blob_writer.rs deleted file mode 100644 index 30d9645f..00000000 --- a/pbs-datastore/src/data_blob_writer.rs +++ /dev/null @@ -1,212 +0,0 @@ -use std::io::{Seek, SeekFrom, Write}; -use std::sync::Arc; - -use anyhow::Error; - -use proxmox_io::WriteExt; - -use pbs_tools::crypt_config::CryptConfig; - -use crate::checksum_writer::ChecksumWriter; -use crate::crypt_writer::CryptWriter; -use crate::file_formats::{self, DataBlobHeader, EncryptedDataBlobHeader}; - -enum BlobWriterState<'writer, W: Write> { - Uncompressed { - csum_writer: ChecksumWriter<W>, - }, - Compressed { - compr: zstd::stream::write::Encoder<'writer, ChecksumWriter<W>>, - }, - Encrypted { - crypt_writer: CryptWriter<ChecksumWriter<W>>, - }, - EncryptedCompressed { - compr: zstd::stream::write::Encoder<'writer, CryptWriter<ChecksumWriter<W>>>, - }, -} - -/// Data blob writer -pub struct DataBlobWriter<'writer, W: Write> { - state: BlobWriterState<'writer, W>, -} - -impl<W: Write + Seek> DataBlobWriter<'_, W> { - 
pub fn new_uncompressed(mut writer: W) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = DataBlobHeader { - magic: file_formats::UNCOMPRESSED_BLOB_MAGIC_1_0, - crc: [0; 4], - }; - unsafe { - writer.write_le_value(head)?; - } - let csum_writer = ChecksumWriter::new(writer, None); - Ok(Self { - state: BlobWriterState::Uncompressed { csum_writer }, - }) - } - - pub fn new_compressed(mut writer: W) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = DataBlobHeader { - magic: file_formats::COMPRESSED_BLOB_MAGIC_1_0, - crc: [0; 4], - }; - unsafe { - writer.write_le_value(head)?; - } - let csum_writer = ChecksumWriter::new(writer, None); - let compr = zstd::stream::write::Encoder::new(csum_writer, 1)?; - Ok(Self { - state: BlobWriterState::Compressed { compr }, - }) - } - - pub fn new_encrypted(mut writer: W, config: Arc<CryptConfig>) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: file_formats::ENCRYPTED_BLOB_MAGIC_1_0, - crc: [0; 4], - }, - iv: [0u8; 16], - tag: [0u8; 16], - }; - unsafe { - writer.write_le_value(head)?; - } - - let csum_writer = ChecksumWriter::new(writer, None); - let crypt_writer = CryptWriter::new(csum_writer, config)?; - Ok(Self { - state: BlobWriterState::Encrypted { crypt_writer }, - }) - } - - pub fn new_encrypted_compressed( - mut writer: W, - config: Arc<CryptConfig>, - ) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: file_formats::ENCR_COMPR_BLOB_MAGIC_1_0, - crc: [0; 4], - }, - iv: [0u8; 16], - tag: [0u8; 16], - }; - unsafe { - writer.write_le_value(head)?; - } - - let csum_writer = ChecksumWriter::new(writer, None); - let crypt_writer = CryptWriter::new(csum_writer, config)?; - let compr = zstd::stream::write::Encoder::new(crypt_writer, 1)?; - Ok(Self { - state: BlobWriterState::EncryptedCompressed { compr }, - }) - } - - pub 
fn finish(self) -> Result<W, Error> { - match self.state { - BlobWriterState::Uncompressed { csum_writer } => { - // write CRC - let (mut writer, crc, _) = csum_writer.finish()?; - let head = DataBlobHeader { - magic: file_formats::UNCOMPRESSED_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }; - - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - - Ok(writer) - } - BlobWriterState::Compressed { compr } => { - let csum_writer = compr.finish()?; - let (mut writer, crc, _) = csum_writer.finish()?; - - let head = DataBlobHeader { - magic: file_formats::COMPRESSED_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }; - - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - - Ok(writer) - } - BlobWriterState::Encrypted { crypt_writer } => { - let (csum_writer, iv, tag) = crypt_writer.finish()?; - let (mut writer, crc, _) = csum_writer.finish()?; - - let head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: file_formats::ENCRYPTED_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }, - iv, - tag, - }; - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - Ok(writer) - } - BlobWriterState::EncryptedCompressed { compr } => { - let crypt_writer = compr.finish()?; - let (csum_writer, iv, tag) = crypt_writer.finish()?; - let (mut writer, crc, _) = csum_writer.finish()?; - - let head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: file_formats::ENCR_COMPR_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }, - iv, - tag, - }; - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - Ok(writer) - } - } - } -} - -impl<W: Write + Seek> Write for DataBlobWriter<'_, W> { - fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> { - match self.state { - BlobWriterState::Uncompressed { - ref mut csum_writer, - } => csum_writer.write(buf), - BlobWriterState::Compressed { ref mut compr } => compr.write(buf), - BlobWriterState::Encrypted { - ref mut 
crypt_writer, - } => crypt_writer.write(buf), - BlobWriterState::EncryptedCompressed { ref mut compr } => compr.write(buf), - } - } - - fn flush(&mut self) -> Result<(), std::io::Error> { - match self.state { - BlobWriterState::Uncompressed { - ref mut csum_writer, - } => csum_writer.flush(), - BlobWriterState::Compressed { ref mut compr } => compr.flush(), - BlobWriterState::Encrypted { - ref mut crypt_writer, - } => crypt_writer.flush(), - BlobWriterState::EncryptedCompressed { ref mut compr } => compr.flush(), - } - } -} diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs index 3e4aa34c..202b0955 100644 --- a/pbs-datastore/src/lib.rs +++ b/pbs-datastore/src/lib.rs @@ -179,7 +179,6 @@ pub mod crypt_reader; pub mod crypt_writer; pub mod data_blob; pub mod data_blob_reader; -pub mod data_blob_writer; pub mod file_formats; pub mod index; pub mod manifest; @@ -201,7 +200,6 @@ pub use crypt_reader::CryptReader; pub use crypt_writer::CryptWriter; pub use data_blob::DataBlob; pub use data_blob_reader::DataBlobReader; -pub use data_blob_writer::DataBlobWriter; pub use manifest::BackupManifest; pub use store_progress::StoreProgress; diff --git a/tests/blob_writer.rs b/tests/blob_writer.rs deleted file mode 100644 index 23a3283d..00000000 --- a/tests/blob_writer.rs +++ /dev/null @@ -1,105 +0,0 @@ -use std::io::Cursor; -use std::io::{Read, Seek, SeekFrom, Write}; -use std::sync::Arc; - -use anyhow::{bail, Error}; -use lazy_static::lazy_static; - -use pbs_datastore::{DataBlob, DataBlobReader, DataBlobWriter}; -use pbs_tools::crypt_config::CryptConfig; - -lazy_static! 
{ - static ref TEST_DATA: Vec<u8> = { - let mut data = Vec::new(); - - for i in 0..100_000 { - data.push((i % 255) as u8); - } - - data - }; - static ref CRYPT_CONFIG: Arc<CryptConfig> = { - let key = [1u8; 32]; - Arc::new(CryptConfig::new(key).unwrap()) - }; - static ref TEST_DIGEST_PLAIN: [u8; 32] = [ - 83, 154, 96, 195, 167, 204, 38, 142, 204, 224, 130, 201, 24, 71, 2, 188, 130, 155, 177, 6, - 162, 100, 61, 238, 38, 219, 63, 240, 191, 132, 87, 238 - ]; - static ref TEST_DIGEST_ENC: [u8; 32] = [ - 50, 162, 191, 93, 255, 132, 9, 14, 127, 23, 92, 39, 246, 102, 245, 204, 130, 104, 4, 106, - 182, 239, 218, 14, 80, 17, 150, 188, 239, 253, 198, 117 - ]; -} - -fn verify_test_blob(mut cursor: Cursor<Vec<u8>>, digest: &[u8; 32]) -> Result<(), Error> { - // run read tests with different buffer sizes - for size in [1, 3, 64 * 1024].iter() { - println!("Starting DataBlobReader test (size = {})", size); - - cursor.seek(SeekFrom::Start(0))?; - let mut reader = DataBlobReader::new(&mut cursor, Some(CRYPT_CONFIG.clone()))?; - let mut buffer = Vec::<u8>::new(); - // read the whole file - //reader.read_to_end(&mut buffer)?; - let mut buf = vec![0u8; *size]; - loop { - let count = reader.read(&mut buf)?; - if count == 0 { - break; - } - buffer.extend(&buf[..count]); - } - - reader.finish()?; - if buffer != *TEST_DATA { - bail!("blob data is wrong (read buffer size {})", size); - } - } - - let raw_data = cursor.into_inner(); - - let blob = DataBlob::load_from_reader(&mut &raw_data[..])?; - - let data = blob.decode(Some(&CRYPT_CONFIG), Some(digest))?; - if data != *TEST_DATA { - bail!("blob data is wrong (decode)"); - } - Ok(()) -} - -#[test] -fn test_uncompressed_blob_writer() -> Result<(), Error> { - let tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_uncompressed(tmp)?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_PLAIN) -} - -#[test] -fn test_compressed_blob_writer() -> Result<(), Error> { - let 
tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_compressed(tmp)?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_PLAIN) -} - -#[test] -fn test_encrypted_blob_writer() -> Result<(), Error> { - let tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_encrypted(tmp, CRYPT_CONFIG.clone())?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_ENC) -} - -#[test] -fn test_encrypted_compressed_blob_writer() -> Result<(), Error> { - let tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_encrypted_compressed(tmp, CRYPT_CONFIG.clone())?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_ENC) -} -- 2.39.2 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
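For context on what the removed writer did: it wrote a dummy header first, streamed the payload through checksumming (and optionally compressing/encrypting) writers, and in `finish()` seeked back to offset 0 to patch the real CRC into the header. A minimal stdlib-only sketch of that "placeholder header, seek back, patch checksum" pattern is below — the 8-byte magic and the additive checksum are placeholders for illustration, not PBS's real blob format or CRC32:

```rust
use std::io::{Cursor, Seek, SeekFrom, Write};

const MAGIC: [u8; 8] = *b"BLOBDEMO"; // placeholder magic, not a real PBS value

/// Write `data` as: 8-byte magic, 4-byte little-endian checksum, payload.
/// The checksum field is zeroed first and patched in afterwards by seeking
/// back to the header, mirroring what DataBlobWriter::finish() did.
fn write_blob(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut w = Cursor::new(Vec::new());
    w.write_all(&MAGIC)?;
    w.write_all(&[0u8; 4])?; // checksum placeholder
    w.write_all(data)?;

    // toy checksum: wrapping sum of payload bytes (stand-in for CRC32)
    let sum: u32 = data.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32));

    // seek back and patch the header, like the removed writer did
    w.seek(SeekFrom::Start(8))?;
    w.write_all(&sum.to_le_bytes())?;
    Ok(w.into_inner())
}

fn main() -> std::io::Result<()> {
    let blob = write_blob(b"hello")?;
    assert_eq!(&blob[..8], &MAGIC);
    let sum = u32::from_le_bytes(blob[8..12].try_into().unwrap());
    assert_eq!(sum, b"hello".iter().map(|&b| b as u32).sum::<u32>());
    assert_eq!(&blob[12..], b"hello");
    Ok(())
}
```

The seek-back step is why the removed writer required `W: Write + Seek`; the `DataBlob::encode` path that replaces it builds the blob in memory instead, so it can simply overwrite the header bytes in the buffer.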
* [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip 2024-08-05 9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak 2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak @ 2024-08-05 9:24 ` Dominik Csapak 2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak ` (3 subsequent siblings) 5 siblings, 0 replies; 9+ messages in thread From: Dominik Csapak @ 2024-08-05 9:24 UTC (permalink / raw) To: pbs-devel so that we can be sure we can decode an encoded blob again Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- changes from v2: * separate into individual tests pbs-datastore/src/data_blob.rs | 79 ++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs index a7a55fb7..a3a41c5e 100644 --- a/pbs-datastore/src/data_blob.rs +++ b/pbs-datastore/src/data_blob.rs @@ -562,3 +562,82 @@ impl<'a, 'b> DataChunkBuilder<'a, 'b> { chunk_builder.build() } } + +#[cfg(test)] +mod test { + use pbs_tools::crypt_config::CryptConfig; + + use super::DataChunkBuilder; + + const TEST_DATA_LEN: usize = 50; + + fn build_test_data() -> Vec<u8> { + let mut data = Vec::with_capacity(TEST_DATA_LEN); + for i in 0..TEST_DATA_LEN / 10 { + for _ in 0..10 { + data.push(i as u8); + } + } + data + } + + #[test] + fn unencrypted_uncompressed() { + let data = build_test_data(); + let (chunk, digest) = DataChunkBuilder::new(&data) + .compress(false) + .build() + .expect("could not create unencrypted, uncompressed chunk"); + + let data_decoded = chunk + .decode(None, Some(&digest)) + .expect("cannot decode unencrypted, uncompressed chunk"); + assert_eq!(data, data_decoded); + } + + #[test] + fn unencrypted_compressed() { + let data = build_test_data(); + let (chunk, digest) = 
DataChunkBuilder::new(&data) + .compress(true) + .build() + .expect("could not create unencrypted, compressed chunk"); + + let data_decoded = chunk + .decode(None, Some(&digest)) + .expect("cannot decode unencrypted, compressed chunk"); + assert_eq!(data, data_decoded); + } + + #[test] + fn encrypted_uncompressed() { + let data = build_test_data(); + let crypt_config = CryptConfig::new([9; 32]).expect("could not create crypt config"); + let (chunk, digest) = DataChunkBuilder::new(&data) + .compress(false) + .crypt_config(&crypt_config) + .build() + .expect("could not create encrypted, uncompressed chunk"); + + let data_decoded = chunk + .decode(Some(&crypt_config), Some(&digest)) + .expect("cannot decode encrypted, uncompressed chunk"); + assert_eq!(data, data_decoded); + } + + #[test] + fn encrypted_compressed() { + let data = build_test_data(); + let crypt_config = CryptConfig::new([9; 32]).expect("could not create crypt config"); + let (chunk, digest) = DataChunkBuilder::new(&data) + .compress(true) + .crypt_config(&crypt_config) + .build() + .expect("could not create encrypted, compressed chunk"); + + let data_decoded = chunk + .decode(Some(&crypt_config), Some(&digest)) + .expect("cannot decode encrypted, compressed chunk"); + assert_eq!(data, data_decoded); + } +} -- 2.39.2 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
@ 2024-08-05  9:24 ` Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
  ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
To: pbs-devel

We want to check that the zstd error code is not 'Destination buffer is
too small', but there is no practical api for that at the moment. So we
introduce a helper that uses the internal logic of zstd to determine
the error.

Since this is not guaranteed to be a stable api, add a test for it so
that we catch such a change early, at build time. This should be fine,
as long as this zstd behavior only changes with e.g. major Debian
upgrades (which means a new zstd version).

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
new in v3

note that this probably throws a warning until the next patch due to
the unused function.
it would also be possible to merge this patch into the next if that's wanted Cargo.toml | 1 + pbs-datastore/Cargo.toml | 1 + pbs-datastore/src/data_blob.rs | 28 +++++++++++++++++++++++++++- 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/Cargo.toml b/Cargo.toml index 2b51ec82..275e3c95 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -158,6 +158,7 @@ url = "2.1" walkdir = "2" xdg = "2.2" zstd = { version = "0.12", features = [ "bindgen" ] } +zstd-safe = "6.0" [dependencies] anyhow.workspace = true diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml index d8997e1d..494c231b 100644 --- a/pbs-datastore/Cargo.toml +++ b/pbs-datastore/Cargo.toml @@ -23,6 +23,7 @@ tokio = { workspace = true, features = [] } tracing.workspace = true walkdir.workspace = true zstd.workspace = true +zstd-safe.workspace = true pathpatterns.workspace = true pxar.workspace = true diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs index a3a41c5e..adf5a932 100644 --- a/pbs-datastore/src/data_blob.rs +++ b/pbs-datastore/src/data_blob.rs @@ -12,6 +12,17 @@ use super::file_formats::*; const MAX_BLOB_SIZE: usize = 128 * 1024 * 1024; +// tests if the error code was 'Destination buffer is too small' +// by subtracting the error code from 0 (with underflow) +// see https://github.com/facebook/zstd/blob/dev/lib/common/error_private.h +// we test for this below so we catch errors if the interface changes +fn zstd_error_is_target_too_small(err: usize) -> bool { + let (real_code, _) = 0usize.overflowing_sub(err); + // ZSTD_error_dstSize_tooSmall = 70, + // see https://github.com/facebook/zstd/blob/dev/lib/zstd_errors.h + real_code == 70 +} + /// Encoded data chunk with digest and positional information pub struct ChunkInfo { pub chunk: DataBlob, @@ -567,7 +578,7 @@ impl<'a, 'b> DataChunkBuilder<'a, 'b> { mod test { use pbs_tools::crypt_config::CryptConfig; - use super::DataChunkBuilder; + use super::{zstd_error_is_target_too_small, DataChunkBuilder}; const 
TEST_DATA_LEN: usize = 50; @@ -640,4 +651,19 @@ mod test { .expect("cannot decode encrypted, compressed chunk"); assert_eq!(data, data_decoded); } + + #[test] + fn zstd_error_code_test() { + // test for the error code internal logic of zstd so we catch interface changes here + let data = &build_test_data(); + let mut target = Vec::new(); + match zstd_safe::compress(&mut target, data, 1) { + Ok(_) => panic!("unexpected success with zero-sized buffer"), + Err(err) => { + if !zstd_error_is_target_too_small(err) { + panic!("unexpected error code"); + } + } + } + } } -- 2.39.2 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
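The underflow trick the helper in this patch relies on can be exercised without linking zstd at all: zstd encodes an error as `(size_t)(0 - code)`, i.e. a huge `usize` near `usize::MAX`, and subtracting that value from 0 again (with wraparound) recovers the positive code. A stand-alone sketch — the constant 70 for `ZSTD_error_dstSize_tooSmall` is the implementation detail from zstd's `zstd_errors.h` that the patch's own test pins down:

```rust
/// Recover the positive zstd error code from the negated `usize`
/// that zstd_safe hands back (see zstd's error_private.h).
fn zstd_error_code(err: usize) -> usize {
    let (code, _overflowed) = 0usize.overflowing_sub(err);
    code
}

/// ZSTD_error_dstSize_tooSmall = 70 in zstd_errors.h (not a stable api).
fn zstd_error_is_target_too_small(err: usize) -> bool {
    zstd_error_code(err) == 70
}

fn main() {
    // simulate what zstd does internally: return (size_t)(0 - errorCode)
    let simulated_err = 0usize.wrapping_sub(70);
    // the value looks like an absurdly large size, which is how
    // ZSTD_isError() style checks flag it in the first place
    assert!(simulated_err > usize::MAX / 2);
    assert!(zstd_error_is_target_too_small(simulated_err));

    // any other code (e.g. 72 = ZSTD_error_srcSize_wrong) must not match
    assert!(!zstd_error_is_target_too_small(0usize.wrapping_sub(72)));
}
```

This is exactly why the series adds a build-time test against the real `zstd_safe::compress`: the arithmetic is trivial, but the value 70 is only an implementation detail of the linked zstd library.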
* [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput
  2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
  ` (2 preceding siblings ...)
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak
@ 2024-08-05  9:24 ` Dominik Csapak
  2024-08-05  9:32   ` Dominik Csapak
  2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak
  2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht
  5 siblings, 1 reply; 9+ messages in thread
From: Dominik Csapak @ 2024-08-05  9:24 UTC (permalink / raw)
To: pbs-devel

by not using `zstd::stream::copy_encode`, because that has an
allocation pattern that reduces throughput if the target/source storage
and the network are faster than the chunk creation.

instead use `zstd_safe::compress`, which shouldn't do any big
allocations, since we provide the target buffer.

To handle the case that the target buffer is too small, we now ignore
all zstd errors and continue with the uncompressed data, logging the
error unless the target buffer was too small.
Some benchmarks on my machine from tmpfs to a datastore on tmpfs: Type without patches (MiB/s) with patches (MiB/s) .img file ~614 ~767 pxar one big file ~657 ~807 pxar small files ~576 ~627 Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- changes from v2: * use zstd_safe instead of zstd pbs-datastore/src/data_blob.rs | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs index adf5a932..4e689364 100644 --- a/pbs-datastore/src/data_blob.rs +++ b/pbs-datastore/src/data_blob.rs @@ -147,39 +147,40 @@ impl DataBlob { DataBlob { raw_data } } else { - let max_data_len = data.len() + std::mem::size_of::<DataBlobHeader>(); + let header_len = std::mem::size_of::<DataBlobHeader>(); + let max_data_len = data.len() + header_len; + let mut raw_data = vec![0; max_data_len]; if compress { - let mut comp_data = Vec::with_capacity(max_data_len); - let head = DataBlobHeader { magic: COMPRESSED_BLOB_MAGIC_1_0, crc: [0; 4], }; unsafe { - comp_data.write_le_value(head)?; + (&mut raw_data[0..header_len]).write_le_value(head)?; } - zstd::stream::copy_encode(data, &mut comp_data, 1)?; - - if comp_data.len() < max_data_len { - let mut blob = DataBlob { - raw_data: comp_data, - }; - blob.set_crc(blob.compute_crc()); - return Ok(blob); + match zstd_safe::compress(&mut raw_data[header_len..], data, 1) { + Ok(size) if size <= data.len() => { + raw_data.truncate(header_len + size); + let mut blob = DataBlob { raw_data }; + blob.set_crc(blob.compute_crc()); + return Ok(blob); + } + Err(err) if !zstd_error_is_target_too_small(err) => { + log::warn!("zstd compression error: {err}"); + } + _ => {} } } - let mut raw_data = Vec::with_capacity(max_data_len); - let head = DataBlobHeader { magic: UNCOMPRESSED_BLOB_MAGIC_1_0, crc: [0; 4], }; unsafe { - raw_data.write_le_value(head)?; + (&mut raw_data[0..header_len]).write_le_value(head)?; } - raw_data.extend_from_slice(data); + (&mut 
raw_data[header_len..]).write_all(data)?; DataBlob { raw_data } }; -- 2.39.2 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
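The control flow of the new encode path can be sketched independently of zstd: allocate one buffer of header size plus input size up front, try to compress into its tail, and keep the compressed form only if it actually came out smaller; a "destination buffer too small" failure silently falls back to storing the data uncompressed. In the sketch below `toy_compress` (a run-length encoder) is a stand-in for `zstd_safe::compress` with the same "errors out if the output doesn't fit" contract, and `HEADER_LEN` stands in for `size_of::<DataBlobHeader>()`; the real code additionally distinguishes zstd error codes and logs unexpected ones:

```rust
const HEADER_LEN: usize = 12; // stand-in for size_of::<DataBlobHeader>()

/// Toy stand-in for zstd_safe::compress: RLE-encode `src` into `dst`,
/// returning the written length, or Err(()) if `dst` is too small
/// (zstd signals the same condition via an error code).
fn toy_compress(dst: &mut [u8], src: &[u8]) -> Result<usize, ()> {
    let (mut out, mut i) = (0usize, 0usize);
    while i < src.len() {
        let mut run = 1usize;
        while i + run < src.len() && src[i + run] == src[i] && run < 255 {
            run += 1;
        }
        if out + 2 > dst.len() {
            return Err(()); // destination buffer too small
        }
        dst[out] = run as u8;
        dst[out + 1] = src[i];
        out += 2;
        i += run;
    }
    Ok(out)
}

/// Mirror of the patched encode path: one allocation, compress into the
/// buffer's tail, fall back to the plain payload if it doesn't pay off.
fn encode(data: &[u8]) -> (Vec<u8>, bool) {
    let mut raw = vec![0u8; HEADER_LEN + data.len()];
    match toy_compress(&mut raw[HEADER_LEN..], data) {
        Ok(size) if size < data.len() => {
            raw.truncate(HEADER_LEN + size);
            (raw, true) // keep the compressed variant
        }
        _ => {
            // incompressible, or buffer too small: store uncompressed
            raw[HEADER_LEN..].copy_from_slice(data);
            (raw, false)
        }
    }
}

fn main() {
    // highly compressible input: 1000 identical bytes -> 4 RLE pairs
    let (blob, compressed) = encode(&[7u8; 1000]);
    assert!(compressed && blob.len() == HEADER_LEN + 8);

    // incompressible input: RLE would double it, so it can't fit the
    // data-sized destination and we fall back to the plain payload
    let incompressible: Vec<u8> = (0..=255u8).collect();
    let (blob, compressed) = encode(&incompressible);
    assert!(!compressed);
    assert_eq!(blob.len(), HEADER_LEN + 256);
}
```

The key property, as in the patch, is that sizing the destination to the input length makes "compression didn't shrink the data" surface as either a larger-than-input result or a too-small-buffer error, and both cases degrade gracefully to the uncompressed blob.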
* Re: [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput 2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak @ 2024-08-05 9:32 ` Dominik Csapak 0 siblings, 0 replies; 9+ messages in thread From: Dominik Csapak @ 2024-08-05 9:32 UTC (permalink / raw) To: pbs-devel; +Cc: Thomas Lamprecht On 8/5/24 11:24, Dominik Csapak wrote: > by not using `zstd::stream::copy_encode`, because that has an allocation > pattern that reduces throughput if the target/source storage and the > network are faster than the chunk creation. > > instead use `zstd_safe::compress` which shouldn't do any big > allocations, since we provide the target buffer. > > To handle the case that the target buffer is too small, we now ignore > all zstd error and continue with the uncompressed data, logging the error > except if the target buffer is too small. > > Some benchmarks on my machine from tmpfs to a datastore on tmpfs: > > Type without patches (MiB/s) with patches (MiB/s) > .img file ~614 ~767 > pxar one big file ~657 ~807 > pxar small files ~576 ~627 > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> @thomas sorry i forgot to adapt the commit message to your (or similar) suggestion feel free to either adapt it to your liking or telling me i should send a v4 for this ;) (which i'll happily do if you want) _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code 2024-08-05 9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak ` (3 preceding siblings ...) 2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak @ 2024-08-05 9:24 ` Dominik Csapak 2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht 5 siblings, 0 replies; 9+ messages in thread From: Dominik Csapak @ 2024-08-05 9:24 UTC (permalink / raw) To: pbs-devel by combining the compression call from both encrypted and unencrypted paths and deciding on the header magic at one site. No functional changes intended, besides reusing the same buffer for compression. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- changes from v2: * adapt to changes in previous patches pbs-datastore/src/data_blob.rs | 89 ++++++++++++++-------------------- 1 file changed, 37 insertions(+), 52 deletions(-) diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs index 4e689364..ee1bb664 100644 --- a/pbs-datastore/src/data_blob.rs +++ b/pbs-datastore/src/data_blob.rs @@ -104,28 +104,42 @@ impl DataBlob { bail!("data blob too large ({} bytes).", data.len()); } - let mut blob = if let Some(config) = config { - let compr_data; - let (_compress, data, magic) = if compress { - compr_data = zstd::bulk::compress(data, 1)?; - // Note: We only use compression if result is shorter - if compr_data.len() < data.len() { - (true, &compr_data[..], ENCR_COMPR_BLOB_MAGIC_1_0) - } else { - (false, data, ENCRYPTED_BLOB_MAGIC_1_0) + let header_len = if config.is_some() { + std::mem::size_of::<EncryptedDataBlobHeader>() + } else { + std::mem::size_of::<DataBlobHeader>() + }; + + let mut compressed = false; + let mut data_compressed = vec![0u8; header_len + data.len()]; + if compress { + match zstd_safe::compress(&mut
data_compressed[header_len..], data, 1) { + Ok(size) if size <= data.len() => { + data_compressed.truncate(header_len + size); + compressed = true; } - } else { - (false, data, ENCRYPTED_BLOB_MAGIC_1_0) - }; + Err(err) if !zstd_error_is_target_too_small(err) => { + log::warn!("zstd compression error: {err}"); + } + _ => {} + } + } - let header_len = std::mem::size_of::<EncryptedDataBlobHeader>(); + let (magic, encryption_source) = match (compressed, config.is_some()) { + (true, true) => (ENCR_COMPR_BLOB_MAGIC_1_0, &data_compressed[header_len..]), + (true, false) => (COMPRESSED_BLOB_MAGIC_1_0, &data_compressed[header_len..]), + (false, true) => (ENCRYPTED_BLOB_MAGIC_1_0, data), + (false, false) => { + (&mut data_compressed[header_len..]).write_all(data)?; + (UNCOMPRESSED_BLOB_MAGIC_1_0, data) + } + }; + + let raw_data = if let Some(config) = config { let mut raw_data = Vec::with_capacity(data.len() + header_len); let dummy_head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: [0u8; 8], - crc: [0; 4], - }, + head: DataBlobHeader { magic, crc: [0; 4] }, iv: [0u8; 16], tag: [0u8; 16], }; @@ -133,7 +147,7 @@ impl DataBlob { raw_data.write_le_value(dummy_head)?; } - let (iv, tag) = Self::encrypt_to(config, data, &mut raw_data)?; + let (iv, tag) = Self::encrypt_to(config, encryption_source, &mut raw_data)?; let head = EncryptedDataBlobHeader { head: DataBlobHeader { magic, crc: [0; 4] }, @@ -145,46 +159,17 @@ impl DataBlob { (&mut raw_data[0..header_len]).write_le_value(head)?; } - DataBlob { raw_data } + raw_data } else { - let header_len = std::mem::size_of::<DataBlobHeader>(); - let max_data_len = data.len() + header_len; - let mut raw_data = vec![0; max_data_len]; - if compress { - let head = DataBlobHeader { - magic: COMPRESSED_BLOB_MAGIC_1_0, - crc: [0; 4], - }; - unsafe { - (&mut raw_data[0..header_len]).write_le_value(head)?; - } - - match zstd_safe::compress(&mut raw_data[header_len..], data, 1) { - Ok(size) if size <= data.len() => { - 
raw_data.truncate(header_len + size); - let mut blob = DataBlob { raw_data }; - blob.set_crc(blob.compute_crc()); - return Ok(blob); - } - Err(err) if !zstd_error_is_target_too_small(err) => { - log::warn!("zstd compression error: {err}"); - } - _ => {} - } - } - - let head = DataBlobHeader { - magic: UNCOMPRESSED_BLOB_MAGIC_1_0, - crc: [0; 4], - }; + let head = DataBlobHeader { magic, crc: [0; 4] }; unsafe { - (&mut raw_data[0..header_len]).write_le_value(head)?; + (&mut data_compressed[0..header_len]).write_le_value(head)?; } - (&mut raw_data[header_len..]).write_all(data)?; - DataBlob { raw_data } + data_compressed }; + let mut blob = DataBlob { raw_data }; blob.set_crc(blob.compute_crc()); Ok(blob) -- 2.39.2 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput 2024-08-05 9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak ` (4 preceding siblings ...) 2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak @ 2024-08-07 17:06 ` Thomas Lamprecht 2024-08-08 6:53 ` Dominik Csapak 5 siblings, 1 reply; 9+ messages in thread From: Thomas Lamprecht @ 2024-08-07 17:06 UTC (permalink / raw) To: Proxmox Backup Server development discussion, Dominik Csapak On 05/08/2024 11:24, Dominik Csapak wrote: > in my tests (against current master) it improved the throughput if > the source/target storage is fast enough (tmpfs -> tmpfs): > > Type master (MiB/s) with my patches (MiB/s) > .img file ~614 ~767 > pxar one big file ~657 ~807 > pxar small files ~576 ~627 > > (these results are also in the relevant commit message) > > It would be great, if someone else can cross check my results here. 
> Note: the the pxar code being faster than the img code seems to stem > from better multithreading pipelining in that code or in tokio (pxar > codepath scales more directly with more cores than the .img codepath) > > changes from v2: > * use zstd_safe instead of zstd so we have access to the underlying > error code > * add test for the error code handling since that's not part of the > public zstd api, only an implementation detail (albeit one that's > not likely to change soon) > * seperated the tests for the decode(encode()) roundtrip so a failure > can more easily assigned to a specific codepath > > changes from v1: > * reorder patches so that the data blob writer removal is the first one > * add tests for DataBlob that we can decode what we encoded > (to see that my patches don't mess up the chunk generation) > * add new patch to cleanup the `encode` function a bit > > Dominik Csapak (5): > remove data blob writer > datastore: test DataBlob encode/decode roundtrip > datastore: data blob: add helper and test for checking zstd_safe error > code > datastore: data blob: increase compression throughput > datastore: DataBlob encode: simplify code > > Cargo.toml | 1 + > pbs-datastore/Cargo.toml | 1 + > pbs-datastore/src/data_blob.rs | 193 ++++++++++++++++------- > pbs-datastore/src/data_blob_writer.rs | 212 -------------------------- > pbs-datastore/src/lib.rs | 2 - > tests/blob_writer.rs | 105 ------------- > 6 files changed, 144 insertions(+), 370 deletions(-) > delete mode 100644 pbs-datastore/src/data_blob_writer.rs > delete mode 100644 tests/blob_writer.rs > Applied, with some rewording of the commit message and some slight adaption to the test commit. 
Ps: it seems the zstd crate authors aren't so sure why they use the
32 KiB buffer either, which FWICT is the underlying issue here:

https://docs.rs/zstd/latest/src/zstd/stream/zio/writer.rs.html#41-42

It's a bit hard to follow, but to me this looks less like an allocation
pattern issue on its own and more like increased overhead from
processing the data in 32 KiB chunks. The extra copying itself
naturally doesn't help either, but that's not a bad allocation pattern;
rather, it's a single (FWICT) avoidable allocation for the small
buffer. As said, I'm not 100% sure, as the code is rather
over-engineered... Anyhow, I tried to add these findings, including
their uncertainty, to the commit message to give some better
background.

I know you could have done this in a v4, but it felt faster to just
amend the changes, especially since I have a few days off and would
have to recreate the mental context anyway.

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
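The overhead Thomas describes above — routing data through a fixed 32 KiB intermediate buffer instead of handing the whole input over at once — can be illustrated with a small std-only sketch. This is NOT the zstd crate's actual code; the function names and the byte-counting are made up purely to model the copy pattern:

```rust
/// Illustrative model only (not the zstd crate's implementation): count how
/// many bytes get copied when data is routed through a fixed 32 KiB scratch
/// buffer versus written to the destination in one shot.
const SCRATCH_SIZE: usize = 32 * 1024;

/// One-shot path: every input byte is copied exactly once, into `dst`.
fn write_direct(dst: &mut Vec<u8>, src: &[u8]) -> usize {
    dst.extend_from_slice(src);
    src.len()
}

/// Chunked path: every input byte travels src -> scratch -> dst, i.e. is
/// copied twice, and the scratch buffer itself has to exist (here on the
/// stack; in the crate it is an extra heap allocation).
fn write_chunked(dst: &mut Vec<u8>, src: &[u8]) -> usize {
    let mut scratch = [0u8; SCRATCH_SIZE];
    let mut copied = 0;
    for chunk in src.chunks(SCRATCH_SIZE) {
        scratch[..chunk.len()].copy_from_slice(chunk); // copy 1: src -> scratch
        dst.extend_from_slice(&scratch[..chunk.len()]); // copy 2: scratch -> dst
        copied += 2 * chunk.len();
    }
    copied
}

fn main() {
    let src = vec![0xABu8; 4 * 1024 * 1024]; // a typical 4 MiB backup chunk
    let (mut a, mut b) = (Vec::new(), Vec::new());
    let direct = write_direct(&mut a, &src);
    let chunked = write_chunked(&mut b, &src);
    assert_eq!(a, b); // the output is identical either way
    // prints "direct: 4194304 bytes copied, chunked: 8388608 bytes copied"
    println!("direct: {direct} bytes copied, chunked: {chunked} bytes copied");
}
```

The output is byte-identical either way; only the number of copies (and the extra buffer) differs, which is presumably where the throughput gain from compressing each chunk in a single call comes from.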
* Re: [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput

From: Dominik Csapak @ 2024-08-08 6:53 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 8/7/24 19:06, Thomas Lamprecht wrote:
> On 05/08/2024 11:24, Dominik Csapak wrote:
> [...]
>
> Applied, with some rewording of the commit message and some slight
> adaptation to the test commit.
>
> Ps: it seems the zstd crate authors aren't so sure why they use the
> 32 KiB buffer either, which FWICT is the underlying issue here:
>
> https://docs.rs/zstd/latest/src/zstd/stream/zio/writer.rs.html#41-42
>
> [...]
>
> I know you could have done this in a v4, but it felt faster to just
> amend the changes, especially since I have a few days off and would
> have to recreate the mental context anyway.

Ah ok, thanks for investigating (I was not patient enough for that,
seemingly..), and thanks also for amending the commit messages :)
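The decode(encode()) roundtrip tests mentioned in the series changelog follow a simple pattern. Here is a minimal, self-contained sketch of that pattern using a made-up toy codec — `encode`, `decode`, and the magic bytes are hypothetical stand-ins, not the real DataBlob API:

```rust
// Hypothetical stand-in codec (NOT the real DataBlob format) illustrating
// the decode(encode()) roundtrip test pattern from patch 2/5.
const MAGIC: [u8; 4] = *b"TOYB";

/// Frame the payload with a magic header (toy stand-in for blob encoding).
fn encode(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + data.len());
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(data);
    out
}

/// Reject blobs with a missing or wrong magic header, return the payload.
fn decode(blob: &[u8]) -> Result<Vec<u8>, &'static str> {
    if blob.len() < 4 || blob[..4] != MAGIC {
        return Err("bad magic");
    }
    Ok(blob[4..].to_vec())
}

fn main() {
    // one roundtrip case per payload "shape", so a failure points at a
    // specific codepath (empty, small, chunk-sized)
    for payload in [vec![], vec![1u8, 2, 3], vec![0u8; 4 * 1024 * 1024]] {
        assert_eq!(decode(&encode(&payload)).unwrap(), payload);
    }
    println!("roundtrip ok");
}
```

Keeping one assertion per payload shape is what the v2 changelog's "separated the tests" point is about: when a roundtrip breaks, the failing case immediately identifies the affected codepath.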
end of thread, other threads: [~2024-08-08 6:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-05  9:24 [pbs-devel] [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
2024-08-05  9:32   ` Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak
2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht
2024-08-08  6:53   ` Dominik Csapak