* [pbs-devel] [PATCH proxmox-backup v2 0/4] improve compression throughput

From: Dominik Csapak @ 2024-07-31  9:36 UTC
To: pbs-devel

this series replaces my previous 2 patches [0]

in my tests (again on current master) it improved the throughput if the
source/target storage is fast enough (tmpfs -> tmpfs):

Type                 master (MiB/s)    with my patches (MiB/s)
.img file            ~614              ~767
pxar one big file    ~657              ~807
pxar small files     ~576              ~627

It would be great if someone else could cross-check my results here.

Note: the pxar code being faster than the .img code seems to stem from
better multithreaded pipelining in that code or in tokio (the pxar codepath
scales more directly with more cores than the .img codepath)

changes from v1:
* reorder patches so that the data blob writer removal is the first one
* add tests for DataBlob so we can check that we can decode what we encoded
  (to see that my patches don't mess up the chunk generation)
* add a new patch to clean up the `encode` function a bit

Dominik Csapak (4):
  remove data blob writer
  datastore: test DataBlob encode/decode roundtrip
  datastore: data blob: increase compression throughput
  datastore: DataBlob encode: simplify code

 pbs-datastore/src/data_blob.rs        | 158 ++++++++++++-------
 pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
 pbs-datastore/src/lib.rs              |   2 -
 tests/blob_writer.rs                  | 105 -------------
 4 files changed, 107 insertions(+), 370 deletions(-)
 delete mode 100644 pbs-datastore/src/data_blob_writer.rs
 delete mode 100644 tests/blob_writer.rs

-- 
2.39.2
* [pbs-devel] [PATCH proxmox-backup v2 1/4] remove data blob writer 2024-07-31 9:36 [pbs-devel] [PATCH proxmox-backup v2 0/4] improve compression throughput Dominik Csapak @ 2024-07-31 9:36 ` Dominik Csapak 2024-07-31 9:36 ` [pbs-devel] [PATCH proxmox-backup v2 2/4] datastore: test DataBlob encode/decode roundtrip Dominik Csapak ` (3 subsequent siblings) 4 siblings, 0 replies; 14+ messages in thread From: Dominik Csapak @ 2024-07-31 9:36 UTC (permalink / raw) To: pbs-devel this is leftover code that is not currently used outside of tests. Should we need it again, we can just revert this commit. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- pbs-datastore/src/data_blob_writer.rs | 212 -------------------------- pbs-datastore/src/lib.rs | 2 - tests/blob_writer.rs | 105 ------------- 3 files changed, 319 deletions(-) delete mode 100644 pbs-datastore/src/data_blob_writer.rs delete mode 100644 tests/blob_writer.rs diff --git a/pbs-datastore/src/data_blob_writer.rs b/pbs-datastore/src/data_blob_writer.rs deleted file mode 100644 index 30d9645f..00000000 --- a/pbs-datastore/src/data_blob_writer.rs +++ /dev/null @@ -1,212 +0,0 @@ -use std::io::{Seek, SeekFrom, Write}; -use std::sync::Arc; - -use anyhow::Error; - -use proxmox_io::WriteExt; - -use pbs_tools::crypt_config::CryptConfig; - -use crate::checksum_writer::ChecksumWriter; -use crate::crypt_writer::CryptWriter; -use crate::file_formats::{self, DataBlobHeader, EncryptedDataBlobHeader}; - -enum BlobWriterState<'writer, W: Write> { - Uncompressed { - csum_writer: ChecksumWriter<W>, - }, - Compressed { - compr: zstd::stream::write::Encoder<'writer, ChecksumWriter<W>>, - }, - Encrypted { - crypt_writer: CryptWriter<ChecksumWriter<W>>, - }, - EncryptedCompressed { - compr: zstd::stream::write::Encoder<'writer, CryptWriter<ChecksumWriter<W>>>, - }, -} - -/// Data blob writer -pub struct DataBlobWriter<'writer, W: Write> { - state: BlobWriterState<'writer, W>, -} - -impl<W: Write + Seek> DataBlobWriter<'_, W> { - pub fn new_uncompressed(mut writer: W) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = DataBlobHeader { - magic: file_formats::UNCOMPRESSED_BLOB_MAGIC_1_0, - crc: [0; 4], - }; - unsafe { - writer.write_le_value(head)?; - } - let csum_writer = ChecksumWriter::new(writer, None); - Ok(Self { - state: BlobWriterState::Uncompressed { csum_writer }, - }) - } - - pub fn new_compressed(mut writer: W) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = DataBlobHeader { - magic: file_formats::COMPRESSED_BLOB_MAGIC_1_0, - crc: [0; 4], - }; - unsafe { - writer.write_le_value(head)?; - } - let csum_writer = ChecksumWriter::new(writer, None); - let compr = zstd::stream::write::Encoder::new(csum_writer, 1)?; - Ok(Self { - state: BlobWriterState::Compressed { compr }, - }) - } - - pub fn new_encrypted(mut writer: W, config: Arc<CryptConfig>) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: file_formats::ENCRYPTED_BLOB_MAGIC_1_0, - crc: [0; 4], - }, - iv: [0u8; 16], - tag: [0u8; 16], - }; - unsafe { - writer.write_le_value(head)?; - } - - let csum_writer = ChecksumWriter::new(writer, None); - let crypt_writer = CryptWriter::new(csum_writer, config)?; - Ok(Self { - state: BlobWriterState::Encrypted { crypt_writer }, - }) - } - - pub fn new_encrypted_compressed( - mut writer: W, - config: Arc<CryptConfig>, - ) -> Result<Self, Error> { - writer.seek(SeekFrom::Start(0))?; - let head = EncryptedDataBlobHeader { - head: 
DataBlobHeader { - magic: file_formats::ENCR_COMPR_BLOB_MAGIC_1_0, - crc: [0; 4], - }, - iv: [0u8; 16], - tag: [0u8; 16], - }; - unsafe { - writer.write_le_value(head)?; - } - - let csum_writer = ChecksumWriter::new(writer, None); - let crypt_writer = CryptWriter::new(csum_writer, config)?; - let compr = zstd::stream::write::Encoder::new(crypt_writer, 1)?; - Ok(Self { - state: BlobWriterState::EncryptedCompressed { compr }, - }) - } - - pub fn finish(self) -> Result<W, Error> { - match self.state { - BlobWriterState::Uncompressed { csum_writer } => { - // write CRC - let (mut writer, crc, _) = csum_writer.finish()?; - let head = DataBlobHeader { - magic: file_formats::UNCOMPRESSED_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }; - - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - - Ok(writer) - } - BlobWriterState::Compressed { compr } => { - let csum_writer = compr.finish()?; - let (mut writer, crc, _) = csum_writer.finish()?; - - let head = DataBlobHeader { - magic: file_formats::COMPRESSED_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }; - - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - - Ok(writer) - } - BlobWriterState::Encrypted { crypt_writer } => { - let (csum_writer, iv, tag) = crypt_writer.finish()?; - let (mut writer, crc, _) = csum_writer.finish()?; - - let head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: file_formats::ENCRYPTED_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }, - iv, - tag, - }; - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - Ok(writer) - } - BlobWriterState::EncryptedCompressed { compr } => { - let crypt_writer = compr.finish()?; - let (csum_writer, iv, tag) = crypt_writer.finish()?; - let (mut writer, crc, _) = csum_writer.finish()?; - - let head = EncryptedDataBlobHeader { - head: DataBlobHeader { - magic: file_formats::ENCR_COMPR_BLOB_MAGIC_1_0, - crc: crc.to_le_bytes(), - }, - iv, - tag, - }; - writer.seek(SeekFrom::Start(0))?; - unsafe { - writer.write_le_value(head)?; - } - Ok(writer) - } - } - } -} - -impl<W: Write + Seek> Write for DataBlobWriter<'_, W> { - fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> { - match self.state { - BlobWriterState::Uncompressed { - ref mut csum_writer, - } => csum_writer.write(buf), - BlobWriterState::Compressed { ref mut compr } => compr.write(buf), - BlobWriterState::Encrypted { - ref mut crypt_writer, - } => crypt_writer.write(buf), - BlobWriterState::EncryptedCompressed { ref mut compr } => compr.write(buf), - } - } - - fn flush(&mut self) -> Result<(), std::io::Error> { - match self.state { - BlobWriterState::Uncompressed { - ref mut csum_writer, - } => csum_writer.flush(), - BlobWriterState::Compressed { ref mut compr } => compr.flush(), - BlobWriterState::Encrypted { - ref mut crypt_writer, - } => crypt_writer.flush(), - BlobWriterState::EncryptedCompressed { ref mut compr } => compr.flush(), - } - } -} diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs index 3e4aa34c..202b0955 100644 --- a/pbs-datastore/src/lib.rs +++ b/pbs-datastore/src/lib.rs @@ -179,7 +179,6 @@ pub mod crypt_reader; pub mod crypt_writer; pub mod data_blob; pub mod data_blob_reader; -pub mod data_blob_writer; pub mod file_formats; pub mod index; pub mod manifest; @@ -201,7 +200,6 @@ pub use crypt_reader::CryptReader; pub use crypt_writer::CryptWriter; pub use data_blob::DataBlob; pub use data_blob_reader::DataBlobReader; -pub use data_blob_writer::DataBlobWriter; pub use manifest::BackupManifest; 
pub use store_progress::StoreProgress; diff --git a/tests/blob_writer.rs b/tests/blob_writer.rs deleted file mode 100644 index 23a3283d..00000000 --- a/tests/blob_writer.rs +++ /dev/null @@ -1,105 +0,0 @@ -use std::io::Cursor; -use std::io::{Read, Seek, SeekFrom, Write}; -use std::sync::Arc; - -use anyhow::{bail, Error}; -use lazy_static::lazy_static; - -use pbs_datastore::{DataBlob, DataBlobReader, DataBlobWriter}; -use pbs_tools::crypt_config::CryptConfig; - -lazy_static! { - static ref TEST_DATA: Vec<u8> = { - let mut data = Vec::new(); - - for i in 0..100_000 { - data.push((i % 255) as u8); - } - - data - }; - static ref CRYPT_CONFIG: Arc<CryptConfig> = { - let key = [1u8; 32]; - Arc::new(CryptConfig::new(key).unwrap()) - }; - static ref TEST_DIGEST_PLAIN: [u8; 32] = [ - 83, 154, 96, 195, 167, 204, 38, 142, 204, 224, 130, 201, 24, 71, 2, 188, 130, 155, 177, 6, - 162, 100, 61, 238, 38, 219, 63, 240, 191, 132, 87, 238 - ]; - static ref TEST_DIGEST_ENC: [u8; 32] = [ - 50, 162, 191, 93, 255, 132, 9, 14, 127, 23, 92, 39, 246, 102, 245, 204, 130, 104, 4, 106, - 182, 239, 218, 14, 80, 17, 150, 188, 239, 253, 198, 117 - ]; -} - -fn verify_test_blob(mut cursor: Cursor<Vec<u8>>, digest: &[u8; 32]) -> Result<(), Error> { - // run read tests with different buffer sizes - for size in [1, 3, 64 * 1024].iter() { - println!("Starting DataBlobReader test (size = {})", size); - - cursor.seek(SeekFrom::Start(0))?; - let mut reader = DataBlobReader::new(&mut cursor, Some(CRYPT_CONFIG.clone()))?; - let mut buffer = Vec::<u8>::new(); - // read the whole file - //reader.read_to_end(&mut buffer)?; - let mut buf = vec![0u8; *size]; - loop { - let count = reader.read(&mut buf)?; - if count == 0 { - break; - } - buffer.extend(&buf[..count]); - } - - reader.finish()?; - if buffer != *TEST_DATA { - bail!("blob data is wrong (read buffer size {})", size); - } - } - - let raw_data = cursor.into_inner(); - - let blob = DataBlob::load_from_reader(&mut &raw_data[..])?; - - let data = blob.decode(Some(&CRYPT_CONFIG), Some(digest))?; - if data != *TEST_DATA { - bail!("blob data is wrong (decode)"); - } - Ok(()) -} - -#[test] -fn test_uncompressed_blob_writer() -> Result<(), Error> { - let tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_uncompressed(tmp)?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_PLAIN) -} - -#[test] -fn test_compressed_blob_writer() -> Result<(), Error> { - let tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_compressed(tmp)?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_PLAIN) -} - -#[test] -fn test_encrypted_blob_writer() -> Result<(), Error> { - let tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_encrypted(tmp, CRYPT_CONFIG.clone())?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_ENC) -} - -#[test] -fn test_encrypted_compressed_blob_writer() -> Result<(), Error> { - let tmp = Cursor::new(Vec::<u8>::new()); - let mut blob_writer = DataBlobWriter::new_encrypted_compressed(tmp, CRYPT_CONFIG.clone())?; - blob_writer.write_all(&TEST_DATA)?; - - verify_test_blob(blob_writer.finish()?, &TEST_DIGEST_ENC) -} -- 2.39.2 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 14+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 2/4] datastore: test DataBlob encode/decode roundtrip

From: Dominik Csapak @ 2024-07-31  9:36 UTC
To: pbs-devel

so that we can be sure we can decode an encoded blob again

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
new in v2

 pbs-datastore/src/data_blob.rs | 66 ++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index a7a55fb7..8715afef 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -562,3 +562,69 @@ impl<'a, 'b> DataChunkBuilder<'a, 'b> {
         chunk_builder.build()
     }
 }
+
+#[cfg(test)]
+mod test {
+    use pbs_tools::crypt_config::CryptConfig;
+
+    use super::DataChunkBuilder;
+
+    const TEST_DATA_LEN: usize = 50;
+
+    #[test]
+    fn test_data_blob_builder() {
+        let mut data = Vec::with_capacity(TEST_DATA_LEN);
+        for i in 0..TEST_DATA_LEN / 10 {
+            for _ in 0..10 {
+                data.push(i as u8);
+            }
+        }
+
+        // unencrypted, uncompressed
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(false)
+            .build()
+            .expect("could not create unencrypted, uncompressed chunk");
+
+        let data_decoded = chunk
+            .decode(None, Some(&digest))
+            .expect("cannot decode unencrypted, uncompressed chunk");
+        assert_eq!(data, data_decoded);
+
+        // unencrypted, compressed
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(true)
+            .build()
+            .expect("could not create unencrypted, compressed chunk");
+
+        let data_decoded = chunk
+            .decode(None, Some(&digest))
+            .expect("cannot decode unencrypted, compressed chunk");
+        assert_eq!(data, data_decoded);
+
+        // encrypted, uncompressed
+        let crypt_config = CryptConfig::new([9; 32]).expect("could not create crypt config");
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(false)
+            .crypt_config(&crypt_config)
+            .build()
+            .expect("could not create encrypted, uncompressed chunk");
+
+        let data_decoded = chunk
+            .decode(Some(&crypt_config), Some(&digest))
+            .expect("cannot decode encrypted, uncompressed chunk");
+        assert_eq!(data, data_decoded);
+
+        // encrypted, compressed
+        let (chunk, digest) = DataChunkBuilder::new(&data)
+            .compress(true)
+            .crypt_config(&crypt_config)
+            .build()
+            .expect("could not create encrypted, compressed chunk");
+
+        let data_decoded = chunk
+            .decode(Some(&crypt_config), Some(&digest))
+            .expect("cannot decode encrypted, compressed chunk");
+        assert_eq!(data, data_decoded);
+    }
+}
-- 
2.39.2
* Re: [pbs-devel] [PATCH proxmox-backup v2 2/4] datastore: test DataBlob encode/decode roundtrip

From: Lukas Wagner @ 2024-07-31  9:47 UTC
To: Proxmox Backup Server development discussion, Dominik Csapak

On 2024-07-31 11:36, Dominik Csapak wrote:
> so that we can be sure we can decode an encoded blob again
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> [...]

IMO it would be nicer to move the (sub) testcases
(e.g. 'encrypted, compressed', 'encrypted, uncompressed') to individual test cases,
transforming the info from the comments into the test name, for instance:

#[test]
fn test_encrypted_uncompressed() { ... }

(the fact that you test the data_blob module is clear from the full test name, which includes
the module prefix - so I think you can drop it from the test name)

IMO that is quite nice when at some point we end up with a failing assertion, because
you see exactly from the name of the failing test what scenario did not work.

What's your take on this?

-- 
- Lukas
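For illustration, the split Lukas describes could look roughly like the sketch
below; the helper and test names are made up here (they are not taken from any
posted patch), only the `DataChunkBuilder`/`decode` calls mirror the quoted test:

fn create_test_data() -> Vec<u8> {
    // same pattern as in the patch: 50 bytes, value increasing every 10 bytes
    (0..50).map(|i| (i / 10) as u8).collect()
}

#[test]
fn unencrypted_compressed() {
    let data = create_test_data();
    let (chunk, digest) = DataChunkBuilder::new(&data)
        .compress(true)
        .build()
        .expect("could not create unencrypted, compressed chunk");

    let decoded = chunk
        .decode(None, Some(&digest))
        .expect("cannot decode unencrypted, compressed chunk");
    assert_eq!(data, decoded);
}

#[test]
fn encrypted_uncompressed() {
    let data = create_test_data();
    let crypt_config = CryptConfig::new([9; 32]).expect("could not create crypt config");
    let (chunk, digest) = DataChunkBuilder::new(&data)
        .compress(false)
        .crypt_config(&crypt_config)
        .build()
        .expect("could not create encrypted, uncompressed chunk");

    let decoded = chunk
        .decode(Some(&crypt_config), Some(&digest))
        .expect("cannot decode encrypted, uncompressed chunk");
    assert_eq!(data, decoded);
}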
* Re: [pbs-devel] [PATCH proxmox-backup v2 2/4] datastore: test DataBlob encode/decode roundtrip

From: Dominik Csapak @ 2024-07-31  9:50 UTC
To: Lukas Wagner, Proxmox Backup Server development discussion

On 7/31/24 11:47, Lukas Wagner wrote:
> On 2024-07-31 11:36, Dominik Csapak wrote:
>> [...]
>
> IMO it would be nicer to move the (sub) testcases
> (e.g. 'encrypted, compressed', 'encrypted, uncompressed') to individual test cases,
> transforming the info from the comments into the test name, for instance:
>
> #[test]
> fn test_encrypted_uncompressed() { ... }
>
> (the fact that you test the data_blob module is clear from the full test name, which includes
> the module prefix - so I think you can drop it from the test name)
>
> IMO that is quite nice when at some point we end up with a failing assertion, because
> you see exactly from the name of the failing test what scenario did not work.
>
> What's your take on this?

sure, makes sense

i'll send a v3, but i'll wait for a review on the other patches first
* [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

From: Dominik Csapak @ 2024-07-31  9:36 UTC
To: pbs-devel

by not using `zstd::stream::copy_encode`, because that has an allocation
pattern that reduces throughput if the target/source storage and the
network are faster than the chunk creation.

instead use `zstd::bulk::compress_to_buffer` which shouldn't do any big
allocations, since we provide the target buffer.

To handle the case that the target buffer is too small, we now ignore
all zstd errors and continue with the uncompressed data, logging the error
except if the target buffer is too small.

For now, we have to parse the error string for that, as `zstd` maps all
errors as `io::ErrorKind::Other`. Until that gets changed, there is no
other way to differentiate between different kinds of errors.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* fixed commit message
* reduced log severity to `warn`
* use vec![0; size]
* omit unnecessary buffer allocation in the unencrypted,uncompressed case
  by reusing the initial buffer that was tried for compression

 pbs-datastore/src/data_blob.rs | 37 +++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index 8715afef..2a528204 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -136,39 +136,44 @@ impl DataBlob {
 
             DataBlob { raw_data }
         } else {
-            let max_data_len = data.len() + std::mem::size_of::<DataBlobHeader>();
+            let header_len = std::mem::size_of::<DataBlobHeader>();
+            let max_data_len = data.len() + header_len;
+            let mut raw_data = vec![0; max_data_len];
             if compress {
-                let mut comp_data = Vec::with_capacity(max_data_len);
-
                 let head = DataBlobHeader {
                     magic: COMPRESSED_BLOB_MAGIC_1_0,
                     crc: [0; 4],
                 };
                 unsafe {
-                    comp_data.write_le_value(head)?;
+                    (&mut raw_data[0..header_len]).write_le_value(head)?;
                 }
 
-                zstd::stream::copy_encode(data, &mut comp_data, 1)?;
-
-                if comp_data.len() < max_data_len {
-                    let mut blob = DataBlob {
-                        raw_data: comp_data,
-                    };
-                    blob.set_crc(blob.compute_crc());
-                    return Ok(blob);
+                match zstd::bulk::compress_to_buffer(data, &mut raw_data[header_len..], 1) {
+                    Ok(size) if size <= data.len() => {
+                        raw_data.truncate(header_len + size);
+                        let mut blob = DataBlob { raw_data };
+                        blob.set_crc(blob.compute_crc());
+                        return Ok(blob);
+                    }
+                    // if size is bigger than the data, or any error is returned, continue with non
+                    // compressed archive but log all errors beside buffer too small
+                    Ok(_) => {}
+                    Err(err) => {
+                        if !err.to_string().contains("Destination buffer is too small") {
+                            log::warn!("zstd compression error: {err}");
+                        }
+                    }
                 }
             }
 
-            let mut raw_data = Vec::with_capacity(max_data_len);
-
             let head = DataBlobHeader {
                 magic: UNCOMPRESSED_BLOB_MAGIC_1_0,
                 crc: [0; 4],
             };
             unsafe {
-                raw_data.write_le_value(head)?;
+                (&mut raw_data[0..header_len]).write_le_value(head)?;
             }
-            raw_data.extend_from_slice(data);
+            (&mut raw_data[header_len..]).write_all(data)?;
 
             DataBlob { raw_data }
         };
-- 
2.39.2
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

From: Thomas Lamprecht @ 2024-07-31 14:39 UTC
To: Proxmox Backup Server development discussion, Dominik Csapak

Am 31/07/2024 um 11:36 schrieb Dominik Csapak:
> by not using `zstd::stream::copy_encode`, because that has an allocation
> pattern that reduces throughput if the target/source storage and the
> network are faster than the chunk creation.

any before/after benchmark numbers would be really great to have in the
commit message of any such patch.

> instead use `zstd::bulk::compress_to_buffer` which shouldn't do any big
> allocations, since we provide the target buffer.
>
> To handle the case that the target buffer is too small, we now ignore
> all zstd errors and continue with the uncompressed data, logging the error
> except if the target buffer is too small.

This is hard to read to me and might do better with some reasoning added
for why this is OK, even if it's clear to you, maybe something like:

In case of a compression error just return the uncompressed data,
there's nothing we can do and saving uncompressed data is better than
having none. Additionally, log any such error besides the one for the
target buffer being too small.

> For now, we have to parse the error string for that, as `zstd` maps all
> errors as `io::ErrorKind::Other`. Until that gets changed, there is no
> other way to differentiate between different kinds of errors.

FWIW, you could also use the lower-level zstd_safe's compress2 [0] here,
compress_to_buffer is just a thin wrapper around that [1] anyway. Then you
could match the return value to see if it equals `70`, i.e., the value of
the ZSTD_error_dstSize_tooSmall [2] from the ZSTD_ErrorCode enum.

I mean, naturally it would be much better if upstream provided a saner
interface or at least a binding for the enum, but IME such error codes
are quite stable if defined in this enum way, at least more stable than
strings, so might be a slightly better workaround.

[0]: https://docs.rs/zstd-safe/latest/zstd_safe/struct.CCtx.html#method.compress2
[1]: https://docs.rs/zstd/latest/src/zstd/bulk/compressor.rs.html#117-125
[2]: https://github.com/facebook/zstd/blob/fdfb2aff39dc498372d8c9e5f2330b692fea9794/lib/zstd_errors.h#L88

besides that and a small nit below: looks OK to me

> [...]
> +                match zstd::bulk::compress_to_buffer(data, &mut raw_data[header_len..], 1) {
> +                    Ok(size) if size <= data.len() => {
> +                        raw_data.truncate(header_len + size);
> +                        let mut blob = DataBlob { raw_data };
> +                        blob.set_crc(blob.compute_crc());
> +                        return Ok(blob);
> +                    }
> +                    // if size is bigger than the data, or any error is returned, continue with non
> +                    // compressed archive but log all errors beside buffer too small

this is mostly a 1:1 translation of the code to a comment, IMO not _that_
useful, at least if not really complex, and something one has to remember
to update too on modifying the code; but not too hard feelings here.

> [...]
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

From: Dominik Csapak @ 2024-08-01  6:55 UTC
To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 7/31/24 16:39, Thomas Lamprecht wrote:
> Am 31/07/2024 um 11:36 schrieb Dominik Csapak:
>> by not using `zstd::stream::copy_encode`, because that has an allocation
>> pattern that reduces throughput if the target/source storage and the
>> network are faster than the chunk creation.
>
> any before/after benchmark numbers would be really great to have in the
> commit message of any such patch.

ok, i put it in the cover letter but you're right, it's better to put them here

>> instead use `zstd::bulk::compress_to_buffer` which shouldn't do any big
>> allocations, since we provide the target buffer.
>>
>> To handle the case that the target buffer is too small, we now ignore
>> all zstd errors and continue with the uncompressed data, logging the error
>> except if the target buffer is too small.
>
> This is hard to read to me and might do better with some reasoning added
> for why this is OK, even if it's clear to you, maybe something like:
>
> In case of a compression error just return the uncompressed data,
> there's nothing we can do and saving uncompressed data is better than
> having none. Additionally, log any such error besides the one for the
> target buffer being too small.

Ok

>> For now, we have to parse the error string for that, as `zstd` maps all
>> errors as `io::ErrorKind::Other`. Until that gets changed, there is no
>> other way to differentiate between different kinds of errors.
>
> FWIW, you could also use the lower-level zstd_safe's compress2 [0] here,
> compress_to_buffer is just a thin wrapper around that [1] anyway. Then you
> could match the return value to see if it equals `70`, i.e., the value of
> the ZSTD_error_dstSize_tooSmall [2] from the ZSTD_ErrorCode enum.
>
> I mean, naturally it would be much better if upstream provided a saner
> interface or at least a binding for the enum, but IME such error codes
> are quite stable if defined in this enum way, at least more stable than
> strings, so might be a slightly better workaround.
>
> [0]: https://docs.rs/zstd-safe/latest/zstd_safe/struct.CCtx.html#method.compress2
> [1]: https://docs.rs/zstd/latest/src/zstd/bulk/compressor.rs.html#117-125
> [2]: https://github.com/facebook/zstd/blob/fdfb2aff39dc498372d8c9e5f2330b692fea9794/lib/zstd_errors.h#L88
>
> besides that and a small nit below: looks OK to me

i did actually try something like that before sending v1 of this, but i could
not get it to work reliably because the returned integer did not match with
what zstd had in the code, rather they were (at least on my machine) consistent
but "garbage" numbers

i guessed at the time that this has to do with how the zstd crates are
compiled into the rust binary, but maybe I was mistaken. I'll look again and
see if i was just holding it wrong...

>> [...]
>> +                    // if size is bigger than the data, or any error is returned, continue with non
>> +                    // compressed archive but log all errors beside buffer too small
>
> this is mostly a 1:1 translation of the code to a comment, IMO not _that_
> useful, at least if not really complex, and something one has to remember
> to update too on modifying the code; but not too hard feelings here.

you're right that it doesn't tell much more than the code. I wanted to capture
the purpose of both the remaining Ok and Err paths in a comment, but did
poorly so, i'll try to find a better comment in v3

>> [...]
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

From: Dominik Csapak @ 2024-08-02 10:47 UTC
To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 7/31/24 16:39, Thomas Lamprecht wrote:
> Am 31/07/2024 um 11:36 schrieb Dominik Csapak:
>> For now, we have to parse the error string for that, as `zstd` maps all
>> errors as `io::ErrorKind::Other`. Until that gets changed, there is no
>> other way to differentiate between different kinds of errors.
>
> FWIW, you could also use the lower-level zstd_safe's compress2 [0] here,
> compress_to_buffer is just a thin wrapper around that [1] anyway. Then you
> could match the return value to see if it equals `70`, i.e., the value of
> the ZSTD_error_dstSize_tooSmall [2] from the ZSTD_ErrorCode enum.
>
> I mean, naturally it would be much better if upstream provided a saner
> interface or at least a binding for the enum, but IME such error codes
> are quite stable if defined in this enum way, at least more stable than
> strings, so might be a slightly better workaround.
>
> [0]: https://docs.rs/zstd-safe/latest/zstd_safe/struct.CCtx.html#method.compress2
> [1]: https://docs.rs/zstd/latest/src/zstd/bulk/compressor.rs.html#117-125
> [2]: https://github.com/facebook/zstd/blob/fdfb2aff39dc498372d8c9e5f2330b692fea9794/lib/zstd_errors.h#L88

sadly AFAICS this is currently not possible

i tried this and instead of error 70 i got:

'18446744073709551546'

reading your [2] link, it also states:

> * note 1 : this API shall be used with static linking only.
> * dynamic linking is not yet officially supported.

so i don't think this works, unless we'd link statically?

so how do we want to go forward with this?
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

From: Thomas Lamprecht @ 2024-08-02 11:59 UTC
To: Dominik Csapak, Proxmox Backup Server development discussion

On 02/08/2024 12:47, Dominik Csapak wrote:
> sadly AFAICS this is currently not possible
>
> i tried this and instead of error 70 i got:
>
> '18446744073709551546'
>
> reading your [2] link, it also states:
>
>> * note 1 : this API shall be used with static linking only.
>> * dynamic linking is not yet officially supported.

That just sounds wrong... but oh well.

> so i don't think this works, unless we'd link statically?

I mean, the library can obviously translate the error to some meaningful
string. I mean that could be due to relying on internal compiled stuff that
we cannot use, but while I think that is a solid guess, I'd still evaluate
how it actually works to be sure.

> so how do we want to go forward with this?

If, after checking, this really seems to be unfeasible then let's go for
the string comparison with a TODO/FIXME comment pointing out why it's done
that way and that one might want to reevaluate if still required (in the
future)
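To make the interim direction concrete, a rough sketch of such a fallback
helper with the FIXME Thomas asks for; the helper name is made up here, only
the `compress_to_buffer` call and the error string mirror the v2 patch:

// FIXME: the zstd crate currently maps every error to io::ErrorKind::Other,
// so the "destination buffer too small" case can only be told apart by its
// message; re-evaluate once proper error kinds/codes are exposed.
fn try_compress_to(data: &[u8], dst: &mut [u8]) -> Option<usize> {
    match zstd::bulk::compress_to_buffer(data, dst, 1) {
        // only use the result if compression actually made the data smaller
        Ok(size) if size <= data.len() => Some(size),
        Ok(_) => None,
        Err(err) => {
            if !err.to_string().contains("Destination buffer is too small") {
                log::warn!("zstd compression error: {err}");
            }
            None
        }
    }
}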
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

From: Dominik Csapak @ 2024-08-02 12:38 UTC
To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 8/2/24 13:59, Thomas Lamprecht wrote:
> On 02/08/2024 12:47, Dominik Csapak wrote:
>> sadly AFAICS this is currently not possible
>>
>> i tried this and instead of error 70 i got:
>>
>> '18446744073709551546'
>>
>> reading your [2] link, it also states:
>>
>>> * note 1 : this API shall be used with static linking only.
>>> * dynamic linking is not yet officially supported.
>
> That just sounds wrong... but oh well.
>
>> so i don't think this works, unless we'd link statically?
>
> I mean, the library can obviously translate the error to some meaningful
> string. I mean that could be due to relying on internal compiled stuff that
> we cannot use, but while I think that is a solid guess, I'd still evaluate
> how it actually works to be sure.
>
>> so how do we want to go forward with this?
>
> If, after checking, this really seems to be unfeasible then let's go for
> the string comparison with a TODO/FIXME comment pointing out why it's done
> that way and that one might want to reevaluate if still required (in the
> future)

mhmm zstd just calls this:

---
fn map_error_code(code: usize) -> io::Error {
    let msg = zstd_safe::get_error_name(code);
    io::Error::new(io::ErrorKind::Other, msg.to_string())
}
---

which calls this:

---
pub fn get_error_name(code: usize) -> &'static str {
    unsafe {
        // Safety: assumes ZSTD returns a well-formed utf8 string.
        let name = zstd_sys::ZSTD_getErrorName(code);
        c_char_to_str(name)
    }
}
---

which is part of the zstd api and at the end it maps the error code like this:

---
ERR_STATIC ERR_enum ERR_getErrorCode(size_t code)
{
    if (!ERR_isError(code)) return (ERR_enum)0;
    return (ERR_enum) (0-code);
}
---

with that result, it maps the code to a string...

which matches what i get, since

2^64 - 70 = 18446744073709551546 [0]

but, i'm really not sure if we could rely on that since the function is in
'error_private.c' which seems to me like it's an implementation detail only?

0: https://www.wolframalpha.com/input?i=2%5E64+-+70
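For the record, the `0 - code` mapping above can be sanity-checked directly; a
tiny, purely illustrative snippet (assuming a 64-bit usize and that the value
in the error really is the raw zstd return code):

fn main() {
    // the value reported above instead of the expected 70
    let raw: usize = 18446744073709551546;
    // undo zstd's `(ERR_enum)(0 - code)` mapping
    let code = raw.wrapping_neg(); // same as 0usize.wrapping_sub(raw)
    assert_eq!(code, 70); // 70 == ZSTD_error_dstSize_tooSmall
    println!("raw {raw} maps back to error code {code}");
}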
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

From: Thomas Lamprecht @ 2024-08-07 15:01 UTC
To: Proxmox Backup Server development discussion, Dominik Csapak

Seems I forgot to reply-all, so while this is outdated (i.e., already
implemented by Dominik in v3) it still is nice to have the list complete,
for the record so to say.

On 02/08/2024 14:38, Dominik Csapak wrote:
> ---
> fn map_error_code(code: usize) -> io::Error {
>     let msg = zstd_safe::get_error_name(code);
>     io::Error::new(io::ErrorKind::Other, msg.to_string())
> }
> ---
>
> which calls this:
>
> ---
> pub fn get_error_name(code: usize) -> &'static str {
>     unsafe {
>         // Safety: assumes ZSTD returns a well-formed utf8 string.
>         let name = zstd_sys::ZSTD_getErrorName(code);
>         c_char_to_str(name)
>     }
> }
> ---
>
> which is part of the zstd api and at the end it maps the error code like this:
>
> ---
> ERR_STATIC ERR_enum ERR_getErrorCode(size_t code)
> {
>     if (!ERR_isError(code)) return (ERR_enum)0;
>     return (ERR_enum) (0-code);
> }
> ---
>
> with that result, it maps the code to a string...
>
> which matches what i get, since
>
> 2^64 - 70 = 18446744073709551546 [0]

Thanks for looking into this and providing the explanation.

> but, i'm really not sure if we could rely on that since the function is in
> 'error_private.c' which seems to me like it's an implementation detail only?

Yeah, it's not ideal... But it could be made safe enough by adding a test
that runs on build and triggers this error explicitly by passing a way too
small target buffer, that way we can notice when this internal error changes,
which is IMO not _that_ likely, at least not during the same major Debian
release, as there we normally only get critical bug and security fixes, and
while I don't want to curse it, I'd really be surprised if this particular
code would change semantics, as it's hard to envision that the widely used
`- code` pattern to return errors in C ABIs should be part of such a critical
flaw.

And yeah, while that is not the interface I'd wish for, it doesn't really feel
significantly worse to me than doing matching on the error string, as those
aren't guaranteed to be 100% stable either I'd think.
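A guard test along those lines could be as simple as the sketch below; the
test name and buffer sizes are made up, and it checks the error message rather
than the numeric code, since that is what `zstd::bulk` currently surfaces
through `io::Error`:

#[test]
fn detects_destination_buffer_too_small() {
    let data = vec![0u8; 64 * 1024];
    let mut dst = vec![0u8; 8]; // deliberately far too small for any zstd frame

    let err = zstd::bulk::compress_to_buffer(&data, &mut dst[..], 1)
        .expect_err("compressing 64 KiB into 8 bytes should fail");

    // if zstd ever reports this condition differently, this fails at test time
    // and the error handling in DataBlob::encode needs to be adjusted
    assert!(err.to_string().contains("Destination buffer is too small"));
}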
* [pbs-devel] [PATCH proxmox-backup v2 4/4] datastore: DataBlob encode: simplify code

From: Dominik Csapak @ 2024-07-31  9:36 UTC
To: pbs-devel

by combining the compression call from both encrypted and unencrypted
paths and deciding on the header magic at one site.

No functional changes intended, besides reusing the same buffer for
compression.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
new in v2

probably better to review the old and new code side by side, instead of
trying to decode the diff...

 pbs-datastore/src/data_blob.rs | 97 ++++++++++++++--------------------
 1 file changed, 41 insertions(+), 56 deletions(-)

diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index 2a528204..4411bc08 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -93,28 +93,46 @@ impl DataBlob {
             bail!("data blob too large ({} bytes).", data.len());
         }
 
-        let mut blob = if let Some(config) = config {
-            let compr_data;
-            let (_compress, data, magic) = if compress {
-                compr_data = zstd::bulk::compress(data, 1)?;
-                // Note: We only use compression if result is shorter
-                if compr_data.len() < data.len() {
-                    (true, &compr_data[..], ENCR_COMPR_BLOB_MAGIC_1_0)
-                } else {
-                    (false, data, ENCRYPTED_BLOB_MAGIC_1_0)
+        let header_len = if config.is_some() {
+            std::mem::size_of::<EncryptedDataBlobHeader>()
+        } else {
+            std::mem::size_of::<DataBlobHeader>()
+        };
+
+        let mut compressed = false;
+        let mut data_compressed = vec![0u8; header_len + data.len()];
+        if compress {
+            match zstd::bulk::compress_to_buffer(data, &mut data_compressed[header_len..], 1) {
+                Ok(size) if size <= data.len() => {
+                    data_compressed.truncate(header_len + size);
+                    compressed = true;
                 }
-            } else {
-                (false, data, ENCRYPTED_BLOB_MAGIC_1_0)
-            };
+                // if size is bigger than the data, or any error is returned, continue with non
+                // compressed archive but log all errors beside buffer too small
+                Ok(_) => {}
+                Err(err) => {
+                    if !err.to_string().contains("Destination buffer is too small") {
+                        log::warn!("zstd compression error: {err}");
+                    }
+                }
+            }
+        }
 
-            let header_len = std::mem::size_of::<EncryptedDataBlobHeader>();
+        let (magic, encryption_source) = match (compressed, config.is_some()) {
+            (true, true) => (ENCR_COMPR_BLOB_MAGIC_1_0, &data_compressed[header_len..]),
+            (true, false) => (COMPRESSED_BLOB_MAGIC_1_0, &data_compressed[header_len..]),
+            (false, true) => (ENCRYPTED_BLOB_MAGIC_1_0, data),
+            (false, false) => {
+                (&mut data_compressed[header_len..]).write_all(data)?;
+                (UNCOMPRESSED_BLOB_MAGIC_1_0, data)
+            }
+        };
+
+        let raw_data = if let Some(config) = config {
             let mut raw_data = Vec::with_capacity(data.len() + header_len);
 
             let dummy_head = EncryptedDataBlobHeader {
-                head: DataBlobHeader {
-                    magic: [0u8; 8],
-                    crc: [0; 4],
-                },
+                head: DataBlobHeader { magic, crc: [0; 4] },
                 iv: [0u8; 16],
                 tag: [0u8; 16],
             };
@@ -122,7 +140,7 @@ impl DataBlob {
                 raw_data.write_le_value(dummy_head)?;
             }
 
-            let (iv, tag) = Self::encrypt_to(config, data, &mut raw_data)?;
+            let (iv, tag) = Self::encrypt_to(config, encryption_source, &mut raw_data)?;
 
             let head = EncryptedDataBlobHeader {
                 head: DataBlobHeader { magic, crc: [0; 4] },
@@ -134,50 +152,17 @@ impl DataBlob {
                 (&mut raw_data[0..header_len]).write_le_value(head)?;
             }
 
-            DataBlob { raw_data }
+            raw_data
         } else {
-            let header_len = std::mem::size_of::<DataBlobHeader>();
-            let max_data_len = data.len() + header_len;
-            let mut raw_data = vec![0; max_data_len];
-            if compress {
-                let head = DataBlobHeader {
-                    magic: COMPRESSED_BLOB_MAGIC_1_0,
-                    crc: [0; 4],
-                };
-                unsafe {
-                    (&mut raw_data[0..header_len]).write_le_value(head)?;
-                }
-
-                match zstd::bulk::compress_to_buffer(data, &mut raw_data[header_len..], 1) {
-                    Ok(size) if size <= data.len() => {
-                        raw_data.truncate(header_len + size);
-                        let mut blob = DataBlob { raw_data };
-                        blob.set_crc(blob.compute_crc());
-                        return Ok(blob);
-                    }
-                    // if size is bigger than the data, or any error is returned, continue with non
-                    // compressed archive but log all errors beside buffer too small
-                    Ok(_) => {}
-                    Err(err) => {
-                        if !err.to_string().contains("Destination buffer is too small") {
-                            log::warn!("zstd compression error: {err}");
-                        }
-                    }
-                }
-            }
-
-            let head = DataBlobHeader {
-                magic: UNCOMPRESSED_BLOB_MAGIC_1_0,
-                crc: [0; 4],
-            };
+            let head = DataBlobHeader { magic, crc: [0; 4] };
             unsafe {
-                (&mut raw_data[0..header_len]).write_le_value(head)?;
+                (&mut data_compressed[0..header_len]).write_le_value(head)?;
            }
-            (&mut raw_data[header_len..]).write_all(data)?;
 
-            DataBlob { raw_data }
+            data_compressed
         };
 
+        let mut blob = DataBlob { raw_data };
         blob.set_crc(blob.compute_crc());
 
         Ok(blob)
-- 
2.39.2
* Re: [pbs-devel] [PATCH proxmox-backup v2 0/4] improve compression throughput

From: Dominik Csapak @ 2024-08-05  9:33 UTC
To: pbs-devel

replaced by a v3:
https://lists.proxmox.com/pipermail/pbs-devel/2024-August/010402.html