From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 1 Aug 2024 08:55:42 +0200
From: Dominik Csapak
To: Thomas Lamprecht, Proxmox Backup Server development discussion
References: <20240731093604.1315088-1-d.csapak@proxmox.com> <20240731093604.1315088-4-d.csapak@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput

On 7/31/24 16:39, Thomas Lamprecht wrote:
> Am 31/07/2024 um 11:36 schrieb Dominik Csapak:
>> by not using `zstd::stream::copy_encode`, because that has an allocation
>> pattern that reduces throughput if the target/source storage and the
>> network are faster than the chunk creation.
>
> any before/after benchmark numbers would be really great to have in the
> commit message of any such patch.
>

ok, I put it in the cover letter, but you're right, it's better to put them here

>>
>> instead use `zstd::bulk::compress_to_buffer` which shouldn't do any big
>> allocations, since we provide the target buffer.
>>
>> To handle the case that the target buffer is too small, we now ignore
>> all zstd errors and continue with the uncompressed data, logging the
>> error except if the target buffer is too small.
>
> This is hard to read to me and might do better with some reasoning
> added for why this is OK, even if it's clear to you, maybe something like:
>
> In case of a compression error just return the uncompressed data,
> there's nothing we can do and saving uncompressed data is better than
> having none. Additionally, log any such error besides the one for the
> target buffer being too small.

Ok

>
>> For now, we have to parse the error string for that, as `zstd` maps all
>> errors to `io::ErrorKind::Other`.
>> Until that gets changed, there is no other way to differentiate
>> between different kinds of errors.
>
> FWIW, you could also use the lower-level zstd_safe's compress2 [0] here,
> compress_to_buffer is just a thin wrapper around that [1] anyway. Then you
> could match the return value to see if it equals `70`, i.e., the value of
> the ZSTD_error_dstSize_tooSmall [2] from the ZSTD_ErrorCode enum.
>
> I mean, naturally it would be much better if upstream provided a saner
> interface or at least a binding for the enum, but IME such error codes
> are quite stable if defined in this enum way, at least more stable than
> strings, so might be a slightly better workaround.
>
> [0]: https://docs.rs/zstd-safe/latest/zstd_safe/struct.CCtx.html#method.compress2
> [1]: https://docs.rs/zstd/latest/src/zstd/bulk/compressor.rs.html#117-125
> [2]: https://github.com/facebook/zstd/blob/fdfb2aff39dc498372d8c9e5f2330b692fea9794/lib/zstd_errors.h#L88
>
> besides that and a small nit below: looks OK to me

I did actually try something like that before sending v1 of this, but I
could not get it to work reliably, because the returned integer did not
match what zstd had in the code; rather, the numbers were (at least on
my machine) consistent but "garbage".

I guessed at the time that this has to do with how the zstd crates are
compiled into the rust binary, but maybe I was mistaken. I'll look again
and see if I was just holding it wrong...
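For reference, the fallback control flow the patch implements (compress into a caller-provided buffer, fall back to storing the data uncompressed on any failure, and stay silent only for the expected "destination buffer too small" case) can be sketched with std-only mocks. This is not the real zstd or proxmox-backup API: `try_compress` is a hypothetical stand-in for `zstd::bulk::compress_to_buffer`, and `HEADER_LEN` stands in for `size_of::<DataBlobHeader>()`:

```rust
use std::io::{self, Write};

// stand-in for size_of::<DataBlobHeader>() (hypothetical value)
const HEADER_LEN: usize = 12;

/// Hypothetical stand-in for `zstd::bulk::compress_to_buffer`: pretends the
/// input always shrinks to half its size, and fails with the same kind of
/// io::Error the zstd crate produces when the destination is too small.
fn try_compress(src: &[u8], dst: &mut [u8]) -> io::Result<usize> {
    let compressed_len = src.len() / 2;
    if dst.len() < compressed_len {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "Destination buffer is too small",
        ));
    }
    dst[..compressed_len].copy_from_slice(&src[..compressed_len]);
    Ok(compressed_len)
}

/// Mirrors the patch's control flow: one allocation is reused for both the
/// compressed and the uncompressed outcome; returns (was_compressed, blob).
fn encode(data: &[u8]) -> io::Result<(bool, Vec<u8>)> {
    // allocate the worst case up front: header plus uncompressed payload
    let mut raw_data = vec![0u8; HEADER_LEN + data.len()];
    match try_compress(data, &mut raw_data[HEADER_LEN..]) {
        Ok(size) if size <= data.len() => {
            // compression helped: shrink the buffer to what was written
            raw_data.truncate(HEADER_LEN + size);
            return Ok((true, raw_data));
        }
        Ok(_) => {} // compression did not help, fall through
        Err(err) => {
            // only the "buffer too small" case is expected and stays silent
            if !err.to_string().contains("Destination buffer is too small") {
                eprintln!("zstd compression error: {err}");
            }
        }
    }
    // uncompressed fallback, reusing the same buffer instead of reallocating
    (&mut raw_data[HEADER_LEN..]).write_all(data)?;
    Ok((false, raw_data))
}
```

As the thread notes, matching on the error string is brittle; the alternative discussed above would compare the numeric ZSTD_ErrorCode (70 for dstSize_tooSmall) via the lower-level zstd_safe binding instead.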
>
>>
>> Signed-off-by: Dominik Csapak
>> ---
>> changes from v1:
>> * fixed commit message
>> * reduced log severity to `warn`
>> * use vec![0; size]
>> * omit unnecessary buffer allocation in the unencrypted, uncompressed case
>>   by reusing the initial buffer that was tried for compression
>>
>>  pbs-datastore/src/data_blob.rs | 37 +++++++++++++++++++---------------
>>  1 file changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
>> index 8715afef..2a528204 100644
>> --- a/pbs-datastore/src/data_blob.rs
>> +++ b/pbs-datastore/src/data_blob.rs
>> @@ -136,39 +136,44 @@ impl DataBlob {
>>
>>              DataBlob { raw_data }
>>          } else {
>> -            let max_data_len = data.len() + std::mem::size_of::<DataBlobHeader>();
>> +            let header_len = std::mem::size_of::<DataBlobHeader>();
>> +            let max_data_len = data.len() + header_len;
>> +            let mut raw_data = vec![0; max_data_len];
>>              if compress {
>> -                let mut comp_data = Vec::with_capacity(max_data_len);
>> -
>>                  let head = DataBlobHeader {
>>                      magic: COMPRESSED_BLOB_MAGIC_1_0,
>>                      crc: [0; 4],
>>                  };
>>                  unsafe {
>> -                    comp_data.write_le_value(head)?;
>> +                    (&mut raw_data[0..header_len]).write_le_value(head)?;
>>                  }
>>
>> -                zstd::stream::copy_encode(data, &mut comp_data, 1)?;
>> -
>> -                if comp_data.len() < max_data_len {
>> -                    let mut blob = DataBlob {
>> -                        raw_data: comp_data,
>> -                    };
>> -                    blob.set_crc(blob.compute_crc());
>> -                    return Ok(blob);
>> +                match zstd::bulk::compress_to_buffer(data, &mut raw_data[header_len..], 1) {
>> +                    Ok(size) if size <= data.len() => {
>> +                        raw_data.truncate(header_len + size);
>> +                        let mut blob = DataBlob { raw_data };
>> +                        blob.set_crc(blob.compute_crc());
>> +                        return Ok(blob);
>> +                    }
>> +                    // if size is bigger than the data, or any error is returned, continue with non
>> +                    // compressed archive but log all errors beside buffer too small
>
> this is mostly a 1:1 translation of the code to a comment, IMO not _that_
> useful, at least if not really complex, and something one has to remember
> to update too on modifying the code; but not too hard feelings here.

you're right that it doesn't tell much more than the code. I wanted to
capture the purpose of both the remaining Ok and Err paths in a comment,
but did so poorly; I'll try to find a better comment in v3

>
>> +                    Ok(_) => {}
>> +                    Err(err) => {
>> +                        if !err.to_string().contains("Destination buffer is too small") {
>> +                            log::warn!("zstd compression error: {err}");
>> +                        }
>> +                    }
>>                  }
>>              }
>>
>> -            let mut raw_data = Vec::with_capacity(max_data_len);
>> -
>>              let head = DataBlobHeader {
>>                  magic: UNCOMPRESSED_BLOB_MAGIC_1_0,
>>                  crc: [0; 4],
>>              };
>>              unsafe {
>> -                raw_data.write_le_value(head)?;
>> +                (&mut raw_data[0..header_len]).write_le_value(head)?;
>>              }
>> -            raw_data.extend_from_slice(data);
>> +            (&mut raw_data[header_len..]).write_all(data)?;
>>
>>              DataBlob { raw_data }
>>          };

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel