From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 60C3C1FF167
	for <inbox@lore.proxmox.com>; Wed, 31 Jul 2024 16:40:30 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 572D318199;
	Wed, 31 Jul 2024 16:40:32 +0200 (CEST)
Message-ID: <fa70b3f1-e7d0-4cd2-ad07-65a0cfa0f892@proxmox.com>
Date: Wed, 31 Jul 2024 16:39:57 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird Beta
To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>, Dominik Csapak <d.csapak@proxmox.com>
References: <20240731093604.1315088-1-d.csapak@proxmox.com>
 <20240731093604.1315088-4-d.csapak@proxmox.com>
Content-Language: en-GB, de-AT
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
In-Reply-To: <20240731093604.1315088-4-d.csapak@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob:
 increase compression throughput
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

Am 31/07/2024 um 11:36 schrieb Dominik Csapak:
> by not using `zstd::stream::copy_encode`, because that has an allocation
> pattern that reduces throughput if the target/source storage and the
> network are faster than the chunk creation.

any before/after benchmark numbers would be really great to have in the
commit message of any such patch.

> 
> instead use `zstd::bulk::compress_to_buffer` which shouldn't do any big
> allocations, since we provide the target buffer.
> 
> To handle the case that the target buffer is too small, we now ignore
> all zstd error and continue with the uncompressed data, logging the error
> except if the target buffer is too small.

This is hard for me to read and might do better with some reasoning
added for why this is OK, even if it's clear to you. Maybe something like:

In case of a compression error just return the uncompressed data,
there's nothing we can do and saving uncompressed data is better than
having none. Additionally, log any such error besides the one for the
target buffer being too small.


> For now, we have to parse the error string for that, as `zstd` maps all
> errors as `io::ErrorKind::Other`. Until that gets changed, there is no
> other way to differentiate between different kind of errors.

FWIW, you could also use the lower-level zstd_safe's compress2 [0] here;
compress_to_buffer is just a thin wrapper around that [1] anyway. Then you
could match the error value to see if it corresponds to `70`, i.e., the
value of ZSTD_error_dstSize_tooSmall [2] in the ZSTD_ErrorCode enum.

I mean, naturally it would be much better if upstream provided a saner
interface or at least a binding for the enum, but IME such error codes
are quite stable if defined in this enum way, at least more stable than
strings, so this might be a slightly better workaround.
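
A rough sketch of the error-code check, using only the standard library so
it stays independent of any crate version. It assumes the binding hands
back the raw `size_t` from zstd unchanged, which, as far as I understand
zstd's error macros, is the enum value negated and cast to unsigned; please
verify that before relying on it. `is_dst_too_small` and `try_compress` are
hypothetical names, and the `compress` closure merely stands in for a call
such as compress2, its signature is made up for illustration.

```rust
/// Numeric value of `ZSTD_error_dstSize_tooSmall` in the `ZSTD_ErrorCode` enum.
const ZSTD_ERROR_DST_SIZE_TOO_SMALL: usize = 70;

/// Assumption: `raw` is the unmodified `size_t` that zstd returned on
/// failure, i.e. the enum value negated and cast to an unsigned integer.
fn is_dst_too_small(raw: usize) -> bool {
    // Undo the negation with wrapping arithmetic to recover the enum value.
    0usize.wrapping_sub(raw) == ZSTD_ERROR_DST_SIZE_TOO_SMALL
}

/// Hypothetical helper: compress into a buffer no larger than the input and
/// return the compressed bytes, or `None` to signal "store uncompressed".
fn try_compress<F>(data: &[u8], compress: F) -> Option<Vec<u8>>
where
    F: FnOnce(&[u8], &mut [u8]) -> Result<usize, usize>,
{
    let mut buf = vec![0u8; data.len()];
    match compress(data, &mut buf) {
        Ok(size) => {
            buf.truncate(size);
            Some(buf)
        }
        Err(code) => {
            // Incompressible data is expected, so stay quiet in that case
            // and only report the other error codes.
            if !is_dst_too_small(code) {
                eprintln!("zstd compression error code: {}", 0usize.wrapping_sub(code));
            }
            None
        }
    }
}
```

The caller would write the compressed blob on `Some(..)` and fall through
to the uncompressed path on `None`, same as the patch does today.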

[0]: https://docs.rs/zstd-safe/latest/zstd_safe/struct.CCtx.html#method.compress2
[1]: https://docs.rs/zstd/latest/src/zstd/bulk/compressor.rs.html#117-125
[2]: https://github.com/facebook/zstd/blob/fdfb2aff39dc498372d8c9e5f2330b692fea9794/lib/zstd_errors.h#L88

besides that and a small nit below: looks OK to me

> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * fixed commit message
> * reduced log severity to `warn`
> * use vec![0; size]
> * omit unnecessary buffer allocation in the unencrypted,uncompressed case
>   by reusing the initial buffer that was tried for compression
>  pbs-datastore/src/data_blob.rs | 37 +++++++++++++++++++---------------
>  1 file changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
> index 8715afef..2a528204 100644
> --- a/pbs-datastore/src/data_blob.rs
> +++ b/pbs-datastore/src/data_blob.rs
> @@ -136,39 +136,44 @@ impl DataBlob {
>  
>              DataBlob { raw_data }
>          } else {
> -            let max_data_len = data.len() + std::mem::size_of::<DataBlobHeader>();
> +            let header_len = std::mem::size_of::<DataBlobHeader>();
> +            let max_data_len = data.len() + header_len;
> +            let mut raw_data = vec![0; max_data_len];
>              if compress {
> -                let mut comp_data = Vec::with_capacity(max_data_len);
> -
>                  let head = DataBlobHeader {
>                      magic: COMPRESSED_BLOB_MAGIC_1_0,
>                      crc: [0; 4],
>                  };
>                  unsafe {
> -                    comp_data.write_le_value(head)?;
> +                    (&mut raw_data[0..header_len]).write_le_value(head)?;
>                  }
>  
> -                zstd::stream::copy_encode(data, &mut comp_data, 1)?;
> -
> -                if comp_data.len() < max_data_len {
> -                    let mut blob = DataBlob {
> -                        raw_data: comp_data,
> -                    };
> -                    blob.set_crc(blob.compute_crc());
> -                    return Ok(blob);
> +                match zstd::bulk::compress_to_buffer(data, &mut raw_data[header_len..], 1) {
> +                    Ok(size) if size <= data.len() => {
> +                        raw_data.truncate(header_len + size);
> +                        let mut blob = DataBlob { raw_data };
> +                        blob.set_crc(blob.compute_crc());
> +                        return Ok(blob);
> +                    }
> +                    // if size is bigger than the data, or any error is returned, continue with non
> +                    // compressed archive but log all errors beside buffer too small

this is mostly a 1:1 translation of the code into a comment. IMO that's not
_that_ useful unless the code is really complex, and it's one more thing to
remember to update when modifying the code; no hard feelings either way though.

> +                    Ok(_) => {}
> +                    Err(err) => {
> +                        if !err.to_string().contains("Destination buffer is too small") {
> +                            log::warn!("zstd compression error: {err}");
> +                        }
> +                    }
>                  }
>              }
>  
> -            let mut raw_data = Vec::with_capacity(max_data_len);
> -
>              let head = DataBlobHeader {
>                  magic: UNCOMPRESSED_BLOB_MAGIC_1_0,
>                  crc: [0; 4],
>              };
>              unsafe {
> -                raw_data.write_le_value(head)?;
> +                (&mut raw_data[0..header_len]).write_le_value(head)?;
>              }
> -            raw_data.extend_from_slice(data);
> +            (&mut raw_data[header_len..]).write_all(data)?;
>  
>              DataBlob { raw_data }
>          };


