From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>,
	Dominik Csapak <d.csapak@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput
Date: Wed, 31 Jul 2024 16:39:57 +0200
Message-ID: <fa70b3f1-e7d0-4cd2-ad07-65a0cfa0f892@proxmox.com>
In-Reply-To: <20240731093604.1315088-4-d.csapak@proxmox.com>

Am 31/07/2024 um 11:36 schrieb Dominik Csapak:
> by not using `zstd::stream::copy_encode`, because that has an allocation
> pattern that reduces throughput if the target/source storage and the
> network are faster than the chunk creation.

any before/after benchmark numbers would be really great to have in the
commit message of any such patch.

> 
> instead use `zstd::bulk::compress_to_buffer` which shouldn't do any big
> allocations, since we provide the target buffer.
> 
> To handle the case that the target buffer is too small, we now ignore
> all zstd error and continue with the uncompressed data, logging the error
> except if the target buffer is too small.

This is hard for me to read and might do better with some reasoning
added for why this is OK, even if it's clear to you, maybe something like:

In case of a compression error just return the uncompressed data,
there's nothing we can do and saving uncompressed data is better than
having none. Additionally, log any such error besides the one for the
target buffer being too small.
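
To make the intended behaviour concrete, here is a minimal, std-only sketch
of the "fall back to uncompressed data" pattern. Everything in it is
hypothetical and only for illustration: `CompressError`, the toy
`rle_compress` (a stand-in for zstd), the one-byte header and the magic
values are not the DataBlob format or the zstd API.

```rust
/// Hypothetical error type standing in for whatever the compressor returns.
#[derive(Debug)]
enum CompressError {
    DstTooSmall,
    Other(String),
}

/// Toy run-length encoder used as a stand-in for zstd (assumption for this
/// sketch): writes (run length, byte) pairs into `dst` and returns the
/// number of bytes written.
fn rle_compress(src: &[u8], dst: &mut [u8]) -> Result<usize, CompressError> {
    let (mut i, mut out) = (0, 0);
    while i < src.len() {
        let byte = src[i];
        let mut run = 1;
        while i + run < src.len() && src[i + run] == byte && run < 255 {
            run += 1;
        }
        if out + 2 > dst.len() {
            return Err(CompressError::DstTooSmall);
        }
        dst[out] = run as u8;
        dst[out + 1] = byte;
        out += 2;
        i += run;
    }
    Ok(out)
}

const HEADER_LEN: usize = 1; // hypothetical one-byte header
const MAGIC_COMPRESSED: u8 = b'C';
const MAGIC_UNCOMPRESSED: u8 = b'U';

/// Try to compress into a buffer that is exactly as large as the input.
/// On any error, or if nothing was gained, store the data uncompressed.
fn encode(
    data: &[u8],
    compress: impl Fn(&[u8], &mut [u8]) -> Result<usize, CompressError>,
) -> Vec<u8> {
    let mut raw = vec![0u8; HEADER_LEN + data.len()];
    match compress(data, &mut raw[HEADER_LEN..]) {
        Ok(size) if size <= data.len() => {
            raw[0] = MAGIC_COMPRESSED;
            raw.truncate(HEADER_LEN + size);
            return raw;
        }
        // Expected for incompressible data, so not worth a log line.
        Ok(_) | Err(CompressError::DstTooSmall) => {}
        // Anything else is unexpected: log it, but still keep the data.
        Err(err) => eprintln!("compression error: {err:?}"),
    }
    // Reuse the same buffer; any partial compressor output is overwritten.
    raw[0] = MAGIC_UNCOMPRESSED;
    raw[HEADER_LEN..].copy_from_slice(data);
    raw
}
```

The point of the sketch is only the control flow: a single buffer is
allocated up front, a too-small target is treated as "not compressible",
and every other error is logged without losing the data.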


> For now, we have to parse the error string for that, as `zstd` maps all
> errors as `io::ErrorKind::Other`. Until that gets changed, there is no
> other way to differentiate between different kind of errors.

FWIW, you could also use the lower-level zstd_safe's compress2 [0] here,
compress_to_buffer is just a thin wrapper around that [1] anyway. Then you
could match the return value to see if it equals `70`, i.e., the value of
the ZSTD_error_dstSize_tooSmall [2] from the ZSTD_ErrorCode enum.

I mean, naturally it would be much better if upstream provided a saner
interface or at least a binding for the enum, but IME such error codes
are quite stable when defined in an enum like this, at least more stable
than strings, so this might be a slightly better workaround.

[0]: https://docs.rs/zstd-safe/latest/zstd_safe/struct.CCtx.html#method.compress2
[1]: https://docs.rs/zstd/latest/src/zstd/bulk/compressor.rs.html#117-125
[2]: https://github.com/facebook/zstd/blob/fdfb2aff39dc498372d8c9e5f2330b692fea9794/lib/zstd_errors.h#L88
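
A std-only sketch of what such a numeric check could look like. One caveat
to verify against the zstd-safe version in use: the C API reports errors as
the negated ZSTD_ErrorCode value cast to size_t, so if the binding passes
that raw value through in its error variant (an assumption here, not
checked), it has to be negated before comparing it with 70. The helper
names and the value 120 for ZSTD_error_maxCode are assumptions of this
sketch, too.

```rust
/// Value of ZSTD_error_dstSize_tooSmall in the ZSTD_ErrorCode enum.
const ZSTD_ERROR_DST_SIZE_TOO_SMALL: usize = 70;
/// Assumed value of ZSTD_error_maxCode; check zstd_errors.h before relying on it.
const ZSTD_ERROR_MAX_CODE: usize = 120;

/// Map a raw zstd `size_t` return value to its ZSTD_ErrorCode, or `None`
/// if the value is a plain size and not an error (hypothetical helper).
fn zstd_error_code(raw: usize) -> Option<usize> {
    // Errors are encoded as (size_t)-code, so they sit at the very top of
    // the usize range, above (size_t)-maxCode.
    if raw > 0usize.wrapping_sub(ZSTD_ERROR_MAX_CODE) {
        Some(0usize.wrapping_sub(raw))
    } else {
        None
    }
}

/// True if the raw return value means "destination buffer is too small".
fn is_dst_too_small(raw: usize) -> bool {
    zstd_error_code(raw) == Some(ZSTD_ERROR_DST_SIZE_TOO_SMALL)
}
```

With something like this, the `Err` arm could call `is_dst_too_small(code)`
instead of searching the error string, and log only when it returns false.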

besides that and a small nit below: looks OK to me

> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * fixed commit message
> * reduced log severity to `warn`
> * use vec![0; size]
> * omit unnecessary buffer allocation in the unencrypted,uncompressed case
>   by reusing the initial buffer that was tried for compression
>  pbs-datastore/src/data_blob.rs | 37 +++++++++++++++++++---------------
>  1 file changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
> index 8715afef..2a528204 100644
> --- a/pbs-datastore/src/data_blob.rs
> +++ b/pbs-datastore/src/data_blob.rs
> @@ -136,39 +136,44 @@ impl DataBlob {
>  
>              DataBlob { raw_data }
>          } else {
> -            let max_data_len = data.len() + std::mem::size_of::<DataBlobHeader>();
> +            let header_len = std::mem::size_of::<DataBlobHeader>();
> +            let max_data_len = data.len() + header_len;
> +            let mut raw_data = vec![0; max_data_len];
>              if compress {
> -                let mut comp_data = Vec::with_capacity(max_data_len);
> -
>                  let head = DataBlobHeader {
>                      magic: COMPRESSED_BLOB_MAGIC_1_0,
>                      crc: [0; 4],
>                  };
>                  unsafe {
> -                    comp_data.write_le_value(head)?;
> +                    (&mut raw_data[0..header_len]).write_le_value(head)?;
>                  }
>  
> -                zstd::stream::copy_encode(data, &mut comp_data, 1)?;
> -
> -                if comp_data.len() < max_data_len {
> -                    let mut blob = DataBlob {
> -                        raw_data: comp_data,
> -                    };
> -                    blob.set_crc(blob.compute_crc());
> -                    return Ok(blob);
> +                match zstd::bulk::compress_to_buffer(data, &mut raw_data[header_len..], 1) {
> +                    Ok(size) if size <= data.len() => {
> +                        raw_data.truncate(header_len + size);
> +                        let mut blob = DataBlob { raw_data };
> +                        blob.set_crc(blob.compute_crc());
> +                        return Ok(blob);
> +                    }
> +                    // if size is bigger than the data, or any error is returned, continue with non
> +                    // compressed archive but log all errors beside buffer too small

this is mostly a 1:1 translation of the code into a comment. IMO that's
not _that_ useful unless the code is really complex, and it's one more
thing to remember to update when modifying the code; no hard feelings
here, though.

> +                    Ok(_) => {}
> +                    Err(err) => {
> +                        if !err.to_string().contains("Destination buffer is too small") {
> +                            log::warn!("zstd compression error: {err}");
> +                        }
> +                    }
>                  }
>              }
>  
> -            let mut raw_data = Vec::with_capacity(max_data_len);
> -
>              let head = DataBlobHeader {
>                  magic: UNCOMPRESSED_BLOB_MAGIC_1_0,
>                  crc: [0; 4],
>              };
>              unsafe {
> -                raw_data.write_le_value(head)?;
> +                (&mut raw_data[0..header_len]).write_le_value(head)?;
>              }
> -            raw_data.extend_from_slice(data);
> +            (&mut raw_data[header_len..]).write_all(data)?;
>  
>              DataBlob { raw_data }
>          };




Thread overview: 14+ messages
2024-07-31  9:36 [pbs-devel] [PATCH proxmox-backup v2 0/4] improve " Dominik Csapak
2024-07-31  9:36 ` [pbs-devel] [PATCH proxmox-backup v2 1/4] remove data blob writer Dominik Csapak
2024-07-31  9:36 ` [pbs-devel] [PATCH proxmox-backup v2 2/4] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
2024-07-31  9:47   ` Lukas Wagner
2024-07-31  9:50     ` Dominik Csapak
2024-07-31  9:36 ` [pbs-devel] [PATCH proxmox-backup v2 3/4] datastore: data blob: increase compression throughput Dominik Csapak
2024-07-31 14:39   ` Thomas Lamprecht [this message]
2024-08-01  6:55     ` Dominik Csapak
2024-08-02 10:47     ` Dominik Csapak
2024-08-02 11:59       ` Thomas Lamprecht
2024-08-02 12:38         ` Dominik Csapak
2024-08-07 15:01           ` Thomas Lamprecht
2024-07-31  9:36 ` [pbs-devel] [PATCH proxmox-backup v2 4/4] datastore: DataBlob encode: simplify code Dominik Csapak
2024-08-05  9:33 ` [pbs-devel] [PATCH proxmox-backup v2 0/4] improve compression throughput Dominik Csapak
