public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>,
	Dominik Csapak <d.csapak@proxmox.com>
Subject: [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput
Date: Wed, 7 Aug 2024 19:06:49 +0200	[thread overview]
Message-ID: <2e54f16c-f9a3-488b-b22a-04bd45ffbd05@proxmox.com> (raw)
In-Reply-To: <20240805092414.1178930-1-d.csapak@proxmox.com>

On 05/08/2024 11:24, Dominik Csapak wrote:
> in my tests (against current master) it improved the throughput if
> the source/target storage is fast enough (tmpfs -> tmpfs):
> 
> Type                master (MiB/s)   with my patches (MiB/s)
> .img file           ~614             ~767
> pxar one big file   ~657             ~807
> pxar small files    ~576             ~627
> 
> (these results are also in the relevant commit message)
> 
> It would be great, if someone else can cross check my results here.
> Note: the the pxar code being faster than the img code seems to stem
> from better multithreading pipelining in that code or in tokio (pxar
> codepath scales more directly with more cores than the .img codepath)
> 
> changes from v2:
> * use zstd_safe instead of zstd so we have access to the underlying
>   error code
> * add test for the error code handling since that's not part of the
>   public zstd api, only an implementation detail (albeit one that's
>   not likely to change soon)
> * seperated the tests for the decode(encode()) roundtrip so a failure
>   can more easily assigned to a specific codepath
> 
> changes from v1:
> * reorder patches so that the data blob writer removal is the first one
> * add tests for DataBlob that we can decode what we encoded
>   (to see that my patches don't mess up the chunk generation)
> * add new patch to cleanup the `encode` function a bit
> 
> Dominik Csapak (5):
>   remove data blob writer
>   datastore: test DataBlob encode/decode roundtrip
>   datastore: data blob: add helper and test for checking zstd_safe error
>     code
>   datastore: data blob: increase compression throughput
>   datastore: DataBlob encode: simplify code
> 
>  Cargo.toml                            |   1 +
>  pbs-datastore/Cargo.toml              |   1 +
>  pbs-datastore/src/data_blob.rs        | 193 ++++++++++++++++-------
>  pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
>  pbs-datastore/src/lib.rs              |   2 -
>  tests/blob_writer.rs                  | 105 -------------
>  6 files changed, 144 insertions(+), 370 deletions(-)
>  delete mode 100644 pbs-datastore/src/data_blob_writer.rs
>  delete mode 100644 tests/blob_writer.rs
> 

Applied, with some rewording of the commit message and some slight
adaption to the test commit.

Ps, it seems the zstd crate authors aren't so sure why they use the
32 KB buffer either, which FWICT is the underlying issue here:

https://docs.rs/zstd/latest/src/zstd/stream/zio/writer.rs.html#41-42

But it's a bit hard to follow, to me this looks less like a allocation
pattern issue (on its own), but rather than a increased overhead due to
processing in 32 KiB chunks, the extra copying itself naturally doesn't
help, but that's not a bad allocation pattern but rather a single
(FWICT) avoidable allocations for the small buffer, but as said, not
100% sure as the code is rather over-engineered... Anyhow, I tried to
add these findings, including the uncertainty they have, in the commit
message to have some better background.

I know you could have done this in a v4, but it felt faster to just
amend the changes, especially since I have a few days off and would
have to recreate the mental context anyway.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


  parent reply	other threads:[~2024-08-07 17:06 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-05  9:24 [pbs-devel] " Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
2024-08-05  9:32   ` Dominik Csapak
2024-08-05  9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak
2024-08-07 17:06 ` Thomas Lamprecht [this message]
2024-08-08  6:53   ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Dominik Csapak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2e54f16c-f9a3-488b-b22a-04bd45ffbd05@proxmox.com \
    --to=t.lamprecht@proxmox.com \
    --cc=d.csapak@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal