From: Dominik Csapak <d.csapak@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput
Date: Thu, 8 Aug 2024 08:53:38 +0200 [thread overview]
Message-ID: <0a451e8e-fa0f-44ec-ae6e-f4dcb95a82ab@proxmox.com> (raw)
In-Reply-To: <2e54f16c-f9a3-488b-b22a-04bd45ffbd05@proxmox.com>
On 8/7/24 19:06, Thomas Lamprecht wrote:
> On 05/08/2024 11:24, Dominik Csapak wrote:
>> in my tests (against current master) it improved the throughput if
>> the source/target storage is fast enough (tmpfs -> tmpfs):
>>
>> Type master (MiB/s) with my patches (MiB/s)
>> .img file ~614 ~767
>> pxar one big file ~657 ~807
>> pxar small files ~576 ~627
>>
>> (these results are also in the relevant commit message)
>>
>> It would be great, if someone else can cross check my results here.
>> Note: the the pxar code being faster than the img code seems to stem
>> from better multithreading pipelining in that code or in tokio (pxar
>> codepath scales more directly with more cores than the .img codepath)
>>
>> changes from v2:
>> * use zstd_safe instead of zstd so we have access to the underlying
>> error code
>> * add test for the error code handling since that's not part of the
>> public zstd api, only an implementation detail (albeit one that's
>> not likely to change soon)
>> * seperated the tests for the decode(encode()) roundtrip so a failure
>> can more easily assigned to a specific codepath
>>
>> changes from v1:
>> * reorder patches so that the data blob writer removal is the first one
>> * add tests for DataBlob that we can decode what we encoded
>> (to see that my patches don't mess up the chunk generation)
>> * add new patch to cleanup the `encode` function a bit
>>
>> Dominik Csapak (5):
>> remove data blob writer
>> datastore: test DataBlob encode/decode roundtrip
>> datastore: data blob: add helper and test for checking zstd_safe error
>> code
>> datastore: data blob: increase compression throughput
>> datastore: DataBlob encode: simplify code
>>
>> Cargo.toml | 1 +
>> pbs-datastore/Cargo.toml | 1 +
>> pbs-datastore/src/data_blob.rs | 193 ++++++++++++++++-------
>> pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
>> pbs-datastore/src/lib.rs | 2 -
>> tests/blob_writer.rs | 105 -------------
>> 6 files changed, 144 insertions(+), 370 deletions(-)
>> delete mode 100644 pbs-datastore/src/data_blob_writer.rs
>> delete mode 100644 tests/blob_writer.rs
>>
>
> Applied, with some rewording of the commit message and some slight
> adaption to the test commit.
>
> Ps, it seems the zstd crate authors aren't so sure why they use the
> 32 KB buffer either, which FWICT is the underlying issue here:
>
> https://docs.rs/zstd/latest/src/zstd/stream/zio/writer.rs.html#41-42
>
> But it's a bit hard to follow, to me this looks less like a allocation
> pattern issue (on its own), but rather than a increased overhead due to
> processing in 32 KiB chunks, the extra copying itself naturally doesn't
> help, but that's not a bad allocation pattern but rather a single
> (FWICT) avoidable allocations for the small buffer, but as said, not
> 100% sure as the code is rather over-engineered... Anyhow, I tried to
> add these findings, including the uncertainty they have, in the commit
> message to have some better background.
>
> I know you could have done this in a v4, but it felt faster to just
> amend the changes, especially since I have a few days off and would
> have to recreate the mental context anyway.
Ah ok, thanks for investigating (I was not patient enough for that seemingly..)
also thanks for the amending of the commit messages :)
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
prev parent reply other threads:[~2024-08-08 6:53 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-05 9:24 [pbs-devel] " Dominik Csapak
2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 1/5] remove data blob writer Dominik Csapak
2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 2/5] datastore: test DataBlob encode/decode roundtrip Dominik Csapak
2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: data blob: add helper and test for checking zstd_safe error code Dominik Csapak
2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 4/5] datastore: data blob: increase compression throughput Dominik Csapak
2024-08-05 9:32 ` Dominik Csapak
2024-08-05 9:24 ` [pbs-devel] [PATCH proxmox-backup v3 5/5] datastore: DataBlob encode: simplify code Dominik Csapak
2024-08-07 17:06 ` [pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput Thomas Lamprecht
2024-08-08 6:53 ` Dominik Csapak [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0a451e8e-fa0f-44ec-ae6e-f4dcb95a82ab@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
--cc=t.lamprecht@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox