public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup 0/2] improve fixed size chunker performance
@ 2024-07-17 13:08 Fabian Grünbichler
  2024-07-17 13:08 ` [pbs-devel] [PATCH proxmox-backup 1/2] example: improve chunking speed example Fabian Grünbichler
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Fabian Grünbichler @ 2024-07-17 13:08 UTC
  To: pbs-devel

found while looking for obvious bottlenecks - the fixed size chunker
used a default 8k input buffer, which causes a lot of polling while
copying data from the input into the 4M chunk.
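
As a rough illustration of the polling overhead described above, the following sketch computes how many input reads are needed to fill one chunk, assuming each read yields at most one input buffer's worth of data. `reads_per_chunk` is a hypothetical helper, not part of proxmox-backup, and the sizes assume binary units (8k = 8 KiB, 4M = 4 MiB).

```rust
/// Sizes from the cover letter, assuming binary units.
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Minimum number of input reads needed to fill one chunk, assuming each
/// read returns at most `buffer_size` bytes (ceiling division).
/// Hypothetical helper for illustration only.
fn reads_per_chunk(chunk_size: usize, buffer_size: usize) -> usize {
    assert!(buffer_size > 0, "buffer size must be non-zero");
    (chunk_size + buffer_size - 1) / buffer_size
}
```

Under these assumptions, `reads_per_chunk(4 * MIB, 8 * KIB)` is 512, while `reads_per_chunk(4 * MIB, 4 * MIB)` is 1.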

Fabian Grünbichler (2):
  example: improve chunking speed example
  image backup: use 4M input buffer

 examples/test_chunk_speed2.rs     | 23 ++++++++++++++++++-----
 proxmox-backup-client/src/main.rs |  8 ++++++--
 2 files changed, 24 insertions(+), 7 deletions(-)

-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


* [pbs-devel] [PATCH proxmox-backup 1/2] example: improve chunking speed example
  2024-07-17 13:08 [pbs-devel] [PATCH proxmox-backup 0/2] improve fixed size chunker performance Fabian Grünbichler
@ 2024-07-17 13:08 ` Fabian Grünbichler
  2024-07-17 13:08 ` [pbs-devel] [RFC proxmox-backup 2/2] image backup: use 4M input buffer Fabian Grünbichler
  2024-07-22  8:03 ` [pbs-devel] applied: [PATCH proxmox-backup 0/2] improve fixed size chunker performance Thomas Lamprecht
  2 siblings, 0 replies; 5+ messages in thread
From: Fabian Grünbichler @ 2024-07-17 13:08 UTC
  To: pbs-devel

by dropping the print-per-chunk and making the input buffer size configurable
(8k is the default when using `new()`).

this allows benchmarking various input buffer sizes. the example uses basically
the same code as image-based backups in proxmox-backup-client, but only the
reading and chunking part. looking at the flame graphs, the smaller input
buffer sizes clearly show most of the time being spent polling instead of
reading+copying (or reading, scanning and copying).
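
The effect can be reproduced in miniature without tokio. The sketch below is a synchronous, std-only analogue of the example (not the actual `FixedChunkStream` implementation): it copies data from a reader into fixed size chunks through an input buffer of configurable size and counts the reads that returned data. `chunk_fixed` and `ChunkStats` are hypothetical names.

```rust
use std::io::{self, Read};

/// Counters collected during one pass over the input.
pub struct ChunkStats {
    pub chunks: usize, // chunks produced, including a final partial one
    pub reads: usize,  // read calls that returned data
    pub bytes: usize,  // total bytes consumed
}

/// Copy `input` into chunks of `chunk_size` bytes, reading through an
/// input buffer of `buffer_size` bytes. Illustrative sketch only.
pub fn chunk_fixed<R: Read>(
    mut input: R,
    buffer_size: usize,
    chunk_size: usize,
) -> io::Result<ChunkStats> {
    assert!(buffer_size > 0 && chunk_size > 0, "sizes must be non-zero");
    let mut buf = vec![0u8; buffer_size];
    let mut chunk: Vec<u8> = Vec::with_capacity(chunk_size);
    let mut stats = ChunkStats { chunks: 0, reads: 0, bytes: 0 };

    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        stats.reads += 1;
        stats.bytes += n;

        // Distribute the freshly read bytes over one or more chunks.
        let mut data = &buf[..n];
        while !data.is_empty() {
            let take = (chunk_size - chunk.len()).min(data.len());
            chunk.extend_from_slice(&data[..take]);
            data = &data[take..];
            if chunk.len() == chunk_size {
                stats.chunks += 1;
                chunk.clear();
            }
        }
    }

    // A trailing partial chunk still counts as a chunk.
    if !chunk.is_empty() {
        stats.chunks += 1;
    }
    Ok(stats)
}
```

For a 10,000 byte in-memory input and 4096 byte chunks, a 1000 byte buffer needs 10 reads while a 4096 byte buffer needs only 3; both produce the same 3 chunks.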

for a fixed chunk size stream with a 16G input file on tmpfs:

fixed 1M ran
    1.06 ± 0.17 times faster than fixed 4M
    1.22 ± 0.11 times faster than fixed 16M
    1.25 ± 0.09 times faster than fixed 512k
    1.31 ± 0.10 times faster than fixed 256k
    1.55 ± 0.13 times faster than fixed 128k
    1.92 ± 0.15 times faster than fixed 64k
    3.09 ± 0.31 times faster than fixed 32k
    4.76 ± 0.32 times faster than fixed 16k
    8.08 ± 0.59 times faster than fixed 8k

(from 15.275s down to 1.890s)

dynamic chunk stream, same input:

dynamic 4M ran
    1.01 ± 0.03 times faster than dynamic 1M
    1.03 ± 0.03 times faster than dynamic 16M
    1.06 ± 0.04 times faster than dynamic 512k
    1.07 ± 0.03 times faster than dynamic 128k
    1.12 ± 0.03 times faster than dynamic 64k
    1.15 ± 0.20 times faster than dynamic 256k
    1.23 ± 0.03 times faster than dynamic 32k
    1.47 ± 0.04 times faster than dynamic 16k
    1.92 ± 0.05 times faster than dynamic 8k

(from 26.5s down to 13.772s)

same input file on ext4 on LVM on CT2000P5PSSD8 (with caches dropped for each run):

fixed 4M ran
    1.06 ± 0.02 times faster than fixed 16M
    1.10 ± 0.01 times faster than fixed 1M
    1.12 ± 0.01 times faster than fixed 512k
    1.15 ± 0.02 times faster than fixed 128k
    1.17 ± 0.01 times faster than fixed 256k
    1.22 ± 0.02 times faster than fixed 64k
    1.55 ± 0.05 times faster than fixed 32k
    2.00 ± 0.07 times faster than fixed 16k
    3.01 ± 0.15 times faster than fixed 8k

(from 19.807s down to 6.574s)

dynamic 4M ran
    1.04 ± 0.02 times faster than dynamic 512k
    1.04 ± 0.02 times faster than dynamic 128k
    1.04 ± 0.02 times faster than dynamic 16M
    1.06 ± 0.02 times faster than dynamic 1M
    1.06 ± 0.02 times faster than dynamic 256k
    1.08 ± 0.02 times faster than dynamic 64k
    1.16 ± 0.02 times faster than dynamic 32k
    1.34 ± 0.03 times faster than dynamic 16k
    1.70 ± 0.04 times faster than dynamic 8k

(from 31.184s down to 18.378s)
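
For a sense of scale, the wall-clock times above can be converted to throughput. This sketch assumes the 16G input file is exactly 16 GiB and that each quoted time covers a single pass over it; `throughput_gib_per_s` is a hypothetical helper.

```rust
const GIB: u64 = 1024 * 1024 * 1024;

/// Throughput in GiB/s for `bytes` processed in `secs` seconds.
/// Hypothetical helper for illustration only.
fn throughput_gib_per_s(bytes: u64, secs: f64) -> f64 {
    assert!(secs > 0.0, "duration must be positive");
    bytes as f64 / GIB as f64 / secs
}
```

Under these assumptions, the fixed chunker on tmpfs goes from roughly 1.05 GiB/s (15.275s) to roughly 8.47 GiB/s (1.890s).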

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 examples/test_chunk_speed2.rs | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index f2963746a..5ce08ac17 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -1,9 +1,12 @@
+use std::str::FromStr;
+
 use anyhow::Error;
 use futures::*;
 
 extern crate proxmox_backup;
 
-use pbs_client::ChunkStream;
+use pbs_client::{ChunkStream, FixedChunkStream};
+use proxmox_human_byte::HumanByte;
 
 // Test Chunker with real data read from a file.
 //
@@ -21,9 +24,19 @@ fn main() {
 async fn run() -> Result<(), Error> {
     let file = tokio::fs::File::open("random-test.dat").await?;
 
-    let stream = tokio_util::codec::FramedRead::new(file, tokio_util::codec::BytesCodec::new())
-        .map_ok(|bytes| bytes.to_vec())
-        .map_err(Error::from);
+    let mut args = std::env::args();
+    args.next();
+
+    let buffer_size = args.next().unwrap_or("8k".to_string());
+    let buffer_size = HumanByte::from_str(&buffer_size)?;
+    println!("Using buffer size {buffer_size}");
+
+    let stream = tokio_util::codec::FramedRead::with_capacity(
+        file,
+        tokio_util::codec::BytesCodec::new(),
+        buffer_size.as_u64() as usize,
+    )
+    .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
     let mut chunk_stream = ChunkStream::new(stream, None, None, None);
@@ -40,7 +53,7 @@ async fn run() -> Result<(), Error> {
         repeat += 1;
         stream_len += chunk.len();
 
-        println!("Got chunk {}", chunk.len());
+        //println!("Got chunk {}", chunk.len());
     }
 
     let speed =
-- 
2.39.2




* [pbs-devel] [RFC proxmox-backup 2/2] image backup: use 4M input buffer
  2024-07-17 13:08 [pbs-devel] [PATCH proxmox-backup 0/2] improve fixed size chunker performance Fabian Grünbichler
  2024-07-17 13:08 ` [pbs-devel] [PATCH proxmox-backup 1/2] example: improve chunking speed example Fabian Grünbichler
@ 2024-07-17 13:08 ` Fabian Grünbichler
  2024-07-19  7:28   ` Dietmar Maurer
  2024-07-22  8:03 ` [pbs-devel] applied: [PATCH proxmox-backup 0/2] improve fixed size chunker performance Thomas Lamprecht
  2 siblings, 1 reply; 5+ messages in thread
From: Fabian Grünbichler @ 2024-07-17 13:08 UTC
  To: pbs-devel

with the default 8k input buffer size, the client will spend most of the time
polling instead of reading/chunking/uploading.

tested with 16G random data file from tmpfs to fresh datastore backed by tmpfs,
without encryption.

stock:

Time (mean ± σ):     36.064 s ±  0.655 s    [User: 21.079 s, System: 26.415 s]
  Range (min … max):   35.663 s … 36.819 s    3 runs

patched:

 Time (mean ± σ):     23.591 s ±  0.807 s    [User: 16.532 s, System: 18.629 s]
  Range (min … max):   22.663 s … 24.125 s    3 runs

Summary
  patched ran
    1.53 ± 0.06 times faster than stock
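
The `1.53 ± 0.06` in the summary is consistent with first-order error propagation for a ratio of two independent means. That the benchmark tool computes it this way is an assumption (the tool is not named in this thread); `ratio_with_uncertainty` is a hypothetical helper.

```rust
/// Ratio `a / b` of two means with standard deviations `sa` and `sb`,
/// plus its uncertainty from first-order error propagation:
/// sigma = (a / b) * sqrt((sa / a)^2 + (sb / b)^2).
/// Assumes the two measurements are independent.
fn ratio_with_uncertainty(a: f64, sa: f64, b: f64, sb: f64) -> (f64, f64) {
    let ratio = a / b;
    let sigma = ratio * ((sa / a).powi(2) + (sb / b).powi(2)).sqrt();
    (ratio, sigma)
}
```

`ratio_with_uncertainty(36.064, 0.655, 23.591, 0.807)` gives approximately (1.529, 0.059), which matches the reported values after rounding.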

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
obviously, this slightly increases memory usage..

the effect here is less pronounced than for the example because the actual
reading and chunking part is not all the client has to do for a backup - digest
calculation and TLS crypto make up the bulk of the rest..

 proxmox-backup-client/src/main.rs | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 6a7d09047..5edb2a824 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -286,8 +286,12 @@ async fn backup_image<P: AsRef<Path>>(
 
     let file = tokio::fs::File::open(path).await?;
 
-    let stream = tokio_util::codec::FramedRead::new(file, tokio_util::codec::BytesCodec::new())
-        .map_err(Error::from);
+    let stream = tokio_util::codec::FramedRead::with_capacity(
+        file,
+        tokio_util::codec::BytesCodec::new(),
+        4 * 1024 * 1024,
+    )
+    .map_err(Error::from);
 
     let stream = FixedChunkStream::new(stream, chunk_size.unwrap_or(4 * 1024 * 1024));
 
-- 
2.39.2




* Re: [pbs-devel] [RFC proxmox-backup 2/2] image backup: use 4M input buffer
  2024-07-17 13:08 ` [pbs-devel] [RFC proxmox-backup 2/2] image backup: use 4M input buffer Fabian Grünbichler
@ 2024-07-19  7:28   ` Dietmar Maurer
  0 siblings, 0 replies; 5+ messages in thread
From: Dietmar Maurer @ 2024-07-19  7:28 UTC
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

I get the following from git blame:

db0cb9ce0 src/bin/proxmox-backup-client.rs  (Wolfgang Bumiller    2019-12-12 15:27:07 +0100  289)     let stream = tokio_util::codec::FramedRead::new(file, tokio_util::codec::BytesCodec::new())


So we committed that code in 2019, while FramedRead::with_capacity was
only added later, in 2020:

https://github.com/tokio-rs/tokio/pull/2215

So I think that was the reason for using the small buffer.


> On 17.7.2024 15:08 CEST Fabian Grünbichler <f.gruenbichler@proxmox.com> wrote:
> 
>  
> with the default 8k input buffer size, the client will spend most of the time
> polling instead of reading/chunking/uploading.
> 
> tested with 16G random data file from tmpfs to fresh datastore backed by tmpfs,
> without encryption.
> 
> stock:
> 
> Time (mean ± σ):     36.064 s ±  0.655 s    [User: 21.079 s, System: 26.415 s]
>   Range (min … max):   35.663 s … 36.819 s    3 runs
> 
> patched:
> 
>  Time (mean ± σ):     23.591 s ±  0.807 s    [User: 16.532 s, System: 18.629 s]
>   Range (min … max):   22.663 s … 24.125 s    3 runs
> 
> Summary
>   patched ran
>     1.53 ± 0.06 times faster than stock
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> obviously, this slightly increases memory usage..
> 
> the effect here is less pronounced than for the example because the actual
> reading and chunking part is not all the client has to do for a backup - digest
> calculation and TLS crypto make up the bulk of the rest..
> 
>  proxmox-backup-client/src/main.rs | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
> index 6a7d09047..5edb2a824 100644
> --- a/proxmox-backup-client/src/main.rs
> +++ b/proxmox-backup-client/src/main.rs
> @@ -286,8 +286,12 @@ async fn backup_image<P: AsRef<Path>>(
>  
>      let file = tokio::fs::File::open(path).await?;
>  
> -    let stream = tokio_util::codec::FramedRead::new(file, tokio_util::codec::BytesCodec::new())
> -        .map_err(Error::from);
> +    let stream = tokio_util::codec::FramedRead::with_capacity(
> +        file,
> +        tokio_util::codec::BytesCodec::new(),
> +        4 * 1024 * 1024,
> +    )
> +    .map_err(Error::from);
>  
>      let stream = FixedChunkStream::new(stream, chunk_size.unwrap_or(4 * 1024 * 1024));
>  
> -- 
> 2.39.2
> 
> 
> 

* [pbs-devel] applied: [PATCH proxmox-backup 0/2] improve fixed size chunker performance
  2024-07-17 13:08 [pbs-devel] [PATCH proxmox-backup 0/2] improve fixed size chunker performance Fabian Grünbichler
  2024-07-17 13:08 ` [pbs-devel] [PATCH proxmox-backup 1/2] example: improve chunking speed example Fabian Grünbichler
  2024-07-17 13:08 ` [pbs-devel] [RFC proxmox-backup 2/2] image backup: use 4M input buffer Fabian Grünbichler
@ 2024-07-22  8:03 ` Thomas Lamprecht
  2 siblings, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2024-07-22  8:03 UTC
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 17/07/2024 at 15:08, Fabian Grünbichler wrote:
> found while looking for obvious bottlenecks - the fixed size chunker
> used a default 8k input buffer, which causes a lot of polling while
> copying data from the input into the 4M chunk.
> 
> Fabian Grünbichler (2):
>   example: improve chunking speed example
>   image backup: use 4M input buffer
> 
>  examples/test_chunk_speed2.rs     | 23 ++++++++++++++++++-----
>  proxmox-backup-client/src/main.rs |  8 ++++++--
>  2 files changed, 24 insertions(+), 7 deletions(-)
> 


for the record: these two patches got applied [0][1], thanks!

[0]: https://git.proxmox.com/?p=proxmox-backup.git;a=commitdiff;h=00ce0e38bd3bdb5f482c3b783ea3fa57fe8b8439
[1]: https://git.proxmox.com/?p=proxmox-backup.git;a=commitdiff;h=deb237a28883bba0584766129b01997ccd63c4fe


