* [pbs-devel] [PATCH proxmox-backup 1/2] example: improve chunking speed example
2024-07-17 13:08 [pbs-devel] [PATCH proxmox-backup 0/2] improve fixed size chunker performance Fabian Grünbichler
@ 2024-07-17 13:08 ` Fabian Grünbichler
2024-07-17 13:08 ` [pbs-devel] [RFC proxmox-backup 2/2] image backup: use 4M input buffer Fabian Grünbichler
2024-07-22 8:03 ` [pbs-devel] applied: [PATCH proxmox-backup 0/2] improve fixed size chunker performance Thomas Lamprecht
From: Fabian Grünbichler @ 2024-07-17 13:08 UTC
To: pbs-devel
by dropping the per-chunk print and making the input buffer size configurable
via the first command line argument (defaulting to 8k, which is also what
`FramedRead::new()` uses).

this allows benchmarking various input buffer sizes. the example uses basically
the same code as image-based backups in proxmox-backup-client, but covers just
the reading and chunking part. the flame graphs for the smaller input buffer
sizes clearly show most of the time being spent polling, instead of reading and
copying (fixed chunking) or reading, scanning and copying (dynamic chunking).
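
the fixed chunker mainly copies incoming data into chunk-sized buffers, while
the dynamic one additionally scans the data for chunk boundaries. below is a
simplified, synchronous sketch of the fixed-size re-buffering step; it is not
the actual async `FixedChunkStream` from pbs_client, and `fixed_chunks` is a
hypothetical name. it shows that every input byte is copied exactly once into
the current chunk no matter how the input was split into frames, so small input
frames mostly add per-frame overhead rather than extra copying:

```rust
/// Re-buffer arbitrarily sized input frames into chunks of exactly
/// `chunk_size` bytes; only the last chunk may be shorter.
/// Hypothetical, simplified synchronous sketch, not pbs_client::FixedChunkStream.
fn fixed_chunks<I>(frames: I, chunk_size: usize) -> Vec<Vec<u8>>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    assert!(chunk_size > 0, "chunk size must be non-zero");
    let mut chunks = Vec::new();
    let mut current: Vec<u8> = Vec::with_capacity(chunk_size);

    for frame in frames {
        let mut data = frame.as_slice();
        while !data.is_empty() {
            // copy as much as still fits into the current chunk
            let take = (chunk_size - current.len()).min(data.len());
            current.extend_from_slice(&data[..take]);
            data = &data[take..];
            if current.len() == chunk_size {
                // chunk is full: emit it and start a fresh buffer
                chunks.push(std::mem::replace(
                    &mut current,
                    Vec::with_capacity(chunk_size),
                ));
            }
        }
    }
    if !current.is_empty() {
        chunks.push(current); // trailing partial chunk
    }
    chunks
}

fn main() {
    // 10 KiB of input delivered as 3 KiB frames (plus a 1 KiB tail),
    // re-buffered into 4 KiB chunks
    let input: Vec<u8> = (0..10 * 1024).map(|i| (i % 251) as u8).collect();
    let frames: Vec<Vec<u8>> = input.chunks(3 * 1024).map(|f| f.to_vec()).collect();
    let chunks = fixed_chunks(frames, 4 * 1024);
    let sizes: Vec<usize> = chunks.iter().map(|c| c.len()).collect();
    // prints "chunk sizes: [4096, 4096, 2048]"
    println!("chunk sizes: {:?}", sizes);
}
```
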
for a fixed chunk size stream with a 16G input file on tmpfs:
  fixed 1M ran
    1.06 ± 0.17 times faster than fixed 4M
    1.22 ± 0.11 times faster than fixed 16M
    1.25 ± 0.09 times faster than fixed 512k
    1.31 ± 0.10 times faster than fixed 256k
    1.55 ± 0.13 times faster than fixed 128k
    1.92 ± 0.15 times faster than fixed 64k
    3.09 ± 0.31 times faster than fixed 32k
    4.76 ± 0.32 times faster than fixed 16k
    8.08 ± 0.59 times faster than fixed 8k
(mean runtime from 15.275s with 8k down to 1.890s with 1M)

dynamic chunk stream, same input:
  dynamic 4M ran
    1.01 ± 0.03 times faster than dynamic 1M
    1.03 ± 0.03 times faster than dynamic 16M
    1.06 ± 0.04 times faster than dynamic 512k
    1.07 ± 0.03 times faster than dynamic 128k
    1.12 ± 0.03 times faster than dynamic 64k
    1.15 ± 0.20 times faster than dynamic 256k
    1.23 ± 0.03 times faster than dynamic 32k
    1.47 ± 0.04 times faster than dynamic 16k
    1.92 ± 0.05 times faster than dynamic 8k
(mean runtime from 26.5s with 8k down to 13.772s with 4M)

same input file on ext4 on LVM on CT2000P5PSSD8 (with caches dropped for each run):
  fixed 4M ran
    1.06 ± 0.02 times faster than fixed 16M
    1.10 ± 0.01 times faster than fixed 1M
    1.12 ± 0.01 times faster than fixed 512k
    1.15 ± 0.02 times faster than fixed 128k
    1.17 ± 0.01 times faster than fixed 256k
    1.22 ± 0.02 times faster than fixed 64k
    1.55 ± 0.05 times faster than fixed 32k
    2.00 ± 0.07 times faster than fixed 16k
    3.01 ± 0.15 times faster than fixed 8k
(mean runtime from 19.807s with 8k down to 6.574s with 4M)

dynamic chunk stream:
  dynamic 4M ran
    1.04 ± 0.02 times faster than dynamic 512k
    1.04 ± 0.02 times faster than dynamic 128k
    1.04 ± 0.02 times faster than dynamic 16M
    1.06 ± 0.02 times faster than dynamic 1M
    1.06 ± 0.02 times faster than dynamic 256k
    1.08 ± 0.02 times faster than dynamic 64k
    1.16 ± 0.02 times faster than dynamic 32k
    1.34 ± 0.03 times faster than dynamic 16k
    1.70 ± 0.04 times faster than dynamic 8k
(mean runtime from 31.184s with 8k down to 18.378s with 4M)

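the "from X down to Y" figures are the mean runtimes of the slowest (8k) and the
fastest variant, so the largest speedup of each run can be cross-checked by
dividing them. a small sketch doing that for all four runs and converting the
times into throughput; `speedup` and `throughput_mib_s` are hypothetical
helpers, and reading "16G" as 16 GiB is an assumption, since the message does
not say:

```rust
/// Ratio between a slower and a faster mean runtime.
fn speedup(slow_s: f64, fast_s: f64) -> f64 {
    slow_s / fast_s
}

/// Throughput in MiB/s for `input_mib` MiB processed in `secs` seconds.
fn throughput_mib_s(input_mib: f64, secs: f64) -> f64 {
    input_mib / secs
}

fn main() {
    // assumption: "16G" = 16 GiB = 16384 MiB
    let input_mib = 16.0 * 1024.0;
    // (label, 8k mean in s, fastest mean in s, reported ratio), from the runs above
    let runs = [
        ("tmpfs fixed", 15.275, 1.890, 8.08),
        ("tmpfs dynamic", 26.5, 13.772, 1.92),
        ("ext4 fixed", 19.807, 6.574, 3.01),
        ("ext4 dynamic", 31.184, 18.378, 1.70),
    ];
    for &(label, slow, fast, reported) in runs.iter() {
        println!(
            "{label}: {:.2}x (reported {reported:.2}x), {:.0} -> {:.0} MiB/s",
            speedup(slow, fast),
            throughput_mib_s(input_mib, slow),
            throughput_mib_s(input_mib, fast),
        );
    }
}
```
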
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
examples/test_chunk_speed2.rs | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index f2963746a..5ce08ac17 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -1,9 +1,12 @@
+use std::str::FromStr;
+
 use anyhow::Error;
 use futures::*;
 
 extern crate proxmox_backup;
 
-use pbs_client::ChunkStream;
+use pbs_client::{ChunkStream, FixedChunkStream};
+use proxmox_human_byte::HumanByte;
 
 // Test Chunker with real data read from a file.
 //
@@ -21,9 +24,19 @@ fn main() {
 async fn run() -> Result<(), Error> {
     let file = tokio::fs::File::open("random-test.dat").await?;
 
-    let stream = tokio_util::codec::FramedRead::new(file, tokio_util::codec::BytesCodec::new())
-        .map_ok(|bytes| bytes.to_vec())
-        .map_err(Error::from);
+    let mut args = std::env::args();
+    args.next();
+
+    let buffer_size = args.next().unwrap_or("8k".to_string());
+    let buffer_size = HumanByte::from_str(&buffer_size)?;
+    println!("Using buffer size {buffer_size}");
+
+    let stream = tokio_util::codec::FramedRead::with_capacity(
+        file,
+        tokio_util::codec::BytesCodec::new(),
+        buffer_size.as_u64() as usize,
+    )
+    .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
     let mut chunk_stream = ChunkStream::new(stream, None, None, None);
@@ -40,7 +53,7 @@ async fn run() -> Result<(), Error> {
         repeat += 1;
         stream_len += chunk.len();
 
-        println!("Got chunk {}", chunk.len());
+        //println!("Got chunk {}", chunk.len());
     }
 
     let speed =
--
2.39.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [RFC proxmox-backup 2/2] image backup: use 4M input buffer
2024-07-17 13:08 [pbs-devel] [PATCH proxmox-backup 0/2] improve fixed size chunker performance Fabian Grünbichler
2024-07-17 13:08 ` [pbs-devel] [PATCH proxmox-backup 1/2] example: improve chunking speed example Fabian Grünbichler
@ 2024-07-17 13:08 ` Fabian Grünbichler
2024-07-19 7:28 ` Dietmar Maurer
2024-07-22 8:03 ` [pbs-devel] applied: [PATCH proxmox-backup 0/2] improve fixed size chunker performance Thomas Lamprecht
From: Fabian Grünbichler @ 2024-07-17 13:08 UTC
To: pbs-devel
with the default 8k input buffer size, the client will spend most of the time
polling instead of reading/chunking/uploading.
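
a rough std-only illustration of why the buffer size matters this much: a single
read can return at most one buffer's worth of data, so the number of reads (and
the polls around them) scales with the input size divided by the buffer size.
`count_reads` and `reads_needed` are hypothetical helpers, and the in-memory
`Cursor` always fills the whole buffer, which real file reads need not do.
assuming "16G" means 16 GiB, that is 2,097,152 reads with an 8k buffer versus
4096 with a 4M one:

```rust
use std::io::{Cursor, Read};

/// Drain `reader` with a buffer of `buf_size` bytes and return
/// (number of non-empty reads, total bytes read). Hypothetical helper.
fn count_reads<R: Read>(mut reader: R, buf_size: usize) -> std::io::Result<(u64, u64)> {
    let mut buf = vec![0u8; buf_size];
    let (mut reads, mut total) = (0u64, 0u64);
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => n,
            // retry reads interrupted by a signal
            Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        reads += 1;
        total += n as u64;
    }
    Ok((reads, total))
}

/// Minimum number of reads for `input_len` bytes if every read fills the buffer.
fn reads_needed(input_len: u64, buf_size: u64) -> u64 {
    (input_len + buf_size - 1) / buf_size
}

fn main() -> std::io::Result<()> {
    // 1 MiB of in-memory input
    let data = vec![0u8; 1024 * 1024];
    for &size in &[8 * 1024, 64 * 1024, 1024 * 1024] {
        let (reads, total) = count_reads(Cursor::new(data.as_slice()), size)?;
        println!("{:>8} byte buffer: {} reads for {} bytes", size, reads, total);
    }

    // the same arithmetic for a 16 GiB input (assumed meaning of "16G")
    let input: u64 = 16 * 1024 * 1024 * 1024;
    println!(
        "8k: {} reads, 4M: {} reads",
        reads_needed(input, 8 * 1024),
        reads_needed(input, 4 * 1024 * 1024)
    );
    Ok(())
}
```
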

tested with a 16G random data file on tmpfs, backed up to a fresh datastore that
is also backed by tmpfs, without encryption.

stock:
  Time (mean ± σ):     36.064 s ±  0.655 s    [User: 21.079 s, System: 26.415 s]
  Range (min … max):   35.663 s … 36.819 s    3 runs

patched:
  Time (mean ± σ):     23.591 s ±  0.807 s    [User: 16.532 s, System: 18.629 s]
  Range (min … max):   22.663 s … 24.125 s    3 runs

Summary
  patched ran
    1.53 ± 0.06 times faster than stock
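
the output looks like hyperfine's; assuming it derives the ± of the summary
ratio by propagating the relative standard deviations of both means in
quadrature, i.e. sigma_r = r * sqrt((sigma_stock/mean_stock)^2 +
(sigma_patched/mean_patched)^2), the reported 1.53 ± 0.06 follows directly from
the two measurements above. `ratio_with_error` is a hypothetical helper:

```rust
/// Ratio of two mean runtimes and its standard deviation, propagating the
/// relative standard deviations of both means in quadrature.
fn ratio_with_error(slow_mean: f64, slow_sd: f64, fast_mean: f64, fast_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    let rel = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel)
}

fn main() {
    // mean and σ in seconds for stock and patched, taken from the runs above
    let (ratio, sd) = ratio_with_error(36.064, 0.655, 23.591, 0.807);
    // prints "patched ran 1.53 ± 0.06 times faster than stock"
    println!("patched ran {ratio:.2} ± {sd:.2} times faster than stock");
}
```
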
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
obviously, this slightly increases memory usage, since the read buffer grows
from 8k to 4M.

the effect here is less pronounced than for the example, because reading and
chunking are not all the client has to do for a backup: digest calculation and
TLS crypto make up the bulk of the rest.
proxmox-backup-client/src/main.rs | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 6a7d09047..5edb2a824 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -286,8 +286,12 @@ async fn backup_image<P: AsRef<Path>>(
 
     let file = tokio::fs::File::open(path).await?;
 
-    let stream = tokio_util::codec::FramedRead::new(file, tokio_util::codec::BytesCodec::new())
-        .map_err(Error::from);
+    let stream = tokio_util::codec::FramedRead::with_capacity(
+        file,
+        tokio_util::codec::BytesCodec::new(),
+        4 * 1024 * 1024,
+    )
+    .map_err(Error::from);
 
     let stream = FixedChunkStream::new(stream, chunk_size.unwrap_or(4 * 1024 * 1024));
 
--
2.39.2