From: Dominik Csapak <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [RFC PATCH v2 proxmox-backup-qemu] restore: make chunk loading more parallel
Date: Fri, 11 Jul 2025 10:21:45 +0200
Message-ID: <82e82fdf-5010-4a70-af5d-35c0a0826c0c@proxmox.com>
In-Reply-To: <650a97d5-d8e8-4afb-8450-83254f398bb2@proxmox.com>
On 7/10/25 14:48, Dominik Csapak wrote:
[snip]
>
> Just for the record, I also benchmarked a slower system here:
> 6x 16 TiB spinners in RAID-10 with NVMe special devices,
> over a 2.5 Gbit link:
>
> current approach: ~61 MiB/s restore speed
> with my patch: ~160 MiB/s restore speed, with not much increase
> in CPU time (both were under 30% of a single core)
>
> Also did perf stat for those to compare how much overhead the additional futures/async/await
> brings:
>
>
> first restore:
>
> 62,871.24 msec task-clock # 0.115 CPUs utilized
> 878,151 context-switches # 13.967 K/sec
> 28,205 cpu-migrations # 448.615 /sec
> 519,396 page-faults # 8.261 K/sec
> 277,239,999,474 cpu_core/cycles/ # 4.410 G/sec (89.20%)
> 190,782,860,504 cpu_atom/cycles/ # 3.035 G/sec (10.80%)
> 482,534,267,606 cpu_core/instructions/ # 7.675 G/sec (89.20%)
> 188,659,352,613 cpu_atom/instructions/ # 3.001 G/sec (10.80%)
> 46,913,925,346 cpu_core/branches/ # 746.191 M/sec (89.20%)
> 19,251,496,445 cpu_atom/branches/ # 306.205 M/sec (10.80%)
> 904,032,529 cpu_core/branch-misses/ # 14.379 M/sec (89.20%)
> 621,228,739 cpu_atom/branch-misses/ # 9.881 M/sec (10.80%)
> 1,633,142,624,469 cpu_core/slots/ # 25.976 G/sec (89.20%)
> 489,311,603,992 cpu_core/topdown-retiring/ # 29.7% Retiring (89.20%)
> 97,617,585,755 cpu_core/topdown-bad-spec/ # 5.9% Bad Speculation (89.20%)
> 317,074,236,582 cpu_core/topdown-fe-bound/ # 19.2% Frontend Bound (89.20%)
> 745,485,954,022 cpu_core/topdown-be-bound/ # 45.2% Backend Bound (89.20%)
> 57,463,995,650 cpu_core/topdown-heavy-ops/ # 3.5% Heavy Operations # 26.2% Light Operations (89.20%)
> 88,333,173,745 cpu_core/topdown-br-mispredict/ # 5.4% Branch Mispredict # 0.6% Machine Clears (89.20%)
> 217,424,427,912 cpu_core/topdown-fetch-lat/ # 13.2% Fetch Latency # 6.0% Fetch Bandwidth (89.20%)
> 354,250,103,398 cpu_core/topdown-mem-bound/ # 21.5% Memory Bound # 23.7% Core Bound (89.20%)
>
>
> 548.195368256 seconds time elapsed
>
>
> 44.493218000 seconds user
> 21.315124000 seconds sys
>
> second restore:
>
> 67,908.11 msec task-clock # 0.297 CPUs utilized
> 856,402 context-switches # 12.611 K/sec
> 46,539 cpu-migrations # 685.323 /sec
> 942,002 page-faults # 13.872 K/sec
> 300,757,558,837 cpu_core/cycles/ # 4.429 G/sec (75.93%)
> 234,595,451,063 cpu_atom/cycles/ # 3.455 G/sec (24.07%)
> 511,747,593,432 cpu_core/instructions/ # 7.536 G/sec (75.93%)
> 289,348,171,298 cpu_atom/instructions/ # 4.261 G/sec (24.07%)
> 49,993,266,992 cpu_core/branches/ # 736.190 M/sec (75.93%)
> 29,624,743,216 cpu_atom/branches/ # 436.248 M/sec (24.07%)
> 911,770,988 cpu_core/branch-misses/ # 13.427 M/sec (75.93%)
> 811,321,806 cpu_atom/branch-misses/ # 11.947 M/sec (24.07%)
> 1,788,660,631,633 cpu_core/slots/ # 26.339 G/sec (75.93%)
> 569,029,214,725 cpu_core/topdown-retiring/ # 31.4% Retiring (75.93%)
> 125,815,987,213 cpu_core/topdown-bad-spec/ # 6.9% Bad Speculation (75.93%)
> 234,249,755,030 cpu_core/topdown-fe-bound/ # 12.9% Frontend Bound (75.93%)
> 885,539,445,254 cpu_core/topdown-be-bound/ # 48.8% Backend Bound (75.93%)
> 86,825,030,719 cpu_core/topdown-heavy-ops/ # 4.8% Heavy Operations # 26.6% Light Operations (75.93%)
> 116,566,866,551 cpu_core/topdown-br-mispredict/ # 6.4% Branch Mispredict # 0.5% Machine Clears (75.93%)
> 135,276,276,904 cpu_core/topdown-fetch-lat/ # 7.5% Fetch Latency # 5.5% Fetch Bandwidth (75.93%)
> 409,898,741,185 cpu_core/topdown-mem-bound/ # 22.6% Memory Bound # 26.2% Core Bound (75.93%)
>
>
> 228.528573197 seconds time elapsed
>
>
> 48.379229000 seconds user
> 21.779166000 seconds sys
>
>
> So the overhead for the additional futures was ~8% in cycles and ~6% in instructions,
> which does not seem too bad.
>
Addendum:
The tests above sadly ran into a network limit of ~600 MBit/s (I am still
trying to figure out where the bottleneck in the network is...), so I
tested again from a different machine that has a 10G link to the PBS mentioned above.
This time I restored to QEMU's 'null-co' driver, since the target storage was too slow.
Anyway, the results are:
current code: restore ~75 MiB/s
16-way parallel: ~528 MiB/s (7x!)
CPU usage went up from <50% of one core to ~350% (as in my initial tests with a different setup)
perf stat output below:
current:
183,534.85 msec task-clock # 0.409 CPUs utilized
117,267 context-switches # 638.936 /sec
700 cpu-migrations # 3.814 /sec
462,432 page-faults # 2.520 K/sec
468,609,612,840 cycles # 2.553 GHz
1,286,188,699,253 instructions # 2.74 insn per cycle
41,342,312,275 branches # 225.256 M/sec
846,432,249 branch-misses # 2.05% of all branches
448.965517535 seconds time elapsed
152.007611000 seconds user
32.189942000 seconds sys
16-way parallel:
228,583.26 msec task-clock # 3.545 CPUs utilized
114,575 context-switches # 501.240 /sec
6,028 cpu-migrations # 26.371 /sec
1,561,179 page-faults # 6.830 K/sec
510,861,534,387 cycles # 2.235 GHz
1,296,819,542,686 instructions # 2.54 insn per cycle
43,202,234,699 branches # 189.000 M/sec
828,196,795 branch-misses # 1.92% of all branches
64.482868654 seconds time elapsed
184.172759000 seconds user
44.560342000 seconds sys
So still about ~8% more cycles and roughly the same number of instructions, but in much less wall-clock time.
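
For reference, the "additional futures" mentioned above boil down to driving the
chunk downloads through a stream with bounded concurrency instead of awaiting them
one by one. A minimal sketch of that pattern (with hypothetical fetch_chunk and
write_chunk placeholders, not the actual proxmox-backup-qemu code) could look like this:

// Sketch of bounded-concurrency chunk loading; fetch_chunk/write_chunk are
// placeholders standing in for the real PBS client and image-write calls.
use anyhow::Error;
use futures::stream::{StreamExt, TryStreamExt};

const PARALLEL_CHUNKS: usize = 16; // concurrency level used in the runs above

async fn fetch_chunk(digest: [u8; 32]) -> Result<Vec<u8>, Error> {
    // placeholder: would download and decode one chunk from the datastore
    let _ = digest;
    Ok(vec![0u8; 4 * 1024 * 1024])
}

async fn write_chunk(offset: u64, data: Vec<u8>) -> Result<(), Error> {
    // placeholder: would write the chunk to the target disk (or null-co device)
    let _ = (offset, data);
    Ok(())
}

async fn restore_all(chunks: Vec<(u64, [u8; 32])>) -> Result<(), Error> {
    futures::stream::iter(chunks)
        .map(|(offset, digest)| async move {
            // each entry becomes a future resolving to (offset, chunk data)
            let data = fetch_chunk(digest).await?;
            Ok::<_, Error>((offset, data))
        })
        // keep up to PARALLEL_CHUNKS downloads in flight at once; this is
        // where the extra futures/async overhead measured above comes from
        .buffer_unordered(PARALLEL_CHUNKS)
        // write finished chunks out as they complete
        .try_for_each(|(offset, data)| write_chunk(offset, data))
        .await
}

buffer_unordered keeps at most 16 chunk fetches in flight, which matches the
"16-way parallel" runs above; the sequential baseline corresponds to the same
loop with the concurrency limit effectively set to 1.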
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel