From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>,
	Dominik Csapak <d.csapak@proxmox.com>
Subject: [pbs-devel] applied: [PATCH proxmox-backup] backup/verify: improve speed by sorting chunks by inode
Date: Wed, 14 Apr 2021 17:42:08 +0200
Message-ID: <52bb61ea-ad18-65d6-5e76-71a8a3a5047e@proxmox.com>
In-Reply-To: <20210413143536.19004-1-d.csapak@proxmox.com>

On 13.04.21 16:35, Dominik Csapak wrote:
> before reading the chunks from disk in the order of the index file,
> stat them first and sort them by inode number.
> 
> this can have a very positive impact on read speed on spinning disks,
> even with the additional stat'ing of the chunks.
> 
> memory footprint should be tolerable, for 1_000_000 chunks
> we need about ~16MiB of memory (Vec of 64bit position + 64bit inode)
> (assuming 4MiB Chunks, such an index would reference 4TiB of data)
> 
> two small benchmarks (single spinner, ext4) here showed an improvement from
> ~430 seconds to ~330 seconds for a 32GiB fixed index
> and from
> ~160 seconds to ~120 seconds for a 10GiB dynamic index
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> it would be great if other people could also benchmark this patch on
> different setups a little (in addition to me), to verify or disprove my results
> 
>  src/backup/datastore.rs |  5 +++++
>  src/backup/verify.rs    | 32 +++++++++++++++++++++++++++++---
>  2 files changed, 34 insertions(+), 3 deletions(-)
> 
>
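
For readers following along, the approach quoted above (stat each chunk, then sort by
inode number before reading) can be sketched roughly as below. This is only a minimal
illustration, not the code from the patch: the function name positions_sorted_by_inode
and the plain slice of chunk paths are assumptions made for the example, and stat errors
are simply propagated here.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// Hypothetical helper: `chunk_paths[i]` is the on-disk path of the i-th chunk
/// referenced by an index. Returns (inode, position) pairs sorted by inode, so
/// the caller can read chunks in inode order instead of index order.
fn positions_sorted_by_inode(chunk_paths: &[PathBuf]) -> io::Result<Vec<(u64, u64)>> {
    let mut list = Vec::with_capacity(chunk_paths.len());
    for (pos, path) in chunk_paths.iter().enumerate() {
        // One stat() per chunk; this is the extra cost the patch description mentions.
        let inode = fs::metadata(path)?.ino();
        list.push((inode, pos as u64));
    }
    // Tuples compare by inode first, then by position.
    list.sort_unstable();
    Ok(list)
}
```

Each entry is two 64-bit values, i.e. 16 bytes, so 1_000_000 chunks need about 16 MB for
the list, in line with the estimate in the patch description.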

applied this for now; I had already done so before Fabian's feedback.

Actually, I saw a slight regression here too, though not as bad as the one Fabian
reported. That was also on a plain SSD-backed ext4, where I expected the overhead of
getting the inodes to outweigh the advantages, as such storage is already good at random IO.

I booted an older test server with lots of data and a more complex spinner
setup; let's see what that one reports.

Anyhow, we could, and probably should, make this a switch, which would be easy to do:
either as a datastore option, or by checking the underlying storage. The latter is easy
for single-disk storage (just check the rotational flag in /sys/block/...) but quickly
gets ugly with zfs/btrfs/... and the special devices they support.
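
As a rough sketch of the single-disk case only: the kernel exposes the flag as
/sys/block/<device>/queue/rotational, containing "1" for rotational devices and "0"
otherwise. The helper names below are made up for illustration, the sysfs root is passed
in as a parameter so it can be swapped out for testing, and mapping a datastore path to
its block device name (the part that gets ugly) is deliberately left out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parses the content of a `queue/rotational` sysfs file.
/// "1" means rotational, "0" means non-rotational, anything else is unknown.
fn parse_rotational(content: &str) -> Option<bool> {
    match content.trim() {
        "0" => Some(false),
        "1" => Some(true),
        _ => None,
    }
}

/// Hypothetical helper: reads the rotational flag for a block device name
/// such as "sda" below `sys_block` (normally `/sys/block`).
fn is_rotational(sys_block: &Path, device: &str) -> io::Result<Option<bool>> {
    let path = sys_block.join(device).join("queue").join("rotational");
    Ok(parse_rotational(&fs::read_to_string(path)?))
}
```

A caller could use something like is_rotational(Path::new("/sys/block"), "sda") and treat
both an error and None as "unknown", falling back to whatever the datastore option says.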

If we add further such optimizations for sync (to remote and tape), those would also
fall under that option switch. Admins like to tune, and this would give them a knob to
check what's best for their setup.



