From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
Dominik Csapak <d.csapak@proxmox.com>
Subject: [pve-devel] applied: [PATCH cluster] fix #5728: pmxcfs: allow bigger writes than 4k for fuse
Date: Wed, 16 Oct 2024 18:58:04 +0200
Message-ID: <24194371-1e25-4408-9c34-c8b2cdcaf813@proxmox.com>
In-Reply-To: <20241014100938.1288020-1-d.csapak@proxmox.com>
On 14/10/2024 at 12:09, Dominik Csapak wrote:
> by default libfuse2 limits writes to 4k size, which means that on writes
> bigger than that, we do a whole write cycle for each 4k block that comes
> in. To avoid that, add the option 'big_writes' to allow writes bigger
> than 4k at once (namely up to 128 KiB).
>
> Without the option, updating a file with more than 4 KiB of data
> results in the following pattern:
>
> * cfs_fuse_write is called at offset 0 with a size of 4096
> * sqlite writes the partial file to disk since it's a transaction
> * cfs_fuse_write is called at offset 4096 with a size of 4096
> * sqlite updates the data and writes again
> * repeat until all data reached cfs_fuse_write
>
> So when cfs_fuse_write accepts bigger chunks, we have fewer
> cfs_fuse_write -> sqlite write cycles, leading to reduced disk
> activity.
>
> Note that sqlite itself uses 4096 byte blocks to write to the file
> system layer below.
>
> Most files on pmxcfs are written with `file_set_contents`, which writes
> the file into a tmp file and renames it, so we always have some write
> overhead.
>
> Before pve-common commit
> ef0bcc9 (tools: file_set_contents: use syswrite instead of print)
>
> it used `print` to write, which uses an internal 8k buffer, and after
> the commit it uses `syswrite`, which writes the file unbuffered in one
> go. (Fuse still splits writes at its defined maximum.)
>
> The commit message of that patch includes benchmarks for various sizes
> of writes on pmxcfs with this patch included. Results show that we can
> reduce the amount of bytes written to disk for files larger than 4 KiB
> by a significant amount (with both patches we can reduce the
> amplification at 8 KiB from ~15x to ~11x, and at 1024 KiB from ~360x
> to ~15x).
>
> When we change to libfuse3, we have to remove this option again, since
> the option was removed there and big writes are the default.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from rfc:
> * improve commit message to contain more detail and reference
> Filip's commit that improves `file_set_contents`
> * add a comment above the option to remove it with change to libfuse3
>
> src/pmxcfs/pmxcfs.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>
applied, many thanks for the elaborate commit message!
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel