public inbox for pve-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Dominik Csapak <d.csapak@proxmox.com>
Subject: [pve-devel] applied: [PATCH cluster] fix #5728: pmxcfs: allow bigger writes than 4k for fuse
Date: Wed, 16 Oct 2024 18:58:04 +0200
Message-ID: <24194371-1e25-4408-9c34-c8b2cdcaf813@proxmox.com>
In-Reply-To: <20241014100938.1288020-1-d.csapak@proxmox.com>

On 14/10/2024 at 12:09, Dominik Csapak wrote:
> by default libfuse2 limits writes to 4k size, which means that on writes
> bigger than that, we do a whole write cycle for each 4k block that comes
> in. To avoid that, add the option 'big_writes' to allow writes bigger
> than 4k at once (namely up to 128 KiB).
> 
> Without that option, updating a file with more than 4 KiB of data
> results in the following pattern:
> 
> * cfs_fuse_write is called at offset 0 with size 4096
> * sqlite writes the partial file to disk since it's a transaction
> * cfs_fuse_write is called at offset 4096 with size 4096
> * sqlite updates the data and writes again
> * repeat until all data reached cfs_fuse_write
> 
> So when cfs_fuse_write accepts bigger chunks, we have fewer
> cfs_fuse_write -> sqlite write cycles, leading to reduced disk
> activity.
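
To illustrate the pattern above, here is a minimal Python sketch (not
pmxcfs code) that models how one large write is split into requests
before it reaches the write handler. The function name `split_write`
and the 1024 KiB example size are assumptions chosen for illustration;
the 4 KiB and 128 KiB limits are the ones named in the commit message.

```python
KIB = 1024


def split_write(total_size, max_write):
    """Yield (offset, size) pairs, one per request that a layer with the
    given maximum write size would pass on to the write handler."""
    offset = 0
    while offset < total_size:
        size = min(max_write, total_size - offset)
        yield offset, size
        offset += size


# Hypothetical 1024 KiB file, split at the two limits from the text.
small = list(split_write(1024 * KIB, 4 * KIB))
big = list(split_write(1024 * KIB, 128 * KIB))

print(small[:2])             # [(0, 4096), (4096, 4096)]
print(len(small), len(big))  # 256 8
```

In this model every request corresponds to one cfs_fuse_write -> sqlite
cycle, so the larger limit cuts the number of cycles for this file size
from 256 to 8.
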
> 
> Note that sqlite itself uses 4096 byte blocks to write to the file
> system layer below.
> 
> Most files on pmxcfs are written with `file_set_contents`, which writes
> the file into a tmp file and renames it, so we always have some write
> overhead.
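
The write-to-tmp-file-then-rename approach described above can be
sketched roughly as follows. This is a Python illustration of the
general technique, not the Perl implementation in pve-common; the
function name `write_file_atomically` and the `.tmp.` prefix are
assumptions made for the example.

```python
import os
import tempfile


def write_file_atomically(path, data):
    """Write data (bytes) to a temporary file next to path, then rename
    it over path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp.")
    try:
        try:
            view = memoryview(data)
            # os.write may write fewer bytes than requested, so loop.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
        finally:
            os.close(fd)
        # Rename within the same directory replaces the target in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The extra file creation and rename are the "write overhead" mentioned
above: even a small update touches more than just the target file.
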
> 
> Prior to pve-common commit
> ef0bcc9 (tools: file_set_contents: use syswrite instead of print)
> 
> it used `print` to write, which uses an internal 8k buffer. Since
> that commit it uses `syswrite`, which writes the file unbuffered in
> one go. (FUSE still splits writes at its defined maximum.)
> 
> The commit message of that patch includes benchmarks for various sizes
> of writes on pmxcfs with this patch included. Results show that we can
> reduce the number of bytes written to disk for files larger than 4 KiB
> by a significant amount (with both patches we can reduce the
> amplification at 8 KiB from ~15x to ~11x, and at 1024 KiB from ~360x
> to ~15x).
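
To put those factors into absolute terms, here is a small worked
calculation in Python. It assumes that "amplification" means bytes
written to disk divided by payload bytes, which is my reading of the
text rather than a definition taken from the benchmarks. The factors
are the approximate ones quoted above; `approx_disk_bytes` is a
hypothetical helper name.

```python
KIB = 1024


def approx_disk_bytes(payload_bytes, amplification):
    """Bytes hitting the disk, assuming
    amplification = bytes written to disk / payload bytes."""
    return payload_bytes * amplification


# (payload size, approx. factor before, approx. factor after both patches)
cases = [(8 * KIB, 15, 11), (1024 * KIB, 360, 15)]

for payload, before, after in cases:
    b = approx_disk_bytes(payload, before) // KIB
    a = approx_disk_bytes(payload, after) // KIB
    print(f"{payload // KIB} KiB payload: ~{b} KiB -> ~{a} KiB on disk")
```

Under that assumption, a 1024 KiB file goes from roughly 360 MiB to
roughly 15 MiB written, and an 8 KiB file from roughly 120 KiB to
roughly 88 KiB.
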
> 
> When we change to libfuse3, we have to remove this option again, since
> libfuse3 dropped it and allows bigger writes by default.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from rfc:
> * improve commit message to contain more detail and reference
>   Filip's commit that improves `file_set_contents`
> * add a comment above the option to remove it with change to libfuse3
> 
>  src/pmxcfs/pmxcfs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
>

applied, many thanks for the elaborate commit message!