public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Dominik Csapak <d.csapak@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
	Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC PATCH pve-cluster] fix #5728: pmxcfs: allow bigger writes than 4k for fuse
Date: Thu, 19 Sep 2024 14:45:08 +0200	[thread overview]
Message-ID: <c6a05311-215e-4a5a-82ac-a1032fb331ec@proxmox.com> (raw)
In-Reply-To: <21f250b8-a59c-426d-96de-11606cbb0e42@proxmox.com>

On 9/19/24 14:01, Thomas Lamprecht wrote:
> Am 19/09/2024 um 11:52 schrieb Dominik Csapak:
>> by default libfuse2 limits writes to 4k size, which means that on writes
>> bigger than that, we do a whole write cycle for each 4k block that comes
>> in. To avoid that, add the option 'big_writes' to allow writes bigger
>> than 4k at once.
>>
>> This should improve pmxcfs performance for situations where we often
>> write large files (e.g. big ha status) and maybe reduce writes to disk.
> 
> Should? Something like before/after for benchmark numbers, flamegraphs
> would be really good to have, without those it's rather hard to discuss
> this, and I'd like to avoid having to do those, or check the inner workings
> of the affected fuse userspace/kernel code paths here myself.

well I mean the code change is relatively small and the result is rather clear:

in the current case we have the following calls from pmxcfs (shortened for e-mail)
when writing a single 128k block:
(dd if=... of=/etc/pve/test bs=128k count=1)

---
[main] debug: enter cfs_fuse_write /test 4096 0 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null) 
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 0 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE88ED (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 4096 
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0 
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
[main] debug: enter cfs_fuse_write /test 4096 4096 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null) 
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 4096 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE88EE (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 8192 
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0 
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
[main] debug: enter cfs_fuse_write /test 4096 8192 (pmxcfs.c:355:cfs_fuse_write)
---
and so on until we reach

---
[main] debug: enter cfs_fuse_write /test 4096 126976 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null) 
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 126976 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE890C (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 131072 
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0 
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
---

with my changes it results in:

---
[main] debug: enter cfs_fuse_write /test 131072 0 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null) 
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x62f47945f360 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 131072 0 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE8AB8 (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 131072 
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0 
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (131072) (pmxcfs.c:368:cfs_fuse_write)
---

so a factor of 32 less calls to cfs_fuse_write (including memdb_pwrite)

> 
>>
>> If we'd change to libfuse3, this would be a non-issue, since that option
>> got removed and is the default there.
> 
> I'd prefer that. At least if done with the future PVE 9.0, as I do not think
> it's a good idea in the middle of a stable release cycle.

why not this change now, and the rewrite to libfuse3 later? that way we can
have some improvements now too...

> 
> FWIW, some Debian Devs also want to push the fuse 2 to 3 migration forward
> [0]. So, sooner or later we have to switch over pmxcfs anyway, and there's
> already a POC from Fabian [1].
> 
> [0]: https://lists.debian.org/debian-devel/2024/08/msg00177.html
> [1]: https://lore.proxmox.com/pve-devel/20220715074742.3387301-1-f.gruenbichler@proxmox.com/
> 
> 


sure I can look at that when we're nearing 9.0


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


  reply	other threads:[~2024-09-19 12:45 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-19  9:52 Dominik Csapak
2024-09-19 12:01 ` Thomas Lamprecht
2024-09-19 12:45   ` Dominik Csapak [this message]
2024-09-19 14:57     ` Thomas Lamprecht
2024-09-19 12:23 ` Esi Y via pve-devel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6a05311-215e-4a5a-82ac-a1032fb331ec@proxmox.com \
    --to=d.csapak@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    --cc=t.lamprecht@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal