From: Dominik Csapak <d.csapak@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC PATCH pve-cluster] fix #5728: pmxcfs: allow bigger writes than 4k for fuse
Date: Thu, 19 Sep 2024 14:45:08 +0200 [thread overview]
Message-ID: <c6a05311-215e-4a5a-82ac-a1032fb331ec@proxmox.com> (raw)
In-Reply-To: <21f250b8-a59c-426d-96de-11606cbb0e42@proxmox.com>
On 9/19/24 14:01, Thomas Lamprecht wrote:
> Am 19/09/2024 um 11:52 schrieb Dominik Csapak:
>> by default libfuse2 limits writes to 4k size, which means that on writes
>> bigger than that, we do a whole write cycle for each 4k block that comes
>> in. To avoid that, add the option 'big_writes' to allow writes bigger
>> than 4k at once.
>>
>> This should improve pmxcfs performance for situations where we often
>> write large files (e.g. big ha status) and maybe reduce writes to disk.
>
> Should? Something like before/after for benchmark numbers, flamegraphs
> would be really good to have, without those it's rather hard to discuss
> this, and I'd like to avoid having to do those, or check the inner workings
> of the affected fuse userspace/kernel code paths here myself.
well I mean the code change is relatively small and the result is rather clear:
in the current case we have the following calls from pmxcfs (shortened for e-mail)
when writing a single 128k block:
(dd if=... of=/etc/pve/test bs=128k count=1)
---
[main] debug: enter cfs_fuse_write /test 4096 0 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 0 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE88ED (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 4096
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
[main] debug: enter cfs_fuse_write /test 4096 4096 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 4096 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE88EE (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 8192
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
[main] debug: enter cfs_fuse_write /test 4096 8192 (pmxcfs.c:355:cfs_fuse_write)
---
and so on until we reach
---
[main] debug: enter cfs_fuse_write /test 4096 126976 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 126976 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE890C (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 131072
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
---
with my changes it results in:
---
[main] debug: enter cfs_fuse_write /test 131072 0 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x62f47945f360 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 131072 0 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE8AB8 (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 131072
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (131072) (pmxcfs.c:368:cfs_fuse_write)
---
so a factor of 32 less calls to cfs_fuse_write (including memdb_pwrite)
>
>>
>> If we'd change to libfuse3, this would be a non-issue, since that option
>> got removed and is the default there.
>
> I'd prefer that. At least if done with the future PVE 9.0, as I do not think
> it's a good idea in the middle of a stable release cycle.
why not this change now, and the rewrite to libfuse3 later? that way we can
have some improvements now too...
>
> FWIW, some Debian Devs also want to push the fuse 2 to 3 migration forward
> [0]. So, sooner or later we have to switch over pmxcfs anyway, and there's
> already a POC from Fabian [1].
>
> [0]: https://lists.debian.org/debian-devel/2024/08/msg00177.html
> [1]: https://lore.proxmox.com/pve-devel/20220715074742.3387301-1-f.gruenbichler@proxmox.com/
>
>
sure I can look at that when we're nearing 9.0
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
next prev parent reply other threads:[~2024-09-19 12:45 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-19 9:52 Dominik Csapak
2024-09-19 12:01 ` Thomas Lamprecht
2024-09-19 12:45 ` Dominik Csapak [this message]
2024-09-19 14:57 ` Thomas Lamprecht
2024-09-20 4:04 ` Esi Y via pve-devel
2024-09-20 5:29 ` Esi Y via pve-devel
[not found] ` <CABtLnHoQFAUN0KcahbMF6hoX=WTfL8bHL0St77gQMSaojVGhBA@mail.gmail.com>
2024-09-20 7:32 ` Dominik Csapak
2024-09-20 6:16 ` Dominik Csapak
2024-09-22 9:25 ` Dietmar Maurer
2024-09-23 9:17 ` Dominik Csapak
2024-09-23 11:48 ` Filip Schauer
2024-09-23 14:08 ` Filip Schauer
2024-09-23 12:00 ` Friedrich Weber
2024-09-23 12:03 ` Dominik Csapak
2024-09-19 12:23 ` Esi Y via pve-devel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c6a05311-215e-4a5a-82ac-a1032fb331ec@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
--cc=t.lamprecht@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.