public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH qemu-server 0/1] increase qemu max openfiles limit
@ 2023-12-10 14:49 Alexandre Derumier
  2023-12-10 14:49 ` [pve-devel] [PATCH qemu-server 1/1] fix #4507 : " Alexandre Derumier
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Derumier @ 2023-12-10 14:49 UTC (permalink / raw)
  To: pve-devel

Hi,

The current qemu max openfile limit is 1024.

This is really to low it you use ceph storage through librbd.

For each disk, qemu process is doing 1 tcp connection to each osd.

I have trigger a bug this week, a vm with 6 disk and a cluster with 100 osd,

At boot vm was working fine, but after some time, vm begin to do random disk access
timeout. The number of connection was around 600~700.
Also the qemu monitor/qemu-agent was hanging too.

Other users have also reported this bug:

https://forum.proxmox.com/threads/vm-qmp-command-failed-vm-qmp-command-query-proxmox-support-failed.90160/#post-613685
https://forum.proxmox.com/threads/qemu-crash-with-vzdump.131603/
https://bugzilla.proxmox.com/show_bug.cgi?id=4507#c1


This patch use prlimit command to increase the limit after vm start.
I don't have found a way to increase it in the qemu scope directly with LimitNOFILE.



Alternative fix is to increase max openfile globally with

if vm is launched through gui:
/etc/systemd/system.conf.d/max-open-files.conf
[Manager]
DefaultLimitNOFILE=524289:524289

if vm is launched through ssh with qm:
/etc/security/limits.d/10-max-open-files.conf
root                -       nofile          524289


But maybe users could already have tuned it for containers,
so I think it's better to  only change limit for qemu process.




Alexandre Derumier (1):
  fix #4507 : increase qemu max openfiles limit

 PVE/QemuServer.pm | 6 ++++++
 1 file changed, 6 insertions(+)

-- 
2.39.2




^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH qemu-server 1/1] fix #4507 : increase qemu max openfiles limit
  2023-12-10 14:49 [pve-devel] [PATCH qemu-server 0/1] increase qemu max openfiles limit Alexandre Derumier
@ 2023-12-10 14:49 ` Alexandre Derumier
  2023-12-11  9:46   ` Fiona Ebner
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Derumier @ 2023-12-10 14:49 UTC (permalink / raw)
  To: pve-devel

---
 PVE/QemuServer.pm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 2063e66..eba26f3 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5991,6 +5991,12 @@ sub vm_start_nolock {
     eval { PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, undef, $pid) };
     warn $@ if $@;
 
+    #increase OpenFile limit
+    eval {
+	run_command(['/usr/bin/prlimit', '--nofile=524288:524288',  "--pid", $pid]);
+    };
+    warn $@ if $@;
+
     if (defined($res->{migrate})) {
 	print "migration listens on $res->{migrate}->{uri}\n";
     } elsif ($statefile) {
-- 
2.39.2




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 1/1] fix #4507 : increase qemu max openfiles limit
  2023-12-10 14:49 ` [pve-devel] [PATCH qemu-server 1/1] fix #4507 : " Alexandre Derumier
@ 2023-12-11  9:46   ` Fiona Ebner
  2023-12-11 16:29     ` DERUMIER, Alexandre
  0 siblings, 1 reply; 8+ messages in thread
From: Fiona Ebner @ 2023-12-11  9:46 UTC (permalink / raw)
  To: Proxmox VE development discussion, Alexandre Derumier

Much of the cover letter would be nice to have as a commit message here.

Missing your Signed-off-by.

Am 10.12.23 um 15:49 schrieb Alexandre Derumier:
> ---
>  PVE/QemuServer.pm | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2063e66..eba26f3 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5991,6 +5991,12 @@ sub vm_start_nolock {
>      eval { PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, undef, $pid) };
>      warn $@ if $@;
>  
> +    #increase OpenFile limit

Nit: why is "OpenFile" capitalized like this? I'd rather put the bug
number in the comment, because the "what" is rather obvious while the
"why" is less so.

> +    eval {
> +	run_command(['/usr/bin/prlimit', '--nofile=524288:524288',  "--pid", $pid]);
> +    };

Some short mention that this is the default hard limit would be nice.

We don't usually write an absolute path to the binary with run_command.

> +    warn $@ if $@;
> +

Please use log_warn() so it will be visible as a task warning too.

>      if (defined($res->{migrate})) {
>  	print "migration listens on $res->{migrate}->{uri}\n";
>      } elsif ($statefile) {

I thought: can't we instead do this as part of setting up the systemd
scope/qemu.slice with LimitNOFILE?

But that would also affect the swtpm process spawned there :/ and the
systemd.exec man page says:

>            │LimitNOFILE=     │ ulimit -n         │ Number of File Descriptors │ Don't use. Be careful when raising the soft limit above 1024, since select() cannot │
>            │                 │                   │                            │ function with file descriptors above 1023 on Linux. Nowadays, the hard limit        │
>            │                 │                   │                            │ defaults to 524288, a very high value compared to historical defaults. Typically    │
>            │                 │                   │                            │ applications should increase their soft limit to the hard limit on their own, if    │
>            │                 │                   │                            │ they are OK with working with file descriptors above 1023, i.e. do not use          │
>            │                 │                   │                            │ select(). Note that file descriptors are nowadays accounted like any other form of  │
>            │                 │                   │                            │ memory, thus there should not be any need to lower the hard limit. Use MemoryMax=   │
>            │                 │                   │                            │ to control overall service memory use, including file descriptor memory.            │

So not sure if that's really nicer. This suggests QEMU should raise the
limit itself.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 1/1] fix #4507 : increase qemu max openfiles limit
  2023-12-11  9:46   ` Fiona Ebner
@ 2023-12-11 16:29     ` DERUMIER, Alexandre
  2023-12-12  9:21       ` Fiona Ebner
  0 siblings, 1 reply; 8+ messages in thread
From: DERUMIER, Alexandre @ 2023-12-11 16:29 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner



>>I thought: can't we instead do this as part of setting up the systemd
>>scope/qemu.slice with LimitNOFILE?

I have tried (see my cover-letter ;), I can't get it work.


"start failed: org.freedesktop.DBus.Error.PropertyReadOnly: Cannot set
property LimitNOFILE, or unknown property.
"

I don't seem to be available in scope object
https://www.freedesktop.org/wiki/Software/systemd/dbus/



>>But that would also affect the swtpm process spawned there :/

Yes, not sure about the impact on other process


 and the systemd.exec man page says:

>            │LimitNOFILE=     │ ulimit -n         │ Number of File
> Descriptors │ Don't use. Be careful when raising the soft limit above
> 1024, since select() cannot │
>            │                 │                  
> │                            │ function with file descriptors above
> 1023 on Linux. Nowadays, the hard limit        │
>            │                 │                  
> │                            │ defaults to 524288, a very high value
> compared to historical defaults. Typically    │
>            │                 │                  
> │                            │ applications should increase their
> soft limit to the hard limit on their own, if    │
>            │                 │                  
> │                            │ they are OK with working with file
> descriptors above 1023, i.e. do not use          │
>            │                 │                  
> │                            │ select(). Note that file descriptors
> are nowadays accounted like any other form of  │
>            │                 │                  
> │                            │ memory, thus there should not be any
> need to lower the hard limit. Use MemoryMax=   │
>            │                 │                  
> │                            │ to control overall service memory use,
> including file descriptor memory.            │

>>So not sure if that's really nicer. This suggests QEMU should raise
>>the
>>limit itself.

Yes, but it don't raise the limit :/ But it's really working with more
than 1024 file descriptor.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 1/1] fix #4507 : increase qemu max openfiles limit
  2023-12-11 16:29     ` DERUMIER, Alexandre
@ 2023-12-12  9:21       ` Fiona Ebner
  2023-12-12 11:10         ` DERUMIER, Alexandre
  2023-12-12 11:55         ` Thomas Lamprecht
  0 siblings, 2 replies; 8+ messages in thread
From: Fiona Ebner @ 2023-12-12  9:21 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel, aderumier

Am 11.12.23 um 17:29 schrieb DERUMIER, Alexandre:
>>> So not sure if that's really nicer. This suggests QEMU should raise
>>> the
>>> limit itself.
> 
> Yes, but it don't raise the limit :/ But it's really working with more
> than 1024 file descriptor.
> 

From a quick look, I didn't find an option to pass along QEMU for this,
so it would likely need to be implemented first/discussed with upstream.
But thinking about it a bit, it feels wrong for each application to be
responsible for raising the limit itself. Sure the application needs to
avoid using select(), but IMHO such limits are better set from the
outside. I think your approach is fine, can you please send a v2 with a
Signed-off-by and my nits addressed?

I mean, ideally select() would just change to allow getting rid of the
low default limit, but the man page says it won't, so ¯\_(ツ)_/¯:

>        WARNING:  select()  can  monitor only file descriptors numbers that are less than FD_SETSIZE (1024)—an unreasonably low limit for
>        many modern applications—and this limitation will not change.  All modern applications should instead use  poll(2)  or  epoll(7),
>        which do not suffer this limitation.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 1/1] fix #4507 : increase qemu max openfiles limit
  2023-12-12  9:21       ` Fiona Ebner
@ 2023-12-12 11:10         ` DERUMIER, Alexandre
  2023-12-12 11:55         ` Thomas Lamprecht
  1 sibling, 0 replies; 8+ messages in thread
From: DERUMIER, Alexandre @ 2023-12-12 11:10 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner

> 

>>From a quick look, I didn't find an option to pass along QEMU for
>>this,
>>so it would likely need to be implemented first/discussed with
>>upstream.
>>But thinking about it a bit, it feels wrong for each application to
>>be
>>responsible for raising the limit itself. Sure the application needs
>>to
>>avoid using select(), but IMHO such limits are better set from the
>>outside. 
>>I think your approach is fine, can you please send a v2 with a
>>Signed-off-by and my nits addressed?

yes, sure ! I'll send a patch this afternoon.

>>I mean, ideally select() would just change to allow getting rid of
>>the
>>low default limit, but the man page says it won't, so ¯\_(ツ)_/¯:

Yes, I was thinking the same ^_^


>        WARNING:  select()  can  monitor only file descriptors numbers
> that are less than FD_SETSIZE (1024)—an unreasonably low limit for
>        many modern applications—and this limitation will not change. 
> All modern applications should instead use  poll(2)  or  epoll(7),
>        which do not suffer this limitation.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 1/1] fix #4507 : increase qemu max openfiles limit
  2023-12-12  9:21       ` Fiona Ebner
  2023-12-12 11:10         ` DERUMIER, Alexandre
@ 2023-12-12 11:55         ` Thomas Lamprecht
  2023-12-12 12:20           ` Fiona Ebner
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Lamprecht @ 2023-12-12 11:55 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner, DERUMIER,
	Alexandre, aderumier

Am 12/12/2023 um 10:21 schrieb Fiona Ebner:
> Am 11.12.23 um 17:29 schrieb DERUMIER, Alexandre:
>>>> So not sure if that's really nicer. This suggests QEMU should raise
>>>> the
>>>> limit itself.
>>
>> Yes, but it don't raise the limit :/ But it's really working with more
>> than 1024 file descriptor.
>>
> 
> From a quick look, I didn't find an option to pass along QEMU for this,
> so it would likely need to be implemented first/discussed with upstream.
> But thinking about it a bit, it feels wrong for each application to be
> responsible for raising the limit itself. Sure the application needs to
> avoid using select(), but IMHO such limits are better set from the
> outside. I think your approach is fine, can you please send a v2 with a
> Signed-off-by and my nits addressed?

Meh, I think the application that actually can use many FDs, which are not
*that* many, should just raise it to the highest limit possible, so I'd
rather do this inside QEMU.
Doing such stuff from the outside is almost always a bit more maintenance
burden, e.g., cue to our various hacks over multiple releases for correctly
waiting for the VMID scope to exit.

We can just patch it in for now downstream while checking if upstream would
accept that, IMO in modern times FD limits are not much a protection,
especially if raised from 1024 to a few hundreds of thousands, especially
as in QEMU the amount isn't really controllable via the guest (i.e.,
unprivileged code).




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 1/1] fix #4507 : increase qemu max openfiles limit
  2023-12-12 11:55         ` Thomas Lamprecht
@ 2023-12-12 12:20           ` Fiona Ebner
  0 siblings, 0 replies; 8+ messages in thread
From: Fiona Ebner @ 2023-12-12 12:20 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox VE development discussion, DERUMIER,
	Alexandre, aderumier

Am 12.12.23 um 12:55 schrieb Thomas Lamprecht:
> Meh, I think the application that actually can use many FDs, which are not
> *that* many, should just raise it to the highest limit possible, so I'd
> rather do this inside QEMU.
> Doing such stuff from the outside is almost always a bit more maintenance
> burden, e.g., cue to our various hacks over multiple releases for correctly
> waiting for the VMID scope to exit.
> 
> We can just patch it in for now downstream while checking if upstream would
> accept that, IMO in modern times FD limits are not much a protection,
> especially if raised from 1024 to a few hundreds of thousands, especially
> as in QEMU the amount isn't really controllable via the guest (i.e.,
> unprivileged code).

Okay, then I'll prepare a QEMU patch and also ask upstream.




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-12-12 12:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-10 14:49 [pve-devel] [PATCH qemu-server 0/1] increase qemu max openfiles limit Alexandre Derumier
2023-12-10 14:49 ` [pve-devel] [PATCH qemu-server 1/1] fix #4507 : " Alexandre Derumier
2023-12-11  9:46   ` Fiona Ebner
2023-12-11 16:29     ` DERUMIER, Alexandre
2023-12-12  9:21       ` Fiona Ebner
2023-12-12 11:10         ` DERUMIER, Alexandre
2023-12-12 11:55         ` Thomas Lamprecht
2023-12-12 12:20           ` Fiona Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal