From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v2 http-server 0/1] fix pveproxy OOM in websocket and spice proxy handlers
Date: Mon, 13 Apr 2026 20:56:49 +0800 [thread overview]
Message-ID: <20260413125650.2569621-1-k.chai@proxmox.com> (raw)
pveproxy can be OOM-killed when it forwards a WebSocket or SPICE connection
and the backend can't keep up with the client [1]. This is easy to trigger
with a PDM cross-cluster migration to LVM-thin storage, since LVM-thin's
zero-on-allocate makes writes to newly-allocated extents much slower than
steady-state writes. But the same bug applies to any remote migration, to
the VM/node console sessions that go through the same proxy path, and to
SPICE sessions that carry bulk data such as USB passthrough.
The wbuf_max we already set on the backend handle doesn't actually help:
AnyEvent only checks it inside the `if (!$self->{_ww})` guard in
_drain_wbuf:
    if (!$self->{_ww} && length $self->{wbuf}) {
        ...
        if (defined $self->{wbuf_max} && $self->{wbuf_max} < ...) {
            $self->_error(Errno::ENOSPC, 1);
        }
    }
So once the first EAGAIN installs the write watcher (_ww), all subsequent
push_write calls return immediately without ever reaching the check, and
wbuf grows without bound. And as Fabian pointed out, even if the check did
fire, it would just tear the connection down with ENOSPC rather than apply
any backpressure, so we don't want to rely on it anyway.
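For reference, the runaway growth is easy to reproduce in isolation. A
minimal sketch (port, chunk size, and limit are illustrative, not from the
patch):

```perl
#!/usr/bin/perl
# Sketch: push_write() into a handle whose peer never reads. Once the
# first EAGAIN installs {_ww}, wbuf_max is never consulted again and
# {wbuf} just grows. Illustrative only; peeks at handle internals.
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Socket;

tcp_server undef, 18080, sub {
    my ($fh) = @_;
    # Slow reader: accept the connection but never read from it.
};

tcp_connect '127.0.0.1', 18080, sub {
    my ($fh) = @_ or die "connect failed: $!";
    my $h = AnyEvent::Handle->new(fh => $fh, wbuf_max => 64 * 1024);
    my $chunk = 'x' x 65536;
    my $t; $t = AE::timer 0, 0.001, sub {
        $h->push_write($chunk);                       # never blocks
        printf "wbuf: %d bytes\n", length $h->{wbuf}; # grows past wbuf_max
    };
};

AE::cv->recv;
```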
The fix is to extract a small apply_read_backpressure() helper, modelled
on the inline pattern that response_stream() already uses, and call it
from all three proxy handlers: response_stream() (refactor only, no
behaviour change), websocket_proxy(), and handle_spice_proxy_request()
(which previously had only the ineffective wbuf_max and a TODO comment
for exactly this problem). The two proxy handlers also get an on_eof
flush so that data buffered while the pause was active isn't silently
dropped when the backend closes mid-transfer.
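The pattern is small. A sketch of the idea; apart from
apply_read_backpressure() itself, which the cover letter names, the
variable names and exact shape are mine, not the patch's:

```perl
# Sketch of the backpressure pattern described above; the code in the
# actual patch may differ. $reader is the handle we consume from,
# $writer the handle whose wbuf must stay bounded.
sub apply_read_backpressure {
    my ($reader, $writer, $limit) = @_;

    return if length($writer->{wbuf}) <= $limit;

    # Too much queued: stop reading until the writer has drained.
    $reader->stop_read();
    $writer->on_drain(sub {
        my ($handle) = @_;
        $handle->on_drain(undef);  # uninstall the one-shot callback
        $reader->start_read();
    });
}
```

With this in place the proxy handlers call the helper after every
forwarded chunk, and their on_eof callbacks push any remaining rbuf
through the read callback before tearing down, so nothing buffered
during a pause is lost.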
Side effect: while reads are paused, any in-band control messages
multiplexed on the same channel are also delayed. For WebSocket
that's ping/pong frames; for SPICE it's whatever protocol-level
keepalives the client uses. In normal operation this is
imperceptible: 640 KB drains in single-digit milliseconds even on
first-time LVM-thin allocations, well within any realistic ping
timeout. Only if the backend stalls completely does the pause last
long enough for the client to give up, and in that case a single
connection times out gracefully. That is still strictly better than
the previous behaviour of OOM-killing pveproxy and taking down every
session on the node.
Tested with a synthetic AnyEvent script that pushes a fast writer
through a proxy into a slow reader. Without backpressure the proxy's
write buffer grows to ~1.4 GB in 5 seconds (2254x the limit); with
the fix it stays bounded at just over the 640 KB limit, as expected.
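The test harness was along these lines; this is a reconstruction of the
setup described above, not the exact script, and ports, rates, and sizes
are illustrative:

```perl
#!/usr/bin/perl
# Reconstruction of the synthetic test: a fast writer feeds a proxy that
# forwards into a deliberately slow reader, while we sample the proxy's
# write-buffer size once a second.
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Socket;

my @guards;  # keep handles and timers alive

# Slow reader: drains only 64 KB per second.
tcp_server undef, 19000, sub {
    my ($fh) = @_;
    my $h = AnyEvent::Handle->new(fh => $fh);
    push @guards, $h, AE::timer 0, 1, sub {
        $h->push_read(chunk => 65536, sub { });
    };
};

# Proxy: copy front -> back, sampling the back handle's wbuf size.
tcp_server undef, 19001, sub {
    my ($ffh) = @_;
    tcp_connect '127.0.0.1', 19000, sub {
        my ($bfh) = @_ or die "connect: $!";
        my $back  = AnyEvent::Handle->new(fh => $bfh);
        my $front = AnyEvent::Handle->new(
            fh      => $ffh,
            on_read => sub {
                my ($h) = @_;
                $back->push_write($h->{rbuf});
                $h->{rbuf} = '';
                # With the fix, the handler would pause $h here
                # whenever length $back->{wbuf} exceeds 640 KB.
            });
        push @guards, $back, $front, AE::timer 1, 1, sub {
            printf "proxy wbuf: %d bytes\n", length $back->{wbuf};
        };
    };
};

# Fast writer: refill with a 1 MB chunk every time the buffer drains.
tcp_connect '127.0.0.1', 19001, sub {
    my ($fh) = @_ or die "connect: $!";
    my $w = AnyEvent::Handle->new(
        fh       => $fh,
        on_drain => sub { $_[0]->push_write('x' x (1024 * 1024)) });
    $w->push_write('x' x (1024 * 1024));
    push @guards, $w;
};

AE::cv->recv;
```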
Changes since v1 [2] (thanks @Fabian for the review):
- broaden the scope, not LVM-thin specific
- handle_spice_proxy_request() now gets the same fix; it had the bug
too and SPICE can carry bulk data via USB passthrough
- pull the helper out into apply_read_backpressure() so response_stream
uses it too instead of carrying its own copy of the same pattern
- on_eof in both proxy handlers now flushes whatever is left in rbuf
through the reader before tearing down, so the tail of a bulk
transfer isn't dropped when the backend closes while reads are paused
- drop an old TODO in response_stream() about whether wbuf_max should
  be used as the threshold; the wbuf_max investigation in this patch
  settles it
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7483
[2] https://lore.proxmox.com/pve-devel/20260412111209.3960421-1-k.chai@proxmox.com/
Kefu Chai (1):
fix #7483: apiserver: add backpressure to proxy handlers
src/PVE/APIServer/AnyEvent.pm | 154 ++++++++++++++++++++++++++--------
1 file changed, 121 insertions(+), 33 deletions(-)
--
2.47.3