From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v2 http-server 0/1] fix pveproxy OOM in websocket and spice proxy handlers
Date: Mon, 13 Apr 2026 20:56:49 +0800
Message-ID: <20260413125650.2569621-1-k.chai@proxmox.com>
X-Mailer: git-send-email 2.47.3
List-Id: Proxmox VE development discussion
pveproxy can be OOM-killed when it forwards a WebSocket or SPICE connection and the backend can't keep up with the client [1]. This is easy to trigger with a PDM cross-cluster migration to LVM-thin storage, since LVM-thin's zero-on-alloc makes writes to newly-allocated extents much slower than steady-state, but the same bug applies to any remote migration, to the VM/node console sessions that go through the same proxy path, and to SPICE sessions that carry bulk data such as USB passthrough.

The wbuf_max we already set on the backend handle doesn't actually help. AnyEvent only checks it inside the `if (!$self->{_ww})` guard in _drain_wbuf:

    if (!$self->{_ww} && length $self->{wbuf}) {
        ...
        if (defined $self->{wbuf_max} && $self->{wbuf_max} < ...) {
            $self->_error(Errno::ENOSPC, 1);
        }
    }

So once the first EAGAIN installs the write watcher (_ww), all subsequent push_write calls return immediately without ever reaching the check, and wbuf grows without bound. And as Fabian pointed out, even if it did fire it would just tear the connection down with ENOSPC rather than apply any backpressure, so we don't really want to rely on it anyway.
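For reference, the shape of the problem: a naive proxy pump just forwards every chunk with push_write(), which never blocks, so nothing ever slows the fast side down. A minimal sketch (handle setup elided; names are illustrative, not the actual pveproxy code):

```perl
use AnyEvent::Handle;

my ($client, $backend);    # assume two connected AnyEvent::Handle objects

# Naive pump, no backpressure: every chunk read from the fast side is
# queued on the slow side immediately. push_write() only appends to
# wbuf, so if the backend is slower than the client, wbuf grows without
# bound -- and as shown above, wbuf_max never fires once _ww exists.
$client->on_read(sub {
    my ($hdl) = @_;
    $backend->push_write($hdl->{rbuf});
    $hdl->{rbuf} = '';
});
```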
The fix is to extract a small apply_read_backpressure() helper, modelled on the inline pattern that response_stream() already uses, and call it from all three proxy handlers: response_stream() (refactor only, no behaviour change), websocket_proxy(), and handle_spice_proxy_request() (which previously had only the ineffective wbuf_max and a TODO comment for this exact problem). The two proxy handlers also get an on_eof flush so that data buffered while the pause was active isn't silently dropped when the backend closes mid-transfer.

Side effect: while reads are paused, any in-band control messages multiplexed on the same channel are also delayed. For WebSocket that's ping/pong frames; for SPICE it's whatever protocol-level keepalives the client uses. In normal operation this is imperceptible: 640 KB drains in single-digit milliseconds even on first-time LVM-thin allocations, well within any realistic ping timeout. Only if the backend stalls completely does the pause last long enough for the client to give up, and in that case a single connection times out gracefully. That is still strictly better than the previous behaviour of OOM-killing pveproxy and taking down every session on the node.

Tested with a synthetic AnyEvent script that pushes a fast writer through a proxy into a slow reader. Without backpressure the proxy write buffer grows to ~1.4 GB in 5 seconds (2254x the limit); with the fix it stays bounded just over the 640 KB limit, as expected.
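In case it helps review, here is a minimal sketch of the pause/resume pattern the helper implements, using AnyEvent::Handle's documented stop_read/start_read/on_drain API. The helper name matches the patch; the body, the $wbuf_limit constant, and the direct peek at $writer->{wbuf} are illustrative, not the actual diff:

```perl
use AnyEvent::Handle;

my $wbuf_limit = 640 * 1024;    # 640 KB threshold, as in the description

# Pause reading from $reader while $writer's write buffer is over the
# limit; resume once the buffer has fully drained. Called from each
# proxy handler's on_read path after forwarding a chunk.
sub apply_read_backpressure {
    my ($reader, $writer) = @_;

    return if length($writer->{wbuf} // '') <= $wbuf_limit;

    $reader->stop_read();
    $writer->on_drain(sub {
        my ($w) = @_;
        $w->on_drain(undef);    # one-shot: clear the callback
        $reader->start_read();
    });
}
```

The on_eof flush mentioned above then just needs to run the reader callback over whatever is still sitting in rbuf before destroying the handles, so bytes received during a pause still reach the client.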
Changes since v1 [2] (thanks @Fabian for the review):

- broaden the scope, not LVM-thin specific
- handle_spice_proxy_request() now gets the same fix; it had the bug too, and SPICE can carry bulk data via USB passthrough
- pull the helper out into apply_read_backpressure() so response_stream() uses it too instead of carrying its own copy of the same pattern
- on_eof in both proxy handlers now flushes whatever is left in rbuf through the reader before tearing down, so the tail of a bulk transfer isn't dropped when the backend closes while reads are paused
- drop an old TODO in response_stream() about whether wbuf_max should be used as the threshold; the wbuf_max investigation in this patch settles it

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7483
[2] https://lore.proxmox.com/pve-devel/20260412111209.3960421-1-k.chai@proxmox.com/

Kefu Chai (1):
  fix #7483: apiserver: add backpressure to proxy handlers

 src/PVE/APIServer/AnyEvent.pm | 154 ++++++++++++++++++++++++++--------
 1 file changed, 121 insertions(+), 33 deletions(-)

-- 
2.47.3