Subject: Re: [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin
From: Fabian Grünbichler
To: Kefu Chai, pve-devel@lists.proxmox.com
Date: Mon, 13 Apr 2026 09:47:24 +0200
Message-ID: <177606644465.154362.17704319540020721205@yuna.proxmox.com>
In-Reply-To: <20260412111209.3960421-1-k.chai@proxmox.com>
References: <20260412111209.3960421-1-k.chai@proxmox.com>
List-Id: Proxmox VE development discussion

Quoting Kefu Chai (2026-04-12 13:12:08)
> pveproxy on the destination host can be OOM-killed during PDM offline
> cross-cluster migration when the target storage (e.g. LVM-thin) writes more
> slowly than data arrives over the network [1].
>
> the websocket proxy already had wbuf_max set on the backend handle, but it
> turns out not to help. AnyEvent::Handle only checks it inside the
> `if (!$self->{_ww})` guard in _drain_wbuf:
>
>     if (!$self->{_ww} && length $self->{wbuf}) {
>         ...
>         if (defined $self->{wbuf_max} && $self->{wbuf_max} < ...) {
>             $self->_error(Errno::ENOSPC, 1);
>         }
>     }
>
> once the first EAGAIN installs the write watcher (_ww), all subsequent
> push_write calls return immediately without reaching the check, so wbuf
> grows without bound.
>
> to fix this, follow the same approach response_stream() already takes: stop
> reading from the source handle when the backend write buffer exceeds the
> limit, and resume via on_drain once it empties.
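
for readers following along, a minimal sketch of the stop_read/on_drain
pattern described above could look roughly like this. it is not the code from
the patch: it assumes two plain (non-TLS) AnyEvent::Handle objects, and
`forward_with_backpressure` and `$WBUF_LIMIT` are hypothetical names picked
for illustration only.

```perl
use strict;
use warnings;
use AnyEvent::Handle;

# Hypothetical limit, chosen to match the 640 KB figure from the cover letter.
my $WBUF_LIMIT = 640 * 1024;

# Hypothetical helper: forward everything read from $src to $dst, and pause
# reading from $src while $dst still has too much unsent data queued.
# Assumes both arguments are non-TLS AnyEvent::Handle objects.
sub forward_with_backpressure {
    my ($src, $dst) = @_;

    $src->on_read(sub {
        my ($h) = @_;

        # Take the whole read buffer and queue it on the destination.
        my $data = $h->{rbuf};
        $h->{rbuf} = '';
        $dst->push_write($data);

        # Check the queued bytes ourselves instead of relying on wbuf_max.
        return if length($dst->{wbuf} // '') <= $WBUF_LIMIT;

        # Too much pending output: stop pulling data from the source ...
        $h->stop_read();

        # ... and resume once the destination write buffer has emptied.
        $dst->on_drain(sub {
            my ($d) = @_;
            $d->on_drain(undef); # one-shot: clear the callback first
            $h->start_read();
        });
    });
}
```

checking the length of wbuf in the read callback does not depend on whether
the write watcher is already installed, which is the case where the wbuf_max
check is skipped according to the analysis quoted above.
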
>
> handle_spice_proxy_request() has the same issue and even carries a
> "# todo: use stop_read/start_read" comment acknowledging it, but is not
> addressed here as SPICE carries interactive VM console traffic rather than
> bulk data.

note that in regular PVE (and also PDM), the websocket proxy is used for
proxying console sessions for nodes and guests. so in principle, the same
applies for both - but it is probably harder to trigger in practice via such
connections ;)

spice does potentially handle bulk streams of data as well though - e.g., it
supports passing through USB devices from the client to the spice session..

> tested with a synthetic AnyEvent script that drives a fast writer through a
> proxy into a slow reader. without backpressure, the proxy write buffer grows
> to ~1.4 GB in 5 seconds (2254x the limit). with the fix, it stays bounded at
> just over the 640 KB limit, as expected.
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7483
>
> Kefu Chai (1):
>   fix #7483: apiserver: add backpressure to websocket proxy
>
>  src/PVE/APIServer/AnyEvent.pm | 45 +++++++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 7 deletions(-)
>
> --
> 2.47.3