From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy
Date: Sun, 12 Apr 2026 19:12:09 +0800
Message-ID: <20260412111209.3960421-2-k.chai@proxmox.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260412111209.3960421-1-k.chai@proxmox.com>
References: <20260412111209.3960421-1-k.chai@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: Proxmox VE development discussion

During PDM cross-cluster migration to LVM-thin storage, pveproxy can be
OOM-killed on the destination host when disk writes are slower than the
incoming network transfer rate.

The existing wbuf_max on the backend handle turns out not to help:
AnyEvent::Handle only checks it inside the `if (!$self->{_ww})` guard in
_drain_wbuf. Once the first EAGAIN installs a write watcher, all
subsequent push_write calls return immediately without ever reaching the
check, so wbuf grows without bound.

Instead, follow the same approach response_stream() already takes: pause
reading from the source handle when the backend write buffer exceeds the
limit, and resume via an on_drain callback once it empties.
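As a reviewer's aside, the pause/resume pattern described above can be illustrated outside AnyEvent. The following is a minimal Python toy model (not Proxmox code; all class and function names here are made up for illustration): the proxy stops consuming from the source once the sink's write buffer exceeds a limit, and re-registers its read callback from a one-shot drain hook.

```python
# Toy model of read-side backpressure: pause reading when the sink's
# write buffer exceeds a limit, resume from an on_drain callback.
# Sink/Source/proxy are hypothetical names, not Proxmox/AnyEvent APIs.

class Sink:
    """Write handle with an unbounded buffer and a one-shot drain hook."""
    def __init__(self):
        self.wbuf = b""
        self.on_drain = None  # fired once when wbuf empties

    def push_write(self, data: bytes) -> None:
        self.wbuf += data

    def drain(self, n: int) -> None:
        # Simulate the event loop flushing n bytes to the socket.
        self.wbuf = self.wbuf[n:]
        if not self.wbuf and self.on_drain:
            cb, self.on_drain = self.on_drain, None
            cb(self)

class Source:
    """Read handle whose on_read callback can be detached to pause it."""
    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.on_read = None

    def pump(self) -> None:
        # Deliver buffered chunks only while a read callback is set.
        while self.chunks and self.on_read:
            self.on_read(self.chunks.pop(0))

def proxy(source: Source, sink: Sink, limit: int) -> None:
    def on_read(chunk):
        sink.push_write(chunk)
        if len(sink.wbuf) > limit:
            source.on_read = None            # apply backpressure

            def resume(_sink):
                source.on_read = on_read     # re-register and continue
                source.pump()
            sink.on_drain = resume
    source.on_read = on_read
    source.pump()

src = Source([b"x" * 4] * 5)
snk = Sink()
proxy(src, snk, limit=8)
paused_pending = len(src.chunks)  # chunks left undelivered while paused
while snk.wbuf or src.chunks:
    snk.drain(4)  # event loop flushes; on_drain resumes reading
print(paused_pending, len(snk.wbuf), len(src.chunks))
```

With five 4-byte chunks and an 8-byte limit, the third chunk pushes the buffer past the limit, so two chunks stay queued at the source until the drain hook fires; everything is eventually delivered without the buffer growing unboundedly.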
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=7483
---
 src/PVE/APIServer/AnyEvent.pm | 45 +++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
index 915d678..5a4b449 100644
--- a/src/PVE/APIServer/AnyEvent.pm
+++ b/src/PVE/APIServer/AnyEvent.pm
@@ -581,10 +581,11 @@ sub websocket_proxy {
 
             $self->dprint("CONNECTed to '$remhost:$remport'");
 
+            my $wbuf_limit = $max_payload_size * 5;
+
             $reqstate->{proxyhdl} = AnyEvent::Handle->new(
                 fh => $fh,
                 rbuf_max => $max_payload_size,
-                wbuf_max => $max_payload_size * 5,
                 timeout => 5,
                 on_eof => sub {
                     my ($hdl) = @_;
@@ -604,7 +605,30 @@ sub websocket_proxy {
                 },
             );
 
-            my $proxyhdlreader = sub {
+            # Stop reading from $read_hdl until $write_hdl drains its write
+            # buffer, then re-register $on_read_cb. Returns true if
+            # backpressure was applied. We cannot rely on AnyEvent::Handle's
+            # wbuf_max for this because its check in _drain_wbuf is skipped
+            # when a write watcher is already active.
+            my $apply_backpressure = sub {
+                my ($read_hdl, $write_hdl, $on_read_cb, $alive_key) = @_;
+                return if length($write_hdl->{wbuf}) <= $wbuf_limit;
+
+                $read_hdl->on_read();
+                my $prev_on_drain = $write_hdl->{on_drain};
+                $write_hdl->on_drain(sub {
+                    my ($wrhdl) = @_;
+                    $read_hdl->on_read($on_read_cb) if $reqstate->{$alive_key};
+                    if ($prev_on_drain) {
+                        $wrhdl->on_drain($prev_on_drain);
+                        $prev_on_drain->($wrhdl);
+                    }
+                });
+                return 1;
+            };
+
+            my $proxyhdlreader;
+            $proxyhdlreader = sub {
                 my ($hdl) = @_;
 
                 my $len = length($hdl->{rbuf});
@@ -614,10 +638,15 @@ sub websocket_proxy {
 
                 my $string = $encode->(\$data);
 
-                $reqstate->{hdl}->push_write($string) if $reqstate->{hdl};
+                my $clienthdl = $reqstate->{hdl};
+                return if !$clienthdl;
+
+                $clienthdl->push_write($string);
+                $apply_backpressure->($hdl, $clienthdl, $proxyhdlreader, 'proxyhdl');
             };
 
-            my $hdlreader = sub {
+            my $hdlreader;
+            $hdlreader = sub {
                 my ($hdl) = @_;
 
                 while (my $len = length($hdl->{rbuf})) {
@@ -672,7 +701,11 @@ sub websocket_proxy {
                     }
 
                     if ($opcode == 1 || $opcode == 2) {
-                        $reqstate->{proxyhdl}->push_write($payload) if $reqstate->{proxyhdl};
+                        my $proxyhdl = $reqstate->{proxyhdl};
+                        if ($proxyhdl) {
+                            $proxyhdl->push_write($payload);
+                            return if $apply_backpressure->($hdl, $proxyhdl, $hdlreader, 'hdl');
+                        }
                     } elsif ($opcode == 8) {
                         my $statuscode = unpack("n", $payload);
                         $self->dprint("websocket received close. status code: '$statuscode'");
@@ -700,8 +733,6 @@ sub websocket_proxy {
             $reqstate->{proxyhdl}->on_read($proxyhdlreader);
             $reqstate->{hdl}->on_read($hdlreader);
 
-            # todo: use stop_read/start_read if write buffer grows to much
-
             # FIXME: remove protocol in PVE/PMG 8.x
             #
             # for backwards, compatibility, we have to reply with the websocket
-- 
2.47.3