From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Kefu Chai, pve-devel@lists.proxmox.com
Date: Mon, 13 Apr 2026 09:38:27 +0200
Message-ID: <177606590735.45911.9684650018695214160@yuna.proxmox.com>
In-Reply-To: <20260412111209.3960421-2-k.chai@proxmox.com>
References: <20260412111209.3960421-1-k.chai@proxmox.com>
 <20260412111209.3960421-2-k.chai@proxmox.com>
List-Id: Proxmox VE development discussion

Quoting Kefu Chai (2026-04-12 13:12:09)
> During PDM cross-cluster migration to LVM-thin storage, pveproxy can
> be OOM-killed on the destination host when disk writes are slower than
> the incoming network transfer rate.

I don't think this is limited to that particular combination, in principle
all remote migrations are affected. Might just be easier to trigger with
LVM-thin because of its zero-on-alloc behaviour?

Might make sense to adapt the commit message either when applying or when
sending a v2.

> The existing wbuf_max on the backend handle turns out not to help:
> AnyEvent::Handle only checks it inside the `if (!$self->{_ww})` guard
> in _drain_wbuf. Once the first EAGAIN installs a write watcher, all
> subsequent push_write calls return immediately without ever reaching
> the check, so wbuf grows without bound.

nice find! though it's a good thing it doesn't, I guess, since AFAICT
reaching the check would trigger a fatal error on the connection instead
of applying back pressure?
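
for context, the relevant part of AnyEvent::Handle::_drain_wbuf looks
roughly like this (paraphrased from memory, not the verbatim source, so
details may differ):

    sub _drain_wbuf {
        my ($self) = @_;

        # once a write watcher ($self->{_ww}) is installed, this whole
        # block is skipped on subsequent push_write calls...
        if (!$self->{_ww} && length $self->{wbuf}) {
            my $cb = sub {
                my $len = syswrite $self->{fh}, $self->{wbuf};
                # ... shorten wbuf, invoke on_drain once it is empty ...
            };

            $cb->() unless $self->{autocork};

            # EAGAIN path: install the write watcher
            $self->{_ww} = AE::io $self->{fh}, 1, $cb
                if length $self->{wbuf};

            # ...so wbuf_max is only ever checked here, and exceeding it
            # is a fatal ENOSPC error on the handle, not back pressure
            return $self->_error(Errno::ENOSPC, 1)
                if defined $self->{wbuf_max}
                && $self->{wbuf_max} < length $self->{wbuf};
        }
    }

so tracking the buffer length and pausing the reader manually, like the
patch does, seems to be the way to go short of patching AnyEvent::Handle.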
> Instead, follow the same approach response_stream() already takes:
> pause reading from the source handle when the backend write buffer
> exceeds the limit, and resume via an on_drain callback once it empties.

I guess there isn't enough overlap to make this a shared helper? the
variant there does handle EOF correctly ;)

> 
> Signed-off-by: Kefu Chai
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=7483
> ---
>  src/PVE/APIServer/AnyEvent.pm | 45 +++++++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 7 deletions(-)
> 
> diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
> index 915d678..5a4b449 100644
> --- a/src/PVE/APIServer/AnyEvent.pm
> +++ b/src/PVE/APIServer/AnyEvent.pm
> @@ -581,10 +581,11 @@ sub websocket_proxy {
>  
>          $self->dprint("CONNECTed to '$remhost:$remport'");
>  
> +        my $wbuf_limit = $max_payload_size * 5;
> +
>          $reqstate->{proxyhdl} = AnyEvent::Handle->new(
>              fh => $fh,
>              rbuf_max => $max_payload_size,
> -            wbuf_max => $max_payload_size * 5,
>              timeout => 5,
>              on_eof => sub {
>                  my ($hdl) = @_;
> @@ -604,7 +605,30 @@ sub websocket_proxy {
>              },
>          );
>  
> -        my $proxyhdlreader = sub {
> +        # Stop reading from $read_hdl until $write_hdl drains its write
> +        # buffer, then re-register $on_read_cb. Returns true if
> +        # backpressure was applied. We cannot rely on AnyEvent::Handle's
> +        # wbuf_max for this because its check in _drain_wbuf is skipped
> +        # when a write watcher is already active.
> +        my $apply_backpressure = sub {
> +            my ($read_hdl, $write_hdl, $on_read_cb, $alive_key) = @_;
> +            return if length($write_hdl->{wbuf}) <= $wbuf_limit;
> +
> +            $read_hdl->on_read();

AFAICT (but I haven't tried to actually trigger this) this is not handling
EOF conditions correctly now?

if you remove the on_read callback here without having an on_eof callback
set, then nothing will read and handle potentially buffered data..
the way websockets work that shouldn't be a problem IIRC, except when a
client is not waiting for the ack when closing the socket? still probably
doesn't hurt to try to handle it?

> +            my $prev_on_drain = $write_hdl->{on_drain};
> +            $write_hdl->on_drain(sub {
> +                my ($wrhdl) = @_;
> +                $read_hdl->on_read($on_read_cb) if $reqstate->{$alive_key};
> +                if ($prev_on_drain) {
> +                    $wrhdl->on_drain($prev_on_drain);
> +                    $prev_on_drain->($wrhdl);
> +                }
> +            });
> +            return 1;
> +        };
> +
> +        my $proxyhdlreader;
> +        $proxyhdlreader = sub {
>              my ($hdl) = @_;
>  
>              my $len = length($hdl->{rbuf});
> @@ -614,10 +638,15 @@ sub websocket_proxy {
>  
>              my $string = $encode->(\$data);
>  
> -            $reqstate->{hdl}->push_write($string) if $reqstate->{hdl};
> +            my $clienthdl = $reqstate->{hdl};
> +            return if !$clienthdl;
> +
> +            $clienthdl->push_write($string);
> +            $apply_backpressure->($hdl, $clienthdl, $proxyhdlreader, 'proxyhdl');
>          };
>  
> -        my $hdlreader = sub {
> +        my $hdlreader;
> +        $hdlreader = sub {
>              my ($hdl) = @_;
>  
>              while (my $len = length($hdl->{rbuf})) {
> @@ -672,7 +701,11 @@ sub websocket_proxy {
>                  }
>  
>                  if ($opcode == 1 || $opcode == 2) {
> -                    $reqstate->{proxyhdl}->push_write($payload) if $reqstate->{proxyhdl};
> +                    my $proxyhdl = $reqstate->{proxyhdl};
> +                    if ($proxyhdl) {
> +                        $proxyhdl->push_write($payload);
> +                        return if $apply_backpressure->($hdl, $proxyhdl, $hdlreader, 'hdl');
> +                    }

doesn't this also mean that the server stops responding to pings with
pongs if we apply backpressure.. though I don't see a way to avoid that
except for adding custom control frames that apply the pressure
client-side.. which probably causes more harm than good.

>                  } elsif ($opcode == 8) {
>                      my $statuscode = unpack("n", $payload);
>                      $self->dprint("websocket received close. status code: '$statuscode'");
> @@ -700,8 +733,6 @@ sub websocket_proxy {
>          $reqstate->{proxyhdl}->on_read($proxyhdlreader);
>          $reqstate->{hdl}->on_read($hdlreader);
>  
> -        # todo: use stop_read/start_read if write buffer grows to much
> -
>          # FIXME: remove protocol in PVE/PMG 8.x
>          #
>          # for backwards, compatibility, we have to reply with the websocket
> -- 
> 2.47.3
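
to sketch what I mean regarding the EOF handling (completely untested,
just to illustrate the idea - the extra on_eof callback is hypothetical
and the teardown is left as a comment, variable names follow the patch):

    my $apply_backpressure = sub {
        my ($read_hdl, $write_hdl, $on_read_cb, $alive_key) = @_;
        return if length($write_hdl->{wbuf}) <= $wbuf_limit;

        # make sure a peer close is still noticed while on_read is
        # removed, otherwise already buffered data may never be handled
        $read_hdl->on_eof(sub {
            my ($hdl) = @_;
            # drain whatever is still buffered before tearing down
            $on_read_cb->($hdl) if length($hdl->{rbuf});
            # ... tear down both sides of the proxy here ...
        });

        $read_hdl->on_read();    # pause reading from the source

        my $prev_on_drain = $write_hdl->{on_drain};
        $write_hdl->on_drain(sub {
            my ($wrhdl) = @_;
            $read_hdl->on_read($on_read_cb) if $reqstate->{$alive_key};
            $read_hdl->on_eof(undef);    # back to normal operation
            if ($prev_on_drain) {
                $wrhdl->on_drain($prev_on_drain);
                $prev_on_drain->($wrhdl);
            }
        });
        return 1;
    };

note that for $reqstate->{proxyhdl} this would clobber the on_eof set in
the constructor, so the real thing would need to save and restore it like
on_drain above.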