public inbox for pve-devel@lists.proxmox.com
* [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin
@ 2026-04-12 11:12 Kefu Chai
  2026-04-12 11:12 ` [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy Kefu Chai
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Kefu Chai @ 2026-04-12 11:12 UTC (permalink / raw)
  To: pve-devel

pveproxy on the destination host can be OOM-killed during PDM offline
cross-cluster migration when the target storage (e.g. LVM-thin) writes more
slowly than data arrives over the network [1].

the websocket proxy already had wbuf_max set on the backend handle, but it
turns out not to help. AnyEvent::Handle only checks it inside the
`if (!$self->{_ww})` guard in _drain_wbuf:

    if (!$self->{_ww} && length $self->{wbuf}) {
        ...
        if (defined $self->{wbuf_max} && $self->{wbuf_max} < ...) {
            $self->_error(Errno::ENOSPC, 1);
        }
    }

once the first EAGAIN installs the write watcher (_ww), all subsequent
push_write calls return immediately without reaching the check, so wbuf
grows without bound.

to fix this, follow the same approach response_stream() already takes: stop
reading from the source handle when the backend write buffer exceeds the
limit, and resume via on_drain once it empties.

handle_spice_proxy_request() has the same issue and even carries a
"# todo: use stop_read/start_read" comment acknowledging it, but it is not
addressed here, as SPICE carries interactive VM console traffic rather than
bulk data.

tested with a synthetic AnyEvent script that drives a fast writer through a
proxy into a slow reader. without backpressure, the proxy write buffer grows
to ~1.4 GB in 5 seconds (2254x the limit). with the fix, it stays bounded at
just over the 640 KB limit, as expected.

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7483
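for illustration, the fast-writer/slow-reader dynamic can be sketched as a
small simulation. this is not the actual AnyEvent test script (which is not
part of this series); it is a Python sketch whose chunk, drain, and tick
parameters are made-up assumptions, with only the 640 KB limit taken from
the patch:

```python
# Hypothetical model of the proxy loop: a fast source pushes 64 KiB
# frames toward a sink that drains only 16 KiB per tick. Without
# backpressure the write buffer grows without bound; with it, the
# proxy stops reading once the buffer exceeds the limit and re-arms
# the read watcher when the buffer empties (the on_drain event).

CHUNK = 64 * 1024    # bytes read from the fast source per tick (assumed)
DRAIN = 16 * 1024    # bytes the slow sink accepts per tick (assumed)
LIMIT = 640 * 1024   # the wbuf limit from the cover letter

def run(ticks, backpressure):
    wbuf = 0          # bytes queued toward the slow sink
    reading = True    # whether the source read watcher is installed
    peak = 0
    for _ in range(ticks):
        if reading:
            wbuf += CHUNK                # push_write() of the next frame
            peak = max(peak, wbuf)
            if backpressure and wbuf > LIMIT:
                reading = False          # stop reading from the source
        wbuf = max(0, wbuf - DRAIN)      # sink drains a little
        if wbuf == 0:
            reading = True               # buffer empty: resume reading
    return peak

print(run(1000, backpressure=False))     # ~47 MiB, growing with ticks
print(run(1000, backpressure=True))      # 688 KiB, just over the limit
```

the bounded case peaks at the limit plus one frame, matching the "stays
bounded at just over the 640 KB limit" behavior described above.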

Kefu Chai (1):
  fix #7483: apiserver: add backpressure to websocket proxy

 src/PVE/APIServer/AnyEvent.pm | 45 +++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 7 deletions(-)

-- 
2.47.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy
  2026-04-12 11:12 [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin Kefu Chai
@ 2026-04-12 11:12 ` Kefu Chai
  2026-04-13  7:38   ` Fabian Grünbichler
  2026-04-13  7:47 ` [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin Fabian Grünbichler
  2026-04-13 13:00 ` superseded: " Kefu Chai
  2 siblings, 1 reply; 5+ messages in thread
From: Kefu Chai @ 2026-04-12 11:12 UTC (permalink / raw)
  To: pve-devel

During PDM cross-cluster migration to LVM-thin storage, pveproxy can
be OOM-killed on the destination host when disk writes are slower than
the incoming network transfer rate.

The existing wbuf_max on the backend handle turns out not to help:
AnyEvent::Handle only checks it inside the `if (!$self->{_ww})` guard
in _drain_wbuf. Once the first EAGAIN installs a write watcher, all
subsequent push_write calls return immediately without ever reaching
the check, so wbuf grows without bound.

Instead, follow the same approach response_stream() already takes:
pause reading from the source handle when the backend write buffer
exceeds the limit, and resume via an on_drain callback once it empties.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=7483
---
 src/PVE/APIServer/AnyEvent.pm | 45 +++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
index 915d678..5a4b449 100644
--- a/src/PVE/APIServer/AnyEvent.pm
+++ b/src/PVE/APIServer/AnyEvent.pm
@@ -581,10 +581,11 @@ sub websocket_proxy {
 
             $self->dprint("CONNECTed to '$remhost:$remport'");
 
+            my $wbuf_limit = $max_payload_size * 5;
+
             $reqstate->{proxyhdl} = AnyEvent::Handle->new(
                 fh => $fh,
                 rbuf_max => $max_payload_size,
-                wbuf_max => $max_payload_size * 5,
                 timeout => 5,
                 on_eof => sub {
                     my ($hdl) = @_;
@@ -604,7 +605,30 @@ sub websocket_proxy {
                 },
             );
 
-            my $proxyhdlreader = sub {
+            # Stop reading from $read_hdl until $write_hdl drains its write
+            # buffer, then re-register $on_read_cb. Returns true if
+            # backpressure was applied. We cannot rely on AnyEvent::Handle's
+            # wbuf_max for this because its check in _drain_wbuf is skipped
+            # when a write watcher is already active.
+            my $apply_backpressure = sub {
+                my ($read_hdl, $write_hdl, $on_read_cb, $alive_key) = @_;
+                return if length($write_hdl->{wbuf}) <= $wbuf_limit;
+
+                $read_hdl->on_read();
+                my $prev_on_drain = $write_hdl->{on_drain};
+                $write_hdl->on_drain(sub {
+                    my ($wrhdl) = @_;
+                    $read_hdl->on_read($on_read_cb) if $reqstate->{$alive_key};
+                    if ($prev_on_drain) {
+                        $wrhdl->on_drain($prev_on_drain);
+                        $prev_on_drain->($wrhdl);
+                    }
+                });
+                return 1;
+            };
+
+            my $proxyhdlreader;
+            $proxyhdlreader = sub {
                 my ($hdl) = @_;
 
                 my $len = length($hdl->{rbuf});
@@ -614,10 +638,15 @@ sub websocket_proxy {
 
                 my $string = $encode->(\$data);
 
-                $reqstate->{hdl}->push_write($string) if $reqstate->{hdl};
+                my $clienthdl = $reqstate->{hdl};
+                return if !$clienthdl;
+
+                $clienthdl->push_write($string);
+                $apply_backpressure->($hdl, $clienthdl, $proxyhdlreader, 'proxyhdl');
             };
 
-            my $hdlreader = sub {
+            my $hdlreader;
+            $hdlreader = sub {
                 my ($hdl) = @_;
 
                 while (my $len = length($hdl->{rbuf})) {
@@ -672,7 +701,11 @@ sub websocket_proxy {
                     }
 
                     if ($opcode == 1 || $opcode == 2) {
-                        $reqstate->{proxyhdl}->push_write($payload) if $reqstate->{proxyhdl};
+                        my $proxyhdl = $reqstate->{proxyhdl};
+                        if ($proxyhdl) {
+                            $proxyhdl->push_write($payload);
+                            return if $apply_backpressure->($hdl, $proxyhdl, $hdlreader, 'hdl');
+                        }
                     } elsif ($opcode == 8) {
                         my $statuscode = unpack("n", $payload);
                         $self->dprint("websocket received close. status code: '$statuscode'");
@@ -700,8 +733,6 @@ sub websocket_proxy {
             $reqstate->{proxyhdl}->on_read($proxyhdlreader);
             $reqstate->{hdl}->on_read($hdlreader);
 
-            # todo: use stop_read/start_read if write buffer grows to much
-
             # FIXME: remove protocol in PVE/PMG 8.x
             #
             # for backwards, compatibility,  we have to reply with the websocket
-- 
2.47.3

* Re: [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy
  2026-04-12 11:12 ` [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy Kefu Chai
@ 2026-04-13  7:38   ` Fabian Grünbichler
  0 siblings, 0 replies; 5+ messages in thread
From: Fabian Grünbichler @ 2026-04-13  7:38 UTC (permalink / raw)
  To: Kefu Chai, pve-devel

Quoting Kefu Chai (2026-04-12 13:12:09)
> During PDM cross-cluster migration to LVM-thin storage, pveproxy can
> be OOM-killed on the destination host when disk writes are slower than
> the incoming network transfer rate.

I don't think this is limited to that particular combination, in principle all
remote migrations are affected. Might just be easier to trigger with LVM thin
because of its zero-on-alloc behaviour?

Might make sense to adapt the commit message either when applying or sending a
v2.

> The existing wbuf_max on the backend handle turns out not to help:
> AnyEvent::Handle only checks it inside the `if (!$self->{_ww})` guard
> in _drain_wbuf. Once the first EAGAIN installs a write watcher, all
> subsequent push_write calls return immediately without ever reaching
> the check, so wbuf grows without bound.

nice find! though it's a good thing it doesn't I guess, since AFAICT it would
trigger a fatal error on the connection instead of applying back pressure?

> Instead, follow the same approach response_stream() already takes:
> pause reading from the source handle when the backend write buffer
> exceeds the limit, and resume via an on_drain callback once it empties.

I guess there isn't enough overlap to make this a shared helper? the variant
there does handle EOF correctly ;)

> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=7483
> ---
>  src/PVE/APIServer/AnyEvent.pm | 45 +++++++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 7 deletions(-)
> 
> diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
> index 915d678..5a4b449 100644
> --- a/src/PVE/APIServer/AnyEvent.pm
> +++ b/src/PVE/APIServer/AnyEvent.pm
> @@ -581,10 +581,11 @@ sub websocket_proxy {
>  
>              $self->dprint("CONNECTed to '$remhost:$remport'");
>  
> +            my $wbuf_limit = $max_payload_size * 5;
> +
>              $reqstate->{proxyhdl} = AnyEvent::Handle->new(
>                  fh => $fh,
>                  rbuf_max => $max_payload_size,
> -                wbuf_max => $max_payload_size * 5,
>                  timeout => 5,
>                  on_eof => sub {
>                      my ($hdl) = @_;
> @@ -604,7 +605,30 @@ sub websocket_proxy {
>                  },
>              );
>  
> -            my $proxyhdlreader = sub {
> +            # Stop reading from $read_hdl until $write_hdl drains its write
> +            # buffer, then re-register $on_read_cb. Returns true if
> +            # backpressure was applied. We cannot rely on AnyEvent::Handle's
> +            # wbuf_max for this because its check in _drain_wbuf is skipped
> +            # when a write watcher is already active.
> +            my $apply_backpressure = sub {
> +                my ($read_hdl, $write_hdl, $on_read_cb, $alive_key) = @_;
> +                return if length($write_hdl->{wbuf}) <= $wbuf_limit;
> +
> +                $read_hdl->on_read();

AFAICT (but I haven't tried to actually trigger this) this is not handling EOF
conditions correctly now?

if you remove the on_read callback here without having an on_eof callback set,
then nothing will read and handle potentially buffered data.. the way
websockets work that shouldn't be a problem IIRC, except when a client is not
waiting for the ack when closing the socket? still probably doesn't hurt to try
to handle it?

> +                my $prev_on_drain = $write_hdl->{on_drain};
> +                $write_hdl->on_drain(sub {
> +                    my ($wrhdl) = @_;
> +                    $read_hdl->on_read($on_read_cb) if $reqstate->{$alive_key};
> +                    if ($prev_on_drain) {
> +                        $wrhdl->on_drain($prev_on_drain);
> +                        $prev_on_drain->($wrhdl);
> +                    }
> +                });
> +                return 1;
> +            };
> +
> +            my $proxyhdlreader;
> +            $proxyhdlreader = sub {
>                  my ($hdl) = @_;
>  
>                  my $len = length($hdl->{rbuf});
> @@ -614,10 +638,15 @@ sub websocket_proxy {
>  
>                  my $string = $encode->(\$data);
>  
> -                $reqstate->{hdl}->push_write($string) if $reqstate->{hdl};
> +                my $clienthdl = $reqstate->{hdl};
> +                return if !$clienthdl;
> +
> +                $clienthdl->push_write($string);
> +                $apply_backpressure->($hdl, $clienthdl, $proxyhdlreader, 'proxyhdl');
>              };
>  
> -            my $hdlreader = sub {
> +            my $hdlreader;
> +            $hdlreader = sub {
>                  my ($hdl) = @_;
>  
>                  while (my $len = length($hdl->{rbuf})) {
> @@ -672,7 +701,11 @@ sub websocket_proxy {
>                      }
>  
>                      if ($opcode == 1 || $opcode == 2) {
> -                        $reqstate->{proxyhdl}->push_write($payload) if $reqstate->{proxyhdl};
> +                        my $proxyhdl = $reqstate->{proxyhdl};
> +                        if ($proxyhdl) {
> +                            $proxyhdl->push_write($payload);
> +                            return if $apply_backpressure->($hdl, $proxyhdl, $hdlreader, 'hdl');
> +                        }

doesn't this also mean that the server stops responding to pings with pongs if
we apply backpressure.. though I don't see a way to avoid that except for
adding custom control frames that apply the pressure client-side.. which
probably causes more harm than good.

>                      } elsif ($opcode == 8) {
>                          my $statuscode = unpack("n", $payload);
>                          $self->dprint("websocket received close. status code: '$statuscode'");
> @@ -700,8 +733,6 @@ sub websocket_proxy {
>              $reqstate->{proxyhdl}->on_read($proxyhdlreader);
>              $reqstate->{hdl}->on_read($hdlreader);
>  
> -            # todo: use stop_read/start_read if write buffer grows to much
> -
>              # FIXME: remove protocol in PVE/PMG 8.x
>              #
>              # for backwards, compatibility,  we have to reply with the websocket
> -- 
> 2.47.3

* Re: [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin
  2026-04-12 11:12 [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin Kefu Chai
  2026-04-12 11:12 ` [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy Kefu Chai
@ 2026-04-13  7:47 ` Fabian Grünbichler
  2026-04-13 13:00 ` superseded: " Kefu Chai
  2 siblings, 0 replies; 5+ messages in thread
From: Fabian Grünbichler @ 2026-04-13  7:47 UTC (permalink / raw)
  To: Kefu Chai, pve-devel

Quoting Kefu Chai (2026-04-12 13:12:08)
> pveproxy on the destination host can be OOM-killed during PDM offline
> cross-cluster migration when the target storage (e.g. LVM-thin) writes more
> slowly than data arrives over the network [1].
> 
> the websocket proxy already had wbuf_max set on the backend handle, but it
> turns out not to help. AnyEvent::Handle only checks it inside the
> `if (!$self->{_ww})` guard in _drain_wbuf:
> 
>     if (!$self->{_ww} && length $self->{wbuf}) {
>         ...
>         if (defined $self->{wbuf_max} && $self->{wbuf_max} < ...) {
>             $self->_error(Errno::ENOSPC, 1);
>         }
>     }
> 
> once the first EAGAIN installs the write watcher (_ww), all subsequent
> push_write calls return immediately without reaching the check, so wbuf
> grows without bound.
> 
> to fix this, follow the same approach response_stream() already takes: stop
> reading from the source handle when the backend write buffer exceeds the
> limit, and resume via on_drain once it empties.
> 
> handle_spice_proxy_request() has the same issue and even carries a
> "# todo: use stop_read/start_read" comment acknowledging it, but it is not
> addressed here, as SPICE carries interactive VM console traffic rather than
> bulk data.

note that in regular PVE (and also PDM), the websocket proxy is used for
proxying console sessions for nodes and guests. so in principle, the same
applies for both - but it is probably harder to trigger in practice via such
connections ;)

spice does potentially handle bulk streams of data as well though - e.g., it
supports passing through USB devices from the client to the spice session..

> tested with a synthetic AnyEvent script that drives a fast writer through a
> proxy into a slow reader. without backpressure, the proxy write buffer grows
> to ~1.4 GB in 5 seconds (2254x the limit). with the fix, it stays bounded at
> just over the 640 KB limit, as expected.
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7483
> 
> Kefu Chai (1):
>   fix #7483: apiserver: add backpressure to websocket proxy
> 
>  src/PVE/APIServer/AnyEvent.pm | 45 +++++++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 7 deletions(-)
> 
> -- 
> 2.47.3

* superseded: [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin
  2026-04-12 11:12 [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin Kefu Chai
  2026-04-12 11:12 ` [PATCH http-server 1/1] fix #7483: apiserver: add backpressure to websocket proxy Kefu Chai
  2026-04-13  7:47 ` [PATCH http-server 0/1] fix pveproxy OOM during PDM cross-cluster migration to LVM-thin Fabian Grünbichler
@ 2026-04-13 13:00 ` Kefu Chai
  2 siblings, 0 replies; 5+ messages in thread
From: Kefu Chai @ 2026-04-13 13:00 UTC (permalink / raw)
  To: pve-devel

superseded by https://lore.proxmox.com/pve-devel/20260413125650.2569621-1-k.chai@proxmox.com/

end of thread, other threads:[~2026-04-13 13:00 UTC | newest]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal