public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections
@ 2020-12-03 18:43 Stoiko Ivanov
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw)
  To: pve-devel

This patchset is the result of investigating a report in our community forum:
https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/

The first patch fixes an issue where pveproxy worker processes would never
exit (and eat quite a bit of ram+cpu) when 'getpeername' returned an error.

The second seemed to me like a sensible further cleanup, and the third patch
will hopefully provide the needed information when debugging such things in
the future.

Huge thanks to Dominik, who analyzed this issue with me!

Stoiko Ivanov (3):
  accept-phase: fix conn_count "leak"
  accept-phase: shutdown socket on early error
  add debug log for problems during accept

 PVE/APIServer/AnyEvent.pm | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

-- 
2.20.1





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak"
  2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov
@ 2020-12-03 18:43 ` Stoiko Ivanov
  2020-12-04  7:41   ` Thomas Lamprecht
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error Stoiko Ivanov
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov
  2 siblings, 1 reply; 6+ messages in thread
From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw)
  To: pve-devel

When handling new connections in 'accept_connections' the number of
active connections got increased before the AnyEvent::Handle
registered the callback which would decrement it on error.

Any error/die beforehand would skip the decrement, and leave the
process in an endless loop upon exiting in wait_end_loop.

This can happen e.g. when the call to getpeername fails, or if the
connection is denied by the ALLOW_FROM/DENY_FROM settings in
'/etc/default/pveproxy' (which is also the simplest reproducer for
that).

Additionally it can cause a denial of service, by attempting to
connect from a denied ip until the connection count exeeds the maximum
connections of all child-processes.

Reported via our community-forum:
https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/

Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 PVE/APIServer/AnyEvent.pm | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm
index c55da7f..c5f5fdc 100644
--- a/PVE/APIServer/AnyEvent.pm
+++ b/PVE/APIServer/AnyEvent.pm
@@ -1479,8 +1479,6 @@ sub accept {
 
     fh_nonblocking $clientfh, 1;
 
-    $self->{conn_count}++;
-
     return $clientfh;
 }
 
@@ -1561,7 +1559,7 @@ sub accept_connections {
 	    my $reqstate = { keep_alive => $self->{keep_alive} };
 
 	    # stop keep-alive when there are many open connections
-	    if ($self->{conn_count} >= $self->{max_conn_soft_limit}) {
+	    if ($self->{conn_count}+1 >= $self->{max_conn_soft_limit}) {
 		$reqstate->{keep_alive} = 0;
 	    }
 
@@ -1600,6 +1598,9 @@ sub accept_connections {
 		},
 		($self->{tls_ctx} ? (tls => "accept", tls_ctx => $self->{tls_ctx}) : ()));
 
+	    $self->{conn_count}++;
+
+
 	    print "$$: ACCEPT FH" .  $clientfh->fileno() . " CONN$self->{conn_count}\n" if $self->{debug};
 
 	    $self->push_request_header($reqstate);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error
  2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov
@ 2020-12-03 18:43 ` Stoiko Ivanov
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov
  2 siblings, 0 replies; 6+ messages in thread
From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw)
  To: pve-devel

if an error happens before AnyEvent::Handle registers the cleanup
callback, we should shutdown the socket, when handling it.

Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 PVE/APIServer/AnyEvent.pm | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm
index c5f5fdc..7038b07 100644
--- a/PVE/APIServer/AnyEvent.pm
+++ b/PVE/APIServer/AnyEvent.pm
@@ -1552,9 +1552,13 @@ sub check_host_access {
 sub accept_connections {
     my ($self) = @_;
 
+    my ($clientfh, $early_err);
     eval {
 
-	while (my $clientfh = $self->accept()) {
+	while (1) {
+	    $early_err = 1;
+	    $clientfh = $self->accept();
+	    last if !$clientfh;
 
 	    my $reqstate = { keep_alive => $self->{keep_alive} };
 
@@ -1566,15 +1570,21 @@ sub accept_connections {
 	    if (my $sin = getpeername($clientfh)) {
 		my ($pfamily, $pport, $phost) = PVE::Tools::unpack_sockaddr_in46($sin);
 		($reqstate->{peer_port}, $reqstate->{peer_host}) = ($pport,  Socket::inet_ntop($pfamily, $phost));
+	    } else {
+		shutdown($clientfh, 1);
+		next;
 	    }
 
 	    if (!$self->{trusted_env} && !$self->check_host_access($reqstate->{peer_host})) {
 		print "$$: ABORT request from $reqstate->{peer_host} - access denied\n" if $self->{debug};
 		$reqstate->{log}->{code} = 403;
 		$self->log_request($reqstate);
+		shutdown($clientfh, 1);
 		next;
 	    }
 
+	    $early_err = 0;
+
 	    $reqstate->{hdl} = AnyEvent::Handle->new(
 		fh => $clientfh,
 		rbuf_max => 64*1024,
@@ -1609,6 +1619,7 @@ sub accept_connections {
 
     if (my $err = $@) {
 	syslog('err', $err);
+	shutdown($clientfh, 1) if $early_err;
 	$self->{end_loop} = 1;
     }
 
-- 
2.20.1





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept
  2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error Stoiko Ivanov
@ 2020-12-03 18:43 ` Stoiko Ivanov
  2020-12-04  6:37   ` Thomas Lamprecht
  2 siblings, 1 reply; 6+ messages in thread
From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw)
  To: pve-devel

Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 PVE/APIServer/AnyEvent.pm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm
index 7038b07..d33f6b0 100644
--- a/PVE/APIServer/AnyEvent.pm
+++ b/PVE/APIServer/AnyEvent.pm
@@ -1520,6 +1520,11 @@ sub check_host_access {
 
     my $cip = Net::IP->new($clientip);
 
+    if (!$cip) {
+	print "$$: check_host_access: clientip not parsable: $@\n" if $self->{debug};
+	return 0;
+    }
+
     my $match_allow = 0;
     my $match_deny = 0;
 
@@ -1527,6 +1532,7 @@ sub check_host_access {
 	foreach my $t (@{$self->{allow_from}}) {
 	    if ($t->overlaps($cip)) {
 		$match_allow = 1;
+		print "$$: check_host_access: clientip allowed: ". $t->prefix() . "\n" if $self->{debug};
 		last;
 	    }
 	}
@@ -1535,6 +1541,7 @@ sub check_host_access {
     if ($self->{deny_from}) {
 	foreach my $t (@{$self->{deny_from}}) {
 	    if ($t->overlaps($cip)) {
+		print "$$: check_host_access: clientip denied: ". $t->prefix() . "\n" if $self->{debug};
 		$match_deny = 1;
 		last;
 	    }
@@ -1571,6 +1578,7 @@ sub accept_connections {
 		my ($pfamily, $pport, $phost) = PVE::Tools::unpack_sockaddr_in46($sin);
 		($reqstate->{peer_port}, $reqstate->{peer_host}) = ($pport,  Socket::inet_ntop($pfamily, $phost));
 	    } else {
+		print "$$: ACCEPT connection: getpeername failed: $!\n" if $self->{debug};
 		shutdown($clientfh, 1);
 		next;
 	    }
@@ -1619,6 +1627,7 @@ sub accept_connections {
 
     if (my $err = $@) {
 	syslog('err', $err);
+	print "$$: ACCEPT connection error: $err\n" if $self->{debug};
 	shutdown($clientfh, 1) if $early_err;
 	$self->{end_loop} = 1;
     }
-- 
2.20.1





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov
@ 2020-12-04  6:37   ` Thomas Lamprecht
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2020-12-04  6:37 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stoiko Ivanov

On 03.12.20 19:43, Stoiko Ivanov wrote:
> Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>  PVE/APIServer/AnyEvent.pm | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm
> index 7038b07..d33f6b0 100644
> --- a/PVE/APIServer/AnyEvent.pm
> +++ b/PVE/APIServer/AnyEvent.pm
> @@ -1520,6 +1520,11 @@ sub check_host_access {
>  
>      my $cip = Net::IP->new($clientip);
>  
> +    if (!$cip) {
> +	print "$$: check_host_access: clientip not parsable: $@\n" if $self->{debug};

I'd avoid using variable names 1:1 in log messages for such things, use "client IP"
(for not hard coding sub name see below)

> +	return 0;
> +    }
> +
>      my $match_allow = 0;
>      my $match_deny = 0;
>  
> @@ -1527,6 +1532,7 @@ sub check_host_access {
>  	foreach my $t (@{$self->{allow_from}}) {
>  	    if ($t->overlaps($cip)) {
>  		$match_allow = 1;
> +		print "$$: check_host_access: clientip allowed: ". $t->prefix() . "\n" if $self->{debug};
>  		last;
>  	    }
>  	}
> @@ -1535,6 +1541,7 @@ sub check_host_access {
>      if ($self->{deny_from}) {
>  	foreach my $t (@{$self->{deny_from}}) {
>  	    if ($t->overlaps($cip)) {
> +		print "$$: check_host_access: clientip denied: ". $t->prefix() . "\n" if $self->{debug};
>  		$match_deny = 1;
>  		last;
>  	    }
> @@ -1571,6 +1578,7 @@ sub accept_connections {
>  		my ($pfamily, $pport, $phost) = PVE::Tools::unpack_sockaddr_in46($sin);
>  		($reqstate->{peer_port}, $reqstate->{peer_host}) = ($pport,  Socket::inet_ntop($pfamily, $phost));
>  	    } else {
> +		print "$$: ACCEPT connection: getpeername failed: $!\n" if $self->{debug};
>  		shutdown($clientfh, 1);
>  		next;
>  	    }
> @@ -1619,6 +1627,7 @@ sub accept_connections {
>  
>      if (my $err = $@) {
>  	syslog('err', $err);
> +	print "$$: ACCEPT connection error: $err\n" if $self->{debug};
>  	shutdown($clientfh, 1) if $early_err;
>  	$self->{end_loop} = 1;
>      }
> 

can we use a helper method for the printing? Something along:

my sub dprintln {
    my ($line) = @_;
    return if !$self->{debug};
    print "worker[$$]: $line\n";
}

could maybe drop the worker prefix, but I'd say that all processes accepting connections
can be classified as workers..

If you want to get real fancy (I mean, for perl ;) you could use caller[0] to get some call
context to print.

    my ($pkg, $pkgfile, $line, $sub) = caller(1);

    print "worker[$$]: $sub: $line\n";

or even

    print "worker[$$]: $pkg +$line: $sub: $line\n";

We could later put such a helper in PVE::Tools and add to EXPORT_OK, could be helpful,
may want to add some bells n' whistels (control flags?) then - but that's out of scope
for this series.

[0]: https://perldoc.perl.org/functions/caller





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak"
  2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov
@ 2020-12-04  7:41   ` Thomas Lamprecht
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2020-12-04  7:41 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stoiko Ivanov

On 03.12.20 19:43, Stoiko Ivanov wrote:
> When handling new connections in 'accept_connections' the number of
> active connections got increased before the AnyEvent::Handle
> registered the callback which would decrement it on error.
> 
> Any error/die beforehand would skip the decrement, and leave the
> process in an endless loop upon exiting in wait_end_loop.
> 
> This can happen e.g. when the call to getpeername fails, or if the
> connection is denied by the ALLOW_FROM/DENY_FROM settings in
> '/etc/default/pveproxy' (which is also the simplest reproducer for
> that).
> 
> Additionally it can cause a denial of service, by attempting to
> connect from a denied ip until the connection count exeeds the maximum
> connections of all child-processes.
> 
> Reported via our community-forum:
> https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/
> 
> Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>  PVE/APIServer/AnyEvent.pm | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm
> index c55da7f..c5f5fdc 100644
> --- a/PVE/APIServer/AnyEvent.pm
> +++ b/PVE/APIServer/AnyEvent.pm
> @@ -1479,8 +1479,6 @@ sub accept {
>  
>      fh_nonblocking $clientfh, 1;
>  
> -    $self->{conn_count}++;
> -
>      return $clientfh;
>  }
>  
> @@ -1561,7 +1559,7 @@ sub accept_connections {
>  	    my $reqstate = { keep_alive => $self->{keep_alive} };
>  
>  	    # stop keep-alive when there are many open connections
> -	    if ($self->{conn_count} >= $self->{max_conn_soft_limit}) {
> +	    if ($self->{conn_count}+1 >= $self->{max_conn_soft_limit}) {

style nit: don't glue operators together `self->{conn_count} + 1`

>  		$reqstate->{keep_alive} = 0;
>  	    }
>  
> @@ -1600,6 +1598,9 @@ sub accept_connections {
>  		},
>  		($self->{tls_ctx} ? (tls => "accept", tls_ctx => $self->{tls_ctx}) : ()));
>  
> +	    $self->{conn_count}++;
> +

But isn't this wrong too? The FH could already get a EOF here, and thus get reduced
before increased - one could maybe argue "well it should get increased again after,
here, so brought in sync again", i.e.:

1. Get's registered
                                            2. clientfh EOF
                                                -> $self->client_do_disconnect($reqstate);
                                                -> $self->{conn_count}--;

! Wrong counter here, could lead to possible wrong decisions now already (not checked
for sure) or when adding/changing something (as this is non-obvious, not even a comment
hinting it!)

3. resume here, brought in sync again, reminds me of a short comic strip I recently
   run into [0].

So between 2. and 3. we are in limbo, while short it still matters, every race triggers
sooner or later, computers are just to fast and scheduling to nondeterministic for that
to not happen.

Why not move the $self->{conn_count}++; before AnyEvent Handle instance is created,
i.e., where we do $early_err = 0; as this effectively is the barrier for the connection
being valid or not. We could also add handling for when the handle creation itself fails,
setting a flag afterwards and checking both, that flag and $early_err in the existing
error handling branch outside of the eval, and decrement in that case.


Or, do you have some documented behavior, not stated here in the commit, that this all
just cannot happen at all?

[0]: https://i.redd.it/m4zbw3u7rbk21.jpg

> +
>  	    print "$$: ACCEPT FH" .  $clientfh->fileno() . " CONN$self->{conn_count}\n" if $self->{debug};
>  
>  	    $self->push_request_header($reqstate);
> 






^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-12-04  7:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov
2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov
2020-12-04  7:41   ` Thomas Lamprecht
2020-12-03 18:43 ` [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error Stoiko Ivanov
2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov
2020-12-04  6:37   ` Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal