* [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections @ 2020-12-03 18:43 Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw) To: pve-devel This patchset is the result of investigating a report in our community forum: https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/ The first patch fixes an issue where pveproxy worker processes would never exit (and eat quite a bit of ram+cpu) when 'getpeername' returned an error. The second seemed to me like a sensible further cleanup, and the third patch will hopefully provide the needed information when debugging such things in the future. Huge thanks to Dominik, who analyzed this issue with me! Stoiko Ivanov (3): accept-phase: fix conn_count "leak" accept-phase: shutdown socket on early error add debug log for problems during accept PVE/APIServer/AnyEvent.pm | 29 +++++++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" 2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov @ 2020-12-03 18:43 ` Stoiko Ivanov 2020-12-04 7:41 ` Thomas Lamprecht 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov 2 siblings, 1 reply; 6+ messages in thread From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw) To: pve-devel When handling new connections in 'accept_connections' the number of active connections got increased before the AnyEvent::Handle registered the callback which would decrement it on error. Any error/die beforehand would skip the decrement, and leave the process in an endless loop upon exiting in wait_end_loop. This can happen e.g. when the call to getpeername fails, or if the connection is denied by the ALLOW_FROM/DENY_FROM settings in '/etc/default/pveproxy' (which is also the simplest reproducer for that). Additionally it can cause a denial of service, by attempting to connect from a denied ip until the connection count exeeds the maximum connections of all child-processes. Reported via our community-forum: https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/ Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> --- PVE/APIServer/AnyEvent.pm | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm index c55da7f..c5f5fdc 100644 --- a/PVE/APIServer/AnyEvent.pm +++ b/PVE/APIServer/AnyEvent.pm @@ -1479,8 +1479,6 @@ sub accept { fh_nonblocking $clientfh, 1; - $self->{conn_count}++; - return $clientfh; } @@ -1561,7 +1559,7 @@ sub accept_connections { my $reqstate = { keep_alive => $self->{keep_alive} }; # stop keep-alive when there are many open connections - if ($self->{conn_count} >= $self->{max_conn_soft_limit}) { + if ($self->{conn_count}+1 >= $self->{max_conn_soft_limit}) { $reqstate->{keep_alive} = 0; } @@ -1600,6 +1598,9 @@ sub accept_connections { }, ($self->{tls_ctx} ? (tls => "accept", tls_ctx => $self->{tls_ctx}) : ())); + $self->{conn_count}++; + + print "$$: ACCEPT FH" . $clientfh->fileno() . " CONN$self->{conn_count}\n" if $self->{debug}; $self->push_request_header($reqstate); -- 2.20.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov @ 2020-12-04 7:41 ` Thomas Lamprecht 0 siblings, 0 replies; 6+ messages in thread From: Thomas Lamprecht @ 2020-12-04 7:41 UTC (permalink / raw) To: Proxmox VE development discussion, Stoiko Ivanov On 03.12.20 19:43, Stoiko Ivanov wrote: > When handling new connections in 'accept_connections' the number of > active connections got increased before the AnyEvent::Handle > registered the callback which would decrement it on error. > > Any error/die beforehand would skip the decrement, and leave the > process in an endless loop upon exiting in wait_end_loop. > > This can happen e.g. when the call to getpeername fails, or if the > connection is denied by the ALLOW_FROM/DENY_FROM settings in > '/etc/default/pveproxy' (which is also the simplest reproducer for > that). > > Additionally it can cause a denial of service, by attempting to > connect from a denied ip until the connection count exeeds the maximum > connections of all child-processes. > > Reported via our community-forum: > https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/ > > Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com> > Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> > --- > PVE/APIServer/AnyEvent.pm | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm > index c55da7f..c5f5fdc 100644 > --- a/PVE/APIServer/AnyEvent.pm > +++ b/PVE/APIServer/AnyEvent.pm > @@ -1479,8 +1479,6 @@ sub accept { > > fh_nonblocking $clientfh, 1; > > - $self->{conn_count}++; > - > return $clientfh; > } > > @@ -1561,7 +1559,7 @@ sub accept_connections { > my $reqstate = { keep_alive => $self->{keep_alive} }; > > # stop keep-alive when there are many open connections > - if ($self->{conn_count} >= $self->{max_conn_soft_limit}) { > + if ($self->{conn_count}+1 >= $self->{max_conn_soft_limit}) { style nit: don't glue operators together `self->{conn_count} + 1` > $reqstate->{keep_alive} = 0; > } > > @@ -1600,6 +1598,9 @@ sub accept_connections { > }, > ($self->{tls_ctx} ? (tls => "accept", tls_ctx => $self->{tls_ctx}) : ())); > > + $self->{conn_count}++; > + But isn't this wrong too? The FH could already get a EOF here, and thus get reduced before increased - one could maybe argue "well it should get increased again after, here, so brought in sync again", i.e.: 1. Get's registered 2. clientfh EOF -> $self->client_do_disconnect($reqstate); -> $self->{conn_count}--; ! Wrong counter here, could lead to possible wrong decisions now already (not checked for sure) or when adding/changing something (as this is non-obvious, not even a comment hinting it!) 3. resume here, brought in sync again, reminds me of a short comic strip I recently run into [0]. So between 2. and 3. we are in limbo, while short it still matters, every race triggers sooner or later, computers are just to fast and scheduling to nondeterministic for that to not happen. Why not move the $self->{conn_count}++; before AnyEvent Handle instance is created, i.e., where we do $early_err = 0; as this effectively is the barrier for the connection being valid or not. We could also add handling for when the handle creation itself fails, setting a flag afterwards and checking both, that flag and $early_err in the existing error handling branch outside of the eval, and decrement in that case. Or, do you have some documented behavior, not stated here in the commit, that this all just cannot happen at all? [0]: https://i.redd.it/m4zbw3u7rbk21.jpg > + > print "$$: ACCEPT FH" . $clientfh->fileno() . " CONN$self->{conn_count}\n" if $self->{debug}; > > $self->push_request_header($reqstate); > ^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error 2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov @ 2020-12-03 18:43 ` Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov 2 siblings, 0 replies; 6+ messages in thread From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw) To: pve-devel if an error happens before AnyEvent::Handle registers the cleanup callback, we should shutdown the socket, when handling it. Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> --- PVE/APIServer/AnyEvent.pm | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm index c5f5fdc..7038b07 100644 --- a/PVE/APIServer/AnyEvent.pm +++ b/PVE/APIServer/AnyEvent.pm @@ -1552,9 +1552,13 @@ sub check_host_access { sub accept_connections { my ($self) = @_; + my ($clientfh, $early_err); eval { - while (my $clientfh = $self->accept()) { + while (1) { + $early_err = 1; + $clientfh = $self->accept(); + last if !$clientfh; my $reqstate = { keep_alive => $self->{keep_alive} }; @@ -1566,15 +1570,21 @@ sub accept_connections { if (my $sin = getpeername($clientfh)) { my ($pfamily, $pport, $phost) = PVE::Tools::unpack_sockaddr_in46($sin); ($reqstate->{peer_port}, $reqstate->{peer_host}) = ($pport, Socket::inet_ntop($pfamily, $phost)); + } else { + shutdown($clientfh, 1); + next; } if (!$self->{trusted_env} && !$self->check_host_access($reqstate->{peer_host})) { print "$$: ABORT request from $reqstate->{peer_host} - access denied\n" if $self->{debug}; $reqstate->{log}->{code} = 403; $self->log_request($reqstate); + shutdown($clientfh, 1); next; } + $early_err = 0; + $reqstate->{hdl} = AnyEvent::Handle->new( fh => $clientfh, rbuf_max => 64*1024, @@ -1609,6 +1619,7 @@ sub accept_connections { if (my $err = $@) { syslog('err', $err); + shutdown($clientfh, 1) if $early_err; $self->{end_loop} = 1; } -- 2.20.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept 2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error Stoiko Ivanov @ 2020-12-03 18:43 ` Stoiko Ivanov 2020-12-04 6:37 ` Thomas Lamprecht 2 siblings, 1 reply; 6+ messages in thread From: Stoiko Ivanov @ 2020-12-03 18:43 UTC (permalink / raw) To: pve-devel Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> --- PVE/APIServer/AnyEvent.pm | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm index 7038b07..d33f6b0 100644 --- a/PVE/APIServer/AnyEvent.pm +++ b/PVE/APIServer/AnyEvent.pm @@ -1520,6 +1520,11 @@ sub check_host_access { my $cip = Net::IP->new($clientip); + if (!$cip) { + print "$$: check_host_access: clientip not parsable: $@\n" if $self->{debug}; + return 0; + } + my $match_allow = 0; my $match_deny = 0; @@ -1527,6 +1532,7 @@ sub check_host_access { foreach my $t (@{$self->{allow_from}}) { if ($t->overlaps($cip)) { $match_allow = 1; + print "$$: check_host_access: clientip allowed: ". $t->prefix() . "\n" if $self->{debug}; last; } } @@ -1535,6 +1541,7 @@ sub check_host_access { if ($self->{deny_from}) { foreach my $t (@{$self->{deny_from}}) { if ($t->overlaps($cip)) { + print "$$: check_host_access: clientip denied: ". $t->prefix() . "\n" if $self->{debug}; $match_deny = 1; last; } @@ -1571,6 +1578,7 @@ sub accept_connections { my ($pfamily, $pport, $phost) = PVE::Tools::unpack_sockaddr_in46($sin); ($reqstate->{peer_port}, $reqstate->{peer_host}) = ($pport, Socket::inet_ntop($pfamily, $phost)); } else { + print "$$: ACCEPT connection: getpeername failed: $!\n" if $self->{debug}; shutdown($clientfh, 1); next; } @@ -1619,6 +1627,7 @@ sub accept_connections { if (my $err = $@) { syslog('err', $err); + print "$$: ACCEPT connection error: $err\n" if $self->{debug}; shutdown($clientfh, 1) if $early_err; $self->{end_loop} = 1; } -- 2.20.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov @ 2020-12-04 6:37 ` Thomas Lamprecht 0 siblings, 0 replies; 6+ messages in thread From: Thomas Lamprecht @ 2020-12-04 6:37 UTC (permalink / raw) To: Proxmox VE development discussion, Stoiko Ivanov On 03.12.20 19:43, Stoiko Ivanov wrote: > Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com> > Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> > --- > PVE/APIServer/AnyEvent.pm | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm > index 7038b07..d33f6b0 100644 > --- a/PVE/APIServer/AnyEvent.pm > +++ b/PVE/APIServer/AnyEvent.pm > @@ -1520,6 +1520,11 @@ sub check_host_access { > > my $cip = Net::IP->new($clientip); > > + if (!$cip) { > + print "$$: check_host_access: clientip not parsable: $@\n" if $self->{debug}; I'd avoid using variable names 1:1 in log messages for such things, use "client IP" (for not hard coding sub name see below) > + return 0; > + } > + > my $match_allow = 0; > my $match_deny = 0; > > @@ -1527,6 +1532,7 @@ sub check_host_access { > foreach my $t (@{$self->{allow_from}}) { > if ($t->overlaps($cip)) { > $match_allow = 1; > + print "$$: check_host_access: clientip allowed: ". $t->prefix() . "\n" if $self->{debug}; > last; > } > } > @@ -1535,6 +1541,7 @@ sub check_host_access { > if ($self->{deny_from}) { > foreach my $t (@{$self->{deny_from}}) { > if ($t->overlaps($cip)) { > + print "$$: check_host_access: clientip denied: ". $t->prefix() . "\n" if $self->{debug}; > $match_deny = 1; > last; > } > @@ -1571,6 +1578,7 @@ sub accept_connections { > my ($pfamily, $pport, $phost) = PVE::Tools::unpack_sockaddr_in46($sin); > ($reqstate->{peer_port}, $reqstate->{peer_host}) = ($pport, Socket::inet_ntop($pfamily, $phost)); > } else { > + print "$$: ACCEPT connection: getpeername failed: $!\n" if $self->{debug}; > shutdown($clientfh, 1); > next; > } > @@ -1619,6 +1627,7 @@ sub accept_connections { > > if (my $err = $@) { > syslog('err', $err); > + print "$$: ACCEPT connection error: $err\n" if $self->{debug}; > shutdown($clientfh, 1) if $early_err; > $self->{end_loop} = 1; > } > can we use a helper method for the printing? Something along: my sub dprintln { my ($line) = @_; return if !$self->{debug}; print "worker[$$]: $line\n"; } could maybe drop the worker prefix, but I'd say that all processes accepting connections can be classified as workers.. If you want to get real fancy (I mean, for perl ;) you could use caller[0] to get some call context to print. my ($pkg, $pkgfile, $line, $sub) = caller(1); print "worker[$$]: $sub: $line\n"; or even print "worker[$$]: $pkg +$line: $sub: $line\n"; We could later put such a helper in PVE::Tools and add to EXPORT_OK, could be helpful, may want to add some bells n' whistels (control flags?) then - but that's out of scope for this series. [0]: https://perldoc.perl.org/functions/caller ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-12-04 7:41 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-12-03 18:43 [pve-devel] [PATCH http-server 0/3] improve error handling in accept_connections Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak" Stoiko Ivanov 2020-12-04 7:41 ` Thomas Lamprecht 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 2/3] accept-phase: shutdown socket on early error Stoiko Ivanov 2020-12-03 18:43 ` [pve-devel] [PATCH http-server 3/3] add debug log for problems during accept Stoiko Ivanov 2020-12-04 6:37 ` Thomas Lamprecht
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox