From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 6048E69C1E
 for <pve-devel@lists.proxmox.com>; Mon,  7 Dec 2020 11:29:33 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 519BA22FCA
 for <pve-devel@lists.proxmox.com>; Mon,  7 Dec 2020 11:29:03 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 8FDB622FBE
 for <pve-devel@lists.proxmox.com>; Mon,  7 Dec 2020 11:29:01 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 52B7344E4B
 for <pve-devel@lists.proxmox.com>; Mon,  7 Dec 2020 11:29:01 +0100 (CET)
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Stoiko Ivanov <s.ivanov@proxmox.com>
References: <20201204175629.30116-1-s.ivanov@proxmox.com>
 <20201204175629.30116-3-s.ivanov@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Message-ID: <cd47afa9-4c6d-8476-02f9-f99e0765bf34@proxmox.com>
Date: Mon, 7 Dec 2020 11:28:59 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101
 Thunderbird/84.0
MIME-Version: 1.0
In-Reply-To: <20201204175629.30116-3-s.ivanov@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.070 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH http-server v2 2/5] accept-phase: fix
 conn_count "leak"
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 07 Dec 2020 10:29:33 -0000

On 04.12.20 18:56, Stoiko Ivanov wrote:
> When handling new connections in 'accept_connections' the number of
> active connections got increased before the AnyEvent::Handle
> registered the callback which would decrement it on error/eof.
>=20
> Any error/die beforehand would skip the decrement, and leave the
> process in an endless loop upon exiting in wait_end_loop.
>=20
> This can happen e.g. when the call to getpeername fails, or if the
> connection is denied by the ALLOW_FROM/DENY_FROM settings in
> '/etc/default/pveproxy' (which is also a simple reproducer for
> that).
>=20
> Additionally it can cause a denial of service, by attempting to
> connect from a denied ip until the connection count exeeds the maximum
> connections of all child-processes.
>=20
> Should the connection count become negative a warning is logged in both=

> places where we decrement it, in case of a failed AnyEvent::Handle->new=
(),
> the count is not decremented if this would happen.
>=20
> Reported via our community-forum:
> https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/
>=20
> Co-Authored-by: Dominik Csapak <d.csapak@proxmox.com>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>  PVE/APIServer/AnyEvent.pm | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>=20
> diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm
> index 7916bdd..af2fde8 100644
> --- a/PVE/APIServer/AnyEvent.pm
> +++ b/PVE/APIServer/AnyEvent.pm
> @@ -157,6 +157,11 @@ sub client_do_disconnect {
> =20
>      &$shutdown_hdl($hdl);
> =20
> +    if ($self->{conn_count} <=3D 0) {
> +	my $msg =3D "connection count <=3D 0!\n";
> +	syslog('warn', $msg);
> +	$self->dprint("warn: $msg");

see below, at the end, for a comment about doing both, debug print and sy=
slog.

> +    }
>      $self->{conn_count}--;
> =20
>      print "$$: CLOSE FH" .  $hdl->{fh}->fileno() . " CONN$self->{conn_=
count}\n" if $self->{debug};
> @@ -1489,8 +1494,6 @@ sub accept {
> =20
>      fh_nonblocking $clientfh, 1;
> =20
> -    $self->{conn_count}++;
> -
>      return $clientfh;
>  }
> =20
> @@ -1564,6 +1567,7 @@ sub check_host_access {
>  sub accept_connections {
>      my ($self) =3D @_;
> =20
> +    my $hdl_err;

can we please avoid adding to that mess of arbitrary shortened variable n=
ames, at
least for new ones?

Also, this suggest a bit like it would denote if there's an handle error,=
 but what
it actually tells is, if handle creation point got reached - naming shoul=
d not be bound
to much to one use case.

Maybe:
my $handle_creation =3D 0;

or rather, just reuse $early_err (which has similar naming issues, but is=
 pre-existing),
the $self->push_request_header is doing everything in a eval + warn only =
scope anyway.

>      eval {
> =20
>  	while (my $clientfh =3D $self->accept()) {
> @@ -1571,7 +1575,7 @@ sub accept_connections {
>  	    my $reqstate =3D { keep_alive =3D> $self->{keep_alive} };
> =20
>  	    # stop keep-alive when there are many open connections
> -	    if ($self->{conn_count} >=3D $self->{max_conn_soft_limit}) {
> +	    if ($self->{conn_count} + 1 >=3D $self->{max_conn_soft_limit}) {
>  		$reqstate->{keep_alive} =3D 0;
>  	    }
> =20
> @@ -1587,6 +1591,8 @@ sub accept_connections {
>  		next;
>  	    }
> =20
> +	    $hdl_err =3D 1;
> +	    $self->{conn_count}++;

still no comment whatsoever, that is unacceptable to me for such subtle t=
hing, especially
paired with the commit message, which still mentions that increasing the =
count before
registering the callback handle (which we still do, for good reasons), ma=
kes it
confusing.

Please note that we explicitly increment here, not after calling ->new, b=
ecause creating the
handle below starts off reading from the fh immediately and can trigger a=
t least an on_error
(and possible other) callback.

>  	    $reqstate->{hdl} =3D AnyEvent::Handle->new(
>  		fh =3D> $clientfh,
>  		rbuf_max =3D> 64*1024,
> @@ -1609,6 +1615,7 @@ sub accept_connections {
>  		    if (my $err =3D $@) { syslog('err', "$err"); }
>  		},
>  		($self->{tls_ctx} ? (tls =3D> "accept", tls_ctx =3D> $self->{tls_ctx=
}) : ()));
> +	    $hdl_err =3D 0;
> =20
>  	    print "$$: ACCEPT FH" .  $clientfh->fileno() . " CONN$self->{conn=
_count}\n" if $self->{debug};
> =20
> @@ -1618,6 +1625,15 @@ sub accept_connections {
> =20
>      if (my $err =3D $@) {
>  	syslog('err', $err);
> +	if ($hdl_err) {
> +	    if ($self->{conn_count} <=3D 0) {
> +		my $msg =3D "connection count <=3D 0 not decrementing!\n";
> +		syslog('warn', $msg);
> +		$self->dprint("warn: $msg");

If we already log to syslog, isn't that enough to have it visible?

Or could a single `warn` call do the "correct thing", i.e., stderr so to
syslog if running as systemd service and to console if running in foregro=
und?

Not too hard feelings here though.

> +	    } else {
> +		$self->{conn_count}--;
> +	    }
> +	}
>  	$self->{end_loop} =3D 1;
>      }
> =20
>=20