From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 655829521D for ; Tue, 28 Feb 2023 15:06:00 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 481867AFB for ; Tue, 28 Feb 2023 15:06:00 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Tue, 28 Feb 2023 15:05:57 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id A896948B7B for ; Tue, 28 Feb 2023 15:05:57 +0100 (CET) Date: Tue, 28 Feb 2023 15:05:48 +0100 From: Fabian =?iso-8859-1?q?Gr=FCnbichler?= To: Proxmox VE development discussion References: <20230217152517.84874-1-m.carrara@proxmox.com> <20230217152517.84874-2-m.carrara@proxmox.com> In-Reply-To: <20230217152517.84874-2-m.carrara@proxmox.com> MIME-Version: 1.0 User-Agent: astroid/0.16.0 (https://github.com/astroidmail/astroid) Message-Id: <1677587548.c2g5w4ayp6.astroid@yuna.none> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-SPAM-LEVEL: Spam detection results: 0 AWL 0.125 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [proxmox.com, anyevent.pm] Subject: Re: [pve-devel] [PATCH http-server 1/2] http-server: anyevent: add new subroutine for processing requests X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Feb 2023 14:06:00 -0000 On February 17, 2023 4:25 pm, Max Carrara wrote: > The `begin_process_next_request` subroutine and its auxiliary > subroutines are added to serve as a replacement for > `push_request_header` and `unshift_read_header` in order to simplify > the parsing logic. The entire header is read by a single callback > in `begin_process_next_request` and then processed step-by-step > instead of reading each line and handling it separately. > This is done in order to separate reading the incoming header from > the processing algorithm's logic while simultaneously making the > steps used to process the header more transparent and explicit. >=20 > This comes with one noteworthy change in behaviour: > Because the header is now read all at once from the input buffer, > the header field count limit is only checked for after the entire > header was already read. In `unshift_read_header` the limit is > checked for each line that is read. >=20 > The behaviour otherwise remains fully identical. >=20 > Signed-off-by: Max Carrara Like Thomas already remarked - splitting these two patches doesn't make muc= h sense, since it's still needed to look at the total diff across both to see= what has changed.. I started going through the changes below and noticed some stuff, but quick= ly came to the conclusion that we actually gain very little from switching to parsing the full read buffer in one go instead of by line - in the end, the actual parsing still happens by line - unshift_read_header already contains the state machine handling all of th= e steps except for parsing the very first line (and the state is rather trivial/serial anyway) - adding 'no TLS, respond with redirect' logic there should be a few lines = of code OTOH, switching to non-line-based parsing here requires careful analysis of interactions with AnyEvent internals and buffer sizes with potential for su= btle breakage (e.g., how does this fare under load). it would still be possible to factor out the post-processing and authentication/handling part, but that doesn't require turning the header parsing onto its head.. unshift_read_header could simply call a post_proces= sing and a authenticate_and_handle_request sub instead of in-lining all that.. > --- > src/PVE/APIServer/AnyEvent.pm | 492 ++++++++++++++++++++++++++++++++++ > 1 file changed, 492 insertions(+) >=20 > Note: The subroutines in `@process_steps` of > `begin_process_next_request` mimic the steps the original algorithm > takes. If necessary, these can of course be split up into smaller > steps. >=20 > diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.p= m > index 3cd77fa..bf7957a 100644 > --- a/src/PVE/APIServer/AnyEvent.pm > +++ b/src/PVE/APIServer/AnyEvent.pm > @@ -1570,6 +1570,498 @@ sub push_request_header { > warn $@ if $@; > } > =20 > +sub begin_process_next_request { > + my ($self, $reqstate) =3D @_; > + > + # Match for *end* of HTTP header fields > + my $re_accept =3D qr<\r?\n\r?\n>; > + > + # Don't reject anything - rejection causes EBADMSG otherwise, aborti= ng the callback > + # and might match for unexpected things (such as TLS handshake data!= ) > + my $re_reject =3D undef; > + > + # Skip over all header fields, but don't match for CRs and NLs > + # as those are required for re_accept above > + my $re_skip =3D qr<^.*[^\r\n]>; > + > + eval { > + $reqstate->{hdl}->push_read(regex =3D> $re_accept, $re_reject, $re_skip= , sub { > + my ($handle, $data) =3D @_; > + > + # $data now contains the entire (raw) HTTP header > + $self->dprint("parse_request_header: handle: $handle [$data]"); > + > + eval { > + $reqstate->{keep_alive}--; > + > + my $http_header =3D $data =3D~ s/^\s+|\s+$//gr ; why the 'r' modifier? is this the replacement for removing empty lines at t= he start of the request - if so, it lost the comment that states this reason a= nd the RE is unnecessarily opaque. > + my ($http_method_line, $http_header_field_lines) =3D split(/\r?\n/, $h= ttp_header, 2); so now we are splitting by line again > + > + $reqstate->{http_method_line} =3D $http_method_line; > + $reqstate->{http_header_field_lines} =3D $http_header_field_lines; > + > + # Array of references to the subroutines used to process the request > + my @process_steps =3D \( > + &parse_request_header_method, and this one here will parse just the first line > + &parse_request_header_fields, and this one will parse by line again.. > + &postprocess_header_fields, > + &authenticate_and_handle_request, I can actually see the benefit of having these two split out since they act= ually only operate on the parsed data - see top-level comment above. > + ); > + > + for my $step (@process_steps) { > + my $result =3D $step->($self, $reqstate); > + > + my $success =3D $result->{success} || 0; > + if (!$success) { > + $self->make_response_from_unsuccessful_result($reqstate, $result); > + return; > + } > + } > + > + }; > + > + $self->dprint("parse_request_header: handle $handle DONE"); > + warn $@ if $@; > + > + }); > + }; > + warn $@ if $@; > +} > + > +sub make_response_from_unsuccessful_result { > + my ($self, $reqstate, $result) =3D @_; > + > + die "No HTTP response status code provided\n" > + if !$result->{http_code}; > + > + eval { > + my $response_object =3D HTTP::Response->new( > + $result->{http_code}, > + $result->{http_message}, > + $result->{http_header}, > + $result->{http_content} > + ); > + > + $self->dprint("Sending response: " . $result->{http_code} . " " . $resu= lt->{http_message}); > + > + $self->response( > + $reqstate, > + $response_object, > + $result->{response_mtime}, > + $result->{response_nocomp}, > + $result->{response_delay}, > + $result->{response_stream_fh} > + ); > + > + $self->client_do_disconnect($reqstate); > + }; > + > + warn $@ if $@; > +} > + > +sub parse_request_header_method { > + my ($self, $reqstate) =3D @_; > + > + my $http_method_line =3D $reqstate->{http_method_line}; > + die "http_method_line not within reqstate\n" > + if !$http_method_line; > + > + my $re_http_method =3D qr<^(\S+)\040(\S+)\040HTTP\/(\d+)\.(\d+)>; this RE is different from the original one: qr<> vs =3D~ //o - I don't think the 'o' does anything meaningful here, so dropping is proba= bly okay - the '/' that is escaped in the expression doesn't make sense now, since '= /' is not the delimiter anymore > + > + $self->dprint("Processing initial header line: $http_method_line"); additional debug print, so it's not behaving identical as described by the commit message ;) > + > + if ($http_method_line =3D~ $re_http_method) { > + my ($method, $uri, $major, $minor) =3D ($1, $2, $3, $4); > + > + if ($major !=3D 1) { > + return { > + success =3D> 0, > + http_code =3D> 506, > + http_message =3D> "HTTP protocol version $major.$minor not supported", > + }; > + } > + > + # if an '@' comes before the first slash, proxy forwarding might consid= er > + # the frist part of the uri to be part of an authority > + if ($uri =3D~ m|^[^/]*@|) { > + return { > + success =3D> 0, > + http_code =3D> 400, > + http_message =3D> 'invalid uri', > + }; > + } > + this used to also save the request line (the line being parsed here) for lo= gging purposes. so /var/log/pveproxy/access.log now looks like this: ::ffff:192.168.X.Y - user@realm [28/02/2023:13:49:46 +0100] "-" 200 1024 ::ffff:192.168.X.Y - user@realm [28/02/2023:13:49:47 +0100] "-" 200 1352 ::ffff:192.168.X.Y - user@realm [28/02/2023:13:49:48 +0100] "-" 200 73 ::ffff:192.168.X.Y - user@realm [28/02/2023:13:49:50 +0100] "-" 200 1028 ::ffff:192.168.X.Y - user@realm [28/02/2023:13:49:51 +0100] "-" 200 1360 ::ffff:192.168.X.Y - user@realm [28/02/2023:13:49:52 +0100] "-" 200 73 which is not very helpful ;) > + $reqstate->{proto}->{str} =3D "HTTP/$major.$minor"; > + $reqstate->{proto}->{maj} =3D $major; > + $reqstate->{proto}->{min} =3D $minor; > + $reqstate->{proto}->{ver} =3D $major*1000 + $minor; > + > + $reqstate->{request} =3D HTTP::Request->new($method, $uri); > + $reqstate->{starttime} =3D [gettimeofday]; > + > + # Only requests with valid HTTP methods are counted > + $self->{request_count}++; > + > + if ($self->{request_count} >=3D $self->{max_requests}) { > + $self->{end_loop} =3D 1; > + } > + > + return { success =3D> 1 }; > + } > + > + $self->dprint("Could not match for HTTP method / proto: $http_method= _line"); same as above > + return { > + success =3D> 0, > + http_code =3D> 400, > + http_message =3D> 'bad request', > + }; > +} > + > +sub parse_request_header_fields { > + my ($self, $reqstate) =3D @_; > + > + my $http_header_field_lines =3D $reqstate->{http_header_field_lines}= ; > + > + die "http_header_field_lines not within reqstate\n" > + if !$reqstate->{http_header_field_lines}; > + > + die "request object not within reqstate\n" > + if !$reqstate->{request}; > + > + my $request =3D $reqstate->{request}; > + my $state =3D { size =3D> 0 }; > + > + # Matches a header field definition, e.g. > + # 'Foo: Bar' > + my $re_field =3D qr<^([^:\s]+)\s*:\s*(.*)>; > + > + # Matches a line beginning with at least one whitespace character, > + # followed by some other data; used to match multiline headers, > + # e.g. if 'Foo: Bar' was matched previously, and the next line > + # contains ' Qux', the resulting field will be 'Foo: Bar Qux' > + my $re_field_multiline =3D qr<^\s+(.*)>; > + > + my @lines =3D split(/\r?\n/, $http_header_field_lines); > + > + die "too many http header lines (> $limit_max_headers)\n" > + if scalar(@lines) >=3D $limit_max_headers; > + > + > + for my $line (@lines) { > + $self->dprint("Processing header line: $line"); > + > + die "http header too large\n" > + if ($state->{size} +=3D length($line)) >=3D $limit_max_header_size; > + > + if ($line =3D~ $re_field) { > + $request->push_header($state->{key}, $state->{value}) > + if $state->{key}; > + ($state->{key}, $state->{value}) =3D ($1, $2); > + > + } elsif ($line =3D~ $re_field_multiline) { > + > + if (!$state->{key}) { # sanity check for invalid header > + return { > + success =3D> 0, > + http_code =3D> 400, > + http_message =3D> 'bad request', > + }; > + } > + > + $state->{value} .=3D " $1"; > + > + } else { > + return { > + success =3D> 0, > + http_code =3D> 506, > + http_message =3D> 'unable to parse request header', > + }; > + } > + > + } > + > + $request->push_header($state->{key}, $state->{value}) > + if $state->{key}; > + > + return { success =3D> 1 }; > +} > + > +sub postprocess_header_fields { > + my ($self, $reqstate) =3D @_; > + > + die "request object not within reqstate\n" > + if !$reqstate->{request}; > + > + my $request =3D $reqstate->{request}; > + my $method =3D $request->method(); > + > + if (!$known_methods->{$method}) { > + return { > + success =3D> 0, > + http_code =3D> HTTP_NOT_IMPLEMENTED, > + http_message =3D> "method '$method' not available", > + }; > + } > + > + my $h_connection =3D $request->header('Connection'); > + my $h_accept_encoding =3D $request->header('Accept-Encoding'); > + > + $reqstate->{accept_gzip} =3D ($h_accept_encoding && $h_accept_encodi= ng =3D~ m/gzip/) ? 1 : 0; > + > + if ($h_connection) { > + $reqstate->{keep_alive} =3D 0 if $h_connection =3D~ m/close/oi; > + } else { > + if ($reqstate->{proto}->{ver} < 1001) { > + $reqstate->{keep_alive} =3D 0; > + } > + } > + > + my $h_transfer_encoding =3D $request->header('Transfer-Encoding'); > + if ($h_transfer_encoding && lc($h_transfer_encoding) eq 'chunked') { > + # Handle chunked transfer encoding > + return { > + success =3D> 0, > + http_code =3D> 501, > + http_message =3D> 'chunked transfer encoding not supported', > + }; > + > + } elsif ($h_transfer_encoding) { > + return { > + success =3D> 0, > + http_code =3D> 501, > + http_message =3D> "Unknown transfer encoding '$h_transfer_encoding'= ", > + }; > + } > + > + my $h_pveclientip =3D $request->header('PVEClientIP'); > + > + # FIXME: how can we make PVEClientIP header trusted? > + if ($self->{trusted_env} && $h_pveclientip) { > + $reqstate->{peer_host} =3D $h_pveclientip; > + } else { > + $request->header('PVEClientIP', $reqstate->{peer_host}); > + } > + > + if (my $rpcenv =3D $self->{rpcenv}) { > + $rpcenv->set_request_host($request->header('Host')); > + } > + > + return { success =3D> 1 }; > +} > + > +sub authenticate_and_handle_request { > + my ($self, $reqstate) =3D @_; > + > + die "request not within reqstate\n" > + if !$reqstate->{request}; > + > + my $request =3D $reqstate->{request}; > + my $method =3D $request->method(); > + my $path =3D uri_unescape($request->uri->path()); > + my $base_uri =3D $self->{base_uri}; > + > + # Begin authentication > + $self->dprint("Begin authentication"); > + my $auth =3D {}; > + > + if ($self->{spiceproxy}) { > + $self->dprint("Authenticating via spiceproxy"); > + my $connect_str =3D $request->header('Host'); > + my ($vmid, $node, $port) =3D $self->verify_spice_connect_url($connect_s= tr); > + > + if (!(defined($vmid) && $node && $port)) { > + return { > + success =3D> 0, > + http_code =3D> HTTP_UNAUTHORIZED, > + http_message =3D> 'invalid ticket', > + }; > + } > + > + $self->handle_spice_proxy_request($reqstate, $connect_str, $vmid, $node= , $port); > + $self->dprint("spiceproxy request handled"); > + return { success =3D> 1 }; > + > + } elsif ($path =3D~ m/^\Q$base_uri\E/) { > + $self->dprint("Authenticating via base_uri regex"); > + my $token =3D $request->header('CSRFPreventionToken'); > + my $cookie =3D $request->header('Cookie'); > + my $auth_header =3D $request->header('Authorization'); > + > + # prefer actual cookie > + my $ticket =3D PVE::APIServer::Formatter::extract_auth_value($cookie, $= self->{cookie_name}); > + > + # fallback to cookie in 'Authorization' header > + $ticket =3D PVE::APIServer::Formatter::extract_auth_value($auth_header,= $self->{cookie_name}) > + if !$ticket; > + > + # finally, fallback to API token if no ticket has been provided so far > + my $api_token; > + $api_token =3D PVE::APIServer::Formatter::extract_auth_value($auth_head= er, $self->{apitoken_name}) > + if !$ticket; > + > + my ($rel_uri, $format) =3D $split_abs_uri->($path, $base_uri); > + > + if (!$format) { > + return { > + success =3D> 0, > + http_code =3D> HTTP_NOT_IMPLEMENTED, > + http_message =3D> 'no such uri', > + }; > + } > + > + $self->dprint("Calling auth_handler"); > + eval { > + $auth =3D $self->auth_handler( > + $method, > + $rel_uri, > + $ticket, > + $token, > + $api_token, > + $reqstate->{peer_host} > + ); > + }; > + $self->dprint("auth_handler done"); > + > + if (my $err =3D $@) { > + $self->dprint("auth_handler threw error $@"); > + # HACK: see Note 1 > + Net::SSLeay::ERR_clear_error(); > + > + my $result =3D { > + success =3D> 0, > + response_delay =3D> 3, # always delay unauthorized calls by 3 seconds > + response_nocomp =3D> 0, > + }; > + > + if (ref($err) eq "PVE::Exception") { > + $result->{http_code} =3D $err->{code} || HTTP_INTERNAL_SERVER_ERROR; > + $result->{http_message} =3D $err->{msg}; > + > + } elsif (my $formatter =3D PVE::APIServer::Formatter::get_login_for= matter($format)) { > + my ($raw, $ct, $nocomp) =3D > + $formatter->($path, $auth, $self->{formatter_config}); > + > + if (ref($raw) && (ref($raw) eq 'HTTP::Response')) { > + $result->{http_code} =3D $raw->code; > + $result->{http_message} =3D $raw->message; > + $result->{http_header} =3D $raw->headers; > + $result->{http_content} =3D $raw->content; > + > + } else { > + $result->{http_code} =3D HTTP_UNAUTHORIZED; > + $result->{http_message} =3D 'Login Required'; > + $result->{http_header} =3D HTTP::Headers->new('Content-Type' =3D> = $ct); > + $result->{http_content} =3D $raw; > + > + } > + > + $result->{response_nocomp} =3D $nocomp; > + > + } else { > + $result->{http_code} =3D HTTP_UNAUTHORIZED; > + $result->{http_message} =3D $err; > + } > + > + return $result; > + } > + } > + > + $reqstate->{log}->{userid} =3D $auth->{userid}; > + > + # Continue handling request > + $self->dprint("Continuing to handle request"); > + > + my $h_content_length =3D $request->header('Content-Length'); > + my $h_content_type =3D $request->header('Content-Type'); > + > + if (!$h_content_length) { > + $self->dprint("No Content-Length provided; handling request the usual w= ay"); > + $self->handle_request($reqstate, $auth, $method, $path); > + return { success =3D> 1 }; > + } > + > + if (!($method eq 'PUT' || $method eq 'POST')) { > + return { > + success =3D> 0, > + http_code =3D> 501, > + http_message =3D> "Unexpected content for method '$method'", > + }; > + } > + > + my ($content, $boundary) =3D $h_content_type ? parse_content_type($h= _content_type) : (); > + > + if ($auth->{isUpload} && !$self->{trusted_env}) { > + die "upload 'Content-Type '$h_content_type' not implemented\n" > + if !($boundary && $content && ($content eq 'multipart/form-data')); > + > + die "upload without content length header not supported" > + if !$h_content_length; > + > + $self->dprint("start upload $path $content $boundary"); > + > + my $tmpfilename =3D get_upload_filename(); > + my $outfh =3D IO::File->new($tmpfilename, O_RDWR|O_CREAT|O_EXCL, 0600) = || > + die "unable to create temporary upload file '$tmpfilename'"; > + > + $reqstate->{keep_alive} =3D 0; > + > + my $boundlen =3D length($boundary) + 8; # \015?\012--$boundary--\015?\0= 12 > + > + my $state =3D { > + size =3D> $h_content_length, > + boundary =3D> $boundary, > + ctx =3D> Digest::MD5->new, > + boundlen =3D> $boundlen, > + maxheader =3D> 2048 + $boundlen, # should be large enough > + params =3D> decode_urlencoded($request->url->query()), > + phase =3D> 0, > + read =3D> 0, > + post_size =3D> 0, > + starttime =3D> [gettimeofday], > + outfh =3D> $outfh, > + }; > + > + $reqstate->{tmpfilename} =3D $tmpfilename; > + $reqstate->{hdl}->on_read(sub { > + $self->file_upload_multipart($reqstate, $auth, $method, $path, $sta= te); > + }); > + > + return { success =3D> 1 }; > + } > + > + if ($h_content_length > $limit_max_post) { > + return { > + success =3D> 0, > + http_code =3D> 501, > + 'for data too large', > + }; > + } > + > + if ( > + !$content > + || $content eq 'application/x-www-form-urlencoded' > + || $content eq 'application/json' > + ) { > + $self->dprint("Received content '$content', reading chunk length of '$h= _content_length'"); > + > + $reqstate->{hdl}->unshift_read(chunk =3D> $h_content_length, sub { > + my ($hdl, $data) =3D @_; > + $self->dprint("Read data for content '$content'"); > + $self->dprint("Continuing to handle request after reading content")= ; > + $request->content($data); > + $self->handle_request($reqstate, $auth, $method, $path); > + }); > + return { success =3D> 1 }; > + > + } else { > + return { > + success =3D> 0, > + http_code =3D> 506, > + http_message =3D> "upload 'Content-Type '$h_content_type' not imple= mented", > + }; > + } > +} > + > sub accept { > my ($self) =3D @_; > =20 > --=20 > 2.30.2 >=20 >=20 >=20 > _______________________________________________ > pve-devel mailing list > pve-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel >=20 >=20 >=20