From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 85D5271BD4 for ; Fri, 13 May 2022 15:49:11 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 7D48A28BE for ; Fri, 13 May 2022 15:49:11 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id 727FD28B1 for ; Fri, 13 May 2022 15:49:10 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 4BB2E4294F for ; Fri, 13 May 2022 15:49:10 +0200 (CEST) From: Matthias Heiserer To: pve-devel@lists.proxmox.com Date: Fri, 13 May 2022 15:49:00 +0200 Message-Id: <20220513134900.440420-2-m.heiserer@proxmox.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220513134900.440420-1-m.heiserer@proxmox.com> References: <20220513134900.440420-1-m.heiserer@proxmox.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL -0.133 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record T_SCC_BODY_TEXT_LINE -0.01 - Subject: [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2022 13:49:11 -0000 == The problem Upload of files smaller than ~16kb failed. This was because the code assumed that it would be called several times, and each time would do a certain action. When the whole file was in the buffer, this failed because the function would try parssing the first part in the payload and then return, instead of parsing the rest of the available data. == Why not just modifying the current code a bit? The code had a lot of nested control statements and a non-intuitive control flow (phase 0->2->1->1->1 and so on). The way the phases and buffer content were checked made it rather difficult to just fix a few lines. == What was changed * Part headers are parsed with a single regex line each, which improves code readability. * Parsing the content is done in order, so even if the whole data is in the buffer, it can be read in one go. Files of arbitrary sizes can be uploaded. == Tested with * Uploaded 0B, 1B, 14KB, 16KB, 1GB, 10GB, 20GB files * Tested with all checksums and without * Tested on firefox, chromium, and pvesh I didn't do any fuzzing or automated upload testing. == Drawbacks & Potential issues * Part headers are hardcoded, adding new ones requries modifying this file == does not fix * upload can still time out Signed-off-by: Matthias Heiserer --- Note: Regarding trim, I forgot to answer the mail. Trim is imo a good name, as several languagues (e.g. rust, javascript) use trim to mean mean "remove all whitespace, including newlines and such". I can send a v3 if that's a problem. Changes from v1: * fix whitespace in separate patch * move trim into inline closure * correctly call trim * replace [^\S] with \S in regexes * improve trim regex: don't replace string * check for phase 1 once * remove regex comment src/PVE/APIServer/AnyEvent.pm | 146 ++++++++++++++++++---------------- 1 file changed, 76 insertions(+), 70 deletions(-) diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm index b8bc35d..1c07d96 100644 --- a/src/PVE/APIServer/AnyEvent.pm +++ b/src/PVE/APIServer/AnyEvent.pm @@ -1158,62 +1158,81 @@ sub handle_request { sub file_upload_multipart { my ($self, $reqstate, $auth, $method, $path, $rstate) = @_; + my $trim = sub { + my ($string) = @_; + $_[0] =~ /\s*(\S+)/; + return $1; + }; + eval { my $boundary = $rstate->{boundary}; my $hdl = $reqstate->{hdl}; my $startlen = length($hdl->{rbuf}); - if ($rstate->{phase} == 0) { # skip everything until start - if ($hdl->{rbuf} =~ s/^.*?--\Q$boundary\E \015?\012 - ((?:[^\015]+\015\012)* ) \015?\012//xs) { - my $header = $1; - my ($ct, $disp, $name, $filename); - foreach my $line (split(/\015?\012/, $header)) { - # assume we have single line headers - if ($line =~ m/^Content-Type\s*:\s*(.*)/i) { - $ct = parse_content_type($1); - } elsif ($line =~ m/^Content-Disposition\s*:\s*(.*)/i) { - ($disp, $name, $filename) = parse_content_disposition($1); - } - } + my $newline = qr/\015?\012/; + my $delimiter = qr/--\Q$boundary\E${newline}/; + my $closeDelimiter = qr/--\Q$boundary\E--${newline}/; - if (!($disp && $disp eq 'form-data' && $name)) { - syslog('err', "wrong content disposition in multipart - abort upload"); - $rstate->{phase} = -1; - } else { + my $check_disposition = sub { + my ($disp) = @_; + die "wrong Content-Disposition in multipart, expected `form-data` - abort upload" + if $disp ne 'form-data'; + }; - $rstate->{fieldname} = $name; + # Phase 0 - preserve boundary, but remove everything before + if ($rstate->{phase} == 0 && $hdl->{rbuf} =~ s/^.*?($delimiter)/$1/xs) { + $rstate->{read} += $startlen - length($hdl->{rbuf}); + $rstate->{phase} = 1; + } - if ($filename) { - if ($name eq 'filename') { - # found file upload data - $rstate->{phase} = 1; - $rstate->{filename} = $filename; - } else { - syslog('err', "wrong field name for file upload - abort upload"); - $rstate->{phase} = -1; - } - } else { - # found form data for field $name - $rstate->{phase} = 2; - } - } - } else { - my $len = length($hdl->{rbuf}); - substr($hdl->{rbuf}, 0, $len - $rstate->{maxheader}, '') - if $len > $rstate->{maxheader}; # skip garbage + # Phase 1 - parse payload without file data + if ($rstate->{phase} == 1) { + if ($hdl->{rbuf} =~ + s/^${delimiter}Content-Disposition: (.*?); name="content"(.*?)($delimiter)/$3/s + ) { + $check_disposition->($1); + $rstate->{params}->{content} = $trim->($2); } - } elsif ($rstate->{phase} == 1) { # inside file - dump until end marker - if ($hdl->{rbuf} =~ s/^(.*?)\015?\012(--\Q$boundary\E(--)? \015?\012(.*))$/$2/xs) { + + if ( $hdl->{rbuf} =~ + s/^${delimiter}Content-Disposition: (.*?); name="checksum-algorithm"(.*?)($delimiter)/$3/s + ) { + $check_disposition->($1); + $rstate->{params}->{"checksum-algorithm"} = $trim->($2); + } + + if ( $hdl->{rbuf} =~ + s/^${delimiter}Content-Disposition: (.*?); name="checksum"(.*?)($delimiter)/$3/s + ) { + $check_disposition->($1); + $rstate->{params}->{checksum} = $trim->($2); + } + + if ( $hdl->{rbuf} =~ + s/^${delimiter} + Content-Disposition:\ (.*?);\ name="(.*?)";\ filename="([^"]+)"${newline} + Content-Type:\ \S*\s+ + //sxx + ) { + $check_disposition->($1); + die "wrong field `name` for file upload, expected `filename` - abort upload" + if $2 ne "filename"; + $rstate->{phase} = 2; + $rstate->{params}->{filename} = $trim->($3); + } + } + + # Phase 2 - dump content into file + if ($rstate->{phase} == 2) { + if ($hdl->{rbuf} =~ s/^(.*?)${newline}?+${closeDelimiter}.*$//s) { my ($rest, $eof) = ($1, $3); my $len = length($rest); die "write to temporary file failed - $!" if syswrite($rstate->{outfh}, $rest) != $len; $rstate->{ctx}->add($rest); - $rstate->{params}->{filename} = $rstate->{filename}; $rstate->{md5sum} = $rstate->{ctx}->hexdigest; $rstate->{bytes} += $len; - $rstate->{phase} = $eof ? 100 : 0; + $rstate->{phase} = 100; } else { my $len = length($hdl->{rbuf}); my $wlen = $len - $rstate->{boundlen}; @@ -1225,42 +1244,29 @@ sub file_upload_multipart { $rstate->{ctx}->add($data); } } - } elsif ($rstate->{phase} == 2) { # inside normal field + } - if ($hdl->{rbuf} =~ s/^(.*?)\015?\012(--\Q$boundary\E(--)? \015?\012(.*))$/$2/xs) { - my ($rest, $eof) = ($1, $3); - my $len = length($rest); - $rstate->{post_size} += $len; - if ($rstate->{post_size} < $limit_max_post) { - $rstate->{params}->{$rstate->{fieldname}} = $rest; - $rstate->{phase} = $eof ? 100 : 0; - } else { - syslog('err', "form data to large - abort upload"); - $rstate->{phase} = -1; # skip - } - } - } else { # skip - my $len = length($hdl->{rbuf}); - substr($hdl->{rbuf}, 0, $len, ''); # empty rbuf + # Phase 100 - transfer finished + if ($rstate->{phase} == 100) { + my $elapsed = tv_interval($rstate->{starttime}); + + my $rate = int($rstate->{bytes} / ($elapsed * 1024 * 1024)); + syslog('info', + "multipart upload complete (size: %d time: %ds rate: %.2fMiB/s md5sum: %s)", + $rstate->{bytes}, $elapsed, $rate, $rstate->{md5sum} + ); + $self->handle_api2_request($reqstate, $auth, $method, $path, $rstate); } $rstate->{read} += $startlen - length($hdl->{rbuf}); - if (!$rstate->{done} && ($rstate->{read} + length($hdl->{rbuf})) >= $rstate->{size}) { - $rstate->{done} = 1; # make sure we dont get called twice - if ($rstate->{phase} < 0 || !$rstate->{md5sum}) { - die "upload failed\n"; - } else { - my $elapsed = tv_interval($rstate->{starttime}); - - my $rate = int($rstate->{bytes} / ($elapsed * 1024 * 1024)); - syslog('info', - "multipart upload complete (size: %d time: %ds rate: %.2fMiB/s md5sum: %s)", - $rstate->{bytes}, $elapsed, $rate, $rstate->{md5sum} - ); - $self->handle_api2_request($reqstate, $auth, $method, $path, $rstate); - } + if ( + $rstate->{read} + length($hdl->{rbuf}) >= $rstate->{size} + && $rstate->{phase} != 100 + ) { + die "upload failed"; } + }; if (my $err = $@) { syslog('err', $err); -- 2.30.2