public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH v2 http-server 1/2] AnyEvent: whitespace fix
@ 2022-05-13 13:48 Matthias Heiserer
  2022-05-13 13:49 ` [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable Matthias Heiserer
  2022-09-29 15:06 ` [pve-devel] applied: [PATCH v2 http-server 1/2] AnyEvent: whitespace fix Thomas Lamprecht
  0 siblings, 2 replies; 6+ messages in thread
From: Matthias Heiserer @ 2022-05-13 13:48 UTC (permalink / raw)
  To: pve-devel

and remove unnecessary parentheses.

Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
---

Changes from v1:
new patch

 src/PVE/APIServer/AnyEvent.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
index 7dd7d2d..b8bc35d 100644
--- a/src/PVE/APIServer/AnyEvent.pm
+++ b/src/PVE/APIServer/AnyEvent.pm
@@ -1161,7 +1161,6 @@ sub file_upload_multipart {
     eval {
 	my $boundary = $rstate->{boundary};
 	my $hdl = $reqstate->{hdl};
-
 	my $startlen = length($hdl->{rbuf});
 
 	if ($rstate->{phase} == 0) { # skip everything until start
@@ -1245,7 +1244,7 @@ sub file_upload_multipart {
 	    substr($hdl->{rbuf}, 0, $len, ''); # empty rbuf
 	}
 
-	$rstate->{read} += ($startlen - length($hdl->{rbuf}));
+	$rstate->{read} += $startlen - length($hdl->{rbuf});
 
 	if (!$rstate->{done} && ($rstate->{read} + length($hdl->{rbuf})) >= $rstate->{size}) {
 	    $rstate->{done} = 1; # make sure we dont get called twice
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable
  2022-05-13 13:48 [pve-devel] [PATCH v2 http-server 1/2] AnyEvent: whitespace fix Matthias Heiserer
@ 2022-05-13 13:49 ` Matthias Heiserer
  2022-07-06 16:09   ` Daniel Tschlatscher
  2022-09-29 15:13   ` [pve-devel] applied: " Thomas Lamprecht
  2022-09-29 15:06 ` [pve-devel] applied: [PATCH v2 http-server 1/2] AnyEvent: whitespace fix Thomas Lamprecht
  1 sibling, 2 replies; 6+ messages in thread
From: Matthias Heiserer @ 2022-05-13 13:49 UTC (permalink / raw)
  To: pve-devel

== The problem
Upload of files smaller than ~16kb failed.
This was because the code assumed that it would be called
several times, and each time would do a certain action.
When the whole file was in the buffer, this failed because
the function would try parssing the first part in the payload and
then return, instead of parsing the rest of the available data.

== Why not just modifying the current code a bit?
The code had a lot of nested control statements and a
non-intuitive control flow (phase 0->2->1->1->1 and so on).

The way the phases and buffer content were checked made it
rather difficult to just fix a few lines.

== What was changed
* Part headers are parsed with a single regex line each,
 which improves code readability.

* Parsing the content is done in order, so even if the whole data is in the buffer,
 it can be read in one go. Files of arbitrary sizes can be uploaded.

== Tested with
* Uploaded 0B, 1B, 14KB, 16KB, 1GB, 10GB, 20GB files

* Tested with all checksums and without

* Tested on firefox, chromium, and pvesh

I didn't do any fuzzing or automated upload testing.

== Drawbacks & Potential issues
* Part headers are hardcoded, adding new ones requries modifying this file

== does not fix
* upload can still time out

Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
---

Note:
Regarding trim, I forgot to answer the mail.
Trim is imo a good name, as several languagues (e.g. rust, javascript)
use trim to mean mean "remove all whitespace, including newlines and such".
I can send a v3 if that's a problem.

Changes from v1:
* fix whitespace in separate patch
* move trim into inline closure
* correctly call trim
* replace [^\S] with \S in regexes
* improve trim regex: don't replace string
* check for phase 1 once
* remove regex comment

 src/PVE/APIServer/AnyEvent.pm | 146 ++++++++++++++++++----------------
 1 file changed, 76 insertions(+), 70 deletions(-)

diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
index b8bc35d..1c07d96 100644
--- a/src/PVE/APIServer/AnyEvent.pm
+++ b/src/PVE/APIServer/AnyEvent.pm
@@ -1158,62 +1158,81 @@ sub handle_request {
 sub file_upload_multipart {
     my ($self, $reqstate, $auth, $method, $path, $rstate) = @_;
 
+    my $trim = sub {
+	my ($string) = @_;
+	$_[0] =~ /\s*(\S+)/;
+	return $1;
+    };
+
     eval {
 	my $boundary = $rstate->{boundary};
 	my $hdl = $reqstate->{hdl};
 	my $startlen = length($hdl->{rbuf});
 
-	if ($rstate->{phase} == 0) { # skip everything until start
-	    if ($hdl->{rbuf} =~ s/^.*?--\Q$boundary\E  \015?\012
-                       ((?:[^\015]+\015\012)* ) \015?\012//xs) {
-		my $header = $1;
-		my ($ct, $disp, $name, $filename);
-		foreach my $line (split(/\015?\012/, $header)) {
-		    # assume we have single line headers
-		    if ($line =~ m/^Content-Type\s*:\s*(.*)/i) {
-			$ct = parse_content_type($1);
-		    } elsif ($line =~ m/^Content-Disposition\s*:\s*(.*)/i) {
-			($disp, $name, $filename) = parse_content_disposition($1);
-		    }
-		}
+	my $newline = qr/\015?\012/;
+	my $delimiter = qr/--\Q$boundary\E${newline}/;
+	my $closeDelimiter = qr/--\Q$boundary\E--${newline}/;
 
-		if (!($disp && $disp eq 'form-data' && $name)) {
-		    syslog('err', "wrong content disposition in multipart - abort upload");
-		    $rstate->{phase} = -1;
-		} else {
+	my $check_disposition = sub {
+	    my ($disp) = @_;
+	    die "wrong Content-Disposition in multipart, expected `form-data` - abort upload"
+		if $disp ne 'form-data';
+	};
 
-		    $rstate->{fieldname} = $name;
+	# Phase 0 - preserve boundary, but remove everything before
+	if ($rstate->{phase} == 0 && $hdl->{rbuf} =~ s/^.*?($delimiter)/$1/xs) {
+	    $rstate->{read} += $startlen - length($hdl->{rbuf});
+	    $rstate->{phase} = 1;
+	}
 
-		    if ($filename) {
-			if ($name eq 'filename') {
-			    # found file upload data
-			    $rstate->{phase} = 1;
-			    $rstate->{filename} = $filename;
-			} else {
-			    syslog('err', "wrong field name for file upload - abort upload");
-			    $rstate->{phase} = -1;
-			}
-		    } else {
-			# found form data for field $name
-			$rstate->{phase} = 2;
-		    }
-		}
-	    } else {
-		my $len = length($hdl->{rbuf});
-		substr($hdl->{rbuf}, 0, $len - $rstate->{maxheader}, '')
-		    if $len > $rstate->{maxheader}; # skip garbage
+	# Phase 1 - parse payload without file data
+	if ($rstate->{phase} == 1) {
+	    if ($hdl->{rbuf} =~
+		s/^${delimiter}Content-Disposition: (.*?); name="content"(.*?)($delimiter)/$3/s
+	    ) {
+		$check_disposition->($1);
+		$rstate->{params}->{content} = $trim->($2);
 	    }
-	} elsif ($rstate->{phase} == 1) { # inside file - dump until end marker
-	    if ($hdl->{rbuf} =~ s/^(.*?)\015?\012(--\Q$boundary\E(--)? \015?\012(.*))$/$2/xs) {
+
+	    if ( $hdl->{rbuf} =~
+		s/^${delimiter}Content-Disposition: (.*?); name="checksum-algorithm"(.*?)($delimiter)/$3/s
+	    ) {
+		$check_disposition->($1);
+		$rstate->{params}->{"checksum-algorithm"} = $trim->($2);
+	    }
+
+	    if ( $hdl->{rbuf} =~
+		s/^${delimiter}Content-Disposition: (.*?); name="checksum"(.*?)($delimiter)/$3/s
+	    ) {
+		$check_disposition->($1);
+		$rstate->{params}->{checksum} = $trim->($2);
+	    }
+
+	    if ( $hdl->{rbuf} =~
+		s/^${delimiter}
+		Content-Disposition:\ (.*?);\ name="(.*?)";\ filename="([^"]+)"${newline}
+		Content-Type:\ \S*\s+
+		//sxx
+	    ) {
+		$check_disposition->($1);
+		die "wrong field `name` for file upload, expected `filename` - abort upload"
+		    if $2 ne "filename";
+		$rstate->{phase} = 2;
+		$rstate->{params}->{filename} = $trim->($3);
+	    }
+	}
+
+	# Phase 2 - dump content into file
+	if ($rstate->{phase} == 2) {
+	    if ($hdl->{rbuf} =~ s/^(.*?)${newline}?+${closeDelimiter}.*$//s) {
 		my ($rest, $eof) = ($1, $3);
 		my $len = length($rest);
 		die "write to temporary file failed - $!"
 		    if syswrite($rstate->{outfh}, $rest) != $len;
 		$rstate->{ctx}->add($rest);
-		$rstate->{params}->{filename} = $rstate->{filename};
 		$rstate->{md5sum} = $rstate->{ctx}->hexdigest;
 		$rstate->{bytes} += $len;
-		$rstate->{phase} =  $eof ? 100 : 0;
+		$rstate->{phase} =  100;
 	    } else {
 		my $len = length($hdl->{rbuf});
 		my $wlen = $len - $rstate->{boundlen};
@@ -1225,42 +1244,29 @@ sub file_upload_multipart {
 		    $rstate->{ctx}->add($data);
 		}
 	    }
-	} elsif ($rstate->{phase} == 2) { # inside normal field
+	}
 
-	    if ($hdl->{rbuf} =~ s/^(.*?)\015?\012(--\Q$boundary\E(--)? \015?\012(.*))$/$2/xs) {
-		my ($rest, $eof) = ($1, $3);
-		my $len = length($rest);
-		$rstate->{post_size} += $len;
-		if ($rstate->{post_size} < $limit_max_post) {
-		    $rstate->{params}->{$rstate->{fieldname}} = $rest;
-		    $rstate->{phase} = $eof ? 100 : 0;
-		} else {
-		    syslog('err', "form data to large - abort upload");
-		    $rstate->{phase} = -1; # skip
-		}
-	    }
-	} else { # skip
-	    my $len = length($hdl->{rbuf});
-	    substr($hdl->{rbuf}, 0, $len, ''); # empty rbuf
+	# Phase 100 - transfer finished
+	if ($rstate->{phase} == 100) {
+	    my $elapsed = tv_interval($rstate->{starttime});
+
+	    my $rate = int($rstate->{bytes} / ($elapsed * 1024 * 1024));
+	    syslog('info',
+		"multipart upload complete (size: %d time: %ds rate: %.2fMiB/s md5sum: %s)",
+		$rstate->{bytes}, $elapsed, $rate, $rstate->{md5sum}
+	    );
+	    $self->handle_api2_request($reqstate, $auth, $method, $path, $rstate);
 	}
 
 	$rstate->{read} += $startlen - length($hdl->{rbuf});
 
-	if (!$rstate->{done} && ($rstate->{read} + length($hdl->{rbuf})) >= $rstate->{size}) {
-	    $rstate->{done} = 1; # make sure we dont get called twice
-	    if ($rstate->{phase} < 0 || !$rstate->{md5sum}) {
-		die "upload failed\n";
-	    } else {
-		my $elapsed = tv_interval($rstate->{starttime});
-
-		my $rate = int($rstate->{bytes} / ($elapsed * 1024 * 1024));
-		syslog('info',
-		    "multipart upload complete (size: %d time: %ds rate: %.2fMiB/s md5sum: %s)",
-		     $rstate->{bytes}, $elapsed, $rate, $rstate->{md5sum}
-		);
-		$self->handle_api2_request($reqstate, $auth, $method, $path, $rstate);
-	    }
+	if (
+	    $rstate->{read} + length($hdl->{rbuf}) >= $rstate->{size}
+	    && $rstate->{phase} != 100
+	) {
+	    die "upload failed";
 	}
+
     };
     if (my $err = $@) {
 	syslog('err', $err);
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable
  2022-05-13 13:49 ` [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable Matthias Heiserer
@ 2022-07-06 16:09   ` Daniel Tschlatscher
  2022-07-12 11:32     ` Matthias Heiserer
  2022-09-29 15:13   ` [pve-devel] applied: " Thomas Lamprecht
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel Tschlatscher @ 2022-07-06 16:09 UTC (permalink / raw)
  To: pve-devel

The GUI works without problems. Files of arbitrary size can be
uploaded without issues.
Apart from that, I mostly focused on usage via the API using curl and
Postman.


What worked as expected:
* Upload in the GUI
  Tested by uploading files of sizes:
  0B, 1kB, 8kB, 15kB (did not work before), 16kB, 32kB, 1MB, 1GB, 10GB
* Uploading a 8kB and a 32kB file with and without the corresponding
  checksums.
* Unknown values for the "metadata" in the body (e.g. Content-
  Disposition, filename, ...) fail expectedly
* Unmatching file extensions (extension of the file passed in phase 0
  and file extension assigned in the filename) fail expectedly

All errors above return HTTP status 501 ("Not implemented") which I
think is rather confusing, even if they include a descriptive error
message.
I'd wager that it would be better to differentiate between malformed
multi-form data and internal server errors, and return 400 for incorrect
user inputs. This makes it much easier for anyone (that is not a
browser) to interface with this part of the API.


Below problems seem to originate in file_upload_multipart() or at least
reach this part of the code (and could therefore probably improve error
status communication for the user here):
* Using an unknown value for the "Content-Type" Http-Header or
  assigning boundary differently to what is used in the body
=> Connection dies with error "empty reply from server"

* Adding an unknown "metadata" field or changing an existing one (e.g.
  filename to fielname)
=> results in the same error as above

* Malforming the boundary for phase 1 (the second one in the http body)
  (Seems this is parsed incorrectly?)
=> 501 wrong field `name` for file upload, expected `filename` - abort
   upload

* Malforming the very last boundary
=> Connection dies, "empty reply from server".
   Syslog error: "problem with client [...] Connection timed out"


I am not quite sure how much precedence fixing the errors above should
have, as even most API users will probably never encounter them.
However, I feel like that at the very least the basic error message
should be revised to no longer return 501 and to include a more
descriptive error message than just "upload failed".

With no major problems and depending on how important the (edge-)cases
above are, consider this patch

Tested-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable
  2022-07-06 16:09   ` Daniel Tschlatscher
@ 2022-07-12 11:32     ` Matthias Heiserer
  0 siblings, 0 replies; 6+ messages in thread
From: Matthias Heiserer @ 2022-07-12 11:32 UTC (permalink / raw)
  To: pve-devel

On 06.07.2022 18:09, Daniel Tschlatscher wrote:
> All errors above return HTTP status 501 ("Not implemented") which I
> think is rather confusing, even if they include a descriptive error
> message.
> I'd wager that it would be better to differentiate between malformed
> multi-form data and internal server errors, and return 400 for incorrect
> user inputs. This makes it much easier for anyone (that is not a
> browser) to interface with this part of the API.
> 
> I am not quite sure how much precedence fixing the errors above should
> have, as even most API users will probably never encounter them.
> However, I feel like that at the very least the basic error message
> should be revised to no longer return 501 and to include a more
> descriptive error message than just "upload failed".
As we talked off list: The 501 error is what we currently return. I 
don't mind using more descriptive ones, but as it changes the API I'm 
not sure if this patch is the right place.
Thanks for testing!




^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] applied: [PATCH v2 http-server 1/2] AnyEvent: whitespace fix
  2022-05-13 13:48 [pve-devel] [PATCH v2 http-server 1/2] AnyEvent: whitespace fix Matthias Heiserer
  2022-05-13 13:49 ` [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable Matthias Heiserer
@ 2022-09-29 15:06 ` Thomas Lamprecht
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2022-09-29 15:06 UTC (permalink / raw)
  To: Proxmox VE development discussion, Matthias Heiserer

Am 13/05/2022 um 15:48 schrieb Matthias Heiserer:
> and remove unnecessary parentheses.
> 
> Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
> ---
> 
> Changes from v1:
> new patch
> 
>  src/PVE/APIServer/AnyEvent.pm | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pve-devel] applied: [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable
  2022-05-13 13:49 ` [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable Matthias Heiserer
  2022-07-06 16:09   ` Daniel Tschlatscher
@ 2022-09-29 15:13   ` Thomas Lamprecht
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2022-09-29 15:13 UTC (permalink / raw)
  To: Proxmox VE development discussion, Matthias Heiserer

Am 13/05/2022 um 15:49 schrieb Matthias Heiserer:
> == The problem
> Upload of files smaller than ~16kb failed.
> This was because the code assumed that it would be called
> several times, and each time would do a certain action.
> When the whole file was in the buffer, this failed because
> the function would try parssing the first part in the payload and
> then return, instead of parsing the rest of the available data.
> 
> == Why not just modifying the current code a bit?
> The code had a lot of nested control statements and a
> non-intuitive control flow (phase 0->2->1->1->1 and so on).
> 
> The way the phases and buffer content were checked made it
> rather difficult to just fix a few lines.
> 
> == What was changed
> * Part headers are parsed with a single regex line each,
>  which improves code readability.
> 
> * Parsing the content is done in order, so even if the whole data is in the buffer,
>  it can be read in one go. Files of arbitrary sizes can be uploaded.
> 
> == Tested with
> * Uploaded 0B, 1B, 14KB, 16KB, 1GB, 10GB, 20GB files
> 
> * Tested with all checksums and without
> 
> * Tested on firefox, chromium, and pvesh
> 
> I didn't do any fuzzing or automated upload testing.
> 
> == Drawbacks & Potential issues
> * Part headers are hardcoded, adding new ones requries modifying this file
> 
> == does not fix
> * upload can still time out
> 
> Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
> ---
> 
> Note:
> Regarding trim, I forgot to answer the mail.
> Trim is imo a good name, as several languagues (e.g. rust, javascript)
> use trim to mean mean "remove all whitespace, including newlines and such".
> I can send a v3 if that's a problem.
> 
> Changes from v1:
> * fix whitespace in separate patch
> * move trim into inline closure
> * correctly call trim
> * replace [^\S] with \S in regexes
> * improve trim regex: don't replace string
> * check for phase 1 once
> * remove regex comment
> 
>  src/PVE/APIServer/AnyEvent.pm | 146 ++++++++++++++++++----------------
>  1 file changed, 76 insertions(+), 70 deletions(-)
> 
>

applied, with slightly reworking the commit messages subject and Daniel's T-b tag,
thanks to both!

Note that I made some follow ups trying to improve a few things of your change and the
existing code. E.g.: $string in trim wasn't used anywhere, same for $eof in phase 2.
The content-disposition extraction could be factored in a small closure to reduce code
size and (IMO) legibility, and some other smaller style/formatting stuff.

None of that was a blocker, and due to the age of this series (sorry for that!) I
really did not want to bother for a v3 and applied changes myself. I re-tested it
with various sizes and functionality (checksum), but an additional pair of eyes
would be appreciated to ensure no regression snuck in.




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-09-29 15:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-13 13:48 [pve-devel] [PATCH v2 http-server 1/2] AnyEvent: whitespace fix Matthias Heiserer
2022-05-13 13:49 ` [pve-devel] [PATCH v2 http-server 2/2] AnyEvent: Fix #3990 - make small files uploadable Matthias Heiserer
2022-07-06 16:09   ` Daniel Tschlatscher
2022-07-12 11:32     ` Matthias Heiserer
2022-09-29 15:13   ` [pve-devel] applied: " Thomas Lamprecht
2022-09-29 15:06 ` [pve-devel] applied: [PATCH v2 http-server 1/2] AnyEvent: whitespace fix Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal