public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Lorenz Stechauner <l.stechauner@proxmox.com>
Subject: [pve-devel] applied: [PATCH v7 common 1/1] tools: add download_file_from_url
Date: Tue, 15 Jun 2021 14:25:40 +0200	[thread overview]
Message-ID: <29c62c46-8790-99ef-f089-3736377ef0f5@proxmox.com> (raw)
In-Reply-To: <20210614090557.33455-2-l.stechauner@proxmox.com>

On 14.06.21 11:05, Lorenz Stechauner wrote:
> code is based on
> manager:PVE/API2/Nodes.pm:aplinfo
> 

applied, with a slightly adapted commit message you send afterwards and some followups.

I'm a bit sorry to not check on this more closely again earlier as I found quite some
issues when finally wanting to apply this, I fixed up most of them in a followup but the
remaining part of the series needs to adapt.

> Signed-off-by: Lorenz Stechauner <l.stechauner@proxmox.com>
> ---
>  src/PVE/Tools.pm | 124 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 124 insertions(+)
> 
> diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm
> index 16ae3d2..7b82e00 100644
> --- a/src/PVE/Tools.pm
> +++ b/src/PVE/Tools.pm
> @@ -1829,4 +1829,128 @@ sub safe_compare {
>      return $cmp->($left, $right);
>  }
>  
> +
> +# opts
> +#  -> hash_required
> +#       if 1, at least one checksum has to be specified otherwise an error will be thrown
> +#  -> http_proxy
> +#  -> https_proxy
> +#  -> verify_certificates
> +#  -> sha(1|224|256|384|512)sum
> +#  -> md5sum
> +sub download_file_from_url {
> +    my ($dest, $url, $opts) = @_;
> +
> +    my $tmpdest = "$dest.tmp.$$";

If we'd kept the fork_worker here (see below) this wouldn't be ideal, as now it's using
the pveproxy worker pid, which can be shared by multiple in-flight requests, I'd have moved
it down into the worker sub which is actually it's own process and has a more unique pid..

> +
> +    my $algorithm;
> +    my $expected;
> +

prefixed above two variables with "checksum_" to avoid over general names in longer methods.

> +    for ('sha512', 'sha384', 'sha256', 'sha224', 'sha1', 'md5') {
> +	if (defined($opts->{"${_}sum"})) {
> +	    $algorithm = $_;
> +	    $expected = $opts->{"${_}sum"};
> +	    last;
> +	}
> +    }
> +
> +    die "checksum required but not specified\n" if ($opts->{hash_required} && !$algorithm);
> +
> +    my $worker = sub  {
> +	my $upid = shift;
> +
> +	print "downloading $url to $dest\n";
> +
> +	eval {
> +	    if (-f $dest && $algorithm) {
> +		print "calculating checksum of existing file...\n";
> +		my $correct = check_file_hash($algorithm, $expected, $dest);
> +
> +		if ($correct) {
> +		    print "file already exists, no need to download\n";
> +		    return;
> +		} else {
> +		    print "mismatch, downloading\n";

made a die "abort" out of above, IMO this is not really safe, admin should delete
the existing file first explicitly (we may relax this in the future if actually
requested by users, but as default I'd like to start out with safer behavior).

> +		}
> +	    }
> +
> +	    my @cmd = ('/usr/bin/wget', '--progress=dot:mega', '-O', $tmpdest, $url);

Changed to a array ref and dropped the /usr/bin/, which may be wrong (in theory) on some
systems.

> +
> +	    local %ENV;
> +	    if ($opts->{http_proxy}) {
> +		$ENV{http_proxy} = $opts->{http_proxy};
> +	    }
> +	    if ($opts->{https_proxy}) {
> +		$ENV{https_proxy} = $opts->{https_proxy};
> +	    }
> +
> +	    my $verify = $opts->{verify_certificates} // 1;
> +	    if (!$verify) {
> +		push @cmd, '--no-check-certificate';
> +	    }
> +
> +	    if (run_command([[@cmd]]) != 0) {
> +		die "download failed: $!\n";
> +	    }

why double array refs? I changed above three lines to:

run_command($cmd, errmsg => "download failed");

> +
> +	    if ($algorithm) {
> +		print "calculating checksum...\n";
> +
> +		my $correct = check_file_hash($algorithm, $expected, $tmpdest);
> +
> +		if ($correct) {
> +		    print "checksum verified\n";

not telling what was wrong is never good, i.e., the calculated checksum should be printed here.
I made 'check_file_hash' to 'get_file_hash' and let it just return the computed digest, that
way we do not need to pass expected either and just check ourself.

> +		} else {
> +		    die "checksum mismatch\n";
> +		}
> +	    } else {
> +		print "no checksum for verification specified\n";

nit: just noise, makes the task log unnecessarily longer 

> +	    }
> +
> +	    if (!rename($tmpdest, $dest)) {
> +		die "unable to save file: $!\n";

nit: it's already saved, so this error is misleading, I changed it to "unable to rename temporary file: $!"
to better reflect where the actual issues happened.

> +	    }
> +	};
> +	my $err = $@;
> +
> +	unlink $tmpdest;

this can fail too, and we should warn the admin about that 

> +
> +	if ($err) {
> +	    print "\n";
> +	    die $err;
> +	}
> +
> +	print "download finished\n";
> +    };
> +
> +    my $rpcenv = PVE::RPCEnvironment::get();

you use PVE::RPCEnvironment here but are missing an use statement, it works only as other
modules in pveproxy/pvedaemon include it and perl has no usage-hygiene.

But it cannot be "fixed" by just using this here, there's a reason that we have no other
fork_worker/RPCEnv use in PVE::Tools, PVE::RPCEnvironment is in pve-access-control, which
is a consumer of pve-common, so adding this here would create a circular dependency which
we want to avoid at all costs (makes bootstrapping a huge PITA without any benefit).

So I dropped the whole worker stuff, that needs to happen at the caller side.

> +    my $user = $rpcenv->get_user();
> +
> +    (my $filename = $dest) =~ s!.*/([^/]*)$!$1!;

not really safe, would break quite a bit if the filename contains a colon : or white-space,
this needs to be encoded

> +
> +    return $rpcenv->fork_worker('download', $filename, $user, $worker);
> +}
> +
> +sub check_file_hash {
> +    my ($algorithm, $expected, $filename) = @_;
> +
> +    my $algorithm_map = {
> +	'md5' => sub { Digest::MD5->new },
> +	'sha1' => sub { Digest::SHA->new(1) },
> +	'sha224' => sub { Digest::SHA->new(224) },
> +	'sha256' => sub { Digest::SHA->new(256) },
> +	'sha384' => sub { Digest::SHA->new(384) },
> +	'sha512' => sub { Digest::SHA->new(512) },

You use the Digest:: modules but never add an import use statement for that, meaning again
that it's up to luck (i.e., some other module imports it) if this works..

E.g., the following minimal test does not work even if the worker stuff is dropped:

perl -we 'use PVE::Tools; PVE::Tools::download_file_from_url("/tmp/foo",
  "http://download.proxmox.com/iso/proxmox-ve_3.4-102d4547-6.iso", {md5sum => "37efacd45b70d5d720b11b468f26136b"});'

> +    };
> +
> +    my $digester = $algorithm_map->{$algorithm}->() or die "unknown algorithm '$algorithm'\n";
> +
> +    open(my $fh, '<', $filename) or die "unable to open '$filename': $!\n";
> +    binmode($fh);
> +
> +    my $digest = $digester->addfile($fh)->hexdigest;
> +
> +    return lc($digest) eq lc($expected);
> +}
> +
>  1;
> 





  parent reply	other threads:[~2021-06-15 12:25 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-14  9:05 [pve-devel] [PATCH-SERIES v7 manager/common/storage] fix #1710: add downlaod from url button Lorenz Stechauner
2021-06-14  9:05 ` [pve-devel] [PATCH v7 common 1/1] tools: add download_file_from_url Lorenz Stechauner
2021-06-15  7:58   ` Lorenz Stechauner
2021-06-15 12:25   ` Thomas Lamprecht [this message]
2021-06-14  9:05 ` [pve-devel] [PATCH v7 storage 1/1] status: add download_url method Lorenz Stechauner
2021-06-15  7:58   ` Lorenz Stechauner
2021-06-14  9:05 ` [pve-devel] [PATCH v7 manager 1/5] api: nodes: add query_url_metadata method Lorenz Stechauner
2021-06-15  7:58   ` Lorenz Stechauner
2021-06-14  9:05 ` [pve-devel] [PATCH v7 manager 2/5] api: nodes: refactor aplinfo to use common download function Lorenz Stechauner
2021-06-15  7:57   ` Lorenz Stechauner
2021-06-14  9:05 ` [pve-devel] [PATCH v7 manager 3/5] ui: add HashAlgorithmSelector Lorenz Stechauner
2021-06-14  9:05 ` [pve-devel] [PATCH v7 manager 4/5] ui: Utils: change download task format Lorenz Stechauner
2021-06-14  9:05 ` [pve-devel] [PATCH v7 manager 5/5] fix #1710: ui: storage: add download from url button Lorenz Stechauner
2021-06-15  7:58   ` Lorenz Stechauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29c62c46-8790-99ef-f089-3736377ef0f5@proxmox.com \
    --to=t.lamprecht@proxmox.com \
    --cc=l.stechauner@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal