public inbox for pmg-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api] fix #3734: scrub 'url' from style tags/attributes
Date: Thu, 25 Nov 2021 14:28:14 +0100	[thread overview]
Message-ID: <7d1801cb-0039-d9b7-ea76-9e848511d697@proxmox.com> (raw)
In-Reply-To: <20211125112231.3403069-1-d.csapak@proxmox.com>

On 25.11.21 12:22, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
> 
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
> 
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  src/PMG/HTMLMail.pm | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
> index b69a596..987dc39 100644
> --- a/src/PMG/HTMLMail.pm
> +++ b/src/PMG/HTMLMail.pm
> @@ -15,8 +15,16 @@ use HTML::Scrubber;
>  use PMG::Utils;
>  use PMG::MIMEUtils;
>  
> +# $value is a ref to a string scalar
> +my sub remove_urls {
> +    my ($value) = @_;
> +    # remove all urls with a protocol, this leaves partially invalid
> +    # css, but prevents the browser from loading them
> +    $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi;

isn't the image() css function also problematic?
https://developer.mozilla.org/en-US/docs/Web/CSS/image/image()

And isn't this dangerous as an attacker could use this to craft the actual url,
for example:

urlurl(https://https:/my.super.evil.site.com/foo.png)

perl -we 'my $a = "urlurl(http://http:/my.super.evil.site.com/foo.png)"; $a =~ s|url\s*\(\s*(["]?)[a-z]+://|($1|gi; print "$a"'

prints:

url(https:/my.super.evil.site.com/foo.png)

So maybe rather just prefix the bad function names with __ or so, that would be safe
against such things, removing partial things won't ever be, at least not at the regex
level.

> +}
> +
>  sub dump_html {
> -    my ($tree, $cid_hash) = @_;
> +    my ($tree, $cid_hash, $viewimages) = @_;

I know this is the same name we use at caller side, but I'd still rather keep
local naming style: $view_images

>  
>      my @html = ();
>  
> @@ -37,6 +45,11 @@ sub dump_html {
>  			    $node->{src} = $datauri;
>  			}
>  		    }
> +		} elsif ($tag eq 'style' && !$viewimages) {
> +		    for my $el ($node->content_refs_list()) {
> +			next if ref $$el;
> +			remove_urls($el);
> +		    }

could be (and really just to let my line-bloat-reduction policy get out of hand):

		    remove_urls($_) for grep { !ref $$el } $node->content_refs_list();
>  		}
>  
>  		if($start) { # on the way in
> @@ -137,7 +150,13 @@ sub getscrubber {
>  	    span => 1,
>  	    src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
>  	    start => 1,
> -	    style => 1,
> +	    style => $viewimages ? 1 : sub {
> +		my ($obj, $tag_name, $attr_name, $value) = @_;
> +
> +		remove_urls(\$value);
> +
> +		return $value;
> +	    },

may be nicer do declare that separately and pass the code-ref directly by variable
or sub-ref-name here (not sure from top of my head if `my sub foo` can be passed along)

>  	    summary => 1,
>  	    tabindex => 1,
>  	    target => 1,
> @@ -267,7 +286,7 @@ sub entity_to_html {
>  	$tree->parse($raw);
>  	$tree->eof();
>  
> -	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
> +	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}, $viewimages); #scrubs style tags

the comment is confusing, dump_html does more than just scrubbing style tags,
doesn't it?

Seems also like we split "show images" logic now a bit more, maybe always pass $cid_hash
and decide in dump_html if it's used or not, as it's just a ref it wouldn't be any
real overhead. But, no to hard feeling here and I did not looked at the full picture
to closely so I may miss something that would make that a bad idea.


>  	$tree->delete;
>  
>  	# remove dangerous/unneeded elements
> 





      reply	other threads:[~2021-11-25 13:28 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-25 11:22 Dominik Csapak
2021-11-25 13:28 ` Thomas Lamprecht [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7d1801cb-0039-d9b7-ea76-9e848511d697@proxmox.com \
    --to=t.lamprecht@proxmox.com \
    --cc=d.csapak@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal