all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api] fix #3734: scrub 'url' from style tags/attributes
Date: Thu, 25 Nov 2021 14:28:14 +0100	[thread overview]
Message-ID: <7d1801cb-0039-d9b7-ea76-9e848511d697@proxmox.com> (raw)
In-Reply-To: <20211125112231.3403069-1-d.csapak@proxmox.com>

On 25.11.21 12:22, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
> 
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
> 
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  src/PMG/HTMLMail.pm | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
> index b69a596..987dc39 100644
> --- a/src/PMG/HTMLMail.pm
> +++ b/src/PMG/HTMLMail.pm
> @@ -15,8 +15,16 @@ use HTML::Scrubber;
>  use PMG::Utils;
>  use PMG::MIMEUtils;
>  
> +# $value is a ref to a string scalar
> +my sub remove_urls {
> +    my ($value) = @_;
> +    # remove all urls with a protocol, this leaves partially invalid
> +    # css, but prevents the browser from loading them
> +    $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi;

isn't the image() css function also problematic?
https://developer.mozilla.org/en-US/docs/Web/CSS/image/image()

And isn't this dangerous as an attacker could use this to craft the actual url,
for example:

urlurl(https://https:/my.super.evil.site.com/foo.png)

perl -we 'my $a = "urlurl(http://http:/my.super.evil.site.com/foo.png)"; $a =~ s|url\s*\(\s*(["]?)[a-z]+://|($1|gi; print "$a"'

prints:

url(https:/my.super.evil.site.com/foo.png)

So maybe rather just prefix the bad function names with __ or so, that would be safe
against such things, removing partial things won't ever be, at least not at the regex
level.

> +}
> +
>  sub dump_html {
> -    my ($tree, $cid_hash) = @_;
> +    my ($tree, $cid_hash, $viewimages) = @_;

I know this is the same name we use at caller side, but I'd still rather keep
local naming style: $view_images

>  
>      my @html = ();
>  
> @@ -37,6 +45,11 @@ sub dump_html {
>  			    $node->{src} = $datauri;
>  			}
>  		    }
> +		} elsif ($tag eq 'style' && !$viewimages) {
> +		    for my $el ($node->content_refs_list()) {
> +			next if ref $$el;
> +			remove_urls($el);
> +		    }

could be (and really just to let my line-bloat-reduction policy get out of hand):

		    remove_urls($_) for grep { !ref $$el } $node->content_refs_list();
>  		}
>  
>  		if($start) { # on the way in
> @@ -137,7 +150,13 @@ sub getscrubber {
>  	    span => 1,
>  	    src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
>  	    start => 1,
> -	    style => 1,
> +	    style => $viewimages ? 1 : sub {
> +		my ($obj, $tag_name, $attr_name, $value) = @_;
> +
> +		remove_urls(\$value);
> +
> +		return $value;
> +	    },

may be nicer do declare that separately and pass the code-ref directly by variable
or sub-ref-name here (not sure from top of my head if `my sub foo` can be passed along)

>  	    summary => 1,
>  	    tabindex => 1,
>  	    target => 1,
> @@ -267,7 +286,7 @@ sub entity_to_html {
>  	$tree->parse($raw);
>  	$tree->eof();
>  
> -	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
> +	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}, $viewimages); #scrubs style tags

the comment is confusing, dump_html does more than just scrubbing style tags,
doesn't it?

Seems also like we split "show images" logic now a bit more, maybe always pass $cid_hash
and decide in dump_html if it's used or not, as it's just a ref it wouldn't be any
real overhead. But, no to hard feeling here and I did not looked at the full picture
to closely so I may miss something that would make that a bad idea.


>  	$tree->delete;
>  
>  	# remove dangerous/unneeded elements
> 





      reply	other threads:[~2021-11-25 13:28 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-25 11:22 Dominik Csapak
2021-11-25 13:28 ` Thomas Lamprecht [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7d1801cb-0039-d9b7-ea76-9e848511d697@proxmox.com \
    --to=t.lamprecht@proxmox.com \
    --cc=d.csapak@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal