From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api] fix #3734: scrub 'url' from style tags/attributes
Date: Thu, 25 Nov 2021 14:28:14 +0100 [thread overview]
Message-ID: <7d1801cb-0039-d9b7-ea76-9e848511d697@proxmox.com> (raw)
In-Reply-To: <20211125112231.3403069-1-d.csapak@proxmox.com>
On 25.11.21 12:22, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
>
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
>
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> src/PMG/HTMLMail.pm | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
> index b69a596..987dc39 100644
> --- a/src/PMG/HTMLMail.pm
> +++ b/src/PMG/HTMLMail.pm
> @@ -15,8 +15,16 @@ use HTML::Scrubber;
> use PMG::Utils;
> use PMG::MIMEUtils;
>
> +# $value is a ref to a string scalar
> +my sub remove_urls {
> + my ($value) = @_;
> + # remove all urls with a protocol, this leaves partially invalid
> + # css, but prevents the browser from loading them
> + $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi;
isn't the image() css function also problematic?
https://developer.mozilla.org/en-US/docs/Web/CSS/image/image()
And isn't this dangerous as an attacker could use this to craft the actual url,
for example:
urlurl(https://https:/my.super.evil.site.com/foo.png)
perl -we 'my $a = "urlurl(http://http:/my.super.evil.site.com/foo.png)"; $a =~ s|url\s*\(\s*(["]?)[a-z]+://|($1|gi; print "$a"'
prints:
url(https:/my.super.evil.site.com/foo.png)
So maybe rather just prefix the bad function names with __ or so, that would be safe
against such things, removing partial things won't ever be, at least not at the regex
level.
> +}
> +
> sub dump_html {
> - my ($tree, $cid_hash) = @_;
> + my ($tree, $cid_hash, $viewimages) = @_;
I know this is the same name we use at caller side, but I'd still rather keep
local naming style: $view_images
>
> my @html = ();
>
> @@ -37,6 +45,11 @@ sub dump_html {
> $node->{src} = $datauri;
> }
> }
> + } elsif ($tag eq 'style' && !$viewimages) {
> + for my $el ($node->content_refs_list()) {
> + next if ref $$el;
> + remove_urls($el);
> + }
could be (and really just to let my line-bloat-reduction policy get out of hand):
remove_urls($_) for grep { !ref $$el } $node->content_refs_list();
> }
>
> if($start) { # on the way in
> @@ -137,7 +150,13 @@ sub getscrubber {
> span => 1,
> src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
> start => 1,
> - style => 1,
> + style => $viewimages ? 1 : sub {
> + my ($obj, $tag_name, $attr_name, $value) = @_;
> +
> + remove_urls(\$value);
> +
> + return $value;
> + },
may be nicer do declare that separately and pass the code-ref directly by variable
or sub-ref-name here (not sure from top of my head if `my sub foo` can be passed along)
> summary => 1,
> tabindex => 1,
> target => 1,
> @@ -267,7 +286,7 @@ sub entity_to_html {
> $tree->parse($raw);
> $tree->eof();
>
> - my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
> + my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}, $viewimages); #scrubs style tags
the comment is confusing, dump_html does more than just scrubbing style tags,
doesn't it?
Seems also like we split "show images" logic now a bit more, maybe always pass $cid_hash
and decide in dump_html if it's used or not, as it's just a ref it wouldn't be any
real overhead. But, no to hard feeling here and I did not looked at the full picture
to closely so I may miss something that would make that a bad idea.
> $tree->delete;
>
> # remove dangerous/unneeded elements
>
prev parent reply other threads:[~2021-11-25 13:28 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-25 11:22 Dominik Csapak
2021-11-25 13:28 ` Thomas Lamprecht [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7d1801cb-0039-d9b7-ea76-9e848511d697@proxmox.com \
--to=t.lamprecht@proxmox.com \
--cc=d.csapak@proxmox.com \
--cc=pmg-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox