From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id A989481A8D for ; Thu, 25 Nov 2021 14:28:48 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 952A5D9AC for ; Thu, 25 Nov 2021 14:28:18 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id 27DDCD99B for ; Thu, 25 Nov 2021 14:28:16 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id EB2EE46871 for ; Thu, 25 Nov 2021 14:28:15 +0100 (CET) Message-ID: <7d1801cb-0039-d9b7-ea76-9e848511d697@proxmox.com> Date: Thu, 25 Nov 2021 14:28:14 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Thunderbird/95.0 Content-Language: en-US To: Dominik Csapak , pmg-devel@lists.proxmox.com References: <20211125112231.3403069-1-d.csapak@proxmox.com> From: Thomas Lamprecht In-Reply-To: <20211125112231.3403069-1-d.csapak@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-SPAM-LEVEL: Spam detection results: 0 AWL 2.153 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment NICE_REPLY_A -4.1 Looks like a legit reply (A) SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [site.com, htmlmail.pm, http.com, mozilla.org] Subject: Re: [pmg-devel] [PATCH pmg-api] fix #3734: scrub 'url' from style tags/attributes X-BeenThere: pmg-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Mail Gateway development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Nov 2021 13:28:48 -0000 On 25.11.21 12:22, Dominik Csapak wrote: > if 'view images' for the quarantine is disabled, it is expected that > *no* images will be loaded. but in addition to img (src/href/etc.) > also css can load external images via the 'url' directive > > since html scrubber does not parse/iterate over css, we simply remove > the url+protocol part of those tags/attributes. this technically leaves behind > invalid css, but the browsers should cope with that. > (we cannot 'cleanly' remove without much more effort because of quoting) > > also we have to scrub the style tags in 'dump_html' since HTML::Scrubber > does not have a way to modify the *content* of a tag, only the > attributes... > > Signed-off-by: Dominik Csapak > --- > src/PMG/HTMLMail.pm | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm > index b69a596..987dc39 100644 > --- a/src/PMG/HTMLMail.pm > +++ b/src/PMG/HTMLMail.pm > @@ -15,8 +15,16 @@ use HTML::Scrubber; > use PMG::Utils; > use PMG::MIMEUtils; > > +# $value is a ref to a string scalar > +my sub remove_urls { > + my ($value) = @_; > + # remove all urls with a protocol, this leaves partially invalid > + # css, but prevents the browser from loading them > + $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi; isn't the image() css function also problematic? https://developer.mozilla.org/en-US/docs/Web/CSS/image/image() And isn't this dangerous as an attacker could use this to craft the actual url, for example: urlurl(https://https:/my.super.evil.site.com/foo.png) perl -we 'my $a = "urlurl(http://http:/my.super.evil.site.com/foo.png)"; $a =~ s|url\s*\(\s*(["]?)[a-z]+://|($1|gi; print "$a"' prints: url(https:/my.super.evil.site.com/foo.png) So maybe rather just prefix the bad function names with __ or so, that would be safe against such things, removing partial things won't ever be, at least not at the regex level. > +} > + > sub dump_html { > - my ($tree, $cid_hash) = @_; > + my ($tree, $cid_hash, $viewimages) = @_; I know this is the same name we use at caller side, but I'd still rather keep local naming style: $view_images > > my @html = (); > > @@ -37,6 +45,11 @@ sub dump_html { > $node->{src} = $datauri; > } > } > + } elsif ($tag eq 'style' && !$viewimages) { > + for my $el ($node->content_refs_list()) { > + next if ref $$el; > + remove_urls($el); > + } could be (and really just to let my line-bloat-reduction policy get out of hand): remove_urls($_) for grep { !ref $$el } $node->content_refs_list(); > } > > if($start) { # on the way in > @@ -137,7 +150,13 @@ sub getscrubber { > span => 1, > src => $viewimages ? qr{^(?!(?:java)?script)}i : 0, > start => 1, > - style => 1, > + style => $viewimages ? 1 : sub { > + my ($obj, $tag_name, $attr_name, $value) = @_; > + > + remove_urls(\$value); > + > + return $value; > + }, may be nicer do declare that separately and pass the code-ref directly by variable or sub-ref-name here (not sure from top of my head if `my sub foo` can be passed along) > summary => 1, > tabindex => 1, > target => 1, > @@ -267,7 +286,7 @@ sub entity_to_html { > $tree->parse($raw); > $tree->eof(); > > - my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}); > + my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}, $viewimages); #scrubs style tags the comment is confusing, dump_html does more than just scrubbing style tags, doesn't it? Seems also like we split "show images" logic now a bit more, maybe always pass $cid_hash and decide in dump_html if it's used or not, as it's just a ref it wouldn't be any real overhead. But, no to hard feeling here and I did not looked at the full picture to closely so I may miss something that would make that a bad idea. > $tree->delete; > > # remove dangerous/unneeded elements >