From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits))
	(No client certificate requested)
	by lists.proxmox.com (Postfix) with ESMTPS id 3F8B881B85
	for ; Thu, 25 Nov 2021 15:14:43 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 3D6ABE84F
	for ; Thu, 25 Nov 2021 15:14:43 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits))
	(No client certificate requested)
	by firstgate.proxmox.com (Proxmox) with ESMTPS id A4DD6E846
	for ; Thu, 25 Nov 2021 15:14:42 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
	by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 7F61046A8F
	for ; Thu, 25 Nov 2021 15:14:42 +0100 (CET)
From: Dominik Csapak
To: pmg-devel@lists.proxmox.com
Date: Thu, 25 Nov 2021 15:14:41 +0100
Message-Id: <20211125141441.1383250-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results: 0
	AWL 0.195 Adjusted score from AWL reputation of From: address
	BAYES_00 -1.9 Bayes spam probability is 0 to 1%
	KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
	SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS -0.001 SPF: sender matches SPF record
	URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked.
	See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
	for more information.
	[htmlmail.pm]
Subject: [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes
X-BeenThere: pmg-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Mail Gateway development discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 25 Nov 2021 14:14:43 -0000

if 'view images' for the quarantine is disabled, it is expected that
*no* images will be loaded. but in addition to img (src/href/etc.) also
css can load external images via the 'url' directive

since html scrubber does not parse/iterate over css, we simply remove
the url+protocol part of those tags/attributes. this technically leaves
behind invalid css, but the browsers should cope with that.
(we cannot 'cleanly' remove without much more effort because of quoting)

also we have to scrub the style tags in 'dump_html' since
HTML::Scrubber does not have a way to modify the *content* of a tag,
only the attributes...

Signed-off-by: Dominik Csapak
---
changes from v1:
* replace url with ___ and protocol:// with _ instead of removing
* move sub out and use the reference
* always pass $cid_hash and only use it in the function when
  $view_images is set
* improve comment to show what 'dump_html' does

@thomas: a note to our off-list discussion regarding url-encoding the
protocol: you *could* do it, but the browser does not recognize it as a
protocol and interprets it as a relative url, so we're safe on this
regard

 src/PMG/HTMLMail.pm | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
index b69a596..c5c94bf 100644
--- a/src/PMG/HTMLMail.pm
+++ b/src/PMG/HTMLMail.pm
@@ -15,8 +15,26 @@ use HTML::Scrubber;
 use PMG::Utils;
 use PMG::MIMEUtils;
 
+# $value is a ref to a string scalar
+my sub remove_urls {
+    my ($value) = @_;
+    # convert 'url([..])' to '___([..])' so the browser does not load it
+    $$value =~ s|url\(|___(|gi;
+
+    # similar for all protocols
+    $$value =~ s|[a-z0-9]+://|_|gi;
+}
+
+my sub remove_urls_from_attr {
+    my ($obj, $tag_name, $attr_name, $value) = @_;
+
+    remove_urls(\$value);
+
+    return $value;
+}
+
 sub dump_html {
-    my ($tree, $cid_hash) = @_;
+    my ($tree, $cid_hash, $view_images) = @_;
 
     my @html = ();
 
@@ -31,7 +49,7 @@ sub dump_html {
             # try to open a new window when user activates a anchor
             $node->{target} = '_blank' if $tag eq 'a';
 
-            if ($tag eq 'img') {
+            if ($tag eq 'img' && $view_images) {
                 if ($node->{src} && $node->{src} =~ m/^cid:(\S+)$/) {
                     if (my $datauri = $cid_hash->{$1}) {
                         $node->{src} = $datauri;
@@ -39,6 +57,10 @@ sub dump_html {
                 }
             }
 
+            if ($tag eq 'style' && !$view_images) {
+                remove_urls($_) for grep { !ref $$_ } $node->content_refs_list();
+            }
+
             if($start) { # on the way in
                 push(@html, $node->starttag);
             } else {
@@ -137,7 +159,7 @@ sub getscrubber {
         span => 1,
         src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
         start => 1,
-        style => 1,
+        style => $viewimages ? 1 : \&remove_urls_from_attr,
         summary => 1,
         tabindex => 1,
         target => 1,
@@ -267,7 +289,8 @@ sub entity_to_html {
     $tree->parse($raw);
     $tree->eof();
 
-    my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
+    # normalizes html, replaces CID references with data uris and scrubs style tags
+    my $whtml = dump_html($tree, $cid_hash, $viewimages);
     $tree->delete;
 
     # remove dangerous/unneeded elements
-- 
2.30.2
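
For illustration only, here is a minimal stand-alone sketch of what the two substitutions in remove_urls do to a style string. It is not part of the patch. The sample CSS and the example.com URL are made up, and a plain named sub is used instead of the lexical 'my sub' so that the snippet also runs on older perl versions.

```perl
use strict;
use warnings;

# Stand-alone copy of the two substitutions from the patch.
# $value is a ref to a string scalar, which is modified in place.
sub remove_urls {
    my ($value) = @_;

    # 'url(' becomes '___(' so the browser no longer sees a url() function
    $$value =~ s|url\(|___(|gi;

    # any 'protocol://' prefix becomes '_'
    $$value =~ s|[a-z0-9]+://|_|gi;
}

# hypothetical input, not taken from a real mail
my $style = q{background: url("https://example.com/a.png"); color: red};
remove_urls(\$style);
print "$style\n";
# prints: background: ___("_example.com/a.png"); color: red
```

As the commit message says, the result is no longer valid css, but nothing in it references a loadable external resource anymore. Both substitutions are case-insensitive, so 'URL(' and 'HTTP://' are covered as well.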