From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <d.csapak@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 37CFD81A56
 for <pmg-devel@lists.proxmox.com>; Thu, 25 Nov 2021 12:23:03 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 24E90BADF
 for <pmg-devel@lists.proxmox.com>; Thu, 25 Nov 2021 12:22:33 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id F373ABAD3
 for <pmg-devel@lists.proxmox.com>; Thu, 25 Nov 2021 12:22:31 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id C338F46859
 for <pmg-devel@lists.proxmox.com>; Thu, 25 Nov 2021 12:22:31 +0100 (CET)
From: Dominik Csapak <d.csapak@proxmox.com>
To: pmg-devel@lists.proxmox.com
Date: Thu, 25 Nov 2021 12:22:31 +0100
Message-Id: <20211125112231.3403069-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.200 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [htmlmail.pm]
Subject: [pmg-devel] [PATCH pmg-api] fix #3734: scrub 'url' from style
 tags/attributes
X-BeenThere: pmg-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Mail Gateway development discussion
 <pmg-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pmg-devel>, 
 <mailto:pmg-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pmg-devel/>
List-Post: <mailto:pmg-devel@lists.proxmox.com>
List-Help: <mailto:pmg-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pmg-devel>, 
 <mailto:pmg-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 25 Nov 2021 11:23:03 -0000

if 'view images' is disabled for the quarantine, it is expected that
*no* images will be loaded. but in addition to img (src/href/etc.),
css can also load external images via the 'url' function

since HTML::Scrubber does not parse/iterate over css, we simply remove
the 'url' keyword and the protocol part from those tags/attributes. this
technically leaves invalid css behind, but browsers should cope with that.
(a 'clean' removal would take much more effort because of quoting)

we also have to scrub the style tags in 'dump_html', since HTML::Scrubber
has no way to modify the *content* of a tag, only its attributes.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
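note for reviewers: a minimal standalone sketch of what the substitution
in remove_urls does to a few css snippets. the sample strings are
hypothetical (not taken from a real mail), and a plain 'sub' is used
instead of 'my sub' so it also runs on older perls. as far as the pattern
goes, only urls with an explicit scheme followed by '://' are rewritten;
data: uris and scheme-relative '//host/...' urls appear to pass through
unchanged.

```perl
use strict;
use warnings;

# Same substitution as remove_urls() in the patch, copied here so the
# snippet runs on its own. $value is a ref to a string scalar.
sub remove_urls {
    my ($value) = @_;
    $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi;
}

# Hypothetical sample inputs for illustration only.
my @samples = (
    q{background: url(http://example.com/a.png)},      # scheme, unquoted
    q{background: URL( "https://example.com/b.png" )}, # scheme, quoted
    q{background: url(data:image/png;base64,AAAA)},    # no '://', untouched
    q{background: url(//example.com/c.png)},           # no scheme, untouched
);

for my $css (@samples) {
    my $scrubbed = $css;
    remove_urls(\$scrubbed);
    print "$css\n  => $scrubbed\n";
}
```

the first two samples come out as 'background: (example.com/a.png)' and
'background: ("example.com/b.png" )', i.e. invalid css that the browser
should not treat as an image reference.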
 src/PMG/HTMLMail.pm | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
index b69a596..987dc39 100644
--- a/src/PMG/HTMLMail.pm
+++ b/src/PMG/HTMLMail.pm
@@ -15,8 +15,16 @@ use HTML::Scrubber;
 use PMG::Utils;
 use PMG::MIMEUtils;
 
+# $value is a ref to a string scalar
+my sub remove_urls {
+    my ($value) = @_;
+    # remove all urls with a protocol, this leaves partially invalid
+    # css, but prevents the browser from loading them
+    $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi;
+}
+
 sub dump_html {
-    my ($tree, $cid_hash) = @_;
+    my ($tree, $cid_hash, $viewimages) = @_;
 
     my @html = ();
 
@@ -37,6 +45,11 @@ sub dump_html {
 			    $node->{src} = $datauri;
 			}
 		    }
+		} elsif ($tag eq 'style' && !$viewimages) {
+		    for my $el ($node->content_refs_list()) {
+			next if ref $$el;
+			remove_urls($el);
+		    }
 		}
 
 		if($start) { # on the way in
@@ -137,7 +150,13 @@ sub getscrubber {
 	    span => 1,
 	    src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
 	    start => 1,
-	    style => 1,
+	    style => $viewimages ? 1 : sub {
+		my ($obj, $tag_name, $attr_name, $value) = @_;
+
+		remove_urls(\$value);
+
+		return $value;
+	    },
 	    summary => 1,
 	    tabindex => 1,
 	    target => 1,
@@ -267,7 +286,7 @@ sub entity_to_html {
 	$tree->parse($raw);
 	$tree->eof();
 
-	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
+	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}, $viewimages); #scrubs style tags
 	$tree->delete;
 
 	# remove dangerous/unneeded elements
-- 
2.30.2