From: Thomas Lamprecht
To: Dominik Csapak, pmg-devel@lists.proxmox.com
Date: Thu, 25 Nov 2021 18:26:00 +0100
Subject: Re: [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes
Message-ID: <24bc3dce-0271-a982-d163-c885e9f92e8a@proxmox.com>
In-Reply-To: <20211125141441.1383250-1-d.csapak@proxmox.com>
References: <20211125141441.1383250-1-d.csapak@proxmox.com>

On 25.11.21 15:14, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
>
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
>
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
>

I found two issues (see inline). I have fully committed follow-ups here, but
I did not push them yet, so we can quickly check tomorrow whether that's OK for
you and that I missed nothing else.

> Signed-off-by: Dominik Csapak
> ---
> changes from v1:
> * replace url with ___ and protocol:// with _ instead of removing
> * move sub out and use the reference
> * always pass $cid_hash and only use it in the function when
>   $view_images is set
> * improve comment to show what 'dump_html' does
>
> @thomas: a note to our off-list discussion regarding url-encoding the
> protocol: you *could* do it, but the browser does not recognize it as
> a protocol and interprets it as a relative url, so we're safe on
> this regard

thx for checking!
>
>  src/PMG/HTMLMail.pm | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
> index b69a596..c5c94bf 100644
> --- a/src/PMG/HTMLMail.pm
> +++ b/src/PMG/HTMLMail.pm
> @@ -15,8 +15,26 @@ use HTML::Scrubber;
>  use PMG::Utils;
>  use PMG::MIMEUtils;
>
> +# $value is a ref to a string scalar
> +my sub remove_urls {
> +    my ($value) = @_;

$$value can be undef here, so I added a

    return if !defined $$value;

to avoid an ugly warning like:

pmgproxy[164923]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PMG/HTMLMail.pm line 22.
pmgproxy[164923]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PMG/HTMLMail.pm line 25.

every time one loads a mail where this happens.

> -- 8< snip 8< --
> @@ -137,7 +159,7 @@ sub getscrubber {
>      span => 1,
>      src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
>      start => 1,
> -    style => 1,
> +    style => $viewimages ? 1 : \remove_urls_from_attr,

this actually does not work as expected: to get the callback functionality we
need to set it in the "rules" one, not here in the "default" one, which really
is just the boolean default for that attribute type. Setting it to a code ref
may call it once, but because that returns undef it just plainly disables the
tag, which is way more scrubbing than we want to achieve.
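
For illustration, a minimal standalone sketch of the undef guard mentioned
above for remove_urls. The two substitutions are assumptions made up for this
example (the real ones are in the snipped part of the patch); they only follow
the "replace url with ___ and protocol:// with _" note from the changelog. A
plain named sub is used instead of "my sub" so it also runs on older perls.

```perl
use strict;
use warnings;

# $value is a ref to a string scalar, as in the patch.
# ASSUMPTION: both regexes below are illustrative placeholders,
# not the patterns from the actual patch.
sub remove_urls {
    my ($value) = @_;

    # nothing to scrub; s/// on an undefined value would emit a warning
    return if !defined $$value;

    # replace the CSS 'url' keyword in front of an opening parenthesis
    $$value =~ s/url\s*\(/___(/gi;
    # replace any 'protocol://' prefix
    $$value =~ s/[a-z][a-z0-9+.\-]*:\/\//_/gi;

    return;
}

my $style = 'background: url(http://example.com/x.png)';
remove_urls(\$style);
print "$style\n";    # prints: background: ___(_example.com/x.png)

my $empty;
remove_urls(\$empty);    # returns early, no "uninitialized value" warning
```

The early return keeps the function a no-op for attributes or tags without
content, which is all the guard is meant to do.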