public inbox for pmg-devel@lists.proxmox.com
* [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes
@ 2021-11-25 14:14 Dominik Csapak
  2021-11-25 17:26 ` Thomas Lamprecht
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dominik Csapak @ 2021-11-25 14:14 UTC (permalink / raw)
  To: pmg-devel

if 'view images' for the quarantine is disabled, it is expected that
*no* images will be loaded. but in addition to img (src/href/etc.)
also css can load external images via the 'url' directive

since html scrubber does not parse/iterate over css, we simply remove
the url+protocol part of those tags/attributes. this technically leaves behind
invalid css, but the browsers should cope with that.
(we cannot 'cleanly' remove without much more effort because of quoting)

also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
does not have a way to modify the *content* of a tag, only the
attributes...

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* replace url with ___ and protocol:// with _ instead of removing
* move sub out and use the reference
* always pass $cid_hash and only use it in the function when
  $view_images is set
* improve comment to show what 'dump_html' does

@thomas: a note to our off-list discussion regarding url-encoding the
protocol: you *could* do it, but the browser does not recognize it as
a protocol and interprets it as a relative url, so we're safe on
this regard

 src/PMG/HTMLMail.pm | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
index b69a596..c5c94bf 100644
--- a/src/PMG/HTMLMail.pm
+++ b/src/PMG/HTMLMail.pm
@@ -15,8 +15,26 @@ use HTML::Scrubber;
 use PMG::Utils;
 use PMG::MIMEUtils;
 
+# $value is a ref to a string scalar
+my sub remove_urls {
+    my ($value) = @_;
+    # convert 'url([..])' to '___([..])' so the browser does not load it
+    $$value =~ s|url\(|___(|gi;
+
+    # similar for all protocols
+    $$value =~ s|[a-z0-9]+://|_|gi;
+}
+
+my sub remove_urls_from_attr {
+    my ($obj, $tag_name, $attr_name, $value) = @_;
+
+    remove_urls(\$value);
+
+    return $value;
+}
+
 sub dump_html {
-    my ($tree, $cid_hash) = @_;
+    my ($tree, $cid_hash, $view_images) = @_;
 
     my @html = ();
 
@@ -31,7 +49,7 @@ sub dump_html {
 		# try to open a new window when user activates a anchor
 		$node->{target} = '_blank' if $tag eq 'a';
 
-		if ($tag eq 'img') {
+		if ($tag eq 'img' && $view_images) {
 		    if ($node->{src} && $node->{src} =~ m/^cid:(\S+)$/) {
 			if (my $datauri = $cid_hash->{$1}) {
 			    $node->{src} = $datauri;
@@ -39,6 +57,10 @@ sub dump_html {
 		    }
 		}
 
+		if ($tag eq 'style' && !$view_images) {
+		    remove_urls($_) for grep { !ref $$_ } $node->content_refs_list();
+		}
+
 		if($start) { # on the way in
 		    push(@html, $node->starttag);
 		} else {
@@ -137,7 +159,7 @@ sub getscrubber {
 	    span => 1,
 	    src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
 	    start => 1,
-	    style => 1,
+	    style => $viewimages ? 1 : \remove_urls_from_attr,
 	    summary => 1,
 	    tabindex => 1,
 	    target => 1,
@@ -267,7 +289,8 @@ sub entity_to_html {
 	$tree->parse($raw);
 	$tree->eof();
 
-	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
+	# normalizes html, replaces CID references with data uris and scrubs style tags
+	my $whtml = dump_html($tree, $cid_hash, $viewimages);
 	$tree->delete;
 
 	# remove dangerous/unneeded elements
-- 
2.30.2






* Re: [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes
  2021-11-25 14:14 [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes Dominik Csapak
@ 2021-11-25 17:26 ` Thomas Lamprecht
  2021-11-26  7:28 ` Thomas Lamprecht
  2021-11-26  9:07 ` [pmg-devel] applied: " Thomas Lamprecht
  2 siblings, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2021-11-25 17:26 UTC (permalink / raw)
  To: Dominik Csapak, pmg-devel

On 25.11.21 15:14, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
> 
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
> 
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
>

I found two issues (see inline). I have fully committed follow-ups here, but I
did not push them yet, so we can quickly check tomorrow whether it's OK for you
and whether I missed anything else.

> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * replace url with ___ and protocol:// with _ instead of removing
> * move sub out and use the reference
> * always pass $cid_hash and only use it in the function when
>   $view_images is set
> * improve comment to show what 'dump_html' does
> 
> @thomas: a note to our off-list discussion regarding url-encoding the
> protocol: you *could* do it, but the browser does not recognize it as
> a protocol and interprets it as a relative url, so we're safe on
> this regard

thx for checking!

> 
>  src/PMG/HTMLMail.pm | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
> index b69a596..c5c94bf 100644
> --- a/src/PMG/HTMLMail.pm
> +++ b/src/PMG/HTMLMail.pm
> @@ -15,8 +15,26 @@ use HTML::Scrubber;
>  use PMG::Utils;
>  use PMG::MIMEUtils;
>  
> +# $value is a ref to a string scalar
> +my sub remove_urls {
> +    my ($value) = @_;

$$value can be undef here, so I added a 
return if !defined $$value;

to avoid an ugly warning like:

pmgproxy[164923]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PMG/HTMLMail.pm line 22.
pmgproxy[164923]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PMG/HTMLMail.pm line 25.

every time one loads a mail where this happens.
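
To illustrate the guard, here is a minimal standalone sketch of the helper with the
early return added. It uses a plain named sub instead of `my sub` so it runs on older
perls, and the CSS input string is made up for the example:

```perl
use strict;
use warnings;

# $value is a ref to a string scalar, as in the patch
sub remove_urls {
    my ($value) = @_;
    # nothing to scrub (and no warning) if the referenced value is undef
    return if !defined $$value;
    # 'url(' becomes '___(' so the browser does not load anything
    $$value =~ s|url\(|___(|gi;
    # 'protocol://' becomes '_'
    $$value =~ s|[a-z0-9]+://|_|gi;
}

my $css = "background: URL('https://example.com/t.png')";
remove_urls(\$css);
print "$css\n";    # background: ___('_example.com/t.png')

my $missing;
remove_urls(\$missing);    # returns early, no 'uninitialized value' warning
```

The result is not valid CSS anymore, which is the trade-off the commit message
already mentions.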

> -- 8< snip 8< --

> @@ -137,7 +159,7 @@ sub getscrubber {
>  	    span => 1,
>  	    src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
>  	    start => 1,
> -	    style => 1,
> +	    style => $viewimages ? 1 : \remove_urls_from_attr,

this actually does not work as expected: to get the callback functionality we need
to set it in the "rules" section, not here in the "default" one, which really is just
the boolean default for that attribute type. Setting it to a code ref may call it
once, but because that returns undef it just plainly disables the tag, which is way
more scrubbing than we want to achieve.





* Re: [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes
  2021-11-25 14:14 [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes Dominik Csapak
  2021-11-25 17:26 ` Thomas Lamprecht
@ 2021-11-26  7:28 ` Thomas Lamprecht
  2021-11-26  7:51   ` Dominik Csapak
  2021-11-26  9:07 ` [pmg-devel] applied: " Thomas Lamprecht
  2 siblings, 1 reply; 5+ messages in thread
From: Thomas Lamprecht @ 2021-11-26  7:28 UTC (permalink / raw)
  To: Dominik Csapak, pmg-devel

On 25.11.21 15:14, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
> 
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
> 
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * replace url with ___ and protocol:// with _ instead of removing
> * move sub out and use the reference
> * always pass $cid_hash and only use it in the function when
>   $view_images is set
> * improve comment to show what 'dump_html' does
> 
> @thomas: a note to our off-list discussion regarding url-encoding the
> protocol: you *could* do it, but the browser does not recognize it as
> a protocol and interprets it as a relative url, so we're safe on
> this regard
> 

Another option: Setting the content security policy:

For this call we could use:
$resp->header("Content-Security-Policy", "default-src 'self'; style-src 'self' 'unsafe-inline';"); 

Maybe even:
"Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline';"

That works out quite well here.

In the long run the CSP is something we could evaluate in general, at least for API
calls, as only (mostly?) those contain dynamic, sometimes user/foreign controlled input.

If we wanted to set a CSP for everything, we'd need something like:

$resp->header("Content-Security-Policy", "default-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; img-src 'self' data:;");

to cover it, and ideally switch from 'unsafe-inline' to a nonce/sha approach of
whitelisting, but in any case, that is nothing we'd want to rush out now.

Also, doing both is an option: scrubbing avoids the requests in the first place, so
there are no scary errors in the browser console, and the CSP, set for just the
*quarantine/content API call, acts as a safety net.
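
As a rough sketch of the stricter variant: the helper below only assembles the
header value from an ordered list of directives. `build_csp` is a hypothetical name
and not part of any existing API; the resulting string would then be handed to the
`$resp->header("Content-Security-Policy", ...)` call shown above.

```perl
use strict;
use warnings;

# hypothetical helper: each argument is [directive, source, source, ...]
sub build_csp {
    my (@directives) = @_;
    return join(' ', map { join(' ', @$_) . ';' } @directives);
}

# deny everything by default, only allow inline styles of the mail itself
my $quarantine_csp = build_csp(
    ['default-src', "'none'"],
    ['style-src', "'unsafe-inline'"],
);

print "$quarantine_csp\n";    # default-src 'none'; style-src 'unsafe-inline';
```

Keeping the directives as a list makes it easier to later add an `img-src` entry
only when viewing images is enabled.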





* Re: [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes
  2021-11-26  7:28 ` Thomas Lamprecht
@ 2021-11-26  7:51   ` Dominik Csapak
  0 siblings, 0 replies; 5+ messages in thread
From: Dominik Csapak @ 2021-11-26  7:51 UTC (permalink / raw)
  To: Thomas Lamprecht, pmg-devel

On 11/25/21 18:26, Thomas Lamprecht wrote:
 > On 25.11.21 15:14, Dominik Csapak wrote:
 >> if 'view images' for the quarantine is disabled, it is expected that
 >> *no* images will be loaded. but in addition to img (src/href/etc.)
 >> also css can load external images via the 'url' directive
 >>
 >> since html scrubber does not parse/iterate over css, we simply remove
 >> the url+protocol part of those tags/attributes. this technically leaves behind
 >> invalid css, but the browsers should cope with that.
 >> (we cannot 'cleanly' remove without much more effort because of quoting)
 >>
 >> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
 >> does not have a way to modify the *content* of a tag, only the
 >> attributes...
 >>
 >
 > I found two issues (see inline), I got fully commited followups here, but I did
 > not push that yet so we can quick check tomorrow if its ok for you and that I
 > missed nothing else.
 >
 >> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
 >> ---
 >> changes from v1:
 >> * replace url with ___ and protocol:// with _ instead of removing
 >> * move sub out and use the reference
 >> * always pass $cid_hash and only use it in the function when
 >>    $view_images is set
 >> * improve comment to show what 'dump_html' does
 >>
 >> @thomas: a note to our off-list discussion regarding url-encoding the
 >> protocol: you *could* do it, but the browser does not recognize it as
 >> a protocol and interprets it as a relative url, so we're safe on
 >> this regard
 >
 > thx for checking!
 >
 >>
 >>   src/PMG/HTMLMail.pm | 31 +++++++++++++++++++++++++++----
 >>   1 file changed, 27 insertions(+), 4 deletions(-)
 >>
 >> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
 >> index b69a596..c5c94bf 100644
 >> --- a/src/PMG/HTMLMail.pm
 >> +++ b/src/PMG/HTMLMail.pm
 >> @@ -15,8 +15,26 @@ use HTML::Scrubber;
 >>   use PMG::Utils;
 >>   use PMG::MIMEUtils;
 >>
 >> +# $value is a ref to a string scalar
 >> +my sub remove_urls {
 >> +    my ($value) = @_;
 >
 > $$value can be undef here, so I added a
 > return if !defined $$value;
 >
 > to avoid a ugly warning like:
 >
 > pmgproxy[164923]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PMG/HTMLMail.pm line 22.
 > pmgproxy[164923]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PMG/HTMLMail.pm line 25.
 >
 > every time one loads a mail where this happens.

makes sense
 >
 >> -- 8< snip 8< --
 >
 >> @@ -137,7 +159,7 @@ sub getscrubber {
 >>   	    span => 1,
 >>   	    src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
 >>   	    start => 1,
 >> -	    style => 1,
 >> +	    style => $viewimages ? 1 : \remove_urls_from_attr,
 >
 > this actually does not works as expected, to get the callback functionality we need
 > to set it at the "rules" one, not here at the "default" one, which really just is the
 > boolean default for that attribute type, and setting it to an code ref-makes may call
 > it once but due to returning undef it just plainly disables the tag, which is way more
 > scrubbing than we want to achieve.
 >

ok, so you're right that it does not work correctly here
(idk why i did not catch that), but we can use a sub here

the 'default' entry contains the default rules, otherwise the regexes
would not work there either

what prevents it from working here is that it does not like the
reference to the function, so making it a
my $remove_urls_from_attr = sub {}; and passing '$remove_urls_from_attr'
here works as expected. (i'll send a v3)
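
A minimal sketch of that variant, with the callback stored in a lexical. The
`apply_rule` sub is only a hypothetical stand-in for whatever the scrubber does with
an attribute rule (true/false value or code ref); it is not HTML::Scrubber's actual
code, and the callback body simply inlines the two substitutions from the patch:

```perl
use strict;
use warnings;

# anonymous sub, so the variable itself holds a CODE reference
my $remove_urls_from_attr = sub {
    my ($obj, $tag_name, $attr_name, $value) = @_;
    return undef if !defined $value;
    $value =~ s|url\(|___(|gi;
    $value =~ s|[a-z0-9]+://|_|gi;
    return $value;
};

my $viewimages = 0;
my %default_attrs = (
    span  => 1,
    style => $viewimages ? 1 : $remove_urls_from_attr,
);

# hypothetical stand-in for the scrubber's attribute handling
sub apply_rule {
    my ($rule, $tag, $attr, $value) = @_;
    return $rule->(undef, $tag, $attr, $value) if ref($rule) eq 'CODE';
    return $rule ? $value : undef;
}

print apply_rule($default_attrs{style}, 'div', 'style', 'background:url(http://x/y.png)'), "\n";
# background:___(_x/y.png)
```

With `$viewimages` set to a true value the rule is just 1 and the attribute value
passes through unchanged.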

On 11/26/21 08:28, Thomas Lamprecht wrote:
> On 25.11.21 15:14, Dominik Csapak wrote:
>> if 'view images' for the quarantine is disabled, it is expected that
>> *no* images will be loaded. but in addition to img (src/href/etc.)
>> also css can load external images via the 'url' directive
>>
>> since html scrubber does not parse/iterate over css, we simply remove
>> the url+protocol part of those tags/attributes. this technically leaves behind
>> invalid css, but the browsers should cope with that.
>> (we cannot 'cleanly' remove without much more effort because of quoting)
>>
>> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
>> does not have a way to modify the *content* of a tag, only the
>> attributes...
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> changes from v1:
>> * replace url with ___ and protocol:// with _ instead of removing
>> * move sub out and use the reference
>> * always pass $cid_hash and only use it in the function when
>>    $view_images is set
>> * improve comment to show what 'dump_html' does
>>
>> @thomas: a note to our off-list discussion regarding url-encoding the
>> protocol: you *could* do it, but the browser does not recognize it as
>> a protocol and interprets it as a relative url, so we're safe on
>> this regard
>>
> 
> Another option: Setting the content security policy:
> 
> For this call we could use:
> $resp->header("Content-Security-Policy", "default-src 'self'; style-src 'self' 'unsafe-inline';");
> 
> Maybe even;
> "Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline';"
> 
> That works out quite well here.
> 
> In the long run the CSP is something we could evaluate in general, at least for API
> calls, as only (mostly?) those contain dynamic, sometimes user/foreign controlled input.
> 
> If we would like to set a CSP for everything we'd need something like:
> 
> $resp->header("Content-Security-Policy", "default-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; img-src 'self' data:;");
> 
> to cover it, and ideally switch from 'unsafe-inline' to nonce/sha approach of
> whitelisting, but in any case, nothing that we'd want to rush out now.
> 
> Also, doing both is an option, avoiding requests in the first place, so no scary
> errors in the browser console, and the CSP, for really just this the *quarantine/content
> API call, as safety net..
> 

yes, imho setting the csp is also good, but it requires patching http-server
too, so that we can set the header there.






* [pmg-devel] applied: [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes
  2021-11-25 14:14 [pmg-devel] [PATCH pmg-api v2] fix #3734: scrub 'url' from style tags/attributes Dominik Csapak
  2021-11-25 17:26 ` Thomas Lamprecht
  2021-11-26  7:28 ` Thomas Lamprecht
@ 2021-11-26  9:07 ` Thomas Lamprecht
  2 siblings, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2021-11-26  9:07 UTC (permalink / raw)
  To: Dominik Csapak, pmg-devel

On 25.11.21 15:14, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
> 
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
> 
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * replace url with ___ and protocol:// with _ instead of removing
> * move sub out and use the reference
> * always pass $cid_hash and only use it in the function when
>   $view_images is set
> * improve comment to show what 'dump_html' does
> 
> @thomas: a note to our off-list discussion regarding url-encoding the
> protocol: you *could* do it, but the browser does not recognize it as
> a protocol and interprets it as a relative url, so we're safe on
> this regard
> 
>  src/PMG/HTMLMail.pm | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
>

OK, so I went down the wrong road because of how the code ref was passed:
ref(\foo) is SCALAR whereas ref(\&foo) is CODE, and that tripped up the scrubber.
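
For the record, a small standalone sketch of that difference. `scrub_attr` is a
made-up callback for the example; because it is declared before use, the bare name
after the backslash is parsed as a call:

```perl
use strict;
use warnings;

# hypothetical attribute callback, declared before use
sub scrub_attr {
    my ($obj, $tag_name, $attr_name, $value) = @_;
    return undef if !defined $value;
    $value =~ s|url\(|___(|gi;
    return $value;
}

# calls scrub_attr() without arguments and references its (undef) return value
my $result_ref = \scrub_attr;
# references the sub itself
my $code_ref = \&scrub_attr;

print ref($result_ref), "\n";    # SCALAR
print ref($code_ref), "\n";      # CODE
print $code_ref->(undef, 'p', 'style', 'background: url(x.png)'), "\n";
# background: ___(x.png)
```

So anything that checks for a CODE reference only sees a callback in the second
form.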

So after a pair debugging/understanding session with Dominik (thx!) I now:
* appreciate our perl code way more, as Scrubber shows that one can write it in a
  way more cryptic and harder-to-grasp style

* understand that the style scrubbing now works pretty well; I only fixed the undef
  value case in the url remover and the passing of the code ref

applied, thanks!




