From: Friedrich Weber <f.weber@proxmox.com>
To: Stoiko Ivanov <s.ivanov@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api v2 2/2] ruledb: content-type: add flag for matching only based on magic/content
Date: Tue, 18 Feb 2025 18:18:13 +0100
Message-ID: <ab5bd9b1-cb08-4b77-8e1b-64eab07ac191@proxmox.com>
In-Reply-To: <20250218135416.54504-3-s.ivanov@proxmox.com>
On 18/02/2025 14:54, Stoiko Ivanov wrote:
> our current content-type matching is sensibly quite cautious in
> matching if any available information indicates a potential match:
> * mime-type detection based on file contents
> * mime-type detection based on file suffix
> * content-type header
>
> Sometimes this can lead to surprises, e.g. when a MUA sets the
> filetype of a pdf to application/octet-stream (the default type if no
> information is available), or a filter for zip-files matching
> docx-files.
>
> This change gives users the option to restrict matching only on the
> content as detected by xdg_mime_get_mime_type_for_data.
>
> This is a fix for the initial request in #2691 and addresses the
> suggestion from Friedrich from:
> https://bugzilla.proxmox.com/show_bug.cgi?id=5618#c2
Thanks for tackling this! I think having a flag like only-content makes
sense.
I tested this a bit and there seems to be one issue, steps to reproduce:
- add a What object with a Content Type Filter for application/pdf,
  enable the new "Ignore header information" flag
- create a rule that blocks incoming mails matching this What object
- send an email with a random 1K blob as attachment that sets
  Content-Type: application/pdf and some non-descriptive filename for the
  attachment:

  swaks --from [...] --to [...] --server [...] \
    --attach-type application/pdf --attach-name foo.bin \
    --attach <(dd if=/dev/urandom bs=1k count=1)

The email is blocked by the rule. But I would expect it to be accepted,
because `xdg_mime_get_mime_type_for_data` shouldn't recognize the
random blob as PDF, and the user-provided Content-Type application/pdf
should be ignored.
I think the email is blocked because the magic ct [1] defaults to the
user-provided Content-Type, and since `xdg_mime_get_mime_type_for_data`
returns application/octet-stream, we keep it at the user-provided
Content-Type. I guess it would be nicer if the magic ct didn't default to
the user-provided Content-Type if "Ignore header information" is
enabled, but I'm not sure how easily this can be done.
[1]
https://git.proxmox.com/?p=pmg-api.git;a=blob;f=src/PMG/Utils.pm;h=0b8945f245;hb=6bbc222#l623
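To illustrate the fallback described above, here is a minimal standalone
sketch. It is only a simplified model and not the actual code from
PMG::Utils: the helper name `pick_magic_ct` and its `$ignore_header`
parameter are made up for this example, and I'm assuming the detection
result is passed in as a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pmg-api): models how the "magic"
# content type could be chosen from the header value and the value
# detected from the file contents.
sub pick_magic_ct {
    my ($header_ct, $detected_ct, $ignore_header) = @_;

    # Assumed current behaviour: start from the header value as default.
    # With $ignore_header set, start without any default instead.
    my $magic_ct = $ignore_header ? undef : $header_ct;

    # Only override the default if detection found something more
    # specific than the generic fallback, or if there is no default.
    if (defined($detected_ct)
        && ($detected_ct ne 'application/octet-stream' || !$magic_ct)) {
        $magic_ct = $detected_ct;
    }

    return $magic_ct;
}

# Random blob sent with "Content-Type: application/pdf":
my $current = pick_magic_ct('application/pdf', 'application/octet-stream', 0);
my $wanted  = pick_magic_ct('application/pdf', 'application/octet-stream', 1);

print "current: $current\n";  # header value survives, PDF filter matches
print "wanted:  $wanted\n";   # generic type, PDF filter does not match
```

With the header default in place, the random blob ends up with
application/pdf as its magic ct, which would explain the match I'm
seeing. Without the default it stays at application/octet-stream.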
>
> matches on the other items can be created with Match Field objects
> (for the content-type header) and Filename (for the match based on the
> provided filename) - combinations of those should give us the complete
> flexibility.
>
> inspired by the changes for disclaimer released with PMG 8.1:
> 51d1507 ("fix #2430: ruledb disclaimer: make separator configurable")
>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
> would be grateful for suggestions that are a better fit than 'only-content'!
>
> src/PMG/RuleDB/ContentTypeFilter.pm | 75 ++++++++++++++++++++++++++---
> 1 file changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/src/PMG/RuleDB/ContentTypeFilter.pm b/src/PMG/RuleDB/ContentTypeFilter.pm
> index 0199311..550a880 100644
> --- a/src/PMG/RuleDB/ContentTypeFilter.pm
> +++ b/src/PMG/RuleDB/ContentTypeFilter.pm
> @@ -26,7 +26,7 @@ sub otype_text {
> }
>
> sub new {
> - my ($type, $fvalue, $ogroup) = @_;
> + my ($type, $fvalue, $ogroup, $only_content) = @_;
>
> my $class = ref($type) || $type;
>
> @@ -36,6 +36,7 @@ sub new {
> }
>
> my $self = $class->SUPER::new('content-type', $fvalue, $ogroup);
> + $self->{only_content} = $only_content;
>
> return $self;
> }
> @@ -52,9 +53,50 @@ sub load_attr {
> $obj->{field_value} = $nt;
> }
>
> + my $sth = $ruledb->{dbh}->prepare(
> + "SELECT * FROM Attribut WHERE Object_ID = ?");
> +
> + $sth->execute($id);
> +
> + $obj->{only_content} = 0;
> +
> + while (my $ref = $sth->fetchrow_hashref()) {
> + if ($ref->{name} eq 'only_content') {
> + $obj->{only_content} = $ref->{value};
> + }
> + }
> +
> + $sth->finish();
> +
> + $obj->{id} = $id;
> +
> + $obj->{digest} = Digest::SHA::sha1_hex( $id, $value, $ogroup, $obj->{only_content});
> +
> return $obj;
> }
>
> +sub save {
> + my ($self, $ruledb) = @_;
> +
> + if (defined($self->{id})) {
> + #update - clean old attribut entries
> + $ruledb->{dbh}->do(
> + "DELETE FROM Attribut WHERE Object_ID = ?",
> + undef, $self->{id});
> + }
> +
> + $self->{id} = $self->SUPER::save($ruledb);
> +
> + if (defined($self->{only_content})) {
> + $ruledb->{dbh}->do(
> + "INSERT INTO Attribut (Value, Name, Object_ID) VALUES (?, 'only_content', ?) ".
> + "ON CONFLICT(Object_ID, Name) DO UPDATE SET Value = Excluded.Value ",
> + undef, $self->{only_content}, $self->{id});
> + }
> +
> + return $self->{id};
> +}
> +
> sub parse_entity {
> my ($self, $entity) = @_;
>
> @@ -78,12 +120,16 @@ sub parse_entity {
>
> my $glob_ct = $entity->{PMX_glob_ct};
>
> - if ($header_ct && $header_ct =~ m|$self->{field_value}|) {
> - push @$res, $id;
> - } elsif ($magic_ct && $magic_ct =~ m|$self->{field_value}|) {
> - push @$res, $id;
> - } elsif ($glob_ct && $glob_ct =~ m|$self->{field_value}|) {
> + my $check_only_content = ${self}->{only_content} // 1;
> +
> + if ($magic_ct && $magic_ct =~ m|$self->{field_value}|) {
> push @$res, $id;
> + } elsif (!$check_only_content) {
> + if ($header_ct && $header_ct =~ m|$self->{field_value}|) {
> + push @$res, $id;
> + } elsif ($glob_ct && $glob_ct =~ m|$self->{field_value}|) {
> + push @$res, $id;
> + }
> }
> }
>
> @@ -112,19 +158,34 @@ sub properties {
> pattern => '[0-9a-zA-Z\/\\\[\]\+\-\.\*\_]+',
> maxLength => 1024,
> },
> + 'only-content' => {
> + description => "use content-type from scanning only (ignore filename and header)",
> + type => 'boolean',
> + optional => 1,
> + default => 0,
> + },
> };
> }
>
> sub get {
> my ($self) = @_;
>
> - return { contenttype => $self->{field_value} };
> + return {
> + contenttype => $self->{field_value},
> + 'only-content' => $self->{only_content},
> + };
> }
>
> sub update {
> my ($self, $param) = @_;
>
> $self->{field_value} = $param->{contenttype};
> +
> + if (defined($param->{'only-content'}) && $param->{'only-content'} == 1) {
> + $self->{only_content} = 1;
> + } else {
> + delete $self->{only_content};
> + }
> }
>
> 1;