From: Friedrich Weber <f.weber@proxmox.com>
To: Stoiko Ivanov <s.ivanov@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api v2 2/2] ruledb: content-type: add flag for matching only based on magic/content
Date: Tue, 18 Feb 2025 18:18:13 +0100
Message-ID: <ab5bd9b1-cb08-4b77-8e1b-64eab07ac191@proxmox.com>
In-Reply-To: <20250218135416.54504-3-s.ivanov@proxmox.com>
On 18/02/2025 14:54, Stoiko Ivanov wrote:
> our current content-type matching is sensibly quite cautious in
> matching if any available information indicates a potential match:
> * mime-type detection based on file contents
> * mime-type detection based on file suffix
> * content-type header
>
> Sometimes this can lead to surprises (e.g. when a MUA sets the
> filetype of a pdf to application/octet-stream (the default type if no
> information is available), or a filter for zip-files matching
> docx-files).
>
> This change gives users the option to restrict matching only on the
> content as detected by xdg_mime_get_mime_type_for_data.
>
> This is a fix for the initial request in #2691 and addresses the
> suggestion from Friedrich from:
> https://bugzilla.proxmox.com/show_bug.cgi?id=5618#c2
Thanks for tackling this! I think having a flag like only-content makes
sense.
I tested this a bit and there seems to be one issue, steps to reproduce:
- add a What object with a Content Type Filter for application/pdf,
  enable the new "Ignore header information" flag
- create a rule that blocks incoming mails matching this What object
- send an email with a random 1K blob as attachment that sets
  Content-Type: application/pdf and some non-descriptive filename for the
  attachment:

    swaks --from [...] --to [...] --server [...] \
        --attach-type application/pdf --attach-name foo.bin \
        --attach <(dd if=/dev/urandom bs=1k count=1)

The email is blocked by the rule. But I would expect it to be accepted,
because `xdg_mime_get_mime_type_for_data` shouldn't recognize the
random blob as PDF, and the user-provided Content-Type application/pdf
should be ignored.
I think the email is blocked because the magic ct [1] defaults to the
user-provided Content-Type, and since `xdg_mime_get_mime_type_for_data`
returns application/octet-stream, we keep it at the user-provided
Content-Type. I guess it would be nicer if the magic ct didn't default to
the user-provided Content-Type if "Ignore header information" is
enabled, but I'm not sure how easily this can be done.
[1]
https://git.proxmox.com/?p=pmg-api.git;a=blob;f=src/PMG/Utils.pm;h=0b8945f245;hb=6bbc222#l623
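
To illustrate the behavior I would expect, here is a rough, untested
sketch. It is not the actual PMG::Utils code: `choose_magic_ct` and
`ct_matches` are made-up names, and the detected type is passed in as a
plain string instead of calling `xdg_mime_get_mime_type_for_data`. It
also glosses over the fact that the magic ct seems to be stored once on
the entity, so the flag of an individual What object is probably not
known at that point and a separate mark might be needed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PMG): pick the "magic" content type.
#   $detected_ct  - result of content sniffing, standing in for the
#                   return value of xdg_mime_get_mime_type_for_data
#   $header_ct    - Content-Type claimed by the sender
#   $only_content - true if "Ignore header information" is enabled
sub choose_magic_ct {
    my ($detected_ct, $header_ct, $only_content) = @_;

    my $generic = 'application/octet-stream';

    # sniffing produced a specific type: use it in any case
    return $detected_ct if defined($detected_ct) && $detected_ct ne $generic;

    # sniffing was inconclusive: with only-content set, do not fall back
    # to the sender-provided header
    return $generic if $only_content;

    # otherwise keep the header as default (current behavior, as far as
    # I understand it)
    return $header_ct // $generic;
}

# Same kind of match as in parse_entity: the filter value is a regex.
sub ct_matches {
    my ($ct, $pattern) = @_;
    return ($ct =~ m|$pattern|) ? 1 : 0;
}

# The random blob from the reproducer: sniffing is inconclusive, the
# header claims application/pdf.
for my $only_content (0, 1) {
    my $ct = choose_magic_ct('application/octet-stream', 'application/pdf', $only_content);
    my $hit = ct_matches($ct, 'application/pdf');
    print "only_content=$only_content magic_ct=$ct match=$hit\n";
}
```

With the fallback in place the blob matches the application/pdf filter;
without it, only content that is actually detected as PDF would match.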
>
> matches on the other items can be created with Match Field objects
> (for the content-type header) and Filename (for the match based on the
> provided filename) - combinations of those should give us the complete
> flexibility.
>
> inspired by the changes for disclaimer released with PMG 8.1:
> 51d1507 ("fix #2430: ruledb disclaimer: make separator configurable")
>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
> would be grateful for suggestions that are a better fit than 'only-content'!
>
> src/PMG/RuleDB/ContentTypeFilter.pm | 75 ++++++++++++++++++++++++++---
> 1 file changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/src/PMG/RuleDB/ContentTypeFilter.pm b/src/PMG/RuleDB/ContentTypeFilter.pm
> index 0199311..550a880 100644
> --- a/src/PMG/RuleDB/ContentTypeFilter.pm
> +++ b/src/PMG/RuleDB/ContentTypeFilter.pm
> @@ -26,7 +26,7 @@ sub otype_text {
> }
>
> sub new {
> - my ($type, $fvalue, $ogroup) = @_;
> + my ($type, $fvalue, $ogroup, $only_content) = @_;
>
> my $class = ref($type) || $type;
>
> @@ -36,6 +36,7 @@ sub new {
> }
>
> my $self = $class->SUPER::new('content-type', $fvalue, $ogroup);
> + $self->{only_content} = $only_content;
>
> return $self;
> }
> @@ -52,9 +53,50 @@ sub load_attr {
> $obj->{field_value} = $nt;
> }
>
> + my $sth = $ruledb->{dbh}->prepare(
> + "SELECT * FROM Attribut WHERE Object_ID = ?");
> +
> + $sth->execute($id);
> +
> + $obj->{only_content} = 0;
> +
> + while (my $ref = $sth->fetchrow_hashref()) {
> + if ($ref->{name} eq 'only_content') {
> + $obj->{only_content} = $ref->{value};
> + }
> + }
> +
> + $sth->finish();
> +
> + $obj->{id} = $id;
> +
> + $obj->{digest} = Digest::SHA::sha1_hex( $id, $value, $ogroup, $obj->{only_content});
> +
> return $obj;
> }
>
> +sub save {
> + my ($self, $ruledb) = @_;
> +
> + if (defined($self->{id})) {
> + #update - clean old attribut entries
> + $ruledb->{dbh}->do(
> + "DELETE FROM Attribut WHERE Object_ID = ?",
> + undef, $self->{id});
> + }
> +
> + $self->{id} = $self->SUPER::save($ruledb);
> +
> + if (defined($self->{only_content})) {
> + $ruledb->{dbh}->do(
> + "INSERT INTO Attribut (Value, Name, Object_ID) VALUES (?, 'only_content', ?) ".
> + "ON CONFLICT(Object_ID, Name) DO UPDATE SET Value = Excluded.Value ",
> + undef, $self->{only_content}, $self->{id});
> + }
> +
> + return $self->{id};
> +}
> +
> sub parse_entity {
> my ($self, $entity) = @_;
>
> @@ -78,12 +120,16 @@ sub parse_entity {
>
> my $glob_ct = $entity->{PMX_glob_ct};
>
> - if ($header_ct && $header_ct =~ m|$self->{field_value}|) {
> - push @$res, $id;
> - } elsif ($magic_ct && $magic_ct =~ m|$self->{field_value}|) {
> - push @$res, $id;
> - } elsif ($glob_ct && $glob_ct =~ m|$self->{field_value}|) {
> + my $check_only_content = ${self}->{only_content} // 1;
> +
> + if ($magic_ct && $magic_ct =~ m|$self->{field_value}|) {
> push @$res, $id;
> + } elsif (!$check_only_content) {
> + if ($header_ct && $header_ct =~ m|$self->{field_value}|) {
> + push @$res, $id;
> + } elsif ($glob_ct && $glob_ct =~ m|$self->{field_value}|) {
> + push @$res, $id;
> + }
> }
> }
>
> @@ -112,19 +158,34 @@ sub properties {
> pattern => '[0-9a-zA-Z\/\\\[\]\+\-\.\*\_]+',
> maxLength => 1024,
> },
> + 'only-content' => {
> + description => "use content-type from scanning only (ignore filename and header)",
> + type => 'boolean',
> + optional => 1,
> + default => 0,
> + },
> };
> }
>
> sub get {
> my ($self) = @_;
>
> - return { contenttype => $self->{field_value} };
> + return {
> + contenttype => $self->{field_value},
> + 'only-content' => $self->{only_content},
> + };
> }
>
> sub update {
> my ($self, $param) = @_;
>
> $self->{field_value} = $param->{contenttype};
> +
> + if (defined($param->{'only-content'}) && $param->{'only-content'} == 1) {
> + $self->{only_content} = 1;
> + } else {
> + delete $self->{only_content};
> + }
> }
>
> 1;
_______________________________________________
pmg-devel mailing list
pmg-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pmg-devel