From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [pmg-devel] [PATCH pmg-api v5 3/4] ruledb: content-type: add flag for matching only based on magic/content
Date: Fri, 21 Feb 2025 17:48:17 +0100
Message-ID: <20250221164821.207845-4-s.ivanov@proxmox.com>
In-Reply-To: <20250221164821.207845-1-s.ivanov@proxmox.com>

Our current content-type matching is deliberately cautious: it matches
if any of the available information indicates a potential match:
* mime-type detection based on file contents
* mime-type detection based on file suffix
* content-type header

Sometimes this can lead to surprises, e.g. when a MUA sets the
filetype of a pdf to application/octet-stream (the default type if no
information is available), or when a filter for zip-files matches
docx-files.

This change gives users the option to restrict matching to the
content type as detected by xdg_mime_get_mime_type_for_data.
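
As a rough illustration of the two modes (a standalone sketch, not the
code from this patch): content_type_matches and the hash keys magic,
header and glob are hypothetical names chosen for this example, and the
sample attachment is made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of pmg-api: decide whether a filter
# pattern matches an attachment, given its three content-type sources.
sub content_type_matches {
    my ($pattern, $only_content, $types) = @_;

    # The type detected from the file contents is always considered.
    my $magic = $types->{magic};
    return 1 if $magic && $magic =~ m|$pattern|;

    # In only-content mode the header and the file suffix are ignored.
    return 0 if $only_content;

    for my $source (qw(header glob)) {
        my $ct = $types->{$source};
        return 1 if $ct && $ct =~ m|$pattern|;
    }
    return 0;
}

# Assumed example: a pdf that the MUA labelled application/octet-stream.
my $pdf = {
    magic  => 'application/pdf',
    glob   => 'application/pdf',
    header => 'application/octet-stream',
};

for my $only_content (0, 1) {
    my $hit = content_type_matches('application/octet-stream', $only_content, $pdf);
    print "only_content=$only_content: ", ($hit ? "match" : "no match"), "\n";
}
```

With the flag unset, the octet-stream filter matches the pdf through
its header; with the flag set, only the detected type counts and the
filter no longer matches.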

This is a fix for the initial request in #2691 and addresses the
suggestion from Friedrich from:
https://bugzilla.proxmox.com/show_bug.cgi?id=5618#c2

Matches on the other items can be created with Match Field objects
(for the content-type header) and Filename objects (for the match
based on the provided filename); combinations of those should give us
complete flexibility.

inspired by the changes for disclaimer released with PMG 8.1:
51d1507 ("fix #2430: ruledb disclaimer: make separator configurable")

Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/PMG/RuleDB/ContentTypeFilter.pm | 75 ++++++++++++++++++++++++++---
 1 file changed, 68 insertions(+), 7 deletions(-)

diff --git a/src/PMG/RuleDB/ContentTypeFilter.pm b/src/PMG/RuleDB/ContentTypeFilter.pm
index fb45e95..e44bf3c 100644
--- a/src/PMG/RuleDB/ContentTypeFilter.pm
+++ b/src/PMG/RuleDB/ContentTypeFilter.pm
@@ -26,7 +26,7 @@ sub otype_text {
 }
 
 sub new {
-    my ($type, $fvalue, $ogroup) = @_;
+    my ($type, $fvalue, $ogroup, $only_content) = @_;
 
     my $class = ref($type) || $type;
 
@@ -36,6 +36,7 @@ sub new {
     }
 
     my $self = $class->SUPER::new('content-type', $fvalue, $ogroup);
+    $self->{only_content} = $only_content;
 
     return $self;
 }
@@ -52,9 +53,50 @@ sub load_attr {
 	$obj->{field_value} = $nt;
     }
 
+    my $sth = $ruledb->{dbh}->prepare(
+	"SELECT * FROM Attribut WHERE Object_ID = ?");
+
+    $sth->execute($id);
+
+    $obj->{only_content} = 0;
+
+    while (my $ref = $sth->fetchrow_hashref()) {
+	if ($ref->{name} eq 'only_content') {
+	    $obj->{only_content} = $ref->{value};
+	}
+    }
+
+    $sth->finish();
+
+    $obj->{id} = $id;
+
+    $obj->{digest} = Digest::SHA::sha1_hex( $id, $value, $ogroup, $obj->{only_content});
+
     return $obj;
 }
 
+sub save {
+    my ($self, $ruledb) = @_;
+
+    if (defined($self->{id})) {
+	#update - clean old attribut entries
+	$ruledb->{dbh}->do(
+	    "DELETE FROM Attribut WHERE Object_ID = ?",
+	    undef, $self->{id});
+    }
+
+    $self->{id} = $self->SUPER::save($ruledb);
+
+    if (defined($self->{only_content})) {
+	$ruledb->{dbh}->do(
+	    "INSERT INTO Attribut (Value, Name, Object_ID) VALUES (?, 'only_content', ?) ".
+	    "ON CONFLICT(Object_ID, Name) DO UPDATE SET Value = Excluded.Value ",
+	    undef, $self->{only_content},  $self->{id});
+    }
+
+    return $self->{id};
+}
+
 sub parse_entity {
     my ($self, $entity) = @_;
 
@@ -78,12 +120,16 @@ sub parse_entity {
 
 	my $glob_ct = $entity->{PMX_glob_ct};
 
-	if ($header_ct && $header_ct =~ m|$self->{field_value}|) {
-	    push @$res, $id;
-	} elsif ($magic_ct && $magic_ct =~ m|$self->{field_value}|) {
-	    push @$res, $id;
-	} elsif ($glob_ct && $glob_ct =~ m|$self->{field_value}|) {
+	my $check_only_content = ${self}->{only_content} // 1;
+
+	if ($magic_ct && $magic_ct =~ m|$self->{field_value}|) {
 	    push @$res, $id;
+	} elsif (!$check_only_content) {
+	    if ($header_ct && $header_ct =~ m|$self->{field_value}|) {
+		push @$res, $id;
+	    } elsif ($glob_ct && $glob_ct =~ m|$self->{field_value}|) {
+		push @$res, $id;
+	    }
 	}
     }
 
@@ -112,19 +158,34 @@ sub properties {
 	    pattern => '[0-9a-zA-Z\/\\\[\]\+\-\.\*\_]+',
 	    maxLength => 1024,
 	},
+	'only-content' => {
+	    description => "use content-type from scanning only (ignore filename and header)",
+	    type => 'boolean',
+	    optional => 1,
+	    default => 0,
+	},
     };
 }
 
 sub get {
     my ($self) = @_;
 
-    return { contenttype => $self->{field_value} };
+    return {
+	contenttype => $self->{field_value},
+	'only-content' => $self->{only_content},
+    };
 }
 
 sub update {
     my ($self, $param) = @_;
 
     $self->{field_value} = $param->{contenttype};
+
+    if (defined($param->{'only-content'}) && $param->{'only-content'} == 1) {
+	$self->{only_content} = 1;
+    } else {
+	delete $self->{only_content};
+    }
 }
 
 1;
-- 
2.39.5




Thread overview: 9+ messages
2025-02-21 16:48 [pmg-devel] [PATCH pmg-api/pmg-gui v5] add additional attributes to ContentTypeFilter and MatchField Stoiko Ivanov
2025-02-21 16:48 ` [pmg-devel] [PATCH pmg-api v5 1/4] ruledb: disclaimer: simplify update-case Stoiko Ivanov
2025-02-21 16:48 ` [pmg-devel] [PATCH pmg-api v5 2/4] utils: content-type: don't fallback to header information for magic Stoiko Ivanov
2025-02-21 16:48 ` Stoiko Ivanov [this message]
2025-02-21 16:48 ` [pmg-devel] [PATCH pmg-api v5 4/4] fix #2709: ruledb: match-field: optionally restrict to top mime-part Stoiko Ivanov
2025-02-21 16:48 ` [pmg-devel] [PATCH pmg-gui v5 1/3] rules/object: remove icon from remove button Stoiko Ivanov
2025-02-21 16:48 ` [pmg-devel] [PATCH pmg-gui v5 2/3] rules/content-typefilter: add checkbox for file content only matching Stoiko Ivanov
2025-02-21 16:48 ` [pmg-devel] [PATCH pmg-gui v5 3/3] fix #2709: rules: match-field: add top-level-only checkbox Stoiko Ivanov
2025-02-21 17:26 ` [pmg-devel] applied: [PATCH pmg-api/pmg-gui v5] add additional attributes to ContentTypeFilter and MatchField Thomas Lamprecht
