all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [pmg-devel] [PATCH pmg-api v3 3/8] fix #2541 ruledb: encode relevant values as utf-8 in database
Date: Wed, 23 Nov 2022 10:23:29 +0100	[thread overview]
Message-ID: <20221123092336.11423-4-s.ivanov@proxmox.com> (raw)
In-Reply-To: <20221123092336.11423-1-s.ivanov@proxmox.com>

This patch adds support for storing rule names, comments(info), and
most relevant values (e.g. the header content to match) in utf-8 in
the database.

backwards-compatibility should not be an issue:
* currently the database should not contain any utf-8 multibyte
  characters, as our tooling prevented this due to sending
  wide-characters, which causes an exception in DBI.
* any character > 127 and < 256 will be correctly interpreted when
  stored in a perl-string (this happens if the decode fails in
  try_decode_utf8), and will be correctly encoded when storing into
  the database.

the database is created with SQL_ASCII encoding - which behaves by
interpreting bytes <= 127 as ascii and those > 127 are not interpreted
(see [0], which just means that we have to explicitly en-/decode upon
storing/reading from there)

This patch currently omits most Who objects:
* for email/domain we'd still need to consider how to store them
  (puny-code for the domain part, or everything as UTF-8) and it would
  need changes to the API-types.
* the LDAP objects currently would not work too well, since our LDAPCache
  is not UTF-8 safe - and fixing warants its own patch-series
* WhoRegex should work and be able to handle many use-cases

The ContentType values should also contain only ascii characters per
RFC6838 [1] and RFC2045 [2].

[0] https://www.postgresql.org/docs/13/multibyte.html
[1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
[2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/PMG/RuleDB.pm               | 24 ++++++++++++++++--------
 src/PMG/RuleDB/Accept.pm        |  2 +-
 src/PMG/RuleDB/BCC.pm           |  2 +-
 src/PMG/RuleDB/Block.pm         |  2 +-
 src/PMG/RuleDB/Disclaimer.pm    |  2 +-
 src/PMG/RuleDB/Group.pm         |  4 ++--
 src/PMG/RuleDB/MatchField.pm    |  8 ++++++--
 src/PMG/RuleDB/MatchFilename.pm |  5 ++++-
 src/PMG/RuleDB/ModField.pm      |  6 ++++--
 src/PMG/RuleDB/Notify.pm        |  2 +-
 src/PMG/RuleDB/Quarantine.pm    |  3 ++-
 src/PMG/RuleDB/Remove.pm        | 12 +++++++-----
 src/PMG/RuleDB/Rule.pm          |  2 +-
 src/PMG/RuleDB/WhoRegex.pm      |  5 ++++-
 14 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
index 895acc6..a6b0b79 100644
--- a/src/PMG/RuleDB.pm
+++ b/src/PMG/RuleDB.pm
@@ -5,6 +5,7 @@ use warnings;
 use DBI;
 use HTML::Entities;
 use Data::Dumper;
+use Encode qw(encode);
 
 use PVE::SafeSyslog;
 
@@ -70,8 +71,8 @@ sub create_group_with_obj {
 
     defined($obj) || die "proxmox: undefined object";
 
-    $name //= '';
-    $info //= '';
+    $name = encode('UTF-8', $name // '');
+    $info = encode('UTF-8', $info // '');
 
     eval {
 
@@ -174,7 +175,9 @@ sub save_group {
 	$self->{dbh}->do("UPDATE Objectgroup " .
 			 "SET Name = ?, Info = ? " .
 			 "WHERE ID = ?", undef,
-			 $og->{name}, $og->{info}, $og->{id});
+			 encode('UTF-8', $og->{name}),
+			 encode('UTF-8', $og->{info}),
+			 $og->{id});
 
 	return $og->{id};
 
@@ -183,7 +186,7 @@ sub save_group {
 	    "INSERT INTO Objectgroup (Name, Info, Class) " .
 	    "VALUES (?, ?, ?);");
 
-	$sth->execute($og->name, $og->info, $og->class);
+	$sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
 
 	return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
     }
@@ -212,7 +215,9 @@ sub delete_group {
 	$sth->execute($groupid);
 
 	if (my $ref = $sth->fetchrow_hashref()) {
-	    die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
+	    my $groupname = PMG::Utils::try_decode_utf8($ref->{groupname});
+	    my $rulename = PMG::Utils::try_decode_utf8($ref->{rulename});
+	    die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
 	}
 
         $sth->finish();
@@ -474,6 +479,7 @@ sub load_object_full {
 sub load_group_by_name {
     my ($self, $name) = @_;
 
+    $name = encode('UTF-8', $name);
     my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
 				    "WHERE name = ?");
 
@@ -598,13 +604,14 @@ sub save_rule {
     defined($rule->{direction}) ||
 	die "undefined rule attribute - direction: ERROR";
 
+    my $rulename = encode('UTF-8', $rule->{name});
     if (defined($rule->{id})) {
 
 	$self->{dbh}->do(
 	    "UPDATE Rule " .
 	    "SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
 	    "WHERE ID = ?", undef,
-	    $rule->{name}, $rule->{priority}, $rule->{active},
+	    $rulename, $rule->{priority}, $rule->{active},
 	    $rule->{direction}, $rule->{id});
 
 	return $rule->{id};
@@ -614,7 +621,7 @@ sub save_rule {
 	    "INSERT INTO Rule (Name, Priority, Active, Direction) " .
 	    "VALUES (?, ?, ?, ?);");
 
-	$sth->execute($rule->name, $rule->priority, $rule->active,
+	$sth->execute($rulename, $rule->priority, $rule->active,
 		      $rule->direction);
 
 	return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
@@ -779,7 +786,8 @@ sub load_rules {
     $sth->execute();
 
     while (my $ref = $sth->fetchrow_hashref()) {
-	my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
+	my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
+	my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
 					  $ref->{active}, $ref->{direction});
 	$rule->{id} = $ref->{id};
 	push @$rules, $rule;
diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
index cd67ea2..4ebd6da 100644
--- a/src/PMG/RuleDB/Accept.pm
+++ b/src/PMG/RuleDB/Accept.pm
@@ -93,7 +93,7 @@ sub execute {
     my $dkim = $msginfo->{dkim} // {};
     my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     foreach my $ta (@$subgroups) {
 	my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index 4867d83..6244dd9 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -115,7 +115,7 @@ sub execute {
 
     my $subgroups = $mod_group->subgroups($targets, 1);
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     my $bcc_to = PMG::Utils::subst_values_for_header($self->{target}, $vars);
 
diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
index c758787..25bb74e 100644
--- a/src/PMG/RuleDB/Block.pm
+++ b/src/PMG/RuleDB/Block.pm
@@ -89,7 +89,7 @@ sub execute {
     my ($self, $queue, $ruledb, $mod_group, $targets, 
 	$msginfo, $vars, $marks) = @_;
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     if ($msginfo->{testmode}) {
 	my $fh = $msginfo->{test_fh};
diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
index d3003b2..c6afe54 100644
--- a/src/PMG/RuleDB/Disclaimer.pm
+++ b/src/PMG/RuleDB/Disclaimer.pm
@@ -193,7 +193,7 @@ sub execute {
     my ($self, $queue, $ruledb, $mod_group, $targets, 
 	$msginfo, $vars, $marks) = @_;
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     my $subgroups = $mod_group->subgroups($targets);
 
diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
index 2508305..baa68ce 100644
--- a/src/PMG/RuleDB/Group.pm
+++ b/src/PMG/RuleDB/Group.pm
@@ -12,8 +12,8 @@ sub new {
     my ($type, $name, $info, $class) = @_;
 
     my $self = {
-	name => $name,
-	info => $info,
+	name => PMG::Utils::try_decode_utf8($name),
+	info => PMG::Utils::try_decode_utf8($info),
 	class => $class,
     };
 
diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
index 2671ea4..2b56058 100644
--- a/src/PMG/RuleDB/MatchField.pm
+++ b/src/PMG/RuleDB/MatchField.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(encode);
 use MIME::Words;
 
 use PVE::SafeSyslog;
@@ -50,9 +51,10 @@ sub load_attr {
     defined($field) || die "undefined object attribute: ERROR";
     defined($field_value) || die "undefined object attribute: ERROR";
 
+    my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
     # use known constructor, bless afterwards (because sub class can have constructor
     # with other parameter signature).
-    my $obj =  PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
+    my $obj =  PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
     bless $obj, $class;
 
     $obj->{id} = $id;
@@ -69,6 +71,7 @@ sub save {
 
     my $new_value = "$self->{field}:$self->{field_value}";
     $new_value =~ s/\\/\\\\/g;
+    $new_value = encode('UTF-8', $new_value);
 
     if (defined ($self->{id})) {
 	# update
@@ -105,7 +108,8 @@ sub parse_entity {
 	for my $value ($entity->head->get_all($self->{field})) {
 	    chomp $value;
 
-	    my $decvalue = MIME::Words::decode_mimewords($value);
+	    my $decvalue = PMG::Utils::decode_rfc1522($value);
+	    $decvalue = PMG::Utils::try_decode_utf8($decvalue);
 
 	    if ($decvalue =~ m|$self->{field_value}|i) {
 		push @$res, $id;
diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
index 7e5b486..c9cdbe0 100644
--- a/src/PMG/RuleDB/MatchFilename.pm
+++ b/src/PMG/RuleDB/MatchFilename.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(encode);
 use MIME::Words;
 
 use PMG::Utils;
@@ -41,8 +42,9 @@ sub load_attr {
     my $class = ref($type) || $type;
 
     defined($value) || die "undefined value: ERROR";;
+    my $decvalue = PMG::Utils::try_decode_utf8($value);
 
-    my $obj = $class->new($value, $ogroup);
+    my $obj = $class->new($decvalue, $ogroup);
     $obj->{id} = $id;
 
     $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -57,6 +59,7 @@ sub save {
 
     my $new_value = $self->{fname};
     $new_value =~ s/\\/\\\\/g;
+    $new_value = encode('UTF-8', $new_value);
 
     if (defined($self->{id})) {
 	# update
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index 34108d1..6232322 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -56,7 +56,9 @@ sub load_attr {
 
     (defined($field) && defined($field_value)) || return undef;
 
-    my $obj = $class->new($field, $field_value, $ogroup);
+    my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
+
+    my $obj = $class->new($field, $dec_field_value, $ogroup);
     $obj->{id} = $id;
 
     $obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
@@ -69,7 +71,7 @@ sub save {
 
     defined($self->{ogroup}) || return undef;
 
-    my $new_value = "$self->{field}:$self->{field_value}";
+    my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
 
     if (defined ($self->{id})) {
 	# update
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index 7b38e0d..8a9945b 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -208,7 +208,7 @@ sub execute {
 
     my $from = 'postmaster';
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     my $body = PMG::Utils::subst_values($self->{body}, $vars);
     my $subject = PMG::Utils::subst_values_for_header($self->{subject}, $vars);
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 1426393..9d802fe 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(encode);
 
 use PVE::SafeSyslog;
 
@@ -89,7 +90,7 @@ sub execute {
     
     my $subgroups = $mod_group->subgroups($targets, 1);
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     foreach my $ta (@$subgroups) {
 	my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index 6b27b91..da6c25f 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -63,12 +63,14 @@ sub load_attr {
 
     defined ($value) || die "undefined value: ERROR";
 
-    my $obj;
+    my ($obj, $text);
 
     if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
-	$obj = $class->new($1, $4, $ogroup, $2);
+	$text = PMG::Utils::try_decode_utf8($4);
+	$obj = $class->new($1, $text, $ogroup, $2);
     } elsif ($value =~ m/^([01])(\:(.*))?$/s) {
-	$obj = $class->new($1, $3, $ogroup);
+	$text = PMG::Utils::try_decode_utf8($3);
+	$obj = $class->new($1, $text, $ogroup);
     } else {
 	$obj = $class->new(0, undef, $ogroup);
     }
@@ -89,7 +91,7 @@ sub save {
     $value .= ','. ($self->{quarantine} ? '1' : '0');
 
     if ($self->{text}) {
-	$value .= ":$self->{text}";
+	$value .= encode('UTF-8', ":$self->{text}");
     }
 
     if (defined ($self->{id})) {
@@ -194,7 +196,7 @@ sub execute {
     my ($self, $queue, $ruledb, $mod_group, $targets,
 	$msginfo, $vars, $marks, $ldap) = @_;
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     if (!$self->{all} && ($#$marks == -1)) {
 	# no marks
diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
index c49ad21..e7c9146 100644
--- a/src/PMG/RuleDB/Rule.pm
+++ b/src/PMG/RuleDB/Rule.pm
@@ -12,7 +12,7 @@ sub new {
     my ($type, $name, $priority, $active, $direction) = @_;
 
     my $self = { 
-	name => $name // '',
+	name => PMG::Utils::try_decode_utf8($name) // '',
 	priority => $priority // 0,
 	active => $active // 0,
     }; 
diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
index 37ec3aa..5c13604 100644
--- a/src/PMG/RuleDB/WhoRegex.pm
+++ b/src/PMG/RuleDB/WhoRegex.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(encode);
 
 use PMG::Utils;
 use PMG::RuleDB::Object;
@@ -43,7 +44,8 @@ sub load_attr {
 
     defined($value) || die "undefined value: ERROR";
 
-    my $obj = $class->new ($value, $ogroup);
+    my $decoded_value = PMG::Utils::try_decode_utf8($value);
+    my $obj = $class->new ($decoded_value, $ogroup);
     $obj->{id} = $id;
 
     $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -59,6 +61,7 @@ sub save {
 
     my $adr = $self->{address};
     $adr =~ s/\\/\\\\/g;
+    $adr = encode('UTF-8', $adr);
 
     if (defined ($self->{id})) {
 	# update
-- 
2.30.2





  parent reply	other threads:[~2022-11-23  9:24 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-23  9:23 [pmg-devel] [PATCH pmg-api/pmg-gui v3] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-api v3 1/8] utils: return perl string from decode_rfc1522 Stoiko Ivanov
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-api v3 2/8] ruledb: properly substitute prox_vars in headers Stoiko Ivanov
2022-11-23  9:23 ` Stoiko Ivanov [this message]
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-api v3 4/8] ruledb: encode e-mail addresses for syslog Stoiko Ivanov
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-api v3 5/8] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-api v3 6/8] quarantine: handle utf8 data Stoiko Ivanov
2022-11-23 14:15   ` Dominik Csapak
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-api v3 7/8] pmgqm: handle smtputf8 data Stoiko Ivanov
2022-11-23 14:20   ` Dominik Csapak
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-api v3 8/8] statistics: handle utf8 data Stoiko Ivanov
2022-11-23 14:26   ` Dominik Csapak
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-gui v3 1/2] utils: add custom validator for pmg-email-address Stoiko Ivanov
2022-11-23  9:23 ` [pmg-devel] [PATCH pmg-gui v3 2/2] userblocklists: use PMGMail as validator for pmail Stoiko Ivanov
2022-11-23 14:09 ` [pmg-devel] [PATCH pmg-api/pmg-gui v3] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
2022-11-26  7:00 ` [pmg-devel] applied-gui: " Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221123092336.11423-4-s.ivanov@proxmox.com \
    --to=s.ivanov@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal