From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database
Date: Wed,  9 Nov 2022 19:27:26 +0100
Message-ID: <20221109182728.629576-4-s.ivanov@proxmox.com>
In-Reply-To: <20221109182728.629576-1-s.ivanov@proxmox.com>

This patch adds support for storing rule names, comments (info), and
most relevant values (e.g. the header content to match) as UTF-8 in
the database.

Backwards compatibility should not be an issue:
* the reasoning follows commit
  43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage
* we only need to consider that existing single-byte values containing
  bytes > 127 (e.g. "£") do not really form valid multibyte UTF-8
  sequences, and that valid multibyte UTF-8 sequences do not yield
  sensible combinations of single-byte characters
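
The backwards-compatibility argument above can be illustrated with a
small sketch. It assumes that pre-existing values were stored as
single-byte (ISO-8859-1) strings; the helper repeats the logic of the
try_decode_utf8 function this patch adds to PMG::Utils, and the sample
values are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same logic as the try_decode_utf8 helper added by this patch:
# strict decoding (CHECK = 1) dies on invalid UTF-8, in which case
# the original data is returned unchanged.
sub try_decode_utf8 {
    my ($data) = @_;
    return eval { decode('UTF-8', $data, 1) } // $data;
}

# Hypothetical legacy value: "£" stored as the single byte 0xA3.
my $legacy = "\xA3";
# Hypothetical new value: the same character stored as UTF-8 (0xC2 0xA3).
my $new = encode('UTF-8', "\x{A3}");

my $from_legacy = try_decode_utf8($legacy);  # not valid UTF-8, kept as-is
my $from_new    = try_decode_utf8($new);     # valid UTF-8, decoded

print "stored bytes: legacy=", length($legacy), " new=", length($new), "\n";
print "same character after reading\n" if $from_legacy eq $from_new;
```

A lone byte 0xA3 is not a valid UTF-8 sequence, so the old value is
handed back untouched, while the newly stored value is decoded; both
end up as the same one-character Perl string.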

The database is created with SQL_ASCII encoding, which interprets
bytes <= 127 as ASCII and leaves bytes > 127 uninterpreted (see [0]).
This just means that we have to explicitly encode upon storing and
decode upon reading from there.
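
A minimal sketch of that explicit round trip, using a plain hash as a
stand-in for an SQL_ASCII text column (no real database is involved,
and the variable names and sample value are hypothetical):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for a text column in an SQL_ASCII database: it keeps the
# bytes it is given and performs no conversion (assumption for this sketch).
my %column;

# Storing: turn the Perl character string into UTF-8 bytes first.
my $rule_name = "Gr\x{FC}\x{DF}e \x{20AC}";   # "Grüße €", 7 characters
$column{name} = encode('UTF-8', $rule_name);

# Reading: turn the stored bytes back into a character string.
my $loaded = decode('UTF-8', $column{name});

print "characters: ", length($rule_name), "\n";
print "bytes stored: ", length($column{name}), "\n";
print "round trip ok\n" if $loaded eq $rule_name;
```

The stored value is longer than the character string because "ü" and
"ß" take two bytes each and "€" takes three in UTF-8.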

This patch currently omits most Who objects:
* for email/domain we'd still need to consider how to store them
  (puny-code for the domain part, or everything as UTF-8) and it would
  need changes to the API-types.
* the LDAP objects currently would not work too well, since our LDAPCache
  is not UTF-8 safe - and fixing that warrants its own patch series
* WhoRegex should work and be able to handle many use-cases

The ContentType values should also contain only ascii characters per
RFC6838 [1] and RFC2045 [2].

[0] https://www.postgresql.org/docs/13/multibyte.html
[1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
[2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 src/PMG/RuleDB.pm               | 23 ++++++++++++++++-------
 src/PMG/RuleDB/Accept.pm        |  2 +-
 src/PMG/RuleDB/BCC.pm           |  2 +-
 src/PMG/RuleDB/Block.pm         |  2 +-
 src/PMG/RuleDB/Disclaimer.pm    |  2 +-
 src/PMG/RuleDB/Group.pm         |  4 ++--
 src/PMG/RuleDB/MatchField.pm    |  6 +++++-
 src/PMG/RuleDB/MatchFilename.pm |  5 ++++-
 src/PMG/RuleDB/ModField.pm      |  6 ++++--
 src/PMG/RuleDB/Notify.pm        |  2 +-
 src/PMG/RuleDB/Quarantine.pm    |  3 ++-
 src/PMG/RuleDB/Remove.pm        | 12 +++++++-----
 src/PMG/RuleDB/Rule.pm          |  2 +-
 src/PMG/RuleDB/WhoRegex.pm      |  5 ++++-
 src/PMG/Utils.pm                |  5 +++++
 15 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
index 895acc6..9d6d99d 100644
--- a/src/PMG/RuleDB.pm
+++ b/src/PMG/RuleDB.pm
@@ -5,6 +5,7 @@ use warnings;
 use DBI;
 use HTML::Entities;
 use Data::Dumper;
+use Encode qw(encode decode);
 
 use PVE::SafeSyslog;
 
@@ -72,7 +73,8 @@ sub create_group_with_obj {
 
     $name //= '';
     $info //= '';
-
+    $name = encode('UTF-8', $name);
+    $info = encode('UTF-8',$info);
     eval {
 
 	$self->{dbh}->begin_work;
@@ -174,7 +176,9 @@ sub save_group {
 	$self->{dbh}->do("UPDATE Objectgroup " .
 			 "SET Name = ?, Info = ? " .
 			 "WHERE ID = ?", undef,
-			 $og->{name}, $og->{info}, $og->{id});
+			 encode('UTF-8', $og->{name}),
+			 encode('UTF-8', $og->{info}),
+			 $og->{id});
 
 	return $og->{id};
 
@@ -183,7 +187,7 @@ sub save_group {
 	    "INSERT INTO Objectgroup (Name, Info, Class) " .
 	    "VALUES (?, ?, ?);");
 
-	$sth->execute($og->name, $og->info, $og->class);
+	$sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
 
 	return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
     }
@@ -212,7 +216,9 @@ sub delete_group {
 	$sth->execute($groupid);
 
 	if (my $ref = $sth->fetchrow_hashref()) {
-	    die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
+	    my $groupname = PMG::Utils::try_decode_utf8($ref->{groupname});
+	    my $rulename = PMG::Utils::try_decode_utf8($ref->{rulename});
+	    die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
 	}
 
         $sth->finish();
@@ -474,6 +480,7 @@ sub load_object_full {
 sub load_group_by_name {
     my ($self, $name) = @_;
 
+    $name = PMG::Utils::try_decode_utf8($name);
     my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
 				    "WHERE name = ?");
 
@@ -598,13 +605,14 @@ sub save_rule {
     defined($rule->{direction}) ||
 	die "undefined rule attribute - direction: ERROR";
 
+    my $rulename = encode('UTF-8', $rule->{name});
     if (defined($rule->{id})) {
 
 	$self->{dbh}->do(
 	    "UPDATE Rule " .
 	    "SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
 	    "WHERE ID = ?", undef,
-	    $rule->{name}, $rule->{priority}, $rule->{active},
+	    $rulename, $rule->{priority}, $rule->{active},
 	    $rule->{direction}, $rule->{id});
 
 	return $rule->{id};
@@ -614,7 +622,7 @@ sub save_rule {
 	    "INSERT INTO Rule (Name, Priority, Active, Direction) " .
 	    "VALUES (?, ?, ?, ?);");
 
-	$sth->execute($rule->name, $rule->priority, $rule->active,
+	$sth->execute($rulename, $rule->priority, $rule->active,
 		      $rule->direction);
 
 	return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
@@ -779,7 +787,8 @@ sub load_rules {
     $sth->execute();
 
     while (my $ref = $sth->fetchrow_hashref()) {
-	my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
+	my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
+	my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
 					  $ref->{active}, $ref->{direction});
 	$rule->{id} = $ref->{id};
 	push @$rules, $rule;
diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
index cd67ea2..4ebd6da 100644
--- a/src/PMG/RuleDB/Accept.pm
+++ b/src/PMG/RuleDB/Accept.pm
@@ -93,7 +93,7 @@ sub execute {
     my $dkim = $msginfo->{dkim} // {};
     my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     foreach my $ta (@$subgroups) {
 	my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index d364690..c1225f3 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -115,7 +115,7 @@ sub execute {
 
     my $subgroups = $mod_group->subgroups($targets, 1);
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
 
diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
index c758787..25bb74e 100644
--- a/src/PMG/RuleDB/Block.pm
+++ b/src/PMG/RuleDB/Block.pm
@@ -89,7 +89,7 @@ sub execute {
     my ($self, $queue, $ruledb, $mod_group, $targets, 
 	$msginfo, $vars, $marks) = @_;
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     if ($msginfo->{testmode}) {
 	my $fh = $msginfo->{test_fh};
diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
index d3003b2..c6afe54 100644
--- a/src/PMG/RuleDB/Disclaimer.pm
+++ b/src/PMG/RuleDB/Disclaimer.pm
@@ -193,7 +193,7 @@ sub execute {
     my ($self, $queue, $ruledb, $mod_group, $targets, 
 	$msginfo, $vars, $marks) = @_;
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     my $subgroups = $mod_group->subgroups($targets);
 
diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
index 2508305..baa68ce 100644
--- a/src/PMG/RuleDB/Group.pm
+++ b/src/PMG/RuleDB/Group.pm
@@ -12,8 +12,8 @@ sub new {
     my ($type, $name, $info, $class) = @_;
 
     my $self = {
-	name => $name,
-	info => $info,
+	name => PMG::Utils::try_decode_utf8($name),
+	info => PMG::Utils::try_decode_utf8($info),
 	class => $class,
     };
 
diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
index 2671ea4..8246e6e 100644
--- a/src/PMG/RuleDB/MatchField.pm
+++ b/src/PMG/RuleDB/MatchField.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(decode encode);
 use MIME::Words;
 
 use PVE::SafeSyslog;
@@ -50,9 +51,10 @@ sub load_attr {
     defined($field) || die "undefined object attribute: ERROR";
     defined($field_value) || die "undefined object attribute: ERROR";
 
+    my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
     # use known constructor, bless afterwards (because sub class can have constructor
     # with other parameter signature).
-    my $obj =  PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
+    my $obj =  PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
     bless $obj, $class;
 
     $obj->{id} = $id;
@@ -69,6 +71,7 @@ sub save {
 
     my $new_value = "$self->{field}:$self->{field_value}";
     $new_value =~ s/\\/\\\\/g;
+    $new_value = encode('UTF-8', $new_value);
 
     if (defined ($self->{id})) {
 	# update
@@ -106,6 +109,7 @@ sub parse_entity {
 	    chomp $value;
 
 	    my $decvalue = MIME::Words::decode_mimewords($value);
+	    $decvalue = PMG::Utils::try_decode_utf8($decvalue);
 
 	    if ($decvalue =~ m|$self->{field_value}|i) {
 		push @$res, $id;
diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
index 7e5b486..06bf931 100644
--- a/src/PMG/RuleDB/MatchFilename.pm
+++ b/src/PMG/RuleDB/MatchFilename.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(encode decode);
 use MIME::Words;
 
 use PMG::Utils;
@@ -41,8 +42,9 @@ sub load_attr {
     my $class = ref($type) || $type;
 
     defined($value) || die "undefined value: ERROR";;
+    my $decvalue = PMG::Utils::try_decode_utf8($value);
 
-    my $obj = $class->new($value, $ogroup);
+    my $obj = $class->new($decvalue, $ogroup);
     $obj->{id} = $id;
 
     $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -57,6 +59,7 @@ sub save {
 
     my $new_value = $self->{fname};
     $new_value =~ s/\\/\\\\/g;
+    $new_value = encode('UTF-8', $new_value);
 
     if (defined($self->{id})) {
 	# update
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index fb15076..1e1727f 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -57,7 +57,9 @@ sub load_attr {
 
     (defined($field) && defined($field_value)) || return undef;
 
-    my $obj = $class->new($field, $field_value, $ogroup);
+    my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
+
+    my $obj = $class->new($field, $dec_field_value, $ogroup);
     $obj->{id} = $id;
 
     $obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
@@ -70,7 +72,7 @@ sub save {
 
     defined($self->{ogroup}) || return undef;
 
-    my $new_value = "$self->{field}:$self->{field_value}";
+    my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
 
     if (defined ($self->{id})) {
 	# update
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index af853a3..bca5ebf 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -208,7 +208,7 @@ sub execute {
 
     my $from = 'postmaster';
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     my $body = PMG::Utils::subst_values($self->{body}, $vars);
     my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 1426393..30bc5ec 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(decode encode);
 
 use PVE::SafeSyslog;
 
@@ -89,7 +90,7 @@ sub execute {
     
     my $subgroups = $mod_group->subgroups($targets, 1);
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     foreach my $ta (@$subgroups) {
 	my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index 6b27b91..da6c25f 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -63,12 +63,14 @@ sub load_attr {
 
     defined ($value) || die "undefined value: ERROR";
 
-    my $obj;
+    my ($obj, $text);
 
     if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
-	$obj = $class->new($1, $4, $ogroup, $2);
+	$text = PMG::Utils::try_decode_utf8($4);
+	$obj = $class->new($1, $text, $ogroup, $2);
     } elsif ($value =~ m/^([01])(\:(.*))?$/s) {
-	$obj = $class->new($1, $3, $ogroup);
+	$text = PMG::Utils::try_decode_utf8($3);
+	$obj = $class->new($1, $text, $ogroup);
     } else {
 	$obj = $class->new(0, undef, $ogroup);
     }
@@ -89,7 +91,7 @@ sub save {
     $value .= ','. ($self->{quarantine} ? '1' : '0');
 
     if ($self->{text}) {
-	$value .= ":$self->{text}";
+	$value .= encode('UTF-8', ":$self->{text}");
     }
 
     if (defined ($self->{id})) {
@@ -194,7 +196,7 @@ sub execute {
     my ($self, $queue, $ruledb, $mod_group, $targets,
 	$msginfo, $vars, $marks, $ldap) = @_;
 
-    my $rulename = $vars->{RULE} // 'unknown';
+    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
 
     if (!$self->{all} && ($#$marks == -1)) {
 	# no marks
diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
index c49ad21..e7c9146 100644
--- a/src/PMG/RuleDB/Rule.pm
+++ b/src/PMG/RuleDB/Rule.pm
@@ -12,7 +12,7 @@ sub new {
     my ($type, $name, $priority, $active, $direction) = @_;
 
     my $self = { 
-	name => $name // '',
+	name => PMG::Utils::try_decode_utf8($name) // '',
 	priority => $priority // 0,
 	active => $active // 0,
     }; 
diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
index 37ec3aa..ccc94a0 100644
--- a/src/PMG/RuleDB/WhoRegex.pm
+++ b/src/PMG/RuleDB/WhoRegex.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use DBI;
 use Digest::SHA;
+use Encode qw(decode encode);
 
 use PMG::Utils;
 use PMG::RuleDB::Object;
@@ -43,7 +44,8 @@ sub load_attr {
 
     defined($value) || die "undefined value: ERROR";
 
-    my $obj = $class->new ($value, $ogroup);
+    my $decoded_value = PMG::Utils::try_decode_utf8($value);
+    my $obj = $class->new ($decoded_value, $ogroup);
     $obj->{id} = $id;
 
     $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -59,6 +61,7 @@ sub save {
 
     my $adr = $self->{address};
     $adr =~ s/\\/\\\\/g;
+    $adr = encode('UTF-8', $adr);
 
     if (defined ($self->{id})) {
 	# update
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cef232b..23f60eb 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1542,4 +1542,9 @@ sub get_existing_object_id {
     return;
 }
 
+sub try_decode_utf8 {
+    my ($data) = @_;
+    return eval { decode('UTF-8', $data, 1) } // $data;
+}
+
 1;
-- 
2.30.2




