From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database
Date: Wed, 9 Nov 2022 19:27:26 +0100 [thread overview]
Message-ID: <20221109182728.629576-4-s.ivanov@proxmox.com> (raw)
In-Reply-To: <20221109182728.629576-1-s.ivanov@proxmox.com>
This patch adds support for storing rule names, comments(info), and
most relevant values (e.g. the header content to match) in utf-8 in
the database.
backwards-compatibility should not be an issue:
* following the argumentation from commit
43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage
* we only need to consider that the valid multibyte utf-8 characters
do not really yield sensible combinations of single-byte characters
(starting with a byte > 127 - e.g. "£")
the database is created with SQL_ASCII encoding - which behaves by
interpreting bytes <= 127 as ascii and those > 127 are not interpreted
(see [0], which just means that we have to explicitly en-/decode upon
storing/reading from there)
This patch currently omits most Who objects:
* for email/domain we'd still need to consider how to store them
(puny-code for the domain part, or everything as UTF-8) and it would
need changes to the API-types.
* the LDAP objects currently would not work too well, since our LDAPCache
is not UTF-8 safe - and fixing warants its own patch-series
* WhoRegex should work and be able to handle many use-cases
The ContentType values should also contain only ascii characters per
RFC6838 [1] and RFC2045 [2].
[0] https://www.postgresql.org/docs/13/multibyte.html
[1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
[2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/RuleDB.pm | 23 ++++++++++++++++-------
src/PMG/RuleDB/Accept.pm | 2 +-
src/PMG/RuleDB/BCC.pm | 2 +-
src/PMG/RuleDB/Block.pm | 2 +-
src/PMG/RuleDB/Disclaimer.pm | 2 +-
src/PMG/RuleDB/Group.pm | 4 ++--
src/PMG/RuleDB/MatchField.pm | 6 +++++-
src/PMG/RuleDB/MatchFilename.pm | 5 ++++-
src/PMG/RuleDB/ModField.pm | 6 ++++--
src/PMG/RuleDB/Notify.pm | 2 +-
src/PMG/RuleDB/Quarantine.pm | 3 ++-
src/PMG/RuleDB/Remove.pm | 12 +++++++-----
src/PMG/RuleDB/Rule.pm | 2 +-
src/PMG/RuleDB/WhoRegex.pm | 5 ++++-
src/PMG/Utils.pm | 5 +++++
15 files changed, 55 insertions(+), 26 deletions(-)
diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
index 895acc6..9d6d99d 100644
--- a/src/PMG/RuleDB.pm
+++ b/src/PMG/RuleDB.pm
@@ -5,6 +5,7 @@ use warnings;
use DBI;
use HTML::Entities;
use Data::Dumper;
+use Encode qw(encode decode);
use PVE::SafeSyslog;
@@ -72,7 +73,8 @@ sub create_group_with_obj {
$name //= '';
$info //= '';
-
+ $name = encode('UTF-8', $name);
+ $info = encode('UTF-8',$info);
eval {
$self->{dbh}->begin_work;
@@ -174,7 +176,9 @@ sub save_group {
$self->{dbh}->do("UPDATE Objectgroup " .
"SET Name = ?, Info = ? " .
"WHERE ID = ?", undef,
- $og->{name}, $og->{info}, $og->{id});
+ encode('UTF-8', $og->{name}),
+ encode('UTF-8', $og->{info}),
+ $og->{id});
return $og->{id};
@@ -183,7 +187,7 @@ sub save_group {
"INSERT INTO Objectgroup (Name, Info, Class) " .
"VALUES (?, ?, ?);");
- $sth->execute($og->name, $og->info, $og->class);
+ $sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
}
@@ -212,7 +216,9 @@ sub delete_group {
$sth->execute($groupid);
if (my $ref = $sth->fetchrow_hashref()) {
- die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
+ my $groupname = PMG::Utils::try_deocode_utf8($ref->{groupname});
+ my $rulename = PMG::Utils::try_deocode_utf8($ref->{rulename});
+ die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
}
$sth->finish();
@@ -474,6 +480,7 @@ sub load_object_full {
sub load_group_by_name {
my ($self, $name) = @_;
+ $name = PMG::Utils::try_decode_utf8($name);
my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
"WHERE name = ?");
@@ -598,13 +605,14 @@ sub save_rule {
defined($rule->{direction}) ||
die "undefined rule attribute - direction: ERROR";
+ my $rulename = encode('UTF-8', $rule->{name});
if (defined($rule->{id})) {
$self->{dbh}->do(
"UPDATE Rule " .
"SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
"WHERE ID = ?", undef,
- $rule->{name}, $rule->{priority}, $rule->{active},
+ $rulename, $rule->{priority}, $rule->{active},
$rule->{direction}, $rule->{id});
return $rule->{id};
@@ -614,7 +622,7 @@ sub save_rule {
"INSERT INTO Rule (Name, Priority, Active, Direction) " .
"VALUES (?, ?, ?, ?);");
- $sth->execute($rule->name, $rule->priority, $rule->active,
+ $sth->execute($rulename, $rule->priority, $rule->active,
$rule->direction);
return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
@@ -779,7 +787,8 @@ sub load_rules {
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
+ my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
+ my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
$ref->{active}, $ref->{direction});
$rule->{id} = $ref->{id};
push @$rules, $rule;
diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
index cd67ea2..4ebd6da 100644
--- a/src/PMG/RuleDB/Accept.pm
+++ b/src/PMG/RuleDB/Accept.pm
@@ -93,7 +93,7 @@ sub execute {
my $dkim = $msginfo->{dkim} // {};
my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index d364690..c1225f3 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -115,7 +115,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
index c758787..25bb74e 100644
--- a/src/PMG/RuleDB/Block.pm
+++ b/src/PMG/RuleDB/Block.pm
@@ -89,7 +89,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if ($msginfo->{testmode}) {
my $fh = $msginfo->{test_fh};
diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
index d3003b2..c6afe54 100644
--- a/src/PMG/RuleDB/Disclaimer.pm
+++ b/src/PMG/RuleDB/Disclaimer.pm
@@ -193,7 +193,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $subgroups = $mod_group->subgroups($targets);
diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
index 2508305..baa68ce 100644
--- a/src/PMG/RuleDB/Group.pm
+++ b/src/PMG/RuleDB/Group.pm
@@ -12,8 +12,8 @@ sub new {
my ($type, $name, $info, $class) = @_;
my $self = {
- name => $name,
- info => $info,
+ name => PMG::Utils::try_decode_utf8($name),
+ info => PMG::Utils::try_decode_utf8($info),
class => $class,
};
diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
index 2671ea4..8246e6e 100644
--- a/src/PMG/RuleDB/MatchField.pm
+++ b/src/PMG/RuleDB/MatchField.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(decode encode);
use MIME::Words;
use PVE::SafeSyslog;
@@ -50,9 +51,10 @@ sub load_attr {
defined($field) || die "undefined object attribute: ERROR";
defined($field_value) || die "undefined object attribute: ERROR";
+ my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
# use known constructor, bless afterwards (because sub class can have constructor
# with other parameter signature).
- my $obj = PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
+ my $obj = PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
bless $obj, $class;
$obj->{id} = $id;
@@ -69,6 +71,7 @@ sub save {
my $new_value = "$self->{field}:$self->{field_value}";
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined ($self->{id})) {
# update
@@ -106,6 +109,7 @@ sub parse_entity {
chomp $value;
my $decvalue = MIME::Words::decode_mimewords($value);
+ $decvalue = PMG::Utils::try_decode_utf8($decvalue);
if ($decvalue =~ m|$self->{field_value}|i) {
push @$res, $id;
diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
index 7e5b486..06bf931 100644
--- a/src/PMG/RuleDB/MatchFilename.pm
+++ b/src/PMG/RuleDB/MatchFilename.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode decode);
use MIME::Words;
use PMG::Utils;
@@ -41,8 +42,9 @@ sub load_attr {
my $class = ref($type) || $type;
defined($value) || die "undefined value: ERROR";;
+ my $decvalue = PMG::Utils::try_decode_utf8($value);
- my $obj = $class->new($value, $ogroup);
+ my $obj = $class->new($decvalue, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -57,6 +59,7 @@ sub save {
my $new_value = $self->{fname};
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index fb15076..1e1727f 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -57,7 +57,9 @@ sub load_attr {
(defined($field) && defined($field_value)) || return undef;
- my $obj = $class->new($field, $field_value, $ogroup);
+ my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
+
+ my $obj = $class->new($field, $dec_field_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
@@ -70,7 +72,7 @@ sub save {
defined($self->{ogroup}) || return undef;
- my $new_value = "$self->{field}:$self->{field_value}";
+ my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index af853a3..bca5ebf 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -208,7 +208,7 @@ sub execute {
my $from = 'postmaster';
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $body = PMG::Utils::subst_values($self->{body}, $vars);
my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 1426393..30bc5ec 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(decode encode);
use PVE::SafeSyslog;
@@ -89,7 +90,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index 6b27b91..da6c25f 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -63,12 +63,14 @@ sub load_attr {
defined ($value) || die "undefined value: ERROR";
- my $obj;
+ my ($obj, $text);
if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $4, $ogroup, $2);
+ $text = PMG::Utils::try_decode_utf8($4);
+ $obj = $class->new($1, $text, $ogroup, $2);
} elsif ($value =~ m/^([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $3, $ogroup);
+ $text = PMG::Utils::try_decode_utf8($3);
+ $obj = $class->new($1, $text, $ogroup);
} else {
$obj = $class->new(0, undef, $ogroup);
}
@@ -89,7 +91,7 @@ sub save {
$value .= ','. ($self->{quarantine} ? '1' : '0');
if ($self->{text}) {
- $value .= ":$self->{text}";
+ $value .= encode('UTF-8', ":$self->{text}");
}
if (defined ($self->{id})) {
@@ -194,7 +196,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks, $ldap) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if (!$self->{all} && ($#$marks == -1)) {
# no marks
diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
index c49ad21..e7c9146 100644
--- a/src/PMG/RuleDB/Rule.pm
+++ b/src/PMG/RuleDB/Rule.pm
@@ -12,7 +12,7 @@ sub new {
my ($type, $name, $priority, $active, $direction) = @_;
my $self = {
- name => $name // '',
+ name => PMG::Utils::try_decode_utf8($name) // '',
priority => $priority // 0,
active => $active // 0,
};
diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
index 37ec3aa..ccc94a0 100644
--- a/src/PMG/RuleDB/WhoRegex.pm
+++ b/src/PMG/RuleDB/WhoRegex.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(decode encode);
use PMG::Utils;
use PMG::RuleDB::Object;
@@ -43,7 +44,8 @@ sub load_attr {
defined($value) || die "undefined value: ERROR";
- my $obj = $class->new ($value, $ogroup);
+ my $decoded_value = PMG::Utils::try_decode_utf8($value);
+ my $obj = $class->new ($decoded_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -59,6 +61,7 @@ sub save {
my $adr = $self->{address};
$adr =~ s/\\/\\\\/g;
+ $adr = encode('UTF-8', $adr);
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cef232b..23f60eb 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1542,4 +1542,9 @@ sub get_existing_object_id {
return;
}
+sub try_decode_utf8 {
+ my ($data) = @_;
+ return eval { decode('UTF-8', $data, 1) } // $data;
+}
+
1;
--
2.30.2
next prev parent reply other threads:[~2022-11-09 18:28 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-09 18:27 [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 1/5] ruledb: modfield: properly encode field after variable substitution Stoiko Ivanov
2022-11-11 13:56 ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 2/5] ruledb: add deprecation warnings for unused actions Stoiko Ivanov
2022-11-14 16:02 ` Dominik Csapak
2022-11-15 14:32 ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` Stoiko Ivanov [this message]
2022-11-14 14:36 ` [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database Dominik Csapak
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 4/5] ruledb: encode e-mail addresses for syslog Stoiko Ivanov
2022-11-14 14:49 ` Dominik Csapak
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 5/5] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
2022-11-14 16:03 ` Dominik Csapak
2022-11-14 16:02 ` [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221109182728.629576-4-s.ivanov@proxmox.com \
--to=s.ivanov@proxmox.com \
--cc=pmg-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox