From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pmg-devel@lists.proxmox.com
Subject: [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database
Date: Wed, 9 Nov 2022 19:27:26 +0100
Message-ID: <20221109182728.629576-4-s.ivanov@proxmox.com>
In-Reply-To: <20221109182728.629576-1-s.ivanov@proxmox.com>
This patch adds support for storing rule names, comments (info), and
most relevant values (e.g. the header content to match) as UTF-8 in
the database.
Backwards compatibility should not be an issue:
* this follows the argumentation from commit
43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage
* we only need to consider that valid multibyte UTF-8 sequences
do not really yield sensible combinations of single-byte characters
(starting with a byte > 127, e.g. "£")
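The compatibility argument above can be illustrated with a minimal sketch. It assumes a hypothetical legacy value that was stored as single Latin-1 bytes before this patch; the helper has the same shape as the try_decode_utf8 added to PMG::Utils further down, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same shape as the helper this patch adds to PMG::Utils:
# strict decode, falling back to the untouched input on failure.
sub try_decode_utf8 {
    my ($data) = @_;
    return eval { decode('UTF-8', $data, 1) } // $data;
}

# Hypothetical legacy value: "£5" stored as single Latin-1 bytes.
# A lone 0xA3 byte is not valid UTF-8, so the strict decode fails.
my $legacy = "\xA3" . "5";

# The same text as it would be stored with this patch applied.
my $stored = encode('UTF-8', "\x{A3}" . "5");

my $from_legacy = try_decode_utf8($legacy);
my $from_stored = try_decode_utf8($stored);

print "same characters\n" if $from_legacy eq $from_stored;
```

Both values end up as the same two-character string, since the failed decode leaves the legacy bytes as they were. A legacy value whose bytes happen to form valid UTF-8 would be decoded instead, which is the unlikely case the second bullet refers to.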
The database is created with SQL_ASCII encoding, which interprets
bytes <= 127 as ASCII and leaves bytes > 127 uninterpreted (see [0]).
This just means that we have to explicitly en-/decode when storing to
and reading from it.
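As a rough sketch of what explicit en-/decoding means here (no database involved, and the rule name is a made-up example): the character string is encoded to bytes before it would be bound to a placeholder, and decoded again after reading.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical rule name as a Perl character string ("Prüfung €").
my $name = "Pr\x{FC}fung \x{20AC}";

# What would be handed to the database: a UTF-8 byte string.
my $bytes = encode('UTF-8', $name);

# What a reader would do with the bytes it gets back.
# LEAVE_SRC keeps $bytes intact; without it a CHECK value lets
# decode() consume the source string in place.
my $back = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "%d characters, %d bytes\n", length($name), length($bytes);
```

With SQL_ASCII the server does no conversion of its own, so the byte string goes in and comes out unchanged and the caller is responsible for both steps.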
This patch currently omits most Who objects:
* for email/domain we'd still need to consider how to store them
(punycode for the domain part, or everything as UTF-8) and it would
need changes to the API types.
* the LDAP objects currently would not work too well, since our LDAPCache
is not UTF-8 safe - and fixing that warrants its own patch series
* WhoRegex should work and be able to handle many use cases
The ContentType values should also contain only ASCII characters per
RFC 6838 [1] and RFC 2045 [2].
[0] https://www.postgresql.org/docs/13/multibyte.html
[1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
[2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/RuleDB.pm | 23 ++++++++++++++++-------
src/PMG/RuleDB/Accept.pm | 2 +-
src/PMG/RuleDB/BCC.pm | 2 +-
src/PMG/RuleDB/Block.pm | 2 +-
src/PMG/RuleDB/Disclaimer.pm | 2 +-
src/PMG/RuleDB/Group.pm | 4 ++--
src/PMG/RuleDB/MatchField.pm | 6 +++++-
src/PMG/RuleDB/MatchFilename.pm | 5 ++++-
src/PMG/RuleDB/ModField.pm | 6 ++++--
src/PMG/RuleDB/Notify.pm | 2 +-
src/PMG/RuleDB/Quarantine.pm | 3 ++-
src/PMG/RuleDB/Remove.pm | 12 +++++++-----
src/PMG/RuleDB/Rule.pm | 2 +-
src/PMG/RuleDB/WhoRegex.pm | 5 ++++-
src/PMG/Utils.pm | 5 +++++
15 files changed, 55 insertions(+), 26 deletions(-)
diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
index 895acc6..9d6d99d 100644
--- a/src/PMG/RuleDB.pm
+++ b/src/PMG/RuleDB.pm
@@ -5,6 +5,7 @@ use warnings;
use DBI;
use HTML::Entities;
use Data::Dumper;
+use Encode qw(encode decode);
use PVE::SafeSyslog;
@@ -72,7 +73,8 @@ sub create_group_with_obj {
$name //= '';
$info //= '';
-
+ $name = encode('UTF-8', $name);
+ $info = encode('UTF-8',$info);
eval {
$self->{dbh}->begin_work;
@@ -174,7 +176,9 @@ sub save_group {
$self->{dbh}->do("UPDATE Objectgroup " .
"SET Name = ?, Info = ? " .
"WHERE ID = ?", undef,
- $og->{name}, $og->{info}, $og->{id});
+ encode('UTF-8', $og->{name}),
+ encode('UTF-8', $og->{info}),
+ $og->{id});
return $og->{id};
@@ -183,7 +187,7 @@ sub save_group {
"INSERT INTO Objectgroup (Name, Info, Class) " .
"VALUES (?, ?, ?);");
- $sth->execute($og->name, $og->info, $og->class);
+ $sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
}
@@ -212,7 +216,9 @@ sub delete_group {
$sth->execute($groupid);
if (my $ref = $sth->fetchrow_hashref()) {
- die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
+ my $groupname = PMG::Utils::try_decode_utf8($ref->{groupname});
+ my $rulename = PMG::Utils::try_decode_utf8($ref->{rulename});
+ die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
}
$sth->finish();
@@ -474,6 +480,7 @@ sub load_object_full {
sub load_group_by_name {
my ($self, $name) = @_;
+ $name = PMG::Utils::try_decode_utf8($name);
my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
"WHERE name = ?");
@@ -598,13 +605,14 @@ sub save_rule {
defined($rule->{direction}) ||
die "undefined rule attribute - direction: ERROR";
+ my $rulename = encode('UTF-8', $rule->{name});
if (defined($rule->{id})) {
$self->{dbh}->do(
"UPDATE Rule " .
"SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
"WHERE ID = ?", undef,
- $rule->{name}, $rule->{priority}, $rule->{active},
+ $rulename, $rule->{priority}, $rule->{active},
$rule->{direction}, $rule->{id});
return $rule->{id};
@@ -614,7 +622,7 @@ sub save_rule {
"INSERT INTO Rule (Name, Priority, Active, Direction) " .
"VALUES (?, ?, ?, ?);");
- $sth->execute($rule->name, $rule->priority, $rule->active,
+ $sth->execute($rulename, $rule->priority, $rule->active,
$rule->direction);
return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
@@ -779,7 +787,8 @@ sub load_rules {
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
+ my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
+ my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
$ref->{active}, $ref->{direction});
$rule->{id} = $ref->{id};
push @$rules, $rule;
diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
index cd67ea2..4ebd6da 100644
--- a/src/PMG/RuleDB/Accept.pm
+++ b/src/PMG/RuleDB/Accept.pm
@@ -93,7 +93,7 @@ sub execute {
my $dkim = $msginfo->{dkim} // {};
my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index d364690..c1225f3 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -115,7 +115,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
index c758787..25bb74e 100644
--- a/src/PMG/RuleDB/Block.pm
+++ b/src/PMG/RuleDB/Block.pm
@@ -89,7 +89,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if ($msginfo->{testmode}) {
my $fh = $msginfo->{test_fh};
diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
index d3003b2..c6afe54 100644
--- a/src/PMG/RuleDB/Disclaimer.pm
+++ b/src/PMG/RuleDB/Disclaimer.pm
@@ -193,7 +193,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $subgroups = $mod_group->subgroups($targets);
diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
index 2508305..baa68ce 100644
--- a/src/PMG/RuleDB/Group.pm
+++ b/src/PMG/RuleDB/Group.pm
@@ -12,8 +12,8 @@ sub new {
my ($type, $name, $info, $class) = @_;
my $self = {
- name => $name,
- info => $info,
+ name => PMG::Utils::try_decode_utf8($name),
+ info => PMG::Utils::try_decode_utf8($info),
class => $class,
};
diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
index 2671ea4..8246e6e 100644
--- a/src/PMG/RuleDB/MatchField.pm
+++ b/src/PMG/RuleDB/MatchField.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(decode encode);
use MIME::Words;
use PVE::SafeSyslog;
@@ -50,9 +51,10 @@ sub load_attr {
defined($field) || die "undefined object attribute: ERROR";
defined($field_value) || die "undefined object attribute: ERROR";
+ my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
# use known constructor, bless afterwards (because sub class can have constructor
# with other parameter signature).
- my $obj = PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
+ my $obj = PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
bless $obj, $class;
$obj->{id} = $id;
@@ -69,6 +71,7 @@ sub save {
my $new_value = "$self->{field}:$self->{field_value}";
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined ($self->{id})) {
# update
@@ -106,6 +109,7 @@ sub parse_entity {
chomp $value;
my $decvalue = MIME::Words::decode_mimewords($value);
+ $decvalue = PMG::Utils::try_decode_utf8($decvalue);
if ($decvalue =~ m|$self->{field_value}|i) {
push @$res, $id;
diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
index 7e5b486..06bf931 100644
--- a/src/PMG/RuleDB/MatchFilename.pm
+++ b/src/PMG/RuleDB/MatchFilename.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode decode);
use MIME::Words;
use PMG::Utils;
@@ -41,8 +42,9 @@ sub load_attr {
my $class = ref($type) || $type;
defined($value) || die "undefined value: ERROR";;
+ my $decvalue = PMG::Utils::try_decode_utf8($value);
- my $obj = $class->new($value, $ogroup);
+ my $obj = $class->new($decvalue, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -57,6 +59,7 @@ sub save {
my $new_value = $self->{fname};
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index fb15076..1e1727f 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -57,7 +57,9 @@ sub load_attr {
(defined($field) && defined($field_value)) || return undef;
- my $obj = $class->new($field, $field_value, $ogroup);
+ my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
+
+ my $obj = $class->new($field, $dec_field_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
@@ -70,7 +72,7 @@ sub save {
defined($self->{ogroup}) || return undef;
- my $new_value = "$self->{field}:$self->{field_value}";
+ my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index af853a3..bca5ebf 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -208,7 +208,7 @@ sub execute {
my $from = 'postmaster';
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $body = PMG::Utils::subst_values($self->{body}, $vars);
my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 1426393..30bc5ec 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(decode encode);
use PVE::SafeSyslog;
@@ -89,7 +90,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index 6b27b91..da6c25f 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -63,12 +63,14 @@ sub load_attr {
defined ($value) || die "undefined value: ERROR";
- my $obj;
+ my ($obj, $text);
if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $4, $ogroup, $2);
+ $text = PMG::Utils::try_decode_utf8($4);
+ $obj = $class->new($1, $text, $ogroup, $2);
} elsif ($value =~ m/^([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $3, $ogroup);
+ $text = PMG::Utils::try_decode_utf8($3);
+ $obj = $class->new($1, $text, $ogroup);
} else {
$obj = $class->new(0, undef, $ogroup);
}
@@ -89,7 +91,7 @@ sub save {
$value .= ','. ($self->{quarantine} ? '1' : '0');
if ($self->{text}) {
- $value .= ":$self->{text}";
+ $value .= encode('UTF-8', ":$self->{text}");
}
if (defined ($self->{id})) {
@@ -194,7 +196,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks, $ldap) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if (!$self->{all} && ($#$marks == -1)) {
# no marks
diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
index c49ad21..e7c9146 100644
--- a/src/PMG/RuleDB/Rule.pm
+++ b/src/PMG/RuleDB/Rule.pm
@@ -12,7 +12,7 @@ sub new {
my ($type, $name, $priority, $active, $direction) = @_;
my $self = {
- name => $name // '',
+ name => PMG::Utils::try_decode_utf8($name) // '',
priority => $priority // 0,
active => $active // 0,
};
diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
index 37ec3aa..ccc94a0 100644
--- a/src/PMG/RuleDB/WhoRegex.pm
+++ b/src/PMG/RuleDB/WhoRegex.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(decode encode);
use PMG::Utils;
use PMG::RuleDB::Object;
@@ -43,7 +44,8 @@ sub load_attr {
defined($value) || die "undefined value: ERROR";
- my $obj = $class->new ($value, $ogroup);
+ my $decoded_value = PMG::Utils::try_decode_utf8($value);
+ my $obj = $class->new ($decoded_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -59,6 +61,7 @@ sub save {
my $adr = $self->{address};
$adr =~ s/\\/\\\\/g;
+ $adr = encode('UTF-8', $adr);
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cef232b..23f60eb 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1542,4 +1542,9 @@ sub get_existing_object_id {
return;
}
+sub try_decode_utf8 {
+ my ($data) = @_;
+ return eval { decode('UTF-8', $data, 1) } // $data;
+}
+
1;
--
2.30.2
Thread overview: 13+ messages
2022-11-09 18:27 [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 1/5] ruledb: modfield: properly encode field after variable substitution Stoiko Ivanov
2022-11-11 13:56 ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 2/5] ruledb: add deprecation warnings for unused actions Stoiko Ivanov
2022-11-14 16:02 ` Dominik Csapak
2022-11-15 14:32 ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` Stoiko Ivanov [this message]
2022-11-14 14:36 ` [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database Dominik Csapak
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 4/5] ruledb: encode e-mail addresses for syslog Stoiko Ivanov
2022-11-14 14:49 ` Dominik Csapak
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 5/5] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
2022-11-14 16:03 ` Dominik Csapak
2022-11-14 16:02 ` [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Dominik Csapak