* [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails
@ 2022-11-24 12:21 Dominik Csapak
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 01/12] utils: return perl string from decode_rfc1522 Dominik Csapak
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
as replacement for the v3 from stoiko (i did not resend the gui patches,
as they are ok and still valid)
i added some of my notes as follow ups (ldap/bwlist/refactors)
as well as modified some commit messages of stoiko
i tested with various configurations with ldap, including
unicode characters in the local part of the account/email
(i only got this to work in active directory...)
Dominik Csapak (4):
quarantine: fix adding non-ascii senders to wl/bl
utils: refactor rfc1522_to_html
ldap: improve unicode support
statistics: refactor filter_text generation
Stoiko Ivanov (8):
utils: return perl string from decode_rfc1522
ruledb: properly substitute prox_vars in headers
fix #2541 ruledb: encode relevant values as utf-8 in database
ruledb: encode e-mail addresses for syslog
partially fix #2465: handle smtputf8 addresses in the rule-system
quarantine: handle utf8 data
pmgqm: handle smtputf8 data
statistics: handle utf8 data.
src/PMG/API2/Quarantine.pm | 14 +++----
src/PMG/CLI/pmgqm.pm | 24 ++++++-----
src/PMG/HTMLMail.pm | 7 ++--
src/PMG/LDAPCache.pm | 31 ++++++++------
src/PMG/MailQueue.pm | 10 +++--
src/PMG/Quarantine.pm | 13 +++---
src/PMG/RuleDB.pm | 24 +++++++----
src/PMG/RuleDB/Accept.pm | 2 +-
src/PMG/RuleDB/BCC.pm | 23 ++++++++--
src/PMG/RuleDB/Block.pm | 2 +-
src/PMG/RuleDB/Disclaimer.pm | 2 +-
src/PMG/RuleDB/Group.pm | 4 +-
src/PMG/RuleDB/LDAP.pm | 11 +++--
src/PMG/RuleDB/LDAPUser.pm | 13 +++---
src/PMG/RuleDB/MatchField.pm | 8 +++-
src/PMG/RuleDB/MatchFilename.pm | 5 ++-
src/PMG/RuleDB/ModField.pm | 19 +++------
src/PMG/RuleDB/Notify.pm | 24 ++++++++---
src/PMG/RuleDB/Quarantine.pm | 19 +++++++--
src/PMG/RuleDB/Remove.pm | 20 ++++++---
src/PMG/RuleDB/Rule.pm | 2 +-
src/PMG/RuleDB/Spam.pm | 17 ++++----
src/PMG/RuleDB/WhoRegex.pm | 5 ++-
src/PMG/Statistic.pm | 74 +++++++++++++++++++++++++--------
src/PMG/Utils.pm | 48 ++++++++++++---------
src/bin/pmg-smtp-filter | 7 ++--
26 files changed, 277 insertions(+), 151 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 14+ messages in thread
* [pmg-devel] [PATCH pmg-api v4 01/12] utils: return perl string from decode_rfc1522
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
decode_rfc1522 is a more robust version of decode_mimewords (in
scalar context) - adapt it to return a perlstring, under the
assumption that data is utf-8 encoded (holds true for ascii and
smtputf8 mails)
the try_decode_utf8 helper sub will be used extensively in
later patches and is inspired by commit
43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage:
We consider that the valid multibyte utf-8 characters do not really
yield sensible combinations of single-byte perl characters (starting
with a byte > 127 - e.g. "£") so if something decodes without error
from utf-8 it will in all likelihood have been utf-8 to begin with
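A minimal sketch of that heuristic, reusing the helper body from the diff below. The sample inputs are assumptions chosen for illustration; only Encode's documented decode() with a CHECK value of 1 (croak on malformed data) is used:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same body as the helper added by this patch: strict decode, and if that
# dies, fall back to the unmodified input.
sub try_decode_utf8 {
    my ($data) = @_;
    return eval { decode('UTF-8', $data, 1) } // $data;
}

my $utf8_bytes  = "\xC2\xA3";  # UTF-8 encoding of U+00A3 (pound sign), two bytes
my $latin1_byte = "\xA3";      # lone byte > 127, not valid UTF-8 on its own

my $decoded  = try_decode_utf8($utf8_bytes);   # decodes to one character
my $fallback = try_decode_utf8($latin1_byte);  # decode dies, input is returned
```

In this sketch both calls end up with a one-character string whose code point is 0xA3, which is why single bytes between 128 and 255 keep their meaning when the decode fails.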
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/Utils.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cef232b..cfb8852 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1088,6 +1088,7 @@ sub decode_to_html {
return $res;
}
+# assume enc contains utf-8 and mime-encoded data returns a perl-string (with wide characters)
sub decode_rfc1522 {
my ($enc) = @_;
@@ -1102,7 +1103,7 @@ sub decode_rfc1522 {
if ($cs) {
$res .= decode($cs, $d);
} else {
- $res .= $d;
+ $res .= try_decode_utf8($d);
}
}
}
@@ -1542,4 +1543,9 @@ sub get_existing_object_id {
return;
}
+sub try_decode_utf8 {
+ my ($data) = @_;
+ return eval { decode('UTF-8', $data, 1) } // $data;
+}
+
1;
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 02/12] ruledb: properly substitute prox_vars in headers
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
by storing the variables as perl-string (not mime-encoded, and not
utf-8 encoded), and appropriately dealing with multi-line values to
input (folding the headers and encoding as mime).
This fixes another glitch not caught by
d3d6b5dff9e4447d16cb92e0fdf26f67d9384423:
the Subject was always displayed with a '?' at the end (due to the
added (quoted-printable encoded) \n).
Additionally adapt the other callsites of PMG::Utils::subst_values
where applicable.
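The folding part of this can be sketched in isolation. The helper name fold_header_value and the sample values are hypothetical, and the per-line MIME-word encoding done by the real subst_values_for_header is deliberately left out, so this only shows how multi-line values become continuation lines and how the trailing newline disappears:

```perl
use strict;
use warnings;

# Hypothetical helper: fold a multi-line value into header continuation
# lines. The real code additionally encodes every line as MIME words.
sub fold_header_value {
    my ($value) = @_;

    my $res = '';
    foreach my $line (split(/\r?\n\s*/, $value)) {
        $res .= "\n" if $res;
        $res .= $line;
    }

    $res =~ s/\n/\n\t/sg;   # indent continuation lines with a tab
    $res =~ s/\n\s*\n//sg;  # remove empty lines
    $res =~ s/\n?\s*$//s;   # remove trailing whitespace and newline

    return $res;
}

my $folded  = fold_header_value("score 5.2\r\n  RULE_A\nRULE_B\n");
my $subject = fold_header_value("Some subject\n");
```

Because the value is split on newlines before anything is encoded, a trailing "\n" never reaches the encoder, which is what avoids the stray '?' described above.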
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/RuleDB/BCC.pm | 2 +-
src/PMG/RuleDB/ModField.pm | 13 +------------
src/PMG/RuleDB/Notify.pm | 4 ++--
src/PMG/Utils.pm | 17 +++++++++++++++++
src/bin/pmg-smtp-filter | 2 +-
5 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index d364690..4867d83 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -117,7 +117,7 @@ sub execute {
my $rulename = $vars->{RULE} // 'unknown';
- my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
+ my $bcc_to = PMG::Utils::subst_values_for_header($self->{target}, $vars);
if ($bcc_to =~ m/^\s*$/) {
# this happens if a notification is triggered by bounce mails
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index 4ebb618..34108d1 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -5,7 +5,6 @@ use warnings;
use DBI;
use Digest::SHA;
use Encode qw(encode decode);
-use MIME::Words qw(encode_mimewords);
use PMG::Utils;
use PMG::ModGroup;
@@ -98,17 +97,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $fvalue = '';
-
- foreach my $line (split('\r?\n\s*',PMG::Utils::subst_values ($self->{field_value}, $vars))) {
- $fvalue .= "\n" if $fvalue;
- $fvalue .= encode_mimewords(encode('UTF-8', $line), 'Charset' => 'UTF-8');
- }
-
- # support for multiline values (i.e. __SPAM_INFO__)
- $fvalue =~ s/\n/\n\t/sg; # indent content
- $fvalue =~ s/\n\s*\n//sg; # remove empty line
- $fvalue =~ s/\n?\s*$//s; # remove trailing spaces
+ my $fvalue = PMG::Utils::subst_values_for_header($self->{field_value}, $vars);
my $subgroups = $mod_group->subgroups($targets);
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index d67221e..7b38e0d 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -211,8 +211,8 @@ sub execute {
my $rulename = $vars->{RULE} // 'unknown';
my $body = PMG::Utils::subst_values($self->{body}, $vars);
- my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
- my $to = PMG::Utils::subst_values($self->{to}, $vars);
+ my $subject = PMG::Utils::subst_values_for_header($self->{subject}, $vars);
+ my $to = PMG::Utils::subst_values_for_header($self->{to}, $vars);
if ($to =~ m/^\s*$/) {
# this happens if a notification is triggered by bounce mails
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cfb8852..cc30e67 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -203,6 +203,23 @@ sub subst_values {
return $body;
}
+sub subst_values_for_header {
+ my ($header, $dh) = @_;
+
+ my $res = '';
+ foreach my $line (split('\r?\n\s*', subst_values ($header, $dh))) {
+ $res .= "\n" if $res;
+ $res .= MIME::Words::encode_mimewords(encode('UTF-8', $line), 'Charset' => 'UTF-8');
+ }
+
+ # support for multiline values (i.e. __SPAM_INFO__)
+ $res =~ s/\n/\n\t/sg; # indent content
+ $res =~ s/\n\s*\n//sg; # remove empty line
+ $res =~ s/\n?\s*$//s; # remove trailing spaces
+
+ return $res;
+}
+
sub reinject_mail {
my ($entity, $sender, $targets, $xforward, $me, $params) = @_;
diff --git a/src/bin/pmg-smtp-filter b/src/bin/pmg-smtp-filter
index 35a6ac6..45e68a7 100755
--- a/src/bin/pmg-smtp-filter
+++ b/src/bin/pmg-smtp-filter
@@ -152,7 +152,7 @@ sub get_prox_vars {
} if !$spaminfo;
my $vars = {
- 'SUBJECT' => mime_to_perl_string($entity->head->get ('subject', 0) || 'No Subject'),
+ 'SUBJECT' => PMG::Utils::decode_rfc1522($entity->head->get ('subject', 0) || 'No Subject'),
'RULE' => $rule->{name},
'RULE_INFO' => $msginfo->{rule_info},
'SENDER' => $msginfo->{sender},
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 03/12] fix #2541 ruledb: encode relevant values as utf-8 in database
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
This patch adds support for storing rule names, comments(info), and
most relevant values (e.g. the header content to match) in utf-8 in
the database.
backwards-compatibility should not be an issue:
* currently the database should not contain any utf-8 multibyte
characters, as our tooling prevented this due to sending
wide-characters, which causes an exception in DBI.
* any character > 127 and < 256 will be correctly interpreted when
stored in a perl-string (this happens if the decode fails in
try_decode_utf8), and will be correctly encoded when storing into
the database.
the database is created with SQL_ASCII encoding - which behaves by
interpreting bytes <= 127 as ascii and those > 127 are not interpreted
(see [0]), which just means that we have to explicitly en-/decode upon
storing/reading from there.
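A rough sketch of that round trip, without a real database: the variables below only stand in for the bytes that would be written to and read from a column, and the sample names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same body as the helper from patch 01.
sub try_decode_utf8 {
    my ($data) = @_;
    return eval { decode('UTF-8', $data, 1) } // $data;
}

# Perl string with characters > 127 and one wide character (U+2709).
my $name = "Gr\x{fc}\x{df}e \x{2709}";

# Storing: encode to bytes first, so no wide character is handed to DBI.
my $stored = encode('UTF-8', $name);

# Reading: the column comes back as bytes and is decoded again.
my $loaded = try_decode_utf8($stored);

# Assumed legacy row, written as single bytes > 127 before this change:
# the strict decode fails, so the bytes are kept as characters 128..255 ...
my $legacy = try_decode_utf8("Gr\xfc\xdfe");
# ... and are written back as proper UTF-8 on the next save.
my $resaved = encode('UTF-8', $legacy);
```

The legacy case is the second backwards-compatibility bullet above: the value survives loading unchanged and only its on-disk representation changes once it is saved again.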
This patch currently omits most Who objects:
* for email/domain we'd still need to consider how to store them
(puny-code for the domain part, or everything as UTF-8) and it would
need changes to the API-types.
* the LDAP objects currently would not work too well, since our LDAPCache
is not UTF-8 safe - and fixing warrants its own patch-series
* WhoRegex should work and be able to handle many use-cases
The ContentType values should also contain only ascii characters per
RFC6838 [1] and RFC2045 [2].
[0] https://www.postgresql.org/docs/13/multibyte.html
[1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
[2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/RuleDB.pm | 24 ++++++++++++++++--------
src/PMG/RuleDB/Accept.pm | 2 +-
src/PMG/RuleDB/BCC.pm | 2 +-
src/PMG/RuleDB/Block.pm | 2 +-
src/PMG/RuleDB/Disclaimer.pm | 2 +-
src/PMG/RuleDB/Group.pm | 4 ++--
src/PMG/RuleDB/MatchField.pm | 8 ++++++--
src/PMG/RuleDB/MatchFilename.pm | 5 ++++-
src/PMG/RuleDB/ModField.pm | 6 ++++--
src/PMG/RuleDB/Notify.pm | 2 +-
src/PMG/RuleDB/Quarantine.pm | 3 ++-
src/PMG/RuleDB/Remove.pm | 12 +++++++-----
src/PMG/RuleDB/Rule.pm | 2 +-
src/PMG/RuleDB/WhoRegex.pm | 5 ++++-
14 files changed, 51 insertions(+), 28 deletions(-)
diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
index 895acc6..a6b0b79 100644
--- a/src/PMG/RuleDB.pm
+++ b/src/PMG/RuleDB.pm
@@ -5,6 +5,7 @@ use warnings;
use DBI;
use HTML::Entities;
use Data::Dumper;
+use Encode qw(encode);
use PVE::SafeSyslog;
@@ -70,8 +71,8 @@ sub create_group_with_obj {
defined($obj) || die "proxmox: undefined object";
- $name //= '';
- $info //= '';
+ $name = encode('UTF-8', $name // '');
+ $info = encode('UTF-8', $info // '');
eval {
@@ -174,7 +175,9 @@ sub save_group {
$self->{dbh}->do("UPDATE Objectgroup " .
"SET Name = ?, Info = ? " .
"WHERE ID = ?", undef,
- $og->{name}, $og->{info}, $og->{id});
+ encode('UTF-8', $og->{name}),
+ encode('UTF-8', $og->{info}),
+ $og->{id});
return $og->{id};
@@ -183,7 +186,7 @@ sub save_group {
"INSERT INTO Objectgroup (Name, Info, Class) " .
"VALUES (?, ?, ?);");
- $sth->execute($og->name, $og->info, $og->class);
+ $sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
}
@@ -212,7 +215,9 @@ sub delete_group {
$sth->execute($groupid);
if (my $ref = $sth->fetchrow_hashref()) {
- die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
+ my $groupname = PMG::Utils::try_decode_utf8($ref->{groupname});
+ my $rulename = PMG::Utils::try_decode_utf8($ref->{rulename});
+ die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
}
$sth->finish();
@@ -474,6 +479,7 @@ sub load_object_full {
sub load_group_by_name {
my ($self, $name) = @_;
+ $name = encode('UTF-8', $name);
my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
"WHERE name = ?");
@@ -598,13 +604,14 @@ sub save_rule {
defined($rule->{direction}) ||
die "undefined rule attribute - direction: ERROR";
+ my $rulename = encode('UTF-8', $rule->{name});
if (defined($rule->{id})) {
$self->{dbh}->do(
"UPDATE Rule " .
"SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
"WHERE ID = ?", undef,
- $rule->{name}, $rule->{priority}, $rule->{active},
+ $rulename, $rule->{priority}, $rule->{active},
$rule->{direction}, $rule->{id});
return $rule->{id};
@@ -614,7 +621,7 @@ sub save_rule {
"INSERT INTO Rule (Name, Priority, Active, Direction) " .
"VALUES (?, ?, ?, ?);");
- $sth->execute($rule->name, $rule->priority, $rule->active,
+ $sth->execute($rulename, $rule->priority, $rule->active,
$rule->direction);
return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
@@ -779,7 +786,8 @@ sub load_rules {
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
+ my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
+ my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
$ref->{active}, $ref->{direction});
$rule->{id} = $ref->{id};
push @$rules, $rule;
diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
index cd67ea2..4ebd6da 100644
--- a/src/PMG/RuleDB/Accept.pm
+++ b/src/PMG/RuleDB/Accept.pm
@@ -93,7 +93,7 @@ sub execute {
my $dkim = $msginfo->{dkim} // {};
my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index 4867d83..6244dd9 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -115,7 +115,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $bcc_to = PMG::Utils::subst_values_for_header($self->{target}, $vars);
diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
index c758787..25bb74e 100644
--- a/src/PMG/RuleDB/Block.pm
+++ b/src/PMG/RuleDB/Block.pm
@@ -89,7 +89,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if ($msginfo->{testmode}) {
my $fh = $msginfo->{test_fh};
diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
index d3003b2..c6afe54 100644
--- a/src/PMG/RuleDB/Disclaimer.pm
+++ b/src/PMG/RuleDB/Disclaimer.pm
@@ -193,7 +193,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $subgroups = $mod_group->subgroups($targets);
diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
index 2508305..baa68ce 100644
--- a/src/PMG/RuleDB/Group.pm
+++ b/src/PMG/RuleDB/Group.pm
@@ -12,8 +12,8 @@ sub new {
my ($type, $name, $info, $class) = @_;
my $self = {
- name => $name,
- info => $info,
+ name => PMG::Utils::try_decode_utf8($name),
+ info => PMG::Utils::try_decode_utf8($info),
class => $class,
};
diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
index 2671ea4..2b56058 100644
--- a/src/PMG/RuleDB/MatchField.pm
+++ b/src/PMG/RuleDB/MatchField.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use MIME::Words;
use PVE::SafeSyslog;
@@ -50,9 +51,10 @@ sub load_attr {
defined($field) || die "undefined object attribute: ERROR";
defined($field_value) || die "undefined object attribute: ERROR";
+ my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
# use known constructor, bless afterwards (because sub class can have constructor
# with other parameter signature).
- my $obj = PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
+ my $obj = PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
bless $obj, $class;
$obj->{id} = $id;
@@ -69,6 +71,7 @@ sub save {
my $new_value = "$self->{field}:$self->{field_value}";
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined ($self->{id})) {
# update
@@ -105,7 +108,8 @@ sub parse_entity {
for my $value ($entity->head->get_all($self->{field})) {
chomp $value;
- my $decvalue = MIME::Words::decode_mimewords($value);
+ my $decvalue = PMG::Utils::decode_rfc1522($value);
+ $decvalue = PMG::Utils::try_decode_utf8($decvalue);
if ($decvalue =~ m|$self->{field_value}|i) {
push @$res, $id;
diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
index 7e5b486..c9cdbe0 100644
--- a/src/PMG/RuleDB/MatchFilename.pm
+++ b/src/PMG/RuleDB/MatchFilename.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use MIME::Words;
use PMG::Utils;
@@ -41,8 +42,9 @@ sub load_attr {
my $class = ref($type) || $type;
defined($value) || die "undefined value: ERROR";;
+ my $decvalue = PMG::Utils::try_decode_utf8($value);
- my $obj = $class->new($value, $ogroup);
+ my $obj = $class->new($decvalue, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -57,6 +59,7 @@ sub save {
my $new_value = $self->{fname};
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index 34108d1..6232322 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -56,7 +56,9 @@ sub load_attr {
(defined($field) && defined($field_value)) || return undef;
- my $obj = $class->new($field, $field_value, $ogroup);
+ my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
+
+ my $obj = $class->new($field, $dec_field_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
@@ -69,7 +71,7 @@ sub save {
defined($self->{ogroup}) || return undef;
- my $new_value = "$self->{field}:$self->{field_value}";
+ my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index 7b38e0d..8a9945b 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -208,7 +208,7 @@ sub execute {
my $from = 'postmaster';
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $body = PMG::Utils::subst_values($self->{body}, $vars);
my $subject = PMG::Utils::subst_values_for_header($self->{subject}, $vars);
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 1426393..9d802fe 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use PVE::SafeSyslog;
@@ -89,7 +90,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index 6b27b91..da6c25f 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -63,12 +63,14 @@ sub load_attr {
defined ($value) || die "undefined value: ERROR";
- my $obj;
+ my ($obj, $text);
if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $4, $ogroup, $2);
+ $text = PMG::Utils::try_decode_utf8($4);
+ $obj = $class->new($1, $text, $ogroup, $2);
} elsif ($value =~ m/^([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $3, $ogroup);
+ $text = PMG::Utils::try_decode_utf8($3);
+ $obj = $class->new($1, $text, $ogroup);
} else {
$obj = $class->new(0, undef, $ogroup);
}
@@ -89,7 +91,7 @@ sub save {
$value .= ','. ($self->{quarantine} ? '1' : '0');
if ($self->{text}) {
- $value .= ":$self->{text}";
+ $value .= encode('UTF-8', ":$self->{text}");
}
if (defined ($self->{id})) {
@@ -194,7 +196,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks, $ldap) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if (!$self->{all} && ($#$marks == -1)) {
# no marks
diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
index c49ad21..e7c9146 100644
--- a/src/PMG/RuleDB/Rule.pm
+++ b/src/PMG/RuleDB/Rule.pm
@@ -12,7 +12,7 @@ sub new {
my ($type, $name, $priority, $active, $direction) = @_;
my $self = {
- name => $name // '',
+ name => PMG::Utils::try_decode_utf8($name) // '',
priority => $priority // 0,
active => $active // 0,
};
diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
index 37ec3aa..5c13604 100644
--- a/src/PMG/RuleDB/WhoRegex.pm
+++ b/src/PMG/RuleDB/WhoRegex.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use PMG::Utils;
use PMG::RuleDB::Object;
@@ -43,7 +44,8 @@ sub load_attr {
defined($value) || die "undefined value: ERROR";
- my $obj = $class->new ($value, $ogroup);
+ my $decoded_value = PMG::Utils::try_decode_utf8($value);
+ my $obj = $class->new ($decoded_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -59,6 +61,7 @@ sub save {
my $adr = $self->{address};
$adr =~ s/\\/\\\\/g;
+ $adr = encode('UTF-8', $adr);
if (defined ($self->{id})) {
# update
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 04/12] ruledb: encode e-mail addresses for syslog
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
as done in 114655f4fdb07c789a361b2f397f5345eafd16c6 for Accept and
Block.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/RuleDB/BCC.pm | 19 +++++++++++++++++--
src/PMG/RuleDB/Notify.pm | 18 ++++++++++++++++--
src/PMG/RuleDB/Quarantine.pm | 16 ++++++++++++++--
src/PMG/RuleDB/Remove.pm | 8 +++++++-
4 files changed, 54 insertions(+), 7 deletions(-)
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index 6244dd9..0f016f8 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -3,6 +3,7 @@ package PMG::RuleDB::BCC;
use strict;
use warnings;
use DBI;
+use Encode qw(encode);
use PVE::SafeSyslog;
@@ -164,10 +165,24 @@ sub execute {
$entity, $msginfo->{sender}, \@bcc_targets,
$msginfo->{xforward}, $msginfo->{fqdn}, $param);
foreach (@bcc_targets) {
+ my $target = encode('UTF-8', $_);
if ($qid) {
- syslog('info', "%s: bcc to <%s> (rule: %s, %s)", $queue->{logid}, $_, $rulename, $qid);
+ syslog(
+ 'info',
+ "%s: bcc to <%s> (rule: %s, %s)",
+ $queue->{logid},
+ $target,
+ $rulename,
+ $qid,
+ );
} else {
- syslog('err', "%s: bcc to <%s> (rule: %s) failed", $queue->{logid}, $_, $rulename);
+ syslog(
+ 'err',
+ "%s: bcc to <%s> (rule: %s) failed",
+ $queue->{logid},
+ $target,
+ $rulename,
+ );
}
}
}
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index 8a9945b..68f9b4e 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -259,10 +259,24 @@ sub execute {
my $qid = PMG::Utils::reinject_mail(
$top, $from, \@targets, undef, $msginfo->{fqdn});
foreach (@targets) {
+ my $target = encode('UTF-8', $_);
if ($qid) {
- syslog('info', "%s: notify <%s> (rule: %s, %s)", $queue->{logid}, $_, $rulename, $qid);
+ syslog(
+ 'info',
+ "%s: notify <%s> (rule: %s, %s)",
+ $queue->{logid},
+ $target,
+ $rulename,
+ $qid,
+ );
} else {
- syslog ('err', "%s: notify <%s> (rule: %s) failed", $queue->{logid}, $_, $rulename);
+ syslog (
+ 'err',
+ "%s: notify <%s> (rule: %s) failed",
+ $queue->{logid},
+ $target,
+ $rulename,
+ );
}
}
}
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 9d802fe..0fc8352 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -101,7 +101,13 @@ sub execute {
if (my $qid = $queue->quarantine_mail($ruledb, 'V', $entity, $tg, $msginfo, $vars, $ldap)) {
foreach (@$tg) {
- syslog ('info', "$queue->{logid}: moved mail for <%s> to virus quarantine - %s (rule: %s)", $_, $qid, $rulename);
+ syslog (
+ 'info',
+ "$queue->{logid}: moved mail for <%s> to virus quarantine - %s (rule: %s)",
+ encode('UTF-8',$_),
+ $qid,
+ $rulename,
+ );
}
$queue->set_status ($tg, 'delivered');
@@ -111,7 +117,13 @@ sub execute {
if (my $qid = $queue->quarantine_mail($ruledb, 'S', $entity, $tg, $msginfo, $vars, $ldap)) {
foreach (@$tg) {
- syslog ('info', "$queue->{logid}: moved mail for <%s> to spam quarantine - %s (rule: %s)", $_, $qid, $rulename);
+ syslog (
+ 'info',
+ "$queue->{logid}: moved mail for <%s> to spam quarantine - %s (rule: %s)",
+ encode('UTF-8',$_),
+ $qid,
+ $rulename,
+ );
}
$queue->set_status($tg, 'delivered');
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index da6c25f..e7c353c 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -235,7 +235,13 @@ sub execute {
}
foreach (@$tg) {
- syslog ('info', "$queue->{logid}: moved mail for <%s> to attachment quarantine - %s (rule: %s)", $_, $qid, $rulename);
+ syslog (
+ 'info',
+ "$queue->{logid}: moved mail for <%s> to attachment quarantine - %s (rule: %s)",
+ encode('UTF-8',$_),
+ $qid,
+ $rulename,
+ );
}
}
}
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 05/12] partially fix #2465: handle smtputf8 addresses in the rule-system
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
the envelope addresses are used in the rule-system for lookups and
statistics. When the mail is received with smtputf8 the addresses are
decoded (multi-byte perl-strings) and thus need encoding before using
them as parameter in a database query.
This patch encodes the addresses as utf-8 for the relevant queries
unconditionally, because envelope-senders should either be:
* (a subset of) ascii (no smtputf8) - which is invariant for utf-8
encoding
* valid utf-8 (smtputf8)
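Why the unconditional encode is safe can be checked with a small sketch. Both addresses are made-up examples, the second one standing in for an already decoded smtputf8 envelope address:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $ascii_sender = 'user@example.com';
# Hypothetical smtputf8 address as a decoded perl string (o-umlaut, U+00F6).
my $utf8_sender = "j\x{f6}rg\@example.com";

# ASCII is a subset of UTF-8, so encoding leaves the value byte-identical.
my $ascii_param = encode('UTF-8', $ascii_sender);

# The non-ASCII character becomes a two-byte UTF-8 sequence, which is the
# form that gets quoted into the query.
my $utf8_param = encode('UTF-8', $utf8_sender);
```

Since the ASCII case is a no-op, callers do not need to know whether a mail arrived with smtputf8 before building the query.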
The patch does not address the issues with multi-byte addresses in our
LDAP-implementation (hence the partial fix), but should still be an
improvement for many deployments
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/MailQueue.pm | 10 ++++++----
src/PMG/RuleDB/Spam.pm | 5 +++--
src/bin/pmg-smtp-filter | 5 +++--
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/src/PMG/MailQueue.pm b/src/PMG/MailQueue.pm
index 2841b07..8355c30 100644
--- a/src/PMG/MailQueue.pm
+++ b/src/PMG/MailQueue.pm
@@ -6,6 +6,7 @@ use warnings;
use PVE::SafeSyslog;
use MIME::Parser;
use IO::File;
+use Encode;
use File::Sync;
use File::Basename;
use File::Path;
@@ -141,6 +142,7 @@ sub quarantinedb_insert {
my ($self, $ruledb, $lcid, $ldap, $qtype, $header, $sender, $file, $targets, $vars) = @_;
eval {
+ $sender = encode('UTF-8', $sender);
my $dbh = $ruledb->{dbh};
my $insert_cmds = "SELECT nextval ('cmailstore_id_seq'); INSERT INTO CMailStore " .
@@ -188,11 +190,11 @@ sub quarantinedb_insert {
if ($pmail eq lc ($r)) {
$receiver = "NULL";
} else {
- $receiver = $dbh->quote ($r);
+ $receiver = $dbh->quote (encode('UTF-8', $r));
}
- $pmail = $dbh->quote ($pmail);
+ $pmail = $dbh->quote (encode('UTF-8', $pmail));
$insert_cmds .= "INSERT INTO CMSReceivers " .
"(CMailStore_CID, CMailStore_RID, PMail, Receiver, TicketID, Status, MTime) " .
"VALUES ($lcid, currval ('cmailstore_id_seq'), $pmail, $receiver, $tid, 'N', $now); ";
@@ -294,8 +296,8 @@ sub quarantine_mail {
$entity->head->delete ('Return-Path');
# prepend Delivered-To and Return-Path (like QMAIL MAILDIR FORMAT)
- $entity->head->add ('Return-Path', join (',', $sender), 0);
- $entity->head->add ('Delivered-To', join (',', @$tg), 0);
+ $entity->head->add ('Return-Path', encode('UTF-8', join (',', $sender)), 0);
+ $entity->head->add ('Delivered-To', encode('UTF-8', join (',', @$tg)), 0);
$entity->print ($fh);
diff --git a/src/PMG/RuleDB/Spam.pm b/src/PMG/RuleDB/Spam.pm
index cc9a347..99056a3 100644
--- a/src/PMG/RuleDB/Spam.pm
+++ b/src/PMG/RuleDB/Spam.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use Time::HiRes qw (gettimeofday);
use PVE::SafeSyslog;
@@ -135,8 +136,8 @@ sub get_blackwhite {
my $cond = '';
foreach my $r (@$targets) {
my $pmail = $msginfo->{pmail}->{$r} || lc ($r);
- my $qr = $dbh->quote ($pmail);
- $cond .= " OR " if $cond;
+ my $qr = $dbh->quote (encode('UTF-8', $pmail));
+ $cond .= " OR " if $cond;
$cond .= "pmail = $qr";
}
diff --git a/src/bin/pmg-smtp-filter b/src/bin/pmg-smtp-filter
index 45e68a7..911e9cd 100755
--- a/src/bin/pmg-smtp-filter
+++ b/src/bin/pmg-smtp-filter
@@ -4,6 +4,7 @@ use strict;
use warnings;
use Carp;
+use Encode qw(encode);
use Getopt::Long;
use Time::HiRes qw (usleep gettimeofday tv_interval);
use POSIX qw(:sys_wait_h errno_h signal_h);
@@ -791,10 +792,10 @@ sub handle_smtp {
$insert_cmds .= ($queue->{sa_score} || 0) . ',';
$insert_cmds .= $dbh->quote($queue->{vinfo}) . ',';
$insert_cmds .= $time_total . ',';
- $insert_cmds .= $dbh->quote($msginfo->{sender}) . ');';
+ $insert_cmds .= $dbh->quote(encode('UTF-8', $msginfo->{sender})) . ');';
foreach my $r (@{$msginfo->{targets}}) {
- my $tmp = $dbh->quote($r);
+ my $tmp = $dbh->quote(encode('UTF-8',$r));
my $blocked = $queue->{status}->{$r} eq 'blocked' ? 1 : 0;
$insert_cmds .= "INSERT INTO CReceivers (CStatistic_CID, CStatistic_RID, Receiver, Blocked) " .
"VALUES ($lcid, currval ('cstatistic_id_seq'), $tmp, '$blocked'); ";
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 06/12] quarantine: handle utf8 data
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (4 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 05/12] partially fix #2465: handle smtputf8 addresses in the rule-system Dominik Csapak
@ 2022-11-24 12:21 ` Dominik Csapak
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 07/12] pmgqm: handle smtputf8 data Dominik Csapak
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
use try_decode_utf8 for sender/receiver of the smtp dialog and mail
headers since they're either ASCII (not SMTPUTF8) or UTF-8 (with SMTPUTF8)
encoded
change the mail regex for wl/bl to basic email/domain syntax without
the restriction of ascii only. (whitespace and backslashes are
forbidden, but they shouldn't normally occur in email addresses and
domains)
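To make the relaxed wl/bl check concrete, here is a minimal sketch that runs the same pattern as the add_to_blackwhite hunk in this patch against a few already-decoded entries. The helper name is_valid_wl_entry and the sample addresses are assumptions made up for this illustration; they are not part of the patch.

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');    # sample entries contain non-ASCII characters

# Hypothetical helper (not in the patch): applies the pattern used when
# wl/bl entries are read back from UserPrefs.
sub is_valid_wl_entry {
    my ($entry) = @_;
    return $entry =~ m/^[^\s\\\@]+(?:\@[^\s\/\\\@]+)?$/ ? 1 : 0;
}

my @samples = (
    'user@example.com',                 # plain ASCII address
    "\x{fc}ser\@ex\x{e4}mple.com",      # non-ASCII local part and domain
    'example.com',                      # bare domain
    'us er@example.com',                # whitespace is rejected
    'a\\b@example.com',                 # backslash is rejected
    'a@b@c',                            # more than one '@' is rejected
);

for my $entry (@samples) {
    printf "%-20s %s\n", $entry, is_valid_wl_entry($entry) ? 'accepted' : 'rejected';
}
```

The pattern only checks the basic shape (an optional single '@', no whitespace, no backslash), so it is deliberately much looser than a full address validator.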
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
[ D: Added Commit message ]
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/API2/Quarantine.pm | 10 +++++-----
src/PMG/HTMLMail.pm | 7 ++++---
src/PMG/Quarantine.pm | 13 +++++++------
src/PMG/RuleDB/Spam.pm | 12 ++++++------
4 files changed, 22 insertions(+), 20 deletions(-)
diff --git a/src/PMG/API2/Quarantine.pm b/src/PMG/API2/Quarantine.pm
index ddf7c04..819c78c 100644
--- a/src/PMG/API2/Quarantine.pm
+++ b/src/PMG/API2/Quarantine.pm
@@ -141,8 +141,8 @@ my $parse_header_info = sub {
my $sender = PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('sender')));
$res->{sender} = $sender if $sender && ($sender ne $res->{from});
- $res->{envelope_sender} = $ref->{sender};
- $res->{receiver} = $ref->{receiver} // $ref->{pmail};
+ $res->{envelope_sender} = PMG::Utils::try_decode_utf8($ref->{sender});
+ $res->{receiver} = PMG::Utils::try_decode_utf8($ref->{receiver} // $ref->{pmail});
$res->{id} = 'C' . $ref->{cid} . 'R' . $ref->{rid} . 'T' . $ref->{ticketid};
$res->{time} = $ref->{time};
$res->{bytes} = $ref->{bytes};
@@ -437,7 +437,7 @@ __PACKAGE__->register_method ({
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, { mail => $ref->{pmail} };
+ push @$res, { mail => PMG::Utils::try_decode_utf8($ref->{pmail}) };
}
return $res;
@@ -532,7 +532,7 @@ __PACKAGE__->register_method ({
}
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, { mail => $ref->{pmail} };
+ push @$res, { mail => PMG::Utils::try_decode_utf8($ref->{pmail}) };
}
return $res;
@@ -569,7 +569,7 @@ my $quarantine_api = sub {
}
if ($check_pmail || $role eq 'quser') {
- $sth->execute($pmail);
+ $sth->execute(encode('UTF-8', $pmail));
} else {
$sth->execute();
}
diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
index 87f5c40..207c52c 100644
--- a/src/PMG/HTMLMail.pm
+++ b/src/PMG/HTMLMail.pm
@@ -192,9 +192,10 @@ sub read_raw_email {
# read header
my $header;
while (defined(my $line = <$fh>)) {
- $raw_header .= $line;
- chomp $line;
- push @$header, $line;
+ my $decoded_line = PMG::Utils::try_decode_utf8($line);
+ $raw_header .= $decoded_line;
+ chomp $decoded_line;
+ push @$header, $decoded_line;
last if $line =~ m/^\s*$/;
}
diff --git a/src/PMG/Quarantine.pm b/src/PMG/Quarantine.pm
index 77af8cc..aa6b948 100644
--- a/src/PMG/Quarantine.pm
+++ b/src/PMG/Quarantine.pm
@@ -3,6 +3,7 @@ package PMG::Quarantine;
use strict;
use warnings;
use Net::SMTP;
+use Encode qw(encode);
use PVE::SafeSyslog;
use PVE::Tools;
@@ -16,7 +17,7 @@ sub add_to_blackwhite {
my $name = $listname eq 'BL' ? 'BL' : 'WL';
my $oname = $listname eq 'BL' ? 'WL' : 'BL';
- my $qu = $dbh->quote ($username);
+ my $qu = $dbh->quote (encode('UTF-8', $username));
my $sth = $dbh->prepare(
"SELECT * FROM UserPrefs WHERE pmail = $qu AND (Name = 'BL' OR Name = 'WL')");
@@ -25,13 +26,13 @@ sub add_to_blackwhite {
my $list = { 'WL' => {}, 'BL' => {} };
while (my $ref = $sth->fetchrow_hashref()) {
- my $data = $ref->{data};
+ my $data = PMG::Utils::try_decode_utf8($ref->{data});
$data =~ s/[,;]/ /g;
my @alist = split('\s+', $data);
my $tmp = {};
foreach my $a (@alist) {
- if ($a =~ m/^[[:ascii:]]+$/) {
+ if ($a =~ m/^[^\s\\\@]+(?:\@[^\s\/\\\@]+)?$/) {
$tmp->{$a} = 1;
}
}
@@ -50,7 +51,7 @@ sub add_to_blackwhite {
if ($delete) {
delete($list->{$name}->{$v});
} else {
- if ($v =~ m/[[:^ascii:]]/) {
+ if ($v =~ m/[\s\\]/) {
die "email address '$v' contains invalid characters\n";
}
$list->{$name}->{$v} = 1;
@@ -58,8 +59,8 @@ sub add_to_blackwhite {
}
}
- my $wlist = $dbh->quote(join (',', keys %{$list->{WL}}) || '');
- my $blist = $dbh->quote(join (',', keys %{$list->{BL}}) || '');
+ my $wlist = $dbh->quote(encode('UTF-8', join (',', keys %{$list->{WL}})) || '');
+ my $blist = $dbh->quote(encode('UTF-8', join (',', keys %{$list->{BL}})) || '');
if (!$delete) {
my $maxlen = 200000;
diff --git a/src/PMG/RuleDB/Spam.pm b/src/PMG/RuleDB/Spam.pm
index 99056a3..bc1d422 100644
--- a/src/PMG/RuleDB/Spam.pm
+++ b/src/PMG/RuleDB/Spam.pm
@@ -94,7 +94,7 @@ sub parse_addrlist {
my $regex = $addr;
# SA like checks
$regex =~ s/[\000\\\(]/_/gs; # is this really necessasry ?
- $regex =~ s/([^\*\?_a-zA-Z0-9])/\\$1/g; # escape possible metachars
+ $regex =~ s/([^\*\?_\w])/\\$1/g; # escape possible metachars
$regex =~ tr/?/./; # replace "?" with "."
$regex =~ s/\*+/\.\*/g; # replace "*" with ".*"
@@ -149,13 +149,13 @@ sub get_blackwhite {
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- my $pmail = lc ($ref->{pmail});
+ my $pmail = lc (PMG::Utils::try_decode_utf8($ref->{pmail}));
if ($ref->{name} eq 'WL') {
$target_info->{$pmail}->{whitelist} =
- parse_addrlist($ref->{data});
+ parse_addrlist(PMG::Utils::try_decode_utf8($ref->{data}));
} elsif ($ref->{name} eq 'BL') {
$target_info->{$pmail}->{blacklist} =
- parse_addrlist($ref->{data});
+ parse_addrlist(PMG::Utils::try_decode_utf8($ref->{data}));
}
}
@@ -205,7 +205,7 @@ sub what_match_targets {
($list = $queue->{blackwhite}->{$pmail}->{whitelist}) &&
check_addrlist($list, $queue->{all_from_addrs})) {
syslog('info', "%s: sender in user (%s) whitelist",
- $queue->{logid}, $pmail);
+ $queue->{logid}, encode('UTF-8', $pmail));
} else {
$target_info->{$t}->{marks} = []; # never add additional marks here
$target_info->{$t}->{spaminfo} = $info;
@@ -234,7 +234,7 @@ sub what_match_targets {
$target_info->{$t}->{marks} = [];
$target_info->{$t}->{spaminfo} = $info;
syslog ('info', "%s: sender in user (%s) blacklist",
- $queue->{logid}, $pmail);
+ $queue->{logid}, encode('UTF-8',$pmail));
}
}
}
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 07/12] pmgqm: handle smtputf8 data
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (5 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 06/12] quarantine: handle utf8 data Dominik Csapak
@ 2022-11-24 12:21 ` Dominik Csapak
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 08/12] statistics: handle utf8 data Dominik Csapak
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
$data->{pmail} is both used in the template rendering ('Spam Report for
$pmail'), and as content for the To header, which need different
treatment. Thus introduce 'pmail_raw' additionally.
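The split between 'pmail' and 'pmail_raw' can be sketched as follows. This is only an illustration: html_escape_sketch is a core-only stand-in for HTML::Entities::encode_entities (it emits numeric entities, which the real function does not necessarily do), and the sample address is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for encode_entities, kept core-only for this sketch: escapes HTML
# metacharacters and turns non-ASCII characters into numeric entities.
sub html_escape_sketch {
    my ($text) = @_;
    $text =~ s/([&<>"']|[^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# UTF-8 bytes as they would come out of the database for an SMTPUTF8 recipient.
my $pmail_bytes = "j\xc3\xbcrgen\@example.com";

# Character string, suitable for text processing and rendering.
my $pmail_chars = decode('UTF-8', $pmail_bytes);

my %data = (
    pmail     => html_escape_sketch($pmail_chars),  # for interpolation into the HTML template
    pmail_raw => $pmail_bytes,                      # untouched bytes for the To header
);

print "template: Spam Report for $data{pmail}\n";
print "header:   To: $data{pmail_raw}\n";
```

Keeping both values avoids sending an entity-escaped address in the To header while still rendering the address safely in the report body.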
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/CLI/pmgqm.pm | 24 +++++++++++++-----------
src/PMG/Utils.pm | 7 ++++---
2 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/src/PMG/CLI/pmgqm.pm b/src/PMG/CLI/pmgqm.pm
index dbec8ef..7293579 100755
--- a/src/PMG/CLI/pmgqm.pm
+++ b/src/PMG/CLI/pmgqm.pm
@@ -2,6 +2,7 @@ package PMG::CLI::pmgqm;
use strict;
use Data::Dumper;
+use Encode qw(encode);
use Template;
use MIME::Entity;
use HTML::Entities;
@@ -17,6 +18,7 @@ use PVE::SafeSyslog;
use PVE::Tools;
use PVE::INotify;
use PVE::CLIHandler;
+use PVE::JSONSchema qw(get_standard_option);
use PMG::RESTEnvironment;
use PMG::Utils;
@@ -57,7 +59,7 @@ sub get_item_data {
}
$item->{envelope_sender} = $ref->{sender};
- $item->{pmail} = $ref->{pmail};
+ $item->{pmail} = encode_entities(PMG::Utils::try_decode_utf8($ref->{pmail}));
$item->{receiver} = $ref->{receiver} || $ref->{pmail};
$item->{date} = strftime("%F", localtime($ref->{time}));
@@ -157,11 +159,10 @@ __PACKAGE__->register_method ({
parameters => {
additionalProperties => 0,
properties => {
- receiver => {
+ receiver => get_standard_option('pmg-email-address', {
description => "Generate report for a single email address. If not specified, generate reports for all users.",
- type => 'string', format => 'email',
optional => 1,
- },
+ }),
timespan => {
description => "Select time span.",
type => 'string',
@@ -175,11 +176,10 @@ __PACKAGE__->register_method ({
enum => ['short', 'verbose', 'custom'],
optional => 1,
},
- redirect => {
+ redirect => get_standard_option('pmg-email-address', {
description => "Redirect spam report email to this address.",
- type => 'string', format => 'email',
optional => 1,
- },
+ }),
debug => {
description => "Debug mode. Print raw email to stdout instead of sending them.",
type => 'boolean',
@@ -280,7 +280,7 @@ __PACKAGE__->register_method ({
"ORDER BY pmail, time, receiver");
if ($target) {
- $sth->execute($target);
+ $sth->execute(encode('UTF-8', $target));
} else {
$sth->execute();
}
@@ -302,16 +302,18 @@ __PACKAGE__->register_method ({
};
while (my $ref = $sth->fetchrow_hashref()) {
- if ($creceiver ne $ref->{pmail}) {
+ my $decoded_pmail = PMG::Utils::try_decode_utf8($ref->{pmail});
+ if ($creceiver ne $decoded_pmail) {
$finalize->() if $data;
$data = clone($global_data);
- $creceiver = $ref->{pmail};
+ $creceiver = $decoded_pmail;
$mailcount = 0;
- $data->{pmail} = $creceiver;
+ $data->{pmail} = encode_entities($decoded_pmail);
+ $data->{pmail_raw} = $ref->{pmail};
$data->{managehref} = "$protocol_fqdn_port/quarantine";
if ($data->{authmode} ne 'ldap') {
$data->{ticket} = PMG::Ticket::assemble_quarantine_ticket($data->{pmail});
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cc30e67..5c9e873 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1143,12 +1143,13 @@ sub rfc1522_to_html {
my ($d, $cs) = @$r;
if ($d) {
if ($cs) {
- $res .= encode_entities(decode($cs, $d));
+ $res .= encode('UTF-8', decode($cs, $d));
} else {
- $res .= encode_entities($d);
+ $res .= $d;
}
}
}
+ $res = encode_entities(decode('UTF-8', $res));
};
$res = $enc if $@;
@@ -1257,7 +1258,7 @@ sub finalize_report {
my $top = MIME::Entity->build(
Type => "multipart/related",
- To => $data->{pmail},
+ To => $data->{pmail_raw},
From => $mailfrom,
Subject => bencode_header(decode_entities($title)));
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 08/12] statistics: handle utf8 data.
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (6 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 07/12] pmgqm: handle smtputf8 data Dominik Csapak
@ 2022-11-24 12:21 ` Dominik Csapak
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 09/12] quarantine: fix adding non-ascii senders to wl/bl Dominik Csapak
` (4 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
From: Stoiko Ivanov <s.ivanov@proxmox.com>
for SMTPUTF8, we have to decode the sender/receiver address, since
they might contain UTF-8 byte sequences.
before inserting them in the database, encode them again
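The decode-on-read, encode-on-write round trip described above can be sketched like this. try_decode_utf8_sketch is an assumed stand-in for PMG::Utils::try_decode_utf8, whose implementation is not shown in this patch; the sample addresses are made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed stand-in for PMG::Utils::try_decode_utf8: decode strictly and fall
# back to the unchanged input if the data is not valid UTF-8.
sub try_decode_utf8_sketch {
    my ($data) = @_;
    return eval {
        decode('UTF-8', $data, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    } // $data;
}

my $from_db = "caf\xc3\xa9\@example.com";        # UTF-8 bytes, as stored in the database
my $chars   = try_decode_utf8_sketch($from_db);  # character string for the API/UI
my $to_db   = encode('UTF-8', $chars);           # bytes again before quoting or binding

my $latin1  = "caf\xe9\@example.com";            # not valid UTF-8
my $kept    = try_decode_utf8_sketch($latin1);   # comes back unchanged

printf "chars: %d, bytes: %d, round trip ok: %s, fallback ok: %s\n",
    length($chars), length($to_db),
    ($to_db eq $from_db ? 'yes' : 'no'),
    ($kept eq $latin1 ? 'yes' : 'no');
```

The fallback matters because older rows may hold data that was never valid UTF-8; those values pass through untouched instead of raising an error.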
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
[ D: Added commit message ]
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/Statistic.pm | 67 +++++++++++++++++++++++++++++++++-----------
1 file changed, 50 insertions(+), 17 deletions(-)
diff --git a/src/PMG/Statistic.pm b/src/PMG/Statistic.pm
index 6d27930..96ef61d 100755
--- a/src/PMG/Statistic.pm
+++ b/src/PMG/Statistic.pm
@@ -3,6 +3,7 @@ package PMG::Statistic;
use strict;
use warnings;
use DBI;
+use Encode qw(encode);
use Time::Local;
use Time::Zone;
@@ -545,6 +546,22 @@ my $compute_sql_orderby = sub {
return $orderby;
};
+sub user_stat_to_perlstring {
+ my ($entry) = @_;
+
+ my $res = { };
+
+ for my $a (keys %$entry) {
+ if ($a eq 'receiver' || $a eq 'sender' || $a eq 'contact') {
+ $res->{$a} = PMG::Utils::try_decode_utf8($entry->{$a});
+ } else {
+ $res->{$a} = $entry->{$a};
+ }
+ }
+
+ return $res;
+}
+
sub user_stat_contact_details {
my ($self, $rdb, $receiver, $limit, $sorters, $filter) = @_;
@@ -554,19 +571,21 @@ sub user_stat_contact_details {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to);
+ my $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%"));
+
my $query = "SELECT * FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND $cond_good_mail " .
"AND NOT direction AND sender != '' AND receiver = ? " .
- ($filter ? "AND sender like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
"ORDER BY $orderby limit $limit";
my $sth = $rdb->{dbh}->prepare($query);
- $sth->execute($receiver);
+ $sth->execute(encode('UTF-8',$receiver));
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, user_stat_to_perlstring($ref);
}
$sth->finish();
@@ -583,11 +602,14 @@ sub user_stat_contact {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $query = "SELECT receiver as contact, count(*) AS count, sum (bytes) AS bytes, " .
"count (virusinfo) as viruscount " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid " .
- ($filter ? "AND receiver like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
"AND $cond_good_mail AND NOT direction AND sender != '' ";
if ($advfilter) {
@@ -603,7 +625,7 @@ sub user_stat_contact {
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, user_stat_to_perlstring($ref);
}
$sth->finish();
@@ -620,20 +642,23 @@ sub user_stat_sender_details {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $sth = $rdb->{dbh}->prepare(
"SELECT " .
"blocked, bytes, ptime, sender, receiver, spamlevel, time, virusinfo " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND " .
"$cond_good_mail AND NOT direction AND sender = ? " .
- ($filter ? "AND receiver like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
"ORDER BY $orderby limit $limit");
- $sth->execute($sender);
+ $sth->execute(encode('UTF-8',$sender));
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, user_stat_to_perlstring($ref);
}
$sth->finish();
@@ -650,11 +675,14 @@ sub user_stat_sender {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $query = "SELECT sender,count(*) AS count, sum (bytes) AS bytes, " .
"count (virusinfo) as viruscount, " .
"count (CASE WHEN spamlevel >= 3 THEN 1 ELSE NULL END) as spamcount " .
"FROM CStatistic WHERE $cond_good_mail AND NOT direction AND sender != '' " .
- ($filter ? "AND sender like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
"GROUP BY sender ORDER BY $orderby limit $limit";
my $sth = $rdb->{dbh}->prepare($query);
@@ -662,7 +690,7 @@ sub user_stat_sender {
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, user_stat_to_perlstring($ref);
}
$sth->finish();
@@ -679,18 +707,21 @@ sub user_stat_receiver_details {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $sth = $rdb->{dbh}->prepare(
"SELECT blocked, bytes, ptime, sender, receiver, spamlevel, time, virusinfo " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND $cond_good_mail AND receiver = ? " .
- ($filter ? "AND sender like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
"ORDER BY $orderby limit $limit");
- $sth->execute($receiver);
+ $sth->execute(encode('UTF-8',$receiver));
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, user_stat_to_perlstring($ref);
}
$sth->finish();
@@ -708,6 +739,9 @@ sub user_stat_receiver {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to) . " AND " .
"receiver IS NOT NULL AND receiver != ''";
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $query = "SELECT receiver, " .
"count(*) AS count, " .
"sum (bytes) AS bytes, " .
@@ -728,7 +762,7 @@ sub user_stat_receiver {
}
$query .= "AND $cond_good_mail and direction " .
- ($filter ? "AND receiver like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
"GROUP BY receiver ORDER BY $orderby LIMIT $limit";
my $sth = $rdb->{dbh}->prepare($query);
@@ -736,7 +770,7 @@ sub user_stat_receiver {
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, user_stat_to_perlstring($ref);
}
$sth->finish();
@@ -873,9 +907,8 @@ sub recent_receivers {
my $sth = $rdb->{dbh}->prepare($cmd);
$sth->execute ($from, $limit);
-
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, user_stat_to_perlstring($ref);
}
$sth->finish();
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 09/12] quarantine: fix adding non-ascii senders to wl/bl
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (7 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 08/12] statistics: handle utf8 data Dominik Csapak
@ 2022-11-24 12:21 ` Dominik Csapak
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 10/12] utils: refactor rfc1522_to_html Dominik Csapak
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
by trying to decode them since they may have been sent with SMTPUTF8
also make 'try_decode_utf8' an export of Utils and use that
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/API2/Quarantine.pm | 8 ++++----
src/PMG/Utils.pm | 1 +
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/PMG/API2/Quarantine.pm b/src/PMG/API2/Quarantine.pm
index 819c78c..f9e3e3d 100644
--- a/src/PMG/API2/Quarantine.pm
+++ b/src/PMG/API2/Quarantine.pm
@@ -24,7 +24,7 @@ use PVE::RESTHandler;
use PVE::INotify;
use PVE::APIServer::Formatter;
-use PMG::Utils;
+use PMG::Utils qw(try_decode_utf8);
use PMG::AccessControl;
use PMG::Config;
use PMG::DBTools;
@@ -141,8 +141,8 @@ my $parse_header_info = sub {
my $sender = PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('sender')));
$res->{sender} = $sender if $sender && ($sender ne $res->{from});
- $res->{envelope_sender} = PMG::Utils::try_decode_utf8($ref->{sender});
- $res->{receiver} = PMG::Utils::try_decode_utf8($ref->{receiver} // $ref->{pmail});
+ $res->{envelope_sender} = try_decode_utf8($ref->{sender});
+ $res->{receiver} = try_decode_utf8($ref->{receiver} // $ref->{pmail});
$res->{id} = 'C' . $ref->{cid} . 'R' . $ref->{rid} . 'T' . $ref->{ticketid};
$res->{time} = $ref->{time};
$res->{bytes} = $ref->{bytes};
@@ -1164,7 +1164,7 @@ __PACKAGE__->register_method ({
for my $id (@idlist) {
my $ref = $get_and_check_mail->($id, $rpcenv, $dbh);
- my $sender = $get_real_sender->($ref);
+ my $sender = try_decode_utf8($get_real_sender->($ref));
if ($action eq 'whitelist') {
PMG::Quarantine::add_to_blackwhite($dbh, $ref->{pmail}, 'WL', [ $sender ]);
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index 5c9e873..463de6d 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -45,6 +45,7 @@ use base 'Exporter';
our @EXPORT_OK = qw(
postgres_admin_cmd
+try_decode_utf8
);
my $valid_pmg_realms = ['pam', 'pmg', 'quarantine'];
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 10/12] utils: refactor rfc1522_to_html
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (8 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 09/12] quarantine: fix adding non-ascii senders to wl/bl Dominik Csapak
@ 2022-11-24 12:21 ` Dominik Csapak
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 11/12] ldap: improve unicode support Dominik Csapak
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
by reusing the utf8 decoding logic of decode_rfc1522
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/Utils.pm | 21 ++-------------------
1 file changed, 2 insertions(+), 19 deletions(-)
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index 463de6d..e20fc91 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1135,25 +1135,8 @@ sub decode_rfc1522 {
sub rfc1522_to_html {
my ($enc) = @_;
- my $res = '';
-
- return '' if !$enc;
-
- eval {
- foreach my $r (MIME::Words::decode_mimewords($enc)) {
- my ($d, $cs) = @$r;
- if ($d) {
- if ($cs) {
- $res .= encode('UTF-8', decode($cs, $d));
- } else {
- $res .= $d;
- }
- }
- }
- $res = encode_entities(decode('UTF-8', $res));
- };
-
- $res = $enc if $@;
+ my $res = eval { encode_entities(decode_rfc1522($enc)) };
+ return $enc if $@;
return $res;
}
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 11/12] ldap: improve unicode support
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (9 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 10/12] utils: refactor rfc1522_to_html Dominik Csapak
@ 2022-11-24 12:21 ` Dominik Csapak
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 12/12] statistics: refactor filter_text generation Dominik Csapak
2022-11-24 15:45 ` [pmg-devel] applied-series: [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Thomas Lamprecht
12 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
when we receive mails with SMTPUTF8 encoded sender/recipient,
we have to encode these values for our ldapcache to work,
otherwise pmg-smtp-filter fails when trying to insert
perl strings.
on read from the cache we have to decode these values again so
that the webui can show them correctly
also encode/decode dn and group names, since according to rfc4514[0]
utf-8 should be ok here
0: https://www.ietf.org/rfc/rfc4514.txt
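The lookup side of this change can be sketched as follows. A plain hash stands in for the DB_File-backed 'mails' database, and mail_exists_sketch is a hypothetical helper that mirrors only the "lowercase, then encode" step of the patch; the account data is made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Plain hash standing in for the DB_File handle; keys are UTF-8 encoded bytes.
my %mails_db = ("\xc3\xbcser\@example.com" => 42);

# Hypothetical helper mirroring the encode-before-get step of the patch.
sub mail_exists_sketch {
    my ($db, $mail) = @_;
    # Lowercase while the value is still a character string, then encode,
    # so that non-ASCII letters are case-folded before the byte lookup.
    my $key = encode('UTF-8', lc($mail));
    return $db->{$key};
}

# Recipient as a Perl character string, with an uppercase non-ASCII letter.
my $mail = decode('UTF-8', "\xc3\x9cser\@example.com");

my $cuid = mail_exists_sketch(\%mails_db, $mail);
print defined($cuid) ? "found, cuid=$cuid\n" : "not found\n";
```

Doing lc() after encoding would leave the uppercase letter's bytes untouched, so the order of the two calls is what makes the lookup case-insensitive for non-ASCII addresses.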
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
openldap/freeipa did not let me add an email with unicode characters,
but active directory did. so tested with that
src/PMG/LDAPCache.pm | 31 ++++++++++++++++++-------------
src/PMG/RuleDB/LDAP.pm | 11 +++++++----
src/PMG/RuleDB/LDAPUser.pm | 13 ++++++++-----
3 files changed, 33 insertions(+), 22 deletions(-)
diff --git a/src/PMG/LDAPCache.pm b/src/PMG/LDAPCache.pm
index f0698da..6cc4383 100755
--- a/src/PMG/LDAPCache.pm
+++ b/src/PMG/LDAPCache.pm
@@ -6,6 +6,7 @@ use File::Path;
use LockFile::Simple;
use Data::Dumper;
use DB_File;
+use Encode qw(encode decode);
use PVE::SafeSyslog;
use PVE::Tools qw(split_list);
@@ -491,7 +492,7 @@ sub get_groups {
my $status = $dbh->seq($key, $value, R_FIRST());
while ($status == 0) {
- $res->{$value} = $key;
+ $res->{$value} = PMG::Utils::try_decode_utf8($key);
$status = $dbh->seq($key, $value, R_NEXT());
}
@@ -515,9 +516,9 @@ sub get_users {
while ($status == 0) {
my ($pmail, $account, $dn) = unpack('n/a* n/a* n/a*', $value);
$res->{$key} = {
- pmail => $pmail,
- account => $account,
- dn => $dn,
+ pmail => PMG::Utils::try_decode_utf8($pmail),
+ account => PMG::Utils::try_decode_utf8($account),
+ dn => PMG::Utils::try_decode_utf8($dn),
};
$status = $dbh->seq($key, $value, R_NEXT());
}
@@ -595,7 +596,7 @@ sub list_addresses {
return undef if !$dbhmails || !$dbhusers;
- $mail = lc($mail);
+ $mail = encode('UTF-8', lc($mail));
my $res = [];
@@ -609,7 +610,7 @@ sub list_addresses {
my ($pmail, $account, $dn) = unpack('n/a* n/a* n/a*', $rdata);
- push @$res, { primary => 1, email => $pmail };
+ push @$res, { primary => 1, email => PMG::Utils::try_decode_utf8($pmail) };
my $key = 0 ;
my $value = "" ;
@@ -617,7 +618,7 @@ sub list_addresses {
while ($status == 0) {
if ($value == $cuid && $key ne $pmail) {
- push @$res, { primary => 0, email => $key };
+ push @$res, { primary => 0, email => PMG::Utils::try_decode_utf8($key) };
}
$status = $dbhmails->seq($key, $value, R_NEXT());
}
@@ -631,7 +632,7 @@ sub mail_exists {
my $dbh = $self->{dbstat}->{mails}->{dbh};
return 0 if !$dbh;
- $mail = lc($mail);
+ $mail = encode('UTF-8', lc($mail));
my $res;
$dbh->get($mail, $res);
@@ -644,7 +645,7 @@ sub account_exists {
my $dbh = $self->{dbstat}->{accounts}->{dbh};
return 0 if !$dbh;
- $account = lc($account);
+ $account = encode('UTF-8', lc($account));
my $res;
$dbh->get($account, $res);
@@ -657,6 +658,8 @@ sub group_exists {
my $dbh = $self->{dbstat}->{groups}->{dbh};
return 0 if !$dbh;
+ $group = encode('UTF-8', $group);
+
my $res;
$dbh->get($group, $res);
return $res;
@@ -669,8 +672,8 @@ sub account_has_address {
my $dbhaccounts = $self->{dbstat}->{accounts}->{dbh};
return 0 if !$dbhmails || !$dbhaccounts;
- $account = lc($account);
- $mail = lc($mail);
+ $account = encode('UTF-8', lc($account));
+ $mail = encode('UTF-8', lc($mail));
my $accid;
$dbhaccounts->get($account, $accid);
@@ -692,12 +695,14 @@ sub user_in_group {
return 0 if !$dbhmails || !$dbhgroups || !$dbhmemberof;
- $mail = lc($mail);
+ $mail = encode('UTF-8', lc($mail));
my $cuid;
$dbhmails->get($mail, $cuid);
return 0 if !$cuid;
+ $group = encode('UTF-8', $group);
+
my $groupid;
$dbhgroups->get($group, $groupid);
return 0 if !$groupid;
@@ -715,7 +720,7 @@ sub account_info {
return undef if !$dbhmails || !$dbhusers;
- $mail = lc($mail);
+ $mail = encode('UTF-8', lc($mail));
my $res = {};
diff --git a/src/PMG/RuleDB/LDAP.pm b/src/PMG/RuleDB/LDAP.pm
index a132499..3fcf5f0 100644
--- a/src/PMG/RuleDB/LDAP.pm
+++ b/src/PMG/RuleDB/LDAP.pm
@@ -3,6 +3,7 @@ package PMG::RuleDB::LDAP;
use strict;
use warnings;
use DBI;
+use Encode qw(encode);
use PVE::Exception qw(raise_param_exc);
@@ -45,12 +46,14 @@ sub load_attr {
defined($value) || die "undefined value: ERROR";
+ my $decoded = PMG::Utils::try_decode_utf8($value);
+
my $obj;
- if ($value =~ m/^([^:]*):(.*)$/) {
+ if ($decoded =~ m/^([^:]*):(.*)$/) {
$obj = $class->new($2, $1, $ogroup);
- $obj->{digest} = Digest::SHA::sha1_hex($id, $2, $1, $ogroup);
+ $obj->{digest} = Digest::SHA::sha1_hex($id, encode('UTF-8', $2), encode('UTF-8', $1), $ogroup);
} else {
- $obj = $class->new($value, '', $ogroup);
+ $obj = $class->new($decoded, '', $ogroup);
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, '#', $ogroup);
}
@@ -69,7 +72,7 @@ sub save {
my $grp = $self->{ldapgroup};
my $profile = $self->{profile};
- my $confdata = "$profile:$grp";
+ my $confdata = encode('UTF-8', "$profile:$grp");
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/LDAPUser.pm b/src/PMG/RuleDB/LDAPUser.pm
index 022d784..345decb 100644
--- a/src/PMG/RuleDB/LDAPUser.pm
+++ b/src/PMG/RuleDB/LDAPUser.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use PVE::INotify;
@@ -46,13 +47,15 @@ sub load_attr {
my $class = ref($type) || $type;
defined($value) || die "undefined value: ERROR";
-
+
+ my $decoded = PMG::Utils::try_decode_utf8($value);
+
my $obj;
- if ($value =~ m/^([^:]*):(.*)$/) {
+ if ($decoded =~ m/^([^:]*):(.*)$/) {
$obj = $class->new($2, $1, $ogroup);
- $obj->{digest} = Digest::SHA::sha1_hex($id, $2, $1, $ogroup);
+ $obj->{digest} = Digest::SHA::sha1_hex($id, encode('UTF-8', $2), encode('UTF-8', $1), $ogroup);
} else {
- $obj = $class->new($value, '', $ogroup);
+ $obj = $class->new($decoded, '', $ogroup);
$obj->{digest} = Digest::SHA::sha1_hex ($id, $value, '#', $ogroup);
}
@@ -71,7 +74,7 @@ sub save {
my $user = $self->{ldapuser};
my $profile = $self->{profile};
- my $confdata = "$profile:$user";
+ my $confdata = encode('UTF-8', "$profile:$user");
if (defined($self->{id})) {
# update
--
2.30.2
* [pmg-devel] [PATCH pmg-api v4 12/12] statistics: refactor filter_text generation
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (10 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 11/12] ldap: improve unicode support Dominik Csapak
@ 2022-11-24 12:21 ` Dominik Csapak
2022-11-24 15:45 ` [pmg-devel] applied-series: [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Thomas Lamprecht
12 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-11-24 12:21 UTC (permalink / raw)
To: pmg-devel
it's basically always the same, and having one place where we encode
the filter makes it more legible
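As a rough usage sketch of the new helper, the snippet below reproduces its logic outside of PMG::Statistic. FakeDbh is a made-up stand-in for a DBI handle whose quote() only doubles single quotes (an assumption for illustration, not a substitute for the real driver's quoting), and get_filter_text_sketch is a non-lexical copy of the helper added in this patch.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up stand-in for a DBI handle; only quote() is sketched.
package FakeDbh {
    sub new { return bless {}, shift }
    sub quote {
        my ($self, $s) = @_;
        $s =~ s/'/''/g;
        return "'$s'";
    }
}

# Copy of the helper's logic: empty string without a filter, otherwise an
# "AND <field> like <pattern> " fragment with the pattern UTF-8 encoded.
sub get_filter_text_sketch {
    my ($dbh, $field, $filter) = @_;
    return '' if !$filter || !$field;
    my $pattern = $dbh->quote(encode('UTF-8', "%${filter}%"));
    return "AND ${field} like ${pattern} ";
}

my $dbh = FakeDbh->new();

my $with_filter    = get_filter_text_sketch($dbh, 'sender', 'example');
my $without_filter = get_filter_text_sketch($dbh, 'sender', undef);

print "with filter:    [$with_filter]\n";
print "without filter: [$without_filter]\n";
```

Because the helper returns an empty string when no filter is given, callers can concatenate its result into the query unconditionally.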
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PMG/Statistic.pm | 41 ++++++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 17 deletions(-)
diff --git a/src/PMG/Statistic.pm b/src/PMG/Statistic.pm
index 96ef61d..8d63b40 100755
--- a/src/PMG/Statistic.pm
+++ b/src/PMG/Statistic.pm
@@ -562,6 +562,18 @@ sub user_stat_to_perlstring {
return $res;
}
+my sub get_filter_text {
+ my ($dbh, $field, $filter) = @_;
+
+ if (!$filter || !$field) {
+ return '';
+ }
+
+ my $pattern = $dbh->quote(encode('UTF-8', "%${filter}%"));
+
+ return "AND ${field} like ${pattern} ";
+}
+
sub user_stat_contact_details {
my ($self, $rdb, $receiver, $limit, $sorters, $filter) = @_;
@@ -571,12 +583,12 @@ sub user_stat_contact_details {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to);
- my $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%"));
+ my $filter_text = get_filter_text($rdb->{dbh}, 'sender', $filter);
my $query = "SELECT * FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND $cond_good_mail " .
"AND NOT direction AND sender != '' AND receiver = ? " .
- ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
+ $filter_text .
"ORDER BY $orderby limit $limit";
my $sth = $rdb->{dbh}->prepare($query);
@@ -602,14 +614,13 @@ sub user_stat_contact {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
- my $filter_pattern;
- $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+ my $filter_text = get_filter_text($rdb->{dbh}, 'receiver', $filter);
my $query = "SELECT receiver as contact, count(*) AS count, sum (bytes) AS bytes, " .
"count (virusinfo) as viruscount " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid " .
- ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
+ $filter_text .
"AND $cond_good_mail AND NOT direction AND sender != '' ";
if ($advfilter) {
@@ -642,8 +653,7 @@ sub user_stat_sender_details {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
- my $filter_pattern;
- $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+ my $filter_text = get_filter_text($rdb->{dbh}, 'receiver', $filter);
my $sth = $rdb->{dbh}->prepare(
"SELECT " .
@@ -651,7 +661,7 @@ sub user_stat_sender_details {
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND " .
"$cond_good_mail AND NOT direction AND sender = ? " .
- ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
+ $filter_text .
"ORDER BY $orderby limit $limit");
$sth->execute(encode('UTF-8',$sender));
@@ -675,14 +685,13 @@ sub user_stat_sender {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to);
- my $filter_pattern;
- $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+ my $filter_text = get_filter_text($rdb->{dbh}, 'sender', $filter);
my $query = "SELECT sender,count(*) AS count, sum (bytes) AS bytes, " .
"count (virusinfo) as viruscount, " .
"count (CASE WHEN spamlevel >= 3 THEN 1 ELSE NULL END) as spamcount " .
"FROM CStatistic WHERE $cond_good_mail AND NOT direction AND sender != '' " .
- ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
+ $filter_text .
"GROUP BY sender ORDER BY $orderby limit $limit";
my $sth = $rdb->{dbh}->prepare($query);
@@ -707,14 +716,13 @@ sub user_stat_receiver_details {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
- my $filter_pattern;
- $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+ my $filter_text = get_filter_text($rdb->{dbh}, 'sender', $filter);
my $sth = $rdb->{dbh}->prepare(
"SELECT blocked, bytes, ptime, sender, receiver, spamlevel, time, virusinfo " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND $cond_good_mail AND receiver = ? " .
- ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
+ $filter_text .
"ORDER BY $orderby limit $limit");
$sth->execute(encode('UTF-8',$receiver));
@@ -739,8 +747,7 @@ sub user_stat_receiver {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to) . " AND " .
"receiver IS NOT NULL AND receiver != ''";
- my $filter_pattern;
- $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+ my $filter_text = get_filter_text($rdb->{dbh}, 'receiver', $filter);
my $query = "SELECT receiver, " .
"count(*) AS count, " .
@@ -762,7 +769,7 @@ sub user_stat_receiver {
}
$query .= "AND $cond_good_mail and direction " .
- ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
+ $filter_text .
"GROUP BY receiver ORDER BY $orderby LIMIT $limit";
my $sth = $rdb->{dbh}->prepare($query);
--
2.30.2
* [pmg-devel] applied-series: [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails
2022-11-24 12:21 [pmg-devel] [PATCH pmg-api v4 00/12] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
` (11 preceding siblings ...)
2022-11-24 12:21 ` [pmg-devel] [PATCH pmg-api v4 12/12] statistics: refactor filter_text generation Dominik Csapak
@ 2022-11-24 15:45 ` Thomas Lamprecht
12 siblings, 0 replies; 14+ messages in thread
From: Thomas Lamprecht @ 2022-11-24 15:45 UTC (permalink / raw)
To: Dominik Csapak, pmg-devel
On 24/11/2022 at 13:21, Dominik Csapak wrote:
> as replacement for the v3 from stoiko (i did not resend the gui patches,
> as they are ok and still valid)
>
> i added some of my notes as follow ups (ldap/bwlist/refactors)
> as well as modified some commit messages of stoiko
>
> i tested with various configurations with ldap, including
> unicode characters of the local part of the account/email
> (i only got this to work in active directory...)
>
> Dominik Csapak (4):
> quarantine: fix adding non-ascii senders to wl/bl
> utils: refactor rfc1522_to_html
> ldap: improve unicode support
> statistics: refactor filter_text generation
>
> Stoiko Ivanov (8):
> utils: return perl string from decode_rfc1522
> ruledb: properly substitute prox_vars in headers
> fix #2541 ruledb: encode relevant values as utf-8 in database
> ruledb: encode e-mail addresses for syslog
> partially fix #2465: handle smtputf8 addresses in the rule-system
> quarantine: handle utf8 data
> pmgqm: handle smtputf8 data
> statistics: handle utf8 data.
>
applied series, thanks!