* [pmg-devel] [PATCH pmg-api v2 0/8] ruledb - improve experience for non-ascii tests and mails
@ 2022-11-17 15:06 Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 1/8] utils: return perl string from decode_rfc1522 Stoiko Ivanov
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC
To: pmg-devel
v1->v2:
* dropped already applied patches
* added a patch for one further glitch in ModField/Notify actions (when
parsing/replacing non-ascii characters) - patch 1/5+2/5
* added support for utf-8 data in the mailflow additionally for:
** quarantine API handling
** user BL/WL (the GUI still needs adaptation to parse e-mail-addresses
more liberally - but else it seems to work)
** pmgqm (spamreports)
** statistics
still missing support for:
* LDAP
* Who Objects
huge thanks to Dominik for taking the time to review and test the v1!
original cover-letter for v1:
this patch series partially fixes #2465 and #2541, two frequently reported
issues that cause a disappointing experience for users in environments
that are not ascii-only
the main assumptions of the patches are:
* envelope addresses are either ascii or utf-8 (latter only with smtputf8)
* thus we can unconditionally de-/encode envelope addresses for database
results/lookups
* the matching in the rule-objects will see the relevant parts of the mail
as properly encoded perl-strings (with multi-byte characters - e.g. the
euro sign as \x{20ac} instead of \x{e2}\x{82}\x{ac})
(I did a bit of testing to verify them, by e.g. sending an ISO-8859-1
encoded mail and matching for an umlaut in the subject)
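To illustrate the third assumption, here is a minimal sketch (not part of the series) of the difference between the raw utf-8 bytes of the euro sign and the decoded perl-string. It only uses the core Encode module; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw utf-8 bytes of the euro sign, as they might arrive on the wire.
my $bytes = "\xe2\x82\xac";

# Decoding yields a perl-string holding one wide character.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length($bytes);
printf "chars: length %d, code point U+%04X\n", length($chars), ord($chars);

# A character-level pattern only matches the decoded form.
my $matches_bytes = ($bytes =~ m/^\x{20ac}$/) ? 1 : 0;
my $matches_chars = ($chars =~ m/^\x{20ac}$/) ? 1 : 0;
```

This is why the rule-objects should see decoded strings: a regex written by an admin in terms of characters would otherwise be compared against individual bytes.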
While going through the RuleDB classes I remembered, that we have a few
pieces of legacy objects (Attach, ReportSpam, Counter actions) there, and
went ahead with deprecating them (initially I simply deleted them, but
decided to be more cautious and just log the deprecation until 8.0, when
we can drop them explicitly). They cannot be instantiated currently (short
of a direct insert into the database) - but I don't know if they were ever
used in pre 5.0 times in their current form. - patch 2/5.
Out of scope of the series for now:
* utf-8 support in the LDAP subsystem (deployments with a configured LDAP
profile still won't be able to process smtputf8 mails) - mostly until I
get around to creating a test-environment with the appropriate schema for
having non-ascii mail-addresses
* Domain/Email objects - did not find the time to consider how to store
them most sensibly (puny-code, utf-8) and if the choice should be
carried over to all of our 'email' formats (it probably shouldn't)
patches 1/5 and 4/5 address 2 small bugs I ran into while testing
Given that I quite often miss a few fine points or use-cases I'd be very
grateful for some more experimenting/testing!
Stoiko Ivanov (8):
utils: return perl string from decode_rfc1522
ruledb: properly substitute prox_vars in headers
fix #2541 ruledb: encode relevant values as utf-8 in database
ruledb: encode e-mail addresses for syslog
partially fix #2465: handle smtputf8 addresses in the rule-system
quarantine: handle utf8 data
pmgqm: handle smtputf8 data
statistics: handle utf8 data.
src/PMG/API2/Quarantine.pm | 16 ++++----
src/PMG/CLI/pmgqm.pm | 24 ++++++------
src/PMG/HTMLMail.pm | 7 ++--
src/PMG/MailQueue.pm | 10 +++--
src/PMG/Quarantine.pm | 13 ++++---
src/PMG/RuleDB.pm | 24 ++++++++----
src/PMG/RuleDB/Accept.pm | 2 +-
src/PMG/RuleDB/BCC.pm | 23 +++++++++--
src/PMG/RuleDB/Block.pm | 2 +-
src/PMG/RuleDB/Disclaimer.pm | 2 +-
src/PMG/RuleDB/Group.pm | 4 +-
src/PMG/RuleDB/MatchField.pm | 8 +++-
src/PMG/RuleDB/MatchFilename.pm | 5 ++-
src/PMG/RuleDB/ModField.pm | 19 +++-------
src/PMG/RuleDB/Notify.pm | 24 +++++++++---
src/PMG/RuleDB/Quarantine.pm | 19 ++++++++--
src/PMG/RuleDB/Remove.pm | 20 +++++++---
src/PMG/RuleDB/Rule.pm | 2 +-
src/PMG/RuleDB/Spam.pm | 17 +++++----
src/PMG/RuleDB/WhoRegex.pm | 5 ++-
src/PMG/Statistic.pm | 67 ++++++++++++++++++++++++---------
src/PMG/Utils.pm | 39 ++++++++++++++++---
src/bin/pmg-smtp-filter | 7 ++--
23 files changed, 243 insertions(+), 116 deletions(-)
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 1/8] utils: return perl string from decode_rfc1522
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC
To: pmg-devel
decode_rfc1522 is a more robust version of decode_mimewords (in
scalar context) - adapt it to return a perlstring, under the
assumption that data is utf-8 encoded (holds true for ascii and
smtputf8 mails)
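As a rough sketch of the intended behaviour (not the PMG implementation), the core Encode module's 'MIME-Header' decoder can stand in for the MIME-word parsing. The helper name decode_header_sketch is hypothetical; it returns a perl-string and falls back to the raw input if decoding dies, similar to the fallback in decode_rfc1522.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for decode_rfc1522: decode RFC 2047 encoded
# words into a perl-string, returning the raw input on failure.
sub decode_header_sketch {
    my ($raw) = @_;
    my $res = eval { decode('MIME-Header', $raw) };
    return defined($res) ? $res : $raw;
}

# An ISO-8859-1 quoted-printable subject and a utf-8 base64 one.
my $latin1_subject = decode_header_sketch('=?ISO-8859-1?Q?Gr=FC=DFe?=');
my $utf8_subject   = decode_header_sketch('=?UTF-8?B?4oKs?=');
```

Both results are character strings regardless of the charset named in the encoded word, which is what the callers in the rule system rely on.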
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/Utils.pm | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cef232b..77abde4 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1088,6 +1088,7 @@ sub decode_to_html {
return $res;
}
+# assume enc contains utf-8 and mime-encoded data a perl-string (with wide characters)
sub decode_rfc1522 {
my ($enc) = @_;
@@ -1100,7 +1101,7 @@ sub decode_rfc1522 {
my ($d, $cs) = @$r;
if ($d) {
if ($cs) {
- $res .= decode($cs, $d);
+ $res .= encode('UTF-8', decode($cs, $d));
} else {
$res .= $d;
}
@@ -1108,8 +1109,11 @@ sub decode_rfc1522 {
}
};
- $res = $enc if $@;
-
+ if ($@) {
+ $res = $enc;
+ } else {
+ $res = decode('UTF-8', $res);
+ }
return $res;
}
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 2/8] ruledb: properly substitute prox_vars in headers
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC
To: pmg-devel
by storing the variables as perl-string (not mime-encoded, and not
utf-8 encoded), and appropriately dealing with multi-line values to
input (folding the headers and encoding as mime).
This fixes another glitch not caught by
d3d6b5dff9e4447d16cb92e0fdf26f67d9384423
the Subject was always displayed with a '?' in the end (due to the
(quoted-printable encoded) \n added).
Additionally adapt the other callsites of PMG::Utils::subst_values
where applicable.
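The folding idea can be sketched as follows. This is a simplified illustration, not the code from the patch: it uses Encode's core 'MIME-Q' encoding in place of MIME::Words::encode_mimewords, and the helper name fold_header_value_sketch is made up. Each line is encoded on its own, so a newline never ends up inside an encoded word, and continuation lines are indented with a tab.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode each line of a (possibly multi-line)
# perl-string separately and fold the result for use in a header.
sub fold_header_value_sketch {
    my ($value) = @_;

    my @lines;
    foreach my $line (split(/\r?\n\s*/, $value)) {
        my $trimmed = $line =~ s/\s+$//r;    # drop trailing whitespace
        next if $trimmed eq '';              # skip empty lines
        push @lines, encode('MIME-Q', $trimmed);
    }

    # continuation lines of a folded header start with whitespace
    return join("\n\t", @lines);
}

my $folded = fold_header_value_sketch("Gr\x{fc}\x{df}e\n  score=5.2 \n\n");
print "X-Example: $folded\n";
```

Encoding per line is the design choice that avoids the trailing '?' described above.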
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/RuleDB/BCC.pm | 2 +-
src/PMG/RuleDB/ModField.pm | 13 +------------
src/PMG/RuleDB/Notify.pm | 4 ++--
src/PMG/Utils.pm | 17 +++++++++++++++++
src/bin/pmg-smtp-filter | 2 +-
5 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index d364690..4867d83 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -117,7 +117,7 @@ sub execute {
my $rulename = $vars->{RULE} // 'unknown';
- my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
+ my $bcc_to = PMG::Utils::subst_values_for_header($self->{target}, $vars);
if ($bcc_to =~ m/^\s*$/) {
# this happens if a notification is triggered by bounce mails
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index 4ebb618..34108d1 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -5,7 +5,6 @@ use warnings;
use DBI;
use Digest::SHA;
use Encode qw(encode decode);
-use MIME::Words qw(encode_mimewords);
use PMG::Utils;
use PMG::ModGroup;
@@ -98,17 +97,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $fvalue = '';
-
- foreach my $line (split('\r?\n\s*',PMG::Utils::subst_values ($self->{field_value}, $vars))) {
- $fvalue .= "\n" if $fvalue;
- $fvalue .= encode_mimewords(encode('UTF-8', $line), 'Charset' => 'UTF-8');
- }
-
- # support for multiline values (i.e. __SPAM_INFO__)
- $fvalue =~ s/\n/\n\t/sg; # indent content
- $fvalue =~ s/\n\s*\n//sg; # remove empty line
- $fvalue =~ s/\n?\s*$//s; # remove trailing spaces
+ my $fvalue = PMG::Utils::subst_values_for_header($self->{field_value}, $vars);
my $subgroups = $mod_group->subgroups($targets);
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index d67221e..7b38e0d 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -211,8 +211,8 @@ sub execute {
my $rulename = $vars->{RULE} // 'unknown';
my $body = PMG::Utils::subst_values($self->{body}, $vars);
- my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
- my $to = PMG::Utils::subst_values($self->{to}, $vars);
+ my $subject = PMG::Utils::subst_values_for_header($self->{subject}, $vars);
+ my $to = PMG::Utils::subst_values_for_header($self->{to}, $vars);
if ($to =~ m/^\s*$/) {
# this happens if a notification is triggered by bounce mails
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index 77abde4..cc7e9b3 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -203,6 +203,23 @@ sub subst_values {
return $body;
}
+sub subst_values_for_header {
+ my ($header, $dh) = @_;
+
+ my $res = '';
+ foreach my $line (split('\r?\n\s*', subst_values ($header, $dh))) {
+ $res .= "\n" if $res;
+ $res .= MIME::Words::encode_mimewords(encode('UTF-8', $line), 'Charset' => 'UTF-8');
+ }
+
+ # support for multiline values (i.e. __SPAM_INFO__)
+ $res =~ s/\n/\n\t/sg; # indent content
+ $res =~ s/\n\s*\n//sg; # remove empty line
+ $res =~ s/\n?\s*$//s; # remove trailing spaces
+
+ return $res;
+}
+
sub reinject_mail {
my ($entity, $sender, $targets, $xforward, $me, $params) = @_;
diff --git a/src/bin/pmg-smtp-filter b/src/bin/pmg-smtp-filter
index 35a6ac6..45e68a7 100755
--- a/src/bin/pmg-smtp-filter
+++ b/src/bin/pmg-smtp-filter
@@ -152,7 +152,7 @@ sub get_prox_vars {
} if !$spaminfo;
my $vars = {
- 'SUBJECT' => mime_to_perl_string($entity->head->get ('subject', 0) || 'No Subject'),
+ 'SUBJECT' => PMG::Utils::decode_rfc1522($entity->head->get ('subject', 0) || 'No Subject'),
'RULE' => $rule->{name},
'RULE_INFO' => $msginfo->{rule_info},
'SENDER' => $msginfo->{sender},
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 3/8] fix #2541 ruledb: encode relevant values as utf-8 in database
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC
To: pmg-devel
This patch adds support for storing rule names, comments(info), and
most relevant values (e.g. the header content to match) in utf-8 in
the database.
backwards-compatibility should not be an issue:
* following the argumentation from commit
43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage
* we only need to consider that the valid multibyte utf-8 characters
do not really form sensible combinations of single-byte characters
(starting with a byte > 127 - e.g. "£")
the database is created with SQL_ASCII encoding - which behaves by
interpreting bytes <= 127 as ascii, while those > 127 are not interpreted
(see [0]). This just means that we have to explicitly en-/decode upon
storing/reading from there.
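A small sketch of that explicit en-/decoding, without a database: the helper repeats the logic of the try_decode_utf8 function added at the end of this patch (the _sketch suffix and the sample values are only for illustration). Values that are valid utf-8 come back as perl-strings; pre-existing values that are not valid utf-8 are returned unchanged, which is what keeps old entries readable.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Strict decode; fall back to the unmodified input if it is not valid utf-8.
sub try_decode_utf8_sketch {
    my ($data) = @_;
    return eval { decode('UTF-8', $data, 1) } // $data;
}

my $name   = "Pr\x{fc}fregel \x{20ac}";         # perl-string from the API
my $stored = encode('UTF-8', $name);            # bytes handed to the database
my $loaded = try_decode_utf8_sketch($stored);   # perl-string again

# A legacy latin-1 value: \xe9 followed by a space is malformed utf-8.
my $legacy = "caf\xe9 au lait";
my $kept   = try_decode_utf8_sketch($legacy);
```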
This patch currently omits most Who objects:
* for email/domain we'd still need to consider how to store them
(puny-code for the domain part, or everything as UTF-8) and it would
need changes to the API-types.
* the LDAP objects currently would not work too well, since our LDAPCache
is not UTF-8 safe - and fixing that warrants its own patch series
* WhoRegex should work and be able to handle many use-cases
The ContentType values should also contain only ascii characters per
RFC6838 [1] and RFC2045 [2].
[0] https://www.postgresql.org/docs/13/multibyte.html
[1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
[2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/RuleDB.pm | 24 ++++++++++++++++--------
src/PMG/RuleDB/Accept.pm | 2 +-
src/PMG/RuleDB/BCC.pm | 2 +-
src/PMG/RuleDB/Block.pm | 2 +-
src/PMG/RuleDB/Disclaimer.pm | 2 +-
src/PMG/RuleDB/Group.pm | 4 ++--
src/PMG/RuleDB/MatchField.pm | 8 ++++++--
src/PMG/RuleDB/MatchFilename.pm | 5 ++++-
src/PMG/RuleDB/ModField.pm | 6 ++++--
src/PMG/RuleDB/Notify.pm | 2 +-
src/PMG/RuleDB/Quarantine.pm | 3 ++-
src/PMG/RuleDB/Remove.pm | 12 +++++++-----
src/PMG/RuleDB/Rule.pm | 2 +-
src/PMG/RuleDB/WhoRegex.pm | 5 ++++-
src/PMG/Utils.pm | 5 +++++
15 files changed, 56 insertions(+), 28 deletions(-)
diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
index 895acc6..a6b0b79 100644
--- a/src/PMG/RuleDB.pm
+++ b/src/PMG/RuleDB.pm
@@ -5,6 +5,7 @@ use warnings;
use DBI;
use HTML::Entities;
use Data::Dumper;
+use Encode qw(encode);
use PVE::SafeSyslog;
@@ -70,8 +71,8 @@ sub create_group_with_obj {
defined($obj) || die "proxmox: undefined object";
- $name //= '';
- $info //= '';
+ $name = encode('UTF-8', $name // '');
+ $info = encode('UTF-8', $info // '');
eval {
@@ -174,7 +175,9 @@ sub save_group {
$self->{dbh}->do("UPDATE Objectgroup " .
"SET Name = ?, Info = ? " .
"WHERE ID = ?", undef,
- $og->{name}, $og->{info}, $og->{id});
+ encode('UTF-8', $og->{name}),
+ encode('UTF-8', $og->{info}),
+ $og->{id});
return $og->{id};
@@ -183,7 +186,7 @@ sub save_group {
"INSERT INTO Objectgroup (Name, Info, Class) " .
"VALUES (?, ?, ?);");
- $sth->execute($og->name, $og->info, $og->class);
+ $sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
}
@@ -212,7 +215,9 @@ sub delete_group {
$sth->execute($groupid);
if (my $ref = $sth->fetchrow_hashref()) {
- die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
+ my $groupname = PMG::Utils::try_decode_utf8($ref->{groupname});
+ my $rulename = PMG::Utils::try_decode_utf8($ref->{rulename});
+ die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
}
$sth->finish();
@@ -474,6 +479,7 @@ sub load_object_full {
sub load_group_by_name {
my ($self, $name) = @_;
+ $name = encode('UTF-8', $name);
my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
"WHERE name = ?");
@@ -598,13 +604,14 @@ sub save_rule {
defined($rule->{direction}) ||
die "undefined rule attribute - direction: ERROR";
+ my $rulename = encode('UTF-8', $rule->{name});
if (defined($rule->{id})) {
$self->{dbh}->do(
"UPDATE Rule " .
"SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
"WHERE ID = ?", undef,
- $rule->{name}, $rule->{priority}, $rule->{active},
+ $rulename, $rule->{priority}, $rule->{active},
$rule->{direction}, $rule->{id});
return $rule->{id};
@@ -614,7 +621,7 @@ sub save_rule {
"INSERT INTO Rule (Name, Priority, Active, Direction) " .
"VALUES (?, ?, ?, ?);");
- $sth->execute($rule->name, $rule->priority, $rule->active,
+ $sth->execute($rulename, $rule->priority, $rule->active,
$rule->direction);
return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
@@ -779,7 +786,8 @@ sub load_rules {
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
+ my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
+ my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
$ref->{active}, $ref->{direction});
$rule->{id} = $ref->{id};
push @$rules, $rule;
diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
index cd67ea2..4ebd6da 100644
--- a/src/PMG/RuleDB/Accept.pm
+++ b/src/PMG/RuleDB/Accept.pm
@@ -93,7 +93,7 @@ sub execute {
my $dkim = $msginfo->{dkim} // {};
my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index 4867d83..6244dd9 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -115,7 +115,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $bcc_to = PMG::Utils::subst_values_for_header($self->{target}, $vars);
diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
index c758787..25bb74e 100644
--- a/src/PMG/RuleDB/Block.pm
+++ b/src/PMG/RuleDB/Block.pm
@@ -89,7 +89,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if ($msginfo->{testmode}) {
my $fh = $msginfo->{test_fh};
diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
index d3003b2..c6afe54 100644
--- a/src/PMG/RuleDB/Disclaimer.pm
+++ b/src/PMG/RuleDB/Disclaimer.pm
@@ -193,7 +193,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $subgroups = $mod_group->subgroups($targets);
diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
index 2508305..baa68ce 100644
--- a/src/PMG/RuleDB/Group.pm
+++ b/src/PMG/RuleDB/Group.pm
@@ -12,8 +12,8 @@ sub new {
my ($type, $name, $info, $class) = @_;
my $self = {
- name => $name,
- info => $info,
+ name => PMG::Utils::try_decode_utf8($name),
+ info => PMG::Utils::try_decode_utf8($info),
class => $class,
};
diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
index 2671ea4..2b56058 100644
--- a/src/PMG/RuleDB/MatchField.pm
+++ b/src/PMG/RuleDB/MatchField.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use MIME::Words;
use PVE::SafeSyslog;
@@ -50,9 +51,10 @@ sub load_attr {
defined($field) || die "undefined object attribute: ERROR";
defined($field_value) || die "undefined object attribute: ERROR";
+ my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
# use known constructor, bless afterwards (because sub class can have constructor
# with other parameter signature).
- my $obj = PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
+ my $obj = PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
bless $obj, $class;
$obj->{id} = $id;
@@ -69,6 +71,7 @@ sub save {
my $new_value = "$self->{field}:$self->{field_value}";
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined ($self->{id})) {
# update
@@ -105,7 +108,8 @@ sub parse_entity {
for my $value ($entity->head->get_all($self->{field})) {
chomp $value;
- my $decvalue = MIME::Words::decode_mimewords($value);
+ my $decvalue = PMG::Utils::decode_rfc1522($value);
+ $decvalue = PMG::Utils::try_decode_utf8($decvalue);
if ($decvalue =~ m|$self->{field_value}|i) {
push @$res, $id;
diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
index 7e5b486..c9cdbe0 100644
--- a/src/PMG/RuleDB/MatchFilename.pm
+++ b/src/PMG/RuleDB/MatchFilename.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use MIME::Words;
use PMG::Utils;
@@ -41,8 +42,9 @@ sub load_attr {
my $class = ref($type) || $type;
defined($value) || die "undefined value: ERROR";;
+ my $decvalue = PMG::Utils::try_decode_utf8($value);
- my $obj = $class->new($value, $ogroup);
+ my $obj = $class->new($decvalue, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -57,6 +59,7 @@ sub save {
my $new_value = $self->{fname};
$new_value =~ s/\\/\\\\/g;
+ $new_value = encode('UTF-8', $new_value);
if (defined($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
index 34108d1..6232322 100644
--- a/src/PMG/RuleDB/ModField.pm
+++ b/src/PMG/RuleDB/ModField.pm
@@ -56,7 +56,9 @@ sub load_attr {
(defined($field) && defined($field_value)) || return undef;
- my $obj = $class->new($field, $field_value, $ogroup);
+ my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
+
+ my $obj = $class->new($field, $dec_field_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
@@ -69,7 +71,7 @@ sub save {
defined($self->{ogroup}) || return undef;
- my $new_value = "$self->{field}:$self->{field_value}";
+ my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index 7b38e0d..8a9945b 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -208,7 +208,7 @@ sub execute {
my $from = 'postmaster';
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
my $body = PMG::Utils::subst_values($self->{body}, $vars);
my $subject = PMG::Utils::subst_values_for_header($self->{subject}, $vars);
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 1426393..9d802fe 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode);
use PVE::SafeSyslog;
@@ -89,7 +90,7 @@ sub execute {
my $subgroups = $mod_group->subgroups($targets, 1);
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
foreach my $ta (@$subgroups) {
my ($tg, $entity) = (@$ta[0], @$ta[1]);
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index 6b27b91..da6c25f 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -63,12 +63,14 @@ sub load_attr {
defined ($value) || die "undefined value: ERROR";
- my $obj;
+ my ($obj, $text);
if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $4, $ogroup, $2);
+ $text = PMG::Utils::try_decode_utf8($4);
+ $obj = $class->new($1, $text, $ogroup, $2);
} elsif ($value =~ m/^([01])(\:(.*))?$/s) {
- $obj = $class->new($1, $3, $ogroup);
+ $text = PMG::Utils::try_decode_utf8($3);
+ $obj = $class->new($1, $text, $ogroup);
} else {
$obj = $class->new(0, undef, $ogroup);
}
@@ -89,7 +91,7 @@ sub save {
$value .= ','. ($self->{quarantine} ? '1' : '0');
if ($self->{text}) {
- $value .= ":$self->{text}";
+ $value .= encode('UTF-8', ":$self->{text}");
}
if (defined ($self->{id})) {
@@ -194,7 +196,7 @@ sub execute {
my ($self, $queue, $ruledb, $mod_group, $targets,
$msginfo, $vars, $marks, $ldap) = @_;
- my $rulename = $vars->{RULE} // 'unknown';
+ my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
if (!$self->{all} && ($#$marks == -1)) {
# no marks
diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
index c49ad21..e7c9146 100644
--- a/src/PMG/RuleDB/Rule.pm
+++ b/src/PMG/RuleDB/Rule.pm
@@ -12,7 +12,7 @@ sub new {
my ($type, $name, $priority, $active, $direction) = @_;
my $self = {
- name => $name // '',
+ name => PMG::Utils::try_decode_utf8($name) // '',
priority => $priority // 0,
active => $active // 0,
};
diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
index 37ec3aa..ccc94a0 100644
--- a/src/PMG/RuleDB/WhoRegex.pm
+++ b/src/PMG/RuleDB/WhoRegex.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(decode encode);
use PMG::Utils;
use PMG::RuleDB::Object;
@@ -43,7 +44,8 @@ sub load_attr {
defined($value) || die "undefined value: ERROR";
- my $obj = $class->new ($value, $ogroup);
+ my $decoded_value = PMG::Utils::try_decode_utf8($value);
+ my $obj = $class->new ($decoded_value, $ogroup);
$obj->{id} = $id;
$obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
@@ -59,6 +61,7 @@ sub save {
my $adr = $self->{address};
$adr =~ s/\\/\\\\/g;
+ $adr = encode('UTF-8', $adr);
if (defined ($self->{id})) {
# update
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index cc7e9b3..750ea3a 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1563,4 +1563,9 @@ sub get_existing_object_id {
return;
}
+sub try_decode_utf8 {
+ my ($data) = @_;
+ return eval { decode('UTF-8', $data, 1) } // $data;
+}
+
1;
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 4/8] ruledb: encode e-mail addresses for syslog
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC
To: pmg-devel
as done in 114655f4fdb07c789a361b2f397f5345eafd16c6 for Accept and
Block.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/RuleDB/BCC.pm | 19 +++++++++++++++++--
src/PMG/RuleDB/Notify.pm | 18 ++++++++++++++++--
src/PMG/RuleDB/Quarantine.pm | 16 ++++++++++++++--
src/PMG/RuleDB/Remove.pm | 8 +++++++-
4 files changed, 54 insertions(+), 7 deletions(-)
diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
index 6244dd9..0f016f8 100644
--- a/src/PMG/RuleDB/BCC.pm
+++ b/src/PMG/RuleDB/BCC.pm
@@ -3,6 +3,7 @@ package PMG::RuleDB::BCC;
use strict;
use warnings;
use DBI;
+use Encode qw(encode);
use PVE::SafeSyslog;
@@ -164,10 +165,24 @@ sub execute {
$entity, $msginfo->{sender}, \@bcc_targets,
$msginfo->{xforward}, $msginfo->{fqdn}, $param);
foreach (@bcc_targets) {
+ my $target = encode('UTF-8', $_);
if ($qid) {
- syslog('info', "%s: bcc to <%s> (rule: %s, %s)", $queue->{logid}, $_, $rulename, $qid);
+ syslog(
+ 'info',
+ "%s: bcc to <%s> (rule: %s, %s)",
+ $queue->{logid},
+ $target,
+ $rulename,
+ $qid,
+ );
} else {
- syslog('err', "%s: bcc to <%s> (rule: %s) failed", $queue->{logid}, $_, $rulename);
+ syslog(
+ 'err',
+ "%s: bcc to <%s> (rule: %s) failed",
+ $queue->{logid},
+ $target,
+ $rulename,
+ );
}
}
}
diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
index 8a9945b..68f9b4e 100644
--- a/src/PMG/RuleDB/Notify.pm
+++ b/src/PMG/RuleDB/Notify.pm
@@ -259,10 +259,24 @@ sub execute {
my $qid = PMG::Utils::reinject_mail(
$top, $from, \@targets, undef, $msginfo->{fqdn});
foreach (@targets) {
+ my $target = encode('UTF-8', $_);
if ($qid) {
- syslog('info', "%s: notify <%s> (rule: %s, %s)", $queue->{logid}, $_, $rulename, $qid);
+ syslog(
+ 'info',
+ "%s: notify <%s> (rule: %s, %s)",
+ $queue->{logid},
+ $target,
+ $rulename,
+ $qid,
+ );
} else {
- syslog ('err', "%s: notify <%s> (rule: %s) failed", $queue->{logid}, $_, $rulename);
+ syslog (
+ 'err',
+ "%s: notify <%s> (rule: %s) failed",
+ $queue->{logid},
+ $target,
+ $rulename,
+ );
}
}
}
diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
index 9d802fe..0fc8352 100644
--- a/src/PMG/RuleDB/Quarantine.pm
+++ b/src/PMG/RuleDB/Quarantine.pm
@@ -101,7 +101,13 @@ sub execute {
if (my $qid = $queue->quarantine_mail($ruledb, 'V', $entity, $tg, $msginfo, $vars, $ldap)) {
foreach (@$tg) {
- syslog ('info', "$queue->{logid}: moved mail for <%s> to virus quarantine - %s (rule: %s)", $_, $qid, $rulename);
+ syslog (
+ 'info',
+ "$queue->{logid}: moved mail for <%s> to virus quarantine - %s (rule: %s)",
+ encode('UTF-8',$_),
+ $qid,
+ $rulename,
+ );
}
$queue->set_status ($tg, 'delivered');
@@ -111,7 +117,13 @@ sub execute {
if (my $qid = $queue->quarantine_mail($ruledb, 'S', $entity, $tg, $msginfo, $vars, $ldap)) {
foreach (@$tg) {
- syslog ('info', "$queue->{logid}: moved mail for <%s> to spam quarantine - %s (rule: %s)", $_, $qid, $rulename);
+ syslog (
+ 'info',
+ "$queue->{logid}: moved mail for <%s> to spam quarantine - %s (rule: %s)",
+ encode('UTF-8',$_),
+ $qid,
+ $rulename,
+ );
}
$queue->set_status($tg, 'delivered');
diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
index da6c25f..e7c353c 100644
--- a/src/PMG/RuleDB/Remove.pm
+++ b/src/PMG/RuleDB/Remove.pm
@@ -235,7 +235,13 @@ sub execute {
}
foreach (@$tg) {
- syslog ('info', "$queue->{logid}: moved mail for <%s> to attachment quarantine - %s (rule: %s)", $_, $qid, $rulename);
+ syslog (
+ 'info',
+ "$queue->{logid}: moved mail for <%s> to attachment quarantine - %s (rule: %s)",
+ encode('UTF-8',$_),
+ $qid,
+ $rulename,
+ );
}
}
}
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 5/8] partially fix #2465: handle smtputf8 addresses in the rule-system
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC
To: pmg-devel
the envelope addresses are used in the rule-system for lookups and
statistics. When the mail is received with smtputf8 the addresses are
decoded (multi-byte perl-strings) and thus need encoding before using
them as parameter in a database query.
This patch encodes the addresses as utf-8 for the relevant queries
unconditionally, because envelope-senders should either be:
* (a subset of) ascii (no smtputf8) - which is invariant for utf-8
encoding
* valid utf-8 (smtputf8)
The patch does not address the issues with multi-byte addresses in our
LDAP-implementation (hence the partial fix), but should still be an
improvement for many deployments
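Why the unconditional encoding is safe can be checked with a few lines; the addresses below are invented for the example. Encoding a pure-ascii address yields the identical byte string, while an smtputf8 address only grows by the extra bytes of its multi-byte characters.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $ascii_addr = 'user@example.com';
my $utf8_addr  = "j\x{f6}rg\@example.com";    # decoded smtputf8 address

# What would be passed on to $dbh->quote() or a query placeholder.
my $ascii_param = encode('UTF-8', $ascii_addr);
my $utf8_param  = encode('UTF-8', $utf8_addr);

printf "ascii unchanged: %s\n", ($ascii_param eq $ascii_addr ? 'yes' : 'no');
printf "utf-8 address: %d chars, %d bytes\n",
    length($utf8_addr), length($utf8_param);
```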
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/MailQueue.pm | 10 ++++++----
src/PMG/RuleDB/Spam.pm | 5 +++--
src/bin/pmg-smtp-filter | 5 +++--
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/src/PMG/MailQueue.pm b/src/PMG/MailQueue.pm
index 2841b07..8355c30 100644
--- a/src/PMG/MailQueue.pm
+++ b/src/PMG/MailQueue.pm
@@ -6,6 +6,7 @@ use warnings;
use PVE::SafeSyslog;
use MIME::Parser;
use IO::File;
+use Encode;
use File::Sync;
use File::Basename;
use File::Path;
@@ -141,6 +142,7 @@ sub quarantinedb_insert {
my ($self, $ruledb, $lcid, $ldap, $qtype, $header, $sender, $file, $targets, $vars) = @_;
eval {
+ $sender = encode('UTF-8', $sender);
my $dbh = $ruledb->{dbh};
my $insert_cmds = "SELECT nextval ('cmailstore_id_seq'); INSERT INTO CMailStore " .
@@ -188,11 +190,11 @@ sub quarantinedb_insert {
if ($pmail eq lc ($r)) {
$receiver = "NULL";
} else {
- $receiver = $dbh->quote ($r);
+ $receiver = $dbh->quote (encode('UTF-8', $r));
}
- $pmail = $dbh->quote ($pmail);
+ $pmail = $dbh->quote (encode('UTF-8', $pmail));
$insert_cmds .= "INSERT INTO CMSReceivers " .
"(CMailStore_CID, CMailStore_RID, PMail, Receiver, TicketID, Status, MTime) " .
"VALUES ($lcid, currval ('cmailstore_id_seq'), $pmail, $receiver, $tid, 'N', $now); ";
@@ -294,8 +296,8 @@ sub quarantine_mail {
$entity->head->delete ('Return-Path');
# prepend Delivered-To and Return-Path (like QMAIL MAILDIR FORMAT)
- $entity->head->add ('Return-Path', join (',', $sender), 0);
- $entity->head->add ('Delivered-To', join (',', @$tg), 0);
+ $entity->head->add ('Return-Path', encode('UTF-8', join (',', $sender)), 0);
+ $entity->head->add ('Delivered-To', encode('UTF-8', join (',', @$tg)), 0);
$entity->print ($fh);
diff --git a/src/PMG/RuleDB/Spam.pm b/src/PMG/RuleDB/Spam.pm
index cc9a347..b7e7dd4 100644
--- a/src/PMG/RuleDB/Spam.pm
+++ b/src/PMG/RuleDB/Spam.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use DBI;
use Digest::SHA;
+use Encode qw(encode decode);
use Time::HiRes qw (gettimeofday);
use PVE::SafeSyslog;
@@ -135,8 +136,8 @@ sub get_blackwhite {
my $cond = '';
foreach my $r (@$targets) {
my $pmail = $msginfo->{pmail}->{$r} || lc ($r);
- my $qr = $dbh->quote ($pmail);
- $cond .= " OR " if $cond;
+ my $qr = $dbh->quote (encode('UTF-8', $pmail));
+ $cond .= " OR " if $cond;
$cond .= "pmail = $qr";
}
diff --git a/src/bin/pmg-smtp-filter b/src/bin/pmg-smtp-filter
index 45e68a7..bb8e264 100755
--- a/src/bin/pmg-smtp-filter
+++ b/src/bin/pmg-smtp-filter
@@ -4,6 +4,7 @@ use strict;
use warnings;
use Carp;
+use Encode qw(encode decode);
use Getopt::Long;
use Time::HiRes qw (usleep gettimeofday tv_interval);
use POSIX qw(:sys_wait_h errno_h signal_h);
@@ -791,10 +792,10 @@ sub handle_smtp {
$insert_cmds .= ($queue->{sa_score} || 0) . ',';
$insert_cmds .= $dbh->quote($queue->{vinfo}) . ',';
$insert_cmds .= $time_total . ',';
- $insert_cmds .= $dbh->quote($msginfo->{sender}) . ');';
+ $insert_cmds .= $dbh->quote(encode('UTF-8', $msginfo->{sender})) . ');';
foreach my $r (@{$msginfo->{targets}}) {
- my $tmp = $dbh->quote($r);
+ my $tmp = $dbh->quote(encode('UTF-8',$r));
my $blocked = $queue->{status}->{$r} eq 'blocked' ? 1 : 0;
$insert_cmds .= "INSERT INTO CReceivers (CStatistic_CID, CStatistic_RID, Receiver, Blocked) " .
"VALUES ($lcid, currval ('cstatistic_id_seq'), $tmp, '$blocked'); ";
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 6/8] quarantine: handle utf8 data
2022-11-17 15:06 [pmg-devel] [PATCH pmg-api v2 0/8] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
` (4 preceding siblings ...)
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 5/8] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
@ 2022-11-17 15:06 ` Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 7/8] pmgqm: handle smtputf8 data Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 8/8] statistics: handle utf8 data Stoiko Ivanov
7 siblings, 0 replies; 9+ messages in thread
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC (permalink / raw)
To: pmg-devel
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/API2/Quarantine.pm | 16 ++++++++--------
src/PMG/HTMLMail.pm | 7 ++++---
src/PMG/Quarantine.pm | 13 +++++++------
src/PMG/RuleDB/Spam.pm | 12 ++++++------
4 files changed, 25 insertions(+), 23 deletions(-)
diff --git a/src/PMG/API2/Quarantine.pm b/src/PMG/API2/Quarantine.pm
index ddf7c04..69cc7c6 100644
--- a/src/PMG/API2/Quarantine.pm
+++ b/src/PMG/API2/Quarantine.pm
@@ -134,15 +134,15 @@ my $parse_header_info = sub {
my @lines = split('\n', $ref->{header});
my $head = Mail::Header->new(\@lines);
- $res->{subject} = PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('subject'))) // '';
+ $res->{subject} = PMG::Utils::try_decode_utf8(PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('subject'))) // '');
- $res->{from} = PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('from') || $ref->{sender})) // '';
+ $res->{from} = PMG::Utils::try_decode_utf8(PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('from') || $ref->{sender})) // '');
- my $sender = PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('sender')));
+ my $sender = PMG::Utils::try_decode_utf8(PMG::Utils::decode_rfc1522(PVE::Tools::trim($head->get('sender'))));
$res->{sender} = $sender if $sender && ($sender ne $res->{from});
- $res->{envelope_sender} = $ref->{sender};
- $res->{receiver} = $ref->{receiver} // $ref->{pmail};
+ $res->{envelope_sender} = PMG::Utils::try_decode_utf8($ref->{sender});
+ $res->{receiver} = PMG::Utils::try_decode_utf8($ref->{receiver} // $ref->{pmail});
$res->{id} = 'C' . $ref->{cid} . 'R' . $ref->{rid} . 'T' . $ref->{ticketid};
$res->{time} = $ref->{time};
$res->{bytes} = $ref->{bytes};
@@ -437,7 +437,7 @@ __PACKAGE__->register_method ({
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, { mail => $ref->{pmail} };
+ push @$res, { mail => PMG::Utils::try_decode_utf8($ref->{pmail}) };
}
return $res;
@@ -532,7 +532,7 @@ __PACKAGE__->register_method ({
}
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, { mail => $ref->{pmail} };
+ push @$res, { mail => PMG::Utils::try_decode_utf8($ref->{pmail}) };
}
return $res;
@@ -569,7 +569,7 @@ my $quarantine_api = sub {
}
if ($check_pmail || $role eq 'quser') {
- $sth->execute($pmail);
+ $sth->execute(encode('UTF-8', $pmail));
} else {
$sth->execute();
}
diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
index 87f5c40..207c52c 100644
--- a/src/PMG/HTMLMail.pm
+++ b/src/PMG/HTMLMail.pm
@@ -192,9 +192,10 @@ sub read_raw_email {
# read header
my $header;
while (defined(my $line = <$fh>)) {
- $raw_header .= $line;
- chomp $line;
- push @$header, $line;
+ my $decoded_line = PMG::Utils::try_decode_utf8($line);
+ $raw_header .= $decoded_line;
+ chomp $decoded_line;
+ push @$header, $decoded_line;
last if $line =~ m/^\s*$/;
}
diff --git a/src/PMG/Quarantine.pm b/src/PMG/Quarantine.pm
index 77af8cc..aa6b948 100644
--- a/src/PMG/Quarantine.pm
+++ b/src/PMG/Quarantine.pm
@@ -3,6 +3,7 @@ package PMG::Quarantine;
use strict;
use warnings;
use Net::SMTP;
+use Encode qw(encode);
use PVE::SafeSyslog;
use PVE::Tools;
@@ -16,7 +17,7 @@ sub add_to_blackwhite {
my $name = $listname eq 'BL' ? 'BL' : 'WL';
my $oname = $listname eq 'BL' ? 'WL' : 'BL';
- my $qu = $dbh->quote ($username);
+ my $qu = $dbh->quote (encode('UTF-8', $username));
my $sth = $dbh->prepare(
"SELECT * FROM UserPrefs WHERE pmail = $qu AND (Name = 'BL' OR Name = 'WL')");
@@ -25,13 +26,13 @@ sub add_to_blackwhite {
my $list = { 'WL' => {}, 'BL' => {} };
while (my $ref = $sth->fetchrow_hashref()) {
- my $data = $ref->{data};
+ my $data = PMG::Utils::try_decode_utf8($ref->{data});
$data =~ s/[,;]/ /g;
my @alist = split('\s+', $data);
my $tmp = {};
foreach my $a (@alist) {
- if ($a =~ m/^[[:ascii:]]+$/) {
+ if ($a =~ m/^[^\s\\\@]+(?:\@[^\s\/\\\@]+)?$/) {
$tmp->{$a} = 1;
}
}
@@ -50,7 +51,7 @@ sub add_to_blackwhite {
if ($delete) {
delete($list->{$name}->{$v});
} else {
- if ($v =~ m/[[:^ascii:]]/) {
+ if ($v =~ m/[\s\\]/) {
die "email address '$v' contains invalid characters\n";
}
$list->{$name}->{$v} = 1;
@@ -58,8 +59,8 @@ sub add_to_blackwhite {
}
}
- my $wlist = $dbh->quote(join (',', keys %{$list->{WL}}) || '');
- my $blist = $dbh->quote(join (',', keys %{$list->{BL}}) || '');
+ my $wlist = $dbh->quote(encode('UTF-8', join (',', keys %{$list->{WL}})) || '');
+ my $blist = $dbh->quote(encode('UTF-8', join (',', keys %{$list->{BL}})) || '');
if (!$delete) {
my $maxlen = 200000;
diff --git a/src/PMG/RuleDB/Spam.pm b/src/PMG/RuleDB/Spam.pm
index b7e7dd4..a9bf392 100644
--- a/src/PMG/RuleDB/Spam.pm
+++ b/src/PMG/RuleDB/Spam.pm
@@ -94,7 +94,7 @@ sub parse_addrlist {
my $regex = $addr;
# SA like checks
$regex =~ s/[\000\\\(]/_/gs; # is this really necessasry ?
- $regex =~ s/([^\*\?_a-zA-Z0-9])/\\$1/g; # escape possible metachars
+ $regex =~ s/([^\*\?_\w])/\\$1/g; # escape possible metachars
$regex =~ tr/?/./; # replace "?" with "."
$regex =~ s/\*+/\.\*/g; # replace "*" with ".*"
@@ -149,13 +149,13 @@ sub get_blackwhite {
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
- my $pmail = lc ($ref->{pmail});
+ my $pmail = lc (PMG::Utils::try_decode_utf8($ref->{pmail}));
if ($ref->{name} eq 'WL') {
$target_info->{$pmail}->{whitelist} =
- parse_addrlist($ref->{data});
+ parse_addrlist(PMG::Utils::try_decode_utf8($ref->{data}));
} elsif ($ref->{name} eq 'BL') {
$target_info->{$pmail}->{blacklist} =
- parse_addrlist($ref->{data});
+ parse_addrlist(PMG::Utils::try_decode_utf8($ref->{data}));
}
}
@@ -205,7 +205,7 @@ sub what_match_targets {
($list = $queue->{blackwhite}->{$pmail}->{whitelist}) &&
check_addrlist($list, $queue->{all_from_addrs})) {
syslog('info', "%s: sender in user (%s) whitelist",
- $queue->{logid}, $pmail);
+ $queue->{logid}, encode('UTF-8', $pmail));
} else {
$target_info->{$t}->{marks} = []; # never add additional marks here
$target_info->{$t}->{spaminfo} = $info;
@@ -234,7 +234,7 @@ sub what_match_targets {
$target_info->{$t}->{marks} = [];
$target_info->{$t}->{spaminfo} = $info;
syslog ('info', "%s: sender in user (%s) blacklist",
- $queue->{logid}, $pmail);
+ $queue->{logid}, encode('UTF-8',$pmail));
}
}
}
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 7/8] pmgqm: handle smtputf8 data
2022-11-17 15:06 [pmg-devel] [PATCH pmg-api v2 0/8] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
` (5 preceding siblings ...)
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 6/8] quarantine: handle utf8 data Stoiko Ivanov
@ 2022-11-17 15:06 ` Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 8/8] statistics: handle utf8 data Stoiko Ivanov
7 siblings, 0 replies; 9+ messages in thread
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC (permalink / raw)
To: pmg-devel
$data->{pmail} is used both in the template rendering ('Spam Report for
$pmail') and as content for the To header, and the two uses need different
treatment. Thus additionally introduce 'pmail_raw'.
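To illustrate why the two uses differ, here is a small sketch that is not
part of the patch. The sample address is hypothetical, and decode() from
Encode stands in for PMG::Utils::try_decode_utf8, whose implementation is
not shown in this mail. HTML::Entities is assumed to be installed.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use HTML::Entities qw(encode_entities);

# Hypothetical database value: UTF-8 encoded bytes.
my $pmail_raw = encode('UTF-8', "m\x{fc}ller\@example.com");

# For the HTML template: decode to a perl-string first, then escape.
# decode() is a stand-in for PMG::Utils::try_decode_utf8 here.
my $pmail_html = encode_entities(decode('UTF-8', $pmail_raw));

# Escaping the undecoded bytes treats each byte as its own character
# and yields two entities for the single umlaut.
my $pmail_wrong = encode_entities($pmail_raw);

print "template: $pmail_html\n";
print "wrong:    $pmail_wrong\n";
# The To header keeps the raw bytes, without any HTML entities.
print "header:   $pmail_raw\n";
```

Keeping both values side by side avoids having to undo the entity encoding
when building the mail header.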
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/CLI/pmgqm.pm | 24 +++++++++++++-----------
src/PMG/Utils.pm | 7 ++++---
2 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/src/PMG/CLI/pmgqm.pm b/src/PMG/CLI/pmgqm.pm
index dbec8ef..7293579 100755
--- a/src/PMG/CLI/pmgqm.pm
+++ b/src/PMG/CLI/pmgqm.pm
@@ -2,6 +2,7 @@ package PMG::CLI::pmgqm;
use strict;
use Data::Dumper;
+use Encode qw(encode);
use Template;
use MIME::Entity;
use HTML::Entities;
@@ -17,6 +18,7 @@ use PVE::SafeSyslog;
use PVE::Tools;
use PVE::INotify;
use PVE::CLIHandler;
+use PVE::JSONSchema qw(get_standard_option);
use PMG::RESTEnvironment;
use PMG::Utils;
@@ -57,7 +59,7 @@ sub get_item_data {
}
$item->{envelope_sender} = $ref->{sender};
- $item->{pmail} = $ref->{pmail};
+ $item->{pmail} = encode_entities(PMG::Utils::try_decode_utf8($ref->{pmail}));
$item->{receiver} = $ref->{receiver} || $ref->{pmail};
$item->{date} = strftime("%F", localtime($ref->{time}));
@@ -157,11 +159,10 @@ __PACKAGE__->register_method ({
parameters => {
additionalProperties => 0,
properties => {
- receiver => {
+ receiver => get_standard_option('pmg-email-address', {
description => "Generate report for a single email address. If not specified, generate reports for all users.",
- type => 'string', format => 'email',
optional => 1,
- },
+ }),
timespan => {
description => "Select time span.",
type => 'string',
@@ -175,11 +176,10 @@ __PACKAGE__->register_method ({
enum => ['short', 'verbose', 'custom'],
optional => 1,
},
- redirect => {
+ redirect => get_standard_option('pmg-email-address', {
description => "Redirect spam report email to this address.",
- type => 'string', format => 'email',
optional => 1,
- },
+ }),
debug => {
description => "Debug mode. Print raw email to stdout instead of sending them.",
type => 'boolean',
@@ -280,7 +280,7 @@ __PACKAGE__->register_method ({
"ORDER BY pmail, time, receiver");
if ($target) {
- $sth->execute($target);
+ $sth->execute(encode('UTF-8', $target));
} else {
$sth->execute();
}
@@ -302,16 +302,18 @@ __PACKAGE__->register_method ({
};
while (my $ref = $sth->fetchrow_hashref()) {
- if ($creceiver ne $ref->{pmail}) {
+ my $decoded_pmail = PMG::Utils::try_decode_utf8($ref->{pmail});
+ if ($creceiver ne $decoded_pmail) {
$finalize->() if $data;
$data = clone($global_data);
- $creceiver = $ref->{pmail};
+ $creceiver = $decoded_pmail;
$mailcount = 0;
- $data->{pmail} = $creceiver;
+ $data->{pmail} = encode_entities($decoded_pmail);
+ $data->{pmail_raw} = $ref->{pmail};
$data->{managehref} = "$protocol_fqdn_port/quarantine";
if ($data->{authmode} ne 'ldap') {
$data->{ticket} = PMG::Ticket::assemble_quarantine_ticket($data->{pmail});
diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
index 750ea3a..acc621a 100644
--- a/src/PMG/Utils.pm
+++ b/src/PMG/Utils.pm
@@ -1146,12 +1146,13 @@ sub rfc1522_to_html {
my ($d, $cs) = @$r;
if ($d) {
if ($cs) {
- $res .= encode_entities(decode($cs, $d));
+ $res .= encode('UTF-8', decode($cs, $d));
} else {
- $res .= encode_entities($d);
+ $res .= $d;
}
}
}
+ $res = encode_entities(decode('UTF-8', $res));
};
$res = $enc if $@;
@@ -1260,7 +1261,7 @@ sub finalize_report {
my $top = MIME::Entity->build(
Type => "multipart/related",
- To => $data->{pmail},
+ To => $data->{pmail_raw},
From => $mailfrom,
Subject => bencode_header(decode_entities($title)));
--
2.30.2
* [pmg-devel] [PATCH pmg-api v2 8/8] statistics: handle utf8 data
2022-11-17 15:06 [pmg-devel] [PATCH pmg-api v2 0/8] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
` (6 preceding siblings ...)
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 7/8] pmgqm: handle smtputf8 data Stoiko Ivanov
@ 2022-11-17 15:06 ` Stoiko Ivanov
7 siblings, 0 replies; 9+ messages in thread
From: Stoiko Ivanov @ 2022-11-17 15:06 UTC (permalink / raw)
To: pmg-devel
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
src/PMG/Statistic.pm | 67 +++++++++++++++++++++++++++++++++-----------
1 file changed, 50 insertions(+), 17 deletions(-)
diff --git a/src/PMG/Statistic.pm b/src/PMG/Statistic.pm
index 6d27930..6cb65ff 100755
--- a/src/PMG/Statistic.pm
+++ b/src/PMG/Statistic.pm
@@ -3,6 +3,7 @@ package PMG::Statistic;
use strict;
use warnings;
use DBI;
+use Encode qw(encode);
use Time::Local;
use Time::Zone;
@@ -545,6 +546,22 @@ my $compute_sql_orderby = sub {
return $orderby;
};
+sub encode_user_stat {
+ my ($entry) = @_;
+
+ my $res = { };
+
+ for my $a (keys %$entry) {
+ if ($a eq 'receiver' || $a eq 'sender' || $a eq 'contact') {
+ $res->{$a} = PMG::Utils::try_decode_utf8($entry->{$a});
+ } else {
+ $res->{$a} = $entry->{$a};
+ }
+ }
+
+ return $res;
+}
+
sub user_stat_contact_details {
my ($self, $rdb, $receiver, $limit, $sorters, $filter) = @_;
@@ -554,19 +571,21 @@ sub user_stat_contact_details {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to);
+ my $filter_pattern = $filter ? $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) : undef;
+
my $query = "SELECT * FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND $cond_good_mail " .
"AND NOT direction AND sender != '' AND receiver = ? " .
- ($filter ? "AND sender like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
"ORDER BY $orderby limit $limit";
my $sth = $rdb->{dbh}->prepare($query);
- $sth->execute($receiver);
+ $sth->execute(encode('UTF-8',$receiver));
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, encode_user_stat($ref);
}
$sth->finish();
@@ -583,11 +602,14 @@ sub user_stat_contact {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $query = "SELECT receiver as contact, count(*) AS count, sum (bytes) AS bytes, " .
"count (virusinfo) as viruscount " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid " .
- ($filter ? "AND receiver like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
"AND $cond_good_mail AND NOT direction AND sender != '' ";
if ($advfilter) {
@@ -603,7 +625,7 @@ sub user_stat_contact {
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, encode_user_stat($ref);
}
$sth->finish();
@@ -620,20 +642,23 @@ sub user_stat_sender_details {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $sth = $rdb->{dbh}->prepare(
"SELECT " .
"blocked, bytes, ptime, sender, receiver, spamlevel, time, virusinfo " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND " .
"$cond_good_mail AND NOT direction AND sender = ? " .
- ($filter ? "AND receiver like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
"ORDER BY $orderby limit $limit");
- $sth->execute($sender);
+ $sth->execute(encode('UTF-8',$sender));
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, encode_user_stat($ref);
}
$sth->finish();
@@ -650,11 +675,14 @@ sub user_stat_sender {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $query = "SELECT sender,count(*) AS count, sum (bytes) AS bytes, " .
"count (virusinfo) as viruscount, " .
"count (CASE WHEN spamlevel >= 3 THEN 1 ELSE NULL END) as spamcount " .
"FROM CStatistic WHERE $cond_good_mail AND NOT direction AND sender != '' " .
- ($filter ? "AND sender like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
"GROUP BY sender ORDER BY $orderby limit $limit";
my $sth = $rdb->{dbh}->prepare($query);
@@ -662,7 +690,7 @@ sub user_stat_sender {
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, encode_user_stat($ref);
}
$sth->finish();
@@ -679,18 +707,21 @@ sub user_stat_receiver_details {
my $cond_good_mail = $self->query_cond_good_mail($from, $to);
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $sth = $rdb->{dbh}->prepare(
"SELECT blocked, bytes, ptime, sender, receiver, spamlevel, time, virusinfo " .
"FROM CStatistic, CReceivers " .
"WHERE cid = cstatistic_cid AND rid = cstatistic_rid AND $cond_good_mail AND receiver = ? " .
- ($filter ? "AND sender like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND sender like " . $filter_pattern . ' ' : '') .
"ORDER BY $orderby limit $limit");
- $sth->execute($receiver);
+ $sth->execute(encode('UTF-8',$receiver));
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, encode_user_stat($ref);
}
$sth->finish();
@@ -708,6 +739,9 @@ sub user_stat_receiver {
my $cond_good_mail = $self->query_cond_good_mail ($from, $to) . " AND " .
"receiver IS NOT NULL AND receiver != ''";
+ my $filter_pattern;
+ $filter_pattern = $rdb->{dbh}->quote(encode('UTF-8', "%${filter}%")) if $filter;
+
my $query = "SELECT receiver, " .
"count(*) AS count, " .
"sum (bytes) AS bytes, " .
@@ -728,7 +762,7 @@ sub user_stat_receiver {
}
$query .= "AND $cond_good_mail and direction " .
- ($filter ? "AND receiver like " . $rdb->{dbh}->quote("%${filter}%") . ' ' : '') .
+ ($filter_pattern ? "AND receiver like " . $filter_pattern . ' ' : '') .
"GROUP BY receiver ORDER BY $orderby LIMIT $limit";
my $sth = $rdb->{dbh}->prepare($query);
@@ -736,7 +770,7 @@ sub user_stat_receiver {
my $res = [];
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, encode_user_stat($ref);
}
$sth->finish();
@@ -873,9 +907,8 @@ sub recent_receivers {
my $sth = $rdb->{dbh}->prepare($cmd);
$sth->execute ($from, $limit);
-
while (my $ref = $sth->fetchrow_hashref()) {
- push @$res, $ref;
+ push @$res, encode_user_stat($ref);
}
$sth->finish();
--
2.30.2
Thread overview: 9+ messages
2022-11-17 15:06 [pmg-devel] [PATCH pmg-api v2 0/8] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 1/8] utils: return perl string from decode_rfc1522 Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 2/8] ruledb: properly substitute prox_vars in headers Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 3/8] fix #2541 ruledb: encode relevant values as utf-8 in database Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 4/8] ruledb: encode e-mail addresses for syslog Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 5/8] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 6/8] quarantine: handle utf8 data Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 7/8] pmgqm: handle smtputf8 data Stoiko Ivanov
2022-11-17 15:06 ` [pmg-devel] [PATCH pmg-api v2 8/8] statistics: handle utf8 data Stoiko Ivanov